* Re: [PATCH 07/10] rust: driver: store pointers in `DeviceId`
From: Gary Guo @ 2026-06-19 17:12 UTC
To: sashiko-reviews, Gary Guo
Cc: dri-devel, devicetree, conor+dt, robh, linux-pci,
Danilo Krummrich, ojeda, Alexandre Courbot
In-Reply-To: <20260619170508.8E9351F00A3E@smtp.kernel.org>
On Fri Jun 19, 2026 at 6:05 PM BST, sashiko-bot wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> - [Critical] Arbitrary Memory Dereference / Privilege Escalation via PCI dynamic IDs (`new_id`)
> - [High] Undefined Behavior via unconditional null reference creation when `driver_data` is 0
> --
>
> commit 01871465d12a97f7be006ca329ebda24f458cf1e
> Author: Gary Guo <gary@garyguo.net>
>
> rust: driver: store pointers in `DeviceId`
>
> This commit changes Rust device IDs to store pointers directly in the
> driver_data field, matching the behavior of C drivers. It works around
> the limitation of missing provenance during const evaluation by wrapping
> raw IDs in MaybeUninit.
>
>> diff --git a/rust/kernel/device_id.rs b/rust/kernel/device_id.rs
>> index 84852a2d9ad78..59453588df0e7 100644
>> --- a/rust/kernel/device_id.rs
>> +++ b/rust/kernel/device_id.rs
> [ ... ]
>> @@ -47,15 +50,48 @@ pub unsafe trait RawDeviceIdIndex: RawDeviceId {
>> /// The offset (in bytes) to the context/data field in the raw device ID.
>> const DRIVER_DATA_OFFSET: usize;
>>
>> - /// The index stored at `DRIVER_DATA_OFFSET` of the implementor of the [`RawDeviceIdIndex`]
>> - /// trait.
>> - fn index(&self) -> usize;
>> + /// Obtain the data pointer stored inside the device ID.
>> + ///
>> + /// # Safety
>> + ///
>> + /// `&Self` must be stored inside a `IdArray<Self, U>`.
>> + unsafe fn info_unchecked<U>(&self) -> &'static U {
>> + // SAFETY: By safety requirement of the trait, this is `self.driver_data as *const U` and by
>> + // the safety requirement of the function, this is stored in `IdArray<Self, U>` so is
>> + // convertible to `&'static U`.
>> + unsafe {
>> + core::ptr::from_ref(self)
>> + .byte_add(Self::DRIVER_DATA_OFFSET)
>> + .cast::<&U>()
>> + .read()
>> + }
>> + }
>
> [Severity: High]
> Does this code create an unconditional null reference in Rust if the
> driver_data or driver_info field is 0?
>
> Creating a null reference is immediate undefined behavior in Rust, even for
> zero-sized types. This 0 value can occur in paths like:
>
> 1. IdArray::new_without_index() leaving driver_data fields as 0.
>
> 2. When userspace adds a dynamic USB ID via the sysfs new_id interface using
> fewer than 5 fields (omitting refVendor), USB core initializes driver_info
> to 0.
>
> [ ... ]
>> diff --git a/rust/kernel/pci.rs b/rust/kernel/pci.rs
>> index a3dd48f763532..a630c7fc6a853 100644
>> --- a/rust/kernel/pci.rs
>> +++ b/rust/kernel/pci.rs
> [ ... ]
>> @@ -110,10 +110,13 @@ extern "C" fn probe_callback(
>> // SAFETY: `DeviceId` is a `#[repr(transparent)]` wrapper of `struct pci_device_id` and
>> // does not add additional invariants, so it's safe to transmute.
>> let id = unsafe { &*id.cast::<DeviceId>() };
>> - let info = T::ID_TABLE.info(id.index());
>> +
>> + // SAFETY: `id` comes from `T::ID_TABLE` which is of type `IdArray<_, T::IdInfo>` or
>> + // `pci_device_id_any` which has 0 as driver_data.
>> + let info = unsafe { id.info_unchecked_opt::<T::IdInfo>() };
>
> [Severity: Critical]
> Can this allow an arbitrary memory dereference or bypass kernel lockdown?
>
> The PCI subsystem supports dynamic IDs via the sysfs new_id interface
> (drivers/pci/pci-driver.c), which parses up to 7 arguments, the last being
> driver_data (%lx).
>
> This allows a privileged user (root) to supply an arbitrary pointer value.
> The code here will blindly cast this integer to an Option<&'static U> via
> info_unchecked_opt() and dereference it if non-zero, allowing arbitrary
> kernel memory access.
False positive: a dynamic ID is only accepted if its driver_data matches the
driver_data of an existing ID table entry.
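
A minimal userspace sketch of the check referred to above, assuming the
behaviour of new_id_store() in drivers/pci/pci-driver.c: a user-supplied
driver_data is rejected unless an entry in the driver's static table already
carries the same value. `struct model_id` and `accept_dynamic_driver_data()`
are hypothetical names for illustration, not kernel API, and the table
terminator is simplified to a zero vendor.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct pci_device_id. */
struct model_id {
	unsigned int vendor;
	unsigned long driver_data;
};

/*
 * Accept a user-supplied driver_data only if some entry in the
 * zero-terminated static table already carries the same value.
 */
static int accept_dynamic_driver_data(const struct model_id *ids,
				      unsigned long driver_data)
{
	if (!ids)
		return 0;	/* no static table: nothing to compare against */

	for (; ids->vendor; ids++) {
		if (ids->driver_data == driver_data)
			return 0;
	}
	return -EINVAL;		/* no existing entry uses this value */
}
```

Under this model, an arbitrary integer written through new_id never reaches
probe unless it equals a pointer that the driver itself placed in its table.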
>
> [ ... ]
>> diff --git a/rust/kernel/usb.rs b/rust/kernel/usb.rs
>> index 500b5e0ba4ea9..8aeff50117552 100644
>> --- a/rust/kernel/usb.rs
>> +++ b/rust/kernel/usb.rs
> [ ... ]
>> @@ -89,7 +89,8 @@ extern "C" fn probe_callback(
>> // does not add additional invariants, so it's safe to transmute.
>> let id = unsafe { &*id.cast::<DeviceId>() };
>>
>> - let info = T::ID_TABLE.info(id.index());
>> + // SAFETY: `id` comes from `T::ID_TABLE` which is of type `IdArray<_, T::IdInfo>`.
>> + let info = unsafe { id.info_unchecked::<T::IdInfo>() };
>> let data = T::probe(intf, id, info);
>
> [Severity: High]
> Does this code trigger undefined behavior when the underlying USB device ID
> driver_info field is 0?
>
> When the id is constructed using IdArray::new_without_index() or when
> dynamically added via sysfs new_id with fewer fields, the driver_info is 0.
> This probe unconditionally expects a valid non-null reference and creates
> one even for 0-initialized IDs via info_unchecked().
Looks like I need to convert the USB code to use `Option<&Self::IdInfo>` too,
like I did for PCI.
Best,
Gary
* Re: [PATCH v4 08/16] spi: cadence-quadspi: add PHY tuning support
From: Mark Brown @ 2026-06-19 17:33 UTC
To: Santhosh Kumar K
Cc: robh, krzk+dt, conor+dt, miquel.raynal, richard, vigneshr,
pratyush, mwalle, takahiro.kuwano, linux-spi, devicetree,
linux-kernel, linux-mtd, praneeth, u-kumar1, a-dutta
In-Reply-To: <20260618073725.84733-9-s-k6@ti.com>
On Thu, Jun 18, 2026 at 01:07:17PM +0530, Santhosh Kumar K wrote:
> The Cadence QSPI controller supports a delay-line PHY for high-speed
> operation. Without calibration the PHY is unused and read capture relies
> on a fixed delay, limiting throughput at frequencies above the base
> operating speed.
> +static int cqspi_get_phy_pattern_offset(struct device *dev, u32 *offset)
> +{
> + struct device_node *np, *flash_np = NULL, *part_np;
> + const __be32 *reg;
> + int len;
> +
> + if (!dev || !dev->of_node)
> + return -EINVAL;
> +
> + for_each_child_of_node(dev->of_node, np) {
> + if (of_node_name_prefix(np, "flash")) {
> + flash_np = np;
> + break;
> + }
> + }
This isn't going to do the right thing if there's more than one flash.
That doesn't seem a super sensible hardware configuration, but I'm not
sure I see anything stopping it being set up, and system integrators do
enjoy differentiating.
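
To illustrate the concern, here is a small userspace model of the lookup.
It is only a sketch: child node names are plain strings, the prefix test
imitates what of_node_name_prefix() does, and `find_single_flash()` is a
hypothetical helper that is not part of the patch. Rejecting the ambiguous
case is just one possible policy, shown as an assumption rather than as the
fix being requested.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model: returns the index of the single child whose name
 * starts with "flash", -ENOENT if there is none, or -EINVAL if more than
 * one child matches and the choice would be ambiguous.
 */
static int find_single_flash(const char *const *names, size_t count)
{
	int found = -ENOENT;
	size_t i;

	for (i = 0; i < count; i++) {
		if (strncmp(names[i], "flash", strlen("flash")) != 0)
			continue;
		if (found >= 0)
			return -EINVAL;	/* second match: ambiguous */
		found = (int)i;
	}
	return found;
}
```

The loop in the patch stops at the first match instead, so with two flash
children it would silently pick whichever one is enumerated first.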
* Re: [PATCH v6 07/10] ACPI: APEI: introduce GHES helper
From: Julian Braha @ 2026-06-19 17:46 UTC
To: Ahmed Tiba, Rafael J. Wysocki, Tony Luck, Borislav Petkov,
Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue, Len Brown,
Saket Dumbre, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Jonathan Corbet,
Shuah Khan
Cc: linux-kernel, linux-acpi, acpica-devel, linux-cxl, devicetree,
linux-edac, linux-doc, Dmitry.Lamerov
In-Reply-To: <81dd6d0d-427f-49ae-9573-fbe84dc2185a@arm.com>
On 6/19/26 16:45, Ahmed Tiba wrote:
> GHES_CPER_HELPERS is intended for both the ACPI GHES path and the DT
> firmware-first provider, so I do not want to tie it to ACPI.
So what's the plan to fix the build error when ACPI is disabled:
https://lore.kernel.org/all/0f131ee4-d335-45d2-b6ae-49c18df1353b@gmail.com/
- Julian Braha
* [PATCH v2] arm64: dts: qcom: install DT overlays via dtbs_install
From: Vishwas Udupa @ 2026-06-19 18:09 UTC
To: andersson
Cc: vudupa, conor+dt, devicetree, kbajaj, konradybcio, krzk+dt,
linux-arm-msm, robh, snb, lminier, Vishwas Udupa
In-Reply-To: <20260428123725.3457865-1-vudupa@qti.qualcomm.com>
Overlay binaries (.dtbo) are currently only built implicitly as
dependencies of composite firmware images and are therefore absent from the
kernel install output.
Even when final DTBs are produced at build time, some distributions rely on
standalone DTBOs for runtime selection and application by bootloaders or
firmware. Without explicit install targets, DTBOs are not picked up by the
standard dtbs_install flow and therefore cannot be packaged alongside DTBs.
Add explicit dtb-$(CONFIG_ARCH_QCOM) entries for all DT overlays defined in
this Makefile so they are installed via dtbs_install, matching existing
DTB install behaviour.
Signed-off-by: Vishwas Udupa <vishwas.udupa@oss.qualcomm.com>
---
Changes in v2:
- Address review feedback from Krzysztof Kozlowski
- Drop inclusion of EL2 DTBOs
- Link to v1: https://lore.kernel.org/all/20260428123725.3457865-1-vudupa@qti.qualcomm.com/
arch/arm64/boot/dts/qcom/Makefile | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/Makefile b/arch/arm64/boot/dts/qcom/Makefile
index 4ba8e7306419..44ed2b591363 100644
--- a/arch/arm64/boot/dts/qcom/Makefile
+++ b/arch/arm64/boot/dts/qcom/Makefile
@@ -7,7 +7,9 @@ apq8016-sbc-usb-host-dtbs := apq8016-sbc.dtb apq8016-sbc-usb-host.dtbo
dtb-$(CONFIG_ARCH_QCOM) += sar2130p-qar2130p.dtb
dtb-$(CONFIG_ARCH_QCOM) += apq8016-sbc-d3-camera-mezzanine.dtb
+dtb-$(CONFIG_ARCH_QCOM) += apq8016-sbc-d3-camera-mezzanine.dtbo
dtb-$(CONFIG_ARCH_QCOM) += apq8016-sbc-usb-host.dtb
+dtb-$(CONFIG_ARCH_QCOM) += apq8016-sbc-usb-host.dtbo
dtb-$(CONFIG_ARCH_QCOM) += apq8016-schneider-hmibsc.dtb
dtb-$(CONFIG_ARCH_QCOM) += apq8039-t2.dtb
dtb-$(CONFIG_ARCH_QCOM) += apq8094-sony-xperia-kitakami-karin_windy.dtb
@@ -47,13 +49,16 @@ lemans-evk-camera-csi1-imx577-dtbs := lemans-evk.dtb lemans-evk-camera-csi1-imx5
lemans-evk-camera-dtbs := lemans-evk.dtb lemans-evk-camera.dtbo
dtb-$(CONFIG_ARCH_QCOM) += lemans-evk-camera-csi1-imx577.dtb
+dtb-$(CONFIG_ARCH_QCOM) += lemans-evk-camera-csi1-imx577.dtbo
dtb-$(CONFIG_ARCH_QCOM) += lemans-evk-camera.dtb
+dtb-$(CONFIG_ARCH_QCOM) += lemans-evk-camera.dtbo
lemans-evk-el2-dtbs := lemans-evk.dtb lemans-el2.dtbo
dtb-$(CONFIG_ARCH_QCOM) += lemans-evk-el2.dtb
lemans-evk-ifp-mezzanine-dtbs := lemans-evk.dtb lemans-evk-ifp-mezzanine.dtbo
dtb-$(CONFIG_ARCH_QCOM) += lemans-evk-ifp-mezzanine.dtb
+dtb-$(CONFIG_ARCH_QCOM) += lemans-evk-ifp-mezzanine.dtbo
dtb-$(CONFIG_ARCH_QCOM) += mahua-crd.dtb
dtb-$(CONFIG_ARCH_QCOM) += milos-fairphone-fp6.dtb
dtb-$(CONFIG_ARCH_QCOM) += monaco-arduino-monza.dtb
@@ -61,12 +66,14 @@ dtb-$(CONFIG_ARCH_QCOM) += monaco-evk.dtb
monaco-evk-camera-imx577-dtbs := monaco-evk.dtb monaco-evk-camera-imx577.dtbo
dtb-$(CONFIG_ARCH_QCOM) += monaco-evk-camera-imx577.dtb
+dtb-$(CONFIG_ARCH_QCOM) += monaco-evk-camera-imx577.dtbo
monaco-evk-el2-dtbs := monaco-evk.dtb monaco-el2.dtbo
dtb-$(CONFIG_ARCH_QCOM) += monaco-evk-el2.dtb
monaco-evk-ifp-mezzanine-dtbs := monaco-evk.dtb monaco-evk-ifp-mezzanine.dtbo
dtb-$(CONFIG_ARCH_QCOM) += monaco-evk-ifp-mezzanine.dtb
+dtb-$(CONFIG_ARCH_QCOM) += monaco-evk-ifp-mezzanine.dtbo
dtb-$(CONFIG_ARCH_QCOM) += msm8216-samsung-fortuna3g.dtb
dtb-$(CONFIG_ARCH_QCOM) += msm8916-acer-a1-724.dtb
dtb-$(CONFIG_ARCH_QCOM) += msm8916-alcatel-idol347.dtb
@@ -175,7 +182,9 @@ qcs6490-rb3gen2-vision-mezzanine-dtbs := qcs6490-rb3gen2.dtb qcs6490-rb3gen2-vis
qcs6490-rb3gen2-industrial-mezzanine-dtbs := qcs6490-rb3gen2.dtb qcs6490-rb3gen2-industrial-mezzanine.dtbo
dtb-$(CONFIG_ARCH_QCOM) += qcs6490-rb3gen2-industrial-mezzanine.dtb
+dtb-$(CONFIG_ARCH_QCOM) += qcs6490-rb3gen2-industrial-mezzanine.dtbo
dtb-$(CONFIG_ARCH_QCOM) += qcs6490-rb3gen2-vision-mezzanine.dtb
+dtb-$(CONFIG_ARCH_QCOM) += qcs6490-rb3gen2-vision-mezzanine.dtbo
dtb-$(CONFIG_ARCH_QCOM) += qcs6490-thundercomm-minipc-g1iot.dtb
dtb-$(CONFIG_ARCH_QCOM) += qcs6490-thundercomm-rubikpi3.dtb
dtb-$(CONFIG_ARCH_QCOM) += qcs8300-ride.dtb
@@ -199,6 +208,7 @@ dtb-$(CONFIG_ARCH_QCOM) += qrb2210-rb1.dtb
qrb2210-rb1-vision-mezzanine-dtbs := qrb2210-rb1.dtb qrb2210-rb1-vision-mezzanine.dtbo
dtb-$(CONFIG_ARCH_QCOM) += qrb2210-rb1-vision-mezzanine.dtb
+dtb-$(CONFIG_ARCH_QCOM) += qrb2210-rb1-vision-mezzanine.dtbo
dtb-$(CONFIG_ARCH_QCOM) += qrb4210-rb2.dtb
dtb-$(CONFIG_ARCH_QCOM) += qrb5165-rb5.dtb
@@ -206,6 +216,7 @@ dtb-$(CONFIG_ARCH_QCOM) += qrb5165-rb5.dtb
qrb5165-rb5-vision-mezzanine-dtbs := qrb5165-rb5.dtb qrb5165-rb5-vision-mezzanine.dtbo
dtb-$(CONFIG_ARCH_QCOM) += qrb5165-rb5-vision-mezzanine.dtb
+dtb-$(CONFIG_ARCH_QCOM) += qrb5165-rb5-vision-mezzanine.dtbo
dtb-$(CONFIG_ARCH_QCOM) += qru1000-idp.dtb
dtb-$(CONFIG_ARCH_QCOM) += sa8155p-adp.dtb
dtb-$(CONFIG_ARCH_QCOM) += sa8295p-adp.dtb
@@ -308,6 +319,7 @@ dtb-$(CONFIG_ARCH_QCOM) += sdm845-db845c.dtb
sdm845-db845c-navigation-mezzanine-dtbs := sdm845-db845c.dtb sdm845-db845c-navigation-mezzanine.dtbo
dtb-$(CONFIG_ARCH_QCOM) += sdm845-db845c-navigation-mezzanine.dtb
+dtb-$(CONFIG_ARCH_QCOM) += sdm845-db845c-navigation-mezzanine.dtbo
dtb-$(CONFIG_ARCH_QCOM) += sdm845-google-crosshatch.dtb
dtb-$(CONFIG_ARCH_QCOM) += sdm845-google-blueline.dtb
dtb-$(CONFIG_ARCH_QCOM) += sdm845-lg-judyln.dtb
@@ -372,7 +384,9 @@ sm8550-hdk-rear-camera-card-dtbs := sm8550-hdk.dtb sm8550-hdk-rear-camera-card.d
dtb-$(CONFIG_ARCH_QCOM) += sm8550-hdk-display-card-rear-camera-card.dtb
dtb-$(CONFIG_ARCH_QCOM) += sm8550-hdk-display-card.dtb
+dtb-$(CONFIG_ARCH_QCOM) += sm8550-hdk-display-card.dtbo
dtb-$(CONFIG_ARCH_QCOM) += sm8550-hdk-rear-camera-card.dtb
+dtb-$(CONFIG_ARCH_QCOM) += sm8550-hdk-rear-camera-card.dtbo
dtb-$(CONFIG_ARCH_QCOM) += sm8550-hdk.dtb
dtb-$(CONFIG_ARCH_QCOM) += sm8550-mtp.dtb
dtb-$(CONFIG_ARCH_QCOM) += sm8550-qrd.dtb
@@ -386,7 +400,9 @@ sm8650-hdk-rear-camera-card-dtbs := sm8650-hdk.dtb sm8650-hdk-rear-camera-card.d
dtb-$(CONFIG_ARCH_QCOM) += sm8650-hdk-display-card-rear-camera-card.dtb
dtb-$(CONFIG_ARCH_QCOM) += sm8650-hdk-display-card.dtb
+dtb-$(CONFIG_ARCH_QCOM) += sm8650-hdk-display-card.dtbo
dtb-$(CONFIG_ARCH_QCOM) += sm8650-hdk-rear-camera-card.dtb
+dtb-$(CONFIG_ARCH_QCOM) += sm8650-hdk-rear-camera-card.dtbo
dtb-$(CONFIG_ARCH_QCOM) += sm8650-hdk.dtb
dtb-$(CONFIG_ARCH_QCOM) += sm8650-mtp.dtb
dtb-$(CONFIG_ARCH_QCOM) += sm8650-qrd.dtb
@@ -395,12 +411,14 @@ dtb-$(CONFIG_ARCH_QCOM) += sm8750-qrd.dtb
dtb-$(CONFIG_ARCH_QCOM) += talos-evk.dtb
talos-evk-usb1-peripheral-dtbs := talos-evk.dtb talos-evk-usb1-peripheral.dtbo
dtb-$(CONFIG_ARCH_QCOM) += talos-evk-usb1-peripheral.dtb
+dtb-$(CONFIG_ARCH_QCOM) += talos-evk-usb1-peripheral.dtbo
dtb-$(CONFIG_ARCH_QCOM) += talos-evk-camera-imx577.dtbo
talos-evk-camera-imx577-dtbs := talos-evk.dtb talos-evk-camera-imx577.dtbo
dtb-$(CONFIG_ARCH_QCOM) += talos-evk-camera-imx577.dtb
talos-evk-lvds-auo,g133han01-dtbs := \
talos-evk.dtb talos-evk-lvds-auo,g133han01.dtbo
dtb-$(CONFIG_ARCH_QCOM) += talos-evk-lvds-auo,g133han01.dtb
+dtb-$(CONFIG_ARCH_QCOM) += talos-evk-lvds-auo,g133han01.dtbo
x1e001de-devkit-el2-dtbs := x1e001de-devkit.dtb x1-el2.dtbo
dtb-$(CONFIG_ARCH_QCOM) += x1e001de-devkit.dtb x1e001de-devkit-el2.dtb
x1e78100-lenovo-thinkpad-t14s-el2-dtbs := x1e78100-lenovo-thinkpad-t14s.dtb x1-el2.dtbo
--
2.43.0
* Re: [PATCH v2] arm64: dts: qcom: install DT overlays via dtbs_install
From: sashiko-bot @ 2026-06-19 18:15 UTC
To: Vishwas Udupa; +Cc: robh, devicetree, conor+dt
In-Reply-To: <20260619180951.325272-1-vishwas.udupa@oss.qualcomm.com>
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The commit message inaccurately claims to add 'all DT overlays defined in this Makefile' while intentionally omitting EL2 overlays.
--
commit fbeb39cf8d645f798555d038d5b4a1e5e3393c7a
Author: Vishwas Udupa <vishwas.udupa@oss.qualcomm.com>
arm64: dts: qcom: install DT overlays via dtbs_install
This commit adds explicit install targets for Device Tree overlays (.dtbo)
in the qcom Makefile so they can be packaged alongside normal DTBs via the
standard dtbs_install flow.
> Even when final DTBs are produced at build time, some distributions rely on
> standalone DTBOs for runtime selection and application by bootloaders or
> firmware. Without explicit install targets, DTBOs are not picked up by the
> standard dtbs_install flow and therefore cannot be packaged alongside DTBs.
>
> Add explicit dtb-$(CONFIG_ARCH_QCOM) entries for all DT overlays defined in
> this Makefile so they are installed via dtbs_install, matching existing
> DTB install behaviour.
[Severity: Medium]
Is this description accurate? The commit message states that all DT overlays
defined in the Makefile are added, but the patch appears to systematically
omit EL2 overlays.
[ ... ]
> --- a/arch/arm64/boot/dts/qcom/Makefile
> +++ b/arch/arm64/boot/dts/qcom/Makefile
[ ... ]
> lemans-evk-el2-dtbs := lemans-evk.dtb lemans-el2.dtbo
>
> dtb-$(CONFIG_ARCH_QCOM) += lemans-evk-el2.dtb
> lemans-evk-ifp-mezzanine-dtbs := lemans-evk.dtb lemans-evk-ifp-mezzanine.dtbo
> dtb-$(CONFIG_ARCH_QCOM) += lemans-evk-ifp-mezzanine.dtb
> +dtb-$(CONFIG_ARCH_QCOM) += lemans-evk-ifp-mezzanine.dtbo
For example, lemans-evk-ifp-mezzanine.dtbo is explicitly added here, but
the adjacent lemans-el2.dtbo is conspicuously skipped. Similar omissions
occur for monaco-el2.dtbo and x1-el2.dtbo further down in the patch.
Could the commit message be updated to clarify that EL2 overlays are
intentionally excluded from standalone installation?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260619180951.325272-1-vishwas.udupa@oss.qualcomm.com?part=1
* [PATCH 0/8] riscv: Add Ssqosid and initial CBQRI resctrl support
From: Drew Fustini @ 2026-06-19 18:29 UTC
To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
Paul Walmsley, Peter Newman, Radim Krčmář,
Reinette Chatre, Rob Herring, Samuel Holland,
Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
linux-doc
This series adds initial RISC-V QoS support: the Ssqosid extension [1]
(srmcfg CSR), the CBQRI controller interface [2] integrated with resctrl
[3], and a DT-based platform driver for cache controllers. It has been
tested on both the Tenstorrent Ascalon Shared Cache controller and a
QEMU implementation [4].
Note that this series only implements support for resctrl CAT using
CBQRI capacity allocation control. cc_block_mask maps onto resctrl's
existing cbm schema. However, cc_cunits is not supported as there is no
existing equivalent for capacity units in the resctrl schemata.
I had previously been iterating on an RFC series [5] that did a full
implementation of CBQRI including capacity monitoring, bandwidth
allocation and monitoring, as well as a parser for the ACPI RQSC table.
The bandwidth controls for CBQRI do not fit well into resctrl's existing
throttle based MB schemata. I believe that the path forward is
Reinette's generic schema description proof of concept [6] but that will
take time to mature. My plan is to rebase the full support of CBQRI on
to the generic schema once it is ready.
[1] https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
[2] https://github.com/riscv-non-isa/riscv-cbqri/releases/tag/v1.0
[3] https://docs.kernel.org/filesystems/resctrl.html
[4] https://github.com/riscv-non-isa/riscv-rqsc/blob/main/src/
[5] https://lore.kernel.org/linux-riscv/20260601-ssqosid-cbqri-rqsc-v7-0-v6-16-baf00f50028a@kernel.org/
[6] https://lore.kernel.org/all/aab804b9-e8b5-40ad-a85b-af7033391243@intel.com/
---
Drew Fustini (8):
dt-bindings: riscv: Add Ssqosid extension description
riscv: Detect the Ssqosid extension
riscv: Add support for srmcfg CSR from Ssqosid extension
riscv_cbqri: Add capacity controller probe and allocation device ops
riscv_cbqri: resctrl: Add cache allocation via capacity block mask
riscv: Enable resctrl filesystem for Ssqosid
dt-bindings: riscv: Add generic CBQRI controller binding
riscv_cbqri: Add CBQRI cache capacity-allocation platform driver
.../devicetree/bindings/riscv/extensions.yaml | 6 +
.../devicetree/bindings/riscv/riscv,cbqri.yaml | 109 +++
MAINTAINERS | 15 +
arch/riscv/Kconfig | 20 +
arch/riscv/include/asm/csr.h | 5 +
arch/riscv/include/asm/hwcap.h | 1 +
arch/riscv/include/asm/processor.h | 3 +
arch/riscv/include/asm/qos.h | 86 +++
arch/riscv/include/asm/resctrl.h | 152 ++++
arch/riscv/include/asm/switch_to.h | 3 +
arch/riscv/kernel/Makefile | 2 +
arch/riscv/kernel/cpufeature.c | 1 +
arch/riscv/kernel/qos.c | 91 +++
drivers/resctrl/Kconfig | 44 ++
drivers/resctrl/Makefile | 7 +
drivers/resctrl/cbqri_capacity.c | 132 ++++
drivers/resctrl/cbqri_devices.c | 511 ++++++++++++++
drivers/resctrl/cbqri_internal.h | 110 +++
drivers/resctrl/cbqri_resctrl.c | 774 +++++++++++++++++++++
include/linux/riscv_cbqri.h | 47 ++
20 files changed, 2119 insertions(+)
---
base-commit: 4fa3f5fabb30bf00d7475d5a33459ea83d639bf9
change-id: 20260610-dfustini-atl-sc-cbqri-dt-410c8e2711dd
Best regards,
--
Drew Fustini <fustini@kernel.org>
* [PATCH 1/8] dt-bindings: riscv: Add Ssqosid extension description
From: Drew Fustini @ 2026-06-19 18:29 UTC
To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
Paul Walmsley, Peter Newman, Radim Krčmář,
Reinette Chatre, Rob Herring, Samuel Holland,
Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
linux-doc
In-Reply-To: <20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org>
Document the ratified Supervisor-mode Quality of Service ID (Ssqosid)
extension v1.0.
Link: https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
Documentation/devicetree/bindings/riscv/extensions.yaml | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
index 2b0a8a93bb21..1c6f091518d4 100644
--- a/Documentation/devicetree/bindings/riscv/extensions.yaml
+++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
@@ -232,6 +232,12 @@ properties:
ratified at commit d70011dde6c2 ("Update to ratified state")
of riscv-j-extension.
+ - const: ssqosid
+ description: |
+ The standard Ssqosid extension for Quality of Service ID is
+ ratified as v1.0 in commit d9c616497fde ("Merge pull
+ request #7 from ved-rivos/Ratified") of riscv-ssqosid.
+
- const: ssstateen
description: |
The standard Ssstateen extension for supervisor-mode view of the
--
2.43.0
* [PATCH 2/8] riscv: Detect the Ssqosid extension
From: Drew Fustini @ 2026-06-19 18:29 UTC
To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
Paul Walmsley, Peter Newman, Radim Krčmář,
Reinette Chatre, Rob Herring, Samuel Holland,
Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
linux-doc
In-Reply-To: <20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org>
Ssqosid is the RISC-V Quality-of-Service (QoS) Identifiers specification
which defines the Supervisor Resource Management Configuration (srmcfg)
register.
Link: https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
Co-developed-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
arch/riscv/include/asm/hwcap.h | 1 +
arch/riscv/kernel/cpufeature.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index 7ef8e5f55c8d..b83dae5cebb9 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -112,6 +112,7 @@
#define RISCV_ISA_EXT_ZCLSD 103
#define RISCV_ISA_EXT_ZICFILP 104
#define RISCV_ISA_EXT_ZICFISS 105
+#define RISCV_ISA_EXT_SSQOSID 106
#define RISCV_ISA_EXT_XLINUXENVCFG 127
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index f46aa5602d74..668a7e71ff1c 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -582,6 +582,7 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
__RISCV_ISA_EXT_DATA(ssaia, RISCV_ISA_EXT_SSAIA),
__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
__RISCV_ISA_EXT_SUPERSET(ssnpm, RISCV_ISA_EXT_SSNPM, riscv_xlinuxenvcfg_exts),
+ __RISCV_ISA_EXT_DATA(ssqosid, RISCV_ISA_EXT_SSQOSID),
__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
__RISCV_ISA_EXT_DATA(svade, RISCV_ISA_EXT_SVADE),
__RISCV_ISA_EXT_DATA_VALIDATE(svadu, RISCV_ISA_EXT_SVADU, riscv_ext_svadu_validate),
--
2.43.0
* [PATCH 3/8] riscv: Add support for srmcfg CSR from Ssqosid extension
From: Drew Fustini @ 2026-06-19 18:29 UTC
To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
Paul Walmsley, Peter Newman, Radim Krčmář,
Reinette Chatre, Rob Herring, Samuel Holland,
Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
linux-doc
In-Reply-To: <20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org>
Add support for the srmcfg CSR defined in the Ssqosid ISA extension.
The CSR contains two fields:
- Resource Control ID (RCID) for resource allocation
- Monitoring Counter ID (MCID) for tracking resource usage
Requests from a hart to shared resources are tagged with these IDs,
allowing resource usage to be associated with the running task.
Add a srmcfg field to thread_struct with the same format as the CSR so
the scheduler can set the RCID and MCID for each task on context
switch. A per-cpu cpu_srmcfg variable mirrors the CSR state to avoid
redundant writes: reading the L1D-hot per-cpu copy is faster than a CSR
read and avoids traps under virtualization.
A per-cpu cpu_srmcfg_default holds the default srmcfg for each CPU as
set by resctrl CPU group assignment. On context switch, RCID and MCID
inherit from the CPU default independently: a task whose thread RCID
field is zero takes the CPU default's RCID, and likewise for MCID.
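
The per-field inheritance described above can be sketched in plain C as
follows. This is an illustrative userspace model, not the code in the patch:
`srmcfg_resolve()` is a hypothetical name, and the masks are written out by
hand to match GENMASK(11, 0) for RCID and GENMASK(27, 16) for MCID. It
assumes the CPU default only ever has bits set inside those two fields.

```c
#include <stdint.h>

/* Field layout from the patch: RCID in bits 11:0, MCID in bits 27:16. */
#define SRMCFG_RCID_SHIFT 0
#define SRMCFG_RCID_MASK  0x00000fffu
#define SRMCFG_MCID_SHIFT 16
#define SRMCFG_MCID_MASK  0x0fff0000u

/* Hypothetical helper: compute the value to program for the next task. */
static uint32_t srmcfg_resolve(uint32_t thread_srmcfg, uint32_t default_srmcfg)
{
	uint32_t rcid = (thread_srmcfg & SRMCFG_RCID_MASK) >> SRMCFG_RCID_SHIFT;
	uint32_t mcid = (thread_srmcfg & SRMCFG_MCID_MASK) >> SRMCFG_MCID_SHIFT;

	/* A zero field means "not assigned": take that field from the CPU default. */
	if (rcid == 0)
		rcid = (default_srmcfg & SRMCFG_RCID_MASK) >> SRMCFG_RCID_SHIFT;
	if (mcid == 0)
		mcid = (default_srmcfg & SRMCFG_MCID_MASK) >> SRMCFG_MCID_SHIFT;

	return (rcid << SRMCFG_RCID_SHIFT) | (mcid << SRMCFG_MCID_SHIFT);
}
```

For example, a task with RCID 7 and no MCID running on a CPU whose default is
RCID 3 / MCID 5 resolves to RCID 7 / MCID 5.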
Link: https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
MAINTAINERS | 8 ++++
arch/riscv/Kconfig | 18 ++++++++
arch/riscv/include/asm/csr.h | 5 +++
arch/riscv/include/asm/processor.h | 3 ++
arch/riscv/include/asm/qos.h | 86 +++++++++++++++++++++++++++++++++++
arch/riscv/include/asm/switch_to.h | 3 ++
arch/riscv/kernel/Makefile | 2 +
arch/riscv/kernel/qos.c | 91 ++++++++++++++++++++++++++++++++++++++
8 files changed, 216 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 069b4aa6b523..e2a7f9766355 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23291,6 +23291,14 @@ F: drivers/perf/riscv_pmu.c
F: drivers/perf/riscv_pmu_legacy.c
F: drivers/perf/riscv_pmu_sbi.c
+RISC-V QOS RESCTRL SUPPORT
+M: Drew Fustini <fustini@kernel.org>
+R: yunhui cui <cuiyunhui@bytedance.com>
+L: linux-riscv@lists.infradead.org
+S: Supported
+F: arch/riscv/include/asm/qos.h
+F: arch/riscv/kernel/qos.c
+
RISC-V RPMI AND MPXY DRIVERS
M: Rahul Pathak <rahul@summations.net>
M: Anup Patel <anup@brainfault.org>
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 3f0a647218e4..ee586925f972 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -590,6 +590,24 @@ config RISCV_ISA_SVNAPOT
If you don't know what to do here, say Y.
+config RISCV_ISA_SSQOSID
+ bool "Ssqosid extension support for supervisor mode Quality of Service ID"
+ depends on 64BIT
+ default n
+ help
+ Adds support for the Ssqosid ISA extension (Supervisor-mode
+ Quality of Service ID).
+
+ Ssqosid defines the srmcfg CSR which allows the system to tag the
+ running process with an RCID (Resource Control ID) and MCID
+ (Monitoring Counter ID). The RCID is used to determine resource
+ allocation. The MCID is used to track resource usage in event
+ counters.
+
+ For example, a cache controller may use the RCID to apply a
+ cache partitioning scheme and use the MCID to track how much
+ cache a process, or a group of processes, is using.
+
config RISCV_ISA_SVPBMT
bool "Svpbmt extension support for supervisor mode page-based memory types"
depends on 64BIT && MMU
diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 31b8988f4488..7bce928e5daa 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -84,6 +84,10 @@
#define SATP_ASID_MASK _AC(0xFFFF, UL)
#endif
+/* SRMCFG fields */
+#define SRMCFG_RCID_MASK GENMASK(11, 0)
+#define SRMCFG_MCID_MASK GENMASK(27, 16)
+
/* Exception cause high bit - is an interrupt if set */
#define CAUSE_IRQ_FLAG (_AC(1, UL) << (__riscv_xlen - 1))
@@ -328,6 +332,7 @@
#define CSR_STVAL 0x143
#define CSR_SIP 0x144
#define CSR_SATP 0x180
+#define CSR_SRMCFG 0x181
#define CSR_STIMECMP 0x14D
#define CSR_STIMECMPH 0x15D
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 812517b2cec1..49a386d74cd3 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -123,6 +123,9 @@ struct thread_struct {
/* A forced icache flush is not needed if migrating to the previous cpu. */
unsigned int prev_cpu;
#endif
+#ifdef CONFIG_RISCV_ISA_SSQOSID
+ u32 srmcfg;
+#endif
};
/* Whitelist the fstate from the task_struct for hardened usercopy */
diff --git a/arch/riscv/include/asm/qos.h b/arch/riscv/include/asm/qos.h
new file mode 100644
index 000000000000..600d889ef63d
--- /dev/null
+++ b/arch/riscv/include/asm/qos.h
@@ -0,0 +1,86 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_RISCV_QOS_H
+#define _ASM_RISCV_QOS_H
+
+#include <linux/percpu-defs.h>
+
+#ifdef CONFIG_RISCV_ISA_SSQOSID
+
+#include <linux/bitfield.h>
+#include <linux/cpufeature.h>
+#include <linux/sched.h>
+
+#include <asm/csr.h>
+#include <asm/fence.h>
+#include <asm/hwcap.h>
+
+/* cached value of srmcfg csr for each cpu */
+DECLARE_PER_CPU(u32, cpu_srmcfg);
+
+/* default srmcfg value for each cpu, set via resctrl cpu assignment */
+DECLARE_PER_CPU(u32, cpu_srmcfg_default);
+
+static inline void __switch_to_srmcfg(struct task_struct *next)
+{
+ u32 thread_srmcfg, default_srmcfg;
+
+ thread_srmcfg = READ_ONCE(next->thread.srmcfg);
+ default_srmcfg = __this_cpu_read(cpu_srmcfg_default);
+
+ /*
+ * RCID and MCID inherit from cpu_srmcfg_default independently.
+ * RESCTRL_RESERVED_CLOSID and RESCTRL_RESERVED_RMID are both 0,
+ * so a per-field zero means "no task assignment for this
+ * dimension" and the CPU default supplies that field. The fully
+ * unassigned (thread.srmcfg == 0) and fully assigned (both
+ * fields non-zero) cases short-circuit the field math.
+ */
+ if (thread_srmcfg == 0) {
+ thread_srmcfg = default_srmcfg;
+ } else {
+ u32 rcid = FIELD_GET(SRMCFG_RCID_MASK, thread_srmcfg);
+ u32 mcid = FIELD_GET(SRMCFG_MCID_MASK, thread_srmcfg);
+
+ if (rcid == 0 || mcid == 0) {
+ if (rcid == 0)
+ rcid = FIELD_GET(SRMCFG_RCID_MASK, default_srmcfg);
+ if (mcid == 0)
+ mcid = FIELD_GET(SRMCFG_MCID_MASK, default_srmcfg);
+ thread_srmcfg = FIELD_PREP(SRMCFG_RCID_MASK, rcid) |
+ FIELD_PREP(SRMCFG_MCID_MASK, mcid);
+ }
+ }
+
+ if (thread_srmcfg != __this_cpu_read(cpu_srmcfg)) {
+ /*
+ * Drain stores from the outgoing task before the CSR write
+ * so they retain the previous RCID/MCID tag at the cache
+ * interconnect.
+ */
+ RISCV_FENCE(rw, o);
+
+ __this_cpu_write(cpu_srmcfg, thread_srmcfg);
+ csr_write(CSR_SRMCFG, thread_srmcfg);
+ /*
+ * Order the csrw before the new task's loads/stores so they
+ * pick up the new tag. Zicsr 6.1.1 makes CSR writes weakly
+ * ordered (device-output) vs memory ops. Ssqosid v1.0 is
+ * silent so honor the general CSR rule.
+ */
+ RISCV_FENCE(o, rw);
+ }
+}
+
+static __always_inline bool has_srmcfg(void)
+{
+ return riscv_has_extension_unlikely(RISCV_ISA_EXT_SSQOSID);
+}
+
+#else /* ! CONFIG_RISCV_ISA_SSQOSID */
+
+struct task_struct;
+static __always_inline bool has_srmcfg(void) { return false; }
+static inline void __switch_to_srmcfg(struct task_struct *next) { }
+
+#endif /* CONFIG_RISCV_ISA_SSQOSID */
+#endif /* _ASM_RISCV_QOS_H */
diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
index 0e71eb82f920..1c7ea53ec012 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -14,6 +14,7 @@
#include <asm/processor.h>
#include <asm/ptrace.h>
#include <asm/csr.h>
+#include <asm/qos.h>
#ifdef CONFIG_FPU
extern void __fstate_save(struct task_struct *save_to);
@@ -119,6 +120,8 @@ do { \
__switch_to_fpu(__prev, __next); \
if (has_vector() || has_xtheadvector()) \
__switch_to_vector(__prev, __next); \
+ if (has_srmcfg()) \
+ __switch_to_srmcfg(__next); \
if (switch_to_should_flush_icache(__next)) \
local_flush_icache_all(); \
__switch_to_envcfg(__next); \
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index cabb99cadfb6..ebe1c3588177 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -128,3 +128,5 @@ obj-$(CONFIG_ACPI_NUMA) += acpi_numa.o
obj-$(CONFIG_GENERIC_CPU_VULNERABILITIES) += bugs.o
obj-$(CONFIG_RISCV_USER_CFI) += usercfi.o
+
+obj-$(CONFIG_RISCV_ISA_SSQOSID) += qos.o
diff --git a/arch/riscv/kernel/qos.c b/arch/riscv/kernel/qos.c
new file mode 100644
index 000000000000..42f1ff9b219d
--- /dev/null
+++ b/arch/riscv/kernel/qos.c
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/cpu.h>
+#include <linux/cpu_pm.h>
+#include <linux/cpuhotplug.h>
+#include <linux/notifier.h>
+#include <linux/percpu-defs.h>
+#include <linux/types.h>
+
+#include <asm/cpufeature-macros.h>
+#include <asm/hwcap.h>
+#include <asm/qos.h>
+
+/*
+ * Cached value of srmcfg csr for each cpu. Seeded to U32_MAX so the next
+ * __switch_to_srmcfg() unconditionally writes the CSR. The encoding
+ * MCID << 16 | RCID with both fields well under 16 bits can never
+ * produce this sentinel. This covers early-boot context switches that
+ * happen before riscv_srmcfg_init() runs as an arch_initcall.
+ */
+DEFINE_PER_CPU(u32, cpu_srmcfg) = U32_MAX;
+
+/* default srmcfg value for each cpu, set via resctrl cpu assignment */
+DEFINE_PER_CPU(u32, cpu_srmcfg_default);
+
+/*
+ * Invalidate the per-CPU srmcfg cache, used as both the cpuhp startup and
+ * teardown callback. The sentinel is a value no real srmcfg encoding can
+ * produce (MCID << 16 | RCID, both fields well under 16 bits) so the next
+ * __switch_to_srmcfg() unconditionally writes the CSR.
+ *
+ * Ssqosid v1.0 leaves CSR state across hart stop/start implementation-
+ * defined, so the cached value cannot be trusted after online. Invalidating
+ * on offline as well means the sentinel persists across the offline period:
+ * a CPU brought back online finds the cache already invalidated before it is
+ * schedulable, closing the window where a task scheduled before the startup
+ * callback runs could match a stale cache and skip the CSR write while the
+ * hardware CSR was reset across hart stop/start.
+ */
+static int riscv_srmcfg_reset_cache(unsigned int cpu)
+{
+ per_cpu(cpu_srmcfg, cpu) = U32_MAX;
+ return 0;
+}
+
+/*
+ * CPU PM notifier: invalidate the cached srmcfg on resume from a deep
+ * idle / suspend. Ssqosid v1.0 leaves CSR_SRMCFG state across low-power
+ * transitions implementation-defined, and the boot CPU never goes
+ * through the cpuhp online callback during system suspend, so without
+ * this hook __switch_to_srmcfg() would skip the CSR write when the
+ * outgoing task happens to share its srmcfg with the pre-suspend cache.
+ */
+static int riscv_srmcfg_pm_notify(struct notifier_block *nb,
+ unsigned long action, void *unused)
+{
+ switch (action) {
+ case CPU_PM_EXIT:
+ case CPU_PM_ENTER_FAILED:
+ __this_cpu_write(cpu_srmcfg, U32_MAX);
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block riscv_srmcfg_pm_nb = {
+ .notifier_call = riscv_srmcfg_pm_notify,
+};
+
+static int __init riscv_srmcfg_init(void)
+{
+ int err;
+
+ if (!riscv_has_extension_unlikely(RISCV_ISA_EXT_SSQOSID))
+ return 0;
+
+ /*
+ * cpuhp_setup_state() invokes the startup callback locally on every
+ * already-online CPU, so no separate seed loop is needed here.
+ */
+ err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "riscv/srmcfg:online",
+ riscv_srmcfg_reset_cache, riscv_srmcfg_reset_cache);
+ if (err < 0) {
+ pr_warn("srmcfg cpuhp registration failed (%d), cpus brought online after boot will not invalidate the CSR_SRMCFG cache\n",
+ err);
+ return err;
+ }
+
+ cpu_pm_register_notifier(&riscv_srmcfg_pm_nb);
+ return 0;
+}
+arch_initcall(riscv_srmcfg_init);
--
2.43.0
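A note for readers of __switch_to_srmcfg() in the patch above: the per-field inheritance can be modeled in userspace. This is a minimal sketch under stated assumptions, not the kernel code. The mask values assume the Ssqosid layout of RCID in bits 11:0 and MCID in bits 27:16 (the patch only shows the mask names), and effective_srmcfg() and srmcfg_needs_write() are hypothetical names. The model also assumes that no bits outside the two fields are ever set.

```c
#include <stdint.h>

/* Assumed srmcfg layout: RCID in bits 11:0, MCID in bits 27:16. */
#define SRMCFG_RCID_MASK UINT32_C(0x00000fff)
#define SRMCFG_MCID_MASK UINT32_C(0x0fff0000)

/*
 * Hypothetical model of the field merge: a zero RCID or MCID in the
 * task value means "not assigned", so that field is taken from the
 * per-CPU default. Each field is resolved independently.
 */
static uint32_t effective_srmcfg(uint32_t thread, uint32_t cpu_default)
{
	uint32_t rcid = thread & SRMCFG_RCID_MASK;
	uint32_t mcid = thread & SRMCFG_MCID_MASK;

	if (rcid == 0)
		rcid = cpu_default & SRMCFG_RCID_MASK;
	if (mcid == 0)
		mcid = cpu_default & SRMCFG_MCID_MASK;

	return rcid | mcid;
}

/*
 * Hypothetical model of the cache check: the CSR is written only when
 * the effective value differs from the cached one. A cache seeded with
 * UINT32_MAX never matches a value built from the two fields, so the
 * first switch after invalidation always writes.
 */
static int srmcfg_needs_write(uint32_t cached, uint32_t effective)
{
	return cached != effective;
}
```

With a CPU default of 0x00030002, a task that only has RCID 5 assigned resolves to 0x00030005, and a task with only MCID 7 assigned resolves to 0x00070002.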
* [PATCH 4/8] riscv_cbqri: Add capacity controller probe and allocation device ops
From: Drew Fustini @ 2026-06-19 18:29 UTC (permalink / raw)
To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
Paul Walmsley, Peter Newman, Radim Krčmář,
Reinette Chatre, Rob Herring, Samuel Holland,
Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
linux-doc
In-Reply-To: <20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org>
Add support for the RISC-V CBQRI capacity controller (CC). A platform
driver passes a cbqri_controller_info descriptor together with the cache
level to riscv_cbqri_register_cc_dt(), which probes the controller and
adds it to the controller list.
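The cc_alloc_ctl register layout that the allocation ops rely on can be modeled in userspace. This is a minimal sketch, not driver code: the field positions are copied from cbqri_internal.h in this patch, while pack_alloc_ctl(), alloc_ctl_status() and alloc_ctl_busy() are hypothetical stand-ins for the FIELD_MODIFY()/FIELD_GET() sequences in cbqri_cc_alloc_op().

```c
#include <assert.h>
#include <stdint.h>

/* Field positions copied from cbqri_internal.h in this patch. */
#define OP_SHIFT	0
#define OP_MASK		(UINT64_C(0x1f) << OP_SHIFT)		/* bits 4:0 */
#define AT_SHIFT	5
#define AT_MASK		(UINT64_C(0x7) << AT_SHIFT)		/* bits 7:5 */
#define RCID_SHIFT	8
#define RCID_MASK	(UINT64_C(0xfff) << RCID_SHIFT)		/* bits 19:8 */
#define STATUS_SHIFT	32
#define STATUS_MASK	(UINT64_C(0x7f) << STATUS_SHIFT)	/* bits 38:32 */
#define BUSY_MASK	(UINT64_C(1) << 39)			/* bit 39 */

#define OP_CONFIG_LIMIT	1
#define OP_READ_LIMIT	2
#define AT_DATA		0
#define AT_CODE		1

/*
 * Hypothetical helper: replace OP and RCID in a previously read register
 * value and keep every other bit. AT is always cleared and is only
 * programmed when the controller supports it.
 */
static uint64_t pack_alloc_ctl(uint64_t old, unsigned int op, unsigned int at,
			       unsigned int rcid, int at_supported)
{
	uint64_t reg = old & ~(OP_MASK | AT_MASK | RCID_MASK);

	assert(op <= 0x1f && at <= 0x7 && rcid <= 0xfff);

	reg |= (uint64_t)op << OP_SHIFT;
	reg |= (uint64_t)rcid << RCID_SHIFT;
	if (at_supported)
		reg |= (uint64_t)at << AT_SHIFT;

	return reg;
}

/* Hypothetical helper: extract the STATUS field (1 means success). */
static unsigned int alloc_ctl_status(uint64_t reg)
{
	return (unsigned int)((reg & STATUS_MASK) >> STATUS_SHIFT);
}

/* Hypothetical helper: non-zero while the controller reports BUSY. */
static int alloc_ctl_busy(uint64_t reg)
{
	return (reg & BUSY_MASK) != 0;
}
```

For example, a READ_LIMIT for RCID 5 on the code half packs to 0x522 on an AT-capable controller. All masks are 64-bit, so clearing the low fields leaves STATUS and BUSY in bits 39:32 untouched.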
Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
MAINTAINERS | 3 +
drivers/resctrl/Kconfig | 25 ++
drivers/resctrl/Makefile | 5 +
drivers/resctrl/cbqri_devices.c | 511 +++++++++++++++++++++++++++++++++++++++
drivers/resctrl/cbqri_internal.h | 110 +++++++++
include/linux/riscv_cbqri.h | 47 ++++
6 files changed, 701 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index e2a7f9766355..c090d52e9fa0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23298,6 +23298,9 @@ L: linux-riscv@lists.infradead.org
S: Supported
F: arch/riscv/include/asm/qos.h
F: arch/riscv/kernel/qos.c
+F: drivers/resctrl/cbqri_devices.c
+F: drivers/resctrl/cbqri_internal.h
+F: include/linux/riscv_cbqri.h
RISC-V RPMI AND MPXY DRIVERS
M: Rahul Pathak <rahul@summations.net>
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
index 672abea3b03c..8b16f69df17c 100644
--- a/drivers/resctrl/Kconfig
+++ b/drivers/resctrl/Kconfig
@@ -29,3 +29,28 @@ config ARM64_MPAM_RESCTRL_FS
default y if ARM64_MPAM_DRIVER && RESCTRL_FS
select RESCTRL_RMID_DEPENDS_ON_CLOSID
select RESCTRL_ASSIGN_FIXED
+
+menuconfig RISCV_CBQRI_DRIVER
+ bool "RISC-V CBQRI driver"
+ depends on RISCV && RISCV_ISA_SSQOSID
+ help
+ Capacity QoS Register Interface (CBQRI) driver for RISC-V cache
+ QoS resources. CBQRI exposes cache capacity allocation through
+ the resctrl filesystem at /sys/fs/resctrl when RESCTRL_FS is also
+ enabled.
+
+ RISCV_ISA_SSQOSID provides the srmcfg CSR that tags each hart's
+ memory traffic with the RCID consumed by CBQRI controllers.
+
+if RISCV_CBQRI_DRIVER
+
+config RISCV_CBQRI_DRIVER_DEBUG
+ bool "Enable debug messages from the CBQRI driver"
+ help
+ Say yes here to enable debug messages from the CBQRI driver.
+
+ This adds pr_debug() output covering controller probe and
+ per-controller registration steps. Useful when bringing up a
+ new platform; otherwise leave disabled to avoid log noise.
+
+endif
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
index 4f6d0e81f9b8..28085036d895 100644
--- a/drivers/resctrl/Makefile
+++ b/drivers/resctrl/Makefile
@@ -3,3 +3,8 @@ mpam-y += mpam_devices.o
mpam-$(CONFIG_ARM64_MPAM_RESCTRL_FS) += mpam_resctrl.o
ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG) += -DDEBUG
+
+obj-$(CONFIG_RISCV_CBQRI_DRIVER) += cbqri.o
+cbqri-y += cbqri_devices.o
+
+ccflags-$(CONFIG_RISCV_CBQRI_DRIVER_DEBUG) += -DDEBUG
diff --git a/drivers/resctrl/cbqri_devices.c b/drivers/resctrl/cbqri_devices.c
new file mode 100644
index 000000000000..cc4ec3f25ac1
--- /dev/null
+++ b/drivers/resctrl/cbqri_devices.c
@@ -0,0 +1,511 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+
+#include <linux/bitfield.h>
+#include <linux/riscv_cbqri.h>
+#include <linux/cpumask.h>
+#include <linux/err.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/ioport.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/printk.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include <asm/csr.h>
+
+#include "cbqri_internal.h"
+
+LIST_HEAD(cbqri_controllers);
+
+/* Set capacity block mask (cc_block_mask) */
+static void cbqri_set_cbm(struct cbqri_controller *ctrl, u64 cbm)
+{
+ iowrite64(cbm, ctrl->base + CBQRI_CC_BLOCK_MASK_OFF);
+}
+
+static int cbqri_wait_busy_flag(struct cbqri_controller *ctrl, int reg_offset,
+ u64 *regp)
+{
+ u64 reg;
+ int ret;
+
+ /*
+ * Sleeping poll: caller holds ctrl->lock as a sleeping mutex, so
+ * 10us/1ms is safe under PREEMPT_RT.
+ */
+ ret = readq_poll_timeout(ctrl->base + reg_offset, reg,
+ !FIELD_GET(CBQRI_CONTROL_REGISTERS_BUSY_MASK, reg),
+ 10, 1000);
+ if (ret)
+ return ret;
+ if (regp)
+ *regp = reg;
+ return 0;
+}
+
+/*
+ * Perform capacity allocation control operation on capacity controller.
+ * Caller must hold ctrl->lock.
+ */
+static int cbqri_cc_alloc_op(struct cbqri_controller *ctrl, int operation,
+ int rcid, u32 at)
+{
+ int reg_offset = CBQRI_CC_ALLOC_CTL_OFF;
+ int status;
+ u64 reg;
+
+ lockdep_assert_held(&ctrl->lock);
+
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+ pr_err_ratelimited("BUSY timeout before starting operation\n");
+ return -EIO;
+ }
+ FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_OP_MASK, &reg, operation);
+ FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_RCID_MASK, &reg, rcid);
+
+ /*
+ * CBQRI Table 1: AT 0=Data, 1=Code. Program AT on controllers
+ * that report supports_alloc_at_code. On controllers that don't,
+ * AT is reserved-zero and the op acts on both halves.
+ */
+ reg &= ~CBQRI_CONTROL_REGISTERS_AT_MASK;
+ if (ctrl->cc.supports_alloc_at_code)
+ reg |= FIELD_PREP(CBQRI_CONTROL_REGISTERS_AT_MASK, at);
+
+ iowrite64(reg, ctrl->base + reg_offset);
+
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+ pr_err_ratelimited("BUSY timeout during operation\n");
+ return -EIO;
+ }
+
+ status = FIELD_GET(CBQRI_CONTROL_REGISTERS_STATUS_MASK, reg);
+ if (status != CBQRI_CC_ALLOC_CTL_STATUS_SUCCESS) {
+ pr_err_ratelimited("operation %d failed: status=%d\n", operation, status);
+ return -EIO;
+ }
+
+ return 0;
+}
+
+/*
+ * Apply a capacity block mask and verify via CONFIG_LIMIT + READ_LIMIT.
+ *
+ * AT-capable controllers with CDP off need a second CONFIG_LIMIT on the
+ * other AT half (the spec encodes AT only as 0=Data / 1=Code, there is
+ * no "both halves" value). CDP-on issues separate per-type writes from
+ * resctrl, so a single CONFIG_LIMIT per call is correct.
+ */
+int cbqri_apply_cache_config(struct cbqri_controller *ctrl, u32 closid,
+ const struct cbqri_cc_config *cfg)
+{
+ bool need_at_mirror;
+ u64 saved_cbm = 0;
+ int err = 0;
+ u64 reg;
+
+ mutex_lock(&ctrl->lock);
+
+ need_at_mirror = ctrl->cc.supports_alloc_at_code && !cfg->cdp_enabled;
+
+ /*
+ * Capture the cfg->at half CBM before any write so a partial
+ * AT-mirror failure can revert and keep the two halves consistent.
+ * Pre-clear cc_block_mask so a silent firmware no-op (status
+ * SUCCESS but staging not updated) shows as a zero readback
+ * rather than carrying stale data from a prior op.
+ */
+ if (need_at_mirror) {
+ cbqri_set_cbm(ctrl, 0);
+ err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT,
+ closid, cfg->at);
+ if (err < 0)
+ goto out;
+ saved_cbm = ioread64(ctrl->base + CBQRI_CC_BLOCK_MASK_OFF);
+ }
+
+ /* Set capacity block mask (cc_block_mask) */
+ cbqri_set_cbm(ctrl, cfg->cbm);
+
+ /* Capacity config limit operation for the AT half implied by cfg->at */
+ err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT,
+ closid, cfg->at);
+ if (err < 0)
+ goto out;
+
+ /*
+ * CDP-off mirror: on AT-capable controllers, also program the
+ * other AT half with the same mask so the two halves stay in sync.
+ */
+ if (need_at_mirror) {
+ u32 other = (cfg->at == CBQRI_CONTROL_REGISTERS_AT_CODE) ?
+ CBQRI_CONTROL_REGISTERS_AT_DATA :
+ CBQRI_CONTROL_REGISTERS_AT_CODE;
+
+ cbqri_set_cbm(ctrl, cfg->cbm);
+ err = cbqri_cc_alloc_op(ctrl,
+ CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT,
+ closid, other);
+ if (err < 0) {
+ int rerr;
+
+ /*
+ * Best-effort revert of the cfg->at half so the two
+ * halves stay in sync. A schemata read sees only one
+ * half, so silent divergence would otherwise report
+ * the new value as if the write had succeeded.
+ */
+ cbqri_set_cbm(ctrl, saved_cbm);
+ rerr = cbqri_cc_alloc_op(ctrl,
+ CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT,
+ closid, cfg->at);
+ if (rerr < 0)
+ pr_err_ratelimited("AT-mirror revert failed (err=%d), AT halves diverged\n",
+ rerr);
+ goto out;
+ }
+ }
+
+ /* Clear cc_block_mask before read limit to verify op works */
+ cbqri_set_cbm(ctrl, 0);
+
+ /* Perform a capacity read limit operation to verify blockmask */
+ err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT,
+ closid, cfg->at);
+ if (err < 0)
+ goto out;
+
+ /*
+ * Read capacity blockmask and narrow to u32 to match resctrl's CBM
+ * width. cbqri_probe_cc() rejects ncblks > 32 so the upper bits are
+ * reserved zero.
+ */
+ reg = ioread64(ctrl->base + CBQRI_CC_BLOCK_MASK_OFF);
+ if (lower_32_bits(reg) != cfg->cbm) {
+ pr_err_ratelimited("CBM verify mismatch (reg=%llx != cbm=%llx)\n",
+ reg, cfg->cbm);
+ err = -EIO;
+ }
+
+out:
+ mutex_unlock(&ctrl->lock);
+ return err;
+}
+
+/*
+ * Read the configured CBM for closid on the at half via READ_LIMIT.
+ * Pre-clears cc_block_mask before the op so a silent firmware no-op
+ * (status SUCCESS but staging not updated) is detectable in cbm_out.
+ */
+int cbqri_read_cache_config(struct cbqri_controller *ctrl, u32 closid,
+ u32 at, u32 *cbm_out)
+{
+ int err;
+
+ mutex_lock(&ctrl->lock);
+ cbqri_set_cbm(ctrl, 0);
+ err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT, closid, at);
+ if (err == 0) {
+ /*
+ * cc_block_mask is a 64-bit MMIO register. resctrl exposes the
+ * CBM as a u32. cbqri_probe_cc() rejects ncblks > 32 so the
+ * upper 32 bits are reserved zero by the spec. Narrow
+ * explicitly via lower_32_bits() so the assumption is visible
+ * at the read site.
+ */
+ *cbm_out = lower_32_bits(ioread64(ctrl->base + CBQRI_CC_BLOCK_MASK_OFF));
+ }
+ mutex_unlock(&ctrl->lock);
+ return err;
+}
+
+static int cbqri_probe_feature(struct cbqri_controller *ctrl, int reg_offset,
+ int operation, int *status, bool *access_type_supported)
+{
+ const u64 active_mask = CBQRI_CONTROL_REGISTERS_OP_MASK |
+ CBQRI_CONTROL_REGISTERS_AT_MASK |
+ CBQRI_CONTROL_REGISTERS_RCID_MASK;
+ u64 reg, saved_reg;
+ int at;
+
+ /*
+ * Default the output to false so the status==0 (feature not
+ * implemented) path returns a deterministic value to the caller
+ * rather than leaving an uninitialized bool.
+ */
+ *access_type_supported = false;
+
+ /* Keep the initial register value to preserve the WPRI fields */
+ reg = ioread64(ctrl->base + reg_offset);
+ saved_reg = reg;
+
+ /* Drain any in-flight firmware op before issuing our own write. */
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, &saved_reg) < 0) {
+ pr_err("BUSY timeout before probe operation\n");
+ return -EIO;
+ }
+
+ /*
+ * Execute the requested operation with all active fields
+ * (OP/AT/RCID) zeroed except OP itself. Every bit not in
+ * active_mask is WPRI and gets carried over from saved_reg.
+ */
+ reg = (saved_reg & ~active_mask) |
+ FIELD_PREP(CBQRI_CONTROL_REGISTERS_OP_MASK, operation);
+ iowrite64(reg, ctrl->base + reg_offset);
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+ pr_err_ratelimited("BUSY timeout during operation\n");
+ return -EIO;
+ }
+
+ /* Get the operation status */
+ *status = FIELD_GET(CBQRI_CONTROL_REGISTERS_STATUS_MASK, reg);
+
+ /*
+ * Check for the AT support if the register is implemented
+ * (if not, the status value will remain 0)
+ */
+ if (*status != 0) {
+ /*
+ * Re-issue operation with AT=CODE so the controller
+ * latches AT=CODE on supported hardware (or resets it to 0
+ * on hardware that doesn't). OP must be a defined CBQRI op
+ * here. OP=0 is a no-op and would silently disable CDP.
+ */
+ reg = (saved_reg & ~active_mask) |
+ FIELD_PREP(CBQRI_CONTROL_REGISTERS_OP_MASK, operation) |
+ FIELD_PREP(CBQRI_CONTROL_REGISTERS_AT_MASK,
+ CBQRI_CONTROL_REGISTERS_AT_CODE);
+ iowrite64(reg, ctrl->base + reg_offset);
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+ pr_err("BUSY timeout setting AT field\n");
+ return -EIO;
+ }
+
+ /*
+ * If the AT field value has been reset to zero,
+ * then the AT support is not present
+ */
+ at = FIELD_GET(CBQRI_CONTROL_REGISTERS_AT_MASK, reg);
+ if (at == CBQRI_CONTROL_REGISTERS_AT_CODE)
+ *access_type_supported = true;
+ }
+
+ /*
+ * Restore the original register value.
+ * Clear OP to avoid re-triggering the probe op.
+ */
+ saved_reg &= ~CBQRI_CONTROL_REGISTERS_OP_MASK;
+ iowrite64(saved_reg, ctrl->base + reg_offset);
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, NULL) < 0) {
+ pr_err("BUSY timeout restoring register value\n");
+ return -EIO;
+ }
+
+ return 0;
+}
+
+static int cbqri_probe_cc(struct cbqri_controller *ctrl)
+{
+ int err, status;
+ int ver_major, ver_minor;
+ u64 reg;
+
+ reg = ioread64(ctrl->base + CBQRI_CC_CAPABILITIES_OFF);
+ if (reg == 0)
+ return -ENODEV;
+
+ ver_minor = FIELD_GET(CBQRI_CC_CAPABILITIES_VER_MINOR_MASK, reg);
+ ver_major = FIELD_GET(CBQRI_CC_CAPABILITIES_VER_MAJOR_MASK, reg);
+ ctrl->cc.ncblks = FIELD_GET(CBQRI_CC_CAPABILITIES_NCBLKS_MASK, reg);
+
+ pr_debug("version=%d.%d ncblks=%d cache_level=%d\n",
+ ver_major, ver_minor,
+ ctrl->cc.ncblks, ctrl->cache.cache_level);
+
+ /*
+ * NCBLKS == 0 would divide-by-zero in the schemata math while
+ * ctrl->lock is held.
+ */
+ if (!ctrl->cc.ncblks) {
+ pr_warn("CC at %pa has 0 capacity blocks, skipping\n",
+ &ctrl->addr);
+ return -ENODEV;
+ }
+
+ if (ctrl->cc.ncblks > 32) {
+ pr_warn("CC at %pa has ncblks=%u > 32 (resctrl CBM is u32), skipping\n",
+ &ctrl->addr, ctrl->cc.ncblks);
+ return -ENODEV;
+ }
+
+ /* Probe allocation features */
+ err = cbqri_probe_feature(ctrl, CBQRI_CC_ALLOC_CTL_OFF,
+ CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT,
+ &status, &ctrl->cc.supports_alloc_at_code);
+ if (err)
+ return err;
+
+ if (status == CBQRI_CC_ALLOC_CTL_STATUS_SUCCESS)
+ ctrl->alloc_capable = true;
+
+ return 0;
+}
+
+static int cbqri_probe_controller(struct cbqri_controller *ctrl)
+{
+ int err;
+
+ pr_debug("controller info: type=%d addr=%pa size=%pa max-rcid=%u\n",
+ ctrl->type, &ctrl->addr, &ctrl->size, ctrl->rcid_count);
+
+ if (!ctrl->addr) {
+ pr_warn("controller has invalid addr=0x0, skipping\n");
+ return -EINVAL;
+ }
+
+ if (ctrl->size < CBQRI_CTRL_MIN_REG_SPAN) {
+ pr_warn("controller at %pa: size %pa < minimum 0x%x, skipping\n",
+ &ctrl->addr, &ctrl->size, CBQRI_CTRL_MIN_REG_SPAN);
+ return -EINVAL;
+ }
+
+ if (!request_mem_region(ctrl->addr, ctrl->size, "cbqri_controller")) {
+ pr_err("request_mem_region failed for %pa\n", &ctrl->addr);
+ return -EBUSY;
+ }
+
+ ctrl->base = ioremap(ctrl->addr, ctrl->size);
+ if (!ctrl->base) {
+ pr_err("ioremap failed for %pa\n", &ctrl->addr);
+ err = -ENOMEM;
+ goto err_release;
+ }
+
+ switch (ctrl->type) {
+ case CBQRI_CONTROLLER_TYPE_CAPACITY:
+ err = cbqri_probe_cc(ctrl);
+ break;
+ default:
+ pr_err("unknown controller type %d\n", ctrl->type);
+ err = -ENODEV;
+ break;
+ }
+
+ if (err)
+ goto err_iounmap;
+
+ return 0;
+
+err_iounmap:
+ iounmap(ctrl->base);
+ ctrl->base = NULL;
+err_release:
+ release_mem_region(ctrl->addr, ctrl->size);
+ return err;
+}
+
+void cbqri_controller_destroy(struct cbqri_controller *ctrl)
+{
+ /*
+ * cbqri_probe_controller() clears ctrl->base on its error paths and
+ * releases the mem region itself, so reach into both only when
+ * destroy is rolling back a successful probe.
+ */
+ if (ctrl->base) {
+ iounmap(ctrl->base);
+ release_mem_region(ctrl->addr, ctrl->size);
+ }
+ kfree(ctrl);
+}
+
+/**
+ * riscv_cbqri_register_cc_dt() - register a DT-described capacity controller
+ * @info: registration descriptor. info->cache_id is used as the
+ * resctrl domain id. info->type must be CAPACITY.
+ * @cache_level: cache level (2 or 3) the controller backs, mapped to the
+ * resctrl L2/L3 resource by the resctrl glue.
+ * @cpu_mask: CPUs that share this cache.
+ *
+ * The cache topology is supplied directly by the caller. A device-tree
+ * platform driver that already knows which CPUs share the cache and at what
+ * level passes that in. There is no firmware table to resolve it from.
+ *
+ * Return: 0 on success, or a negative errno on failure.
+ */
+int riscv_cbqri_register_cc_dt(const struct cbqri_controller_info *info,
+ u32 cache_level, const struct cpumask *cpu_mask)
+{
+ struct cbqri_controller *ctrl;
+ int err;
+
+ if (!info->addr) {
+ pr_warn("skipping controller with invalid addr=0x0\n");
+ return -EINVAL;
+ }
+
+ if (info->type != CBQRI_CONTROLLER_TYPE_CAPACITY) {
+ pr_warn("register_cc_dt called with non-capacity type %u\n",
+ info->type);
+ return -EINVAL;
+ }
+
+ if (!cpu_mask || cpumask_empty(cpu_mask)) {
+ pr_warn("register_cc_dt called with empty cpu_mask\n");
+ return -EINVAL;
+ }
+
+ ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
+ if (!ctrl)
+ return -ENOMEM;
+
+ mutex_init(&ctrl->lock);
+
+ ctrl->addr = info->addr;
+ ctrl->size = info->size;
+ ctrl->type = info->type;
+ ctrl->rcid_count = info->rcid_count;
+
+ /*
+ * SRMCFG encodes RCID in 12 bits. Reject an out-of-range count rather
+ * than silently truncating in every FIELD_PREP(SRMCFG_RCID_MASK, closid)
+ * on the schedule-in fast path.
+ */
+ if (ctrl->rcid_count > FIELD_MAX(SRMCFG_RCID_MASK) + 1) {
+ pr_warn("CC at %pa has RCID count %u beyond the 12-bit SRMCFG field, skipping\n",
+ &ctrl->addr, ctrl->rcid_count);
+ cbqri_controller_destroy(ctrl);
+ return -EINVAL;
+ }
+
+ ctrl->cache.cache_id = info->cache_id;
+ ctrl->cache.cache_level = cache_level;
+ cpumask_copy(&ctrl->cache.cpu_mask, cpu_mask);
+
+ err = cbqri_probe_controller(ctrl);
+ if (err) {
+ cbqri_controller_destroy(ctrl);
+ return err;
+ }
+
+ /*
+ * Allocation capability comes from the capabilities register probed
+ * above, not from device tree. rcid_count only bounds the RCID range,
+ * so a controller the hardware reports as alloc-capable but described
+ * with no RCID count cannot be driven. Reject that inconsistency. A
+ * monitoring-only controller (not alloc_capable) needs no RCID count.
+ */
+ if (ctrl->alloc_capable && !ctrl->rcid_count) {
+ pr_warn("CC at %pa is alloc-capable but has no RCID count, skipping\n",
+ &ctrl->addr);
+ cbqri_controller_destroy(ctrl);
+ return -EINVAL;
+ }
+
+ list_add_tail(&ctrl->list, &cbqri_controllers);
+ return 0;
+}
diff --git a/drivers/resctrl/cbqri_internal.h b/drivers/resctrl/cbqri_internal.h
new file mode 100644
index 000000000000..cd6bc879b320
--- /dev/null
+++ b/drivers/resctrl/cbqri_internal.h
@@ -0,0 +1,110 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _DRIVERS_RESCTRL_CBQRI_INTERNAL_H
+#define _DRIVERS_RESCTRL_CBQRI_INTERNAL_H
+
+#include <linux/bitfield.h>
+#include <linux/riscv_cbqri.h>
+#include <linux/cpumask.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+
+/* Capacity Controller (CC) MMIO register offsets. */
+#define CBQRI_CC_CAPABILITIES_OFF 0
+#define CBQRI_CC_ALLOC_CTL_OFF 24
+#define CBQRI_CC_BLOCK_MASK_OFF 32
+
+/*
+ * Smallest MMIO span the driver actually accesses: highest defined
+ * register offset (0x20) plus the 8-byte register width. Used by
+ * cbqri_probe_controller() to reject undersized firmware-supplied
+ * mappings before request_mem_region/ioremap, so a u64 access at
+ * BLOCK_MASK does not walk past the end of the mapping.
+ */
+#define CBQRI_CTRL_MIN_REG_SPAN 0x28u
+
+#define CBQRI_CC_CAPABILITIES_VER_MINOR_MASK GENMASK_ULL(3, 0)
+#define CBQRI_CC_CAPABILITIES_VER_MAJOR_MASK GENMASK_ULL(7, 4)
+#define CBQRI_CC_CAPABILITIES_NCBLKS_MASK GENMASK_ULL(23, 8)
+
+/*
+ * CC control registers are 64-bit. Keep every field mask GENMASK_ULL so
+ * FIELD_MODIFY() or ~mask on a u64 register never zero-extends a 32-bit
+ * mask and clobbers STATUS/BUSY/WPRI in bits 63:32 if RV32 support is
+ * added in the future.
+ */
+#define CBQRI_CONTROL_REGISTERS_OP_MASK GENMASK_ULL(4, 0)
+#define CBQRI_CONTROL_REGISTERS_AT_MASK GENMASK_ULL(7, 5)
+/* AT field values (CBQRI Table 1): data vs code half for CDP */
+#define CBQRI_CONTROL_REGISTERS_AT_DATA 0
+#define CBQRI_CONTROL_REGISTERS_AT_CODE 1
+#define CBQRI_CONTROL_REGISTERS_RCID_MASK GENMASK_ULL(19, 8)
+#define CBQRI_CONTROL_REGISTERS_STATUS_MASK GENMASK_ULL(38, 32)
+#define CBQRI_CONTROL_REGISTERS_BUSY_MASK GENMASK_ULL(39, 39)
+
+#define CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT 1
+#define CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT 2
+#define CBQRI_CC_ALLOC_CTL_STATUS_SUCCESS 1
+
+/* Capacity Controller hardware capabilities */
+struct riscv_cbqri_capacity_caps {
+ u16 ncblks;
+ bool supports_alloc_at_code;
+};
+
+/**
+ * struct cbqri_cc_config - desired capacity allocation state for one rcid
+ * @cbm: capacity block mask
+ * @at: AT half the @cbm applies to (CBQRI_CONTROL_REGISTERS_AT_DATA
+ * or CBQRI_CONTROL_REGISTERS_AT_CODE)
+ * @cdp_enabled: when false and the controller supports AT, mirror @cbm
+ * into the other AT half so both stay in sync
+ */
+struct cbqri_cc_config {
+ u64 cbm;
+ u32 at;
+ bool cdp_enabled;
+};
+
+struct cbqri_controller {
+ void __iomem *base;
+ /*
+ * Serializes the write-then-poll-busy MMIO sequences on this
+ * controller. Each CBQRI op may busy-wait up to 1 ms on slow
+ * firmware, so use a sleeping mutex (paired with the sleeping
+ * readq_poll_timeout() in cbqri_wait_busy_flag()) to keep
+ * preemption enabled, which is required for PREEMPT_RT.
+ * All resctrl-arch entry points run in process context.
+ */
+ struct mutex lock;
+
+ struct riscv_cbqri_capacity_caps cc;
+
+ bool alloc_capable;
+
+ phys_addr_t addr;
+ phys_addr_t size;
+ enum cbqri_controller_type type;
+ u32 rcid_count;
+
+ struct list_head list;
+
+ struct cache_controller {
+ u32 cache_level;
+ struct cpumask cpu_mask;
+ /* Cache id used as the resctrl domain id */
+ u32 cache_id;
+ } cache;
+};
+
+extern struct list_head cbqri_controllers;
+
+void cbqri_controller_destroy(struct cbqri_controller *ctrl);
+
+int cbqri_apply_cache_config(struct cbqri_controller *ctrl, u32 closid,
+ const struct cbqri_cc_config *cfg);
+
+int cbqri_read_cache_config(struct cbqri_controller *ctrl, u32 closid,
+ u32 at, u32 *cbm_out);
+
+#endif /* _DRIVERS_RESCTRL_CBQRI_INTERNAL_H */
diff --git a/include/linux/riscv_cbqri.h b/include/linux/riscv_cbqri.h
new file mode 100644
index 000000000000..cd62398bd5cb
--- /dev/null
+++ b/include/linux/riscv_cbqri.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Public registration API for the RISC-V Capacity QoS Register Interface
+ * (CBQRI) driver. Discovery layers (device tree platform drivers) call
+ * riscv_cbqri_register_cc_dt() to hand a capacity controller descriptor to
+ * the driver, which owns all subsequent state.
+ */
+#ifndef _LINUX_RISCV_CBQRI_H
+#define _LINUX_RISCV_CBQRI_H
+
+#include <linux/types.h>
+
+struct cpumask;
+
+enum cbqri_controller_type {
+ CBQRI_CONTROLLER_TYPE_CAPACITY,
+};
+
+/**
+ * struct cbqri_controller_info - registration descriptor
+ * @addr: MMIO base address of the controller's register interface
+ * @size: size of the MMIO region
+ * @type: controller type (capacity)
+ * @rcid_count: number of supported RCIDs
+ * @cache_id: cache id used as the resctrl domain id
+ */
+struct cbqri_controller_info {
+ phys_addr_t addr;
+ phys_addr_t size;
+ enum cbqri_controller_type type;
+ u32 rcid_count;
+ u32 cache_id;
+};
+
+#if IS_ENABLED(CONFIG_RISCV_CBQRI_DRIVER)
+int riscv_cbqri_register_cc_dt(const struct cbqri_controller_info *info,
+ u32 cache_level, const struct cpumask *cpu_mask);
+#else
+static inline int
+riscv_cbqri_register_cc_dt(const struct cbqri_controller_info *info,
+ u32 cache_level, const struct cpumask *cpu_mask)
+{
+ return -ENODEV;
+}
+#endif
+
+#endif /* _LINUX_RISCV_CBQRI_H */
--
2.43.0
* [PATCH 5/8] riscv_cbqri: resctrl: Add cache allocation via capacity block mask
From: Drew Fustini @ 2026-06-19 18:29 UTC (permalink / raw)
To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
Paul Walmsley, Peter Newman, Radim Krčmář,
Reinette Chatre, Rob Herring, Samuel Holland,
Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
linux-doc
In-Reply-To: <20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org>
Wire CBQRI capacity controllers into resctrl as RDT_RESOURCE_L2 and
RDT_RESOURCE_L3 schemata.
Mismatched CC caps at the same cache level are treated as a fatal
configuration error since fs/resctrl exposes a single per-rid cap
set. Domains are created lazily in the cpuhp online callback so
cpu_mask reflects only currently online CPUs.
Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
MAINTAINERS | 2 +
arch/riscv/Kconfig | 1 +
arch/riscv/include/asm/resctrl.h | 152 ++++++++
drivers/resctrl/Kconfig | 4 +
drivers/resctrl/Makefile | 1 +
drivers/resctrl/cbqri_resctrl.c | 774 +++++++++++++++++++++++++++++++++++++++
6 files changed, 934 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index c090d52e9fa0..85d50efb6e5f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23297,9 +23297,11 @@ R: yunhui cui <cuiyunhui@bytedance.com>
L: linux-riscv@lists.infradead.org
S: Supported
F: arch/riscv/include/asm/qos.h
+F: arch/riscv/include/asm/resctrl.h
F: arch/riscv/kernel/qos.c
F: drivers/resctrl/cbqri_devices.c
F: drivers/resctrl/cbqri_internal.h
+F: drivers/resctrl/cbqri_resctrl.c
F: include/linux/riscv_cbqri.h
RISC-V RPMI AND MPXY DRIVERS
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index ee586925f972..9eb65d0eaa07 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -594,6 +594,7 @@ config RISCV_ISA_SSQOSID
bool "Ssqosid extension support for supervisor mode Quality of Service ID"
depends on 64BIT
default n
+ select ARCH_HAS_CPU_RESCTRL
help
Adds support for the Ssqosid ISA extension (Supervisor-mode
Quality of Service ID).
diff --git a/arch/riscv/include/asm/resctrl.h b/arch/riscv/include/asm/resctrl.h
new file mode 100644
index 000000000000..7392a099b6f8
--- /dev/null
+++ b/arch/riscv/include/asm/resctrl.h
@@ -0,0 +1,152 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_RISCV_RESCTRL_H
+#define _ASM_RISCV_RESCTRL_H
+
+#include <linux/resctrl_types.h>
+#include <linux/sched.h>
+#include <linux/types.h>
+
+#include <asm/qos.h>
+
+struct rdt_resource;
+
+/*
+ * Sentinel "no CLOSID assigned" used by resctrl_arch_rmid_idx_decode().
+ * fs/resctrl treats this opaquely. CBQRI uses MCID directly as the linear
+ * rmid index, so closid is unused on decode.
+ */
+#define RISCV_RESCTRL_EMPTY_CLOSID ((u32)~0)
+
+/*
+ * Terminology mapping between x86 (Intel RDT/AMD QoS) and RISC-V:
+ *
+ * CLOSID on x86 is RCID on RISC-V
+ * RMID on x86 is MCID on RISC-V
+ * CDP on x86 is AT (access type) on RISC-V
+ *
+ * Each fast-path arch entry point below is the RISC-V realization of the
+ * generic contract documented in <linux/resctrl.h>. Comments here describe
+ * only the RISC-V-specific behavior (srmcfg encoding, CBQRI controller
+ * lookup, MCID-as-index policy).
+ */
+
+/**
+ * resctrl_arch_alloc_capable() - any CBQRI controller exposes resctrl alloc
+ *
+ * Returns true once at least one CBQRI controller has successfully probed for
+ * a resctrl-exposed cache capacity allocation feature. Only meaningful after
+ * cbqri_resctrl_setup() runs at late_initcall.
+ */
+bool resctrl_arch_alloc_capable(void);
+
+/**
+ * resctrl_arch_mon_capable() - any CBQRI controller exposes resctrl monitoring
+ *
+ * The CBQRI driver implements capacity allocation only and wires up no
+ * monitoring events, so this always returns false. fs/resctrl references it
+ * unconditionally, hence the stub.
+ */
+bool resctrl_arch_mon_capable(void);
+
+/**
+ * resctrl_arch_rmid_idx_encode() - encode (RCID, MCID) into a linear index
+ * @closid: RCID (resource control id)
+ * @rmid: MCID (monitoring counter id)
+ *
+ * RISC-V uses MCID directly as the linear index into per-RMID arrays
+ * managed by fs/resctrl, since CBQRI controllers admit any MCID for any
+ * RCID. closid is unused here. CDP is encoded via the AT field on each
+ * CBQRI op rather than via the index.
+ */
+u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid);
+
+/**
+ * resctrl_arch_rmid_idx_decode() - inverse of resctrl_arch_rmid_idx_encode()
+ * @idx: linear index
+ * @closid: out: always RISCV_RESCTRL_EMPTY_CLOSID
+ * @rmid: out: the MCID that @idx encodes
+ */
+void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid);
+
+/**
+ * resctrl_arch_set_cpu_default_closid_rmid() - install per-CPU srmcfg default
+ * @cpu: CPU number
+ * @closid: RCID to use when no task is matched
+ * @rmid: MCID to use when no task is matched
+ *
+ * Sets the per-CPU cpu_srmcfg_default so __switch_to_srmcfg() can fall back
+ * to the CPU's default RCID/MCID for default-group tasks (those whose
+ * thread.srmcfg encodes to 0, i.e. closid == RESCTRL_RESERVED_CLOSID and
+ * rmid == RESCTRL_RESERVED_RMID). Implements resctrl allocation rule 2
+ * ("CPU default") on RISC-V.
+ */
+void resctrl_arch_set_cpu_default_closid_rmid(int cpu, u32 closid, u32 rmid);
+
+/**
+ * resctrl_arch_sched_in() - context-switch hook to install task RCID/MCID
+ * @tsk: the task being scheduled in
+ *
+ * Called from finish_task_switch() to write tsk->thread.srmcfg into the
+ * srmcfg CSR. Default-group tasks (thread.srmcfg == 0) inherit the
+ * per-CPU default set via resctrl_arch_set_cpu_default_closid_rmid().
+ */
+void resctrl_arch_sched_in(struct task_struct *tsk);
+
+/**
+ * resctrl_arch_set_closid_rmid() - tag a task with an RCID/MCID
+ * @tsk: task to tag
+ * @closid: RCID to install
+ * @rmid: MCID to install
+ *
+ * Updates tsk->thread.srmcfg with the encoded (RCID, MCID) pair. The new
+ * value takes effect on the next resctrl_arch_sched_in() for this task.
+ */
+void resctrl_arch_set_closid_rmid(struct task_struct *tsk, u32 closid, u32 rmid);
+
+/**
+ * resctrl_arch_match_closid() - test whether a task carries a given RCID
+ * @tsk: task
+ * @closid: RCID
+ */
+bool resctrl_arch_match_closid(struct task_struct *tsk, u32 closid);
+
+/**
+ * resctrl_arch_match_rmid() - test whether a task carries a given (RCID, MCID)
+ * @tsk: task
+ * @closid: RCID
+ * @rmid: MCID
+ */
+bool resctrl_arch_match_rmid(struct task_struct *tsk, u32 closid, u32 rmid);
+
+/**
+ * resctrl_arch_mon_ctx_alloc() - allocate per-monitor-event arch context
+ * @r: resctrl resource being monitored
+ * @evtid: which monitor event needs context
+ *
+ * The CBQRI driver implements no monitoring events, so there is no per-event
+ * context to allocate and the stub returns NULL. fs/resctrl references it
+ * unconditionally before checking resctrl_arch_mon_capable().
+ */
+void *resctrl_arch_mon_ctx_alloc(struct rdt_resource *r, enum resctrl_event_id evtid);
+
+/**
+ * resctrl_arch_mon_ctx_free() - release context returned by mon_ctx_alloc()
+ * @r: resctrl resource
+ * @evtid: monitor event id
+ * @arch_mon_ctx: pointer returned by resctrl_arch_mon_ctx_alloc()
+ */
+void resctrl_arch_mon_ctx_free(struct rdt_resource *r, enum resctrl_event_id evtid,
+ void *arch_mon_ctx);
+
+static inline unsigned int resctrl_arch_round_mon_val(unsigned int val)
+{
+ return val;
+}
+
+/* Not needed for RISC-V */
+static inline void resctrl_arch_enable_mon(void) { }
+static inline void resctrl_arch_disable_mon(void) { }
+static inline void resctrl_arch_enable_alloc(void) { }
+static inline void resctrl_arch_disable_alloc(void) { }
+
+#endif /* _ASM_RISCV_RESCTRL_H */
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
index 8b16f69df17c..0887b6a9fac1 100644
--- a/drivers/resctrl/Kconfig
+++ b/drivers/resctrl/Kconfig
@@ -54,3 +54,7 @@ config RISCV_CBQRI_DRIVER_DEBUG
new platform; otherwise leave disabled to avoid log noise.
endif
+
+config RISCV_CBQRI_RESCTRL_FS
+ bool
+ default y if RISCV_CBQRI_DRIVER && RESCTRL_FS
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
index 28085036d895..ed737b4461b9 100644
--- a/drivers/resctrl/Makefile
+++ b/drivers/resctrl/Makefile
@@ -6,5 +6,6 @@ ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG) += -DDEBUG
obj-$(CONFIG_RISCV_CBQRI_DRIVER) += cbqri.o
cbqri-y += cbqri_devices.o
+cbqri-$(CONFIG_RISCV_CBQRI_RESCTRL_FS) += cbqri_resctrl.o
ccflags-$(CONFIG_RISCV_CBQRI_DRIVER_DEBUG) += -DDEBUG
diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
new file mode 100644
index 000000000000..d354129cc34f
--- /dev/null
+++ b/drivers/resctrl/cbqri_resctrl.c
@@ -0,0 +1,774 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+
+#include <linux/bitfield.h>
+#include <linux/cacheinfo.h>
+#include <linux/riscv_cbqri.h>
+#include <linux/cpu.h>
+#include <linux/cpufeature.h>
+#include <linux/cpuhotplug.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/resctrl.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include <asm/csr.h>
+#include <asm/qos.h>
+
+#include "cbqri_internal.h"
+
+struct cbqri_resctrl_res {
+ struct cbqri_controller *ctrl;
+ struct rdt_resource resctrl_res;
+ bool cdp_enabled;
+};
+
+struct cbqri_resctrl_dom {
+ struct rdt_ctrl_domain resctrl_ctrl_dom;
+ struct cbqri_controller *hw_ctrl;
+};
+
+static struct cbqri_resctrl_res cbqri_resctrl_resources[RDT_NUM_RESOURCES];
+
+static bool exposed_alloc_capable;
+
+/* Protects ctrl_domain list mutations across CPU hotplug. */
+static DEFINE_MUTEX(cbqri_domain_list_lock);
+
+static struct rdt_ctrl_domain *
+cbqri_find_ctrl_domain(struct list_head *h, int id)
+{
+ struct rdt_domain_hdr *hdr = resctrl_find_domain(h, id, NULL);
+
+ return hdr ? container_of(hdr, struct rdt_ctrl_domain, hdr) : NULL;
+}
+
+/* Map a hardware cache level to its resctrl resource id, or -ENODEV. */
+static int cbqri_cache_level_to_rid(u32 cache_level)
+{
+ switch (cache_level) {
+ case 2:
+ return RDT_RESOURCE_L2;
+ case 3:
+ return RDT_RESOURCE_L3;
+ default:
+ return -ENODEV;
+ }
+}
+
+static int cbqri_apply_cache_config_dom(struct cbqri_resctrl_dom *hw_dom,
+ struct rdt_resource *r,
+ u32 closid, enum resctrl_conf_type t,
+ u64 cbm)
+{
+ struct cbqri_resctrl_res *hw_res =
+ container_of(r, struct cbqri_resctrl_res, resctrl_res);
+ struct cbqri_cc_config cfg = {
+ .cbm = cbm,
+ .at = (t == CDP_CODE) ? CBQRI_CONTROL_REGISTERS_AT_CODE :
+ CBQRI_CONTROL_REGISTERS_AT_DATA,
+ .cdp_enabled = hw_res->cdp_enabled,
+ };
+
+ return cbqri_apply_cache_config(hw_dom->hw_ctrl, closid, &cfg);
+}
+
+bool resctrl_arch_alloc_capable(void)
+{
+ return exposed_alloc_capable;
+}
+
+bool resctrl_arch_mon_capable(void)
+{
+ return false;
+}
+
+bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level rid)
+{
+ if (rid != RDT_RESOURCE_L2 && rid != RDT_RESOURCE_L3)
+ return false;
+ return cbqri_resctrl_resources[rid].cdp_enabled;
+}
+
+int resctrl_arch_set_cdp_enabled(enum resctrl_res_level rid, bool enable)
+{
+ struct cbqri_resctrl_res *cbqri_res;
+
+ if (rid != RDT_RESOURCE_L2 && rid != RDT_RESOURCE_L3)
+ return -ENODEV;
+
+ cbqri_res = &cbqri_resctrl_resources[rid];
+ if (!cbqri_res->resctrl_res.cdp_capable)
+ return -ENODEV;
+
+ cbqri_res->cdp_enabled = enable;
+ return 0;
+}
+
+struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
+{
+ if (l >= RDT_NUM_RESOURCES)
+ return NULL;
+
+ return &cbqri_resctrl_resources[l].resctrl_res;
+}
+
+/*
+ * fs/resctrl unconditionally references the symbols below before checking
+ * mon_capable. They are stubs for features CBQRI does not yet support.
+ */
+bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt)
+{
+ return false;
+}
+
+void *resctrl_arch_mon_ctx_alloc(struct rdt_resource *r,
+ enum resctrl_event_id evtid)
+{
+ return NULL;
+}
+
+void resctrl_arch_mon_ctx_free(struct rdt_resource *r,
+ enum resctrl_event_id evtid, void *arch_mon_ctx)
+{
+}
+
+void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+ enum resctrl_event_id evtid, u32 rmid, u32 closid,
+ u32 cntr_id, bool assign)
+{
+}
+
+int resctrl_arch_cntr_read(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+ u32 unused, u32 rmid, int cntr_id,
+ enum resctrl_event_id eventid, u64 *val)
+{
+ return -EOPNOTSUPP;
+}
+
+bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
+{
+ return false;
+}
+
+int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
+{
+ return -EOPNOTSUPP;
+}
+
+void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+ u32 unused, u32 rmid, int cntr_id,
+ enum resctrl_event_id eventid)
+{
+}
+
+bool resctrl_arch_get_io_alloc_enabled(struct rdt_resource *r)
+{
+ return false;
+}
+
+int resctrl_arch_io_alloc_enable(struct rdt_resource *r, bool enable)
+{
+ return -EOPNOTSUPP;
+}
+
+void resctrl_arch_mon_event_config_read(void *info)
+{
+}
+
+void resctrl_arch_mon_event_config_write(void *info)
+{
+}
+
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
+{
+}
+
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+ u32 unused, u32 rmid, enum resctrl_event_id eventid)
+{
+}
+
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
+ u32 closid, u32 rmid, enum resctrl_event_id eventid,
+ void *arch_priv, u64 *val, void *arch_mon_ctx)
+{
+ return -ENODATA;
+}
+
+/*
+ * Note about terminology between x86 (Intel RDT/AMD QoS) and RISC-V:
+ * CLOSID on x86 is RCID on RISC-V
+ * RMID on x86 is MCID on RISC-V
+ */
+u32 resctrl_arch_get_num_closid(struct rdt_resource *res)
+{
+ struct cbqri_resctrl_res *hw_res;
+
+ hw_res = container_of(res, struct cbqri_resctrl_res, resctrl_res);
+
+ if (!hw_res->ctrl)
+ return 0;
+
+ return hw_res->ctrl->rcid_count;
+}
+
+u32 resctrl_arch_system_num_rmid_idx(void)
+{
+ return 1;
+}
+
+u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid)
+{
+ return rmid;
+}
+
+void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid)
+{
+ *closid = RISCV_RESCTRL_EMPTY_CLOSID;
+ *rmid = idx;
+}
+
+void resctrl_arch_set_cpu_default_closid_rmid(int cpu, u32 closid, u32 rmid)
+{
+ u32 srmcfg = FIELD_PREP(SRMCFG_RCID_MASK, closid) |
+ FIELD_PREP(SRMCFG_MCID_MASK, rmid);
+
+ WRITE_ONCE(per_cpu(cpu_srmcfg_default, cpu), srmcfg);
+}
+
+void resctrl_arch_sched_in(struct task_struct *tsk)
+{
+ __switch_to_srmcfg(tsk);
+}
+
+void resctrl_arch_set_closid_rmid(struct task_struct *tsk, u32 closid, u32 rmid)
+{
+ u32 srmcfg = FIELD_PREP(SRMCFG_RCID_MASK, closid) |
+ FIELD_PREP(SRMCFG_MCID_MASK, rmid);
+
+ WRITE_ONCE(tsk->thread.srmcfg, srmcfg);
+}
+
+void resctrl_arch_sync_cpu_closid_rmid(void *info)
+{
+ struct resctrl_cpu_defaults *r = info;
+
+ lockdep_assert_preemption_disabled();
+
+ if (r) {
+ resctrl_arch_set_cpu_default_closid_rmid(smp_processor_id(),
+ r->closid, r->rmid);
+ }
+
+ resctrl_arch_sched_in(current);
+}
+
+bool resctrl_arch_match_closid(struct task_struct *tsk, u32 closid)
+{
+ return FIELD_GET(SRMCFG_RCID_MASK, READ_ONCE(tsk->thread.srmcfg)) == closid;
+}
+
+bool resctrl_arch_match_rmid(struct task_struct *tsk, u32 closid, u32 rmid)
+{
+ return FIELD_GET(SRMCFG_MCID_MASK, READ_ONCE(tsk->thread.srmcfg)) == rmid;
+}
+
+void resctrl_arch_pre_mount(void)
+{
+ /* All controllers discovered at boot via late_initcall. Nothing to do. */
+}
+
+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
+ u32 closid, enum resctrl_conf_type t, u32 cfg_val)
+{
+ struct cbqri_resctrl_dom *dom;
+
+ dom = container_of(d, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
+
+ if (!r->alloc_capable)
+ return -EINVAL;
+
+ switch (r->rid) {
+ case RDT_RESOURCE_L2:
+ case RDT_RESOURCE_L3:
+ return cbqri_apply_cache_config_dom(dom, r, closid, t, cfg_val);
+ default:
+ return -EINVAL;
+ }
+}
+
+int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
+{
+ struct resctrl_staged_config *cfg;
+ enum resctrl_conf_type t;
+ struct rdt_ctrl_domain *d;
+ int err = 0;
+
+ /* Walking r->ctrl_domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+ for (t = 0; t < CDP_NUM_TYPES; t++) {
+ cfg = &d->staged_config[t];
+ if (!cfg->have_new_ctrl)
+ continue;
+ err = resctrl_arch_update_one(r, d, closid, t, cfg->new_ctrl);
+ if (err)
+ return err;
+ }
+ }
+ return err;
+}
+
+u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
+ u32 closid, enum resctrl_conf_type type)
+{
+ struct cbqri_resctrl_dom *hw_dom;
+ struct cbqri_controller *ctrl;
+ u32 at;
+ u32 val;
+ int err;
+
+ hw_dom = container_of(d, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
+ ctrl = hw_dom->hw_ctrl;
+ val = resctrl_get_default_ctrl(r);
+
+ if (!r->alloc_capable)
+ return val;
+
+ switch (r->rid) {
+ case RDT_RESOURCE_L2:
+ case RDT_RESOURCE_L3:
+ at = (type == CDP_CODE) ? CBQRI_CONTROL_REGISTERS_AT_CODE :
+ CBQRI_CONTROL_REGISTERS_AT_DATA;
+ err = cbqri_read_cache_config(ctrl, closid, at, &val);
+ if (err < 0)
+ val = resctrl_get_default_ctrl(r);
+ break;
+ default:
+ break;
+ }
+
+ return val;
+}
+
+void resctrl_arch_reset_all_ctrls(struct rdt_resource *r)
+{
+ struct cbqri_resctrl_res *hw_res;
+ struct rdt_ctrl_domain *d;
+ enum resctrl_conf_type t;
+ u32 default_ctrl;
+ int i;
+
+ lockdep_assert_cpus_held();
+
+ hw_res = container_of(r, struct cbqri_resctrl_res, resctrl_res);
+ default_ctrl = resctrl_get_default_ctrl(r);
+
+ if (!hw_res->ctrl)
+ return;
+
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+ for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
+ for (t = 0; t < CDP_NUM_TYPES; t++) {
+ int rerr;
+
+ rerr = resctrl_arch_update_one(r, d, i, t, default_ctrl);
+ if (rerr)
+ pr_err_ratelimited("rid=%d reset RCID %u type %u failed (%d)\n",
+ r->rid, i, t, rerr);
+ }
+ }
+ }
+}
+
+static struct rdt_ctrl_domain *cbqri_new_domain(struct cbqri_controller *ctrl)
+{
+ struct cbqri_resctrl_dom *hw_dom;
+ struct rdt_ctrl_domain *domain;
+
+ hw_dom = kzalloc_obj(*hw_dom, GFP_KERNEL);
+ if (!hw_dom)
+ return NULL;
+
+ hw_dom->hw_ctrl = ctrl;
+ domain = &hw_dom->resctrl_ctrl_dom;
+
+ INIT_LIST_HEAD(&domain->hdr.list);
+
+ return domain;
+}
+
+static int cbqri_init_domain_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *d)
+{
+ struct cbqri_resctrl_res *hw_res;
+ enum resctrl_conf_type t;
+ int err = 0;
+ int i;
+
+ hw_res = container_of(r, struct cbqri_resctrl_res, resctrl_res);
+
+ for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
+ /*
+ * Seed both DATA and CODE staged slots so a later mount
+ * with -o cdp does not see stale CODE values.
+ * On non-AT controllers cbqri_cc_alloc_op() masks AT to 0
+ * so all three iterations land on the same hardware state.
+ * The redundant writes are harmless.
+ */
+ for (t = 0; t < CDP_NUM_TYPES; t++) {
+ err = resctrl_arch_update_one(r, d, i, t,
+ resctrl_get_default_ctrl(r));
+ if (err)
+ return err;
+ }
+ }
+ return 0;
+}
+
+/*
+ * Walk cbqri_controllers and pick one capacity controller (CC) per cache
+ * level (L2/L3) to back the corresponding RDT_RESOURCE_L*. When more than
+ * one CC sits at the same level (e.g. one per socket), they must agree on
+ * rcid_count, ncblks, supports_alloc_at_code and alloc_capable. A mismatch is
+ * fatal: resctrl exposes one cap set per rid. The first controller seen wins.
+ */
+static int cbqri_resctrl_pick_caches(void)
+{
+ struct cbqri_controller *ctrl;
+
+ list_for_each_entry(ctrl, &cbqri_controllers, list) {
+ struct cbqri_resctrl_res *cbqri_res;
+ int rid;
+
+ if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
+ continue;
+ if (!ctrl->alloc_capable)
+ continue;
+
+ rid = cbqri_cache_level_to_rid(ctrl->cache.cache_level);
+ if (rid < 0) {
+ pr_err("unknown cache level %d\n",
+ ctrl->cache.cache_level);
+ return rid;
+ }
+
+ cbqri_res = &cbqri_resctrl_resources[rid];
+ if (cbqri_res->ctrl) {
+ /*
+ * CCs at the same cache level must agree on every cap
+ * resctrl exposes globally. Reject mismatches at pick
+ * time so the inconsistency is visible at boot.
+ */
+ if (cbqri_res->ctrl->rcid_count != ctrl->rcid_count ||
+ cbqri_res->ctrl->cc.ncblks != ctrl->cc.ncblks ||
+ cbqri_res->ctrl->cc.supports_alloc_at_code !=
+ ctrl->cc.supports_alloc_at_code ||
+ cbqri_res->ctrl->alloc_capable != ctrl->alloc_capable) {
+ pr_err("L%d controllers have mismatched capabilities\n",
+ ctrl->cache.cache_level);
+ return -EINVAL;
+ }
+ continue;
+ }
+
+ cbqri_res->ctrl = ctrl;
+ }
+
+ return 0;
+}
+
+/*
+ * Fill the rdt_resource fields for one picked rid. An rid with no picked
+ * controller is left untouched, so its alloc_capable stays false.
+ */
+static void cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
+{
+ struct cbqri_controller *ctrl = cbqri_res->ctrl;
+ struct rdt_resource *res = &cbqri_res->resctrl_res;
+
+ if (!ctrl)
+ return;
+
+ switch (res->rid) {
+ case RDT_RESOURCE_L2:
+ case RDT_RESOURCE_L3:
+ res->name = (res->rid == RDT_RESOURCE_L2) ? "L2" : "L3";
+ res->schema_fmt = RESCTRL_SCHEMA_BITMAP;
+ res->ctrl_scope = (res->rid == RDT_RESOURCE_L2) ?
+ RESCTRL_L2_CACHE : RESCTRL_L3_CACHE;
+ res->cache.cbm_len = ctrl->cc.ncblks;
+ res->cache.shareable_bits = 0;
+ res->cache.min_cbm_bits = 1;
+ res->cache.arch_has_sparse_bitmasks = false;
+ res->cdp_capable = ctrl->cc.supports_alloc_at_code;
+ res->alloc_capable = ctrl->alloc_capable;
+ INIT_LIST_HEAD(&res->ctrl_domains);
+ INIT_LIST_HEAD(&res->mon_domains);
+ break;
+ default:
+ break;
+ }
+}
+
+static void cbqri_resctrl_accumulate_caps(void)
+{
+ int rid;
+
+ for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
+ struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
+
+ if (!hw_res->ctrl)
+ continue;
+ if (hw_res->ctrl->alloc_capable)
+ exposed_alloc_capable = true;
+ }
+}
+
+/*
+ * Create, list-insert, and online a fresh ctrl_domain backing ctrl on
+ * resource res, seeded with cpu and identified by dom_id. Caller must
+ * hold cbqri_domain_list_lock and must have already verified that no
+ * existing ctrl_domain on res carries this id.
+ */
+static struct rdt_ctrl_domain *cbqri_create_ctrl_domain(struct cbqri_controller *ctrl,
+ struct rdt_resource *res,
+ unsigned int cpu, int dom_id)
+{
+ struct rdt_ctrl_domain *domain;
+ struct list_head *pos = NULL;
+ int err;
+
+ domain = cbqri_new_domain(ctrl);
+ if (!domain)
+ return ERR_PTR(-ENOMEM);
+
+ cpumask_set_cpu(cpu, &domain->hdr.cpu_mask);
+ domain->hdr.id = dom_id;
+ domain->hdr.type = RESCTRL_CTRL_DOMAIN;
+
+ err = cbqri_init_domain_ctrlval(res, domain);
+ if (err) {
+ kfree(container_of(domain, struct cbqri_resctrl_dom,
+ resctrl_ctrl_dom));
+ return ERR_PTR(err);
+ }
+
+ /* Insert sorted by id so user-visible ordering is deterministic. */
+ resctrl_find_domain(&res->ctrl_domains, dom_id, &pos);
+ list_add_tail(&domain->hdr.list, pos);
+
+ resctrl_online_ctrl_domain(res, domain);
+
+ return domain;
+}
+
+static int cbqri_attach_cpu_to_cap_ctrl(struct cbqri_controller *ctrl,
+ unsigned int cpu)
+{
+ struct cbqri_resctrl_res *hw_res;
+ struct rdt_ctrl_domain *domain;
+ struct rdt_resource *res;
+ int dom_id;
+ int rid;
+
+ rid = cbqri_cache_level_to_rid(ctrl->cache.cache_level);
+ if (rid < 0)
+ return 0;
+ hw_res = &cbqri_resctrl_resources[rid];
+
+ if (!hw_res->ctrl)
+ return 0;
+
+ res = &hw_res->resctrl_res;
+ dom_id = ctrl->cache.cache_id;
+
+ domain = cbqri_find_ctrl_domain(&res->ctrl_domains, dom_id);
+ if (domain) {
+ cpumask_set_cpu(cpu, &domain->hdr.cpu_mask);
+ return 0;
+ }
+
+ domain = cbqri_create_ctrl_domain(ctrl, res, cpu, dom_id);
+ if (IS_ERR(domain))
+ return PTR_ERR(domain);
+
+ return 0;
+}
+
+static void cbqri_detach_cpu_from_ctrl_domains(struct rdt_resource *res,
+ unsigned int cpu)
+{
+ struct rdt_ctrl_domain *domain, *tmp;
+
+ list_for_each_entry_safe(domain, tmp, &res->ctrl_domains, hdr.list) {
+ if (!cpumask_test_cpu(cpu, &domain->hdr.cpu_mask))
+ continue;
+ cpumask_clear_cpu(cpu, &domain->hdr.cpu_mask);
+ if (cpumask_empty(&domain->hdr.cpu_mask)) {
+ resctrl_offline_ctrl_domain(res, domain);
+ list_del(&domain->hdr.list);
+ kfree(container_of(domain, struct cbqri_resctrl_dom,
+ resctrl_ctrl_dom));
+ }
+ }
+}
+
+/*
+ * Remove a CPU from every domain it was attached to. The per-resource
+ * detach helpers act only when the CPU is set in a domain's mask, so this
+ * is idempotent and undoes a partial online attach as well as a full
+ * offline. Caller holds cbqri_domain_list_lock.
+ */
+static void cbqri_detach_cpu_from_all_ctrls(unsigned int cpu)
+{
+ int rid;
+
+ lockdep_assert_held(&cbqri_domain_list_lock);
+
+ for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
+ struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
+
+ if (!hw_res->ctrl)
+ continue;
+ cbqri_detach_cpu_from_ctrl_domains(&hw_res->resctrl_res, cpu);
+ }
+}
+
+/*
+ * Attach a CPU to every controller that claims it. On failure, detach the
+ * CPU from everything attached so far: the cpuhp core does not run this
+ * state's offline teardown when its startup fails, so a partial attach
+ * would otherwise leak into the domain cpu_masks. Caller holds
+ * cbqri_domain_list_lock.
+ */
+static int cbqri_attach_cpu_to_all_ctrls(unsigned int cpu)
+{
+ struct cbqri_controller *ctrl;
+ int err = 0;
+
+ lockdep_assert_held(&cbqri_domain_list_lock);
+
+ list_for_each_entry(ctrl, &cbqri_controllers, list) {
+ if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
+ continue;
+ if (!cpumask_test_cpu(cpu, &ctrl->cache.cpu_mask))
+ continue;
+ if (!ctrl->alloc_capable)
+ continue;
+
+ err = cbqri_attach_cpu_to_cap_ctrl(ctrl, cpu);
+ if (err) {
+ cbqri_detach_cpu_from_all_ctrls(cpu);
+ break;
+ }
+ }
+
+ return err;
+}
+
+static bool cbqri_resctrl_inited;
+
+static void cbqri_resctrl_teardown(void)
+{
+ int rid;
+
+ if (!cbqri_resctrl_inited)
+ return;
+
+ resctrl_exit();
+
+ for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
+ struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
+
+ hw_res->ctrl = NULL;
+ hw_res->cdp_enabled = false;
+ }
+ exposed_alloc_capable = false;
+ cbqri_resctrl_inited = false;
+}
+
+static int cbqri_resctrl_setup(void)
+{
+ int rid;
+ int err;
+
+ for (rid = 0; rid < RDT_NUM_RESOURCES; rid++)
+ cbqri_resctrl_resources[rid].resctrl_res.rid = rid;
+
+ err = cbqri_resctrl_pick_caches();
+ if (err)
+ return err;
+
+ for (rid = 0; rid < RDT_NUM_RESOURCES; rid++)
+ cbqri_resctrl_control_init(&cbqri_resctrl_resources[rid]);
+
+ cbqri_resctrl_accumulate_caps();
+
+ if (!exposed_alloc_capable) {
+ pr_debug("no resctrl-capable CBQRI controllers found\n");
+ return -ENODEV;
+ }
+
+ err = resctrl_init();
+ if (err)
+ return err;
+
+ cbqri_resctrl_inited = true;
+ return 0;
+}
+
+static int cbqri_resctrl_online_cpu(unsigned int cpu)
+{
+ int err;
+
+ mutex_lock(&cbqri_domain_list_lock);
+ err = cbqri_attach_cpu_to_all_ctrls(cpu);
+ mutex_unlock(&cbqri_domain_list_lock);
+ if (err)
+ return err;
+
+ /*
+ * Seed the per-CPU default RCID/MCID to the reserved (0, 0) pair and
+ * notify the resctrl core so it tracks this CPU in the default group.
+ */
+ resctrl_arch_set_cpu_default_closid_rmid(cpu, 0, 0);
+ resctrl_online_cpu(cpu);
+ return 0;
+}
+
+static int cbqri_resctrl_offline_cpu(unsigned int cpu)
+{
+ resctrl_offline_cpu(cpu);
+
+ mutex_lock(&cbqri_domain_list_lock);
+ cbqri_detach_cpu_from_all_ctrls(cpu);
+ mutex_unlock(&cbqri_domain_list_lock);
+ return 0;
+}
+
+static int __init cbqri_arch_late_init(void)
+{
+ int err;
+
+ if (!riscv_isa_extension_available(NULL, SSQOSID))
+ return -ENODEV;
+
+ err = cbqri_resctrl_setup();
+ if (err)
+ return err;
+
+ err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "cbqri:online",
+ cbqri_resctrl_online_cpu,
+ cbqri_resctrl_offline_cpu);
+ if (err < 0) {
+ cbqri_resctrl_teardown();
+ return err;
+ }
+
+ return 0;
+}
+late_initcall(cbqri_arch_late_init);
--
2.43.0
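The srmcfg handling in the patch above (FIELD_PREP on store, FIELD_GET on
match, fallback to the per-CPU default for default-group tasks) can be
sketched in userspace C. The field positions used here, RCID in bits 11:0
and MCID in bits 27:16, are an assumption for illustration; the patch takes
SRMCFG_RCID_MASK and SRMCFG_MCID_MASK from asm/qos.h, which is not shown in
this message. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: RCID in bits 11:0, MCID in bits 27:16. */
#define SRMCFG_RCID_SHIFT	0
#define SRMCFG_RCID_MASK	UINT32_C(0x00000fff)
#define SRMCFG_MCID_SHIFT	16
#define SRMCFG_MCID_MASK	UINT32_C(0x0fff0000)

/* Pack an (RCID, MCID) pair; out-of-range bits are masked off. */
uint32_t srmcfg_encode(uint32_t rcid, uint32_t mcid)
{
	return ((rcid << SRMCFG_RCID_SHIFT) & SRMCFG_RCID_MASK) |
	       ((mcid << SRMCFG_MCID_SHIFT) & SRMCFG_MCID_MASK);
}

uint32_t srmcfg_rcid(uint32_t srmcfg)
{
	return (srmcfg & SRMCFG_RCID_MASK) >> SRMCFG_RCID_SHIFT;
}

uint32_t srmcfg_mcid(uint32_t srmcfg)
{
	return (srmcfg & SRMCFG_MCID_MASK) >> SRMCFG_MCID_SHIFT;
}

/*
 * Value to install on context switch: a task whose srmcfg encodes to 0
 * (reserved RCID 0, reserved MCID 0) uses the CPU's default instead.
 */
uint32_t srmcfg_effective(uint32_t task_srmcfg, uint32_t cpu_default)
{
	return task_srmcfg ? task_srmcfg : cpu_default;
}
```

With this layout, a task tagged with RCID 3 and MCID 5 carries 0x00050003,
while an untagged task running on a CPU whose default is RCID 2 runs with
0x00000002.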
* [PATCH 6/8] riscv: Enable resctrl filesystem for Ssqosid
From: Drew Fustini @ 2026-06-19 18:29 UTC
To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
Paul Walmsley, Peter Newman, Radim Krčmář,
Reinette Chatre, Rob Herring, Samuel Holland,
Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
linux-doc
In-Reply-To: <20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org>
RISCV_ISA_SSQOSID selects RISCV_CBQRI_DRIVER unconditionally.
The resctrl filesystem integration is gated separately by
RISCV_CBQRI_RESCTRL_FS, a silent option that defaults to y when both
RISCV_CBQRI_DRIVER and RESCTRL_FS are enabled. Enabling the resctrl
filesystem itself stays a user choice via the standard fs/Kconfig
MISC_FILESYSTEMS menu.
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
arch/riscv/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 9eb65d0eaa07..cc261de01107 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -595,6 +595,7 @@ config RISCV_ISA_SSQOSID
depends on 64BIT
default n
select ARCH_HAS_CPU_RESCTRL
+ select RISCV_CBQRI_DRIVER
help
Adds support for the Ssqosid ISA extension (Supervisor-mode
Quality of Service ID).
--
2.43.0
* [PATCH 7/8] dt-bindings: riscv: Add generic CBQRI controller binding
From: Drew Fustini @ 2026-06-19 18:29 UTC
To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
Paul Walmsley, Peter Newman, Radim Krčmář,
Reinette Chatre, Rob Herring, Samuel Holland,
Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
linux-doc
In-Reply-To: <20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org>
Document the generic "riscv,cbqri-capacity-controller" and
"riscv,cbqri-bandwidth-controller" compatibles for controllers that
implement the RISC-V CBQRI specification, together with the common
riscv,cbqri-rcid / riscv,cbqri-mcid properties and the optional
riscv,cbqri-cache phandle that links a capacity controller to the cache
node whose capacity it allocates.
The CBQRI register interface is architected, so an implementation pairs
a device-specific compatible (for example "tenstorrent,ascalon-sc-cbqri")
with the generic one as a fallback. This lets a single binding and driver
cover every CBQRI controller without a per-vendor schema.
Assisted-by: Claude:claude-opus-4-8
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
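The fallback scheme described in the commit message, a device-specific
compatible listed first with the generic one behind it, can be modelled
with a small userspace sketch. The match table, the name field and
match_node() are hypothetical. The kernel's OF matching uses its own
scoring, so this only illustrates the "most specific entry first" idea.
"vendor,example-cbqri" in the usage note is a made-up string.

```c
#include <stddef.h>
#include <string.h>

struct match_entry {
	const char *compatible;
	const char *name;	/* hypothetical per-entry driver data */
};

/* Hypothetical driver match table: specific entry plus generic fallback. */
static const struct match_entry match_table[] = {
	{ "tenstorrent,ascalon-sc-cbqri", "ascalon-sc" },
	{ "riscv,cbqri-capacity-controller", "generic-cc" },
};

/*
 * compat[] holds a node's compatible strings, most specific first.
 * Return the table entry for the first string the driver knows, or
 * NULL when none of them matches.
 */
const struct match_entry *match_node(const char *const *compat, size_t n)
{
	const size_t entries = sizeof(match_table) / sizeof(match_table[0]);

	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < entries; j++)
			if (strcmp(compat[i], match_table[j].compatible) == 0)
				return &match_table[j];
	return NULL;
}
```

A node listing "tenstorrent,ascalon-sc-cbqri" followed by
"riscv,cbqri-capacity-controller" resolves to the specific entry. A node
whose first string is unknown to the driver, such as "vendor,example-cbqri",
still binds through the generic fallback.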
.../devicetree/bindings/riscv/riscv,cbqri.yaml | 109 +++++++++++++++++++++
MAINTAINERS | 1 +
2 files changed, 110 insertions(+)
diff --git a/Documentation/devicetree/bindings/riscv/riscv,cbqri.yaml b/Documentation/devicetree/bindings/riscv/riscv,cbqri.yaml
new file mode 100644
index 000000000000..cf007dcac0d7
--- /dev/null
+++ b/Documentation/devicetree/bindings/riscv/riscv,cbqri.yaml
@@ -0,0 +1,109 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/riscv/riscv,cbqri.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: RISC-V Capacity and Bandwidth QoS Register Interface (CBQRI) controller
+
+description: |
+ The RISC-V CBQRI specification defines capacity-controller and
+ bandwidth-controller register blocks that allow cache capacity and memory
+ bandwidth to be allocated to resource control IDs (RCIDs) and monitored per
+ monitoring counter ID (MCID):
+ https://github.com/riscv-non-isa/riscv-cbqri/blob/main/riscv-cbqri.pdf
+
+ The register interface is architected, so a CBQRI controller is described by
+ a generic "riscv,cbqri-capacity-controller" or "riscv,cbqri-bandwidth-controller"
+ compatible. A device-specific compatible should be listed as the first entry
+ so a driver can apply integration-specific behavior (for example, the cache
+ level the controller backs).
+
+ A single controller register block may implement resource allocation and/or
+ usage monitoring: a capacity controller can allocate cache capacity to RCIDs
+ and monitor capacity usage per MCID, and a bandwidth controller likewise for
+ memory bandwidth. The allocation and monitoring register sets live in the
+ same block, and which of them a controller actually implements is discovered
+ at runtime from its capabilities register. The compatible therefore names the
+ controller type only; it does not distinguish allocation-only, monitoring-only
+ or combined controllers, and no property declares monitoring support.
+
+maintainers:
+ - Drew Fustini <fustini@kernel.org>
+
+properties:
+ compatible:
+ oneOf:
+ - items:
+ - description: Tenstorrent Ascalon Shared Cache
+ const: tenstorrent,ascalon-sc-cbqri
+ - const: riscv,cbqri-capacity-controller
+ - enum:
+ - riscv,cbqri-capacity-controller
+ - riscv,cbqri-bandwidth-controller
+
+ reg:
+ maxItems: 1
+ description:
+ The CBQRI controller register block.
+
+ riscv,cbqri-rcid:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description:
+ The maximum number of RCIDs the controller supports. RCIDs are the
+ resource-control IDs that allocation operations target.
+
+ riscv,cbqri-mcid:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description:
+ The maximum number of MCIDs the controller supports. MCIDs are the
+ monitoring-counter IDs that usage-monitoring operations target; present
+ on controllers that implement monitoring.
+
+ riscv,cbqri-cache:
+ $ref: /schemas/types.yaml#/definitions/phandle
+ description:
+ Phandle to the cache node whose capacity this controller allocates.
+ Applies to capacity controllers that back a CPU cache; the cache level
+ and the harts sharing the cache are taken from that node's cache
+ topology.
+
+required:
+ - compatible
+ - reg
+
+allOf:
+ # The Ascalon SC is a fixed cache capacity-allocation controller: the driver
+ # cannot register it without the RCID count and the cache it backs.
+ - if:
+ properties:
+ compatible:
+ contains:
+ const: tenstorrent,ascalon-sc-cbqri
+ then:
+ required:
+ - riscv,cbqri-rcid
+ - riscv,cbqri-cache
+
+additionalProperties: false
+
+examples:
+ - |
+ l2_cache: l2-cache {
+ compatible = "cache";
+ cache-level = <2>;
+ cache-unified;
+ cache-size = <0xc00000>;
+ cache-sets = <512>;
+ cache-block-size = <64>;
+ };
+
+ cache-controller@a21a00c0 {
+ compatible = "tenstorrent,ascalon-sc-cbqri",
+ "riscv,cbqri-capacity-controller";
+ reg = <0xa21a00c0 0xf40>;
+ riscv,cbqri-rcid = <16>;
+ riscv,cbqri-cache = <&l2_cache>;
+ };
+
+...
diff --git a/MAINTAINERS b/MAINTAINERS
index 85d50efb6e5f..f8b995420c11 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23296,6 +23296,7 @@ M: Drew Fustini <fustini@kernel.org>
R: yunhui cui <cuiyunhui@bytedance.com>
L: linux-riscv@lists.infradead.org
S: Supported
+F: Documentation/devicetree/bindings/riscv/riscv,cbqri.yaml
F: arch/riscv/include/asm/qos.h
F: arch/riscv/include/asm/resctrl.h
F: arch/riscv/kernel/qos.c
--
2.43.0
* [PATCH 8/8] riscv_cbqri: Add CBQRI cache capacity-allocation platform driver
From: Drew Fustini @ 2026-06-19 18:29 UTC (permalink / raw)
To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
Paul Walmsley, Peter Newman, Radim Krčmář,
Reinette Chatre, Rob Herring, Samuel Holland,
Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
linux-doc
In-Reply-To: <20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org>
The CBQRI core exposes riscv_cbqri_register_cc_dt() so a device-tree
platform driver can register a capacity controller whose cache topology
it already knows.
Add a platform driver, bound to the generic compatible, that discovers a
CBQRI capacity controller backing a CPU cache from its DT node and
registers it via riscv_cbqri_register_cc_dt() as the resctrl cache
allocation resource. The controller node carries a riscv,cbqri-cache
phandle to the cache it governs.
The driver reads cache-level from that node and matches it against
cacheinfo to obtain both the resctrl domain id and the set of harts that
share the cache. The node must also provide riscv,cbqri-rcid. Nothing in
the driver is vendor-specific, so any SoC that describes a CBQRI
capacity controller in device tree can reuse it unchanged. The DT "reg"
describes the CBQRI capacity-controller register block directly, so the
driver only ever touches the CBQRI registers.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
MAINTAINERS | 1 +
drivers/resctrl/Kconfig | 15 +++++
drivers/resctrl/Makefile | 1 +
drivers/resctrl/cbqri_capacity.c | 132 +++++++++++++++++++++++++++++++++++++++
4 files changed, 149 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index f8b995420c11..53d29dbbdb63 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23300,6 +23300,7 @@ F: Documentation/devicetree/bindings/riscv/riscv,cbqri.yaml
F: arch/riscv/include/asm/qos.h
F: arch/riscv/include/asm/resctrl.h
F: arch/riscv/kernel/qos.c
+F: drivers/resctrl/cbqri_capacity.c
F: drivers/resctrl/cbqri_devices.c
F: drivers/resctrl/cbqri_internal.h
F: drivers/resctrl/cbqri_resctrl.c
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
index 0887b6a9fac1..bc1e71cc6dfc 100644
--- a/drivers/resctrl/Kconfig
+++ b/drivers/resctrl/Kconfig
@@ -53,6 +53,21 @@ config RISCV_CBQRI_DRIVER_DEBUG
per-controller registration steps. Useful when bringing up a
new platform; otherwise leave disabled to avoid log noise.
+config RISCV_CBQRI_CAPACITY
+ bool "RISC-V CBQRI cache capacity-allocation controller"
+ depends on OF
+ help
+ Device-tree platform driver for a RISC-V CBQRI capacity controller
+ that backs a CPU cache, matching the generic
+ "riscv,cbqri-capacity-controller" compatible. The controller node
+ carries a phandle to the cache it governs; the driver derives the
+ cache level and the set of harts that share it from the kernel
+ cache topology and registers it as the resctrl cache-allocation
+ resource.
+
+ Say N unless your SoC describes a CBQRI capacity controller in its
+ device tree.
+
endif
config RISCV_CBQRI_RESCTRL_FS
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
index ed737b4461b9..78d729534e3e 100644
--- a/drivers/resctrl/Makefile
+++ b/drivers/resctrl/Makefile
@@ -7,5 +7,6 @@ ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG) += -DDEBUG
obj-$(CONFIG_RISCV_CBQRI_DRIVER) += cbqri.o
cbqri-y += cbqri_devices.o
cbqri-$(CONFIG_RISCV_CBQRI_RESCTRL_FS) += cbqri_resctrl.o
+cbqri-$(CONFIG_RISCV_CBQRI_CAPACITY) += cbqri_capacity.o
ccflags-$(CONFIG_RISCV_CBQRI_DRIVER_DEBUG) += -DDEBUG
diff --git a/drivers/resctrl/cbqri_capacity.c b/drivers/resctrl/cbqri_capacity.c
new file mode 100644
index 000000000000..2172432eb328
--- /dev/null
+++ b/drivers/resctrl/cbqri_capacity.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Platform driver for a RISC-V CBQRI capacity controller that backs a CPU
+ * cache. The controller is described in device tree by the generic
+ * "riscv,cbqri-capacity-controller" compatible together with a phandle to the
+ * cache node it governs. The driver hands it to the CBQRI core, which probes
+ * the capabilities register and exposes a controller that supports allocation
+ * as the resctrl cache allocation resource for that cache.
+ */
+
+#define pr_fmt(fmt) "cbqri-capacity: " fmt
+
+#include <linux/cacheinfo.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/ioport.h>
+#include <linux/mod_devicetable.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/printk.h>
+#include <linux/riscv_cbqri.h>
+#include <linux/types.h>
+
+static int cbqri_capacity_probe(struct platform_device *pdev)
+{
+ struct device *dev = &pdev->dev;
+ struct cbqri_controller_info info = {};
+ struct device_node *cache_np;
+ cpumask_var_t cpu_mask;
+ struct resource *res;
+ u32 rcid_count, cache_level;
+ int cache_id, cpu, ret;
+
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (!res)
+ return -EINVAL;
+
+ ret = of_property_read_u32(dev->of_node, "riscv,cbqri-rcid", &rcid_count);
+ if (ret) {
+ dev_err(dev, "missing riscv,cbqri-rcid\n");
+ return ret;
+ }
+
+ cache_np = of_parse_phandle(dev->of_node, "riscv,cbqri-cache", 0);
+ if (!cache_np) {
+ dev_err(dev, "missing riscv,cbqri-cache phandle\n");
+ return -EINVAL;
+ }
+
+ ret = of_property_read_u32(cache_np, "cache-level", &cache_level);
+ if (ret) {
+ dev_err(dev, "%pOF: missing cache-level\n", cache_np);
+ goto out_put;
+ }
+
+ if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL)) {
+ ret = -ENOMEM;
+ goto out_put;
+ }
+
+ /*
+ * Associate the controller with its cache instance via
+ * cacheinfo. The matching cache provides the cache id and the
+ * set of harts that share the cache.
+ */
+ cache_id = -1;
+ cpus_read_lock();
+ for_each_online_cpu(cpu) {
+ struct cacheinfo *ci = get_cpu_cacheinfo_level(cpu, cache_level);
+
+ if (ci && ci->fw_token == cache_np) {
+ cache_id = ci->id;
+ cpumask_copy(cpu_mask, &ci->shared_cpu_map);
+ break;
+ }
+ }
+ cpus_read_unlock();
+
+ if (cache_id < 0) {
+ dev_err(dev, "%pOF: no online hart reports an L%u cache for this node\n",
+ cache_np, cache_level);
+ ret = -ENODEV;
+ goto out_free;
+ }
+
+ info.type = CBQRI_CONTROLLER_TYPE_CAPACITY;
+ info.addr = res->start;
+ info.size = resource_size(res);
+ info.rcid_count = rcid_count;
+ info.cache_id = cache_id;
+
+ ret = riscv_cbqri_register_cc_dt(&info, cache_level, cpu_mask);
+ if (ret) {
+ dev_err(dev, "failed to register capacity controller: %d\n", ret);
+ goto out_free;
+ }
+
+ dev_info(dev, "registered L%u capacity controller at %pa (cache_id=%d, rcid=%u)\n",
+ cache_level, &info.addr, cache_id, rcid_count);
+
+out_free:
+ free_cpumask_var(cpu_mask);
+out_put:
+ of_node_put(cache_np);
+ return ret;
+}
+
+static const struct of_device_id cbqri_capacity_of_match[] = {
+ { .compatible = "riscv,cbqri-capacity-controller" },
+ {}
+};
+MODULE_DEVICE_TABLE(of, cbqri_capacity_of_match);
+
+static struct platform_driver cbqri_capacity_driver = {
+ .probe = cbqri_capacity_probe,
+ .driver = {
+ .name = "cbqri-capacity",
+ .of_match_table = cbqri_capacity_of_match,
+ /*
+ * The controller is registered permanently into the
+ * CBQRI core for the life of the system. Block unbind
+ * so userspace cannot leave a dangling controller.
+ */
+ .suppress_bind_attrs = true,
+ },
+};
+
+/*
+ * Register at device_initcall so probe runs before the CBQRI core's
+ * late_initcall which walks the cbqri_controllers list.
+ */
+builtin_platform_driver(cbqri_capacity_driver);
--
2.43.0
* Re: [PATCH 4/8] riscv_cbqri: Add capacity controller probe and allocation device ops
From: sashiko-bot @ 2026-06-19 18:40 UTC (permalink / raw)
To: Drew Fustini; +Cc: linux-rt-devel, linux-kernel, devicetree, conor+dt, robh
In-Reply-To: <20260619-dfustini-atl-sc-cbqri-dt-v1-4-e79a7723fab0@kernel.org>
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Unsynchronized mutation of the global `cbqri_controllers` list during controller registration.
--
commit 9ebbab9c601133916a2f8c8128683147a9f601b3
Author: Drew Fustini <fustini@kernel.org>
riscv_cbqri: Add capacity controller probe and allocation device ops
This commit adds support for the RISC-V CBQRI capacity controller. It
provides a platform driver interface to probe the controller and register it
to a global controller list.
> diff --git a/drivers/resctrl/cbqri_devices.c b/drivers/resctrl/cbqri_devices.c
> new file mode 100644
> index 0000000000000..cc4ec3f25ac18
> --- /dev/null
> +++ b/drivers/resctrl/cbqri_devices.c
[ ... ]
> +int riscv_cbqri_register_cc_dt(const struct cbqri_controller_info *info,
> + u32 cache_level, const struct cpumask *cpu_mask)
> +{
[ ... ]
> + if (ctrl->alloc_capable && !ctrl->rcid_count) {
> + pr_warn("CC at %pa is alloc-capable but has no RCID count, skipping\n",
> + &ctrl->addr);
> + cbqri_controller_destroy(ctrl);
> + return -EINVAL;
> + }
> +
> + list_add_tail(&ctrl->list, &cbqri_controllers);
[Severity: High]
Does this list insertion need synchronization?
If the kernel is configured to probe platform devices asynchronously (e.g.,
via the driver_async_probe kernel parameter) and there are multiple CBQRI
capacity controller nodes in the device tree, multiple probes could execute
riscv_cbqri_register_cc_dt() concurrently.
This lockless modification to the global cbqri_controllers list could
corrupt the list's next and prev pointers and result in a kernel panic.
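One way to picture the serialization being asked about is the user-space model below. It is only a sketch: `struct ctrl`, `ctrl_register()` and `ctrl_count()` are hypothetical names, a pthread mutex stands in for whatever lock the kernel code would use, and a hand-rolled singly linked list stands in for the kernel list.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a registered controller. */
struct ctrl {
	int id;
	struct ctrl *next;
};

static struct ctrl *ctrl_head;			/* models the global controller list */
static struct ctrl **ctrl_tail = &ctrl_head;	/* where the next entry is linked in */
static pthread_mutex_t ctrl_lock = PTHREAD_MUTEX_INITIALIZER;

/* Append under the lock so concurrent registrations cannot interleave. */
static void ctrl_register(struct ctrl *c)
{
	assert(c != NULL);
	c->next = NULL;
	pthread_mutex_lock(&ctrl_lock);
	*ctrl_tail = c;
	ctrl_tail = &c->next;
	pthread_mutex_unlock(&ctrl_lock);
}

/* Readers take the same lock while walking the list. */
static int ctrl_count(void)
{
	int n = 0;

	pthread_mutex_lock(&ctrl_lock);
	for (struct ctrl *c = ctrl_head; c; c = c->next)
		n++;
	pthread_mutex_unlock(&ctrl_lock);
	return n;
}
```

The point of the model is that both the writer and every reader go through the same lock; which locking primitive fits the real driver depends on the contexts it is called from.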
> + return 0;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org?part=4
* Re: [PATCH 3/8] riscv: Add support for srmcfg CSR from Ssqosid extension
From: sashiko-bot @ 2026-06-19 18:40 UTC (permalink / raw)
To: Drew Fustini; +Cc: robh, linux-kernel, linux-rt-devel, conor+dt, devicetree
In-Reply-To: <20260619-dfustini-atl-sc-cbqri-dt-v1-3-e79a7723fab0@kernel.org>
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] The CPU PM exit notifier invalidates the software CSR cache but fails to eagerly restore the hardware CSR_SRMCFG, leaving the currently executing task running with incorrect QoS IDs during system resume.
- [Medium] The CPU hotplug teardown callback's cache invalidation is prematurely overwritten by the idle task context switch, leaving the hardware CSR out of sync when the CPU comes back online.
--
commit 083587978f6f45cace44a919e47c9276b0701827
Author: Drew Fustini <fustini@kernel.org>
riscv: Add support for srmcfg CSR from Ssqosid extension
This commit adds support for the srmcfg CSR defined in the Ssqosid ISA
extension. It allows setting the RCID and MCID for each task on context
switch to associate resource usage with the running task, and caches the
CSR state per-cpu to avoid redundant writes.
Link: https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
> diff --git a/arch/riscv/kernel/qos.c b/arch/riscv/kernel/qos.c
> index 0000000000000..42f1ff9b219d9
> --- /dev/null
> +++ b/arch/riscv/kernel/qos.c
[ ... ]
> +static int riscv_srmcfg_pm_notify(struct notifier_block *nb,
> + unsigned long action, void *unused)
> +{
> + switch (action) {
> + case CPU_PM_EXIT:
> + case CPU_PM_ENTER_FAILED:
> + __this_cpu_write(cpu_srmcfg, U32_MAX);
> + break;
[Severity: Medium]
Will the currently executing task run with an incorrect hardware QoS ID during
resume?
When the system resumes from deep idle or suspend, the hardware CSR_SRMCFG
is reset to an implementation-defined state. While this invalidates the
software cache, it doesn't immediately reprogram the hardware CSR.
Because there is no immediate context switch, does this leave the currently
running task executing with the reset hardware state until it is eventually
scheduled out?
Should this explicitly call __switch_to_srmcfg(current) to eagerly restore
the hardware state?
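The difference between invalidating the cache and eagerly restoring the CSR can be modelled in user space as follows. This is a sketch under assumptions: `hw_srmcfg`, `cached_srmcfg`, `low_power_reset()` and `pm_exit_eager()` are hypothetical names, plain variables stand in for the CSR and the per-CPU cache, and the reset value of 0 is only an illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SRMCFG_INVALID UINT32_MAX	/* sentinel: cache does not mirror the CSR */

static uint32_t hw_srmcfg;				/* models the hardware CSR */
static uint32_t cached_srmcfg = SRMCFG_INVALID;		/* models the per-CPU cache */
static unsigned int csr_writes;				/* counts modelled CSR writes */

/* Write the modelled CSR only when the cached value differs. */
static void switch_to_srmcfg(uint32_t want)
{
	assert(want != SRMCFG_INVALID);
	if (cached_srmcfg == want)
		return;
	hw_srmcfg = want;
	cached_srmcfg = want;
	csr_writes++;
}

/* Models the hardware losing CSR state across a low-power transition. */
static void low_power_reset(void)
{
	hw_srmcfg = 0;
}

/* Resume handler: invalidate the cache, then restore the CSR right away. */
static void pm_exit_eager(uint32_t current_cfg)
{
	cached_srmcfg = SRMCFG_INVALID;
	switch_to_srmcfg(current_cfg);
}
```

Without the second step in `pm_exit_eager()`, the modelled CSR would keep its reset value until the next call to `switch_to_srmcfg()`.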
> + }
> + return NOTIFY_OK;
> +}
[ ... ]
> +static int __init riscv_srmcfg_init(void)
> +{
> + int err;
> +
> + if (!riscv_has_extension_unlikely(RISCV_ISA_EXT_SSQOSID))
> + return 0;
> +
> + /*
> + * cpuhp_setup_state() invokes the startup callback locally on every
> + * already-online CPU, so no separate seed loop is needed here.
> + */
> + err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "riscv/srmcfg:online",
> + riscv_srmcfg_reset_cache, riscv_srmcfg_reset_cache);
[Severity: Medium]
Is it possible the teardown callback's cache invalidation gets prematurely
overwritten by the idle task context switch?
The teardown callback runs at CPUHP_AP_ONLINE_DYN during CPU offline, which
sets the cache to U32_MAX. However, after this callback finishes, the dying
CPU must still context switch to the idle task to halt.
Will this context switch trigger __switch_to_srmcfg(idle_task), overwriting
the U32_MAX sentinel before the CPU powers off?
If so, when the CPU comes back online with a reset hardware CSR, and
schedules its first task, could it match the stale cache and skip the CSR
write, leaving the hardware CSR out of sync?
> + if (err < 0) {
> + pr_warn("srmcfg cpuhp registration failed (%d), cpus brought online after boot will not invalidate the CSR_SRMCFG cache\n",
> + err);
> + return err;
> + }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org?part=3
* Re: [PATCH 8/8] riscv_cbqri: Add CBQRI cache capacity-allocation platform driver
From: sashiko-bot @ 2026-06-19 18:41 UTC (permalink / raw)
To: Drew Fustini; +Cc: linux-rt-devel, devicetree, conor+dt, linux-kernel, robh
In-Reply-To: <20260619-dfustini-atl-sc-cbqri-dt-v1-8-e79a7723fab0@kernel.org>
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
New issues:
- [High] `get_cpu_cacheinfo_level()` fails to discover capacity controllers on split caches.
Pre-existing issues:
- [High] Global list `cbqri_controllers` is modified concurrently without a lock.
--
commit a409dc1e72a5139a209d981205820cfe3a0eb512
Author: Drew Fustini <fustini@kernel.org>
riscv_cbqri: Add CBQRI cache capacity-allocation platform driver
The CBQRI core exposes riscv_cbqri_register_cc_dt() so a device-tree
platform driver can register a capacity controller whose cache topology
it already knows.
> diff --git a/drivers/resctrl/cbqri_capacity.c b/drivers/resctrl/cbqri_capacity.c
> new file mode 100644
> index 0000000000000..2172432eb3287
> --- /dev/null
> +++ b/drivers/resctrl/cbqri_capacity.c
[ ... ]
> + /*
> + * Associate the controller with its cache instance via
> + * cacheinfo. The matching cache provides the cache id and the
> + * set of harts that share the cache.
> + */
> + cache_id = -1;
> + cpus_read_lock();
> + for_each_online_cpu(cpu) {
> + struct cacheinfo *ci = get_cpu_cacheinfo_level(cpu, cache_level);
> +
> + if (ci && ci->fw_token == cache_np) {
> + cache_id = ci->id;
> + cpumask_copy(cpu_mask, &ci->shared_cpu_map);
> + break;
> + }
> + }
> + cpus_read_unlock();
[Severity: High]
Will this fail to discover capacity controllers on split caches?
get_cpu_cacheinfo_level() only returns the first cache leaf that matches
the requested level and stops searching. If a CPU has multiple caches at
the same level (e.g., split L1 Instruction and L1 Data caches), and the
capacity controller governs the second one, get_cpu_cacheinfo_level() will
return the first cache. Its fw_token will not match cache_np.
Consequently, the loop will never find the matching cache, cache_id will
remain -1, and the probe will incorrectly fail with -ENODEV.
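A lookup that avoids this would have to consider every leaf at the requested level rather than only the first. The user-space sketch below illustrates the idea; `struct leaf` and `find_leaf()` are hypothetical simplifications, not the kernel's cacheinfo API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a cache leaf description. */
struct leaf {
	unsigned int level;
	const void *fw_token;	/* firmware node that describes this leaf */
	int id;
};

/* Return the leaf matching both level and token, or NULL if none does. */
static const struct leaf *find_leaf(const struct leaf *leaves, size_t n,
				    unsigned int level, const void *token)
{
	assert(leaves != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (leaves[i].level == level && leaves[i].fw_token == token)
			return &leaves[i];
	}
	return NULL;
}
```

With two level-1 leaves in the array, a query for the second leaf's token still finds it, because the scan does not stop at the first level match.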
[ ... ]
> + info.type = CBQRI_CONTROLLER_TYPE_CAPACITY;
> + info.addr = res->start;
> + info.size = resource_size(res);
> + info.rcid_count = rcid_count;
> + info.cache_id = cache_id;
> +
> + ret = riscv_cbqri_register_cc_dt(&info, cache_level, cpu_mask);
> + if (ret) {
> + dev_err(dev, "failed to register capacity controller: %d\n", ret);
> + goto out_free;
> + }
[Severity: High]
This is a pre-existing issue, but can this trigger list corruption if
multiple instances of the platform driver probe concurrently?
The probe function cbqri_capacity_probe() calls
riscv_cbqri_register_cc_dt(). Inside riscv_cbqri_register_cc_dt(), the newly
allocated controller is appended to the global cbqri_controllers list:
riscv_cbqri_register_cc_dt() {
...
list_add_tail(&ctrl->list, &cbqri_controllers);
return 0;
}
This modification happens without any locks. If the driver core probes
multiple CBQRI devices concurrently (e.g., due to asynchronous probing),
list_add_tail() will race on the shared list. Additionally, an asynchronous
probe could race with CPU hotplug events iterating over cbqri_controllers.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org?part=8
* Re: [PATCH 5/8] riscv_cbqri: resctrl: Add cache allocation via capacity block mask
From: sashiko-bot @ 2026-06-19 18:42 UTC (permalink / raw)
To: Drew Fustini; +Cc: linux-kernel, linux-rt-devel, conor+dt, devicetree, robh
In-Reply-To: <20260619-dfustini-atl-sc-cbqri-dt-v1-5-e79a7723fab0@kernel.org>
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
New issues:
- [Medium] Presence of an unsupported cache level causes complete resctrl initialization failure instead of skipping the unknown cache.
- [Low] Enabling CONFIG_RISCV_ISA_SSQOSID and CONFIG_RESCTRL_FS without CONFIG_RISCV_CBQRI_DRIVER results in a build failure due to missing resctrl_arch_* implementations.
Pre-existing issues:
- [High] Software fails to set the BUSY bit when initiating CBQRI control operations, causing the hardware to never execute the command.
--
commit 44771fe47dad2b92018db7b06dd244e6b25cc218
Author: Drew Fustini <fustini@kernel.org>
riscv_cbqri: resctrl: Add cache allocation via capacity block mask
This commit wires up the CBQRI capacity controllers into the resctrl
subsystem as RDT_RESOURCE_L2 and RDT_RESOURCE_L3 schemata. It treats
mismatched capacity controller capabilities at the same cache level as a
fatal configuration error. It also lazily creates domains in the CPU
hotplug online callback.
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index ee586925f9722..9eb65d0eaa070 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -594,6 +594,7 @@ config RISCV_ISA_SSQOSID
> bool "Ssqosid extension support for supervisor mode Quality of Service ID"
> depends on 64BIT
> default n
> + select ARCH_HAS_CPU_RESCTRL
[Severity: Low]
Does enabling CONFIG_RISCV_ISA_SSQOSID and CONFIG_RESCTRL_FS without
CONFIG_RISCV_CBQRI_DRIVER result in a build failure?
Selecting ARCH_HAS_CPU_RESCTRL here expects the architecture implementations
to be available. However, looking at drivers/resctrl/Kconfig,
RISCV_CBQRI_RESCTRL_FS is only selected if RISCV_CBQRI_DRIVER is enabled.
Could this configuration mismatch cause a link error during the kernel
build because the required resctrl_arch_* hooks are not compiled?
[ ... ]
> diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
> new file mode 100644
> index 0000000000000..d354129cc34fe
> --- /dev/null
> +++ b/drivers/resctrl/cbqri_resctrl.c
> @@ -0,0 +1,774 @@
[ ... ]
> +static int cbqri_apply_cache_config_dom(struct cbqri_resctrl_dom *hw_dom,
> + struct rdt_resource *r,
> + u32 closid, enum resctrl_conf_type t,
> + u64 cbm)
> +{
> + struct cbqri_resctrl_res *hw_res =
> + container_of(r, struct cbqri_resctrl_res, resctrl_res);
> + struct cbqri_cc_config cfg = {
> + .cbm = cbm,
> + .at = (t == CDP_CODE) ? CBQRI_CONTROL_REGISTERS_AT_CODE :
> + CBQRI_CONTROL_REGISTERS_AT_DATA,
> + .cdp_enabled = hw_res->cdp_enabled,
> + };
> +
> + return cbqri_apply_cache_config(hw_dom->hw_ctrl, closid, &cfg);
> +}
[Severity: High]
This is a pre-existing issue, but does the software fail to set the BUSY bit
when initiating CBQRI control operations in the underlying hardware driver?
This problem was introduced earlier in the same series. Looking at
cbqri_cc_alloc_op() and cbqri_probe_feature() in
drivers/resctrl/cbqri_devices.c, the driver modifies and writes the control
register but omits setting the CBQRI_CONTROL_REGISTERS_BUSY_MASK bit to 1
before issuing the MMIO write.
Because the hardware relies on the BUSY bit to trigger the state machine,
would the operations never start? It seems the subsequent wait-loop
incorrectly succeeds immediately because it polls for BUSY == 0, leading
the driver to silently believe the hardware executed the command.
[ ... ]
> +static int cbqri_resctrl_pick_caches(void)
> +{
> + struct cbqri_controller *ctrl;
> +
> + list_for_each_entry(ctrl, &cbqri_controllers, list) {
> + struct cbqri_resctrl_res *cbqri_res;
> + int rid;
> +
> + if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
> + continue;
> + if (!ctrl->alloc_capable)
> + continue;
> +
> + rid = cbqri_cache_level_to_rid(ctrl->cache.cache_level);
> + if (rid < 0) {
> + pr_err("unknown cache level %d\n",
> + ctrl->cache.cache_level);
> + return rid;
[Severity: Medium]
If the device tree specifies a CBQRI controller for an unsupported cache
level like L4, does this entirely abort the setup of the resctrl filesystem?
Instead of returning -ENODEV and causing complete resctrl initialization
failure, would it be better to skip the unknown cache with a continue
statement? Otherwise, this prevents QoS from functioning on perfectly valid
and supported caches like L2 and L3.
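The skip-instead-of-abort behavior suggested here can be sketched as a small user-space model. The names `level_to_rid()`, `pick_caches()` and the `RID_*` constants are hypothetical and only mirror the shape of the quoted loop.

```c
#include <assert.h>
#include <stddef.h>

enum { RID_L2, RID_L3, RID_COUNT };

/* Map a cache level to a resource index, or -1 if unsupported. */
static int level_to_rid(unsigned int level)
{
	switch (level) {
	case 2:
		return RID_L2;
	case 3:
		return RID_L3;
	default:
		return -1;
	}
}

/*
 * Mark the resources backed by the given controller cache levels.
 * Unsupported levels are skipped rather than failing the whole scan.
 * Returns the number of controllers that were skipped.
 */
static size_t pick_caches(const unsigned int *levels, size_t n,
			  int picked[RID_COUNT])
{
	size_t skipped = 0;

	assert(picked != NULL);
	for (size_t i = 0; i < n; i++) {
		int rid = level_to_rid(levels[i]);

		if (rid < 0) {
			skipped++;	/* a real driver might log a warning here */
			continue;
		}
		picked[rid] = 1;
	}
	return skipped;
}
```

In this model an L4 entry between an L2 and an L3 entry is counted as skipped while both supported levels are still picked.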
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org?part=5
* Re: [PATCH 6/8] riscv: Enable resctrl filesystem for Ssqosid
From: sashiko-bot @ 2026-06-19 18:45 UTC (permalink / raw)
To: Drew Fustini; +Cc: devicetree, robh, conor+dt, linux-kernel, linux-rt-devel
In-Reply-To: <20260619-dfustini-atl-sc-cbqri-dt-v1-6-e79a7723fab0@kernel.org>
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] The CBQRI cache controllers lose their MMIO configurations (Capacity Block Masks) across system suspend, but the driver implements no mechanism to restore them on resume.
- [High] The per-CPU `CSR_SRMCFG` hardware register is not restored upon CPU wakeup from low-power states, allowing execution with an invalid RCID.
--
commit eb40435445d4659ca3c62bd3e4e35f35a4c254c5
Author: Drew Fustini <fustini@kernel.org>
riscv: Enable resctrl filesystem for Ssqosid
This commit enables the CBQRI driver for RISC-V when the Ssqosid ISA extension
is enabled. It links the RISC-V architecture support with the generic resctrl
filesystem logic.
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 9eb65d0eaa070..cc261de01107f 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -595,6 +595,7 @@ config RISCV_ISA_SSQOSID
> depends on 64BIT
> default n
> select ARCH_HAS_CPU_RESCTRL
> + select RISCV_CBQRI_DRIVER
[Severity: High]
Does enabling this driver introduce a regression in power management?
Looking at the CBQRI driver in drivers/resctrl/cbqri_capacity.c, the platform
driver cbqri_capacity_driver is registered but doesn't implement any
dev_pm_ops:
drivers/resctrl/cbqri_capacity.c:
static struct platform_driver cbqri_capacity_driver = {
.probe = cbqri_capacity_probe,
.driver = {
.name = "cbqri-capacity",
...
During deep system suspend, power to the SoC cache controllers is typically
gated, causing the partitioned Capacity Block Masks (CBMs) managed in
drivers/resctrl/cbqri_devices.c to revert to default permissive values.
Since the driver lacks PM callbacks (like dev_pm_ops or syscore_ops) to
restore the software-configured state on resume, will cache partitioning
guarantees be broken across suspend/resume cycles?
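One common shape for such a restore path is a software shadow copy that is replayed on resume. The sketch below is a user-space model under assumptions: `NR_RCID`, `cbm_write()`, `power_loss()` and `cbm_resume()` are hypothetical, arrays stand in for the controller registers, and an all-ones mask is assumed as the permissive default.

```c
#include <assert.h>
#include <stdint.h>

#define NR_RCID 4	/* hypothetical RCID count for the sketch */

static uint64_t hw_cbm[NR_RCID];	/* models the controller's per-RCID masks */
static uint64_t shadow_cbm[NR_RCID];	/* software copy of what was programmed */
static unsigned int programmed;		/* bit i set once RCID i has been written */

/* Program a mask and remember it so it can be replayed later. */
static void cbm_write(unsigned int rcid, uint64_t cbm)
{
	assert(rcid < NR_RCID);
	hw_cbm[rcid] = cbm;
	shadow_cbm[rcid] = cbm;
	programmed |= 1u << rcid;
}

/* Models power gating: every mask reverts to a permissive default. */
static void power_loss(void)
{
	for (unsigned int i = 0; i < NR_RCID; i++)
		hw_cbm[i] = UINT64_MAX;
}

/* Resume hook: replay only the masks that software had programmed. */
static void cbm_resume(void)
{
	for (unsigned int i = 0; i < NR_RCID; i++) {
		if (programmed & (1u << i))
			hw_cbm[i] = shadow_cbm[i];
	}
}
```

Entries that were never programmed are left at whatever the modelled hardware reset them to, which keeps the replay from inventing state.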
[Severity: High]
Does this also introduce a regression with restoring the CPU CSR_SRMCFG
register after low-power transitions?
In arch/riscv/kernel/qos.c:riscv_srmcfg_pm_notify(), the CPU_PM_EXIT handler
invalidates the software cache by writing U32_MAX to cpu_srmcfg:
case CPU_PM_EXIT:
case CPU_PM_ENTER_FAILED:
__this_cpu_write(cpu_srmcfg, U32_MAX);
break;
However, it does not explicitly rewrite the hardware CSR_SRMCFG register
itself.
Since the Ssqosid extension leaves the CSR state implementation-defined
across low-power transitions, could the CPU resume execution with the
reset value (0)?
This would mean memory accesses are performed with an incorrect RCID until
the next context switch invokes __switch_to_srmcfg(), which would
temporarily break cache partitioning guarantees.
> help
> Adds support for the Ssqosid ISA extension (Supervisor-mode
> Quality of Service ID).
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org?part=6
* Re: [PATCH 00/11] ARM: NXP: Drop NOMMU platform support
From: Arnd Bergmann @ 2026-06-19 19:17 UTC (permalink / raw)
To: Vladimir Zapolskiy, Frank Li, Sascha Hauer,
Pengutronix Kernel Team, Stefan Agner, Fabio Estevam, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Russell King, Abel Vesa,
Peng Fan, Michael Turquette, Stephen Boyd, Brian Masney,
Aisheng Dong, Jacky Bai, NXP S32 Linux Team, Linus Walleij,
Piotr Wojtaszczyk, Kees Cook, Gustavo A. R. Silva
Cc: linux-arm-kernel, imx, devicetree, linux-kernel, linux-clk,
open list:GPIO SUBSYSTEM, linux-hardening, Frank Li
In-Reply-To: <7d946861-c3cb-4512-9d5f-9f4cb9b7ee8a@kernel.org>
On Fri, Jun 19, 2026, at 18:07, Vladimir Zapolskiy wrote:
> On 6/19/26 18:40, Frank.Li@oss.nxp.com wrote:
>> Commercial users and hardware vendors migrated to Zephyr or other RTOS
>> solutions years ago, leaving the NOMMU platform support effectively
>> unused and unmaintained.
>>
>> Remove the obsolete support to reduce maintenance burden and simplify the
>> Freescale/NXP platform code.
>>
>> Some driver code is still kept and may be cleaned up later, since it could
>> be reused by other SoCs.
Thanks a lot for going through these already!
>> Signed-off-by: Frank Li <Frank.Li@nxp.com>
>
> This change is a bit too early to happen, I prefer to get it orchestrated
> by Arnd. So, as for today I NAK the change for its NXP LPC part.
I am planning to post a series of deprecation notices for platforms
that I would like to remove for one reason or another. Since it's only
one more merge before the next (S)LTS kernel, my idea was to post
the series once v7.2-rc1 is out, merge it into 7.3-LTS and remove
the deprecated code early next year once the LTS release is announced.
I have rebased and uploaded my current draft to
https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git/log/?h=rfc-arm-deprecation-7.2
None of those are finalized of course, and we can do any part
of it earlier or later (or not at all) if there is a good reason.
If you want to remove the vf610m4/imx7d-cm4/lpc43xx/lpc18xx portions
earlier, that is definitely fine with me. For imxrt1050, there is a
slightly higher chance that this is still used, so I would
prefer to wait for the LTS kernel on that one.
>> Frank Li (11):
>> ARM: dts: vf610m4: Remove NOMMU platform support
>> ARM: dts: imxrt1050: Remove NOMMU platform support
>> ARM: imx: Remove NOMMU platform support
>> clk: imx: imxrt1050: Remove NOMMU platform support
>> pinctrl: freescale: IMXRT: Remove NOMMU platform support
>> ARM: imxrt_defconfig: Remove NOMMU platform support
>> ARM: dts: lpc: Remove NOMMU platform support
>> ARM: mach-lpc: Remove NOMMU platform support
>> ARM: configs: lpc*: Remove NOMMU platform support
>> clk: nxp: lpc: Remove NOMMU platform support
>> pinctrl: nxp: lpc: Remove NOMMU platform support
>
> NXP LPC32xx is ARMv5 and it has MMU, hence it's plainly out of scope of
> the proposed "dropping NOMMU platform support".
Agreed. There are a few more platforms with MMU that I would like
to drop because they were never converted to devicetree support,
but there is nothing wrong with lpc32xx.
Arnd
* [PATCH v2 0/3] Add Qualcomm SPEL powercap driver
From: Manaf Meethalavalappu Pallikunhi @ 2026-06-19 20:39 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Rafael J. Wysocki,
Bjorn Andersson, Konrad Dybcio, Daniel Lezcano
Cc: Gaurav Kohli, linux-arm-msm, devicetree, linux-kernel, linux-pm,
Manaf Meethalavalappu Pallikunhi
This patch series introduces support for Qualcomm's SPEL (SoC Power
and Electrical Limits) hardware, which provides power monitoring and
limiting capabilities for various power domains in Qualcomm SoCs.
The SPEL hardware enables:
- Real-time energy consumption monitoring across multiple power domains
- Hardware-enforced power limits with configurable time windows
- Support for up to 4 power limits (PL1-PL4) per domain
- Hierarchical power domain organization (SYS -> SOC -> subsystems)
The driver integrates with the Linux powercap framework, exposing SPEL
capabilities through standard sysfs interfaces. This allows userspace
applications and thermal management daemons to monitor energy consumption
and configure power limits for optimal power/performance balance.
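As an illustration of the userspace side, the powercap framework exposes an `energy_uj` attribute per zone under `/sys/class/powercap`. The sketch below reads such an attribute; `read_energy_uj()` is a hypothetical helper, and the zone path is left to the caller because the cover letter does not name the zones the SPEL driver registers.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Read a powercap zone's energy counter, in microjoules, from the given
 * attribute file. Returns 0 on success and -1 on any error.
 */
static int read_energy_uj(const char *path, uint64_t *out)
{
	FILE *f;
	int ok;

	assert(path != NULL && out != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	/* The attribute holds a single unsigned decimal value. */
	ok = fscanf(f, "%" SCNu64, out) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

A monitoring tool could call this helper twice and divide the counter difference by the elapsed time to estimate average power, keeping in mind that the counter can wrap.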
Signed-off-by: Manaf Meethalavalappu Pallikunhi <manaf.pallikunhi@oss.qualcomm.com>
---
Changes in v2:
- Fixed DT issues: corrected unit-address mismatch and made compatible
platform-specific. (Konrad/Krzysztof)
- Cleaned up bitfield handling: switched to GENMASK, FIELD_* macros. (Konrad)
- Converted hex values to lowercase and simplified array definitions. (Konrad)
- Used devm_platform_ioremap_resource_byname(). (Konrad)
- Refactored data structures: merged domain data into spel_domain_info
and removed hardcoded limits using ARRAY_SIZE(). (Daniel)
- Added bounds checks (TIME_WINDOW_MAX) and fixed time window RMW
handling (preserving reserved bits + firmware notify). (Sashiko)
- Resolved use-after-free, replaced allocation strategy, and
added proper cleanup in .release(). (Sashiko)
- Updated time window calculation logic.
- Removed cyclic dependency, improved validation logic, and aligned
naming consistency. (Daniel)
- Link to v1: https://lore.kernel.org/r/20260519-qcom_spel_driver_upstream-v1-0-75356d1b7f94@oss.qualcomm.com
---
Manaf Meethalavalappu Pallikunhi (3):
dt-bindings: power: limits: Describe Qualcomm SPEL hardware
powercap: qcom: Add SPEL powercap driver
arm64: dts: qcom: glymur: Enable SPEL powercap driver
.../bindings/power/limits/qcom,spel.yaml | 47 ++
MAINTAINERS | 7 +
arch/arm64/boot/dts/qcom/glymur.dtsi | 10 +
drivers/powercap/Kconfig | 13 +
drivers/powercap/Makefile | 1 +
drivers/powercap/qcom_spel.c | 776 +++++++++++++++++++++
6 files changed, 854 insertions(+)
---
base-commit: 80dd246accce631c328ea43294e53b2b2dd2aa32
change-id: 20260519-qcom_spel_driver_upstream-11efd8c85e6e
Best regards,
--
Manaf Meethalavalappu Pallikunhi <manaf.pallikunhi@oss.qualcomm.com>
* [PATCH v2 1/3] dt-bindings: power: limits: Describe Qualcomm SPEL hardware
From: Manaf Meethalavalappu Pallikunhi @ 2026-06-19 20:39 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Rafael J. Wysocki,
Bjorn Andersson, Konrad Dybcio, Daniel Lezcano
Cc: Gaurav Kohli, linux-arm-msm, devicetree, linux-kernel, linux-pm,
Manaf Meethalavalappu Pallikunhi
In-Reply-To: <20260620-qcom_spel_driver_upstream-v2-0-a3ee6837c18f@oss.qualcomm.com>
The Qualcomm SoC Power and Electrical Limits (SPEL) block provides
hardware-based power monitoring and limiting for various power domains.
Add a devicetree binding to describe the SPEL block on Qualcomm SoCs.
Signed-off-by: Manaf Meethalavalappu Pallikunhi <manaf.pallikunhi@oss.qualcomm.com>
---
.../bindings/power/limits/qcom,spel.yaml | 47 ++++++++++++++++++++++
MAINTAINERS | 6 +++
2 files changed, 53 insertions(+)
diff --git a/Documentation/devicetree/bindings/power/limits/qcom,spel.yaml b/Documentation/devicetree/bindings/power/limits/qcom,spel.yaml
new file mode 100644
index 000000000000..4c6e6cbfbfe4
--- /dev/null
+++ b/Documentation/devicetree/bindings/power/limits/qcom,spel.yaml
@@ -0,0 +1,47 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/power/limits/qcom,spel.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Qualcomm SoC Power and Electrical Limits (SPEL)
+
+maintainers:
+ - Manaf Meethalavalappu Pallikunhi <manaf.pallikunhi@oss.qualcomm.com>
+
+description:
+ The Qualcomm SPEL (SoC Power and Electrical Limits) provides hardware-based
+ power monitoring and limiting capabilities for various power domains in
+ Qualcomm SoCs.
+
+properties:
+ compatible:
+ const: qcom,glymur-spel
+
+ reg:
+ maxItems: 3
+
+ reg-names:
+ items:
+ - const: config
+ - const: constraints
+ - const: nodes
+
+required:
+ - compatible
+ - reg
+ - reg-names
+
+additionalProperties: false
+
+examples:
+ - |
+ power-limits@ef3b000 {
+ compatible = "qcom,glymur-spel";
+ reg = <0x0ef3b000 0x1000>,
+ <0x0ef3d000 0x1000>,
+ <0x0ef3e000 0x1000>;
+ reg-names = "config",
+ "constraints",
+ "nodes";
+ };
diff --git a/MAINTAINERS b/MAINTAINERS
index 3dd58a16f06a..c63f147e8c54 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22233,6 +22233,12 @@ S: Maintained
F: Documentation/devicetree/bindings/power/supply/qcom,pmi8998-charger.yaml
F: drivers/power/supply/qcom_smbx.c
+QUALCOMM SPEL POWERCAP DRIVER
+M: Manaf Meethalavalappu Pallikunhi <manaf.pallikunhi@oss.qualcomm.com>
+L: linux-arm-msm@vger.kernel.org
+S: Maintained
+F: Documentation/devicetree/bindings/power/limits/qcom,spel.yaml
+
QUALCOMM PPE DRIVER
M: Luo Jie <quic_luoj@quicinc.com>
L: netdev@vger.kernel.org
--
2.43.0
* [PATCH v2 2/3] powercap: qcom: Add SPEL powercap driver
From: Manaf Meethalavalappu Pallikunhi @ 2026-06-19 20:39 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Rafael J. Wysocki,
Bjorn Andersson, Konrad Dybcio, Daniel Lezcano
Cc: Gaurav Kohli, linux-arm-msm, devicetree, linux-kernel, linux-pm,
Manaf Meethalavalappu Pallikunhi
In-Reply-To: <20260620-qcom_spel_driver_upstream-v2-0-a3ee6837c18f@oss.qualcomm.com>
The Qualcomm SoC Power and Electrical Limits (SPEL) block provides
hardware-based power monitoring and limiting capabilities for various
power domains, including System, SoC, CPU clusters, GPU, and other
subsystems.
The driver integrates with the Linux powercap framework, exposing SPEL
capabilities through powercap sysfs interfaces.
Signed-off-by: Manaf Meethalavalappu Pallikunhi <manaf.pallikunhi@oss.qualcomm.com>
---
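Note (illustration only, not part of the patch): the unit handling in
spel_update_unit() below can be checked in userspace with a sketch like the
following. It mirrors the shifts and scale factors used in the driver; the
struct and function names are hypothetical, and any register values fed to it
are made up rather than read from hardware.

```c
#include <assert.h>
#include <stdint.h>

struct spel_units {
	uint32_t power_uw;   /* microwatts per raw power step */
	uint32_t time_us;    /* microseconds per raw time step */
	uint32_t energy_nj;  /* nanojoules per raw energy count */
};

/* Derive the three units from the config and energy-report registers. */
static void spel_units_decode(uint32_t cfg, uint32_t energy_rpt,
			      struct spel_units *u)
{
	uint32_t shift;

	assert(u);

	shift = cfg & 0x7u;                 /* POWER_UNIT_MASK, bits [2:0] */
	u->power_uw = 1000000u / (1u << shift);

	shift = (cfg >> 8) & 0xfu;          /* TIME_UNIT_MASK, bits [11:8] */
	u->time_us = 1000u * (1u << shift);

	shift = (energy_rpt >> 16) & 0xfu;  /* ENERGY_UNIT_MASK, bits [19:16] */
	u->energy_nj = 1000000000u / (1u << shift);
}

/* Raw energy count to microjoules: multiply by nJ per count, divide by 1000. */
static uint64_t spel_energy_raw_to_uj(const struct spel_units *u, uint32_t raw)
{
	return (uint64_t)raw * u->energy_nj / 1000u;
}
```

With a 4-bit shift the largest divisor is 2^15, so the integer divisions
truncate by less than one unit.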
MAINTAINERS | 1 +
drivers/powercap/Kconfig | 13 +
drivers/powercap/Makefile | 1 +
drivers/powercap/qcom_spel.c | 776 +++++++++++++++++++++++++++++++++++++++++++
4 files changed, 791 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index c63f147e8c54..5c7542754ab6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22238,6 +22238,7 @@ M: Manaf Meethalavalappu Pallikunhi <manaf.pallikunhi@oss.qualcomm.com>
L: linux-arm-msm@vger.kernel.org
S: Maintained
F: Documentation/devicetree/bindings/power/limits/qcom,spel.yaml
+F: drivers/powercap/qcom_spel.c
QUALCOMM PPE DRIVER
M: Luo Jie <quic_luoj@quicinc.com>
diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig
index 03c4c796d993..e3a47c653499 100644
--- a/drivers/powercap/Kconfig
+++ b/drivers/powercap/Kconfig
@@ -93,4 +93,17 @@ config DTPM_DEVFREQ
help
This enables support for device power limitation based on
energy model.
+
+config QCOM_SPEL
+ tristate "Qualcomm SPEL Powercap driver"
+ depends on ARM64 || COMPILE_TEST
+ help
+ This enables support for the Qualcomm SoC Power and Electrical
+ Limits (SPEL) hardware, which allows power limits to be
+ enforced and monitored on Qualcomm SoCs.
+
+ SPEL provides energy monitoring and power capping for multiple
+ domains including system, SoC, CPU clusters, GPU, and various
+ other subsystems.
+
endif
diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile
index 5ab0dce565b9..8235fb9d3df6 100644
--- a/drivers/powercap/Makefile
+++ b/drivers/powercap/Makefile
@@ -8,3 +8,4 @@ obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o
obj-$(CONFIG_INTEL_RAPL_TPMI) += intel_rapl_tpmi.o
obj-$(CONFIG_IDLE_INJECT) += idle_inject.o
obj-$(CONFIG_ARM_SCMI_POWERCAP) += arm_scmi_powercap.o
+obj-$(CONFIG_QCOM_SPEL) += qcom_spel.o
diff --git a/drivers/powercap/qcom_spel.c b/drivers/powercap/qcom_spel.c
new file mode 100644
index 000000000000..4dd91cf36ccc
--- /dev/null
+++ b/drivers/powercap/qcom_spel.c
@@ -0,0 +1,776 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Qualcomm SPEL (SoC Power and Electrical Limits) Driver
+ *
+ * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
+ */
+
+#include <linux/bitfield.h>
+#include <linux/device.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mod_devicetable.h>
+#include <linux/mutex.h>
+#include <linux/platform_device.h>
+#include <linux/powercap.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+/* SPEL register bitmasks */
+#define ENERGY_STATUS_MASK GENMASK(31, 0)
+
+#define POWER_LIMIT_MASK GENMASK(14, 0)
+#define POWER_LIMIT_ENABLE BIT(31)
+
+#define TIME_WINDOW_MASK_L GENMASK(14, 0)
+#define TIME_WINDOW_MASK_H GENMASK(22, 16)
+#define TIME_WINDOW_MAX ((FIELD_MAX(TIME_WINDOW_MASK_H) << 15) | \
+ FIELD_MAX(TIME_WINDOW_MASK_L))
+
+#define ENERGY_UNIT_MASK GENMASK(19, 16)
+#define TIME_UNIT_MASK GENMASK(11, 8)
+#define POWER_UNIT_MASK GENMASK(2, 0)
+
+#define LIMITS_CAPABILITY_OFFSET 0x20
+#define ENERGY_RPT_UNIT_OFFSET 0x04
+
+#define ENERGY_UNIT_SCALE 1000
+
+#define SPEL_DOMAIN_NAME_LENGTH 16
+
+/* Domain types */
+enum spel_domain_type {
+ SPEL_DOMAIN_SYS,
+ SPEL_DOMAIN_SOC,
+ SPEL_DOMAIN_CL0,
+ SPEL_DOMAIN_CL1,
+ SPEL_DOMAIN_CL2,
+ SPEL_DOMAIN_IGPU,
+ SPEL_DOMAIN_DGPU,
+ SPEL_DOMAIN_NSP,
+ SPEL_DOMAIN_MMCX,
+ SPEL_DOMAIN_INFRA,
+ SPEL_DOMAIN_DRAM,
+ SPEL_DOMAIN_MDM,
+ SPEL_DOMAIN_WLAN,
+ SPEL_DOMAIN_USB1,
+ SPEL_DOMAIN_USB2,
+ SPEL_DOMAIN_USB3,
+ SPEL_DOMAIN_MAX,
+};
+
+/* Power limit IDs */
+enum spel_power_limit_id {
+ POWER_LIMIT1,
+ POWER_LIMIT2,
+ POWER_LIMIT3,
+ POWER_LIMIT4,
+ POWER_LIMITS_MAX,
+};
+
+/* Unit types for conversion */
+enum unit_type {
+ POWER_UNIT,
+ ENERGY_UNIT,
+ TIME_UNIT,
+};
+
+/* Power limit operation types */
+enum pl_ops_type {
+ PL_LIMIT,
+ PL_TIME_WINDOW,
+};
+
+static const char * const pl_names[] = {
+ [POWER_LIMIT1] = "pl1",
+ [POWER_LIMIT2] = "pl2",
+ [POWER_LIMIT3] = "pl3",
+ [POWER_LIMIT4] = "pl4",
+};
+
+/**
+ * struct spel_domain_info - Domain configuration
+ * @name: Domain name
+ * @offset: Register offset in node base
+ */
+struct spel_domain_info {
+ const char *name;
+ u32 offset;
+};
+
+/* Domain configuration */
+static const struct spel_domain_info domain_info[SPEL_DOMAIN_MAX] = {
+ [SPEL_DOMAIN_SYS] = { "sys", 0x40 },
+ [SPEL_DOMAIN_SOC] = { "soc", 0x00 },
+ [SPEL_DOMAIN_CL0] = { "cl0", 0x5c },
+ [SPEL_DOMAIN_CL1] = { "cl1", 0x60 },
+ [SPEL_DOMAIN_CL2] = { "cl2", 0x64 },
+ [SPEL_DOMAIN_IGPU] = { "igpu", 0x08 },
+ [SPEL_DOMAIN_DGPU] = { "dgpu", 0x44 },
+ [SPEL_DOMAIN_NSP] = { "nsp", 0x0c },
+ [SPEL_DOMAIN_MMCX] = { "mmcx", 0x10 },
+ [SPEL_DOMAIN_INFRA] = { "infra", 0x18 },
+ [SPEL_DOMAIN_DRAM] = { "dram", 0x1c },
+ [SPEL_DOMAIN_MDM] = { "mdm", 0x48 },
+ [SPEL_DOMAIN_WLAN] = { "wlan", 0x4c },
+ [SPEL_DOMAIN_USB1] = { "usb1", 0x50 },
+ [SPEL_DOMAIN_USB2] = { "usb2", 0x54 },
+ [SPEL_DOMAIN_USB3] = { "usb3", 0x58 },
+};
+
+/**
+ * struct spel_constraint_info - Power limit constraint information
+ * @limit_offset: Register offset for power limit value
+ * @time_window_offset: Register offset for time window
+ * @supported_mask: Bit mask in capability register
+ * @domain_id: Domain this constraint applies to
+ * @pl_id: Power limit ID (PL1, PL2, etc.)
+ */
+struct spel_constraint_info {
+ u32 limit_offset;
+ u32 time_window_offset;
+ u32 supported_mask;
+ enum spel_domain_type domain_id;
+ int pl_id;
+};
+
+/* Constraint configuration */
+static const struct spel_constraint_info constraints[] = {
+ /* SYS domain constraints */
+ { 0x10, 0x70, BIT(0), SPEL_DOMAIN_SYS, POWER_LIMIT1 },
+ { 0x14, 0x74, BIT(1), SPEL_DOMAIN_SYS, POWER_LIMIT2 },
+ { 0x18, 0x78, BIT(2), SPEL_DOMAIN_SYS, POWER_LIMIT3 },
+ { 0x1c, 0x7c, BIT(3), SPEL_DOMAIN_SYS, POWER_LIMIT4 },
+ /* SoC domain constraints */
+ { 0x00, 0x60, BIT(4), SPEL_DOMAIN_SOC, POWER_LIMIT1 },
+ { 0x04, 0x64, BIT(5), SPEL_DOMAIN_SOC, POWER_LIMIT2 },
+ { 0x08, 0x68, BIT(6), SPEL_DOMAIN_SOC, POWER_LIMIT3 },
+ { 0x0c, 0x6c, BIT(7), SPEL_DOMAIN_SOC, POWER_LIMIT4 },
+};
+
+/**
+ * struct spel_domain - SPEL power domain
+ * @power_zone: Powercap zone
+ * @lock: Mutex protecting register access
+ * @sp: Parent sys domain
+ * @status_reg: Energy counter register
+ * @name: Domain name
+ * @id: Domain type ID
+ */
+struct spel_domain {
+ struct powercap_zone power_zone;
+ struct mutex lock; /* Protects register read/write operations */
+ void *sp;
+ void __iomem *status_reg;
+ char name[SPEL_DOMAIN_NAME_LENGTH];
+ enum spel_domain_type id;
+};
+
+/**
+ * struct spel_system - SPEL system
+ * @domains: Array of domains
+ * @power_zone: Parent powercap zone
+ * @node_base: Base address for node registers
+ * @constraint_base: Base address for constraint registers
+ * @config_base: Base address for config registers
+ * @control_type: Powercap control type
+ * @dev: Device pointer for logging
+ * @limits: Supported power limits per domain
+ * @power_unit: Power unit in microWatts (common for all domains)
+ * @energy_unit: Energy unit in nanoJoules (common for all domains)
+ * @time_unit: Time unit in microseconds (common for all domains)
+ */
+struct spel_system {
+ struct spel_domain *domains;
+ struct powercap_zone *power_zone;
+ void __iomem *node_base;
+ void __iomem *constraint_base;
+ void __iomem *config_base;
+ struct powercap_control_type *control_type;
+ struct device *dev;
+ int limits[SPEL_DOMAIN_MAX];
+ unsigned int power_unit;
+ unsigned int energy_unit;
+ unsigned int time_unit;
+};
+
+#define power_zone_to_spel_domain(_zone) \
+ container_of(_zone, struct spel_domain, power_zone)
+
+static bool is_pl_valid(struct spel_domain *sd, int pl)
+{
+ struct spel_system *sp = sd->sp;
+
+ return !!(sp->limits[sd->id] & BIT(pl));
+}
+
+static int get_pl_ops_offset(struct spel_domain *sd, int pl, enum pl_ops_type pl_op)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(constraints); i++) {
+ const struct spel_constraint_info *ci = &constraints[i];
+
+ if (ci->domain_id == sd->id && ci->pl_id == pl) {
+ switch (pl_op) {
+ case PL_LIMIT:
+ return ci->limit_offset;
+ case PL_TIME_WINDOW:
+ return ci->time_window_offset;
+ default:
+ return -EOPNOTSUPP;
+ }
+ }
+ }
+
+ return -EOPNOTSUPP;
+}
+
+static u64 spel_unit_xlate(struct spel_domain *sd, enum unit_type type,
+ u64 value, int to_raw)
+{
+ struct spel_system *sp = sd->sp;
+ u64 units, scale;
+
+ switch (type) {
+ case POWER_UNIT:
+ units = sp->power_unit;
+ scale = 1;
+ break;
+ case ENERGY_UNIT:
+ units = sp->energy_unit;
+ scale = ENERGY_UNIT_SCALE;
+ break;
+ case TIME_UNIT:
+ units = sp->time_unit;
+ scale = 1;
+ break;
+ default:
+ return value;
+ }
+
+ if (to_raw)
+ return DIV_ROUND_CLOSEST_ULL(value * scale, units);
+
+ value *= units;
+ return div64_u64(value, scale);
+}
+
+static int spel_read_pl_data(struct spel_domain *sd, int pl,
+ enum pl_ops_type pl_op, bool xlate, u64 *data)
+{
+ struct spel_system *sp = sd->sp;
+ void __iomem *reg_addr;
+ u64 value;
+ int offset;
+
+ if (!is_pl_valid(sd, pl))
+ return -EINVAL;
+
+ offset = get_pl_ops_offset(sd, pl, pl_op);
+ if (offset < 0)
+ return offset;
+
+ guard(mutex)(&sd->lock);
+
+ reg_addr = sp->constraint_base + offset;
+ value = readl(reg_addr);
+
+ switch (pl_op) {
+ case PL_LIMIT:
+ value = FIELD_GET(POWER_LIMIT_MASK, value);
+ if (xlate)
+ *data = spel_unit_xlate(sd, POWER_UNIT, value, 0);
+ else
+ *data = value;
+ break;
+ case PL_TIME_WINDOW:
+ /* Decode time window: bits [22:16] are upper 7 bits, [14:0] are lower 15 bits */
+ value = (FIELD_GET(TIME_WINDOW_MASK_H, value) << 15) |
+ FIELD_GET(TIME_WINDOW_MASK_L, value);
+ if (xlate)
+ *data = spel_unit_xlate(sd, TIME_UNIT, value, 0);
+ else
+ *data = value;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int spel_write_pl_data(struct spel_domain *sd, int pl,
+ enum pl_ops_type pl_op, unsigned long long value)
+{
+ struct spel_system *sp = sd->sp;
+ void __iomem *reg_addr;
+ u64 reg_val, new_val;
+ int offset;
+
+ if (!is_pl_valid(sd, pl))
+ return -EINVAL;
+
+ offset = get_pl_ops_offset(sd, pl, pl_op);
+ if (offset < 0)
+ return offset;
+
+ guard(mutex)(&sd->lock);
+
+ reg_addr = sp->constraint_base + offset;
+ reg_val = readl(reg_addr);
+
+ switch (pl_op) {
+ case PL_LIMIT:
+ new_val = spel_unit_xlate(sd, POWER_UNIT, value, 1);
+ if (new_val > FIELD_MAX(POWER_LIMIT_MASK))
+ return -EINVAL;
+ reg_val = (reg_val & ~POWER_LIMIT_MASK) | FIELD_PREP(POWER_LIMIT_MASK, new_val);
+
+ /*
+ * Enable/Disable PL based on the value:
+ * - If value is 0, disable the PL (clear enable bit)
+ * - If value is non-zero, enable the PL (set enable bit)
+ */
+ if (new_val == 0)
+ reg_val &= ~POWER_LIMIT_ENABLE;
+ else
+ reg_val |= POWER_LIMIT_ENABLE;
+
+ writel(reg_val, reg_addr);
+ return 0;
+
+ case PL_TIME_WINDOW:
+ /*
+ * Encode time window: upper 7 bits to [22:16], lower 15 bits to [14:0]
+ */
+ new_val = spel_unit_xlate(sd, TIME_UNIT, value, 1);
+ if (new_val > TIME_WINDOW_MAX)
+ return -EINVAL;
+ /* Read-modify-write to preserve other bits */
+ reg_val = (reg_val & ~(TIME_WINDOW_MASK_H | TIME_WINDOW_MASK_L)) |
+ FIELD_PREP(TIME_WINDOW_MASK_H, new_val >> 15) |
+ FIELD_PREP(TIME_WINDOW_MASK_L, new_val);
+ writel(reg_val, reg_addr);
+
+ /*
+ * Time window register update doesn't trigger firmware interrupt.
+ * Write to the PL register with current value to trigger the interrupt.
+ */
+ offset = get_pl_ops_offset(sd, pl, PL_LIMIT);
+ if (offset >= 0) {
+ reg_addr = sp->constraint_base + offset;
+ reg_val = readl(reg_addr);
+ writel(reg_val, reg_addr);
+ }
+ return 0;
+
+ default:
+ return -EINVAL;
+ }
+}
+
+static int spel_get_energy_counter(struct powercap_zone *power_zone, u64 *energy_raw)
+{
+ struct spel_domain *sd = power_zone_to_spel_domain(power_zone);
+ u64 value;
+
+ value = readl(sd->status_reg);
+
+ *energy_raw = spel_unit_xlate(sd, ENERGY_UNIT, value, 0);
+
+ return 0;
+}
+
+static int spel_get_max_energy_counter(struct powercap_zone *pcd_dev, u64 *energy)
+{
+ struct spel_domain *sd = power_zone_to_spel_domain(pcd_dev);
+
+ *energy = spel_unit_xlate(sd, ENERGY_UNIT, ENERGY_STATUS_MASK, 0);
+
+ return 0;
+}
+
+static int spel_release_zone(struct powercap_zone *power_zone)
+{
+ struct spel_domain *sd = power_zone_to_spel_domain(power_zone);
+ struct spel_system *sp = sd->sp;
+
+ /* Free the domains array when the last zone (SYS domain) is released */
+ if (sd->id == SPEL_DOMAIN_SYS)
+ kfree(sp->domains);
+
+ return 0;
+}
+
+static int spel_find_nr_power_limit(struct spel_domain *sd)
+{
+ int i, nr_pl = 0;
+
+ for (i = 0; i < ARRAY_SIZE(pl_names); i++) {
+ if (is_pl_valid(sd, i))
+ nr_pl++;
+ }
+
+ return nr_pl;
+}
+
+static const struct powercap_zone_ops zone_ops = {
+ .get_energy_uj = spel_get_energy_counter,
+ .get_max_energy_range_uj = spel_get_max_energy_counter,
+ .release = spel_release_zone,
+};
+
+static int spel_constraint_to_pl(struct spel_domain *sd, int cid)
+{
+ int i, id;
+
+ for (i = 0, id = 0; i < ARRAY_SIZE(pl_names); i++) {
+ if (is_pl_valid(sd, i) && id++ == cid)
+ return i;
+ }
+
+ return -EINVAL;
+}
+
+static int spel_set_power_limit(struct powercap_zone *power_zone, int cid,
+ u64 power_limit)
+{
+ struct spel_domain *sd = power_zone_to_spel_domain(power_zone);
+ int id;
+
+ id = spel_constraint_to_pl(sd, cid);
+ if (id < 0)
+ return id;
+
+ return spel_write_pl_data(sd, id, PL_LIMIT, power_limit);
+}
+
+static int spel_get_power_limit(struct powercap_zone *power_zone, int cid,
+ u64 *data)
+{
+ struct spel_domain *sd = power_zone_to_spel_domain(power_zone);
+ u64 val;
+ int ret, id;
+
+ id = spel_constraint_to_pl(sd, cid);
+ if (id < 0)
+ return id;
+
+ ret = spel_read_pl_data(sd, id, PL_LIMIT, true, &val);
+ if (!ret)
+ *data = val;
+
+ return ret;
+}
+
+static int spel_set_time_window(struct powercap_zone *power_zone, int cid,
+ u64 window)
+{
+ struct spel_domain *sd = power_zone_to_spel_domain(power_zone);
+ int id;
+
+ id = spel_constraint_to_pl(sd, cid);
+ if (id < 0)
+ return id;
+
+ return spel_write_pl_data(sd, id, PL_TIME_WINDOW, window);
+}
+
+static int spel_get_time_window(struct powercap_zone *power_zone, int cid,
+ u64 *data)
+{
+ struct spel_domain *sd = power_zone_to_spel_domain(power_zone);
+ u64 val;
+ int ret, id;
+
+ id = spel_constraint_to_pl(sd, cid);
+ if (id < 0)
+ return id;
+
+ ret = spel_read_pl_data(sd, id, PL_TIME_WINDOW, true, &val);
+ if (!ret)
+ *data = val;
+
+ return ret;
+}
+
+static const char *spel_get_constraint_name(struct powercap_zone *power_zone,
+ int cid)
+{
+ struct spel_domain *sd = power_zone_to_spel_domain(power_zone);
+ int id;
+
+ id = spel_constraint_to_pl(sd, cid);
+ if (id >= 0 && id < ARRAY_SIZE(pl_names))
+ return pl_names[id];
+
+ return NULL;
+}
+
+static const struct powercap_zone_constraint_ops constraint_ops = {
+ .set_power_limit_uw = spel_set_power_limit,
+ .get_power_limit_uw = spel_get_power_limit,
+ .set_time_window_us = spel_set_time_window,
+ .get_time_window_us = spel_get_time_window,
+ .get_name = spel_get_constraint_name,
+};
+
+static void spel_init_domains(struct spel_system *sp)
+{
+ unsigned int i;
+
+ for (i = 0; i < ARRAY_SIZE(domain_info); i++) {
+ struct spel_domain *sd = &sp->domains[i];
+
+ sd->sp = sp;
+ snprintf(sd->name, SPEL_DOMAIN_NAME_LENGTH, "%s",
+ domain_info[i].name);
+ sd->id = i;
+ sd->status_reg = sp->node_base + domain_info[i].offset;
+
+ /* PL1 is always supported (required for powercap registration) */
+ sp->limits[i] = BIT(POWER_LIMIT1);
+ }
+}
+
+static void spel_update_unit(struct spel_system *sp)
+{
+ u32 value, shift;
+
+ /* Read power_unit and time_unit from offset 0x0 */
+ value = readl(sp->config_base);
+
+ /*
+ * Unit calculation: 1 / (2^shift)
+ * Masks limit: TIME_UNIT (4 bits, max 15), POWER_UNIT (3 bits, max 7).
+ */
+ shift = FIELD_GET(POWER_UNIT_MASK, value);
+ sp->power_unit = 1000000 / (1 << shift);
+
+ shift = FIELD_GET(TIME_UNIT_MASK, value);
+ /*
+ * Time window in register is in milliseconds.
+ */
+ sp->time_unit = 1000 * (1 << shift);
+
+ /* Read energy_unit from ENERGY_RPT_UNIT_OFFSET */
+ value = readl(sp->config_base + ENERGY_RPT_UNIT_OFFSET);
+
+ /*
+ * Unit calculation: 1 / (2^shift)
+ * Masks limit: ENERGY_UNIT (4 bits, max 15).
+ */
+ shift = FIELD_GET(ENERGY_UNIT_MASK, value);
+ sp->energy_unit = ENERGY_UNIT_SCALE * 1000000 / (1 << shift);
+
+ dev_dbg(sp->dev, "Units: energy=%dnJ, time=%dus, power=%duW\n",
+ sp->energy_unit, sp->time_unit, sp->power_unit);
+}
+
+static void spel_detect_powerlimit(struct spel_domain *sd)
+{
+ struct spel_system *sp = sd->sp;
+ u32 capabilities;
+ int i, j;
+
+ capabilities = readl(sp->config_base + LIMITS_CAPABILITY_OFFSET);
+
+ /*
+ * Detect power limits from hardware capabilities.
+ * Start from index 1 (POWER_LIMIT2) since PL1 is always enabled in spel_init_domains().
+ */
+ for (i = 1; i < ARRAY_SIZE(pl_names); i++) {
+ for (j = 0; j < ARRAY_SIZE(constraints); j++) {
+ const struct spel_constraint_info *ci = &constraints[j];
+
+ if (ci->domain_id == sd->id && ci->pl_id == i) {
+ if (capabilities & ci->supported_mask)
+ sp->limits[sd->id] |= BIT(i);
+ break;
+ }
+ }
+ }
+}
+
+static int spel_init_system(struct spel_system *sp, struct device *dev)
+{
+ int i, ret;
+
+ /* Read unit configuration (common for all domains) */
+ spel_update_unit(sp);
+
+ sp->domains = kcalloc(ARRAY_SIZE(domain_info),
+ sizeof(struct spel_domain), GFP_KERNEL);
+ if (!sp->domains)
+ return -ENOMEM;
+
+ spel_init_domains(sp);
+
+ for (i = 0; i < ARRAY_SIZE(domain_info); i++) {
+ struct spel_domain *sd = &sp->domains[i];
+
+ ret = devm_mutex_init(dev, &sd->lock);
+ if (ret) {
+ dev_err(dev, "Failed to initialize mutex for domain %s\n", sd->name);
+ kfree(sp->domains);
+ return ret;
+ }
+
+ spel_detect_powerlimit(sd);
+ }
+
+ return 0;
+}
+
+static int spel_register_powercap(struct spel_system *sp)
+{
+ struct spel_domain *sd;
+ struct powercap_zone *power_zone;
+ int nr_pl, ret, i;
+
+ /* Register SYS domain as parent zone */
+ sd = &sp->domains[SPEL_DOMAIN_SYS];
+ nr_pl = spel_find_nr_power_limit(sd);
+
+ power_zone = powercap_register_zone(&sd->power_zone,
+ sp->control_type, sd->name,
+ NULL, &zone_ops, nr_pl,
+ &constraint_ops);
+ if (IS_ERR(power_zone)) {
+ dev_err(sp->dev, "Failed to register power zone %s\n",
+ sd->name);
+ return PTR_ERR(power_zone);
+ }
+ sp->power_zone = power_zone;
+
+ /* Register other domains as children */
+ for (i = 0; i < ARRAY_SIZE(domain_info); i++) {
+ struct powercap_zone *parent;
+
+ if (i == SPEL_DOMAIN_SYS)
+ continue;
+
+ sd = &sp->domains[i];
+
+ /* SOC is child of SYS, others are children of SOC */
+ if (i == SPEL_DOMAIN_SOC)
+ parent = sp->power_zone;
+ else
+ parent = &sp->domains[SPEL_DOMAIN_SOC].power_zone;
+
+ nr_pl = spel_find_nr_power_limit(sd);
+ power_zone = powercap_register_zone(&sd->power_zone,
+ sp->control_type,
+ sd->name, parent,
+ &zone_ops, nr_pl,
+ &constraint_ops);
+
+ if (IS_ERR(power_zone)) {
+ dev_err(sp->dev, "Failed to register power_zone %s\n",
+ sd->name);
+ ret = PTR_ERR(power_zone);
+ goto err_cleanup;
+ }
+ }
+
+ return 0;
+
+err_cleanup:
+ /* Unregister in reverse order: children first, then SOC, then SYS */
+ for (i = i - 1; i >= 0; i--)
+ powercap_unregister_zone(sp->control_type, &sp->domains[i].power_zone);
+
+ return ret;
+}
+
+static int spel_probe(struct platform_device *pdev)
+{
+ struct device *dev = &pdev->dev;
+ struct spel_system *sp;
+ int ret;
+
+ sp = devm_kzalloc(dev, sizeof(*sp), GFP_KERNEL);
+ if (!sp)
+ return -ENOMEM;
+
+ sp->dev = dev;
+
+ /* Map config registers (units, capabilities) */
+ sp->config_base = devm_platform_ioremap_resource_byname(pdev, "config");
+ if (IS_ERR(sp->config_base))
+ return PTR_ERR(sp->config_base);
+
+ /* Map constraint registers (power limits) */
+ sp->constraint_base = devm_platform_ioremap_resource_byname(pdev, "constraints");
+ if (IS_ERR(sp->constraint_base))
+ return PTR_ERR(sp->constraint_base);
+
+ /* Map spel domain registers (energy counters) */
+ sp->node_base = devm_platform_ioremap_resource_byname(pdev, "nodes");
+ if (IS_ERR(sp->node_base))
+ return PTR_ERR(sp->node_base);
+
+ sp->control_type = powercap_register_control_type(NULL, "qcom-spel",
+ NULL);
+ if (IS_ERR(sp->control_type)) {
+ dev_err(dev, "Failed to register control type\n");
+ return PTR_ERR(sp->control_type);
+ }
+
+ /* Initialize system and domains */
+ ret = spel_init_system(sp, dev);
+ if (ret) {
+ dev_err(dev, "Failed to initialize system\n");
+ goto err_unregister_control;
+ }
+
+ ret = spel_register_powercap(sp);
+ if (ret) {
+ dev_err(dev, "Failed to register powercap zones\n");
+ if (!sp->power_zone)
+ kfree(sp->domains);
+ goto err_unregister_control;
+ }
+
+ platform_set_drvdata(pdev, sp);
+
+ return 0;
+
+err_unregister_control:
+ powercap_unregister_control_type(sp->control_type);
+ return ret;
+}
+
+static void spel_remove(struct platform_device *pdev)
+{
+ struct spel_system *sp = platform_get_drvdata(pdev);
+ int i;
+
+ /* Unregister in reverse order: children first, then SOC, then SYS */
+ for (i = ARRAY_SIZE(domain_info) - 1; i >= 0; i--)
+ powercap_unregister_zone(sp->control_type, &sp->domains[i].power_zone);
+
+ powercap_unregister_control_type(sp->control_type);
+}
+
+static const struct of_device_id spel_of_match[] = {
+ { .compatible = "qcom,glymur-spel" },
+ { }
+};
+MODULE_DEVICE_TABLE(of, spel_of_match);
+
+static struct platform_driver spel_driver = {
+ .probe = spel_probe,
+ .remove = spel_remove,
+ .driver = {
+ .name = "qcom_spel",
+ .of_match_table = spel_of_match,
+ },
+};
+
+module_platform_driver(spel_driver);
+
+MODULE_DESCRIPTION("Qualcomm SPEL Powercap Driver");
+MODULE_LICENSE("GPL");
--
2.43.0
* [PATCH v2 3/3] arm64: dts: qcom: glymur: Enable SPEL powercap driver
From: Manaf Meethalavalappu Pallikunhi @ 2026-06-19 20:39 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Rafael J. Wysocki,
Bjorn Andersson, Konrad Dybcio, Daniel Lezcano
Cc: Gaurav Kohli, linux-arm-msm, devicetree, linux-kernel, linux-pm,
Manaf Meethalavalappu Pallikunhi
In-Reply-To: <20260620-qcom_spel_driver_upstream-v2-0-a3ee6837c18f@oss.qualcomm.com>
The Qualcomm SoC Power and Electrical Limits (SPEL) block provides
hardware-based power monitoring and limiting capabilities for various
power domains, including System, SoC, CPU clusters, GPU, and other
subsystems on glymur.
Signed-off-by: Manaf Meethalavalappu Pallikunhi <manaf.pallikunhi@oss.qualcomm.com>
---
arch/arm64/boot/dts/qcom/glymur.dtsi | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/glymur.dtsi b/arch/arm64/boot/dts/qcom/glymur.dtsi
index 72c7dc6e4f09..9cf9180e4f8f 100644
--- a/arch/arm64/boot/dts/qcom/glymur.dtsi
+++ b/arch/arm64/boot/dts/qcom/glymur.dtsi
@@ -5033,6 +5033,16 @@ spmi_bus2: spmi@c48000 {
};
};
+ power-limits@ef3b000 {
+ compatible = "qcom,glymur-spel";
+ reg = <0x0 0x0ef3b000 0x0 0x1000>,
+ <0x0 0x0ef3d000 0x0 0x1000>,
+ <0x0 0x0ef3e000 0x0 0x1000>;
+ reg-names = "config",
+ "constraints",
+ "nodes";
+ };
+
tlmm: pinctrl@f100000 {
compatible = "qcom,glymur-tlmm";
reg = <0x0 0x0f100000 0x0 0xf00000>;
--
2.43.0
* Re: [PATCH 09/12] rtc: rzn1: Use temporary variable for struct device
From: Wolfram Sang @ 2026-06-19 20:48 UTC (permalink / raw)
To: Lad, Prabhakar
Cc: Miquel Raynal, Alexandre Belloni, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Geert Uytterhoeven,
Magnus Damm, linux-rtc, linux-renesas-soc, devicetree,
linux-kernel, Biju Das, Fabrizio Castro, Lad Prabhakar
In-Reply-To: <CA+V-a8t-phAuaH5g_3Nt=2d=KPw2QqA6r1dqOZv9k-Xhp5q2Zw@mail.gmail.com>
Hi Prabhakar,
> > > Could userspace still interact with the RTC during this window via ioctls
> > > or sysfs, potentially causing synchronous bus faults since the hardware is
> > > already suspended? Should teardown be bound using
> > > devm_add_action_or_reset() to guarantee correct reverse execution order?
> >
> > [wsa] Or maybe not use devm then?
> >
> Maybe just add two `devm_add_action_or_reset()` calls: one to clear
> interrupts and another to call pm_runtime_put()?
Why two? I wouldn't think it matters much if the two are separated. The
main problem, it seems to me, is the clocks being disabled because of
pm_runtime_put(). But maybe I am overlooking something?
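To make the ordering argument concrete, here is a toy userspace model of
reverse-order release, the property that devm_add_action_or_reset() is being
suggested for. It is only a sketch and not the devres implementation: the
stack, the function names and the three teardown steps are all hypothetical.
The point is that an action registered before the RTC is registered runs after
the RTC has been unregistered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ACTIONS 8

typedef void (*action_fn)(void *data);

struct action_stack {
	action_fn fn[MAX_ACTIONS];
	void *data[MAX_ACTIONS];
	size_t count;
};

/* Like the "_or_reset" contract: if the action cannot be queued, run it now. */
static int add_action_or_reset(struct action_stack *s, action_fn fn, void *data)
{
	assert(s && fn);

	if (s->count == MAX_ACTIONS) {
		fn(data);
		return -1;
	}
	s->fn[s->count] = fn;
	s->data[s->count] = data;
	s->count++;
	return 0;
}

/* Run queued actions last-in, first-out, as managed cleanup does. */
static void release_all(struct action_stack *s)
{
	while (s->count > 0) {
		s->count--;
		s->fn[s->count](s->data[s->count]);
	}
}

/* Teardown steps just append a tag so the order can be inspected. */
struct order_log {
	char buf[16];
	size_t len;
};

static void log_tag(struct order_log *l, char c)
{
	if (l->len + 1 < sizeof(l->buf)) {
		l->buf[l->len++] = c;
		l->buf[l->len] = '\0';
	}
}

static void put_clocks(void *d)     { log_tag(d, 'P'); }
static void mask_irqs(void *d)      { log_tag(d, 'I'); }
static void unregister_rtc(void *d) { log_tag(d, 'U'); }
```

Queuing put_clocks, then mask_irqs, then unregister_rtc and calling
release_all() runs them in the opposite order, so the clocks are only dropped
once nothing can reach the device any more.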
> > > [Severity: High]
> > > This is a pre-existing issue, but does using pm_runtime_put() here (and in
> > > rzn1_rtc_remove) leave the device powered on indefinitely?
> > >
> > > Since devm_pm_runtime_enable(dev) schedules pm_runtime_disable() to
> > > execute during devres cleanup, calling pm_runtime_put() only queues an
> > > asynchronous idle check. The immediate return triggers devres cleanup,
> > > which executes a barrier that explicitly cancels pending async operations.
> > >
> > > Should this use pm_runtime_put_sync() instead to ensure the device is
> > > synchronously suspended before teardown?
> > >
> Although some patches were accepted for a similar kind of issue
> reported by Sashiko, do you think I should switch to
> pm_runtime_put_sync()? (I've not seen any issues.)
I am not a PM expert, so I can't guide you. The report from Sashiko
sounds reasonable to me. But you'd have to look up the code path to
verify the reasoning. Or ask a PM expert.
Happy hacking,
Wolfram