Linux Documentation

* [PATCH 6/8] riscv: Enable resctrl filesystem for Ssqosid
From: Drew Fustini @ 2026-06-19 18:29 UTC
  To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
	Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
	Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
	Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
	Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
	Paul Walmsley, Peter Newman, Radim Krčmář,
	Reinette Chatre, Rob Herring, Samuel Holland,
	Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
	Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
  Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
	linux-doc
In-Reply-To: <20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org>

Make RISCV_ISA_SSQOSID select RISCV_CBQRI_DRIVER unconditionally, so
enabling Ssqosid support always builds the CBQRI driver.

The resctrl filesystem integration is gated separately by
RISCV_CBQRI_RESCTRL_FS, a promptless option that defaults to y when both
RISCV_CBQRI_DRIVER and RESCTRL_FS are enabled. Enabling the resctrl
filesystem itself remains a user choice via the standard fs/Kconfig
MISC_FILESYSTEMS menu.
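
As a rough illustration of the gating described above, the following
standalone C sketch models the Kconfig symbols as booleans. The struct and
function names are hypothetical and exist only for this example; the real
logic lives in Kconfig, not in C.

```c
#include <stdbool.h>

/* Hypothetical model of the two user-visible choices (not kernel code). */
struct qos_choices {
	bool riscv_isa_ssqosid;	/* user enables Ssqosid support */
	bool resctrl_fs;	/* user enables RESCTRL_FS */
};

/* RISCV_ISA_SSQOSID selects RISCV_CBQRI_DRIVER unconditionally. */
static bool cbqri_driver(const struct qos_choices *c)
{
	return c->riscv_isa_ssqosid;
}

/* RISCV_CBQRI_RESCTRL_FS: no prompt, default y if DRIVER && RESCTRL_FS. */
static bool cbqri_resctrl_fs(const struct qos_choices *c)
{
	return cbqri_driver(c) && c->resctrl_fs;
}
```

With both choices enabled the resctrl glue is built; with RESCTRL_FS
disabled only the driver core is built.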

Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 arch/riscv/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 9eb65d0eaa07..cc261de01107 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -595,6 +595,7 @@ config RISCV_ISA_SSQOSID
 	depends on 64BIT
 	default n
 	select ARCH_HAS_CPU_RESCTRL
+	select RISCV_CBQRI_DRIVER
 	help
 	  Adds support for the Ssqosid ISA extension (Supervisor-mode
 	  Quality of Service ID).

-- 
2.43.0



* [PATCH 5/8] riscv_cbqri: resctrl: Add cache allocation via capacity block mask
From: Drew Fustini @ 2026-06-19 18:29 UTC
  To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
	Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
	Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
	Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
	Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
	Paul Walmsley, Peter Newman, Radim Krčmář,
	Reinette Chatre, Rob Herring, Samuel Holland,
	Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
	Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
  Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
	linux-doc
In-Reply-To: <20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org>

Wire CBQRI capacity controllers into resctrl as RDT_RESOURCE_L2 and
RDT_RESOURCE_L3 schemata.
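
The resource setup in this patch sets cbm_len to ncblks, min_cbm_bits to 1
and arch_has_sparse_bitmasks to false. Under those settings a capacity
block mask is expected to be non-empty, to fit in ncblks bits, and to be
contiguous. A minimal userspace sketch of that check follows; the function
name is hypothetical, and the 32-bit width is an assumption taken from the
u32 cfg_val argument of resctrl_arch_update_one().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if cbm is a valid contiguous mask. */
static bool demo_cbm_valid(uint32_t cbm, unsigned int ncblks)
{
	uint32_t full, low;

	if (ncblks == 0 || ncblks > 32)
		return false;

	/* All-ones mask covering the ncblks capacity blocks. */
	full = (ncblks == 32) ? 0xffffffffu : ((1u << ncblks) - 1);

	/* min_cbm_bits = 1: reject empty masks; reject bits past ncblks. */
	if (cbm == 0 || (cbm & ~full))
		return false;

	/*
	 * No sparse bitmasks: drop trailing zeros, then the remaining
	 * value must be a solid run of ones (value + 1 is a power of two
	 * or wraps to zero).
	 */
	low = cbm & (~cbm + 1);	/* lowest set bit */
	cbm /= low;
	return (cbm & (cbm + 1)) == 0;
}
```

For example, with ncblks = 12 the mask 0xff0 passes, while 0x5 (sparse) and
0x1000 (out of range) are rejected.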

Mismatched capacity controller (CC) capabilities at the same cache level
are treated as a fatal configuration error, since fs/resctrl exposes a
single set of capabilities per resource id (rid). Domains are created
lazily in the cpuhp online callback so that cpu_mask reflects only the
CPUs that are currently online.
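
The mismatch rule above can be sketched in plain C as follows. This is a
simplified userspace model, not the driver code: struct demo_cc and
demo_pick_caches() are hypothetical stand-ins for struct cbqri_controller
and cbqri_resctrl_pick_caches(), and only a subset of the compared fields
is modelled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct cbqri_controller. */
struct demo_cc {
	unsigned int cache_level;	/* 2 or 3 */
	unsigned int rcid_count;
	unsigned int ncblks;
	bool supports_at_code;
};

#define DEMO_MAX_LEVEL 3

/*
 * Pick the first controller seen at each cache level and store it in
 * picked[level]. Later controllers at the same level must match it.
 * Returns 0 on success, -ENODEV for an unsupported level, or -EINVAL
 * on a capability mismatch.
 */
static int demo_pick_caches(const struct demo_cc *ccs, size_t n,
			    const struct demo_cc *picked[DEMO_MAX_LEVEL + 1])
{
	size_t i;

	for (i = 0; i <= DEMO_MAX_LEVEL; i++)
		picked[i] = NULL;

	for (i = 0; i < n; i++) {
		const struct demo_cc *cc = &ccs[i];
		const struct demo_cc *first;

		if (cc->cache_level != 2 && cc->cache_level != 3)
			return -ENODEV;

		first = picked[cc->cache_level];
		if (!first) {
			picked[cc->cache_level] = cc;	/* first one wins */
			continue;
		}
		if (first->rcid_count != cc->rcid_count ||
		    first->ncblks != cc->ncblks ||
		    first->supports_at_code != cc->supports_at_code)
			return -EINVAL;
	}
	return 0;
}
```

Two L3 controllers with identical capabilities share one resource
description, whereas two L3 controllers that differ in rcid_count make the
whole setup fail.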

Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
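Note for readers: resctrl_arch_set_closid_rmid() and the match helpers in
this patch pack and unpack the (RCID, MCID) pair with FIELD_PREP() and
FIELD_GET(). The userspace sketch below shows the same idea. The DEMO_*
masks are an assumption based on the Ssqosid specification (RCID in bits
11:0, MCID in bits 27:16); the patch itself takes SRMCFG_RCID_MASK and
SRMCFG_MCID_MASK from <asm/qos.h>, which is not part of this diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: field layout per the Ssqosid spec, not taken from this patch. */
#define DEMO_SRMCFG_RCID_MASK	0x00000fffu
#define DEMO_SRMCFG_MCID_MASK	0x0fff0000u
#define DEMO_SRMCFG_MCID_SHIFT	16

/* Pack an (RCID, MCID) pair into one srmcfg-style word. */
static uint32_t demo_srmcfg_encode(uint32_t rcid, uint32_t mcid)
{
	return (rcid & DEMO_SRMCFG_RCID_MASK) |
	       ((mcid << DEMO_SRMCFG_MCID_SHIFT) & DEMO_SRMCFG_MCID_MASK);
}

/* Extract the RCID field. */
static uint32_t demo_srmcfg_rcid(uint32_t srmcfg)
{
	return srmcfg & DEMO_SRMCFG_RCID_MASK;
}

/* Extract the MCID field. */
static uint32_t demo_srmcfg_mcid(uint32_t srmcfg)
{
	return (srmcfg & DEMO_SRMCFG_MCID_MASK) >> DEMO_SRMCFG_MCID_SHIFT;
}

/* Like resctrl_arch_match_closid(): compare only the RCID field. */
static bool demo_match_closid(uint32_t srmcfg, uint32_t closid)
{
	return demo_srmcfg_rcid(srmcfg) == closid;
}
```

Under the assumed layout, the reserved pair (0, 0) encodes to 0, which is
the value the header comments associate with default-group tasks.
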
 MAINTAINERS                      |   2 +
 arch/riscv/Kconfig               |   1 +
 arch/riscv/include/asm/resctrl.h | 152 ++++++++
 drivers/resctrl/Kconfig          |   4 +
 drivers/resctrl/Makefile         |   1 +
 drivers/resctrl/cbqri_resctrl.c  | 774 +++++++++++++++++++++++++++++++++++++++
 6 files changed, 934 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index c090d52e9fa0..85d50efb6e5f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23297,9 +23297,11 @@ R:	yunhui cui <cuiyunhui@bytedance.com>
 L:	linux-riscv@lists.infradead.org
 S:	Supported
 F:	arch/riscv/include/asm/qos.h
+F:	arch/riscv/include/asm/resctrl.h
 F:	arch/riscv/kernel/qos.c
 F:	drivers/resctrl/cbqri_devices.c
 F:	drivers/resctrl/cbqri_internal.h
+F:	drivers/resctrl/cbqri_resctrl.c
 F:	include/linux/riscv_cbqri.h
 
 RISC-V RPMI AND MPXY DRIVERS
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index ee586925f972..9eb65d0eaa07 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -594,6 +594,7 @@ config RISCV_ISA_SSQOSID
 	bool "Ssqosid extension support for supervisor mode Quality of Service ID"
 	depends on 64BIT
 	default n
+	select ARCH_HAS_CPU_RESCTRL
 	help
 	  Adds support for the Ssqosid ISA extension (Supervisor-mode
 	  Quality of Service ID).
diff --git a/arch/riscv/include/asm/resctrl.h b/arch/riscv/include/asm/resctrl.h
new file mode 100644
index 000000000000..7392a099b6f8
--- /dev/null
+++ b/arch/riscv/include/asm/resctrl.h
@@ -0,0 +1,152 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_RISCV_RESCTRL_H
+#define _ASM_RISCV_RESCTRL_H
+
+#include <linux/resctrl_types.h>
+#include <linux/sched.h>
+#include <linux/types.h>
+
+#include <asm/qos.h>
+
+struct rdt_resource;
+
+/*
+ * Sentinel "no CLOSID assigned" used by resctrl_arch_rmid_idx_decode().
+ * fs/resctrl treats this opaquely. CBQRI uses MCID directly as the linear
+ * rmid index, so closid is unused on decode.
+ */
+#define RISCV_RESCTRL_EMPTY_CLOSID	((u32)~0)
+
+/*
+ * Terminology mapping between x86 (Intel RDT/AMD QoS) and RISC-V:
+ *
+ *  CLOSID on x86 is RCID on RISC-V
+ *    RMID on x86 is MCID on RISC-V
+ *     CDP on x86 is AT (access type) on RISC-V
+ *
+ * Each fast-path arch entry point below is the RISC-V realization of the
+ * generic contract documented in <linux/resctrl.h>. Comments here describe
+ * only the RISC-V-specific behavior (srmcfg encoding, CBQRI controller
+ * lookup, MCID-as-index policy).
+ */
+
+/**
+ * resctrl_arch_alloc_capable() - any CBQRI controller exposes resctrl alloc
+ *
+ * Returns true once at least one CBQRI controller has successfully probed for
+ * a resctrl-exposed cache capacity allocation feature. Only meaningful after
+ * cbqri_resctrl_setup() runs at late_initcall.
+ */
+bool resctrl_arch_alloc_capable(void);
+
+/**
+ * resctrl_arch_mon_capable() - any CBQRI controller exposes resctrl monitoring
+ *
+ * The CBQRI driver implements capacity allocation only and wires up no
+ * monitoring events, so this always returns false. fs/resctrl references it
+ * unconditionally, hence the stub.
+ */
+bool resctrl_arch_mon_capable(void);
+
+/**
+ * resctrl_arch_rmid_idx_encode() - encode (RCID, MCID) into a linear index
+ * @closid: RCID (resource control id)
+ * @rmid:   MCID (monitoring counter id)
+ *
+ * RISC-V uses MCID directly as the linear index into per-RMID arrays
+ * managed by fs/resctrl, since CBQRI controllers admit any MCID for any
+ * RCID. closid is unused here. CDP is encoded via the AT field on each
+ * CBQRI op rather than via the index.
+ */
+u32  resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid);
+
+/**
+ * resctrl_arch_rmid_idx_decode() - inverse of resctrl_arch_rmid_idx_encode()
+ * @idx:    linear index
+ * @closid: out: always RISCV_RESCTRL_EMPTY_CLOSID
+ * @rmid:   out: the MCID that @idx encodes
+ */
+void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid);
+
+/**
+ * resctrl_arch_set_cpu_default_closid_rmid() - install per-CPU srmcfg default
+ * @cpu:    CPU number
+ * @closid: RCID to use when no task is matched
+ * @rmid:   MCID to use when no task is matched
+ *
+ * Sets the per-CPU cpu_srmcfg_default so __switch_to_srmcfg() can fall back
+ * to the CPU's default RCID/MCID for default-group tasks (those whose
+ * thread.srmcfg encodes to 0, i.e. closid == RESCTRL_RESERVED_CLOSID and
+ * rmid == RESCTRL_RESERVED_RMID). Implements resctrl allocation rule 2
+ * ("CPU default") on RISC-V.
+ */
+void resctrl_arch_set_cpu_default_closid_rmid(int cpu, u32 closid, u32 rmid);
+
+/**
+ * resctrl_arch_sched_in() - context-switch hook to install task RCID/MCID
+ * @tsk: the task being scheduled in
+ *
+ * Called from finish_task_switch() to write tsk->thread.srmcfg into the
+ * srmcfg CSR. Default-group tasks (thread.srmcfg == 0) inherit the
+ * per-CPU default set via resctrl_arch_set_cpu_default_closid_rmid().
+ */
+void resctrl_arch_sched_in(struct task_struct *tsk);
+
+/**
+ * resctrl_arch_set_closid_rmid() - tag a task with an RCID/MCID
+ * @tsk:    task to tag
+ * @closid: RCID to install
+ * @rmid:   MCID to install
+ *
+ * Updates tsk->thread.srmcfg with the encoded (RCID, MCID) pair. The new
+ * value takes effect on the next resctrl_arch_sched_in() for this task.
+ */
+void resctrl_arch_set_closid_rmid(struct task_struct *tsk, u32 closid, u32 rmid);
+
+/**
+ * resctrl_arch_match_closid() - test whether a task carries a given RCID
+ * @tsk:    task
+ * @closid: RCID
+ */
+bool resctrl_arch_match_closid(struct task_struct *tsk, u32 closid);
+
+/**
+ * resctrl_arch_match_rmid() - test whether a task carries a given (RCID, MCID)
+ * @tsk:    task
+ * @closid: RCID
+ * @rmid:   MCID
+ */
+bool resctrl_arch_match_rmid(struct task_struct *tsk, u32 closid, u32 rmid);
+
+/**
+ * resctrl_arch_mon_ctx_alloc() - allocate per-monitor-event arch context
+ * @r:     resctrl resource being monitored
+ * @evtid: which monitor event needs context
+ *
+ * The CBQRI driver implements no monitoring events, so there is no per-event
+ * context to allocate and the stub returns NULL. fs/resctrl references it
+ * unconditionally before checking resctrl_arch_mon_capable().
+ */
+void *resctrl_arch_mon_ctx_alloc(struct rdt_resource *r, enum resctrl_event_id evtid);
+
+/**
+ * resctrl_arch_mon_ctx_free() - release context returned by mon_ctx_alloc()
+ * @r:            resctrl resource
+ * @evtid:        monitor event id
+ * @arch_mon_ctx: pointer returned by resctrl_arch_mon_ctx_alloc()
+ */
+void resctrl_arch_mon_ctx_free(struct rdt_resource *r, enum resctrl_event_id evtid,
+			       void *arch_mon_ctx);
+
+static inline unsigned int resctrl_arch_round_mon_val(unsigned int val)
+{
+	return val;
+}
+
+/* Not needed for RISC-V */
+static inline void resctrl_arch_enable_mon(void) { }
+static inline void resctrl_arch_disable_mon(void) { }
+static inline void resctrl_arch_enable_alloc(void) { }
+static inline void resctrl_arch_disable_alloc(void) { }
+
+#endif /* _ASM_RISCV_RESCTRL_H */
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
index 8b16f69df17c..0887b6a9fac1 100644
--- a/drivers/resctrl/Kconfig
+++ b/drivers/resctrl/Kconfig
@@ -54,3 +54,7 @@ config RISCV_CBQRI_DRIVER_DEBUG
 	  new platform; otherwise leave disabled to avoid log noise.
 
 endif
+
+config RISCV_CBQRI_RESCTRL_FS
+	bool
+	default y if RISCV_CBQRI_DRIVER && RESCTRL_FS
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
index 28085036d895..ed737b4461b9 100644
--- a/drivers/resctrl/Makefile
+++ b/drivers/resctrl/Makefile
@@ -6,5 +6,6 @@ ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG)	+= -DDEBUG
 
 obj-$(CONFIG_RISCV_CBQRI_DRIVER)		+= cbqri.o
 cbqri-y						+= cbqri_devices.o
+cbqri-$(CONFIG_RISCV_CBQRI_RESCTRL_FS)		+= cbqri_resctrl.o
 
 ccflags-$(CONFIG_RISCV_CBQRI_DRIVER_DEBUG)	+= -DDEBUG
diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
new file mode 100644
index 000000000000..d354129cc34f
--- /dev/null
+++ b/drivers/resctrl/cbqri_resctrl.c
@@ -0,0 +1,774 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+
+#include <linux/bitfield.h>
+#include <linux/cacheinfo.h>
+#include <linux/riscv_cbqri.h>
+#include <linux/cpu.h>
+#include <linux/cpufeature.h>
+#include <linux/cpuhotplug.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/resctrl.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include <asm/csr.h>
+#include <asm/qos.h>
+
+#include "cbqri_internal.h"
+
+struct cbqri_resctrl_res {
+	struct cbqri_controller *ctrl;
+	struct rdt_resource     resctrl_res;
+	bool                    cdp_enabled;
+};
+
+struct cbqri_resctrl_dom {
+	struct rdt_ctrl_domain  resctrl_ctrl_dom;
+	struct cbqri_controller *hw_ctrl;
+};
+
+static struct cbqri_resctrl_res cbqri_resctrl_resources[RDT_NUM_RESOURCES];
+
+static bool exposed_alloc_capable;
+
+/* Protects ctrl_domain list mutations across CPU hotplug. */
+static DEFINE_MUTEX(cbqri_domain_list_lock);
+
+static struct rdt_ctrl_domain *
+cbqri_find_ctrl_domain(struct list_head *h, int id)
+{
+	struct rdt_domain_hdr *hdr = resctrl_find_domain(h, id, NULL);
+
+	return hdr ? container_of(hdr, struct rdt_ctrl_domain, hdr) : NULL;
+}
+
+/* Map a hardware cache level to its resctrl resource id, or -ENODEV. */
+static int cbqri_cache_level_to_rid(u32 cache_level)
+{
+	switch (cache_level) {
+	case 2:
+		return RDT_RESOURCE_L2;
+	case 3:
+		return RDT_RESOURCE_L3;
+	default:
+		return -ENODEV;
+	}
+}
+
+static int cbqri_apply_cache_config_dom(struct cbqri_resctrl_dom *hw_dom,
+					struct rdt_resource *r,
+					u32 closid, enum resctrl_conf_type t,
+					u64 cbm)
+{
+	struct cbqri_resctrl_res *hw_res =
+		container_of(r, struct cbqri_resctrl_res, resctrl_res);
+	struct cbqri_cc_config cfg = {
+		.cbm = cbm,
+		.at = (t == CDP_CODE) ? CBQRI_CONTROL_REGISTERS_AT_CODE :
+					CBQRI_CONTROL_REGISTERS_AT_DATA,
+		.cdp_enabled = hw_res->cdp_enabled,
+	};
+
+	return cbqri_apply_cache_config(hw_dom->hw_ctrl, closid, &cfg);
+}
+
+bool resctrl_arch_alloc_capable(void)
+{
+	return exposed_alloc_capable;
+}
+
+bool resctrl_arch_mon_capable(void)
+{
+	return false;
+}
+
+bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level rid)
+{
+	if (rid != RDT_RESOURCE_L2 && rid != RDT_RESOURCE_L3)
+		return false;
+	return cbqri_resctrl_resources[rid].cdp_enabled;
+}
+
+int resctrl_arch_set_cdp_enabled(enum resctrl_res_level rid, bool enable)
+{
+	struct cbqri_resctrl_res *cbqri_res;
+
+	if (rid != RDT_RESOURCE_L2 && rid != RDT_RESOURCE_L3)
+		return -ENODEV;
+
+	cbqri_res = &cbqri_resctrl_resources[rid];
+	if (!cbqri_res->resctrl_res.cdp_capable)
+		return -ENODEV;
+
+	cbqri_res->cdp_enabled = enable;
+	return 0;
+}
+
+struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
+{
+	if (l >= RDT_NUM_RESOURCES)
+		return NULL;
+
+	return &cbqri_resctrl_resources[l].resctrl_res;
+}
+
+/*
+ * fs/resctrl unconditionally references the symbols below before checking
+ * mon_capable. They are stubs for features CBQRI does not yet support.
+ */
+bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt)
+{
+	return false;
+}
+
+void *resctrl_arch_mon_ctx_alloc(struct rdt_resource *r,
+				 enum resctrl_event_id evtid)
+{
+	return NULL;
+}
+
+void resctrl_arch_mon_ctx_free(struct rdt_resource *r,
+			       enum resctrl_event_id evtid, void *arch_mon_ctx)
+{
+}
+
+void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+			      enum resctrl_event_id evtid, u32 rmid, u32 closid,
+			      u32 cntr_id, bool assign)
+{
+}
+
+int resctrl_arch_cntr_read(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+			   u32 unused, u32 rmid, int cntr_id,
+			   enum resctrl_event_id eventid, u64 *val)
+{
+	return -EOPNOTSUPP;
+}
+
+bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
+{
+	return false;
+}
+
+int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
+{
+	return -EOPNOTSUPP;
+}
+
+void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+			     u32 unused, u32 rmid, int cntr_id,
+			     enum resctrl_event_id eventid)
+{
+}
+
+bool resctrl_arch_get_io_alloc_enabled(struct rdt_resource *r)
+{
+	return false;
+}
+
+int resctrl_arch_io_alloc_enable(struct rdt_resource *r, bool enable)
+{
+	return -EOPNOTSUPP;
+}
+
+void resctrl_arch_mon_event_config_read(void *info)
+{
+}
+
+void resctrl_arch_mon_event_config_write(void *info)
+{
+}
+
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
+{
+}
+
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+			     u32 unused, u32 rmid, enum resctrl_event_id eventid)
+{
+}
+
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
+			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
+			   void *arch_priv, u64 *val, void *arch_mon_ctx)
+{
+	return -ENODATA;
+}
+
+/*
+ * Note about terminology between x86 (Intel RDT/AMD QoS) and RISC-V:
+ *   CLOSID on x86 is RCID on RISC-V
+ *     RMID on x86 is MCID on RISC-V
+ */
+u32 resctrl_arch_get_num_closid(struct rdt_resource *res)
+{
+	struct cbqri_resctrl_res *hw_res;
+
+	hw_res = container_of(res, struct cbqri_resctrl_res, resctrl_res);
+
+	if (!hw_res->ctrl)
+		return 0;
+
+	return hw_res->ctrl->rcid_count;
+}
+
+u32 resctrl_arch_system_num_rmid_idx(void)
+{
+	return 1;
+}
+
+u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid)
+{
+	return rmid;
+}
+
+void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid)
+{
+	*closid = RISCV_RESCTRL_EMPTY_CLOSID;
+	*rmid = idx;
+}
+
+void resctrl_arch_set_cpu_default_closid_rmid(int cpu, u32 closid, u32 rmid)
+{
+	u32 srmcfg = FIELD_PREP(SRMCFG_RCID_MASK, closid) |
+		     FIELD_PREP(SRMCFG_MCID_MASK, rmid);
+
+	WRITE_ONCE(per_cpu(cpu_srmcfg_default, cpu), srmcfg);
+}
+
+void resctrl_arch_sched_in(struct task_struct *tsk)
+{
+	__switch_to_srmcfg(tsk);
+}
+
+void resctrl_arch_set_closid_rmid(struct task_struct *tsk, u32 closid, u32 rmid)
+{
+	u32 srmcfg = FIELD_PREP(SRMCFG_RCID_MASK, closid) |
+		     FIELD_PREP(SRMCFG_MCID_MASK, rmid);
+
+	WRITE_ONCE(tsk->thread.srmcfg, srmcfg);
+}
+
+void resctrl_arch_sync_cpu_closid_rmid(void *info)
+{
+	struct resctrl_cpu_defaults *r = info;
+
+	lockdep_assert_preemption_disabled();
+
+	if (r) {
+		resctrl_arch_set_cpu_default_closid_rmid(smp_processor_id(),
+							 r->closid, r->rmid);
+	}
+
+	resctrl_arch_sched_in(current);
+}
+
+bool resctrl_arch_match_closid(struct task_struct *tsk, u32 closid)
+{
+	return FIELD_GET(SRMCFG_RCID_MASK, READ_ONCE(tsk->thread.srmcfg)) == closid;
+}
+
+bool resctrl_arch_match_rmid(struct task_struct *tsk, u32 closid, u32 rmid)
+{
+	return FIELD_GET(SRMCFG_MCID_MASK, READ_ONCE(tsk->thread.srmcfg)) == rmid;
+}
+
+void resctrl_arch_pre_mount(void)
+{
+	/* All controllers discovered at boot via late_initcall. Nothing to do. */
+}
+
+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
+			    u32 closid, enum resctrl_conf_type t, u32 cfg_val)
+{
+	struct cbqri_resctrl_dom *dom;
+
+	dom = container_of(d, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
+
+	if (!r->alloc_capable)
+		return -EINVAL;
+
+	switch (r->rid) {
+	case RDT_RESOURCE_L2:
+	case RDT_RESOURCE_L3:
+		return cbqri_apply_cache_config_dom(dom, r, closid, t, cfg_val);
+	default:
+		return -EINVAL;
+	}
+}
+
+int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
+{
+	struct resctrl_staged_config *cfg;
+	enum resctrl_conf_type t;
+	struct rdt_ctrl_domain *d;
+	int err = 0;
+
+	/* Walking r->ctrl_domains, ensure it can't race with cpuhp */
+	lockdep_assert_cpus_held();
+
+	list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+		for (t = 0; t < CDP_NUM_TYPES; t++) {
+			cfg = &d->staged_config[t];
+			if (!cfg->have_new_ctrl)
+				continue;
+			err = resctrl_arch_update_one(r, d, closid, t, cfg->new_ctrl);
+			if (err)
+				return err;
+		}
+	}
+	return err;
+}
+
+u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
+			    u32 closid, enum resctrl_conf_type type)
+{
+	struct cbqri_resctrl_dom *hw_dom;
+	struct cbqri_controller *ctrl;
+	u32 at;
+	u32 val;
+	int err;
+
+	hw_dom = container_of(d, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
+	ctrl = hw_dom->hw_ctrl;
+	val = resctrl_get_default_ctrl(r);
+
+	if (!r->alloc_capable)
+		return val;
+
+	switch (r->rid) {
+	case RDT_RESOURCE_L2:
+	case RDT_RESOURCE_L3:
+		at = (type == CDP_CODE) ? CBQRI_CONTROL_REGISTERS_AT_CODE :
+					  CBQRI_CONTROL_REGISTERS_AT_DATA;
+		err = cbqri_read_cache_config(ctrl, closid, at, &val);
+		if (err < 0)
+			val = resctrl_get_default_ctrl(r);
+		break;
+	default:
+		break;
+	}
+
+	return val;
+}
+
+void resctrl_arch_reset_all_ctrls(struct rdt_resource *r)
+{
+	struct cbqri_resctrl_res *hw_res;
+	struct rdt_ctrl_domain *d;
+	enum resctrl_conf_type t;
+	u32 default_ctrl;
+	int i;
+
+	lockdep_assert_cpus_held();
+
+	hw_res = container_of(r, struct cbqri_resctrl_res, resctrl_res);
+	default_ctrl = resctrl_get_default_ctrl(r);
+
+	if (!hw_res->ctrl)
+		return;
+
+	list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+		for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
+			for (t = 0; t < CDP_NUM_TYPES; t++) {
+				int rerr;
+
+				rerr = resctrl_arch_update_one(r, d, i, t, default_ctrl);
+				if (rerr)
+					pr_err_ratelimited("rid=%d reset RCID %u type %u failed (%d)\n",
+							   r->rid, i, t, rerr);
+			}
+		}
+	}
+}
+
+static struct rdt_ctrl_domain *cbqri_new_domain(struct cbqri_controller *ctrl)
+{
+	struct cbqri_resctrl_dom *hw_dom;
+	struct rdt_ctrl_domain *domain;
+
+	hw_dom = kzalloc_obj(*hw_dom, GFP_KERNEL);
+	if (!hw_dom)
+		return NULL;
+
+	hw_dom->hw_ctrl = ctrl;
+	domain = &hw_dom->resctrl_ctrl_dom;
+
+	INIT_LIST_HEAD(&domain->hdr.list);
+
+	return domain;
+}
+
+static int cbqri_init_domain_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *d)
+{
+	struct cbqri_resctrl_res *hw_res;
+	enum resctrl_conf_type t;
+	int err = 0;
+	int i;
+
+	hw_res = container_of(r, struct cbqri_resctrl_res, resctrl_res);
+
+	for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
+		/*
+		 * Seed both DATA and CODE staged slots so a later mount
+		 * with -o cdp does not see stale CODE values.
+		 * On non-AT controllers cbqri_cc_alloc_op() masks AT to 0
+		 * so all three iterations land on the same hardware state.
+		 * The redundant writes are harmless.
+		 */
+		for (t = 0; t < CDP_NUM_TYPES; t++) {
+			err = resctrl_arch_update_one(r, d, i, t,
+						      resctrl_get_default_ctrl(r));
+			if (err)
+				return err;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Walk cbqri_controllers and pick one capacity controller (CC) per cache
+ * level (L2/L3) to back the corresponding RDT_RESOURCE_L*. When more than
+ * one CC sits at the same level (e.g. one per socket), they must agree on
+ * rcid_count / ncblks / alloc_capable. A mismatch is fatal because resctrl
+ * exposes a single set of caps per rid. The first matching controller wins.
+ */
+static int cbqri_resctrl_pick_caches(void)
+{
+	struct cbqri_controller *ctrl;
+
+	list_for_each_entry(ctrl, &cbqri_controllers, list) {
+		struct cbqri_resctrl_res *cbqri_res;
+		int rid;
+
+		if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
+			continue;
+		if (!ctrl->alloc_capable)
+			continue;
+
+		rid = cbqri_cache_level_to_rid(ctrl->cache.cache_level);
+		if (rid < 0) {
+			pr_err("unknown cache level %d\n",
+			       ctrl->cache.cache_level);
+			return rid;
+		}
+
+		cbqri_res = &cbqri_resctrl_resources[rid];
+		if (cbqri_res->ctrl) {
+			/*
+			 * CCs at the same cache level must agree on every cap
+			 * resctrl exposes globally. Reject mismatches at pick
+			 * time so the inconsistency is visible at boot.
+			 */
+			if (cbqri_res->ctrl->rcid_count != ctrl->rcid_count ||
+			    cbqri_res->ctrl->cc.ncblks != ctrl->cc.ncblks ||
+			    cbqri_res->ctrl->cc.supports_alloc_at_code !=
+				    ctrl->cc.supports_alloc_at_code ||
+			    cbqri_res->ctrl->alloc_capable != ctrl->alloc_capable) {
+				pr_err("L%d controllers have mismatched capabilities\n",
+				       ctrl->cache.cache_level);
+				return -EINVAL;
+			}
+			continue;
+		}
+
+		cbqri_res->ctrl = ctrl;
+	}
+
+	return 0;
+}
+
+/*
+ * Fill the rdt_resource fields for one picked rid. An rid with no picked
+ * controller is left untouched so it stays out of resctrl_arch_get_resource().
+ */
+static void cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
+{
+	struct cbqri_controller *ctrl = cbqri_res->ctrl;
+	struct rdt_resource *res = &cbqri_res->resctrl_res;
+
+	if (!ctrl)
+		return;
+
+	switch (res->rid) {
+	case RDT_RESOURCE_L2:
+	case RDT_RESOURCE_L3:
+		res->name = (res->rid == RDT_RESOURCE_L2) ? "L2" : "L3";
+		res->schema_fmt = RESCTRL_SCHEMA_BITMAP;
+		res->ctrl_scope = (res->rid == RDT_RESOURCE_L2) ?
+				    RESCTRL_L2_CACHE : RESCTRL_L3_CACHE;
+		res->cache.cbm_len = ctrl->cc.ncblks;
+		res->cache.shareable_bits = 0;
+		res->cache.min_cbm_bits = 1;
+		res->cache.arch_has_sparse_bitmasks = false;
+		res->cdp_capable = ctrl->cc.supports_alloc_at_code;
+		res->alloc_capable = ctrl->alloc_capable;
+		INIT_LIST_HEAD(&res->ctrl_domains);
+		INIT_LIST_HEAD(&res->mon_domains);
+		break;
+	default:
+		break;
+	}
+}
+
+static void cbqri_resctrl_accumulate_caps(void)
+{
+	int rid;
+
+	for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
+		struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
+
+		if (!hw_res->ctrl)
+			continue;
+		if (hw_res->ctrl->alloc_capable)
+			exposed_alloc_capable = true;
+	}
+}
+
+/*
+ * Create, list-insert, and online a fresh ctrl_domain backing ctrl on
+ * resource res, seeded with cpu and identified by dom_id. Caller must
+ * hold cbqri_domain_list_lock and must have already verified that no
+ * existing ctrl_domain on res carries this id.
+ */
+static struct rdt_ctrl_domain *cbqri_create_ctrl_domain(struct cbqri_controller *ctrl,
+							struct rdt_resource *res,
+							unsigned int cpu, int dom_id)
+{
+	struct rdt_ctrl_domain *domain;
+	struct list_head *pos = NULL;
+	int err;
+
+	domain = cbqri_new_domain(ctrl);
+	if (!domain)
+		return ERR_PTR(-ENOMEM);
+
+	cpumask_set_cpu(cpu, &domain->hdr.cpu_mask);
+	domain->hdr.id = dom_id;
+	domain->hdr.type = RESCTRL_CTRL_DOMAIN;
+
+	err = cbqri_init_domain_ctrlval(res, domain);
+	if (err) {
+		kfree(container_of(domain, struct cbqri_resctrl_dom,
+				   resctrl_ctrl_dom));
+		return ERR_PTR(err);
+	}
+
+	/* Insert sorted by id so user-visible ordering is deterministic. */
+	resctrl_find_domain(&res->ctrl_domains, dom_id, &pos);
+	list_add_tail(&domain->hdr.list, pos);
+
+	resctrl_online_ctrl_domain(res, domain);
+
+	return domain;
+}
+
+static int cbqri_attach_cpu_to_cap_ctrl(struct cbqri_controller *ctrl,
+					unsigned int cpu)
+{
+	struct cbqri_resctrl_res *hw_res;
+	struct rdt_ctrl_domain *domain;
+	struct rdt_resource *res;
+	int dom_id;
+	int rid;
+
+	rid = cbqri_cache_level_to_rid(ctrl->cache.cache_level);
+	if (rid < 0)
+		return 0;
+	hw_res = &cbqri_resctrl_resources[rid];
+
+	if (!hw_res->ctrl)
+		return 0;
+
+	res = &hw_res->resctrl_res;
+	dom_id = ctrl->cache.cache_id;
+
+	domain = cbqri_find_ctrl_domain(&res->ctrl_domains, dom_id);
+	if (domain) {
+		cpumask_set_cpu(cpu, &domain->hdr.cpu_mask);
+		return 0;
+	}
+
+	domain = cbqri_create_ctrl_domain(ctrl, res, cpu, dom_id);
+	if (IS_ERR(domain))
+		return PTR_ERR(domain);
+
+	return 0;
+}
+
+static void cbqri_detach_cpu_from_ctrl_domains(struct rdt_resource *res,
+					       unsigned int cpu)
+{
+	struct rdt_ctrl_domain *domain, *tmp;
+
+	list_for_each_entry_safe(domain, tmp, &res->ctrl_domains, hdr.list) {
+		if (!cpumask_test_cpu(cpu, &domain->hdr.cpu_mask))
+			continue;
+		cpumask_clear_cpu(cpu, &domain->hdr.cpu_mask);
+		if (cpumask_empty(&domain->hdr.cpu_mask)) {
+			resctrl_offline_ctrl_domain(res, domain);
+			list_del(&domain->hdr.list);
+			kfree(container_of(domain, struct cbqri_resctrl_dom,
+					   resctrl_ctrl_dom));
+		}
+	}
+}
+
+/*
+ * Remove a CPU from every domain it was attached to. The per-resource
+ * detach helpers act only when the CPU is set in a domain's mask, so this
+ * is idempotent and undoes a partial online attach as well as a full
+ * offline. Caller holds cbqri_domain_list_lock.
+ */
+static void cbqri_detach_cpu_from_all_ctrls(unsigned int cpu)
+{
+	int rid;
+
+	lockdep_assert_held(&cbqri_domain_list_lock);
+
+	for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
+		struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
+
+		if (!hw_res->ctrl)
+			continue;
+		cbqri_detach_cpu_from_ctrl_domains(&hw_res->resctrl_res, cpu);
+	}
+}
+
+/*
+ * Attach a CPU to every controller that claims it. On failure, detach the
+ * CPU from everything attached so far: the cpuhp core does not run this
+ * state's offline teardown when its startup fails, so a partial attach
+ * would otherwise leak into the domain cpu_masks. Caller holds
+ * cbqri_domain_list_lock.
+ */
+static int cbqri_attach_cpu_to_all_ctrls(unsigned int cpu)
+{
+	struct cbqri_controller *ctrl;
+	int err = 0;
+
+	lockdep_assert_held(&cbqri_domain_list_lock);
+
+	list_for_each_entry(ctrl, &cbqri_controllers, list) {
+		if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
+			continue;
+		if (!cpumask_test_cpu(cpu, &ctrl->cache.cpu_mask))
+			continue;
+		if (!ctrl->alloc_capable)
+			continue;
+
+		err = cbqri_attach_cpu_to_cap_ctrl(ctrl, cpu);
+		if (err) {
+			cbqri_detach_cpu_from_all_ctrls(cpu);
+			break;
+		}
+	}
+
+	return err;
+}
+
+static bool cbqri_resctrl_inited;
+
+static void cbqri_resctrl_teardown(void)
+{
+	int rid;
+
+	if (!cbqri_resctrl_inited)
+		return;
+
+	resctrl_exit();
+
+	for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
+		struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
+
+		hw_res->ctrl = NULL;
+		hw_res->cdp_enabled = false;
+	}
+	exposed_alloc_capable = false;
+	cbqri_resctrl_inited = false;
+}
+
+static int cbqri_resctrl_setup(void)
+{
+	int rid;
+	int err;
+
+	for (rid = 0; rid < RDT_NUM_RESOURCES; rid++)
+		cbqri_resctrl_resources[rid].resctrl_res.rid = rid;
+
+	err = cbqri_resctrl_pick_caches();
+	if (err)
+		return err;
+
+	for (rid = 0; rid < RDT_NUM_RESOURCES; rid++)
+		cbqri_resctrl_control_init(&cbqri_resctrl_resources[rid]);
+
+	cbqri_resctrl_accumulate_caps();
+
+	if (!exposed_alloc_capable) {
+		pr_debug("no resctrl-capable CBQRI controllers found\n");
+		return -ENODEV;
+	}
+
+	err = resctrl_init();
+	if (err)
+		return err;
+
+	cbqri_resctrl_inited = true;
+	return 0;
+}
+
+static int cbqri_resctrl_online_cpu(unsigned int cpu)
+{
+	int err;
+
+	mutex_lock(&cbqri_domain_list_lock);
+	err = cbqri_attach_cpu_to_all_ctrls(cpu);
+	mutex_unlock(&cbqri_domain_list_lock);
+	if (err)
+		return err;
+
+	/*
+	 * Seed the per-CPU default RCID/MCID to the reserved (0, 0) pair and
+	 * notify the resctrl core so it tracks this CPU in the default group.
+	 */
+	resctrl_arch_set_cpu_default_closid_rmid(cpu, 0, 0);
+	resctrl_online_cpu(cpu);
+	return 0;
+}
+
+static int cbqri_resctrl_offline_cpu(unsigned int cpu)
+{
+	resctrl_offline_cpu(cpu);
+
+	mutex_lock(&cbqri_domain_list_lock);
+	cbqri_detach_cpu_from_all_ctrls(cpu);
+	mutex_unlock(&cbqri_domain_list_lock);
+	return 0;
+}
+
+static int __init cbqri_arch_late_init(void)
+{
+	int err;
+
+	if (!riscv_isa_extension_available(NULL, SSQOSID))
+		return -ENODEV;
+
+	err = cbqri_resctrl_setup();
+	if (err)
+		return err;
+
+	err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "cbqri:online",
+				cbqri_resctrl_online_cpu,
+				cbqri_resctrl_offline_cpu);
+	if (err < 0) {
+		cbqri_resctrl_teardown();
+		return err;
+	}
+
+	return 0;
+}
+late_initcall(cbqri_arch_late_init);

-- 
2.43.0



* [PATCH 4/8] riscv_cbqri: Add capacity controller probe and allocation device ops
From: Drew Fustini @ 2026-06-19 18:29 UTC
  To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
	Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
	Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
	Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
	Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
	Paul Walmsley, Peter Newman, Radim Krčmář,
	Reinette Chatre, Rob Herring, Samuel Holland,
	Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
	Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
  Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
	linux-doc
In-Reply-To: <20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org>

Add support for the RISC-V CBQRI capacity controller (CC). A platform
driver passes a cbqri_controller_info descriptor together with the cache
level to riscv_cbqri_register_cc_dt(), which probes the controller and
adds it to the controller list.

Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 MAINTAINERS                      |   3 +
 drivers/resctrl/Kconfig          |  25 ++
 drivers/resctrl/Makefile         |   5 +
 drivers/resctrl/cbqri_devices.c  | 511 +++++++++++++++++++++++++++++++++++++++
 drivers/resctrl/cbqri_internal.h | 110 +++++++++
 include/linux/riscv_cbqri.h      |  47 ++++
 6 files changed, 701 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e2a7f9766355..c090d52e9fa0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23298,6 +23298,9 @@ L:	linux-riscv@lists.infradead.org
 S:	Supported
 F:	arch/riscv/include/asm/qos.h
 F:	arch/riscv/kernel/qos.c
+F:	drivers/resctrl/cbqri_devices.c
+F:	drivers/resctrl/cbqri_internal.h
+F:	include/linux/riscv_cbqri.h
 
 RISC-V RPMI AND MPXY DRIVERS
 M:	Rahul Pathak <rahul@summations.net>
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
index 672abea3b03c..8b16f69df17c 100644
--- a/drivers/resctrl/Kconfig
+++ b/drivers/resctrl/Kconfig
@@ -29,3 +29,28 @@ config ARM64_MPAM_RESCTRL_FS
 	default y if ARM64_MPAM_DRIVER && RESCTRL_FS
 	select RESCTRL_RMID_DEPENDS_ON_CLOSID
 	select RESCTRL_ASSIGN_FIXED
+
+menuconfig RISCV_CBQRI_DRIVER
+	bool "RISC-V CBQRI driver"
+	depends on RISCV && RISCV_ISA_SSQOSID
+	help
+	  Capacity and Bandwidth Controller QoS Register Interface (CBQRI)
+	  driver for RISC-V cache QoS resources. CBQRI exposes cache
+	  capacity allocation through the resctrl filesystem at
+	  /sys/fs/resctrl when RESCTRL_FS is also enabled.
+
+	  RISCV_ISA_SSQOSID provides the srmcfg CSR that tags each hart's
+	  memory traffic with the RCID consumed by CBQRI controllers.
+
+if RISCV_CBQRI_DRIVER
+
+config RISCV_CBQRI_DRIVER_DEBUG
+	bool "Enable debug messages from the CBQRI driver"
+	help
+	  Say yes here to enable debug messages from the CBQRI driver.
+
+	  This adds pr_debug() output covering controller probe and
+	  per-controller registration steps.  Useful when bringing up a
+	  new platform; otherwise leave disabled to avoid log noise.
+
+endif
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
index 4f6d0e81f9b8..28085036d895 100644
--- a/drivers/resctrl/Makefile
+++ b/drivers/resctrl/Makefile
@@ -3,3 +3,8 @@ mpam-y						+= mpam_devices.o
 mpam-$(CONFIG_ARM64_MPAM_RESCTRL_FS)		+= mpam_resctrl.o
 
 ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG)	+= -DDEBUG
+
+obj-$(CONFIG_RISCV_CBQRI_DRIVER)		+= cbqri.o
+cbqri-y						+= cbqri_devices.o
+
+ccflags-$(CONFIG_RISCV_CBQRI_DRIVER_DEBUG)	+= -DDEBUG
diff --git a/drivers/resctrl/cbqri_devices.c b/drivers/resctrl/cbqri_devices.c
new file mode 100644
index 000000000000..cc4ec3f25ac1
--- /dev/null
+++ b/drivers/resctrl/cbqri_devices.c
@@ -0,0 +1,511 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+
+#include <linux/bitfield.h>
+#include <linux/riscv_cbqri.h>
+#include <linux/cpumask.h>
+#include <linux/err.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/ioport.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/printk.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include <asm/csr.h>
+
+#include "cbqri_internal.h"
+
+LIST_HEAD(cbqri_controllers);
+
+/* Set capacity block mask (cc_block_mask) */
+static void cbqri_set_cbm(struct cbqri_controller *ctrl, u64 cbm)
+{
+	iowrite64(cbm, ctrl->base + CBQRI_CC_BLOCK_MASK_OFF);
+}
+
+static int cbqri_wait_busy_flag(struct cbqri_controller *ctrl, int reg_offset,
+				u64 *regp)
+{
+	u64 reg;
+	int ret;
+
+	/*
+	 * Sleeping poll: callers run in process context and ctrl->lock is
+	 * a sleeping mutex, so 10us/1ms is safe under PREEMPT_RT.
+	 */
+	ret = readq_poll_timeout(ctrl->base + reg_offset, reg,
+				 !FIELD_GET(CBQRI_CONTROL_REGISTERS_BUSY_MASK, reg),
+				 10, 1000);
+	if (ret)
+		return ret;
+	if (regp)
+		*regp = reg;
+	return 0;
+}
+
+/*
+ * Perform capacity allocation control operation on capacity controller.
+ * Caller must hold ctrl->lock.
+ */
+static int cbqri_cc_alloc_op(struct cbqri_controller *ctrl, int operation,
+			     int rcid, u32 at)
+{
+	int reg_offset = CBQRI_CC_ALLOC_CTL_OFF;
+	int status;
+	u64 reg;
+
+	lockdep_assert_held(&ctrl->lock);
+
+	if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+		pr_err_ratelimited("BUSY timeout before starting operation\n");
+		return -EIO;
+	}
+	FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_OP_MASK, &reg, operation);
+	FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_RCID_MASK, &reg, rcid);
+
+	/*
+	 * CBQRI Table 1: AT 0=Data, 1=Code. Program AT on controllers
+	 * that report supports_alloc_at_code. On controllers that don't,
+	 * AT is reserved-zero and the op acts on both halves.
+	 */
+	reg &= ~CBQRI_CONTROL_REGISTERS_AT_MASK;
+	if (ctrl->cc.supports_alloc_at_code)
+		reg |= FIELD_PREP(CBQRI_CONTROL_REGISTERS_AT_MASK, at);
+
+	iowrite64(reg, ctrl->base + reg_offset);
+
+	if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+		pr_err_ratelimited("BUSY timeout during operation\n");
+		return -EIO;
+	}
+
+	status = FIELD_GET(CBQRI_CONTROL_REGISTERS_STATUS_MASK, reg);
+	if (status != CBQRI_CC_ALLOC_CTL_STATUS_SUCCESS) {
+		pr_err_ratelimited("operation %d failed: status=%d\n", operation, status);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/*
+ * Apply a capacity block mask and verify via CONFIG_LIMIT + READ_LIMIT.
+ *
+ * AT-capable controllers with CDP off need a second CONFIG_LIMIT on the
+ * other AT half: the spec encodes AT only as 0=Data / 1=Code, with no
+ * "both halves" value. With CDP on, resctrl issues separate per-type
+ * writes, so a single CONFIG_LIMIT per call is correct.
+ */
+int cbqri_apply_cache_config(struct cbqri_controller *ctrl, u32 closid,
+			     const struct cbqri_cc_config *cfg)
+{
+	bool need_at_mirror;
+	u64 saved_cbm = 0;
+	int err = 0;
+	u64 reg;
+
+	mutex_lock(&ctrl->lock);
+
+	need_at_mirror = ctrl->cc.supports_alloc_at_code && !cfg->cdp_enabled;
+
+	/*
+	 * Capture the cfg->at half CBM before any write so a partial
+	 * AT-mirror failure can revert and keep the two halves consistent.
+	 * Pre-clear cc_block_mask so a silent firmware no-op (status
+	 * SUCCESS but staging not updated) shows as a zero readback
+	 * rather than carrying stale data from a prior op.
+	 */
+	if (need_at_mirror) {
+		cbqri_set_cbm(ctrl, 0);
+		err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT,
+					closid, cfg->at);
+		if (err < 0)
+			goto out;
+		saved_cbm = ioread64(ctrl->base + CBQRI_CC_BLOCK_MASK_OFF);
+	}
+
+	/* Set capacity block mask (cc_block_mask) */
+	cbqri_set_cbm(ctrl, cfg->cbm);
+
+	/* Capacity config limit operation for the AT half implied by cfg->at */
+	err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT,
+				closid, cfg->at);
+	if (err < 0)
+		goto out;
+
+	/*
+	 * CDP-off mirror: on AT-capable controllers, also program the
+	 * other AT half with the same mask so the two halves stay in sync.
+	 */
+	if (need_at_mirror) {
+		u32 other = (cfg->at == CBQRI_CONTROL_REGISTERS_AT_CODE) ?
+			    CBQRI_CONTROL_REGISTERS_AT_DATA :
+			    CBQRI_CONTROL_REGISTERS_AT_CODE;
+
+		cbqri_set_cbm(ctrl, cfg->cbm);
+		err = cbqri_cc_alloc_op(ctrl,
+					CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT,
+					closid, other);
+		if (err < 0) {
+			int rerr;
+
+			/*
+			 * Best-effort revert of the cfg->at half so the two
+			 * halves stay in sync. A schemata read sees only one
+			 * half, so silent divergence would otherwise report
+			 * the new value as if the write had succeeded.
+			 */
+			cbqri_set_cbm(ctrl, saved_cbm);
+			rerr = cbqri_cc_alloc_op(ctrl,
+						 CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT,
+						 closid, cfg->at);
+			if (rerr < 0)
+				pr_err_ratelimited("AT-mirror revert failed (err=%d), AT halves diverged\n",
+						   rerr);
+			goto out;
+		}
+	}
+
+	/* Clear cc_block_mask before read limit to verify op works */
+	cbqri_set_cbm(ctrl, 0);
+
+	/* Perform a capacity read limit operation to verify blockmask */
+	err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT,
+				closid, cfg->at);
+	if (err < 0)
+		goto out;
+
+	/*
+	 * Read capacity blockmask and narrow to u32 to match resctrl's CBM
+	 * width. cbqri_probe_cc() rejects ncblks > 32 so the upper bits are
+	 * reserved zero.
+	 */
+	reg = ioread64(ctrl->base + CBQRI_CC_BLOCK_MASK_OFF);
+	if (lower_32_bits(reg) != cfg->cbm) {
+		pr_err_ratelimited("CBM verify mismatch (reg=%llx != cbm=%llx)\n",
+				   reg, cfg->cbm);
+		err = -EIO;
+	}
+
+out:
+	mutex_unlock(&ctrl->lock);
+	return err;
+}
+
+/*
+ * Read the configured CBM for closid on the at half via READ_LIMIT.
+ * Pre-clears cc_block_mask before the op so a silent firmware no-op
+ * (status SUCCESS but staging not updated) is detectable in cbm_out.
+ */
+int cbqri_read_cache_config(struct cbqri_controller *ctrl, u32 closid,
+			    u32 at, u32 *cbm_out)
+{
+	int err;
+
+	mutex_lock(&ctrl->lock);
+	cbqri_set_cbm(ctrl, 0);
+	err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT, closid, at);
+	if (err == 0) {
+		/*
+		 * cc_block_mask is a 64-bit MMIO register. resctrl exposes the
+		 * CBM as a u32. cbqri_probe_cc() rejects ncblks > 32 so the
+		 * upper 32 bits are reserved zero by the spec. Narrow
+		 * explicitly via lower_32_bits() so the assumption is visible
+		 * at the read site.
+		 */
+		*cbm_out = lower_32_bits(ioread64(ctrl->base + CBQRI_CC_BLOCK_MASK_OFF));
+	}
+	mutex_unlock(&ctrl->lock);
+	return err;
+}
+
+static int cbqri_probe_feature(struct cbqri_controller *ctrl, int reg_offset,
+			       int operation, int *status, bool *access_type_supported)
+{
+	const u64 active_mask = CBQRI_CONTROL_REGISTERS_OP_MASK |
+				CBQRI_CONTROL_REGISTERS_AT_MASK |
+				CBQRI_CONTROL_REGISTERS_RCID_MASK;
+	u64 reg, saved_reg;
+	int at;
+
+	/*
+	 * Default the output to false so the status==0 (feature not
+	 * implemented) path returns a deterministic value to the caller
+	 * rather than leaving an uninitialized bool.
+	 */
+	*access_type_supported = false;
+
+	/* Keep the initial register value to preserve the WPRI fields */
+	reg = ioread64(ctrl->base + reg_offset);
+	saved_reg = reg;
+
+	/* Drain any in-flight firmware op before issuing our own write. */
+	if (cbqri_wait_busy_flag(ctrl, reg_offset, &saved_reg) < 0) {
+		pr_err("BUSY timeout before probe operation\n");
+		return -EIO;
+	}
+
+	/*
+	 * Execute the requested operation with all active fields
+	 * (OP/AT/RCID) zeroed except OP itself. Every bit not in
+	 * active_mask is WPRI and gets carried over from saved_reg.
+	 */
+	reg = (saved_reg & ~active_mask) |
+	      FIELD_PREP(CBQRI_CONTROL_REGISTERS_OP_MASK, operation);
+	iowrite64(reg, ctrl->base + reg_offset);
+	if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+		pr_err_ratelimited("BUSY timeout during operation\n");
+		return -EIO;
+	}
+
+	/* Get the operation status */
+	*status = FIELD_GET(CBQRI_CONTROL_REGISTERS_STATUS_MASK, reg);
+
+	/*
+	 * Check for the AT support if the register is implemented
+	 * (if not, the status value will remain 0)
+	 */
+	if (*status != 0) {
+		/*
+		 * Re-issue operation with AT=CODE so the controller
+		 * latches AT=CODE on supported hardware (or resets it to 0
+		 * on hardware that doesn't). OP must be a defined CBQRI op
+		 * here. OP=0 is a no-op and would silently disable CDP.
+		 */
+		reg = (saved_reg & ~active_mask) |
+		      FIELD_PREP(CBQRI_CONTROL_REGISTERS_OP_MASK, operation) |
+		      FIELD_PREP(CBQRI_CONTROL_REGISTERS_AT_MASK,
+				 CBQRI_CONTROL_REGISTERS_AT_CODE);
+		iowrite64(reg, ctrl->base + reg_offset);
+		if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+			pr_err("BUSY timeout setting AT field\n");
+			return -EIO;
+		}
+
+		/*
+		 * If the AT field value has been reset to zero,
+		 * then the AT support is not present
+		 */
+		at = FIELD_GET(CBQRI_CONTROL_REGISTERS_AT_MASK, reg);
+		if (at == CBQRI_CONTROL_REGISTERS_AT_CODE)
+			*access_type_supported = true;
+	}
+
+	/*
+	 * Restore the original register value.
+	 * Clear OP to avoid re-triggering the probe op.
+	 */
+	saved_reg &= ~CBQRI_CONTROL_REGISTERS_OP_MASK;
+	iowrite64(saved_reg, ctrl->base + reg_offset);
+	if (cbqri_wait_busy_flag(ctrl, reg_offset, NULL) < 0) {
+		pr_err("BUSY timeout restoring register value\n");
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static int cbqri_probe_cc(struct cbqri_controller *ctrl)
+{
+	int err, status;
+	int ver_major, ver_minor;
+	u64 reg;
+
+	reg = ioread64(ctrl->base + CBQRI_CC_CAPABILITIES_OFF);
+	if (reg == 0)
+		return -ENODEV;
+
+	ver_minor = FIELD_GET(CBQRI_CC_CAPABILITIES_VER_MINOR_MASK, reg);
+	ver_major = FIELD_GET(CBQRI_CC_CAPABILITIES_VER_MAJOR_MASK, reg);
+	ctrl->cc.ncblks = FIELD_GET(CBQRI_CC_CAPABILITIES_NCBLKS_MASK, reg);
+
+	pr_debug("version=%d.%d ncblks=%d cache_level=%d\n",
+		 ver_major, ver_minor,
+		 ctrl->cc.ncblks, ctrl->cache.cache_level);
+
+	/*
+	 * NCBLKS == 0 would divide-by-zero in the schemata math while
+	 * ctrl->lock is held.
+	 */
+	if (!ctrl->cc.ncblks) {
+		pr_warn("CC at %pa has 0 capacity blocks, skipping\n",
+			&ctrl->addr);
+		return -ENODEV;
+	}
+
+	if (ctrl->cc.ncblks > 32) {
+		pr_warn("CC at %pa has ncblks=%u > 32 (resctrl CBM is u32), skipping\n",
+			&ctrl->addr, ctrl->cc.ncblks);
+		return -ENODEV;
+	}
+
+	/* Probe allocation features */
+	err = cbqri_probe_feature(ctrl, CBQRI_CC_ALLOC_CTL_OFF,
+				  CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT,
+				  &status, &ctrl->cc.supports_alloc_at_code);
+	if (err)
+		return err;
+
+	if (status == CBQRI_CC_ALLOC_CTL_STATUS_SUCCESS)
+		ctrl->alloc_capable = true;
+
+	return 0;
+}
+
+static int cbqri_probe_controller(struct cbqri_controller *ctrl)
+{
+	int err;
+
+	pr_debug("controller info: type=%d addr=%pa size=%pa max-rcid=%u\n",
+		 ctrl->type, &ctrl->addr, &ctrl->size, ctrl->rcid_count);
+
+	if (!ctrl->addr) {
+		pr_warn("controller has invalid addr=0x0, skipping\n");
+		return -EINVAL;
+	}
+
+	if (ctrl->size < CBQRI_CTRL_MIN_REG_SPAN) {
+		pr_warn("controller at %pa: size %pa < minimum 0x%x, skipping\n",
+			&ctrl->addr, &ctrl->size, CBQRI_CTRL_MIN_REG_SPAN);
+		return -EINVAL;
+	}
+
+	if (!request_mem_region(ctrl->addr, ctrl->size, "cbqri_controller")) {
+		pr_err("request_mem_region failed for %pa\n", &ctrl->addr);
+		return -EBUSY;
+	}
+
+	ctrl->base = ioremap(ctrl->addr, ctrl->size);
+	if (!ctrl->base) {
+		pr_err("ioremap failed for %pa\n", &ctrl->addr);
+		err = -ENOMEM;
+		goto err_release;
+	}
+
+	switch (ctrl->type) {
+	case CBQRI_CONTROLLER_TYPE_CAPACITY:
+		err = cbqri_probe_cc(ctrl);
+		break;
+	default:
+		pr_err("unknown controller type %d\n", ctrl->type);
+		err = -ENODEV;
+		break;
+	}
+
+	if (err)
+		goto err_iounmap;
+
+	return 0;
+
+err_iounmap:
+	iounmap(ctrl->base);
+	ctrl->base = NULL;
+err_release:
+	release_mem_region(ctrl->addr, ctrl->size);
+	return err;
+}
+
+void cbqri_controller_destroy(struct cbqri_controller *ctrl)
+{
+	/*
+	 * cbqri_probe_controller() clears ctrl->base on its error paths and
+	 * releases the mem region itself, so reach into both only when
+	 * destroy is rolling back a successful probe.
+	 */
+	if (ctrl->base) {
+		iounmap(ctrl->base);
+		release_mem_region(ctrl->addr, ctrl->size);
+	}
+	kfree(ctrl);
+}
+
+/**
+ * riscv_cbqri_register_cc_dt() - register a DT-described capacity controller
+ * @info:        registration descriptor. info->cache_id is used as the
+ *               resctrl domain id. info->type must be CAPACITY.
+ * @cache_level: cache level (2 or 3) the controller backs, mapped to the
+ *               resctrl L2/L3 resource by the resctrl glue.
+ * @cpu_mask:    CPUs that share this cache.
+ *
+ * The cache topology is supplied directly by the caller. A device-tree
+ * platform driver that already knows which CPUs share the cache and at what
+ * level passes that in. There is no firmware table to resolve it from.
+ *
+ * Return: 0 on success, or a negative errno on failure.
+ */
+int riscv_cbqri_register_cc_dt(const struct cbqri_controller_info *info,
+			       u32 cache_level, const struct cpumask *cpu_mask)
+{
+	struct cbqri_controller *ctrl;
+	int err;
+
+	if (!info->addr) {
+		pr_warn("skipping controller with invalid addr=0x0\n");
+		return -EINVAL;
+	}
+
+	if (info->type != CBQRI_CONTROLLER_TYPE_CAPACITY) {
+		pr_warn("register_cc_dt called with non-capacity type %u\n",
+			info->type);
+		return -EINVAL;
+	}
+
+	if (!cpu_mask || cpumask_empty(cpu_mask)) {
+		pr_warn("register_cc_dt called with empty cpu_mask\n");
+		return -EINVAL;
+	}
+
+	ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
+	if (!ctrl)
+		return -ENOMEM;
+
+	mutex_init(&ctrl->lock);
+
+	ctrl->addr = info->addr;
+	ctrl->size = info->size;
+	ctrl->type = info->type;
+	ctrl->rcid_count = info->rcid_count;
+
+	/*
+	 * SRMCFG encodes RCID in 12 bits. Reject an out-of-range count rather
+	 * than silently truncating in every FIELD_PREP(SRMCFG_RCID_MASK, closid)
+	 * on the schedule-in fast path.
+	 */
+	if (ctrl->rcid_count > FIELD_MAX(SRMCFG_RCID_MASK) + 1) {
+		pr_warn("CC at %pa has RCID count %u beyond the 12-bit SRMCFG field, skipping\n",
+			&ctrl->addr, ctrl->rcid_count);
+		cbqri_controller_destroy(ctrl);
+		return -EINVAL;
+	}
+
+	ctrl->cache.cache_id = info->cache_id;
+	ctrl->cache.cache_level = cache_level;
+	cpumask_copy(&ctrl->cache.cpu_mask, cpu_mask);
+
+	err = cbqri_probe_controller(ctrl);
+	if (err) {
+		cbqri_controller_destroy(ctrl);
+		return err;
+	}
+
+	/*
+	 * Allocation capability comes from the capabilities register probed
+	 * above, not from device tree. rcid_count only bounds the RCID range,
+	 * so a controller the hardware reports as alloc-capable but described
+	 * with no RCID count cannot be driven. Reject that inconsistency. A
+	 * monitoring-only controller (not alloc_capable) needs no RCID count.
+	 */
+	if (ctrl->alloc_capable && !ctrl->rcid_count) {
+		pr_warn("CC at %pa is alloc-capable but has no RCID count, skipping\n",
+			&ctrl->addr);
+		cbqri_controller_destroy(ctrl);
+		return -EINVAL;
+	}
+
+	list_add_tail(&ctrl->list, &cbqri_controllers);
+	return 0;
+}
diff --git a/drivers/resctrl/cbqri_internal.h b/drivers/resctrl/cbqri_internal.h
new file mode 100644
index 000000000000..cd6bc879b320
--- /dev/null
+++ b/drivers/resctrl/cbqri_internal.h
@@ -0,0 +1,110 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _DRIVERS_RESCTRL_CBQRI_INTERNAL_H
+#define _DRIVERS_RESCTRL_CBQRI_INTERNAL_H
+
+#include <linux/bitfield.h>
+#include <linux/riscv_cbqri.h>
+#include <linux/cpumask.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+
+/* Capacity Controller (CC) MMIO register offsets. */
+#define CBQRI_CC_CAPABILITIES_OFF 0
+#define CBQRI_CC_ALLOC_CTL_OFF   24
+#define CBQRI_CC_BLOCK_MASK_OFF  32
+
+/*
+ * Smallest MMIO span the driver actually accesses: highest defined
+ * register offset (0x20) plus the 8-byte register width. Used by
+ * cbqri_probe_controller() to reject undersized firmware-supplied
+ * mappings before request_mem_region/ioremap, so a u64 access at
+ * BLOCK_MASK does not walk past the end of the mapping.
+ */
+#define CBQRI_CTRL_MIN_REG_SPAN  0x28u
+
+#define CBQRI_CC_CAPABILITIES_VER_MINOR_MASK  GENMASK_ULL(3, 0)
+#define CBQRI_CC_CAPABILITIES_VER_MAJOR_MASK  GENMASK_ULL(7, 4)
+#define CBQRI_CC_CAPABILITIES_NCBLKS_MASK     GENMASK_ULL(23, 8)
+
+/*
+ * CC control registers are 64-bit. Keep every field mask GENMASK_ULL so
+ * FIELD_MODIFY() or ~mask on a u64 register never zero-extends a 32-bit
+ * mask and clobbers STATUS/BUSY/WPRI in bits 63:32 if RV32 support is
+ * added in the future.
+ */
+#define CBQRI_CONTROL_REGISTERS_OP_MASK      GENMASK_ULL(4, 0)
+#define CBQRI_CONTROL_REGISTERS_AT_MASK      GENMASK_ULL(7, 5)
+/* AT field values (CBQRI Table 1): data vs code half for CDP */
+#define CBQRI_CONTROL_REGISTERS_AT_DATA      0
+#define CBQRI_CONTROL_REGISTERS_AT_CODE      1
+#define CBQRI_CONTROL_REGISTERS_RCID_MASK    GENMASK_ULL(19, 8)
+#define CBQRI_CONTROL_REGISTERS_STATUS_MASK  GENMASK_ULL(38, 32)
+#define CBQRI_CONTROL_REGISTERS_BUSY_MASK    GENMASK_ULL(39, 39)
+
+#define CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT 1
+#define CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT   2
+#define CBQRI_CC_ALLOC_CTL_STATUS_SUCCESS  1
+
+/* Capacity Controller hardware capabilities */
+struct riscv_cbqri_capacity_caps {
+	u16 ncblks;
+	bool supports_alloc_at_code;
+};
+
+/**
+ * struct cbqri_cc_config - desired capacity allocation state for one rcid
+ * @cbm:         capacity block mask
+ * @at:          AT half the @cbm applies to (CBQRI_CONTROL_REGISTERS_AT_DATA
+ *               or CBQRI_CONTROL_REGISTERS_AT_CODE)
+ * @cdp_enabled: when false and the controller supports AT, mirror @cbm
+ *               into the other AT half so both stay in sync
+ */
+struct cbqri_cc_config {
+	u64  cbm;
+	u32  at;
+	bool cdp_enabled;
+};
+
+struct cbqri_controller {
+	void __iomem *base;
+	/*
+	 * Serializes the write-then-poll-busy MMIO sequences on this
+	 * controller. Each CBQRI op may busy-wait up to 1 ms on slow
+	 * firmware, so use a sleeping mutex (paired with the sleeping
+	 * readq_poll_timeout() in cbqri_wait_busy_flag()) to keep
+	 * preemption enabled, which is required for PREEMPT_RT.
+	 * All resctrl-arch entry points run in process context.
+	 */
+	struct mutex lock;
+
+	struct riscv_cbqri_capacity_caps cc;
+
+	bool alloc_capable;
+
+	phys_addr_t addr;
+	phys_addr_t size;
+	enum cbqri_controller_type type;
+	u32 rcid_count;
+
+	struct list_head list;
+
+	struct cache_controller {
+		u32 cache_level;
+		struct cpumask cpu_mask;
+		/* Cache id used as the resctrl domain id */
+		u32 cache_id;
+	} cache;
+};
+
+extern struct list_head cbqri_controllers;
+
+void cbqri_controller_destroy(struct cbqri_controller *ctrl);
+
+int cbqri_apply_cache_config(struct cbqri_controller *ctrl, u32 closid,
+			     const struct cbqri_cc_config *cfg);
+
+int cbqri_read_cache_config(struct cbqri_controller *ctrl, u32 closid,
+			    u32 at, u32 *cbm_out);
+
+#endif /* _DRIVERS_RESCTRL_CBQRI_INTERNAL_H */
diff --git a/include/linux/riscv_cbqri.h b/include/linux/riscv_cbqri.h
new file mode 100644
index 000000000000..cd62398bd5cb
--- /dev/null
+++ b/include/linux/riscv_cbqri.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Public registration API for the RISC-V Capacity and Bandwidth Controller
+ * QoS Register Interface (CBQRI) driver. Discovery layers (device tree
+ * platform drivers) call riscv_cbqri_register_cc_dt() to hand a capacity
+ * controller descriptor to the driver, which owns all subsequent state.
+ */
+#ifndef _LINUX_RISCV_CBQRI_H
+#define _LINUX_RISCV_CBQRI_H
+
+#include <linux/types.h>
+
+struct cpumask;
+
+enum cbqri_controller_type {
+	CBQRI_CONTROLLER_TYPE_CAPACITY,
+};
+
+/**
+ * struct cbqri_controller_info - registration descriptor
+ * @addr:        MMIO base address of the controller's register interface
+ * @size:        size of the MMIO region
+ * @type:        controller type (capacity)
+ * @rcid_count:  number of supported RCIDs
+ * @cache_id:    cache id used as the resctrl domain id
+ */
+struct cbqri_controller_info {
+	phys_addr_t			addr;
+	phys_addr_t			size;
+	enum cbqri_controller_type	type;
+	u32				rcid_count;
+	u32				cache_id;
+};
+
+#if IS_ENABLED(CONFIG_RISCV_CBQRI_DRIVER)
+int riscv_cbqri_register_cc_dt(const struct cbqri_controller_info *info,
+			       u32 cache_level, const struct cpumask *cpu_mask);
+#else
+static inline int
+riscv_cbqri_register_cc_dt(const struct cbqri_controller_info *info,
+			   u32 cache_level, const struct cpumask *cpu_mask)
+{
+	return -ENODEV;
+}
+#endif
+
+#endif /* _LINUX_RISCV_CBQRI_H */

-- 
2.43.0



* [PATCH 3/8] riscv: Add support for srmcfg CSR from Ssqosid extension
From: Drew Fustini @ 2026-06-19 18:29 UTC (permalink / raw)
  To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
	Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
	Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
	Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
	Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
	Paul Walmsley, Peter Newman, Radim Krčmář,
	Reinette Chatre, Rob Herring, Samuel Holland,
	Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
	Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
  Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
	linux-doc
In-Reply-To: <20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org>

Add support for the srmcfg CSR defined in the Ssqosid ISA extension.
The CSR contains two fields:

  - Resource Control ID (RCID) for resource allocation
  - Monitoring Counter ID (MCID) for tracking resource usage

Requests from a hart to shared resources are tagged with these IDs,
allowing resource usage to be associated with the running task.

Add a srmcfg field to thread_struct with the same format as the CSR so
the scheduler can set the RCID and MCID for each task on context
switch. A per-cpu cpu_srmcfg variable mirrors the CSR state so that
redundant writes can be skipped. Reading this L1D-hot variable is faster
than reading the CSR and avoids traps under virtualization.

A per-cpu cpu_srmcfg_default holds the default srmcfg for each CPU as
set by resctrl CPU group assignment. On context switch, RCID and MCID
inherit from the CPU default independently: a task whose thread RCID
field is zero takes the CPU default's RCID, and likewise for MCID.
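
The field layout and the per-field inheritance can be illustrated with a
small userspace sketch (not part of this patch). The masks match the
SRMCFG_RCID_MASK and SRMCFG_MCID_MASK definitions added to csr.h. The
helpers srmcfg_pack(), srmcfg_resolve() and srmcfg_update() are
hypothetical names used only here, and the sketch masks each field
explicitly, whereas __switch_to_srmcfg() passes a fully assigned thread
value through unchanged.

```c
#include <assert.h>
#include <stdint.h>

#define SRMCFG_RCID_MASK  UINT32_C(0x00000fff)  /* bits 11:0 */
#define SRMCFG_MCID_MASK  UINT32_C(0x0fff0000)  /* bits 27:16 */
#define SRMCFG_MCID_SHIFT 16

/* Hypothetical helper: pack RCID and MCID into the srmcfg layout. */
static uint32_t srmcfg_pack(uint32_t rcid, uint32_t mcid)
{
	assert(rcid <= 0xfff && mcid <= 0xfff);
	return rcid | (mcid << SRMCFG_MCID_SHIFT);
}

/*
 * Hypothetical helper: resolve the value to program for a task. A zero
 * RCID or MCID in the thread value means "unassigned" and is filled in
 * from the CPU default, each field independently.
 */
static uint32_t srmcfg_resolve(uint32_t thread, uint32_t cpu_default)
{
	uint32_t rcid = thread & SRMCFG_RCID_MASK;
	uint32_t mcid = thread & SRMCFG_MCID_MASK;

	if (rcid == 0)
		rcid = cpu_default & SRMCFG_RCID_MASK;
	if (mcid == 0)
		mcid = cpu_default & SRMCFG_MCID_MASK;
	return rcid | mcid;
}

/*
 * Hypothetical stand-in for the per-cpu mirror: returns 1 when the CSR
 * would be written, 0 when the cached value already matches.
 */
static int srmcfg_update(uint32_t *cached, uint32_t value)
{
	if (*cached == value)
		return 0;
	*cached = value;
	return 1;
}
```

Seeding the cached value with UINT32_MAX, as qos.c does with U32_MAX,
forces the first srmcfg_update() call to report a write because no packed
RCID/MCID pair can equal that sentinel.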

Link: https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 MAINTAINERS                        |  8 ++++
 arch/riscv/Kconfig                 | 18 ++++++++
 arch/riscv/include/asm/csr.h       |  5 +++
 arch/riscv/include/asm/processor.h |  3 ++
 arch/riscv/include/asm/qos.h       | 86 +++++++++++++++++++++++++++++++++++
 arch/riscv/include/asm/switch_to.h |  3 ++
 arch/riscv/kernel/Makefile         |  2 +
 arch/riscv/kernel/qos.c            | 91 ++++++++++++++++++++++++++++++++++++++
 8 files changed, 216 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 069b4aa6b523..e2a7f9766355 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23291,6 +23291,14 @@ F:	drivers/perf/riscv_pmu.c
 F:	drivers/perf/riscv_pmu_legacy.c
 F:	drivers/perf/riscv_pmu_sbi.c
 
+RISC-V QOS RESCTRL SUPPORT
+M:	Drew Fustini <fustini@kernel.org>
+R:	yunhui cui <cuiyunhui@bytedance.com>
+L:	linux-riscv@lists.infradead.org
+S:	Supported
+F:	arch/riscv/include/asm/qos.h
+F:	arch/riscv/kernel/qos.c
+
 RISC-V RPMI AND MPXY DRIVERS
 M:	Rahul Pathak <rahul@summations.net>
 M:	Anup Patel <anup@brainfault.org>
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 3f0a647218e4..ee586925f972 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -590,6 +590,24 @@ config RISCV_ISA_SVNAPOT
 
 	  If you don't know what to do here, say Y.
 
+config RISCV_ISA_SSQOSID
+	bool "Ssqosid extension support for supervisor mode Quality of Service ID"
+	depends on 64BIT
+	default n
+	help
+	  Adds support for the Ssqosid ISA extension (Supervisor-mode
+	  Quality of Service ID).
+
+	  Ssqosid defines the srmcfg CSR which allows the system to tag the
+	  running process with an RCID (Resource Control ID) and MCID
+	  (Monitoring Counter ID). The RCID is used to determine resource
+	  allocation. The MCID is used to track resource usage in event
+	  counters.
+
+	  For example, a cache controller may use the RCID to apply a
+	  cache partitioning scheme and use the MCID to track how much
+	  cache a process, or a group of processes, is using.
+
 config RISCV_ISA_SVPBMT
 	bool "Svpbmt extension support for supervisor mode page-based memory types"
 	depends on 64BIT && MMU
diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 31b8988f4488..7bce928e5daa 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -84,6 +84,10 @@
 #define SATP_ASID_MASK	_AC(0xFFFF, UL)
 #endif
 
+/* SRMCFG fields */
+#define SRMCFG_RCID_MASK	GENMASK(11, 0)
+#define SRMCFG_MCID_MASK	GENMASK(27, 16)
+
 /* Exception cause high bit - is an interrupt if set */
 #define CAUSE_IRQ_FLAG		(_AC(1, UL) << (__riscv_xlen - 1))
 
@@ -328,6 +332,7 @@
 #define CSR_STVAL		0x143
 #define CSR_SIP			0x144
 #define CSR_SATP		0x180
+#define CSR_SRMCFG		0x181
 
 #define CSR_STIMECMP		0x14D
 #define CSR_STIMECMPH		0x15D
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 812517b2cec1..49a386d74cd3 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -123,6 +123,9 @@ struct thread_struct {
 	/* A forced icache flush is not needed if migrating to the previous cpu. */
 	unsigned int prev_cpu;
 #endif
+#ifdef CONFIG_RISCV_ISA_SSQOSID
+	u32 srmcfg;
+#endif
 };
 
 /* Whitelist the fstate from the task_struct for hardened usercopy */
diff --git a/arch/riscv/include/asm/qos.h b/arch/riscv/include/asm/qos.h
new file mode 100644
index 000000000000..600d889ef63d
--- /dev/null
+++ b/arch/riscv/include/asm/qos.h
@@ -0,0 +1,86 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_RISCV_QOS_H
+#define _ASM_RISCV_QOS_H
+
+#include <linux/percpu-defs.h>
+
+#ifdef CONFIG_RISCV_ISA_SSQOSID
+
+#include <linux/bitfield.h>
+#include <linux/cpufeature.h>
+#include <linux/sched.h>
+
+#include <asm/csr.h>
+#include <asm/fence.h>
+#include <asm/hwcap.h>
+
+/* cached value of srmcfg csr for each cpu */
+DECLARE_PER_CPU(u32, cpu_srmcfg);
+
+/* default srmcfg value for each cpu, set via resctrl cpu assignment */
+DECLARE_PER_CPU(u32, cpu_srmcfg_default);
+
+static inline void __switch_to_srmcfg(struct task_struct *next)
+{
+	u32 thread_srmcfg, default_srmcfg;
+
+	thread_srmcfg = READ_ONCE(next->thread.srmcfg);
+	default_srmcfg = __this_cpu_read(cpu_srmcfg_default);
+
+	/*
+	 * RCID and MCID inherit from cpu_srmcfg_default independently.
+	 * RESCTRL_RESERVED_CLOSID and RESCTRL_RESERVED_RMID are both 0,
+	 * so a per-field zero means "no task assignment for this
+	 * dimension" and the CPU default supplies that field. The fully
+	 * unassigned (thread.srmcfg == 0) and fully assigned (both
+	 * fields non-zero) cases short-circuit the field math.
+	 */
+	if (thread_srmcfg == 0) {
+		thread_srmcfg = default_srmcfg;
+	} else {
+		u32 rcid = FIELD_GET(SRMCFG_RCID_MASK, thread_srmcfg);
+		u32 mcid = FIELD_GET(SRMCFG_MCID_MASK, thread_srmcfg);
+
+		if (rcid == 0 || mcid == 0) {
+			if (rcid == 0)
+				rcid = FIELD_GET(SRMCFG_RCID_MASK, default_srmcfg);
+			if (mcid == 0)
+				mcid = FIELD_GET(SRMCFG_MCID_MASK, default_srmcfg);
+			thread_srmcfg = FIELD_PREP(SRMCFG_RCID_MASK, rcid) |
+					FIELD_PREP(SRMCFG_MCID_MASK, mcid);
+		}
+	}
+
+	if (thread_srmcfg != __this_cpu_read(cpu_srmcfg)) {
+		/*
+		 * Order the outgoing task's loads and stores before the
+		 * CSR write so they retain the previous RCID/MCID tag at
+		 * the cache interconnect.
+		 */
+		RISCV_FENCE(rw, o);
+
+		__this_cpu_write(cpu_srmcfg, thread_srmcfg);
+		csr_write(CSR_SRMCFG, thread_srmcfg);
+		/*
+		 * Order the csrw before the new task's loads/stores so they
+		 * pick up the new tag. Zicsr 6.1.1 makes CSR writes weakly
+		 * ordered (device-output) vs memory ops. Ssqosid v1.0 is
+		 * silent so honor the general CSR rule.
+		 */
+		RISCV_FENCE(o, rw);
+	}
+}
+
+static __always_inline bool has_srmcfg(void)
+{
+	return riscv_has_extension_unlikely(RISCV_ISA_EXT_SSQOSID);
+}
+
+#else /* ! CONFIG_RISCV_ISA_SSQOSID  */
+
+struct task_struct;
+static __always_inline bool has_srmcfg(void) { return false; }
+static inline void __switch_to_srmcfg(struct task_struct *next) { }
+
+#endif /* CONFIG_RISCV_ISA_SSQOSID */
+#endif /* _ASM_RISCV_QOS_H */
diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
index 0e71eb82f920..1c7ea53ec012 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -14,6 +14,7 @@
 #include <asm/processor.h>
 #include <asm/ptrace.h>
 #include <asm/csr.h>
+#include <asm/qos.h>
 
 #ifdef CONFIG_FPU
 extern void __fstate_save(struct task_struct *save_to);
@@ -119,6 +120,8 @@ do {							\
 		__switch_to_fpu(__prev, __next);	\
 	if (has_vector() || has_xtheadvector())		\
 		__switch_to_vector(__prev, __next);	\
+	if (has_srmcfg())				\
+		__switch_to_srmcfg(__next);		\
 	if (switch_to_should_flush_icache(__next))	\
 		local_flush_icache_all();		\
 	__switch_to_envcfg(__next);			\
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index cabb99cadfb6..ebe1c3588177 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -128,3 +128,5 @@ obj-$(CONFIG_ACPI_NUMA)	+= acpi_numa.o
 
 obj-$(CONFIG_GENERIC_CPU_VULNERABILITIES) += bugs.o
 obj-$(CONFIG_RISCV_USER_CFI) += usercfi.o
+
+obj-$(CONFIG_RISCV_ISA_SSQOSID) += qos.o
diff --git a/arch/riscv/kernel/qos.c b/arch/riscv/kernel/qos.c
new file mode 100644
index 000000000000..42f1ff9b219d
--- /dev/null
+++ b/arch/riscv/kernel/qos.c
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/cpu.h>
+#include <linux/cpu_pm.h>
+#include <linux/cpuhotplug.h>
+#include <linux/notifier.h>
+#include <linux/percpu-defs.h>
+#include <linux/types.h>
+
+#include <asm/cpufeature-macros.h>
+#include <asm/hwcap.h>
+#include <asm/qos.h>
+
+/*
+ * Cached value of srmcfg csr for each cpu. Seeded to U32_MAX so the next
+ * __switch_to_srmcfg() unconditionally writes the CSR. The encoding
+ * MCID << 16 | RCID with both fields well under 16 bits can never
+ * produce this sentinel. This covers early-boot context switches that
+ * happen before riscv_srmcfg_init() runs as an arch_initcall.
+ */
+DEFINE_PER_CPU(u32, cpu_srmcfg) = U32_MAX;
+
+/* default srmcfg value for each cpu, set via resctrl cpu assignment */
+DEFINE_PER_CPU(u32, cpu_srmcfg_default);
+
+/*
+ * Invalidate the per-CPU srmcfg cache, used as both the cpuhp startup and
+ * teardown callback. The sentinel is a value no real srmcfg encoding can
+ * produce (MCID << 16 | RCID, both fields well under 16 bits) so the next
+ * __switch_to_srmcfg() unconditionally writes the CSR.
+ *
+ * Ssqosid v1.0 leaves CSR state across hart stop/start implementation-
+ * defined, so the cached value cannot be trusted after online. Invalidating
+ * on offline as well means the sentinel persists across the offline period:
+ * a CPU brought back online finds the cache already invalidated before it is
+ * schedulable, closing the window where a task scheduled before the startup
+ * callback runs could match a stale cache and skip the CSR write while the
+ * hardware CSR was reset across hart stop/start.
+ */
+static int riscv_srmcfg_reset_cache(unsigned int cpu)
+{
+	per_cpu(cpu_srmcfg, cpu) = U32_MAX;
+	return 0;
+}
+
+/*
+ * CPU PM notifier: invalidate the cached srmcfg on resume from a deep
+ * idle / suspend. Ssqosid v1.0 leaves CSR_SRMCFG state across low-power
+ * transitions implementation-defined, and the boot CPU never goes
+ * through the cpuhp online callback during system suspend, so without
+ * this hook __switch_to_srmcfg() would skip the CSR write when the
+ * incoming task happens to share its srmcfg with the pre-suspend cache.
+ */
+static int riscv_srmcfg_pm_notify(struct notifier_block *nb,
+				  unsigned long action, void *unused)
+{
+	switch (action) {
+	case CPU_PM_EXIT:
+	case CPU_PM_ENTER_FAILED:
+		__this_cpu_write(cpu_srmcfg, U32_MAX);
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block riscv_srmcfg_pm_nb = {
+	.notifier_call = riscv_srmcfg_pm_notify,
+};
+
+static int __init riscv_srmcfg_init(void)
+{
+	int err;
+
+	if (!riscv_has_extension_unlikely(RISCV_ISA_EXT_SSQOSID))
+		return 0;
+
+	/*
+	 * cpuhp_setup_state() invokes the startup callback locally on every
+	 * already-online CPU, so no separate seed loop is needed here.
+	 */
+	err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "riscv/srmcfg:online",
+				riscv_srmcfg_reset_cache, riscv_srmcfg_reset_cache);
+	if (err < 0) {
+		pr_warn("srmcfg cpuhp registration failed (%d), cpus brought online after boot will not invalidate the CSR_SRMCFG cache\n",
+			err);
+		return err;
+	}
+
+	cpu_pm_register_notifier(&riscv_srmcfg_pm_nb);
+	return 0;
+}
+arch_initcall(riscv_srmcfg_init);

-- 
2.43.0


^ permalink raw reply related
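
For readers following the srmcfg context-switch logic in the patch above,
here is a minimal user-space sketch of the cache-and-sentinel idea. It is
an illustration only and not part of the patch. Assumptions: the masks
place RCID in bits 11:0 and MCID in bits 27:16, chosen to agree with the
"MCID << 16 | RCID" encoding named in the patch comments; struct sim_cpu
and every sim_*/srmcfg_pack name is hypothetical; the fallback to the
per-CPU default is applied unconditionally here, whereas the patch applies
it inside an outer condition that is not visible in this excerpt; the
fences around the CSR write are omitted because the model has no memory
system.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout: RCID in bits 11:0, MCID in bits 27:16. */
#define SRMCFG_RCID_MASK	0x00000FFFu
#define SRMCFG_MCID_MASK	0x0FFF0000u
#define SRMCFG_MCID_SHIFT	16
/* Sentinel: no valid encoding sets all 32 bits. */
#define SRMCFG_INVALID		UINT32_MAX

/* Hypothetical model of one CPU; not a kernel structure. */
struct sim_cpu {
	uint32_t cached;	/* models the per-CPU cpu_srmcfg cache */
	uint32_t dflt;		/* models cpu_srmcfg_default */
	uint32_t csr;		/* models the hardware srmcfg CSR */
	unsigned int writes;	/* number of simulated CSR writes */
};

/* Build an srmcfg value from its two fields. */
uint32_t srmcfg_pack(uint32_t rcid, uint32_t mcid)
{
	assert(rcid <= SRMCFG_RCID_MASK);
	assert(mcid <= (SRMCFG_MCID_MASK >> SRMCFG_MCID_SHIFT));
	return (mcid << SRMCFG_MCID_SHIFT) | rcid;
}

/* Start with an invalid cache so the first switch always writes. */
void sim_cpu_init(struct sim_cpu *cpu, uint32_t dflt)
{
	cpu->cached = SRMCFG_INVALID;
	cpu->dflt = dflt;
	cpu->csr = 0;
	cpu->writes = 0;
}

/* Models the cpuhp and CPU PM callbacks: forget the cached value. */
void sim_invalidate(struct sim_cpu *cpu)
{
	cpu->cached = SRMCFG_INVALID;
}

/* Models the switch path: fall back to CPU defaults, then write if changed. */
void sim_switch_to(struct sim_cpu *cpu, uint32_t thread_srmcfg)
{
	uint32_t rcid = thread_srmcfg & SRMCFG_RCID_MASK;
	uint32_t mcid = (thread_srmcfg & SRMCFG_MCID_MASK) >> SRMCFG_MCID_SHIFT;

	/* A zero field means "unassigned": inherit the CPU default. */
	if (rcid == 0)
		rcid = cpu->dflt & SRMCFG_RCID_MASK;
	if (mcid == 0)
		mcid = (cpu->dflt & SRMCFG_MCID_MASK) >> SRMCFG_MCID_SHIFT;
	thread_srmcfg = srmcfg_pack(rcid, mcid);

	/* Skip the (expensive) CSR write when the value is unchanged. */
	if (thread_srmcfg != cpu->cached) {
		cpu->cached = thread_srmcfg;
		cpu->csr = thread_srmcfg;
		cpu->writes++;
	}
}
```

In this model, switching twice to a task with the same RCID/MCID performs
one write, and calling sim_invalidate() in between forces a second one.
That is the behaviour the sentinel is meant to guarantee after hotplug or
resume, when the hardware CSR contents can no longer be trusted.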

* [PATCH 2/8] riscv: Detect the Ssqosid extension
From: Drew Fustini @ 2026-06-19 18:29 UTC (permalink / raw)
  To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
	Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
	Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
	Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
	Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
	Paul Walmsley, Peter Newman, Radim Krčmář,
	Reinette Chatre, Rob Herring, Samuel Holland,
	Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
	Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
  Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
	linux-doc
In-Reply-To: <20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org>

Ssqosid is the RISC-V Quality-of-Service (QoS) Identifiers specification
which defines the Supervisor Resource Management Configuration (srmcfg)
register.

Link: https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
Co-developed-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 arch/riscv/include/asm/hwcap.h | 1 +
 arch/riscv/kernel/cpufeature.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index 7ef8e5f55c8d..b83dae5cebb9 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -112,6 +112,7 @@
 #define RISCV_ISA_EXT_ZCLSD		103
 #define RISCV_ISA_EXT_ZICFILP		104
 #define RISCV_ISA_EXT_ZICFISS		105
+#define RISCV_ISA_EXT_SSQOSID		106
 
 #define RISCV_ISA_EXT_XLINUXENVCFG	127
 
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index f46aa5602d74..668a7e71ff1c 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -582,6 +582,7 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
 	__RISCV_ISA_EXT_DATA(ssaia, RISCV_ISA_EXT_SSAIA),
 	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
 	__RISCV_ISA_EXT_SUPERSET(ssnpm, RISCV_ISA_EXT_SSNPM, riscv_xlinuxenvcfg_exts),
+	__RISCV_ISA_EXT_DATA(ssqosid, RISCV_ISA_EXT_SSQOSID),
 	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
 	__RISCV_ISA_EXT_DATA(svade, RISCV_ISA_EXT_SVADE),
 	__RISCV_ISA_EXT_DATA_VALIDATE(svadu, RISCV_ISA_EXT_SVADU, riscv_ext_svadu_validate),

-- 
2.43.0


^ permalink raw reply related

* [PATCH 1/8] dt-bindings: riscv: Add Ssqosid extension description
From: Drew Fustini @ 2026-06-19 18:29 UTC (permalink / raw)
  To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
	Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
	Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
	Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
	Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
	Paul Walmsley, Peter Newman, Radim Krčmář,
	Reinette Chatre, Rob Herring, Samuel Holland,
	Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
	Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
  Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
	linux-doc
In-Reply-To: <20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org>

Document the ratified Supervisor-mode Quality of Service ID (Ssqosid)
extension v1.0.

Link: https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 Documentation/devicetree/bindings/riscv/extensions.yaml | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
index 2b0a8a93bb21..1c6f091518d4 100644
--- a/Documentation/devicetree/bindings/riscv/extensions.yaml
+++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
@@ -232,6 +232,12 @@ properties:
             ratified at commit d70011dde6c2 ("Update to ratified state")
             of riscv-j-extension.
 
+        - const: ssqosid
+          description: |
+            The standard Ssqosid extension for Quality of Service ID is
+            ratified as v1.0 in commit d9c616497fde ("Merge pull
+            request #7 from ved-rivos/Ratified") of riscv-ssqosid.
+
         - const: ssstateen
           description: |
             The standard Ssstateen extension for supervisor-mode view of the

-- 
2.43.0


^ permalink raw reply related

* [PATCH 0/8] riscv: Add Ssqosid and initial CBQRI resctrl support
From: Drew Fustini @ 2026-06-19 18:29 UTC (permalink / raw)
  To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
	Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
	Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
	Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
	Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
	Paul Walmsley, Peter Newman, Radim Krčmář,
	Reinette Chatre, Rob Herring, Samuel Holland,
	Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
	Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
  Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
	linux-doc

This series adds initial RISC-V QoS support: the Ssqosid extension [1]
(srmcfg CSR), the CBQRI controller interface [2] integrated with resctrl
[3], and a DT-based platform driver for cache controllers. It has been
tested on both the Tenstorrent Ascalon Shared Cache controller and a
QEMU implementation [4].

Note that this series only implements support for resctrl CAT using
CBQRI capacity allocation control. cc_block_mask maps onto resctrl's
existing cbm schema. However, cc_cunits is not supported as there is no
existing equivalent for capacity units in the resctrl schemata.

I had previously been iterating on an RFC series [5] that did a full
implementation of CBQRI including capacity monitoring, bandwidth
allocation and monitoring, as well as a parser for the ACPI RQSC table.
The bandwidth controls for CBQRI do not fit well into resctrl's existing
throttle based MB schemata. I believe that the path forward is
Reinette's generic schema description proof of concept [6] but that will
take time to mature. My plan is to rebase full CBQRI support onto the
generic schema once it is ready.

[1] https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
[2] https://github.com/riscv-non-isa/riscv-cbqri/releases/tag/v1.0
[3] https://docs.kernel.org/filesystems/resctrl.html
[4] https://github.com/riscv-non-isa/riscv-rqsc/blob/main/src/
[5] https://lore.kernel.org/linux-riscv/20260601-ssqosid-cbqri-rqsc-v7-0-v6-16-baf00f50028a@kernel.org/
[6] https://lore.kernel.org/all/aab804b9-e8b5-40ad-a85b-af7033391243@intel.com/

---
Drew Fustini (8):
      dt-bindings: riscv: Add Ssqosid extension description
      riscv: Detect the Ssqosid extension
      riscv: Add support for srmcfg CSR from Ssqosid extension
      riscv_cbqri: Add capacity controller probe and allocation device ops
      riscv_cbqri: resctrl: Add cache allocation via capacity block mask
      riscv: Enable resctrl filesystem for Ssqosid
      dt-bindings: riscv: Add generic CBQRI controller binding
      riscv_cbqri: Add CBQRI cache capacity-allocation platform driver

 .../devicetree/bindings/riscv/extensions.yaml      |   6 +
 .../devicetree/bindings/riscv/riscv,cbqri.yaml     | 109 +++
 MAINTAINERS                                        |  15 +
 arch/riscv/Kconfig                                 |  20 +
 arch/riscv/include/asm/csr.h                       |   5 +
 arch/riscv/include/asm/hwcap.h                     |   1 +
 arch/riscv/include/asm/processor.h                 |   3 +
 arch/riscv/include/asm/qos.h                       |  86 +++
 arch/riscv/include/asm/resctrl.h                   | 152 ++++
 arch/riscv/include/asm/switch_to.h                 |   3 +
 arch/riscv/kernel/Makefile                         |   2 +
 arch/riscv/kernel/cpufeature.c                     |   1 +
 arch/riscv/kernel/qos.c                            |  91 +++
 drivers/resctrl/Kconfig                            |  44 ++
 drivers/resctrl/Makefile                           |   7 +
 drivers/resctrl/cbqri_capacity.c                   | 132 ++++
 drivers/resctrl/cbqri_devices.c                    | 511 ++++++++++++++
 drivers/resctrl/cbqri_internal.h                   | 110 +++
 drivers/resctrl/cbqri_resctrl.c                    | 774 +++++++++++++++++++++
 include/linux/riscv_cbqri.h                        |  47 ++
 20 files changed, 2119 insertions(+)
---
base-commit: 4fa3f5fabb30bf00d7475d5a33459ea83d639bf9
change-id: 20260610-dfustini-atl-sc-cbqri-dt-410c8e2711dd

Best regards,
--  
Drew Fustini <fustini@kernel.org>


^ permalink raw reply

* Re: [PATCH v6 07/10] ACPI: APEI: introduce GHES helper
From: Julian Braha @ 2026-06-19 17:46 UTC (permalink / raw)
  To: Ahmed Tiba, Rafael J. Wysocki, Tony Luck, Borislav Petkov,
	Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue, Len Brown,
	Saket Dumbre, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
	Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Jonathan Corbet,
	Shuah Khan
  Cc: linux-kernel, linux-acpi, acpica-devel, linux-cxl, devicetree,
	linux-edac, linux-doc, Dmitry.Lamerov
In-Reply-To: <81dd6d0d-427f-49ae-9573-fbe84dc2185a@arm.com>

On 6/19/26 16:45, Ahmed Tiba wrote:
> GHES_CPER_HELPERS is intended for both the ACPI GHES path and the DT
> firmware-first provider, so I do not want to tie it to ACPI.

So what's the plan to fix the build error when ACPI is disabled:
https://lore.kernel.org/all/0f131ee4-d335-45d2-b6ae-49c18df1353b@gmail.com/

- Julian Braha

^ permalink raw reply

* Re: [PATCH 0/3] docs/zh_CN: update translation of doc-guide/sphinx.rst
From: Jiandong Qiu @ 2026-06-19 16:28 UTC (permalink / raw)
  To: Jonathan Corbet, alexs, si.yanteng; +Cc: dzm91, skhan, linux-doc, linux-kernel
In-Reply-To: <87ldcatxm5.fsf@trenco.lwn.net>

On 6/19/26 10:26 PM, Jonathan Corbet wrote:
> I've added a couple of comments, though I need to defer to others to
> judge the translation work itself.  I do have one question, though: did
> you do the translation yourself, or did you use some sort of tool?  In
> the latter case, you need to document that usage with Assisted-by tags.

Yes, I used Codex to help with the translation. Sorry for not
documenting that in the original submission. I will add the appropriate
Assisted-by tag in the next version.

Thanks,
Jiandong

^ permalink raw reply

* Re: [PATCH 2/3] docs/zh_CN: add process/changes.rst translation
From: Jiandong Qiu @ 2026-06-19 16:25 UTC (permalink / raw)
  To: Jonathan Corbet, alexs, si.yanteng; +Cc: dzm91, skhan, linux-doc, linux-kernel
In-Reply-To: <87pl1mtxqc.fsf@trenco.lwn.net>

On 6/19/26 10:23 PM, Jonathan Corbet wrote:
> Here too, we don't need this label.
> 
> (Yes, I'm quibbling on details because I am in no position to judge the
> translation itself :)
Hi Jon,

Thank you for pointing this out.

I don't think this is quibbling at all; it sounds like a good cleanup to
me. I was following the structure of the original English document
closely, so I did not realize that these top-of-file labels were
something we should avoid now.

I will drop these labels in the next version and just refer to the files
by name where needed.

Thanks,
Jiandong

^ permalink raw reply

* Re: [PATCH v6 15/19] drm/connector: Add new atomic_create_state callback
From: Luca Ceresoli @ 2026-06-19 16:24 UTC (permalink / raw)
  To: Maxime Ripard, Maarten Lankhorst, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jonathan Corbet, Shuah Khan, Dmitry Baryshkov,
	Jyri Sarha, Tomi Valkeinen, Andrzej Hajda, Neil Armstrong,
	Robert Foss, Laurent Pinchart, Jonas Karlman, Jernej Skrabec,
	Simon Ser, Harry Wentland, Melissa Wen, Sebastian Wick, Alex Hung,
	Jani Nikula, Rodrigo Vivi, Joonas Lahtinen, Tvrtko Ursulin,
	Chen-Yu Tsai, Samuel Holland, Dave Stevenson, Maíra Canal,
	Raspberry Pi Kernel Maintenance
  Cc: dri-devel, linux-doc, linux-kernel, Daniel Stone, intel-gfx,
	intel-xe, linux-arm-kernel, linux-sunxi, Laurent Pinchart
In-Reply-To: <20260526-drm-mode-config-init-v6-15-852346394200@kernel.org>

Hello Maxime, Dmitry, all,

On Tue May 26, 2026 at 6:46 PM CEST, Maxime Ripard wrote:
> Commit 47b5ac7daa46 ("drm/atomic: Add new atomic_create_state callback
> to drm_private_obj") introduced a new pattern for allocating drm object
> states.
>
> Instead of relying on the reset() callback, it created a new
> atomic_create_state hook. This is helpful because reset is a bit
> overloaded: it's used to create the initial software state, reset it,
> but also reset the hardware.
>
> It can also be used either at probe time, to create the initial state
> and possibly reset the hardware to an expected default, but also during
> suspend/resume.
>
> Both these cases come with different expectations too: during the
> initialization, we want to initialize all states, but during
> suspend/resume, drm_private_states for example are expected to be kept
> around.
>
> reset() also isn't fallible, which makes it harder to handle
> initialization errors properly. This is only really relevant for some
> drivers though, since all the helpers for reset only create a new
> state, and don't touch the hardware at all.
>
> It was thus decided to create a new hook that would allocate and
> initialize a pristine state without any side effect:
> atomic_create_state to untangle a bit some of it, and to separate the
> initialization with the actual reset one might need during a
> suspend/resume.
>
> Continue the transition to the new pattern with connectors.
>
> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
> Signed-off-by: Maxime Ripard <mripard@kernel.org>

While rebasing another series onto current drm-misc-next, which now includes
this patch, I ran into trouble and I'm not sure what the right thing to do
is. I hope you can help me clarify this. See below for my question.

FTR the series I'm rebasing is "drm bridge hotplug", but the question is
not specific to that series.

> --- a/drivers/gpu/drm/drm_connector.c
> +++ b/drivers/gpu/drm/drm_connector.c
> @@ -616,11 +616,19 @@ int drmm_connector_hdmi_init(struct drm_device *dev,
>
>  	/*
>  	 * drm_connector_attach_max_bpc_property() requires the
>  	 * connector to have a state.
>  	 */
> -	if (connector->funcs->reset)
> +	if (connector->funcs->atomic_create_state) {
> +		struct drm_connector_state *state;
> +
> +		state = connector->funcs->atomic_create_state(connector);
> +		if (IS_ERR(state))
> +			return PTR_ERR(state);
> +
> +		connector->state = state;
> +	} else if (connector->funcs->reset)
>  		connector->funcs->reset(connector);

Here a state is added to connector->state, and that's fine.

However non-HDMI connectors don't get a state created by default.

I was hit by this with the drm_bridge_connector, which can add either an
HDMI or a non-HDMI connector [0]. In the former case it calls
drmm_connector_hdmi_init(), which creates the state (in the hunk quoted
above). In the latter case, as I experienced at runtime and confirmed by
code inspection, it does not create a state: no one calls
connector->funcs->atomic_create_state.

I suspect this is related to patch 19/19 which converted the
drm_bridge_connector from drm_atomic_helper_connector_reset() to
drm_atomic_helper_connector_create_state(), and only the former sets
'connector->state = conn_state'.

Generally speaking, it looks like a state is created only for HDMI
connectors.

The hardware I have uses the drm_bridge_connector in the non-HDMI case, so
the state is not created and this results in a NULL pointer deref later on,
in my case it's in drm_atomic_connector_get_property().

Am I missing anything obvious?

For now I've come up with a quick workaround, adding (roughly after
connector init at [1]):

        if (!connector->state)
                connector->state = drm_bridge_connector_create_state(connector);

I'm not sure what the best solution would be. Maybe take the whole
atomic_create_state/reset state creation calls [2] from
drmm_connector_hdmi_init() and hoist them up into
drmm_connector_init(), so all connectors benefit?

Let me know what you think.

[0] https://gitlab.freedesktop.org/drm/misc/kernel/-/blob/7a921d111810652672e02c392b35fdcefa4d5030/drivers/gpu/drm/display/drm_bridge_connector.c#L995-1029
[1] https://gitlab.freedesktop.org/drm/misc/kernel/-/blob/7a921d111810652672e02c392b35fdcefa4d5030/drivers/gpu/drm/display/drm_bridge_connector.c#L1030
[2] https://gitlab.freedesktop.org/drm/misc/kernel/-/blob/7a921d111810652672e02c392b35fdcefa4d5030/drivers/gpu/drm/drm_connector.c#L617-631

Kind regards,
Luca

--
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply
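
To make the question in the message above concrete, here is a minimal
user-space sketch of the "create the state via atomic_create_state if
present, else fall back to reset" pattern, pulled into one common init
helper so that every connector type gets a state. It is an illustration
only. Assumptions: all mock_* types and functions are hypothetical
stand-ins and not the DRM API; err_ptr/is_err/ptr_err are simplified
imitations of the kernel's error-pointer helpers; the early return when a
state already exists is an addition of this sketch and does not appear in
the quoted hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified imitation of kernel error pointers (assumption). */
#define MOCK_MAX_ERRNO 4095

static inline void *err_ptr(long err)
{
	return (void *)err;
}

static inline int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MOCK_MAX_ERRNO;
}

static inline long ptr_err(const void *p)
{
	return (long)p;
}

/* Hypothetical stand-ins for connector, state and funcs. */
struct mock_state {
	int dummy;
};

struct mock_connector;

struct mock_funcs {
	struct mock_state *(*atomic_create_state)(struct mock_connector *c);
	void (*reset)(struct mock_connector *c);
};

struct mock_connector {
	const struct mock_funcs *funcs;
	struct mock_state *state;
};

/*
 * Common helper: give any connector an initial state.
 * Prefers the fallible atomic_create_state hook, falls back to reset.
 */
int mock_connector_init_state(struct mock_connector *c)
{
	assert(c && c->funcs);

	if (c->state)	/* sketch-only guard: keep an existing state */
		return 0;

	if (c->funcs->atomic_create_state) {
		struct mock_state *s = c->funcs->atomic_create_state(c);

		if (is_err(s))
			return (int)ptr_err(s);	/* propagate the failure */
		c->state = s;
	} else if (c->funcs->reset) {
		c->funcs->reset(c);	/* legacy path sets c->state itself */
	}
	return 0;
}

/* Example hook that succeeds unless allocation fails. */
struct mock_state *mock_create_ok(struct mock_connector *c)
{
	struct mock_state *s = calloc(1, sizeof(*s));

	(void)c;
	return s ? s : err_ptr(-ENOMEM);
}

/* Example hook that always fails, to exercise the error path. */
struct mock_state *mock_create_fail(struct mock_connector *c)
{
	(void)c;
	return err_ptr(-ENOMEM);
}
```

The point of the sketch is only that the create-or-reset decision lives in
one place, so an HDMI and a non-HDMI connector both end up with a non-NULL
state or a reported error. Whether drmm_connector_init() is the right
place for it in the real code is the open question in the thread.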

* Re: [PATCH v6 00/10] ACPI: APEI: share GHES CPER helpers and add DT FFH provider
From: Borislav Petkov @ 2026-06-19 16:16 UTC (permalink / raw)
  To: Ahmed Tiba
  Cc: Rafael J. Wysocki, Tony Luck, Hanjun Guo, Mauro Carvalho Chehab,
	Shuai Xue, Len Brown, Saket Dumbre, Davidlohr Bueso,
	Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Jonathan Corbet, Shuah Khan, linux-kernel,
	linux-acpi, acpica-devel, linux-cxl, devicetree, linux-edac,
	linux-doc, Dmitry.Lamerov
In-Reply-To: <eccbf574-a145-47af-889b-ca6dd80f98f2@arm.com>

On Fri, Jun 19, 2026 at 04:41:40PM +0100, Ahmed Tiba wrote:
> I will address the issues introduced by this series. Pre-existing
> behaviour is carried forward unchanged.

So you carve out that code, you use it for your use case while *knowing* there
are preexisting bugs. Wonderful.

Sorry, first bug fixes then features.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply

* RE: [PATCH net-next v5 12/15] onsemi: s2500: Add driver support for TS2500 MAC-PHY
From: Selvamani Rajagopal @ 2026-06-19 16:05 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Andrew Lunn, Piergiorgio Beruto, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, Parthiban Veerasooran, Richard Cochran, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Simon Horman, Jonathan Corbet,
	Shuah Khan, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	devicetree@vger.kernel.org, linux-doc@vger.kernel.org, Jerry Ray
In-Reply-To: <ajVKfBKPuNk9zN7b@monoceros>


Thanks for your feedback. I will take care of all three comments.

> -----Original Message-----
> Subject: Re: [PATCH net-next v5 12/15] onsemi: s2500: Add driver support for TS2500
> MAC-PHY
> 
> On Sun, Jun 14, 2026 at 10:00:28AM -0700, Selvamani Rajagopal via B4 Relay wrote:
> > +static const struct of_device_id s2500_of_match[] = {
> > +	{ .compatible = "onnn,s2500" },
> > +	{}
> 
> s/{}/{ }/
> 
> > +};
> > +
> > +static const struct spi_device_id s2500_ids[] = {
> > +	{ "s2500" },
> > +	{}
> > +};
> 
> Please make this:
> 
> static const struct spi_device_id s2500_ids[] = {
> 	{ .name = "s2500" },
> 	{ }
> };
> 
> > +MODULE_DEVICE_TABLE(spi, s2500_ids);
> > +
> > +static struct spi_driver s2500_driver = {
> > +	.driver = {
> > +		.name	= DRV_NAME,
> > +		.of_match_table = s2500_of_match,
> > +	},
> > +	.probe		= s2500_probe,
> > +	.remove		= s2500_remove,
> > +	.id_table	= s2500_ids,
> 
> Tastes are different, but the idea to align = is usually screwed by
> follow up patches. Here it's broken from the start. If you ask me: Use a
> single space before each =.
> 
> > +};
> > +
> > +module_spi_driver(s2500_driver);
> 
> Usually there is no empty line between the driver struct and the macro
> registering it.
> 
> 
> Best regards
> Uwe


^ permalink raw reply
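
As a small illustration of the ID-table style requested in the review
above (a designated .name initializer plus a terminating empty entry),
here is a self-contained user-space sketch. Assumptions: struct
mock_spi_id, mock_ids and mock_match() are hypothetical and only imitate
the shape of a sentinel-terminated device ID table; the sentinel is
written as { .name = "" } to stay within portable ISO C, where kernel
code would spell it { }.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device ID entry. */
struct mock_spi_id {
	char name[32];
};

/* Sentinel-terminated table; the empty name marks the end. */
static const struct mock_spi_id mock_ids[] = {
	{ .name = "s2500" },
	{ .name = "" },	/* sentinel; kernel code spells this "{ }" */
};

/* Walk the table until the sentinel and return the matching entry. */
const struct mock_spi_id *mock_match(const struct mock_spi_id *table,
				     const char *name)
{
	assert(table && name);

	for (; table->name[0]; table++)
		if (strcmp(table->name, name) == 0)
			return table;
	return NULL;
}
```

The designated initializer keeps the entry correct even if fields are
added or reordered in the structure, and the sentinel is what lets the
lookup loop terminate without a separate length argument.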

* Re: [PATCH v3 1/2] dt-bindings: iio: dac: Add AD5529R
From: Nuno Sá @ 2026-06-19 15:54 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Janani Sunil, Jonathan Cameron, Rodrigo Alencar, Janani Sunil,
	Lars-Peter Clausen, Michael Hennerich, David Lechner,
	Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Philipp Zabel, Jonathan Corbet, Shuah Khan,
	linux-iio, devicetree, linux-kernel, linux-doc, Mark Brown
In-Reply-To: <20260619-concierge-doozy-9c161533c369@spud>

On Fri, Jun 19, 2026 at 03:12:07PM +0100, Conor Dooley wrote:
> On Fri, Jun 19, 2026 at 02:01:08PM +0100, Nuno Sá wrote:
> > On Fri, Jun 19, 2026 at 12:40:54PM +0100, Conor Dooley wrote:
> > > On Fri, Jun 19, 2026 at 12:36:55PM +0100, Conor Dooley wrote:
> > > > On Fri, Jun 19, 2026 at 12:33:11PM +0200, Janani Sunil wrote:
> > > > > 
> > > > > On 6/14/26 21:44, Jonathan Cameron wrote:
> > > > > > On Tue, 9 Jun 2026 16:47:23 +0200
> > > > > > Janani Sunil <jan.sun97@gmail.com> wrote:
> > > > > > 
> > > > > > > On 5/26/26 15:11, Rodrigo Alencar wrote:
> > > > > > > > On 26/05/19 05:42PM, Janani Sunil wrote:
> > > > > > > > > Devicetree bindings for AD5529R 16 channel 12/16 bit high voltage,
> > > > > > > > > buffered voltage output digital-to-analog converter (DAC) with an
> > > > > > > > > integrated precision reference.
> > > > > > > > ...
> > > > > > > > Probably others may comment on that, but...
> > > > > > > > 
> > > > > > > > This parent node may support device addressing for multi-device support through
> > > > > > > > those ID pins. I suppose that each device may have its own power supplies or
> > > > > > > > other resources like the toggle pins or reset and enable.
> > > > > > > > 
> > > > > > > > That way I suppose that an example would look like...
> > > > > > > > > +
> > > > > > > > > +patternProperties:
> > > > > > > > > +  "^channel@([0-9]|1[0-5])$":
> > > > > > > > > +    type: object
> > > > > > > > > +    description: Child nodes for individual channel configuration
> > > > > > > > > +
> > > > > > > > > +    properties:
> > > > > > > > > +      reg:
> > > > > > > > > +        description: Channel number.
> > > > > > > > > +        minimum: 0
> > > > > > > > > +        maximum: 15
> > > > > > > > > +
> > > > > > > > > +      adi,output-range-microvolt:
> > > > > > > > > +        description: |
> > > > > > > > > +          Output voltage range for this channel as [min, max] in microvolts.
> > > > > > > > > +          If not specified, defaults to 0V to 5V range.
> > > > > > > > > +        oneOf:
> > > > > > > > > +          - items:
> > > > > > > > > +              - const: 0
> > > > > > > > > +              - enum: [5000000, 10000000, 20000000, 40000000]
> > > > > > > > > +          - items:
> > > > > > > > > +              - const: -5000000
> > > > > > > > > +              - const: 5000000
> > > > > > > > > +          - items:
> > > > > > > > > +              - const: -10000000
> > > > > > > > > +              - const: 10000000
> > > > > > > > > +          - items:
> > > > > > > > > +              - const: -15000000
> > > > > > > > > +              - const: 15000000
> > > > > > > > > +          - items:
> > > > > > > > > +              - const: -20000000
> > > > > > > > > +              - const: 20000000
> > > > > > > > > +
> > > > > > > > > +    required:
> > > > > > > > > +      - reg
> > > > > > > > > +
> > > > > > > > > +    additionalProperties: false
> > > > > > > > > +
> > > > > > > > > +required:
> > > > > > > > > +  - compatible
> > > > > > > > > +  - reg
> > > > > > > > > +  - vdd-supply
> > > > > > > > > +  - avdd-supply
> > > > > > > > > +  - hvdd-supply
> > > > > > > > > +
> > > > > > > > > +dependencies:
> > > > > > > > > +  spi-cpha: [ spi-cpol ]
> > > > > > > > > +  spi-cpol: [ spi-cpha ]
> > > > > > > > > +
> > > > > > > > > +allOf:
> > > > > > > > > +  - $ref: /schemas/spi/spi-peripheral-props.yaml#
> > > > > > > > > +
> > > > > > > > > +unevaluatedProperties: false
> > > > > > > > > +
> > > > > > > > > +examples:
> > > > > > > > > +  - |
> > > > > > > > > +    #include <dt-bindings/gpio/gpio.h>
> > > > > > > > > +
> > > > > > > > > +    spi {
> > > > > > > > > +        #address-cells = <1>;
> > > > > > > > > +        #size-cells = <0>;
> > > > > > > > > +
> > > > > > > > > +        dac@0 {
> > > > > > > > > +            compatible = "adi,ad5529r-16";
> > > > > > > > > +            reg = <0>;
> > > > > > > > > +            spi-max-frequency = <25000000>;
> > > > > > > > > +
> > > > > > > > > +            vdd-supply = <&vdd_regulator>;
> > > > > > > > > +            avdd-supply = <&avdd_regulator>;
> > > > > > > > > +            hvdd-supply = <&hvdd_regulator>;
> > > > > > > > > +            hvss-supply = <&hvss_regulator>;
> > > > > > > > > +
> > > > > > > > > +            reset-gpios = <&gpio0 87 GPIO_ACTIVE_LOW>;
> > > > > > > > > +
> > > > > > > > > +            #address-cells = <1>;
> > > > > > > > > +            #size-cells = <0>;
> > > > > > > > > +
> > > > > > > > > +            channel@0 {
> > > > > > > > > +                reg = <0>;
> > > > > > > > > +                adi,output-range-microvolt = <0 5000000>;
> > > > > > > > > +            };
> > > > > > > > > +
> > > > > > > > > +            channel@1 {
> > > > > > > > > +                reg = <1>;
> > > > > > > > > +                adi,output-range-microvolt = <(-10000000) 10000000>;
> > > > > > > > > +            };
> > > > > > > > > +
> > > > > > > > > +            channel@2 {
> > > > > > > > > +                reg = <2>;
> > > > > > > > > +                adi,output-range-microvolt = <0 40000000>;
> > > > > > > > > +            };
> > > > > > > > > +        };
> > > > > > > > > +    };
> > > > > > > > ...
> > > > > > > > 
> > > > > > > > 	spi {
> > > > > > > > 		#address-cells = <1>;
> > > > > > > > 		#size-cells = <0>;
> > > > > > > > 
> > > > > > > > 		multi-dac@0 {
> > > > > > > > 			compatible = "adi,ad5529r-16";
> > > > > > > > 			reg = <0>;
> > > > > > > > 			spi-max-frequency = <25000000>;
> > > > > > > > 
> > > > > > > > 			#address-cells = <1>;
> > > > > > > > 			#size-cells = <0>;
> > > > > > > > 
> > > > > > > > 			dac@0 {
> > > > > > > > 				reg = <0>;
> > > > > > > > 				vdd-supply = <&vdd_regulator>;
> > > > > > > > 				avdd-supply = <&avdd_regulator>;
> > > > > > > > 				hvdd-supply = <&hvdd_regulator>;
> > > > > > > > 				hvss-supply = <&hvss_regulator>;
> > > > > > > > 
> > > > > > > > 				reset-gpios = <&gpio0 87 GPIO_ACTIVE_LOW>;
> > > > > > > > 
> > > > > > > > 				#address-cells = <1>;
> > > > > > > > 				#size-cells = <0>;
> > > > > > > > 
> > > > > > > > 				channel@0 {
> > > > > > > > 					reg = <0>;
> > > > > > > > 					adi,output-range-microvolt = <0 5000000>;
> > > > > > > > 				};
> > > > > > > > 
> > > > > > > > 				channel@1 {
> > > > > > > > 					reg = <1>;
> > > > > > > > 					adi,output-range-microvolt = <(-10000000) 10000000>;
> > > > > > > > 				};
> > > > > > > > 
> > > > > > > > 				channel@2 {
> > > > > > > > 					reg = <2>;
> > > > > > > > 					adi,output-range-microvolt = <0 40000000>;
> > > > > > > > 				};
> > > > > > > > 			}
> > > > > > > > 
> > > > > > > > 			dac@1 {
> > > > > > > > 				reg = <1>;
> > > > > > > > 				vdd-supply = <&vdd_regulator>;
> > > > > > > > 				avdd-supply = <&avdd_regulator>;
> > > > > > > > 				hvdd-supply = <&hvdd_regulator>;
> > > > > > > > 				hvss-supply = <&hvss_regulator>;
> > > > > > > > 
> > > > > > > > 				reset-gpios = <&gpio0 88 GPIO_ACTIVE_LOW>;
> > > > > > > > 
> > > > > > > > 				#address-cells = <1>;
> > > > > > > > 				#size-cells = <0>;
> > > > > > > > 
> > > > > > > > 				channel@0 {
> > > > > > > > 					reg = <0>;
> > > > > > > > 					adi,output-range-microvolt = <0 5000000>;
> > > > > > > > 				};
> > > > > > > > 
> > > > > > > > 				channel@1 {
> > > > > > > > 					reg = <1>;
> > > > > > > > 					adi,output-range-microvolt = <(-10000000) 10000000>;
> > > > > > > > 				};
> > > > > > > > 			}
> > > > > > > > 		};
> > > > > > > > 	};
> > > > > > > > 
> > > > > > > > then you might need something like:
> > > > > > > > 
> > > > > > > > 	patternProperties:
> > > > > > > > 		"^dac@[0-3]$":
> > > > > > > > 
> > > > > > > > and put most of the things under this node pattern.
> > > > > > > > 
> > > > > > > > So the main driver that you're putting together might need to handle up to four instances.
> > > > > > > > > Even if your current driver cannot handle this, the dt-bindings might need to cover that.
> > > > > > > > 
> > > > > > > > Need to double check if each dac node needs a separate compatible, so you would maybe populate
> > > > > > > > a platform data to be shared with the child nodes, which would be a separate driver.
> > > > > > > > (not sure if it would make sense to mix and match ad5529r-16 and ad5529r-12).
> > > > > > > Hi Rodrigo,
> > > > > > > 
> > > > > > > Thank you for looking at this.
> > > > > > > 
> > > > > > > For now, I would prefer to keep the binding scoped to a single AD5529R device instance. The current
> > > > > > > hardware/use case we have only needs one device node and the driver is written around that model as well.
> > > > > > > While the device addressing pins could allow multi-device topology, we do not have an actual platform using
> > > > > > > that configuration at the moment, so I would prefer not to introduce an extra parent/child binding structure
> > > > > > > speculatively without a validating use case.
> > > > > > Interesting feature - kind of similar to address control on a typical i2c bus device, or
> > > > > > looking at it another way a kind of distributed SPI mux.
> > > > > > 
> > > > > > Challenge of a binding is we need to anticipate the future.  So I think we do need something
> > > > > > like Rodrigo is suggesting even if we only (for now) support a single instance in the driver.
> > > > > > That would leave the path open to supporting the addressing at a later date.
> > > > > > An alternative might be to look at it like a chained device setup. In those we pretend there
> > > > > > is just one device with a lot of channels etc.  The snag is that here things are more loosely
> > > > > > coupled whereas for those devices it tends to be you have to read / write the same register
> > > > > > in all devices in the chain as one big SPI message.
> > > > > > 
> > > > > > +CC Mark Brown as he may know of some precedence for this feature. For his reference..
> > > > > > - Each of these device has 2 ID pins.  The SPI transfers have to contain the 2 bit
> > > > > > value that matches that or they are ignored.  Thus a single bus + 1 chip select can
> > > > > > be used to talk to 4 devices.  Question is what that looks like in device tree + I guess
> > > > > > longer term how to support it cleanly in SPI.
> > > > 
> > > > I'd swear I have seen this before, from some Microchip devices. Let me
> > > > see if I can find what I am thinking of...
> > > 
> > > 
> > > microchip,mcp3911 and microchip,mcp3564 both seem to do this with
> > > slightly different properties.
> > > 
> > >   microchip,device-addr:
> > >     description: Device address when multiple MCP3911 chips are present on the same SPI bus.
> > >     $ref: /schemas/types.yaml#/definitions/uint32
> > >     enum: [0, 1, 2, 3]
> > >     default: 0
> > > 
> > > and
> > > 
> > > 
> > >   microchip,hw-device-address:
> > >     $ref: /schemas/types.yaml#/definitions/uint32
> > >     minimum: 0
> > >     maximum: 3
> > >     description:
> > >       The address is set on a per-device basis by fuses in the factory,
> > >       configured on request. If not requested, the fuses are set for 0x1.
> > >       The device address is part of the device markings to avoid
> > >       potential confusion. This address is coded on two bits, so four possible
> > >       addresses are available when multiple devices are present on the same
> > >       SPI bus with only one Chip Select line for all devices.
> > >       Each device communication starts by a CS falling edge, followed by the
> > >       clocking of the device address (BITS[7:6] - top two bits of COMMAND BYTE
> > >       which is first one on the wire).
> > > 
> > > This sounds exactly like the sort of feature that you're dealing with
> > > here?
> > > 
> > 
> > The core idea yes but for this chip, things are a bit more annoying (but
> > Janani can correct me if I'm wrong). Here, each device can, in theory,
> > > have its own supplies, pins and at the very least, channels with maybe
> > different scales. That is why Janani is proposing dac nodes. Given I
> > honestly don't like much of that "adi,ad5529r-bus" compatible I wondered
> > about solving this at the spi level.
> > 
> > Ah and to make it more annoying, we can also mix 12 and 16 bits variants
> > together in the same bus.
> 
> > I'm definitely missing something, because that property for the
> > microchip devices is not impacted by what else is on the bus. AFAICT, you
> could have an mcp3911 and an mcp3564 on the same bus even though both
> are completely different devices with different drivers. They have
> individual device nodes and their own supplies etc etc. These aren't
> per-channel properties on an adc or dac, they're per child device on a
> spi bus.

Maybe I'm the one missing something :). IIRC, SPI would not allow two
devices on the same CS, right? Because for this chip we would need
something like:

spi {
	dac@0 {
		reg = <0>;
		adi,pin-id = <0>;
	};

	dac@1 {
		reg = <0>; // which seems already problematic?
		adi,pin-id = <1>;
	};

	...

	//up to 4
};
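
For reference, the addressing scheme described in the mcp3564 binding
quoted above (device address in BITS[7:6] of the command byte) could be
sketched in C roughly as below. The helper name and the assumption that
the remaining six bits carry the command are illustrative only; the
AD5529R frame layout is not described in this thread and may differ.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from any driver): place a 2-bit hardware
 * address in bits [7:6] of the first byte on the wire and keep the
 * low six bits for the command itself.
 */
static inline uint8_t cmd_with_dev_addr(uint8_t dev_addr, uint8_t cmd)
{
	return (uint8_t)(((dev_addr & 0x3u) << 6) | (cmd & 0x3fu));
}
```

With something like this, every device on the shared CS sees the same
frame and only the one whose ID pins match the address bits responds.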

- Nuno Sá


^ permalink raw reply

* Re: [PATCH v6 07/10] ACPI: APEI: introduce GHES helper
From: Ahmed Tiba @ 2026-06-19 15:45 UTC (permalink / raw)
  To: Julian Braha, Rafael J. Wysocki, Tony Luck, Borislav Petkov,
	Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue, Len Brown,
	Saket Dumbre, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
	Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Jonathan Corbet,
	Shuah Khan
  Cc: linux-kernel, linux-acpi, acpica-devel, linux-cxl, devicetree,
	linux-edac, linux-doc, Dmitry.Lamerov
In-Reply-To: <58f7163f-2fce-41e9-bc35-d1d8e6f4a298@gmail.com>

On 17/06/2026 18:17, Julian Braha wrote:
> Hi Ahmed,
> 
> On 6/17/26 14:54, Ahmed Tiba wrote:
> 
>> +config GHES_CPER_HELPERS
>> +	bool
>> +	select UEFI_CPER
> 
> This config option should probably also depend on ACPI (could just move
> it into the if ACPI..endif block), or at least have a comment that
> selector options ensure ACPI is enabled.
> 
> - Julian Braha

GHES_CPER_HELPERS is intended for both the ACPI GHES path and the DT
firmware-first provider, so I do not want to tie it to ACPI.

Best regards,
Ahmed

^ permalink raw reply

* Re: [PATCH v6 00/10] ACPI: APEI: share GHES CPER helpers and add DT FFH provider
From: Ahmed Tiba @ 2026-06-19 15:41 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Rafael J. Wysocki, Tony Luck, Hanjun Guo, Mauro Carvalho Chehab,
	Shuai Xue, Len Brown, Saket Dumbre, Davidlohr Bueso,
	Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Jonathan Corbet, Shuah Khan, linux-kernel,
	linux-acpi, acpica-devel, linux-cxl, devicetree, linux-edac,
	linux-doc, Dmitry.Lamerov
In-Reply-To: <20260618164807.GAajQhR9J_00j4LxaC@fat_crate.local>

On 18/06/2026 17:48, Borislav Petkov wrote:
> On Wed, Jun 17, 2026 at 02:54:38PM +0100, Ahmed Tiba wrote:
>> This is v6 of the GHES refactor series. Compared to v5, it addresses
>> the latest review comments and tightens the DT CPER provider and
>> related helper wiring.
> 
> Sashiko has comments:
> 
> https://sashiko.dev/#/patchset/20260617-topics-ahmtib01-ras_ffh_arm_internal_review-v6-0-91f725174aa0%40arm.com
> 

I will address the issues introduced by this series. Pre-existing
behaviour is carried forward unchanged.

Best regards,
Ahmed

^ permalink raw reply

* Re: [PATCH v4 3/5] rpmsg: virtio_rpmsg_bus: get buffer size from config space
From: Shah, Tanmay @ 2026-06-19 15:31 UTC (permalink / raw)
  To: Arnaud POULIQUEN, tanmay.shah, andersson, mathieu.poirier, corbet,
	skhan
  Cc: linux-remoteproc, linux-doc, linux-kernel
In-Reply-To: <1c39fbbc-83d9-4ad2-bf79-fc2f64fe6e44@foss.st.com>



On 6/19/2026 2:45 AM, Arnaud POULIQUEN wrote:
> 
> 
> On 6/18/26 18:31, Shah, Tanmay wrote:
>>
>>
>> On 6/18/2026 3:32 AM, Arnaud POULIQUEN wrote:
>>>
>>>
>>> On 6/17/26 19:41, Shah, Tanmay wrote:
>>>>
>>>>
>>>> On 6/17/2026 4:15 AM, Arnaud POULIQUEN wrote:
>>>>> Hi Tanmay,
>>>>>
>>>>> On 6/15/26 22:20, Tanmay Shah wrote:
>>>>>> 512 bytes isn't always suitable for all case, let firmware
>>>>>> maker decide the best value from resource table.
>>>>>> enable by VIRTIO_RPMSG_F_BUFSZ feature bit.
>>>>>>
>>>>>> Signed-off-by: Tanmay Shah <tanmay.shah@amd.com>
>>>>>> ---
>>>>>>
>>>>>> Changes in v4: squash to virtio rpmsg config patch
>>>>>>      - Introduce new patch to modify rpmsg.rst documentation
>>>>>>      - check version is always 1.
>>>>>>      - check size field is same as size of struct virtio_rpmsg_config
>>>>>>      - introduce alignment field
>>>>>>      - check alignment field is power of 2
>>>>>>      - check tx and rx buf size is aligned with alignment passed
>>>>>> in the
>>>>>>        structure
>>>>>>
>>>>>> Changes in v3:
>>>>>>      - change version field from u16 to u8
>>>>>>      - introduce size field in the rpmsg_virtio_config structure
>>>>>>      - check version field is set to any non-zero value.
>>>>>>      - check size field is not 0.
>>>>>>      - Remove field for private config, as not needed for now.
>>>>>>      - add documentation of rpmsg_virtio_config structure
>>>>>>
>>>>>>     drivers/rpmsg/virtio_rpmsg_bus.c   | 129 +++++++++++++++++++++++
>>>>>> +-----
>>>>>>     include/linux/rpmsg/virtio_rpmsg.h |  50 +++++++++++
>>>>>>     2 files changed, 160 insertions(+), 19 deletions(-)
>>>>>>     create mode 100644 include/linux/rpmsg/virtio_rpmsg.h
>>>>>>
>>>>>> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/
>>>>>> virtio_rpmsg_bus.c
>>>>>> index 99df1ae07055..a59925f870a4 100644
>>>>>> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
>>>>>> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
>>>>>> @@ -15,11 +15,13 @@
>>>>>>     #include <linux/idr.h>
>>>>>>     #include <linux/jiffies.h>
>>>>>>     #include <linux/kernel.h>
>>>>>> +#include <linux/log2.h>
>>>>>>     #include <linux/module.h>
>>>>>>     #include <linux/mutex.h>
>>>>>>     #include <linux/rpmsg.h>
>>>>>>     #include <linux/rpmsg/byteorder.h>
>>>>>>     #include <linux/rpmsg/ns.h>
>>>>>> +#include <linux/rpmsg/virtio_rpmsg.h>
>>>>>>     #include <linux/scatterlist.h>
>>>>>>     #include <linux/slab.h>
>>>>>>     #include <linux/sched.h>
>>>>>> @@ -39,7 +41,8 @@
>>>>>>      * @tx_bufs:    kernel address of tx buffers
>>>>>>      * @num_rx_buf: total number of rx buffers
>>>>>>      * @num_tx_buf: total number of tx buffers
>>>>>> - * @buf_size:   size of one rx or tx buffer
>>>>>> + * @rx_buf_size: size of one rx buffer
>>>>>> + * @tx_buf_size: size of one tx buffer
>>>>>>      * @last_tx_buf: index of last tx buffer used
>>>>>>      * @bufs_dma:    dma base addr of the buffers
>>>>>>      * @tx_lock:    protects svq and tx_bufs, to allow concurrent
>>>>>> senders.
>>>>>> @@ -59,7 +62,8 @@ struct virtproc_info {
>>>>>>         void *rx_bufs, *tx_bufs;
>>>>>>         unsigned int num_rx_buf;
>>>>>>         unsigned int num_tx_buf;
>>>>>> -    unsigned int buf_size;
>>>>>> +    unsigned int rx_buf_size;
>>>>>> +    unsigned int tx_buf_size;
>>>>>>         int last_tx_buf;
>>>>>>         dma_addr_t bufs_dma;
>>>>>>         struct mutex tx_lock;
>>>>>> @@ -68,9 +72,6 @@ struct virtproc_info {
>>>>>>         wait_queue_head_t sendq;
>>>>>>     };
>>>>>>     -/* The feature bitmap for virtio rpmsg */
>>>>>> -#define VIRTIO_RPMSG_F_NS    0 /* RP supports name service
>>>>>> notifications */
>>>>>> -
>>>>>>     /**
>>>>>>      * struct rpmsg_hdr - common header for all rpmsg messages
>>>>>>      * @src: source address
>>>>>> @@ -128,7 +129,7 @@ struct virtio_rpmsg_channel {
>>>>>>      * processor.
>>>>>>      */
>>>>>>     #define MAX_RPMSG_NUM_BUFS    (256)
>>>>>> -#define MAX_RPMSG_BUF_SIZE    (512)
>>>>>> +#define DEFAULT_RPMSG_BUF_SIZE    (512)
>>>>>>       /*
>>>>>>      * Local addresses are dynamically allocated on-demand.
>>>>>> @@ -444,7 +445,7 @@ static void *get_a_tx_buf(struct virtproc_info
>>>>>> *vrp)
>>>>>>           /* either pick the next unused tx buffer */
>>>>>>         if (vrp->last_tx_buf < vrp->num_tx_buf)
>>>>>> -        ret = vrp->tx_bufs + vrp->buf_size * vrp->last_tx_buf++;
>>>>>> +        ret = vrp->tx_bufs + vrp->tx_buf_size * vrp->last_tx_buf++;
>>>>>>         /* or recycle a used one */
>>>>>>         else
>>>>>>             ret = virtqueue_get_buf(vrp->svq, &len);
>>>>>> @@ -514,7 +515,7 @@ static int rpmsg_send_offchannel_raw(struct
>>>>>> rpmsg_device *rpdev,
>>>>>>          * messaging), or to improve the buffer allocator, to support
>>>>>>          * variable-length buffer sizes.
>>>>>>          */
>>>>>> -    if (len > vrp->buf_size - sizeof(struct rpmsg_hdr)) {
>>>>>> +    if (len > vrp->tx_buf_size - sizeof(struct rpmsg_hdr)) {
>>>>>>             dev_err(dev, "message is too big (%d)\n", len);
>>>>>>             return -EMSGSIZE;
>>>>>>         }
>>>>>> @@ -647,7 +648,7 @@ static ssize_t virtio_rpmsg_get_mtu(struct
>>>>>> rpmsg_endpoint *ept)
>>>>>>         struct rpmsg_device *rpdev = ept->rpdev;
>>>>>>         struct virtio_rpmsg_channel *vch =
>>>>>> to_virtio_rpmsg_channel(rpdev);
>>>>>>     -    return vch->vrp->buf_size - sizeof(struct rpmsg_hdr);
>>>>>> +    return vch->vrp->tx_buf_size - sizeof(struct rpmsg_hdr);
>>>>>>     }
>>>>>>       static int rpmsg_recv_single(struct virtproc_info *vrp, struct
>>>>>> device *dev,
>>>>>> @@ -673,7 +674,7 @@ static int rpmsg_recv_single(struct virtproc_info
>>>>>> *vrp, struct device *dev,
>>>>>>          * We currently use fixed-sized buffers, so trivially
>>>>>> sanitize
>>>>>>          * the reported payload length.
>>>>>>          */
>>>>>> -    if (len > vrp->buf_size ||
>>>>>> +    if (len > vrp->rx_buf_size ||
>>>>>>             msg_len > (len - sizeof(struct rpmsg_hdr))) {
>>>>>>             dev_warn(dev, "inbound msg too big: (%d, %d)\n", len,
>>>>>> msg_len);
>>>>>>             return -EINVAL;
>>>>>> @@ -706,7 +707,7 @@ static int rpmsg_recv_single(struct virtproc_info
>>>>>> *vrp, struct device *dev,
>>>>>>             dev_warn_ratelimited(dev, "msg received with no
>>>>>> recipient\n");
>>>>>>           /* publish the real size of the buffer */
>>>>>> -    rpmsg_sg_init(&sg, msg, vrp->buf_size);
>>>>>> +    rpmsg_sg_init(&sg, msg, vrp->rx_buf_size);
>>>>>>           /* add the buffer back to the remote processor's
>>>>>> virtqueue */
>>>>>>         err = virtqueue_add_inbuf(vrp->rvq, &sg, 1, msg, GFP_KERNEL);
>>>>>> @@ -820,10 +821,13 @@ static int rpmsg_probe(struct virtio_device
>>>>>> *vdev)
>>>>>>         struct virtproc_info *vrp;
>>>>>>         struct virtio_rpmsg_channel *vch = NULL;
>>>>>>         struct rpmsg_device *rpdev_ns, *rpdev_ctrl;
>>>>>> +    u16 rpmsg_buf_align = 0;
>>>>>>         void *bufs_va;
>>>>>>         int err = 0, i;
>>>>>>         size_t total_buf_space;
>>>>>>         bool notify;
>>>>>> +    u8 version;
>>>>>> +    u16 size;
>>>>>>           vrp = kzalloc_obj(*vrp);
>>>>>>         if (!vrp)
>>>>>> @@ -855,9 +859,90 @@ static int rpmsg_probe(struct virtio_device
>>>>>> *vdev)
>>>>>>         else
>>>>>>             vrp->num_tx_buf = MAX_RPMSG_NUM_BUFS;
>>>>>>     -    vrp->buf_size = MAX_RPMSG_BUF_SIZE;
>>>>>> +    /*
>>>>>> +     * If VIRTIO_RPMSG_F_BUFSZ feature is supported, then configure
>>>>>> buf
>>>>>> +     * size from virtio device config space from the resource table.
>>>>>> +     * If the feature is not supported, then assign default buf
>>>>>> size.
>>>>>> +     */
>>>>>> +    if (virtio_has_feature(vdev, VIRTIO_RPMSG_F_BUFSZ)) {
>>>>>> +        virtio_cread(vdev, struct virtio_rpmsg_config,
>>>>>> +                 version, &version);
>>>>>> +
>>>>>> +        /* for now we support only v1 */
>>>>>> +        if (version != RPMSG_VDEV_CONFIG_V1) {
>>>>>> +            dev_err(&vdev->dev,
>>>>>> +                "unsupported vdev config version %u\n", version);
>>>>>> +            err = -EINVAL;
>>>>>> +            goto vqs_del;
>>>>>> +        }
>>>>>> +
>>>>>> +        /* size of the config space must match */
>>>>>> +        virtio_cread(vdev, struct virtio_rpmsg_config,
>>>>>> +                 size, &size);
>>>>>> +        if (size != sizeof(struct virtio_rpmsg_config)) {
>>>>>> +            dev_err(&vdev->dev, "invalid size of vdev config %u\n",
>>>>>> +                size);
>>>>>> +            err = -EINVAL;
>>>>>> +            goto vqs_del;
>>>>>> +        }
>>>>>>     -    total_buf_space = (vrp->num_rx_buf + vrp->num_tx_buf) * vrp-
>>>>>>> buf_size;
>>>>>> +        /*
>>>>>> +         * Optional alignment applied to each buffer size and to
>>>>>> the TX
>>>>>> +         * buffer base address (e.g. to align buffers on a cache
>>>>>> line).
>>>>>> +         * It must be a power of two; zero means no extra alignment.
>>>>>> +         */
>>>>>> +        virtio_cread(vdev, struct virtio_rpmsg_config,
>>>>>> +                 rpmsg_buf_align, &rpmsg_buf_align);
>>>>>> +        if (rpmsg_buf_align && !is_power_of_2(rpmsg_buf_align)) {
>>>>>> +            dev_err(&vdev->dev,
>>>>>> +                "bad vdev config: rpmsg_buf_align %u is not a power
>>>>>> of two\n",
>>>>>> +                rpmsg_buf_align);
>>>>>> +            err = -EINVAL;
>>>>>> +            goto vqs_del;
>>>>>> +        }
>>>>>> +
>>>>>> +        /* note: tx and rx are defined from remote view */
>>>>>> +        virtio_cread(vdev, struct virtio_rpmsg_config,
>>>>>> +                 txbuf_size, &vrp->rx_buf_size);
>>>>>> +        virtio_cread(vdev, struct virtio_rpmsg_config,
>>>>>> +                 rxbuf_size, &vrp->tx_buf_size);
>>>>>> +
>>>>>> +        /* The buffers must hold at least the rpmsg header */
>>>>>> +        if (vrp->rx_buf_size < sizeof(struct rpmsg_hdr) ||
>>>>>> +            vrp->tx_buf_size < sizeof(struct rpmsg_hdr)) {
>>>>>> +            dev_err(&vdev->dev,
>>>>>> +                "bad vdev config: rx buf sz = %u, tx buf sz = %u\n",
>>>>>> +                vrp->rx_buf_size, vrp->tx_buf_size);
>>>>>> +            err = -EINVAL;
>>>>>> +            goto vqs_del;
>>>>>> +        }
>>>>>> +
>>>>>> +        /*
>>>>>> +         * The buffer size must be aligned to the provided
>>>>>> alignment for
>>>>>> +         * so that the start address of tx bufs can be aligned.
>>>>>> +         */
>>>>>
>>>>> 'tx' to remove as  it also concerns Rx buffers
>>>>>
>>>>
>>>> Ack.
>>>>
>>>>>
>>>>> What about removing this check to manage alignment during buffer
>>>>> allocation?
>>>>>
>>>>> For example, if the alignment is on a 64-bit address and the tx_buffer
>>>>> and rx_buffer sizes are 40 bytes, 48 bytes can be allocated in memory
>>>>> for each buffer, and the virtio descriptor can be filled with aligned
>>>>> addresses.
>>>>>
>>>>> In other words, the rpmsg_buf_align field contains the alignment
>>>>> constraint from the remote processor. If the Linux kernel wants to
>>>>> impose another alignment constraint, it must test or update
>>>>> rpmsg_buf_align, but it must not impose alignment on the buffer size.
>>>>>
>>>>>
>>>>
>>>> This part I don't understand. `rpmsg_buf_align` is alignment for only
>>>> single buffer size. The linux kernel is checking that single rx buf
>>>> size
>>>> and tx buf size is aligned with `rpmsg_buf_align` as firmware has
>>>> claimed.
>>>>
>>>> For reference the openamp-system-reference PR:
>>>> https://github.com/OpenAMP/openamp-system-reference/pull/106/changes
>>>>
>>>>      .vdev_config = {
>>>>          .version = 1,
>>>>          .reserved = 0,
>>>>          .size = (uint16_t)(sizeof(struct rpmsg_virtio_config) -
>>>> sizeof(bool)),
>>>>          .alignment = RPMSG_BUF_ALIGN,
>>>>          .reserved1 = 0,
>>>>          /* Tx for host */
>>>>          .h2r_buf_size = metal_align_up(4096, RPMSG_BUF_ALIGN),
>>>>          /* Rx for host */
>>>>          .r2h_buf_size = metal_align_up(4096, RPMSG_BUF_ALIGN),
>>>>      },
>>>>
>>>> IIUC, The linux kernel is not really supposed to modify
>>>> `rpmsg_buf_align`. It only uses it to check that firmware has assigned
>>>> correct size of single rx and tx buffer.
>>>>
>>>>
>>>> When the linux kernel uses dma_alloc_coherent() API it aligns total
>>>> buffer size with page size. That is different than single tx buf size
>>>> and single rx buf size. The total buf size alignment to page size is
>>>> irrelevant to `rpmsg_buf_align` field.
>>>>
>>>> Please let me know if I am missing something or didn't understand your
>>>> comment. I prefer that `rpmsg_buf_align` should be only modified by the
>>>> firmware and not the linux kernel.
>>>
>>>
>>> Sorry it was unclear, let me try to re-explain my suggestion:
>>>
>>> Two alignment constraints can apply:
>>> - The remote processor can require an alignment through
>>>    vdev_config::alignment.
>>> - The main processor, which runs Linux or another operating system (OS),
>>>    can require a different alignment, for example, for cache alignment.
>>> The current Linux implementation imposes no such constraint.
>>> Nevertheless, I would be in favor of taking such a future constraint
>>> into account without imposing a constraint on the buffer sizes.
>>
>> Is this ever going to be true? Is it ever possible that Linux and the remote
>> have different cache alignment? IIUC, both will be using the same cache and
>> so the same alignment will be applicable. That is why only a single alignment
>> is required.
> 
> Some remote processors, for example, some Arm Cortex-M33, do not
> integrate cache. Even if cache exists, cache can be enabled on one
> processor, but not on the other.
> 

Okay, how about introducing two alignment fields in that case:
vdev_config::rpmsg_buf_align_remote and vdev_config::rpmsg_buf_align_host?

If the remote doesn't have a cache, then the remote alignment will be 0, and
the *_host alignment can be applied. The rsc_table can provide both, and
*_host will take priority over *_remote.
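
As a rough sketch of that priority rule (the function name is made up
for illustration, and 0 is assumed to mean "no constraint" as in the
current patch):

```c
#include <stdint.h>

/*
 * Hypothetical helper: choose the alignment to apply when both a host
 * and a remote value are provided. A value of 0 means "no constraint".
 * The host value wins whenever it is set.
 */
static inline uint16_t rpmsg_effective_align(uint16_t host_align,
					     uint16_t remote_align)
{
	return host_align ? host_align : remote_align;
}
```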


>>
>>> Based on that in short term the local 'rpmsg_buf_align' would still
>>> computed
>>> only from vdev_config::alignment (not update of vdev_config::alignment).
>>>
>>> virtio_cread(vdev, struct virtio_rpmsg_config,
>>>                   rpmsg_buf_align, &rpmsg_buf_align);
>>>
>>> Then you could use use ALIGN() helper:
>>>
>>> unsigned int rx_buf_align_size = ALIGN(vrp->rx_buf_size,
>>>                         rpmsg_buf_align);
>>> unsigned int tx_buf_align_size = ALIGN(vrp->tx_buf_size,
>>>                         rpmsg_buf_align);
>>>
>>
>> This is where I have different opinion. Instead of Linux using ALIGN()
>> macro, can we expect that firmware must assign the aligned buffer size
>> with vdev_config::rpmsg_buf_align? And so Linux will fail if the buffer
>> size is not aligned already from the firmware side. That is why I had
>> introduced checks instead of doing alignment by linux.
>>
>>> total_buf_space = (vrp->num_rx_buf * rx_buf_align_size) +
>>>            (vrp->num_tx_buf * tx_buf_align_size);
>>>
>>> vrp->tx_bufs = bufs_va + vrp->num_rx_buf * rx_buf_align_size;
>>>
>>> Apply the same rule to cpu_addr in the vring descriptor:
>>>
>>> void *cpu_addr = vrp->rx_bufs + i * rx_buf_align_size;
>>>
>>> rpmsg_sg_init(&sg, cpu_addr, vrp->rx_buf_size);
>>>
>>> With this approach, the buffer addresses remain aligned
>>> independently of vdev_config::rxbuf_size and vdev_config::txbuf_size.
>>> Don't hesitate to ask if it is still not clear!
>>
>> How they remain aligned independent of tx/rx_buf_size? tx_bufs address
>> is still calculated based on rx_buf_align_size, so its alignment still
>> depends on rx_buf_align_size which is derived using
>> vdev_config::rpmsg_buf_align.
>>
>> I think we are trying to achieve the same thing, but the implementation is
>> different. We just need to decide where the alignment should be done.
>>
>> Either on the linux side? Or in the firmware resource table?
>>
>> I prefer that the firmware should already provide aligned buffer size,
>> and Linux should only check it. If alignment is not done, then simply
>> fail with error. That way, firmware also knows the correct size of the
>> buffer. If Linux does the alignment, then the firmware is not aware of
>> the correct size that is used by the linux.
>>
>> I am open to move the alignment operation to the linux side with the
>> reasonable justification.
> 
> That remains a suggestion. My main concern with the implementation is
> that RPMsg size should depend only on the max payload size needed, not
> also on the memory alignment.

Okay, I think this is a good reason to apply alignment on the Linux
side. If I understand correctly, the rpmsg buffer size will be used as
is from the rsc table, and vdev_config::alignment will be used only
to decide the start address of the next buffer. If that is the
intention, then I agree, and I will refactor the patch accordingly.
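
To make sure we mean the same thing, here is a minimal userspace sketch
of that layout, assuming a power-of-two alignment where 0 means no extra
alignment. The helper names are hypothetical and this is not the
refactored patch; in the kernel the rounding would presumably use
ALIGN() as Arnaud suggested.

```c
#include <stddef.h>

/*
 * Hypothetical helper: distance between the start of two consecutive
 * buffers. The payload size is left untouched; only the stride is
 * rounded up to the (power-of-two) alignment. align == 0 means no
 * extra alignment is requested.
 */
static inline size_t buf_stride(size_t size, size_t align)
{
	return align ? (size + align - 1) & ~(align - 1) : size;
}

/* Hypothetical helper: byte offset of buffer i from the pool base. */
static inline size_t buf_offset(size_t i, size_t size, size_t align)
{
	return i * buf_stride(size, align);
}
```

For example, with 40-byte buffers and a 16-byte alignment the stride is
48 bytes, so the descriptor for each buffer still advertises 40 bytes
while the start addresses stay 16-byte aligned.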

> 
> If this constraint is kept, it must be imposed on all other non-Linux
> solutions. Otherwise, the remote implementation depends on the main
> processor implementation.
> 
> From my POV, It would be preferable not to impose such constraint when
> possible.
> 

Okay.

> Thanks,
> Arnaud
> 
>>
>> Thank You,
>> Tanmay
>>
>>>>
>>>>
>>>>>> +        if (rpmsg_buf_align &&
>>>>>> +            (!IS_ALIGNED(vrp->rx_buf_size, rpmsg_buf_align) ||
>>>>>> +             !IS_ALIGNED(vrp->tx_buf_size, rpmsg_buf_align))) {
>>>>>> +            dev_err(&vdev->dev,
>>>>>> +                "bad vdev config: buf sizes (rx %u, tx %u) not
>>>>>> aligned to %u\n",
>>>>>> +                vrp->rx_buf_size, vrp->tx_buf_size,
>>>>>> +                rpmsg_buf_align);
>>>>>> +            err = -EINVAL;
>>>>>> +            goto vqs_del;
>>>>>> +        }
>>>>>> +
>>>>>> +        dev_dbg(&vdev->dev,
>>>>>> +            "vdev config: ver=%u, align=0x%x, rx sz = 0x%x, tx sz =
>>>>>> 0x%x\n",
>>>>>> +            version, rpmsg_buf_align, vrp->rx_buf_size,
>>>>>> +            vrp->tx_buf_size);
>>>>>> +    } else {
>>>>>> +        vrp->rx_buf_size = DEFAULT_RPMSG_BUF_SIZE;
>>>>>> +        vrp->tx_buf_size = DEFAULT_RPMSG_BUF_SIZE;
>>>>>> +    }
>>>>>> +
>>>>>> +    total_buf_space = (vrp->num_rx_buf * vrp->rx_buf_size) +
>>>>>> +              (vrp->num_tx_buf * vrp->tx_buf_size);
>>>>>>           /* allocate coherent memory for the buffers */
>>>>>>         bufs_va = dma_alloc_coherent(vdev->dev.parent,
>>>>>> @@ -874,15 +959,20 @@ static int rpmsg_probe(struct virtio_device
>>>>>> *vdev)
>>>>>>         /* first part of the buffers is dedicated for RX */
>>>>>>         vrp->rx_bufs = bufs_va;
>>>>>>     -    /* and second part is dedicated for TX */
>>>>>> -    vrp->tx_bufs = bufs_va + vrp->num_rx_buf * vrp->buf_size;
>>>>>> +    /*
>>>>>> +     * Here buf_va is aligned to a page. Also rx buf size is aligned
>>>>>> with
>>>>>> +     * cache line alignment provided by the firmware, so tx buf's
>>>>>> start
>>>>>> +     * address is guranteed to be aligned with the alignment
>>>>>> provided by
>>>>>> +     * the firmware.
>>>>>> +     */
>>>>>> +    vrp->tx_bufs = bufs_va + (vrp->num_rx_buf * vrp->rx_buf_size);
>>>>>>           /* set up the receive buffers */
>>>>>>         for (i = 0; i < vrp->num_rx_buf; i++) {
>>>>>>             struct scatterlist sg;
>>>>>> -        void *cpu_addr = vrp->rx_bufs + i * vrp->buf_size;
>>>>>> +        void *cpu_addr = vrp->rx_bufs + i * vrp->rx_buf_size;
>>>>>>     -        rpmsg_sg_init(&sg, cpu_addr, vrp->buf_size);
>>>>>> +        rpmsg_sg_init(&sg, cpu_addr, vrp->rx_buf_size);
>>>>>>               err = virtqueue_add_inbuf(vrp->rvq, &sg, 1, cpu_addr,
>>>>>>                           GFP_KERNEL);
>>>>>> @@ -965,8 +1055,8 @@ static int rpmsg_remove_device(struct device
>>>>>> *dev, void *data)
>>>>>>     static void rpmsg_remove(struct virtio_device *vdev)
>>>>>>     {
>>>>>>         struct virtproc_info *vrp = vdev->priv;
>>>>>> -    unsigned int num_bufs = vrp->num_rx_buf + vrp->num_tx_buf;
>>>>>> -    size_t total_buf_space = num_bufs * vrp->buf_size;
>>>>>> +    size_t total_buf_space = (vrp->num_rx_buf * vrp->rx_buf_size) +
>>>>>> +                 (vrp->num_tx_buf * vrp->tx_buf_size);
>>>>>>         int ret;
>>>>>>           virtio_reset_device(vdev);
>>>>>> @@ -992,6 +1082,7 @@ static struct virtio_device_id id_table[] = {
>>>>>>       static unsigned int features[] = {
>>>>>>         VIRTIO_RPMSG_F_NS,
>>>>>> +    VIRTIO_RPMSG_F_BUFSZ,
>>>>>>     };
>>>>>>       static struct virtio_driver virtio_ipc_driver = {
>>>>>> diff --git a/include/linux/rpmsg/virtio_rpmsg.h b/include/linux/
>>>>>> rpmsg/
>>>>>> virtio_rpmsg.h
>>>>>> new file mode 100644
>>>>>> index 000000000000..7e14da68fd17
>>>>>> --- /dev/null
>>>>>> +++ b/include/linux/rpmsg/virtio_rpmsg.h
>>>>>> @@ -0,0 +1,50 @@
>>>>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>>>>> +/*
>>>>>> + * Copyright (C) Pinecone Inc. 2019
>>>>>> + * Copyright (C) Xiang Xiao <xiaoxiang@pinecone.net>
>>>>>> + * Copyright (C) Advanced Micro Devices, Inc. 2026
>>>>>> + */
>>>>>> +
>>>>>> +#ifndef _LINUX_VIRTIO_RPMSG_H
>>>>>> +#define _LINUX_VIRTIO_RPMSG_H
>>>>>> +
>>>>>> +#include <linux/types.h>
>>>>>> +#include <linux/virtio_types.h>
>>>>>> +
>>>>>> +/* The feature bitmap for virtio rpmsg */
>>>>>> +#define VIRTIO_RPMSG_F_NS    0 /* RP supports name service
>>>>>> notifications */
>>>>>> +#define VIRTIO_RPMSG_F_BUFSZ    1 /* RP get buffer size from config
>>>>>> space */
>>>>>> +
>>>>>> +/* Version of struct virtio_rpmsg_config understood by this
>>>>>> driver */
>>>>>> +#define RPMSG_VDEV_CONFIG_V1    1
>>>>>> +
>>>>>> +/**
>>>>>> + * struct virtio_rpmsg_config - config space for rpmsg virtio device
>>>>>> + *
>>>>>> + * @version:    version of this structure, currently
>>>>>> %RPMSG_VDEV_CONFIG_V1.
>>>>>> + * @reserved:    reserved for padding, must be zero.
>>>>>> + * @size:    size of this structure in bytes.
>>>>>> + * @rpmsg_buf_align:    required alignment in bytes for each buffer.
>>>>>> Must be a
>>>>>> + *        power of two so that both the buffer sizes and the TX
>>>>>> buffer
>>>>>> + *        base address can be aligned (e.g. to a cache line).
>>>>>> + * @reserved1:    reserved for padding, must be zero. Keeps the
>>>>>> following 32-bit
>>>>>> + *        fields naturally aligned.
>>>>>> + * @txbuf_size:    Tx buf size from remote's view. For Linux this is
>>>>>> rx buf size.
>>>>>> + * @rxbuf_size:    Rx buf size from remote's view. For Linux this is
>>>>>> tx buf size.
>>>>>> + *
>>>>>> + * This is the configuration structure shared by the device and the
>>>>>> driver,
>>>>>> + * read when %VIRTIO_RPMSG_F_BUFSZ is negotiated. The fields are
>>>>>> laid
>>>>>> out so
>>>>>> + * the structure is naturally 32-bit aligned.
>>>>>> + */
>>>>>> +struct virtio_rpmsg_config {
>>>>>> +    u8 version;
>>>>>> +    u8 reserved;
>>>>>
>>>>> Why about defining the version type to u16 to avoid the reserved
>>>>> field?
>>>>>
>>>>>> +    __virtio16 size;
>>>>>> +    __virtio16 rpmsg_buf_align;
>>>>>> +    __virtio16 reserved1;
>>>>>
>>>>> Seems useless if __packed prevents the compiler from inserting extra
>>>>> padding
>>>>> bytes between fields,
>>>>>
>>>>>> +    /* The tx/rx individual buffer size (if VIRTIO_RPMSG_F_BUFSZ) */
>>>>>> +    __virtio32 txbuf_size;
>>>>>> +    __virtio32 rxbuf_size;
>>>>>> +} __packed;
>>>>>
>>>>> proposal
>>>>>
>>>>> +struct virtio_rpmsg_config {
>>>>> +    __virtio16 version;
>>>>> +    __virtio16 size;
>>>>> +    /* The tx/rx individual buffer size (if VIRTIO_RPMSG_F_BUFSZ) */
>>>>> +    __virtio32 txbuf_size;
>>>>> +    __virtio32 rxbuf_size;
>>>>> +    __virtio16 rpmsg_buf_align;
>>>>> +} __packed;
>>>>> +
>>>>>
>>>>
>>>> I am okay with the above proposal with minor difference:
>>>>
>>>> My proposal:
>>>>
>>>> +struct virtio_rpmsg_config {
>>>> +    u8 version;
>>>> +    __virtio16 size;
>>>> +    __virtio16 rpmsg_buf_align;
>>>> +    /* The tx/rx individual buffer size (if VIRTIO_RPMSG_F_BUFSZ) */
>>>> +    __virtio32 txbuf_size;
>>>> +    __virtio32 rxbuf_size;
>>>> +} __packed;
>>>>
>>>> I just want to keep the version field 8-bit, as we will probably never use
>>>> the upper byte of that field if we make it 16-bit. The rest is okay. If the
>>>> structure is packed then reserved bytes are not needed.
>>>>
>>>> Please let me know your view.
>>>
>>> No strong opinion on that. In the end, this structure is read only once.
>>> If it is acceptable to Mathieu, it is acceptable to me.
>>>
>>> Thanks,
>>> Arnaud
>>>
>>>>
>>>> Thanks,
>>>> Tanmay
>>>>
>>>>
>>>>> Regards,
>>>>> Arnaud
>>>>>
>>>>>> +
>>>>>> +#endif /* _LINUX_VIRTIO_RPMSG_H */
>>>>>
>>>>
>>>
>>
> 
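
For reference, the three layouts discussed above can be compared in
userspace. This is only a sketch, not kernel code: u8, virtio16 and
virtio32 are plain fixed-width stand-ins for the kernel types, the struct
names and print_layouts() are hypothetical, and the numbers assume the
GCC/Clang packed attribute. Under those assumptions the posted layout is
16 bytes with the 32-bit fields at offsets 8 and 12, the 16-bit-version
proposal is 14 bytes with them at 4 and 8, and the 8-bit-version proposal
is 13 bytes with them at 5 and 9, so the last one no longer keeps the
32-bit fields naturally aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-ins for u8/__virtio16/__virtio32 (assumption). */
typedef uint8_t  u8;
typedef uint16_t virtio16;
typedef uint32_t virtio32;

/* Layout as posted in the patch: explicit reserved fields. */
struct cfg_posted {
	u8 version;
	u8 reserved;
	virtio16 size;
	virtio16 rpmsg_buf_align;
	virtio16 reserved1;
	virtio32 txbuf_size;
	virtio32 rxbuf_size;
} __attribute__((packed));

/* Proposal with a 16-bit version and no reserved fields. */
struct cfg_u16_version {
	virtio16 version;
	virtio16 size;
	virtio32 txbuf_size;
	virtio32 rxbuf_size;
	virtio16 rpmsg_buf_align;
} __attribute__((packed));

/* Proposal with an 8-bit version and no reserved fields. */
struct cfg_u8_version {
	u8 version;
	virtio16 size;
	virtio16 rpmsg_buf_align;
	virtio32 txbuf_size;
	virtio32 rxbuf_size;
} __attribute__((packed));

/* Compile-time checks of the sizes and of where txbuf_size lands. */
_Static_assert(sizeof(struct cfg_posted) == 16, "posted layout");
_Static_assert(offsetof(struct cfg_posted, txbuf_size) == 8, "aligned");
_Static_assert(sizeof(struct cfg_u16_version) == 14, "u16 version");
_Static_assert(offsetof(struct cfg_u16_version, txbuf_size) == 4, "aligned");
_Static_assert(sizeof(struct cfg_u8_version) == 13, "u8 version");
_Static_assert(offsetof(struct cfg_u8_version, txbuf_size) == 5, "unaligned");

/* Print the size and the txbuf_size/rxbuf_size offsets of each layout. */
void print_layouts(void)
{
	printf("posted:      size=%zu tx=%zu rx=%zu\n",
	       sizeof(struct cfg_posted),
	       offsetof(struct cfg_posted, txbuf_size),
	       offsetof(struct cfg_posted, rxbuf_size));
	printf("u16 version: size=%zu tx=%zu rx=%zu\n",
	       sizeof(struct cfg_u16_version),
	       offsetof(struct cfg_u16_version, txbuf_size),
	       offsetof(struct cfg_u16_version, rxbuf_size));
	printf("u8 version:  size=%zu tx=%zu rx=%zu\n",
	       sizeof(struct cfg_u8_version),
	       offsetof(struct cfg_u8_version, txbuf_size),
	       offsetof(struct cfg_u8_version, rxbuf_size));
}
```

Calling print_layouts() from a small main() is enough to see the
difference; whether unaligned 32-bit config fields matter depends on how
the config space is actually read, which this sketch does not model.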


^ permalink raw reply

* Re: [PATCH] docs: arm64: Document that text_offset is always 0
From: Mark Rutland @ 2026-06-19 15:21 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: linux-arm-kernel, Ard Biesheuvel, Will Deacon, Jonathan Corbet,
	linux-doc, linux-kernel
In-Reply-To: <20260604140839.1930847-1-linux@rasmusvillemoes.dk>

On Thu, Jun 04, 2026 at 04:08:39PM +0200, Rasmus Villemoes wrote:
> When trying to figure out where to place and call an arm64 Image in
> memory, reading booting.rst should provide the answer. However, it
> requires quite some digging to figure out that text_offset is set via
> ".quad 0" in head.S and is thus actually always 0 since v5.10.

What is the actual problem?

The documentation in booting.rst is accurate; I don't see why it's
necessary to read the source code to look at text_offset. Immediately
above the text in your diff, the documentation has:

| 4. Call the kernel image
| ------------------------
| 
| Requirement: MANDATORY
| 
| The decompressed kernel image contains a 64-byte header as follows::
| 
|   u32 code0;                    /* Executable code */
|   u32 code1;                    /* Executable code */
|   u64 text_offset;              /* Image load offset, little endian */
|   u64 image_size;               /* Effective Image size, little endian */
|   u64 flags;                    /* kernel flags, little endian */
|   u64 res2      = 0;            /* reserved */
|   u64 res3      = 0;            /* reserved */
|   u64 res4      = 0;            /* reserved */
|   u32 magic     = 0x644d5241;   /* Magic number, little endian, "ARM\x64" */
|   u32 res5;                     /* reserved (used for PE COFF offset) */

Can you explain the problem you're facing? e.g.

* Is the documentation unclear, in a way that could be better?

* Is there some aspect of the boot protocol that is hard for a
  bootloader to follow?

* Is there some problem with *testing* that bootloaders respect the
  text_offset requirements?

* Something else?

> Update the documentation and make that explicit. Reword the 2MB
> requirement accordingly, and remove the paragraphs that only apply to
> the ancient versions where text_offset could be non-zero, as they only
> confuse a current reader.
> 
> Fixes: 120dc60d0bdb ("arm64: get rid of TEXT_OFFSET")
> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> ---
> I've included a Fixes tag since I spent way too much time tracking
> down where that text_offset might be defined. The mentioned commit did
> get rid of all references to TEXT_OFFSET-the-macro, but not
> text_offset-the-concept.

Keeping text_offset as a concept was deliberate. That allows us to keep
the documentation accurate for older kernel versions, and allows for the
possibility that a non-zero offset is introduced in future (though I
admit that might be a tough sell).

>  Documentation/arch/arm64/booting.rst | 20 +++++---------------
>  1 file changed, 5 insertions(+), 15 deletions(-)
> 
> diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/arch/arm64/booting.rst
> index 13ef311dace8..f4cc25b1fd56 100644
> --- a/Documentation/arch/arm64/booting.rst
> +++ b/Documentation/arch/arm64/booting.rst
> @@ -55,9 +55,6 @@ not exceed 2 megabytes in size. Since the dtb will be mapped cacheable
>  using blocks of up to 2 megabytes in size, it must not be placed within
>  any 2M region which must be mapped with any specific attributes.
>  
> -NOTE: versions prior to v4.2 also require that the DTB be placed within
> -the 512 MB region starting at text_offset bytes below the kernel Image.
> -
>  3. Decompress the kernel image
>  ------------------------------
>  
> @@ -93,6 +90,8 @@ Header notes:
>  
>  - As of v3.17, all fields are little endian unless stated otherwise.
>  
> +- As of v5.10, text_offset is always 0.
> +
>  - code0/code1 are responsible for branching to stext.
>  
>  - when booting through EFI, code0/code1 are initially skipped.
> @@ -100,12 +99,6 @@ Header notes:
>    entry point (efi_stub_entry).  When the stub has done its work, it
>    jumps to code0 to resume the normal boot process.
>  
> -- Prior to v3.17, the endianness of text_offset was not specified.  In
> -  these cases image_size is zero and text_offset is 0x80000 in the
> -  endianness of the kernel.  Where image_size is non-zero image_size is
> -  little-endian and must be respected.  Where image_size is zero,
> -  text_offset can be assumed to be 0x80000.
> -

So far we've tried to ensure that the documentation covers current *and*
older kernel versions. If we're going to drop text covering older
versions we'd need an explicit statement as to which kernel versions the
document is accurate for.

I would prefer that we retained documentation regarding the text_offset
field in the header, even if it happens to be zero today.

Mark.

>  - The flags field (introduced in v3.17) is a little-endian 64-bit field
>    composed as follows:
>  
> @@ -135,12 +128,9 @@ Header notes:
>    end of the kernel image. The amount of space required will vary
>    depending on selected features, and is effectively unbound.
>  
> -The Image must be placed text_offset bytes from a 2MB aligned base
> -address anywhere in usable system RAM and called there. The region
> -between the 2 MB aligned base address and the start of the image has no
> -special significance to the kernel, and may be used for other purposes.
> -At least image_size bytes from the start of the image must be free for
> -use by the kernel.
> +The Image must be placed at a 2MB aligned base address anywhere in
> +usable system RAM and called there.  At least image_size bytes from
> +the start of the image must be free for use by the kernel.
>  NOTE: versions prior to v4.6 cannot make use of memory below the
>  physical offset of the Image so it is recommended that the Image be
>  placed as close as possible to the start of system RAM.
> -- 
> 2.54.0
> 
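
As an illustration of the placement rule quoted above, here is a minimal
userspace sketch of how a loader could derive the load address from the
64-byte header. It assumes the field offsets implied by the header listing
(text_offset at byte 8, image_size at 16, magic at 56) and applies the
pre-v3.17 fallback from the quoted text (image_size == 0 means text_offset
is taken to be 0x80000). The function and macro names are hypothetical,
and rounding ram_base up to the next 2MB boundary is just one possible
policy for choosing the base.

```c
#include <assert.h>
#include <stdint.h>

#define ARM64_IMAGE_MAGIC  0x644d5241u /* "ARM\x64", little endian */
#define ALIGN_2M           0x200000ull
#define LEGACY_TEXT_OFFSET 0x80000ull  /* assumed when image_size == 0 */

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Read a little-endian 64-bit value from a byte buffer. */
static uint64_t get_le64(const uint8_t *p)
{
	return (uint64_t)get_le32(p) | (uint64_t)get_le32(p + 4) << 32;
}

/*
 * Compute where to place the Image: text_offset bytes above a 2MB
 * aligned base at or above ram_base. Returns 0 on success, -1 if the
 * header magic does not match.
 */
int arm64_image_load_addr(const uint8_t hdr[64], uint64_t ram_base,
			  uint64_t *load_addr)
{
	uint64_t image_size, text_offset, base;

	if (get_le32(hdr + 56) != ARM64_IMAGE_MAGIC)
		return -1;

	image_size = get_le64(hdr + 16);
	/* Headers with image_size == 0 predate the little-endian rule. */
	text_offset = image_size ? get_le64(hdr + 8) : LEGACY_TEXT_OFFSET;

	base = (ram_base + ALIGN_2M - 1) & ~(ALIGN_2M - 1);
	*load_addr = base + text_offset;
	return 0;
}
```

A loader written this way keeps working whether text_offset is zero or
not, which is the property the header field is meant to preserve.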

^ permalink raw reply

* Re: [PATCH v3] arm64: errata: Workaround NVIDIA Olympus device store/load ordering erratum
From: Will Deacon @ 2026-06-19 14:58 UTC (permalink / raw)
  To: Shanker Donthineni
  Cc: Jason Gunthorpe, Catalin Marinas, Vladimir Murzin,
	linux-arm-kernel@lists.infradead.org, Mark Rutland,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	Vikram Sethi, Jason Sequeira
In-Reply-To: <25495831-f32d-4332-a7c2-fb1463b96174@nvidia.com>

On Tue, Jun 16, 2026 at 08:22:39AM -0500, Shanker Donthineni wrote:
> On 6/12/2026 7:48 AM, Jason Gunthorpe wrote:
> > On Thu, Jun 11, 2026 at 08:13:48PM -0500, Shanker Donthineni wrote:
> > 
> > > For the scalar MMIO helpers, the workaround promotes the raw writes to
> > > store-release on affected CPUs as v1/v2 shown below. For the memcpy_toio()
> > > helpers, could you please clarify the specific reason for adding a dmb despite
> > > the documented no-ordering contract? Is the concern that some drivers may
> > > be relying on ordering across memcpy_toio_*() today even though the API
> > > does not guarantee it, and that we should cover those cases defensively?
> > I think given how arm implements them today the iocopy's are actually
> > the _relaxed variations.. I wonder if this matters to any user?
> 
> Following Jason's observation that on arm64 the memcpy_toio() and
> __iowrite{32,64}_copy() helpers are effectively the relaxed
> (write-combining) variants, I'd like to settle one open point before posting
> v4: should the workaround also promote dgh() to dmb on affected CPUs (now
> Olympus core), or leave dgh() as a plain hint?
> 
> If you'd still prefer the dmb defensively, to cover drivers that may
> rely on ordering across memcpy_toio() today despite the relaxed contract,
> I'm happy to fold it into v4.

The point is, you're going to have different behaviour to every other
arm64 system out there. You may be able to find vague comments in the
code that imply that you don't need to provide ordering, but at the end
of the day it's a pretty cavalier attitude imo and if a driver ever shows
up that relies on it then you're in trouble.

> Please let me know how you'd like me to proceed.

It's up to you. It's your broken CPU, not mine. You also haven't actually
provided any performance data for others to assess the trade-off.

If it was up to _me_, I'd upgrade dgh() on these CPUs so that I don't
need to worry about this again.

Will

^ permalink raw reply

* Re: [PATCH v2 02/11] mm: factor out adjust_range_hwpoison() from hugetlbfs
From: David Hildenbrand (Arm) @ 2026-06-19 14:52 UTC (permalink / raw)
  To: Jane Chu, akpm
  Cc: willy, jack, viro, brauner, muchun.song, osalvador, hughd,
	baolin.wang, linmiaohe, nao.horiguchi, lorenzo, rppt, peterx,
	corbet, linux-doc, linux-mm, linux-kernel, linux-fsdevel
In-Reply-To: <20260617172534.1740152-3-jane.chu@oracle.com>

On 6/17/26 19:25, Jane Chu wrote:
> The functionality and implementation of adjust_range_hwpoison() is
> generic, so factor it out and make it ready for generic use.
> 
> [1] https://lore.kernel.org/linux-mm/aeZwAz6PcdlqSnJ2@casper.infradead.org/
> 
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> ---

[...]

> -/*
> - * Check if a given raw @page is HWPOISON in a folio of any kind
> - */
> -bool is_raw_hwpoison_page_in_folio(struct page *page);
> -
>  static inline unsigned long huge_page_mask_align(struct file *file)
>  {
>  	return PAGE_MASK & ~huge_page_mask(hstate_file(file));
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 4e636647100c..a27ce4ad6247 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2753,6 +2753,37 @@ static void filemap_end_dropbehind_read(struct folio *folio)
>  	}
>  }
>  
> +/**
> + * adjust_range_hwpoison - adjust clean readable range to avoid hwpoison.
> + * @folio: folio that contains hwpoison(s).
> + * @offset: bytes into the folio where subsequent read starts.
> + * @bytes: number of bytes wish to read.
> + *
> + * Return: adjusted total number of bytes starting off @offset that can be

s/off/at/ ?


Apart from that, lgtm

-- 
Cheers,

David

^ permalink raw reply

* Re: [PATCH v2 01/11] mm/memory-failure: make is_raw_hwpoison_page_in_hugepage() general purpose
From: David Hildenbrand (Arm) @ 2026-06-19 14:49 UTC (permalink / raw)
  To: Jane Chu, akpm
  Cc: willy, jack, viro, brauner, muchun.song, osalvador, hughd,
	baolin.wang, linmiaohe, nao.horiguchi, lorenzo, rppt, peterx,
	corbet, linux-doc, linux-mm, linux-kernel, linux-fsdevel
In-Reply-To: <20260617172534.1740152-2-jane.chu@oracle.com>

On 6/17/26 19:25, Jane Chu wrote:
> Make is_raw_hwpoison_page_in_hugepage() general for checking whether
> a given raw page within any kind of folio is HW poisoned. Thus,
> replace folio_test_hwpoison() with folio_contain_hwpoisoned_page().
> Also rename to is_raw_hwpoison_page_in_folio().
> 
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> ---
>  fs/hugetlbfs/inode.c    |  4 ++--
>  include/linux/hugetlb.h |  4 ++--
>  mm/memory-failure.c     | 12 ++++++++++--
>  3 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 78d61bf2bd9b..66520f7c53c6 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -198,7 +198,7 @@ static size_t adjust_range_hwpoison(struct folio *folio, size_t offset,
>  	struct page *page = folio_page(folio, offset / PAGE_SIZE);
>  	size_t safe_bytes;
>  
> -	if (is_raw_hwpoison_page_in_hugepage(page))
> +	if (is_raw_hwpoison_page_in_folio(page))
>  		return 0;
>  	/* Safe to read the remaining bytes in this page. */
>  	safe_bytes = PAGE_SIZE - (offset % PAGE_SIZE);
> @@ -206,7 +206,7 @@ static size_t adjust_range_hwpoison(struct folio *folio, size_t offset,
>  
>  	/* Check each remaining page as long as we are not done yet. */
>  	for (; safe_bytes < bytes; safe_bytes += PAGE_SIZE, page++)
> -		if (is_raw_hwpoison_page_in_hugepage(page))
> +		if (is_raw_hwpoison_page_in_folio(page))
>  			break;
>  
>  	return min(safe_bytes, bytes);
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 5957bc25efa8..a9846f043712 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -1079,9 +1079,9 @@ void hugetlb_unregister_node(struct node *node);
>  #endif
>  
>  /*
> - * Check if a given raw @page in a hugepage is HWPOISON.
> + * Check if a given raw @page is HWPOISON in a folio of any kind
>   */
> -bool is_raw_hwpoison_page_in_hugepage(struct page *page);
> +bool is_raw_hwpoison_page_in_folio(struct page *page);
>  
>  static inline unsigned long huge_page_mask_align(struct file *file)
>  {
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index ee42d4361309..40129e0b8213 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1834,14 +1834,21 @@ static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
>  	return (struct llist_head *)&folio->_hugetlb_hwpoison;
>  }
>  
> -bool is_raw_hwpoison_page_in_hugepage(struct page *page)
> +/**
> + * is_raw_hwpoison_page_in_folio - answers the question whether a given
> + * page is indeed hwpoisoned.
> + * @page: given page, maybe base page, part of a large folio or hugetlb.
> + *
> + * Return: true if @page is the raw hwpoisoned page; else, false.
> + */

Why do we need the "in_folio" part at all?

> +bool is_raw_hwpoison_page_in_folio(struct page *page)
>  {
>  	struct llist_head *raw_hwp_head;
>  	struct raw_hwp_page *p;
>  	struct folio *folio = page_folio(page);
>  	bool ret = false;
>  
> -	if (!folio_test_hwpoison(folio))
> +	if (!folio_contain_hwpoisoned_page(folio))

I wonder if we should just not call that function for hugetlb, it doesn't make
sense as hugetlb doesn't set _has_hwpoisoned.

But then, I wonder if we really need folio_contain_hwpoisoned_page() at all?

Why not a simple:

if (!folio_test_hugetlb(folio))
	return PageHWPoison(page);

And now I am confused about which scenario you are worried about (it's warm
here ...). Can you explain which scenario you want to change?

>  		return false;
>  
>  	if (!folio_test_hugetlb(folio))
> @@ -1868,6 +1875,7 @@ bool is_raw_hwpoison_page_in_hugepage(struct page *page)
>  
>  	return ret;
>  }
> +EXPORT_SYMBOL_GPL(is_raw_hwpoison_page_in_folio);

You should spell out why you export that function in the patch description.

-- 
Cheers,

David
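
For readers following the range logic in adjust_range_hwpoison() quoted
above, here is a small userspace model of it. This is a sketch under
stated assumptions, not the kernel function: a bool array stands in for
the per-page is_raw_hwpoison_page_in_folio() check, MODEL_PAGE_SIZE is
fixed at 4096, the name model_adjust_range_hwpoison() is hypothetical,
and the lines elided between the two diff hunks are assumed to advance to
the next page before the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u

/*
 * poisoned[i] is true if page i of the folio is hwpoisoned. The array
 * must cover every page touched by [offset, offset + bytes).
 * Returns the number of bytes starting at offset that can be read
 * without touching a poisoned page.
 */
size_t model_adjust_range_hwpoison(const bool *poisoned, size_t offset,
				   size_t bytes)
{
	size_t idx = offset / MODEL_PAGE_SIZE;
	size_t safe_bytes;

	/* The first page is poisoned: nothing can be read. */
	if (poisoned[idx])
		return 0;

	/* Safe to read the remaining bytes in this page. */
	safe_bytes = MODEL_PAGE_SIZE - (offset % MODEL_PAGE_SIZE);
	idx++; /* assumption: matches the lines elided from the diff */

	/* Check each remaining page as long as we are not done yet. */
	for (; safe_bytes < bytes; safe_bytes += MODEL_PAGE_SIZE, idx++)
		if (poisoned[idx])
			break;

	return safe_bytes < bytes ? safe_bytes : bytes;
}
```

With four pages of which only page 2 is poisoned, a 10000-byte read
starting at offset 100 is trimmed to 8092 bytes (3996 from page 0 plus
all 4096 of page 1), and a read starting inside page 2 returns 0.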

^ permalink raw reply

* Re: [PATCH] docs: arm64: Document that text_offset is always 0
From: Will Deacon @ 2026-06-19 14:33 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: linux-arm-kernel, Ard Biesheuvel, Jonathan Corbet, linux-doc,
	linux-kernel
In-Reply-To: <20260604140839.1930847-1-linux@rasmusvillemoes.dk>

On Thu, Jun 04, 2026 at 04:08:39PM +0200, Rasmus Villemoes wrote:
> When trying to figure out where to place and call an arm64 Image in
> memory, reading booting.rst should provide the answer. However, it
> requires quite some digging to figure out that text_offset is set via
> ".quad 0" in head.S and is thus actually always 0 since v5.10.
> 
> Update the documentation and make that explicit. Reword the 2MB
> requirement accordingly, and remove the paragraphs that only apply to
> the ancient versions where text_offset could be non-zero, as they only
> confuse a current reader.

Doesn't this needlessly prevent us from having a non-zero offset in future,
if we wanted that for some reason?

Will

^ permalink raw reply

* Re: [PATCH 0/3] docs/zh_CN: update translation of doc-guide/sphinx.rst
From: Jonathan Corbet @ 2026-06-19 14:26 UTC (permalink / raw)
  To: Jiandong Qiu, alexs, si.yanteng
  Cc: dzm91, skhan, linux-doc, linux-kernel, Jiandong Qiu
In-Reply-To: <20260619140245.1982921-1-qiujiandong1998@gmail.com>

Jiandong Qiu <qiujiandong1998@gmail.com> writes:

> Hi all,
>
> This is my first time sending patches to the Linux community. I have
> been reading the kernel documentation to learn more about Linux, and in
> the process I found a few places where I could help improve the zh_CN
> translations. Comments and suggestions are welcome.

Thank you for working to improve our documentation!

I've added a couple of comments, though I need to defer to others to
judge the translation work itself.  I do have one question, though: did
you do the translation yourself, or did you use some sort of tool?  In
the latter case, you need to document that usage with Assisted-by tags.

Thanks,

jon

^ permalink raw reply

* Re: [PATCH 2/3] docs/zh_CN: add process/changes.rst translation
From: Jonathan Corbet @ 2026-06-19 14:23 UTC (permalink / raw)
  To: Jiandong Qiu, alexs, si.yanteng
  Cc: dzm91, skhan, linux-doc, linux-kernel, Jiandong Qiu
In-Reply-To: <20260619140245.1982921-3-qiujiandong1998@gmail.com>

Jiandong Qiu <qiujiandong1998@gmail.com> writes:

> Add the zh_CN translation of process/changes.rst.
>
> Update the translation through commit ece7e57afd51
> ("docs: changes.rst and ver_linux: sort the lists")
>
> Signed-off-by: Jiandong Qiu <qiujiandong1998@gmail.com>
> ---
>  .../translations/zh_CN/process/changes.rst    | 530 ++++++++++++++++++
>  1 file changed, 530 insertions(+)
>  create mode 100644 Documentation/translations/zh_CN/process/changes.rst
>
> diff --git a/Documentation/translations/zh_CN/process/changes.rst b/Documentation/translations/zh_CN/process/changes.rst
> new file mode 100644
> index 000000000000..cc22f65e4888
> --- /dev/null
> +++ b/Documentation/translations/zh_CN/process/changes.rst
> @@ -0,0 +1,530 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. include:: ../disclaimer-zh_CN.rst
> +
> +:Original: Documentation/process/changes.rst
> +
> +:翻译: 裘剑东 Jiandong Qiu <qiujiandong1998@gmail.com>
> +
> +.. _changes_zh:

Here too, we don't need this label.

(Yes, I'm quibbling on details because I am in no position to judge the
translation itself :)

Thanks,

jon

^ permalink raw reply

* Re: [PATCH 1/3] docs/zh_CN: add llvm.rst translation anchor
From: Jonathan Corbet @ 2026-06-19 14:23 UTC (permalink / raw)
  To: Jiandong Qiu, alexs, si.yanteng
  Cc: dzm91, skhan, linux-doc, linux-kernel, Jiandong Qiu
In-Reply-To: <20260619140245.1982921-2-qiujiandong1998@gmail.com>

Jiandong Qiu <qiujiandong1998@gmail.com> writes:

> Add the kbuild_llvm_zh label for local cross-references.
>
> Signed-off-by: Jiandong Qiu <qiujiandong1998@gmail.com>
> ---
> process/changes.rst refers to this anchor.
>
>  Documentation/translations/zh_CN/kbuild/llvm.rst | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/Documentation/translations/zh_CN/kbuild/llvm.rst b/Documentation/translations/zh_CN/kbuild/llvm.rst
> index f87e0181d8e7..5fdf281a614a 100644
> --- a/Documentation/translations/zh_CN/kbuild/llvm.rst
> +++ b/Documentation/translations/zh_CN/kbuild/llvm.rst
> @@ -5,6 +5,8 @@
>  :Original: Documentation/kbuild/llvm.rst
>  :Translator: 慕冬亮 Dongliang Mu <dzm91@hust.edu.cn>
>  
> +.. _kbuild_llvm_zh:
> +

Please, let's not add more of these top-of-file labels; I've been trying
to stomp those out for years.  If this file needs to be referenced, just
reference it by name and the automarkup code will do the right thing.

Thanks,

jon

^ permalink raw reply
