Re: [PATCH v3 5/8] riscv_cbqri: resctrl: Add cache allocation via capacity block mask

Devicetree
 help / color / mirror / Atom feed

From: Fenghua Yu <fenghuay@nvidia.com>
To: "Drew Fustini" <fustini@kernel.org>,
	"Adrien Ricciardi" <aricciardi@baylibre.com>,
	"Alexandre Ghiti" <alex@ghiti.fr>,
	"Atish Kumar Patra" <atishp@rivosinc.com>,
	"Atish Patra" <atish.patra@linux.dev>,
	"Babu Moger" <babu.moger@amd.com>,
	"Ben Horgan" <ben.horgan@arm.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Chen Pei" <cp0613@linux.alibaba.com>,
	"Conor Dooley" <conor.dooley@microchip.com>,
	"Conor Dooley" <conor+dt@kernel.org>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Dave Martin" <Dave.Martin@arm.com>,
	"Gong Shuai" <gong.shuai@sanechips.com.cn>,
	"Gong Shuai" <gsh517@gmail.com>,
	guo.wenjia23@zte.com.cn, "James Morse" <james.morse@arm.com>,
	"Kornel Dulęba" <mindal@semihalf.com>,
	"Krzysztof Kozlowski" <krzk+dt@kernel.org>,
	liu.qingtao2@zte.com.cn,
	"Liu Zhiwei" <zhiwei_liu@linux.alibaba.com>,
	"Palmer Dabbelt" <palmer@dabbelt.com>,
	"Paul Walmsley" <pjw@kernel.org>,
	"Peter Newman" <peternewman@google.com>,
	"Radim Krčmář" <rkrcmar@ventanamicro.com>,
	"Reinette Chatre" <reinette.chatre@intel.com>,
	"Rob Herring" <robh@kernel.org>,
	"Samuel Holland" <samuel.holland@sifive.com>,
	"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>,
	"Tony Luck" <tony.luck@intel.com>,
	"Vasudevan Srinivasan" <vasu@rivosinc.com>,
	"Ved Shanbhogue" <ved@rivosinc.com>,
	"Weiwei Li" <liwei1518@gmail.com>,
	"yunhui cui" <cuiyunhui@bytedance.com>
Cc: linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
	x86@kernel.org, devicetree@vger.kernel.org,
	linux-rt-devel@lists.linux.dev, linux-doc@vger.kernel.org
Subject: Re: [PATCH v3 5/8] riscv_cbqri: resctrl: Add cache allocation via capacity block mask
Date: Wed, 1 Jul 2026 13:18:41 -0700	[thread overview]
Message-ID: <473da771-b711-457b-b9ad-491fee111b16@nvidia.com> (raw)
In-Reply-To: <20260628-dfustini-atl-sc-cbqri-dt-v3-5-c9c1342fe3cf@kernel.org>

Hi, Drew,

Could you please change my email address to my NVIDIA email 
fenghuay@nvidia.com?

On 6/28/26 14:18, Drew Fustini wrote:
> Wire CBQRI capacity controllers into resctrl as RDT_RESOURCE_L2 and
> RDT_RESOURCE_L3 schemata.
> 
> Mismatched CC caps at the same cache level are treated as a fatal
> configuration error since fs/resctrl exposes a single per-rid cap
> set. Domains are created lazily in the cpuhp online callback so
> cpu_mask reflects only currently online CPUs.
> 
> Assisted-by: Claude:claude-opus-4-7
> Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
> Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
> Signed-off-by: Drew Fustini <fustini@kernel.org>
> ---
>   MAINTAINERS                      |   2 +
>   arch/riscv/include/asm/resctrl.h | 147 ++++++++
>   drivers/resctrl/Kconfig          |   4 +
>   drivers/resctrl/Makefile         |   1 +
>   drivers/resctrl/cbqri_resctrl.c  | 787 +++++++++++++++++++++++++++++++++++++++
>   5 files changed, 941 insertions(+)
> 
[SNIP]

> diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
> new file mode 100644
> index 000000000000..1fb0fbe1b000
> --- /dev/null
> +++ b/drivers/resctrl/cbqri_resctrl.c
[SNIP]

> +/*
> + * Walk cbqri_controllers and pick one capacity controller (CC) per cache
> + * level (L2/L3) to back the corresponding RDT_RESOURCE_L*. When more than
> + * one CC sits at the same level (e.g. one per socket), they must agree on
> + * rcid_count / ncblks / alloc_capable. A mismatch is fatal because resctrl
> + * exposes a single set of caps per rid. The first matching controller wins.
> + */
> +static int cbqri_resctrl_pick_caches(void)
> +{
> +	struct cbqri_controller *ctrl;
> +	int ret = 0;
> +
> +	mutex_lock(&cbqri_controllers_lock);

Is it better to change mutex_lock()/mutex_unlock() to
  guard(mutex)(&cbqri_controllers_lock)?

1. This code is simpler and can avoid potential missing unlock issue.
2. This matches mpam code.

> +
> +	list_for_each_entry(ctrl, &cbqri_controllers, list) {
> +		struct cbqri_resctrl_res *cbqri_res;
> +		int rid;
> +
> +		if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
> +			continue;
> +		if (!ctrl->alloc_capable)
> +			continue;
> +
> +		rid = cbqri_cache_level_to_rid(ctrl->cache.cache_level);
> +		if (rid < 0) {
> +			pr_info("skipping controller at unsupported cache level %u\n",
> +				ctrl->cache.cache_level);
> +			continue;
> +		}
> +
> +		cbqri_res = &cbqri_resctrl_resources[rid];
> +		if (cbqri_res->ctrl) {
> +			/*
> +			 * CCs at the same cache level must agree on every cap
> +			 * resctrl exposes globally. Reject mismatches at pick
> +			 * time so the inconsistency is visible at boot.
> +			 */
> +			if (cbqri_res->ctrl->rcid_count != ctrl->rcid_count ||
> +			    cbqri_res->ctrl->cc.ncblks != ctrl->cc.ncblks ||
> +			    cbqri_res->ctrl->cc.supports_alloc_at_code !=
> +				    ctrl->cc.supports_alloc_at_code ||
> +			    cbqri_res->ctrl->alloc_capable != ctrl->alloc_capable) {
> +				pr_err("L%d controllers have mismatched capabilities\n",
> +				       ctrl->cache.cache_level);
> +				ret = -EINVAL;
> +				break;

Is it possible to support cbqri on both L2 and L3 on the same machine?
Failure on one controller will stop picking another other controller here.

If both L2 and L3 can be supported on the same machine, does it make 
sense to pr_err() (fatal for this controller) and continue to go to the 
next controller? So failure on L2 won't impact L3?

If that's the case, does it make sense not to return error for 
pick_caches()? So pick_caches() failure is not fatal?

> +			}
> +			continue;
> +		}
> +
> +		cbqri_res->ctrl = ctrl;
> +	}
> +
> +	mutex_unlock(&cbqri_controllers_lock);
> +	return ret;
> +}
> +
> +/*
> + * Fill the rdt_resource fields for one picked rid. An rid with no picked
> + * controller is left untouched so it stays out of resctrl_arch_get_resource().
> + */
> +static void cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
> +{
> +	struct cbqri_controller *ctrl = cbqri_res->ctrl;
> +	struct rdt_resource *res = &cbqri_res->resctrl_res;
> +
> +	if (!ctrl)
> +		return;
> +
> +	switch (res->rid) {
> +	case RDT_RESOURCE_L2:
> +	case RDT_RESOURCE_L3:
> +		res->name = (res->rid == RDT_RESOURCE_L2) ? "L2" : "L3";
> +		res->schema_fmt = RESCTRL_SCHEMA_BITMAP;
> +		res->ctrl_scope = (res->rid == RDT_RESOURCE_L2) ?
> +				    RESCTRL_L2_CACHE : RESCTRL_L3_CACHE;
> +		res->cache.cbm_len = ctrl->cc.ncblks;
> +		res->cache.shareable_bits = 0;
> +		res->cache.min_cbm_bits = 1;
> +		res->cache.arch_has_sparse_bitmasks = false;
> +		res->cdp_capable = ctrl->cc.supports_alloc_at_code;
> +		res->alloc_capable = ctrl->alloc_capable;
> +		INIT_LIST_HEAD(&res->ctrl_domains);
> +		INIT_LIST_HEAD(&res->mon_domains);
> +		break;
> +	default:
> +		break;
> +	}
> +}
> +
> +static void cbqri_resctrl_accumulate_caps(void)
> +{
> +	int rid;
> +
> +	for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
> +		struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
> +
> +		if (!hw_res->ctrl)
> +			continue;
> +		if (hw_res->ctrl->alloc_capable)
> +			exposed_alloc_capable = true;
> +	}
> +}
> +
> +/*
> + * Create, list-insert, and online a fresh ctrl_domain backing ctrl on
> + * resource res, seeded with cpu and identified by dom_id. Caller must
> + * hold cbqri_domain_list_lock and must have already verified that no
> + * existing ctrl_domain on res carries this id.
> + */
> +static struct rdt_ctrl_domain *cbqri_create_ctrl_domain(struct cbqri_controller *ctrl,
> +							struct rdt_resource *res,
> +							unsigned int cpu, int dom_id)
> +{
> +	struct rdt_ctrl_domain *domain;
> +	struct list_head *pos = NULL;
> +	int err;
> +
> +	domain = cbqri_new_domain(ctrl);
> +	if (!domain)
> +		return ERR_PTR(-ENOMEM);
> +
> +	cpumask_set_cpu(cpu, &domain->hdr.cpu_mask);
> +	domain->hdr.id = dom_id;
> +	domain->hdr.type = RESCTRL_CTRL_DOMAIN;
> +
> +	err = cbqri_init_domain_ctrlval(res, domain);
> +	if (err) {
> +		kfree(container_of(domain, struct cbqri_resctrl_dom,
> +				   resctrl_ctrl_dom));
> +		return ERR_PTR(err);
> +	}
> +
> +	/* Insert sorted by id so user-visible ordering is deterministic. */
> +	resctrl_find_domain(&res->ctrl_domains, dom_id, &pos);
> +	list_add_tail(&domain->hdr.list, pos);
> +
> +	resctrl_online_ctrl_domain(res, domain);
> +
> +	return domain;
> +}
> +
> +static int cbqri_attach_cpu_to_cap_ctrl(struct cbqri_controller *ctrl,
> +					unsigned int cpu)
> +{
> +	struct cbqri_resctrl_res *hw_res;
> +	struct rdt_ctrl_domain *domain;
> +	struct rdt_resource *res;
> +	int dom_id;
> +	int rid;
> +
> +	rid = cbqri_cache_level_to_rid(ctrl->cache.cache_level);
> +	if (rid < 0)
> +		return 0;
> +	hw_res = &cbqri_resctrl_resources[rid];
> +
> +	if (!hw_res->ctrl)
> +		return 0;
> +
> +	res = &hw_res->resctrl_res;
> +	dom_id = ctrl->cache.cache_id;
> +
> +	domain = cbqri_find_ctrl_domain(&res->ctrl_domains, dom_id);
> +	if (domain) {
> +		cpumask_set_cpu(cpu, &domain->hdr.cpu_mask);
> +		return 0;
> +	}
> +
> +	domain = cbqri_create_ctrl_domain(ctrl, res, cpu, dom_id);
> +	if (IS_ERR(domain))
> +		return PTR_ERR(domain);
> +
> +	return 0;
> +}
> +
> +static void cbqri_detach_cpu_from_ctrl_domains(struct rdt_resource *res,
> +					       unsigned int cpu)
> +{
> +	struct rdt_ctrl_domain *domain, *tmp;
> +
> +	list_for_each_entry_safe(domain, tmp, &res->ctrl_domains, hdr.list) {
> +		if (!cpumask_test_cpu(cpu, &domain->hdr.cpu_mask))
> +			continue;
> +		cpumask_clear_cpu(cpu, &domain->hdr.cpu_mask);
> +		if (cpumask_empty(&domain->hdr.cpu_mask)) {
> +			resctrl_offline_ctrl_domain(res, domain);
> +			list_del(&domain->hdr.list);
> +			kfree(container_of(domain, struct cbqri_resctrl_dom,
> +					   resctrl_ctrl_dom));
> +		}
> +	}
> +}
> +
> +/*
> + * Remove a CPU from every domain it was attached to. The per-resource
> + * detach helpers act only when the CPU is set in a domain's mask, so this
> + * is idempotent and undoes a partial online attach as well as a full
> + * offline. Caller holds cbqri_domain_list_lock.
> + */
> +static void cbqri_detach_cpu_from_all_ctrls(unsigned int cpu)
> +{
> +	int rid;
> +
> +	lockdep_assert_held(&cbqri_domain_list_lock);
> +
> +	for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
> +		struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
> +
> +		if (!hw_res->ctrl)
> +			continue;
> +		cbqri_detach_cpu_from_ctrl_domains(&hw_res->resctrl_res, cpu);
> +	}
> +}
> +
> +/*
> + * Attach a CPU to every controller that claims it. On failure, detach the
> + * CPU from everything attached so far: the cpuhp core does not run this
> + * state's offline teardown when its startup fails, so a partial attach
> + * would otherwise leak into the domain cpu_masks. Caller holds
> + * cbqri_domain_list_lock.
> + */
> +static int cbqri_attach_cpu_to_all_ctrls(unsigned int cpu)
> +{
> +	struct cbqri_controller *ctrl;
> +	int err = 0;
> +
> +	lockdep_assert_held(&cbqri_domain_list_lock);
> +
> +	/*
> +	 * Hold cbqri_controllers_lock across the walk so a controller
> +	 * registered after boot cannot corrupt it. The register path takes
> +	 * it as a leaf and never cbqri_domain_list_lock, so this nesting
> +	 * cannot invert.
> +	 */
> +	mutex_lock(&cbqri_controllers_lock);

guard(mutex)(&cbqri_controllers_lock)?

> +	list_for_each_entry(ctrl, &cbqri_controllers, list) {
> +		if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
> +			continue;
> +		if (!cpumask_test_cpu(cpu, &ctrl->cache.cpu_mask))
> +			continue;
> +		if (!ctrl->alloc_capable)
> +			continue;
> +
> +		err = cbqri_attach_cpu_to_cap_ctrl(ctrl, cpu);
> +		if (err) {
> +			cbqri_detach_cpu_from_all_ctrls(cpu);
> +			break;
> +		}
> +	}
> +	mutex_unlock(&cbqri_controllers_lock);
> +
> +	return err;
> +}
> +
> +static bool cbqri_resctrl_inited;
> +
> +static void cbqri_resctrl_teardown(void)
> +{
> +	int rid;
> +
> +	if (!cbqri_resctrl_inited)
> +		return;
> +
> +	resctrl_exit();
> +
> +	for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
> +		struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
> +
> +		hw_res->ctrl = NULL;
> +		hw_res->cdp_enabled = false;
> +	}
> +	exposed_alloc_capable = false;
> +	cbqri_resctrl_inited = false;
> +}
> +
> +static int cbqri_resctrl_setup(void)
> +{
> +	int rid;
> +	int err;
> +
> +	for (rid = 0; rid < RDT_NUM_RESOURCES; rid++)
> +		cbqri_resctrl_resources[rid].resctrl_res.rid = rid;
> +
> +	err = cbqri_resctrl_pick_caches();
> +	if (err)
> +		return err;

Failure in pick_caches() will abort any future cbqri features e.g. 
memory bw allocation/monitoring. Is it possible to ignore the 
pick_caches() failure and continue to setup other cbqri features? Failed 
caches won't impact other QoS features, right?

> +
> +	for (rid = 0; rid < RDT_NUM_RESOURCES; rid++)
> +		cbqri_resctrl_control_init(&cbqri_resctrl_resources[rid]);
> +
> +	cbqri_resctrl_accumulate_caps();
> +
> +	if (!exposed_alloc_capable) {
> +		pr_debug("no resctrl-capable CBQRI controllers found\n");
> +		return -ENODEV;
> +	}
> +
> +	err = resctrl_init();
> +	if (err)
> +		return err;
> +
> +	cbqri_resctrl_inited = true;
> +	return 0;
> +}
> +
> +static int cbqri_resctrl_online_cpu(unsigned int cpu)
> +{
> +	int err;
> +
> +	mutex_lock(&cbqri_domain_list_lock);
> +	err = cbqri_attach_cpu_to_all_ctrls(cpu);
> +	mutex_unlock(&cbqri_domain_list_lock);
> +	if (err)
> +		return err;
> +
> +	/*
> +	 * Seed the per-CPU default RCID/MCID to the reserved (0, 0) pair and
> +	 * notify the resctrl core so it tracks this CPU in the default group.
> +	 */
> +	resctrl_arch_set_cpu_default_closid_rmid(cpu, 0, 0);
> +	resctrl_online_cpu(cpu);
> +	return 0;
> +}
> +
> +static int cbqri_resctrl_offline_cpu(unsigned int cpu)
> +{
> +	resctrl_offline_cpu(cpu);
> +
> +	mutex_lock(&cbqri_domain_list_lock);
> +	cbqri_detach_cpu_from_all_ctrls(cpu);
> +	mutex_unlock(&cbqri_domain_list_lock);
> +	return 0;
> +}
> +
> +static int __init cbqri_arch_late_init(void)
> +{
> +	int err;
> +
> +	if (!riscv_isa_extension_available(NULL, SSQOSID))
> +		return -ENODEV;
> +
> +	err = cbqri_resctrl_setup();
> +	if (err)
> +		return err;
> +
> +	err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "cbqri:online",
> +				cbqri_resctrl_online_cpu,
> +				cbqri_resctrl_offline_cpu);
> +	if (err < 0) {
> +		cbqri_resctrl_teardown();
> +		return err;
> +	}
> +
> +	return 0;
> +}
> +late_initcall(cbqri_arch_late_init);
> 

Thanks.

-Fenghua

next prev parent reply	other threads:[~2026-07-01 20:18 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-28 21:18 [PATCH v3 0/8] riscv: Add Ssqosid and initial CBQRI resctrl support Drew Fustini
2026-06-28 21:18 ` [PATCH v3 1/8] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
2026-06-28 21:18 ` [PATCH v3 2/8] riscv: Detect the Ssqosid extension Drew Fustini
2026-06-28 21:18 ` [PATCH v3 3/8] riscv: Add support for srmcfg CSR from " Drew Fustini
2026-06-28 21:28   ` sashiko-bot
2026-06-28 21:18 ` [PATCH v3 4/8] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
2026-06-28 21:28   ` sashiko-bot
2026-06-28 21:18 ` [PATCH v3 5/8] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
2026-06-28 21:34   ` sashiko-bot
2026-07-01 20:18   ` Fenghua Yu [this message]
2026-06-28 21:18 ` [PATCH v3 6/8] riscv: Enable resctrl filesystem for Ssqosid Drew Fustini
2026-06-28 21:30   ` sashiko-bot
2026-06-28 21:18 ` [PATCH v3 7/8] dt-bindings: riscv: Add binding for CBQRI controllers Drew Fustini
2026-06-29 15:32   ` Conor Dooley
2026-06-28 21:18 ` [PATCH v3 8/8] riscv_cbqri: Add CBQRI capacity allocation platform driver Drew Fustini
2026-06-28 21:27   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=473da771-b711-457b-b9ad-491fee111b16@nvidia.com \
    --to=fenghuay@nvidia.com \
    --cc=Dave.Martin@arm.com \
    --cc=alex@ghiti.fr \
    --cc=aricciardi@baylibre.com \
    --cc=atish.patra@linux.dev \
    --cc=atishp@rivosinc.com \
    --cc=babu.moger@amd.com \
    --cc=ben.horgan@arm.com \
    --cc=bigeasy@linutronix.de \
    --cc=bp@alien8.de \
    --cc=conor+dt@kernel.org \
    --cc=conor.dooley@microchip.com \
    --cc=cp0613@linux.alibaba.com \
    --cc=cuiyunhui@bytedance.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=devicetree@vger.kernel.org \
    --cc=fustini@kernel.org \
    --cc=gong.shuai@sanechips.com.cn \
    --cc=gsh517@gmail.com \
    --cc=guo.wenjia23@zte.com.cn \
    --cc=james.morse@arm.com \
    --cc=krzk+dt@kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux-rt-devel@lists.linux.dev \
    --cc=liu.qingtao2@zte.com.cn \
    --cc=liwei1518@gmail.com \
    --cc=mindal@semihalf.com \
    --cc=palmer@dabbelt.com \
    --cc=peternewman@google.com \
    --cc=pjw@kernel.org \
    --cc=reinette.chatre@intel.com \
    --cc=rkrcmar@ventanamicro.com \
    --cc=robh@kernel.org \
    --cc=samuel.holland@sifive.com \
    --cc=tony.luck@intel.com \
    --cc=vasu@rivosinc.com \
    --cc=ved@rivosinc.com \
    --cc=x86@kernel.org \
    --cc=zhiwei_liu@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox