All of lore.kernel.org
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Drew Fustini" <fustini@kernel.org>
Cc: devicetree@vger.kernel.org, robh@kernel.org, conor+dt@kernel.org
Subject: Re: [PATCH RFC v5 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask
Date: Mon, 25 May 2026 00:50:55 +0000	[thread overview]
Message-ID: <20260525005056.312281F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260524-ssqosid-cbqri-rqsc-v7-0-v5-11-78d3a7ba9dbe@kernel.org>

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] The CPU hotplug callbacks do not notify the generic resctrl subsystem when CPUs come online or go offline, leaving the `rdtgroup` cpumasks out of sync.
- [High] The per-CPU default CLOSID and RMID are not reset to the reserved defaults when a CPU comes online.
- [High] CONFIG_RISCV_ISA_SSQOSID unconditionally selects ARCH_HAS_CPU_RESCTRL without requiring the implementation driver, leading to build failures.
- [High] Missing rollback on partial failure in CPU hotplug online callback leaves the system in an inconsistent state.
--

commit 31a7934518aa1237167aa6b7e0cb45ba9107610b
Author: Drew Fustini <fustini@kernel.org>

riscv_cbqri: resctrl: Add cache allocation via capacity block mask

This commit wires CBQRI capacity controllers into resctrl as RDT_RESOURCE_L2
and RDT_RESOURCE_L3 schemata. It treats mismatched controller capacities at
the same cache level as a fatal error and creates domains lazily during CPU
hotplug.

> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 6abbb21f3a0dc..390353a6153a2 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -595,6 +595,7 @@ config RISCV_ISA_SSQOSID
>  	bool "Ssqosid extension support for supervisor mode Quality of Service ID"
>  	depends on 64BIT
>  	default n
> +	select ARCH_HAS_CPU_RESCTRL

[Severity: High]
Could this lead to build failures if the CBQRI driver is not enabled?

The Kconfig symbol RISCV_ISA_SSQOSID unconditionally selects
ARCH_HAS_CPU_RESCTRL, which signals to the core kernel that the architecture
provides the resctrl_arch_* interface. This permits CONFIG_RESCTRL_FS to be
enabled.

However, since the implementation resides in drivers/resctrl/cbqri_resctrl.c
which is only compiled when CONFIG_RISCV_CBQRI_DRIVER is set, enabling
RISCV_ISA_SSQOSID and RESCTRL_FS without RISCV_CBQRI_DRIVER might cause
linker errors due to undefined references to the resctrl_arch_* functions.
Would it be safer to add a dependency on the driver?

> diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
> new file mode 100644
> index 0000000000000..1e6198e25fd29
> --- /dev/null
> +++ b/drivers/resctrl/cbqri_resctrl.c
> @@ -0,0 +1,736 @@

[ ... ]

> +static int cbqri_resctrl_online_cpu(unsigned int cpu)
> +{
> +	struct cbqri_controller *ctrl;
> +	int err = 0;
> +
> +	mutex_lock(&cbqri_domain_list_lock);
> +
> +	list_for_each_entry(ctrl, &cbqri_controllers, list) {
> +		if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
> +			continue;
> +		if (!cpumask_test_cpu(cpu, &ctrl->cache.cpu_mask))
> +			continue;
> +		if (!ctrl->alloc_capable)
> +			continue;
> +
> +		err = cbqri_attach_cpu_to_cap_ctrl(ctrl, cpu);
> +		if (err)
> +			break;
> +	}

[Severity: High]
Does this loop leave the system in an inconsistent state if it fails partway
through?

If cbqri_attach_cpu_to_cap_ctrl() succeeds for an earlier controller
but fails for a subsequent one, the loop breaks and returns an error. The CPU
hotplug framework will abort the bring-up without invoking the teardown
callback cbqri_resctrl_offline_cpu().

Will this leak the rdt_ctrl_domain structs for the controllers that
were successfully attached?

> +
> +	mutex_unlock(&cbqri_domain_list_lock);
> +	return err;
> +}

[Severity: High]
Should we reset the per-CPU default closid and rmid when a CPU comes online?

When a CPU is offlined while in a custom resctrl group, its cpu_srmcfg_default
might still hold that custom closid/rmid. When the CPU comes back online, it
conceptually returns to the default rdtgroup, so it might need its state
reset via resctrl_arch_set_cpu_default_closid_rmid() to prevent tasks from
running with stale assignments.

[Severity: High]
Does this missing notification leave the generic resctrl subsystem out of sync?

The generic fs/resctrl layer usually relies on the architecture to notify it
via resctrl_online_cpu() to update resource group masks. Without this, newly
onlined CPUs might not be added to the default resctrl group, potentially
breaking sysfs CPU tracking.

> +
> +static int cbqri_resctrl_offline_cpu(unsigned int cpu)
> +{
> +	int rid;
> +
> +	mutex_lock(&cbqri_domain_list_lock);
> +
> +	for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
> +		struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
> +
> +		if (!hw_res->ctrl)
> +			continue;
> +		cbqri_detach_cpu_from_ctrl_domains(&hw_res->resctrl_res, cpu);
> +	}
> +
> +	mutex_unlock(&cbqri_domain_list_lock);
> +	return 0;
> +}

[Severity: High]
Does this also need to call resctrl_offline_cpu() to notify the core layer?

Without notifying the generic layer, offlined CPUs might remain tracked
permanently in the rdtgroup cpumasks.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260524-ssqosid-cbqri-rqsc-v7-0-v5-0-78d3a7ba9dbe@kernel.org?part=11

  reply	other threads:[~2026-05-25  0:50 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-24 23:55 [PATCH RFC v5 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
2026-05-24 23:55 ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 01/18] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
2026-05-24 23:55   ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 02/18] riscv: detect the Ssqosid extension Drew Fustini
2026-05-24 23:55   ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 03/18] riscv: add support for srmcfg CSR from " Drew Fustini
2026-05-24 23:55   ` Drew Fustini
2026-05-25  0:30   ` sashiko-bot
2026-05-25  3:55     ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 04/18] fs/resctrl: Add resctrl_is_membw() helper Drew Fustini
2026-05-24 23:55   ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 05/18] fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT Drew Fustini
2026-05-24 23:55   ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 06/18] fs/resctrl: Let bandwidth resources default to min_bw at reset Drew Fustini
2026-05-24 23:55   ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
2026-05-24 23:55   ` Drew Fustini
2026-05-25  0:30   ` sashiko-bot
2026-05-25  4:16     ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 08/18] riscv_cbqri: Add capacity controller monitoring " Drew Fustini
2026-05-24 23:55   ` Drew Fustini
2026-05-25  0:29   ` sashiko-bot
2026-05-25  6:58     ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 09/18] riscv_cbqri: Add bandwidth controller probe and allocation " Drew Fustini
2026-05-24 23:55   ` Drew Fustini
2026-05-25  0:30   ` sashiko-bot
2026-05-25  7:21     ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 10/18] riscv_cbqri: Add bandwidth controller monitoring " Drew Fustini
2026-05-24 23:55   ` Drew Fustini
2026-05-25  0:36   ` sashiko-bot
2026-05-25  7:25     ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
2026-05-24 23:55   ` Drew Fustini
2026-05-25  0:50   ` sashiko-bot [this message]
2026-05-25  7:40     ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring Drew Fustini
2026-05-24 23:55   ` Drew Fustini
2026-05-25  0:46   ` sashiko-bot
2026-05-26 11:49     ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb Drew Fustini
2026-05-24 23:55   ` Drew Fustini
2026-05-25  0:55   ` sashiko-bot
2026-05-31  1:50     ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight Drew Fustini
2026-05-24 23:55   ` Drew Fustini
2026-05-25  0:52   ` sashiko-bot
2026-05-31  1:50     ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring Drew Fustini
2026-05-24 23:55   ` Drew Fustini
2026-05-25  1:27   ` sashiko-bot
2026-05-31  1:50     ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table Drew Fustini
2026-05-24 23:55   ` Drew Fustini
2026-05-25  8:23   ` Sunil V L
2026-05-25  8:23     ` Sunil V L
2026-05-31  1:51     ` Drew Fustini
2026-05-31  1:51       ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 17/18] ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC) Drew Fustini
2026-05-24 23:55   ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 18/18] riscv: enable resctrl filesystem for Ssqosid Drew Fustini
2026-05-24 23:55   ` Drew Fustini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260525005056.312281F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=conor+dt@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=fustini@kernel.org \
    --cc=robh@kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.