From: sashiko-bot@kernel.org
To: "Drew Fustini" <fustini@kernel.org>
Cc: robh@kernel.org, conor+dt@kernel.org, devicetree@vger.kernel.org
Subject: Re: [PATCH RFC v5 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb
Date: Mon, 25 May 2026 00:55:23 +0000 [thread overview]
Message-ID: <20260525005523.D677A1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260524-ssqosid-cbqri-rqsc-v7-0-v5-13-78d3a7ba9dbe@kernel.org>
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
New issues:
- [High] Deleted resctrl groups leak MB_MIN bandwidth because rmdir does not reset the hardware limit, preventing other groups from using the freed bandwidth due to the strict sum constraint.
- [High] Using `prox_dom` as `dom_id` for bandwidth controllers aliases multiple memory controllers within the same NUMA node to a single resctrl domain, leaving some unmanaged.
Pre-existing issues:
- [High] If attaching a CPU to a controller fails inside `cbqri_resctrl_online_cpu()`, the loop breaks and returns the error without rolling back previously successful attachments, leading to a permanent domain state leak.
--
commit c8ab4b9082ddd1241fd1636c42cf108a5428c02c
Author: Drew Fustini <fustini@kernel.org>
riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb
This patch adds bandwidth allocation through reserved bandwidth blocks (Rbwb),
exposed as the MB_MIN resource. The CBQRI sum constraint is enforced at
schemata-write time using a software cache. RCID 0 is assigned the remaining
MRBWB budget on reset.
> diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
> --- a/drivers/resctrl/cbqri_resctrl.c
> +++ b/drivers/resctrl/cbqri_resctrl.c
[ ... ]
> @@ -411,6 +411,9 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
> case RDT_RESOURCE_L2:
> case RDT_RESOURCE_L3:
> return cbqri_apply_cache_config_dom(dom, r, closid, t, cfg_val);
> + case RDT_RESOURCE_MB_MIN:
> + /* sum(Rbwb) <= MRBWB validation runs inside cbqri_apply_rbwb(). */
> + return cbqri_apply_rbwb(dom->hw_ctrl, closid, cfg_val, true);
[Severity: High]
Does this validation approach leak bandwidth when groups are deleted?
When a resctrl group is deleted via rmdir, the CLOSID is returned to the free
pool, but standard resctrl flows do not trigger an architecture-specific
callback to reset the resource's value.
Because cbqri_apply_rbwb() iterates over all RCIDs (including inactive ones)
to enforce sum(Rbwb) <= MRBWB, the bandwidth assigned to deleted groups remains
permanently accounted against the global MRBWB budget.
If a user deletes a group to free up bandwidth and tries to assign it to an
active group, could it be falsely rejected with -EINVAL, effectively leaking
that capacity until the exact CLOSID is randomly reused?
[ ... ]
> @@ -860,6 +991,37 @@ static int cbqri_attach_cpu_to_cap_ctrl(struct cbqri_controller *ctrl,
> return err;
> }
>
> +static int cbqri_attach_cpu_to_one_bw_res(struct cbqri_controller *ctrl,
> + enum resctrl_res_level rid,
> + unsigned int cpu)
> +{
> + struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
> + struct rdt_resource *res = &hw_res->resctrl_res;
> + struct rdt_ctrl_domain *domain;
> + int dom_id = ctrl->mem.prox_dom;
> +
> + if (!hw_res->ctrl)
> + return 0;
> +
> + domain = cbqri_find_ctrl_domain(&res->ctrl_domains, dom_id);
> + if (domain) {
> + cpumask_set_cpu(cpu, &domain->hdr.cpu_mask);
> + return 0;
> + }
[Severity: High]
Can this alias multiple memory controllers to a single domain?
A single NUMA node (proximity domain) can contain multiple memory controllers.
If multiple bandwidth controllers share the same proximity domain, the first one
processed will create a domain and assign its hw_ctrl pointer to it.
When the second controller is processed, cbqri_find_ctrl_domain() will match
the existing dom_id, and simply add its CPUs to the cpu_mask without saving
the second controller's hw_ctrl pointer.
Will subsequent schemata writes to this domain exclusively program the first
bandwidth controller, leaving the second unmanaged by resctrl?
[ ... ]
> @@ -983,14 +1149,24 @@ static int cbqri_resctrl_online_cpu(unsigned int cpu)
> mutex_lock(&cbqri_domain_list_lock);
>
> list_for_each_entry(ctrl, &cbqri_controllers, list) {
> - if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
> - continue;
> - if (!cpumask_test_cpu(cpu, &ctrl->cache.cpu_mask))
> - continue;
> - if (!ctrl->alloc_capable)
> + switch (ctrl->type) {
> + case CBQRI_CONTROLLER_TYPE_CAPACITY:
> + if (!cpumask_test_cpu(cpu, &ctrl->cache.cpu_mask))
> + continue;
> + if (!ctrl->alloc_capable)
> + continue;
> + err = cbqri_attach_cpu_to_cap_ctrl(ctrl, cpu);
> + break;
> + case CBQRI_CONTROLLER_TYPE_BANDWIDTH:
> + if (!cpumask_test_cpu(cpu, &ctrl->mem.cpu_mask))
> + continue;
> + if (!ctrl->alloc_capable)
> + continue;
> + err = cbqri_attach_cpu_to_bw_ctrl(ctrl, cpu);
> + break;
> + default:
> continue;
> -
> - err = cbqri_attach_cpu_to_cap_ctrl(ctrl, cpu);
> + }
> if (err)
> break;
> }
[Severity: High]
This is a pre-existing issue, but does this leak domain state if an error
occurs here?
If an attachment fails, the loop breaks and returns the error immediately. The
CPU hotplug core catches the error and aborts the online state transition.
However, it does not invoke the corresponding offline teardown callback because
this state failed during its startup sequence.
Since cbqri_resctrl_online_cpu() fails to manually roll back the domain state
for the controllers that successfully processed the CPU in earlier iterations,
will the CPU remain permanently set in their cpu_mask?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260524-ssqosid-cbqri-rqsc-v7-0-v5-0-78d3a7ba9dbe@kernel.org?part=13
next prev parent reply other threads:[~2026-05-25 0:55 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-24 23:55 [PATCH RFC v5 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 01/18] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 02/18] riscv: detect the Ssqosid extension Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 03/18] riscv: add support for srmcfg CSR from " Drew Fustini
2026-05-25 0:30 ` sashiko-bot
2026-05-24 23:55 ` [PATCH RFC v5 04/18] fs/resctrl: Add resctrl_is_membw() helper Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 05/18] fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 06/18] fs/resctrl: Let bandwidth resources default to min_bw at reset Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
2026-05-25 0:30 ` sashiko-bot
2026-05-24 23:55 ` [PATCH RFC v5 08/18] riscv_cbqri: Add capacity controller monitoring " Drew Fustini
2026-05-25 0:29 ` sashiko-bot
2026-05-25 6:58 ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 09/18] riscv_cbqri: Add bandwidth controller probe and allocation " Drew Fustini
2026-05-25 0:30 ` sashiko-bot
2026-05-25 7:21 ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 10/18] riscv_cbqri: Add bandwidth controller monitoring " Drew Fustini
2026-05-25 0:36 ` sashiko-bot
2026-05-24 23:55 ` [PATCH RFC v5 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
2026-05-25 0:50 ` sashiko-bot
2026-05-24 23:55 ` [PATCH RFC v5 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring Drew Fustini
2026-05-25 0:46 ` sashiko-bot
2026-05-24 23:55 ` [PATCH RFC v5 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb Drew Fustini
2026-05-25 0:55 ` sashiko-bot [this message]
2026-05-24 23:55 ` [PATCH RFC v5 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight Drew Fustini
2026-05-25 0:52 ` sashiko-bot
2026-05-24 23:55 ` [PATCH RFC v5 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring Drew Fustini
2026-05-25 1:27 ` sashiko-bot
2026-05-24 23:55 ` [PATCH RFC v5 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table Drew Fustini
2026-05-25 8:23 ` Sunil V L
2026-05-24 23:55 ` [PATCH RFC v5 17/18] ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC) Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 18/18] riscv: enable resctrl filesystem for Ssqosid Drew Fustini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260525005523.D677A1F000E9@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=conor+dt@kernel.org \
--cc=devicetree@vger.kernel.org \
--cc=fustini@kernel.org \
--cc=robh@kernel.org \
--cc=sashiko-reviews@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox