Re: [PATCH RFC v6 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb

Devicetree
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Drew Fustini" <fustini@kernel.org>
Cc: robh@kernel.org, devicetree@vger.kernel.org, conor+dt@kernel.org
Subject: Re: [PATCH RFC v6 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb
Date: Mon, 01 Jun 2026 20:57:17 +0000	[thread overview]
Message-ID: <20260601205717.7FDA51F00893@smtp.kernel.org> (raw)
In-Reply-To: <20260601-ssqosid-cbqri-rqsc-v7-0-v6-13-baf00f50028a@kernel.org>

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:

New issues:
- [High] Blind assignment during reset can violate the hardware bandwidth sum invariant.
- [High] Memory bandwidth capacity leak upon resctrl group removal (rmdir) allows Denial of Service.
- [High] Unbounded `max_rmid` causes massive memory allocation failure when a bandwidth controller is monitor-capable but the L3 controller is not.
- [High] Domain initialization fails unconditionally if the bandwidth controller's global capacity (MRBWB) exceeds U16_MAX.

Pre-existing issues:
- [High] Use-after-free in `cbqri_detach_cpu_from_l3_mon` due to unsafe `cancel_delayed_work`.
--

commit 2593653f0061033605146d0d48bff0500eba1f1d
Author: Drew Fustini <fustini@kernel.org>

    riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb

    Add bandwidth allocation through Rbwb (reserved bandwidth blocks)
    exposed as the MB_MIN resource. Rbwb's sum constraint does not fit MBA's
    percentage cap, so MB_MIN lands as a new RDT_RESOURCE_* rather than
    masquerading as MBA.

> diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
> index f379058b0114b..1d312004b07d9 100644
> --- a/drivers/resctrl/cbqri_resctrl.c
> +++ b/drivers/resctrl/cbqri_resctrl.c
> @@ -363,6 +363,9 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
>  	case RDT_RESOURCE_L2:
>  	case RDT_RESOURCE_L3:
>  		return cbqri_apply_cache_config_dom(dom, r, closid, t, cfg_val);
> +	case RDT_RESOURCE_MB_MIN:
> +		/* sum(Rbwb) <= MRBWB validation runs inside cbqri_apply_rbwb(). */
> +		return cbqri_apply_rbwb(dom->hw_ctrl, closid, cfg_val, true);

[Severity: High]
Does this code leak memory bandwidth capacity when a resctrl group is removed?

When a resctrl group is removed via rmdir, the CLOSID is freed by the core,
but no architecture hook is invoked to reset the hardware or software cache.
This means ctrl->rbwb_cache[closid] permanently retains the deleted group's
bandwidth allocation.

Later, when cbqri_apply_rbwb() is called to modify bandwidth for any active
group, it sums across all ctrl->rcid_count CLOSIDs, including the unallocated
ones. The stranded bandwidth from deleted groups is included in the sum,
causing new allocations to be rejected with -EINVAL.

Can an attacker with mkdir permissions repeatedly create groups, assign high
bandwidth, and remove them, to permanently starve the default group and
active processes?

[ ... ]

> @@ -422,9 +433,22 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
>  	return val;
>  }
>  
> +/*
> + * RCID 0 carries the remaining MRBWB after every other RCID is seeded with
> + * the minimum Rbwb of 1. cbqri_probe_bc() rejects a bandwidth controller
> + * with mrbwb < rcid_count, so this subtraction cannot underflow.
> + */
> +static u64 cbqri_rcid0_rbwb(struct cbqri_controller *ctrl)
> +{
> +	if (WARN_ON_ONCE(ctrl->bc.mrbwb < ctrl->rcid_count))
> +		return 1;
> +	return ctrl->bc.mrbwb - (ctrl->rcid_count - 1);
> +}

[Severity: High]
Does this calculation cause unconditional initialization failure on systems
where MRBWB exceeds U16_MAX?

The maximum reserved bandwidth blocks (mrbwb) is a 32-bit field in hardware
capabilities, but cbqri_apply_rbwb() enforces a strict hardware limitation
of 16 bits per RCID (rejecting values > U16_MAX).

If the global mrbwb exceeds 65535, this function returns a value exceeding
U16_MAX, causing cbqri_apply_rbwb() to reject it with -EINVAL. This in turn
causes domain initialization to fail, preventing CPUs from coming online
with resctrl.

[ ... ]

> @@ -439,15 +463,41 @@ void resctrl_arch_reset_all_ctrls(struct rdt_resource *r)
>  		return;
>  
>  	list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
> -		for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
> -			for (t = 0; t < CDP_NUM_TYPES; t++) {
> +		dom = container_of(d, struct cbqri_resctrl_dom,
> +				   resctrl_ctrl_dom);
> +
> +		switch (r->rid) {
> +		case RDT_RESOURCE_MB_MIN:
> +			/*
> +			 * CBQRI section 4.5: Rbwb >= 1, sum(Rbwb) <= MRBWB.
> +			 * Walk N-1..1 first so RCID 0 lands last with the
> +			 * remaining budget.
> +			 */
> +			for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
> +				u32 rcid = (i + 1) % hw_res->ctrl->rcid_count;
> +				u64 rbwb = (rcid == 0) ?
> +					cbqri_rcid0_rbwb(dom->hw_ctrl) : 1;
>  				int rerr;
>  
> -				rerr = resctrl_arch_update_one(r, d, i, t, default_ctrl);
> +				rerr = cbqri_apply_rbwb(dom->hw_ctrl, rcid, rbwb, false);
>  				if (rerr)
> -					pr_err_ratelimited("rid=%d reset RCID %u type %u failed (%d)\n",
> -							   r->rid, i, t, rerr);
> +					pr_err_ratelimited("RBWB reset RCID %u failed (%d)\n",
> +							   rcid, rerr);

[Severity: High]
Does this error handling violate the hardware bandwidth sum invariant?

If cbqri_apply_rbwb() fails for any RCID (such as an MMIO error), it logs
the error but continues the loop, leaving that RCID at its previous
(potentially large) bandwidth.

When it later blindly writes MRBWB - (N-1) to RCID 0 with check_sum = false,
the total sum of all RCIDs in the hardware will exceed the hardware's MRBWB
capacity. Could this violate the CBQRI invariant and potentially lock up the
controller?

[ ... ]

> @@ -590,6 +665,31 @@ static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
>  			res->mon_capable = true;
>  		}
>  		break;
> +
> +	case RDT_RESOURCE_MB_MIN:
> +		res->name = "MB_MIN";
> +		res->schema_fmt = RESCTRL_SCHEMA_RANGE;
> +		/*
> +		 * resctrl requires a cache scope for MBA-style domains.
> +		 * Use L3 as a proxy until the resctrl supports non-cache
> +		 * scopes for bandwidth resources.
> +		 */
> +		res->ctrl_scope = RESCTRL_L3_CACHE;
> +		/* Rbwb is an integer block count, not a percentage. No MBA delay_linear. */
> +		res->membw.throttle_mode = THREAD_THROTTLE_UNDEFINED;
> +		res->membw.min_bw = 1;
> +		res->membw.max_bw = ctrl->bc.mrbwb;

[Severity: High]
Does setting max_bw to ctrl->bc.mrbwb expose an unclamped 32-bit maximum
bandwidth to userspace?

If ctrl->bc.mrbwb exceeds U16_MAX, exposing this value to userspace allows
requests that bypass core validation but will ultimately fail at the hardware
layer in cbqri_apply_rbwb().

> +		res->membw.bw_gran = 1;
> +		/*
> +		 * CBQRI section 4.5 caps sum(Rbwb) <= MRBWB. Default new
> +		 * groups to min_bw so mkdir cannot overflow that sum.
> +		 */
> +		res->membw.default_to_min = true;
> +		res->alloc_capable = ctrl->alloc_capable;
> +		INIT_LIST_HEAD(&res->ctrl_domains);
> +		INIT_LIST_HEAD(&res->mon_domains);

[Severity: High]
Does enabling MB_MIN cause massive memory allocation failures due to an
unbounded max_rmid?

In cbqri_resctrl_accumulate_caps(), exposed_mon_capable is set to true if
any hardware controller has mon_capable == true, which now includes the
bandwidth controller backing MB_MIN.

However, max_rmid is conditionally bounded only by the L3 capacity
controller. If the system has a monitor-capable bandwidth controller but no
monitor-capable L3 capacity controller, max_rmid remains U32_MAX.

Later, resctrl_init() invokes rmid_init(), which attempts to allocate
structures for U32_MAX RMIDs. This will fail with -ENOMEM and completely
abort resctrl initialization.

[ ... ]

> @@ -893,14 +1054,24 @@ static int cbqri_attach_cpu_to_all_ctrls(unsigned int cpu)
>  	lockdep_assert_held(&cbqri_domain_list_lock);
>  
>  	list_for_each_entry(ctrl, &cbqri_controllers, list) {
> -		if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
> -			continue;
> -		if (!cpumask_test_cpu(cpu, &ctrl->cache.cpu_mask))
> -			continue;
> -		if (!ctrl->alloc_capable)
> +		switch (ctrl->type) {
> +		case CBQRI_CONTROLLER_TYPE_CAPACITY:
> +			if (!cpumask_test_cpu(cpu, &ctrl->cache.cpu_mask))
> +				continue;
> +			if (!ctrl->alloc_capable)
> +				continue;
> +			err = cbqri_attach_cpu_to_cap_ctrl(ctrl, cpu);
> +			break;
> +		case CBQRI_CONTROLLER_TYPE_BANDWIDTH:
> +			if (!cpumask_test_cpu(cpu, &ctrl->mem.cpu_mask))
> +				continue;
> +			if (!ctrl->alloc_capable)
> +				continue;
> +			err = cbqri_attach_cpu_to_bw_ctrl(ctrl, cpu);
> +			break;
> +		default:
>  			continue;
> -
> -		err = cbqri_attach_cpu_to_cap_ctrl(ctrl, cpu);
> +		}
>  		if (err) {
>  			cbqri_detach_cpu_from_all_ctrls(cpu);

[Severity: High]
This is a pre-existing issue, but does cbqri_detach_cpu_from_l3_mon() have a
use-after-free bug due to an unsafe cancel_delayed_work()?

If cbqri_detach_cpu_from_all_ctrls() eventually calls into
cbqri_detach_cpu_from_l3_mon():

drivers/resctrl/cbqri_resctrl.c:cbqri_detach_cpu_from_l3_mon() {
    ...
            cancel_delayed_work(&mon_dom->cqm_limbo);
            ...
            kfree(mon_dom);
    ...
}

The CPU offline callback runs under cpus_write_lock(). If the worker has
already dequeued from the workqueue but is preempted right before acquiring
cpus_read_lock(), cancel_delayed_work() returns false and does not wait.

The callback then proceeds to kfree(mon_dom). When the hotplug callback
finishes and releases the write lock, the worker acquires the read lock and
dereferences the freed mon_dom pointer, resulting in a use-after-free.

>  			break;
>  		}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260601-ssqosid-cbqri-rqsc-v7-0-v6-0-baf00f50028a@kernel.org?part=13

next prev parent reply	other threads:[~2026-06-01 20:57 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-01 20:35 [PATCH RFC v6 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
2026-06-01 20:35 ` [PATCH RFC v6 01/18] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
2026-06-01 20:35 ` [PATCH RFC v6 02/18] riscv: detect the Ssqosid extension Drew Fustini
2026-06-01 20:35 ` [PATCH RFC v6 03/18] riscv: add support for srmcfg CSR from " Drew Fustini
2026-06-01 20:49   ` sashiko-bot
2026-06-01 20:35 ` [PATCH RFC v6 04/18] fs/resctrl: Add resctrl_is_membw() helper Drew Fustini
2026-06-01 20:35 ` [PATCH RFC v6 05/18] fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT Drew Fustini
2026-06-01 20:36 ` [PATCH RFC v6 06/18] fs/resctrl: Let bandwidth resources default to min_bw at reset Drew Fustini
2026-06-01 20:55   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
2026-06-01 20:48   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 08/18] riscv_cbqri: Add capacity controller monitoring " Drew Fustini
2026-06-01 20:51   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 09/18] riscv_cbqri: Add bandwidth controller probe and allocation " Drew Fustini
2026-06-01 20:49   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 10/18] riscv_cbqri: Add bandwidth controller monitoring " Drew Fustini
2026-06-01 20:36 ` [PATCH RFC v6 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
2026-06-01 20:56   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring Drew Fustini
2026-06-01 20:58   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb Drew Fustini
2026-06-01 20:57   ` sashiko-bot [this message]
2026-06-01 20:36 ` [PATCH RFC v6 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight Drew Fustini
2026-06-01 20:57   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring Drew Fustini
2026-06-01 21:01   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table Drew Fustini
2026-06-01 20:36 ` [PATCH RFC v6 17/18] ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC) Drew Fustini
2026-06-01 20:36 ` [PATCH RFC v6 18/18] riscv: enable resctrl filesystem for Ssqosid Drew Fustini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260601205717.7FDA51F00893@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=conor+dt@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=fustini@kernel.org \
    --cc=robh@kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox