Devicetree
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Drew Fustini" <fustini@kernel.org>
Cc: robh@kernel.org, conor+dt@kernel.org, devicetree@vger.kernel.org
Subject: Re: [PATCH RFC v6 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight
Date: Mon, 01 Jun 2026 20:57:34 +0000	[thread overview]
Message-ID: <20260601205734.76D021F00893@smtp.kernel.org> (raw)
In-Reply-To: <20260601-ssqosid-cbqri-rqsc-v7-0-v6-14-baf00f50028a@kernel.org>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

Pre-existing issues:
- [High] AB-BA deadlock between rdtgroup_mutex and cbqri_domain_list_lock involving CPU hotplug and resctrl filesystem reads.
- [High] Unprotected list_del() and kfree() causes Use-After-Free on res->ctrl_domains.
- [High] Staging registers are read and modified before ensuring the hardware is not busy, leading to data corruption.
--

commit 77c2c065054c1b30b97ae2ddd37b439f110e553b
Author: Drew Fustini <fustini@kernel.org>
riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight

Add bandwidth allocation through Mweight (shared weight for unreserved
bandwidth) exposed as the MB_WGHT resource. Mweight is an integer in
[0, 255]. A value of 0 disables work-conserving sharing for the group,
capping its bandwidth at the MB_MIN reservation.

> diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
> index 1d312004b07d9..14b955eb79491 100644
> --- a/drivers/resctrl/cbqri_resctrl.c
> +++ b/drivers/resctrl/cbqri_resctrl.c
[ ... ]
> @@ -426,6 +428,14 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
>  			val = (u32)rbwb;
>  		break;
>  	}
> +	case RDT_RESOURCE_MB_WGHT: {
> +		u64 mweight;
> +
> +		err = cbqri_read_mweight(ctrl, closid, &mweight);

[Severity: High]
This isn't a bug introduced by this patch, but the newly added
cbqri_read_mweight() called here duplicates a pre-existing flaw.

Can this corrupt data by writing to staging registers before ensuring the
hardware is not busy?

In cbqri_read_mweight(), the driver interacts with the hardware by writing to a
staging register, writing a control register, and waiting for the BUSY flag to
clear.

If a previous operation times out (such as the 1ms limit in
cbqri_wait_busy_flag()), the driver returns -EIO and releases ctrl->lock, but
the hardware might still be processing.

The next caller acquires the lock and immediately calls cbqri_set_bc_bw_alloc()
to read-modify-write the staging register before cbqri_bc_alloc_op() verifies
the BUSY flag is 0.

Would modifying the staging register of an in-flight hardware operation cause
undefined behavior, misconfiguration, or WPRI field corruption?

> +		if (err == 0)
> +			val = (u32)mweight;
> +		break;
> +	}
[ ... ]
> @@ -961,10 +1007,46 @@ static int cbqri_attach_cpu_to_one_bw_res(struct cbqri_controller *ctrl,
>  	return 0;
>  }
>  
> +static void cbqri_detach_cpu_from_one_bw_res(struct cbqri_controller *ctrl,
> +					     enum resctrl_res_level rid,
> +					     unsigned int cpu)
> +{
> +	struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
> +	struct rdt_resource *res = &hw_res->resctrl_res;
> +	struct rdt_ctrl_domain *domain;
> +	int dom_id = ctrl->mem.prox_dom;
> +
> +	lockdep_assert_held(&cbqri_domain_list_lock);
> +
> +	if (!hw_res->ctrl)
> +		return;
> +
> +	domain = cbqri_find_ctrl_domain(&res->ctrl_domains, dom_id);
> +	if (!domain || !cpumask_test_cpu(cpu, &domain->hdr.cpu_mask))
> +		return;
> +
> +	cpumask_clear_cpu(cpu, &domain->hdr.cpu_mask);
> +	if (cpumask_empty(&domain->hdr.cpu_mask)) {
> +		resctrl_offline_ctrl_domain(res, domain);

[Severity: High]
This isn't a bug introduced by this patch, but the newly added
cbqri_detach_cpu_from_one_bw_res() replicates a broken lock ordering.

Could this lead to an AB-BA deadlock between rdtgroup_mutex and
cbqri_domain_list_lock?

If a user reading a monitor data file triggers rdtgroup_mondata_show(), it
acquires rdtgroup_mutex, then calls mon_event_read() ->
resctrl_arch_rmid_read(), which acquires cbqri_domain_list_lock.

Concurrently, when a CPU is taken offline, cbqri_resctrl_offline_cpu()
acquires cbqri_domain_list_lock, then calls cbqri_detach_cpu_from_all_ctrls()
-> cbqri_detach_cpu_from_one_bw_res(). If the domain becomes empty, it calls
resctrl_offline_ctrl_domain(), which attempts to acquire rdtgroup_mutex.

Would these two paths race and cause a permanent AB-BA deadlock?

> +		list_del(&domain->hdr.list);
> +		kfree(container_of(domain, struct cbqri_resctrl_dom,
> +				   resctrl_ctrl_dom));

[Severity: High]
This is a pre-existing issue, but the new cbqri_detach_cpu_from_one_bw_res()
replicates it here.

Can this result in a Use-After-Free on the res->ctrl_domains list?

VFS read paths in fs/resctrl (like rdtgroup_schemata_show()) traverse the
res->ctrl_domains list using list_for_each_entry while holding only
rdtgroup_mutex.

When the CPU mask becomes empty, resctrl_offline_ctrl_domain() is called above,
which takes and releases rdtgroup_mutex. After it returns, list_del() and
kfree() are executed without holding rdtgroup_mutex or using RCU primitives.

Could a concurrent list traversal under rdtgroup_mutex dereference the freed
domain pointer and crash?

> +	}
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260601-ssqosid-cbqri-rqsc-v7-0-v6-0-baf00f50028a@kernel.org?part=14

  reply	other threads:[~2026-06-01 20:57 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-01 20:35 [PATCH RFC v6 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
2026-06-01 20:35 ` [PATCH RFC v6 01/18] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
2026-06-01 20:35 ` [PATCH RFC v6 02/18] riscv: detect the Ssqosid extension Drew Fustini
2026-06-01 20:35 ` [PATCH RFC v6 03/18] riscv: add support for srmcfg CSR from " Drew Fustini
2026-06-01 20:49   ` sashiko-bot
2026-06-01 20:35 ` [PATCH RFC v6 04/18] fs/resctrl: Add resctrl_is_membw() helper Drew Fustini
2026-06-01 20:35 ` [PATCH RFC v6 05/18] fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT Drew Fustini
2026-06-01 20:36 ` [PATCH RFC v6 06/18] fs/resctrl: Let bandwidth resources default to min_bw at reset Drew Fustini
2026-06-01 20:55   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
2026-06-01 20:48   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 08/18] riscv_cbqri: Add capacity controller monitoring " Drew Fustini
2026-06-01 20:51   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 09/18] riscv_cbqri: Add bandwidth controller probe and allocation " Drew Fustini
2026-06-01 20:49   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 10/18] riscv_cbqri: Add bandwidth controller monitoring " Drew Fustini
2026-06-01 20:36 ` [PATCH RFC v6 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
2026-06-01 20:56   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring Drew Fustini
2026-06-01 20:58   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb Drew Fustini
2026-06-01 20:57   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight Drew Fustini
2026-06-01 20:57   ` sashiko-bot [this message]
2026-06-01 20:36 ` [PATCH RFC v6 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring Drew Fustini
2026-06-01 21:01   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table Drew Fustini
2026-06-01 20:36 ` [PATCH RFC v6 17/18] ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC) Drew Fustini
2026-06-01 20:36 ` [PATCH RFC v6 18/18] riscv: enable resctrl filesystem for Ssqosid Drew Fustini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260601205734.76D021F00893@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=conor+dt@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=fustini@kernel.org \
    --cc=robh@kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox