Devicetree
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Drew Fustini" <fustini@kernel.org>
Cc: devicetree@vger.kernel.org, robh@kernel.org, conor+dt@kernel.org
Subject: Re: [PATCH RFC v6 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring
Date: Mon, 01 Jun 2026 21:01:14 +0000	[thread overview]
Message-ID: <20260601210114.88F1B1F00893@smtp.kernel.org> (raw)
In-Reply-To: <20260601-ssqosid-cbqri-rqsc-v7-0-v6-15-baf00f50028a@kernel.org>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [Critical] Use-After-Free of `mon_dom` due to non-blocking `cancel_delayed_work()` in error and teardown paths.
- [High] Passing `any_cpu = false` when enabling MBM_TOTAL breaks monitor reads and RMID resets on nohz_full systems, as it forces execution into an IRQ-disabled IPI context.

Pre-existing issues:
- [High] AB-BA deadlock between cbqri_domain_list_lock and rdtgroup_mutex due to conflicting lock ordering in CPU hotplug and sysfs read paths.
--

commit 92a8d87e453213af4aa7b562d1f08a7080be5807
Author: Drew Fustini <fustini@kernel.org>

riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring

Expose CBQRI bandwidth controller's combined read+write counter as
the L3 mbm_total_bytes event. A software accumulator keeps the
64-bit byte total monotonic across the 62-bit hardware counter wrap.

> diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
> index 14b955eb79491..efd75d241122f 100644
> --- a/drivers/resctrl/cbqri_resctrl.c
> +++ b/drivers/resctrl/cbqri_resctrl.c
> @@ -184,17 +191,67 @@ void resctrl_arch_mon_event_config_write(void *info)
>  void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
>  			     u32 unused, u32 rmid, enum resctrl_event_id eventid)
>  {
> +	struct cbqri_resctrl_dom *hw_dom;
> +	struct rdt_ctrl_domain *cd;
> +
> +	if (irqs_disabled())
> +		return;
> +
> +	mutex_lock(&cbqri_domain_list_lock);

[Severity: High]
This is a pre-existing issue, but does expanding cbqri_domain_list_lock
usage here worsen an AB-BA deadlock with rdtgroup_mutex?

A deadlock can occur during CPU hotplug and sysfs reads due to conflicting
lock ordering:

Thread A (CPU Hotplug) takes cbqri_domain_list_lock in
cbqri_resctrl_online_cpu(), then calls resctrl_online_mon_domain() via
cbqri_attach_cpu_to_l3_mon(), which blocks waiting for rdtgroup_mutex.

Thread B (Sysfs read) takes rdtgroup_mutex in rdtgroup_mondata_show(), then
delegates the read via smp_call_on_cpu() to a worker thread.

Thread C (Worker) executes resctrl_arch_rmid_read(), which blocks waiting
for cbqri_domain_list_lock.

[ ... ]

> @@ -773,6 +906,61 @@ static int cbqri_resctrl_pick_bw_alloc(void)
[ ... ]
> +static void cbqri_resctrl_pick_counters(void)
> +{
[ ... ]
> +	/*
> +	 * mbm_total_bytes is surfaced on the L3 monitoring domain, so it
> +	 * needs a mon-capable L3 cache controller as well as a single
> +	 * mon-capable bandwidth controller.
> +	 */
> +	if (l3->ctrl && l3->ctrl->mon_capable && cbqri_find_only_mon_bc())
> +		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false, 0, NULL);

[Severity: High]
Does passing any_cpu = false here break monitor reads and RMID resets on
nohz_full systems?

When any_cpu is false, the resctrl core ensures the read executes on a CPU
in the target domain. If all CPUs in that domain are nohz_full,
mon_event_read() falls back from smp_call_on_cpu() (process context) to
smp_call_function_any() (IPI hardirq context with interrupts disabled).

Because drivers/resctrl/cbqri_resctrl.c checks for irqs_disabled() in both
resctrl_arch_rmid_read() and resctrl_arch_reset_rmid() and silently aborts,
monitor reads and resets will fail on nohz_full systems.

[ ... ]

> @@ -854,6 +1054,7 @@ static int cbqri_attach_cpu_to_l3_mon(struct cbqri_controller *ctrl,
[ ... ]
>  err_offline:
>  	/*
>  	 * cancel_delayed_work avoids deadlocking against the cqm_limbo
>  	 * worker which takes cpus_read_lock while this hotplug callback
>  	 * already holds cpus_write_lock. mbm_over is only
>  	 * INIT_DELAYED_WORK'd when MBM_TOTAL was enabled, so gate the
>  	 * cancel on the same condition to avoid touching a zeroed work
>  	 * struct.
>  	 */
>  	cancel_delayed_work(&mon_dom->cqm_limbo);
> +	if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
> +		cancel_delayed_work(&mon_dom->mbm_over);
>  	resctrl_offline_mon_domain(res, &mon_dom->hdr);
>  err_listdel:
>  	list_del(&mon_dom->hdr.list);
>  err_free:
>  	kfree(mon_dom);
>  	return err;
>  }

[Severity: Critical]
Does this non-blocking cancel_delayed_work() lead to a use-after-free of
mon_dom?

In drivers/resctrl/cbqri_resctrl.c:cbqri_attach_cpu_to_l3_mon(),
resctrl_online_mon_domain() schedules the mbm_over delayed work. If hardware
initialization fails later in the function, the error path executes
cancel_delayed_work(), which is non-blocking.

If the worker thread has already woken up and is actively blocked on
cpus_read_lock() waiting for the hotplug thread to drop cpus_write_lock(),
cancel_delayed_work() will return false.

The hotplug thread will then proceed to free mon_dom and drop the lock.
Once the lock is dropped, the worker thread resumes execution and will
access the freed mon_dom.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260601-ssqosid-cbqri-rqsc-v7-0-v6-0-baf00f50028a@kernel.org?part=15

  reply	other threads:[~2026-06-01 21:01 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-01 20:35 [PATCH RFC v6 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
2026-06-01 20:35 ` [PATCH RFC v6 01/18] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
2026-06-01 20:35 ` [PATCH RFC v6 02/18] riscv: detect the Ssqosid extension Drew Fustini
2026-06-01 20:35 ` [PATCH RFC v6 03/18] riscv: add support for srmcfg CSR from " Drew Fustini
2026-06-01 20:49   ` sashiko-bot
2026-06-01 20:35 ` [PATCH RFC v6 04/18] fs/resctrl: Add resctrl_is_membw() helper Drew Fustini
2026-06-01 20:35 ` [PATCH RFC v6 05/18] fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT Drew Fustini
2026-06-01 20:36 ` [PATCH RFC v6 06/18] fs/resctrl: Let bandwidth resources default to min_bw at reset Drew Fustini
2026-06-01 20:55   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
2026-06-01 20:48   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 08/18] riscv_cbqri: Add capacity controller monitoring " Drew Fustini
2026-06-01 20:51   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 09/18] riscv_cbqri: Add bandwidth controller probe and allocation " Drew Fustini
2026-06-01 20:49   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 10/18] riscv_cbqri: Add bandwidth controller monitoring " Drew Fustini
2026-06-01 20:36 ` [PATCH RFC v6 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
2026-06-01 20:56   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring Drew Fustini
2026-06-01 20:58   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb Drew Fustini
2026-06-01 20:57   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight Drew Fustini
2026-06-01 20:57   ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring Drew Fustini
2026-06-01 21:01   ` sashiko-bot [this message]
2026-06-01 20:36 ` [PATCH RFC v6 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table Drew Fustini
2026-06-01 20:36 ` [PATCH RFC v6 17/18] ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC) Drew Fustini
2026-06-01 20:36 ` [PATCH RFC v6 18/18] riscv: enable resctrl filesystem for Ssqosid Drew Fustini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260601210114.88F1B1F00893@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=conor+dt@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=fustini@kernel.org \
    --cc=robh@kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox