From: sashiko-bot@kernel.org
To: "Drew Fustini" <fustini@kernel.org>
Cc: devicetree@vger.kernel.org, robh@kernel.org, conor+dt@kernel.org
Subject: Re: [PATCH RFC v6 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring
Date: Mon, 01 Jun 2026 21:01:14 +0000 [thread overview]
Message-ID: <20260601210114.88F1B1F00893@smtp.kernel.org> (raw)
In-Reply-To: <20260601-ssqosid-cbqri-rqsc-v7-0-v6-15-baf00f50028a@kernel.org>
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
New issues:
- [Critical] Use-After-Free of `mon_dom` due to non-blocking `cancel_delayed_work()` in error and teardown paths.
- [High] Passing `any_cpu = false` when enabling MBM_TOTAL breaks monitor reads and RMID resets on nohz_full systems, as it forces execution into an IRQ-disabled IPI context.
Pre-existing issues:
- [High] AB-BA deadlock between cbqri_domain_list_lock and rdtgroup_mutex due to conflicting lock ordering in CPU hotplug and sysfs read paths.
--
commit 92a8d87e453213af4aa7b562d1f08a7080be5807
Author: Drew Fustini <fustini@kernel.org>
riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring
Expose CBQRI bandwidth controller's combined read+write counter as
the L3 mbm_total_bytes event. A software accumulator keeps the
64-bit byte total monotonic across the 62-bit hardware counter wrap.
> diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
> index 14b955eb79491..efd75d241122f 100644
> --- a/drivers/resctrl/cbqri_resctrl.c
> +++ b/drivers/resctrl/cbqri_resctrl.c
> @@ -184,17 +191,67 @@ void resctrl_arch_mon_event_config_write(void *info)
> void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
> u32 unused, u32 rmid, enum resctrl_event_id eventid)
> {
> + struct cbqri_resctrl_dom *hw_dom;
> + struct rdt_ctrl_domain *cd;
> +
> + if (irqs_disabled())
> + return;
> +
> + mutex_lock(&cbqri_domain_list_lock);
[Severity: High]
This is a pre-existing issue, but does expanding cbqri_domain_list_lock
usage here worsen an AB-BA deadlock with rdtgroup_mutex?
A deadlock can occur during CPU hotplug and sysfs reads due to conflicting
lock ordering:
Thread A (CPU Hotplug) takes cbqri_domain_list_lock in
cbqri_resctrl_online_cpu(), then calls resctrl_online_mon_domain() via
cbqri_attach_cpu_to_l3_mon(), which blocks waiting for rdtgroup_mutex.
Thread B (Sysfs read) takes rdtgroup_mutex in rdtgroup_mondata_show(), then
delegates the read via smp_call_on_cpu() to a worker thread.
Thread C (Worker) executes resctrl_arch_rmid_read(), which blocks waiting
for cbqri_domain_list_lock.
[ ... ]
> @@ -773,6 +906,61 @@ static int cbqri_resctrl_pick_bw_alloc(void)
[ ... ]
> +static void cbqri_resctrl_pick_counters(void)
> +{
[ ... ]
> + /*
> + * mbm_total_bytes is surfaced on the L3 monitoring domain, so it
> + * needs a mon-capable L3 cache controller as well as a single
> + * mon-capable bandwidth controller.
> + */
> + if (l3->ctrl && l3->ctrl->mon_capable && cbqri_find_only_mon_bc())
> + resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false, 0, NULL);
[Severity: High]
Does passing any_cpu = false here break monitor reads and RMID resets on
nohz_full systems?
When any_cpu is false, the resctrl core ensures the read executes on a CPU
in the target domain. If all CPUs in that domain are nohz_full,
mon_event_read() falls back from smp_call_on_cpu() (process context) to
smp_call_function_any() (IPI hardirq context with interrupts disabled).
Because drivers/resctrl/cbqri_resctrl.c checks for irqs_disabled() in both
resctrl_arch_rmid_read() and resctrl_arch_reset_rmid() and silently aborts,
monitor reads and resets will fail on nohz_full systems.
[ ... ]
> @@ -854,6 +1054,7 @@ static int cbqri_attach_cpu_to_l3_mon(struct cbqri_controller *ctrl,
[ ... ]
> err_offline:
> /*
> * cancel_delayed_work avoids deadlocking against the cqm_limbo
> * worker which takes cpus_read_lock while this hotplug callback
> * already holds cpus_write_lock. mbm_over is only
> * INIT_DELAYED_WORK'd when MBM_TOTAL was enabled, so gate the
> * cancel on the same condition to avoid touching a zeroed work
> * struct.
> */
> cancel_delayed_work(&mon_dom->cqm_limbo);
> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
> + cancel_delayed_work(&mon_dom->mbm_over);
> resctrl_offline_mon_domain(res, &mon_dom->hdr);
> err_listdel:
> list_del(&mon_dom->hdr.list);
> err_free:
> kfree(mon_dom);
> return err;
> }
[Severity: Critical]
Does this non-blocking cancel_delayed_work() lead to a use-after-free of
mon_dom?
In drivers/resctrl/cbqri_resctrl.c:cbqri_attach_cpu_to_l3_mon(),
resctrl_online_mon_domain() schedules the mbm_over delayed work. If hardware
initialization fails later in the function, the error path executes
cancel_delayed_work(), which is non-blocking.
If the worker thread has already woken up and is actively blocked on
cpus_read_lock() waiting for the hotplug thread to drop cpus_write_lock(),
cancel_delayed_work() will return false.
The hotplug thread will then proceed to free mon_dom and drop the lock.
Once the lock is dropped, the worker thread resumes execution and will
access the freed mon_dom.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260601-ssqosid-cbqri-rqsc-v7-0-v6-0-baf00f50028a@kernel.org?part=15
next prev parent reply other threads:[~2026-06-01 21:01 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-01 20:35 [PATCH RFC v6 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
2026-06-01 20:35 ` [PATCH RFC v6 01/18] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
2026-06-01 20:35 ` [PATCH RFC v6 02/18] riscv: detect the Ssqosid extension Drew Fustini
2026-06-01 20:35 ` [PATCH RFC v6 03/18] riscv: add support for srmcfg CSR from " Drew Fustini
2026-06-01 20:49 ` sashiko-bot
2026-06-01 20:35 ` [PATCH RFC v6 04/18] fs/resctrl: Add resctrl_is_membw() helper Drew Fustini
2026-06-01 20:35 ` [PATCH RFC v6 05/18] fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT Drew Fustini
2026-06-01 20:36 ` [PATCH RFC v6 06/18] fs/resctrl: Let bandwidth resources default to min_bw at reset Drew Fustini
2026-06-01 20:55 ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
2026-06-01 20:48 ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 08/18] riscv_cbqri: Add capacity controller monitoring " Drew Fustini
2026-06-01 20:51 ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 09/18] riscv_cbqri: Add bandwidth controller probe and allocation " Drew Fustini
2026-06-01 20:49 ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 10/18] riscv_cbqri: Add bandwidth controller monitoring " Drew Fustini
2026-06-01 20:36 ` [PATCH RFC v6 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
2026-06-01 20:56 ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring Drew Fustini
2026-06-01 20:58 ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb Drew Fustini
2026-06-01 20:57 ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight Drew Fustini
2026-06-01 20:57 ` sashiko-bot
2026-06-01 20:36 ` [PATCH RFC v6 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring Drew Fustini
2026-06-01 21:01 ` sashiko-bot [this message]
2026-06-01 20:36 ` [PATCH RFC v6 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table Drew Fustini
2026-06-01 20:36 ` [PATCH RFC v6 17/18] ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC) Drew Fustini
2026-06-01 20:36 ` [PATCH RFC v6 18/18] riscv: enable resctrl filesystem for Ssqosid Drew Fustini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260601210114.88F1B1F00893@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=conor+dt@kernel.org \
--cc=devicetree@vger.kernel.org \
--cc=fustini@kernel.org \
--cc=robh@kernel.org \
--cc=sashiko-reviews@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox