Re: [PATCH RFC v5 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring

From: sashiko-bot@kernel.org
To: "Drew Fustini" <fustini@kernel.org>
Cc: devicetree@vger.kernel.org, robh@kernel.org, conor+dt@kernel.org
Subject: Re: [PATCH RFC v5 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring
Date: Mon, 25 May 2026 00:46:06 +0000
Message-ID: <20260525004606.D82941F000E9@smtp.kernel.org>
In-Reply-To: <20260524-ssqosid-cbqri-rqsc-v7-0-v5-12-78d3a7ba9dbe@kernel.org>

Thank you for your contribution! Sashiko AI review found 4 potential issues to consider:
- [High] Freeing the monitor domain after a non-sync cancellation of delayed work causes a Use-After-Free vulnerability.
- [High] The `resctrl_arch_reset_rmid` re-arming logic is ineffective because the resctrl core never invokes it for Occupancy events.
- [Medium] The reported `num_rmid` capability is hardcoded to the first controller's `mcid_count`, leading to incorrect capabilities reporting.
- [Low] The `cbqri_resctrl_counters` array is defined and populated but never used.
--

commit e0ea68d0dff2235cb1ae515b494803e2b894c85f
Author: Drew Fustini <fustini@kernel.org>

riscv_cbqri: resctrl: Add L3 cache occupancy monitoring

This commit exposes QOS_L3_OCCUP_EVENT_ID to allow userspace to read
per-MCID llc_occupancy. It creates L3 mon_domains lazily on the first CPU of
a cache_id, linked to the paired ctrl_domain. It also implements architecture
specific resctrl hooks to read and reset RMIDs.

> resctrl_arch_reset_rmid() re-arms CONFIG_EVENT with EVT_ID=Occupancy.
> CONFIG_EVENT both resets the counter to 0 and selects the event, so
> re-arming with the same event keeps the MCID counting after reset rather
> than relying on sticky-last-event semantics that the CBQRI register
> definition does not guarantee.

[Severity: High]
Does the resctrl core actually invoke resctrl_arch_reset_rmid() for Occupancy
events?

The resctrl core invokes resctrl_arch_reset_rmid() only for MBM events, from
mon_event_read() when rr->first is true. For QOS_L3_OCCUP_EVENT_ID,
mon_event_read() is never called with first = true, and the recycle path
(__check_limbo()) only reads the counter; it does not reset it.

Since the re-arm then never runs for Occupancy, will the counter stop
functioning once an RMID is recycled?

> diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
> index 1e6198e25fd29..3ba895c3f8111 100644
> --- a/drivers/resctrl/cbqri_resctrl.c
> +++ b/drivers/resctrl/cbqri_resctrl.c

[ ... ]

> @@ -32,7 +33,18 @@ struct cbqri_resctrl_dom {
>  
>  static struct cbqri_resctrl_res cbqri_resctrl_resources[RDT_NUM_RESOURCES];
>  
> +/*
> + * Per-event controller table. Only events CBQRI can back occupy a
> + * slot, so other events do not bloat the array.
> + */
> +#define CBQRI_MAX_EVENT QOS_L3_OCCUP_EVENT_ID
> +static struct cbqri_controller *cbqri_resctrl_counters[CBQRI_MAX_EVENT + 1];

[Severity: Low]
Is this array necessary?

The patch introduces this per-event controller array and populates it in
cbqri_resctrl_pick_counters(), but it appears to be unused.
resctrl_arch_rmid_read() dynamically resolves the controller via
cbqri_find_ctrl_domain() instead.

[ ... ]

> @@ -500,6 +630,14 @@ static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
>  		res->alloc_capable = ctrl->alloc_capable;
>  		INIT_LIST_HEAD(&res->ctrl_domains);
>  		INIT_LIST_HEAD(&res->mon_domains);
> +
> +		if (ctrl->mon_capable && res->rid == RDT_RESOURCE_L3) {
> +			res->mon_scope = RESCTRL_L3_CACHE;
> +			res->mon.num_rmid = ctrl->mcid_count;

[Severity: Medium]
Will this expose an inflated number of RMIDs to userspace if subsequent
controllers have fewer MCIDs?

cbqri_resctrl_accumulate_caps() calculates max_rmid as the system-wide
minimum mcid_count across all controllers, which bounds internal
resctrl allocations.

However, res->mon.num_rmid is initialized here using only the first
controller's mcid_count. If a subsequent controller has a smaller mcid_count,
userspace would see a num_rmid larger than the number of RMIDs that can
actually be allocated. Could this cause unexpected "Out of RMIDs" errors when
creating groups?
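
For illustration only, here is a minimal user-space sketch of deriving the
reported value from the minimum across monitoring-capable controllers instead
of from the first one. struct ctrl_model and min_mcid_count() are hypothetical
stand-ins and are not part of the patch; the real fix would more likely reuse
the value that cbqri_resctrl_accumulate_caps() already computes.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of struct cbqri_controller used here. */
struct ctrl_model {
	bool mon_capable;
	unsigned int mcid_count;
};

/*
 * Return the smallest mcid_count among monitoring-capable controllers,
 * or 0 if no controller is monitoring-capable.
 */
static unsigned int min_mcid_count(const struct ctrl_model *ctrls, size_t n)
{
	unsigned int min = UINT_MAX;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!ctrls[i].mon_capable)
			continue;
		if (ctrls[i].mcid_count < min)
			min = ctrls[i].mcid_count;
	}

	return min == UINT_MAX ? 0 : min;
}
```

With controllers reporting 64 and 16 MCIDs, this yields 16, whereas taking the
first controller's value would report 64.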

[ ... ]

> +static void cbqri_detach_cpu_from_l3_mon(struct rdt_resource *res,
> +					 unsigned int cpu)
> +{
> +	struct rdt_l3_mon_domain *mon_dom, *tmp;
> +
> +	lockdep_assert_held(&cbqri_domain_list_lock);
> +
> +	list_for_each_entry_safe(mon_dom, tmp, &res->mon_domains, hdr.list) {
> +		if (!cpumask_test_cpu(cpu, &mon_dom->hdr.cpu_mask))
> +			continue;
> +		cpumask_clear_cpu(cpu, &mon_dom->hdr.cpu_mask);
> +		if (cpumask_empty(&mon_dom->hdr.cpu_mask)) {
> +			cancel_delayed_work(&mon_dom->cqm_limbo);
> +			if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
> +				cancel_delayed_work(&mon_dom->mbm_over);
> +			resctrl_offline_mon_domain(res, &mon_dom->hdr);
> +			list_del(&mon_dom->hdr.list);
> +			kfree(mon_dom);

[Severity: High]
Can this cause a use-after-free if the delayed work is executing?

The patch uses cancel_delayed_work() rather than cancel_delayed_work_sync() to
avoid a deadlock against cpus_write_lock(). However, cancel_delayed_work() does
not wait for a handler that is already running. If the work has started but
has not yet acquired cpus_read_lock() (or has been preempted), the call
returns immediately while the handler is still in flight.

The hotplug thread then frees the domain with kfree(mon_dom). When the
delayed work thread resumes and acquires cpus_read_lock(), won't it access
its work_struct and other fields within the freed mon_dom structure?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260524-ssqosid-cbqri-rqsc-v7-0-v5-0-78d3a7ba9dbe@kernel.org?part=12
