From: Drew Fustini <fustini@kernel.org>
To: sashiko@lists.linux.dev
Cc: conor+dt@kernel.org, robh@kernel.org, krzk+dt@kernel.org,
devicetree@vger.kernel.org, linux-riscv@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC v4 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring
Date: Sat, 16 May 2026 18:38:59 -0700 [thread overview]
Message-ID: <agkcM6FCKbc9G9Df@thelio> (raw)
In-Reply-To: <20260512212653.B5715C2BCB8@smtp.kernel.org>
On Tue, May 12, 2026 at 09:26:53PM +0000, sashiko-bot@kernel.org wrote:
> > @@ -308,6 +348,76 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
> > mutex_unlock(&ctrl->lock);
> > return err;
> >
> > + case QOS_L3_MBM_TOTAL_EVENT_ID: {
> > + struct cbqri_controller *bc;
> > +
> > + /*
> > + * The L3 monitoring domain's id is the L3 cache id. The
> > + * matching ctrl domain's hw_dom->paired_bc was cached at
> > + * add time to avoid walking cbqri_controllers on every read.
> > + */
> > + d = cbqri_find_ctrl_domain(&r->ctrl_domains, hdr->id);
>
> Is there a list traversal race here that could happen during CPU hotplug?
>
> When offlining a CPU, the domain removal process uses list_del_rcu() and frees
> domains after calling synchronize_rcu(). However, cbqri_find_ctrl_domain()
> traverses the ctrl_domains list without holding an rcu_read_lock().
>
> If synchronize_rcu() does not wait for this reader, could this lead to a
> use-after-free when the freed domain struct is accessed?
I'll change it to take cbqri_domain_list_lock in
resctrl_arch_rmid_read() and resctrl_arch_reset_rmid(), and drop the RCU
on the writer side in favor of plain list_del under the same mutex.
> > @@ -730,6 +840,16 @@ static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
> > res->mon.num_rmid = ctrl->mcid_count;
> > resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID,
> > false, 0, NULL);
> > +
> > + /*
> > + * Expose BC bandwidth monitoring as the L3's
> > + * mbm_total_bytes when a BC shares topology with this
> > + * L3 (MPAM "MB on L3" mapping).
> > + */
> > + if (cbqri_resctrl_counters[QOS_L3_MBM_TOTAL_EVENT_ID])
> > + resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID,
> > + false, 0, NULL);
> > +
> > res->mon_capable = true;
> > }
> > break;
>
> Is bandwidth monitoring inadvertently disabled if the L3 capacity controller
> lacks occupancy monitoring?
>
> Because this block is inside an if condition checking ctrl->mon_capable where
> ctrl is the L3 capacity controller, it appears the bandwidth monitoring won't
> be exposed even if a valid and monitoring-capable bandwidth controller is
> present on the system.
The L3 mon_domain is currently anchored on the L3 CC's mon_capable path,
so a system with a mon-capable BC but a mon-incapable L3 CC loses
mbm_total_bytes. That can be addressed in a future series.
> > @@ -949,6 +1072,29 @@ static int cbqri_attach_cpu_to_l3_mon(struct cbqri_controller *ctrl,
> > if (err)
> > goto err_offline;
> >
> > + /*
> > + * Pair this L3 domain with the system's mon-capable BC. The
> > + * cached pointer is consulted by every rmid_read / reset_rmid.
> > + * BC mon init is system-wide. Failure here means mbm_total_bytes
> > + * is unusable for any domain, so fail the attach.
> > + */
> > + {
> > + struct cbqri_resctrl_dom *hw_dom = container_of(ctrl_dom,
> > + struct cbqri_resctrl_dom,
> > + resctrl_ctrl_dom);
> > +
> > + hw_dom->paired_bc = cbqri_find_only_mon_bc();
> > + if (hw_dom->paired_bc) {
> > + err = cbqri_init_bc_mon_counters(hw_dom->paired_bc);
>
> Could this sequence result in a use-after-free or a null pointer dereference?
>
> By the time this block runs, resctrl_online_mon_domain() has already been
> called earlier in the function, exposing the monitoring domain to user space.
> If a concurrent sysfs read triggers resctrl_arch_rmid_read() at this moment,
> it could pass the lockless check for bc->mbm_total_states and block waiting
> for the hardware lock.
>
> If cbqri_init_bc_mon_counters() fails, it drops the lock, frees
> bc->mbm_total_states, and sets it to null.
>
> When the sysfs read unblocks, will it dereference the freed pointer? Might it
> be safer to defer domain onlining until after the hardware state is fully
> initialized?
I will change it so cbqri_init_bc_mon_counters() and the paired_bc
assignment happen before resctrl_online_mon_domain(), so no reader can
observe the half-initialized state.
>
> Furthermore, does this unconditional assignment of cbqri_find_only_mon_bc()
> cause bandwidth overcounting on platforms with multiple L3 domains?
>
> If the single system-wide bandwidth controller is paired with every L3
> domain, standard user-space tools that calculate group bandwidth by summing
> values across all L3 domains might overcount the true system bandwidth by
> a factor equal to the number of active L3 domains.
Only a single bandwidth controller is supported with this implemenation.
cbqri_resctrl_pick_counters() counts distinct L3 cache_ids at setup and
refuses to expose mbm_total_bytes when more than one L3 domain is
present.
-Drew
WARNING: multiple messages have this Message-ID (diff)
From: Drew Fustini <fustini@kernel.org>
To: sashiko@lists.linux.dev
Cc: conor+dt@kernel.org, robh@kernel.org, krzk+dt@kernel.org,
devicetree@vger.kernel.org, linux-riscv@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC v4 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring
Date: Sat, 16 May 2026 18:38:59 -0700 [thread overview]
Message-ID: <agkcM6FCKbc9G9Df@thelio> (raw)
In-Reply-To: <20260512212653.B5715C2BCB8@smtp.kernel.org>
On Tue, May 12, 2026 at 09:26:53PM +0000, sashiko-bot@kernel.org wrote:
> > @@ -308,6 +348,76 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
> > mutex_unlock(&ctrl->lock);
> > return err;
> >
> > + case QOS_L3_MBM_TOTAL_EVENT_ID: {
> > + struct cbqri_controller *bc;
> > +
> > + /*
> > + * The L3 monitoring domain's id is the L3 cache id. The
> > + * matching ctrl domain's hw_dom->paired_bc was cached at
> > + * add time to avoid walking cbqri_controllers on every read.
> > + */
> > + d = cbqri_find_ctrl_domain(&r->ctrl_domains, hdr->id);
>
> Is there a list traversal race here that could happen during CPU hotplug?
>
> When offlining a CPU, the domain removal process uses list_del_rcu() and frees
> domains after calling synchronize_rcu(). However, cbqri_find_ctrl_domain()
> traverses the ctrl_domains list without holding an rcu_read_lock().
>
> If synchronize_rcu() does not wait for this reader, could this lead to a
> use-after-free when the freed domain struct is accessed?
I'll change it to take cbqri_domain_list_lock in
resctrl_arch_rmid_read() and resctrl_arch_reset_rmid(), and drop the RCU
on the writer side in favor of plain list_del under the same mutex.
> > @@ -730,6 +840,16 @@ static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
> > res->mon.num_rmid = ctrl->mcid_count;
> > resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID,
> > false, 0, NULL);
> > +
> > + /*
> > + * Expose BC bandwidth monitoring as the L3's
> > + * mbm_total_bytes when a BC shares topology with this
> > + * L3 (MPAM "MB on L3" mapping).
> > + */
> > + if (cbqri_resctrl_counters[QOS_L3_MBM_TOTAL_EVENT_ID])
> > + resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID,
> > + false, 0, NULL);
> > +
> > res->mon_capable = true;
> > }
> > break;
>
> Is bandwidth monitoring inadvertently disabled if the L3 capacity controller
> lacks occupancy monitoring?
>
> Because this block is inside an if condition checking ctrl->mon_capable where
> ctrl is the L3 capacity controller, it appears the bandwidth monitoring won't
> be exposed even if a valid and monitoring-capable bandwidth controller is
> present on the system.
The L3 mon_domain is currently anchored on the L3 CC's mon_capable path,
so a system with a mon-capable BC but a mon-incapable L3 CC loses
mbm_total_bytes. That can be addressed in a future series.
> > @@ -949,6 +1072,29 @@ static int cbqri_attach_cpu_to_l3_mon(struct cbqri_controller *ctrl,
> > if (err)
> > goto err_offline;
> >
> > + /*
> > + * Pair this L3 domain with the system's mon-capable BC. The
> > + * cached pointer is consulted by every rmid_read / reset_rmid.
> > + * BC mon init is system-wide. Failure here means mbm_total_bytes
> > + * is unusable for any domain, so fail the attach.
> > + */
> > + {
> > + struct cbqri_resctrl_dom *hw_dom = container_of(ctrl_dom,
> > + struct cbqri_resctrl_dom,
> > + resctrl_ctrl_dom);
> > +
> > + hw_dom->paired_bc = cbqri_find_only_mon_bc();
> > + if (hw_dom->paired_bc) {
> > + err = cbqri_init_bc_mon_counters(hw_dom->paired_bc);
>
> Could this sequence result in a use-after-free or a null pointer dereference?
>
> By the time this block runs, resctrl_online_mon_domain() has already been
> called earlier in the function, exposing the monitoring domain to user space.
> If a concurrent sysfs read triggers resctrl_arch_rmid_read() at this moment,
> it could pass the lockless check for bc->mbm_total_states and block waiting
> for the hardware lock.
>
> If cbqri_init_bc_mon_counters() fails, it drops the lock, frees
> bc->mbm_total_states, and sets it to null.
>
> When the sysfs read unblocks, will it dereference the freed pointer? Might it
> be safer to defer domain onlining until after the hardware state is fully
> initialized?
I will change it so cbqri_init_bc_mon_counters() and the paired_bc
assignment happen before resctrl_online_mon_domain(), so no reader can
observe the half-initialized state.
>
> Furthermore, does this unconditional assignment of cbqri_find_only_mon_bc()
> cause bandwidth overcounting on platforms with multiple L3 domains?
>
> If the single system-wide bandwidth controller is paired with every L3
> domain, standard user-space tools that calculate group bandwidth by summing
> values across all L3 domains might overcount the true system bandwidth by
> a factor equal to the number of active L3 domains.
Only a single bandwidth controller is supported with this implemenation.
cbqri_resctrl_pick_counters() counts distinct L3 cache_ids at setup and
refuses to expose mbm_total_bytes when more than one L3 domain is
present.
-Drew
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
next prev parent reply other threads:[~2026-05-17 1:39 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
2026-05-11 5:10 ` Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 01/18] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
2026-05-11 5:10 ` Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 02/18] riscv: detect the Ssqosid extension Drew Fustini
2026-05-11 5:10 ` Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 03/18] riscv: add support for srmcfg CSR from " Drew Fustini
2026-05-11 5:10 ` Drew Fustini
2026-05-11 23:52 ` sashiko-bot
2026-05-14 22:24 ` Drew Fustini
2026-05-14 22:24 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 04/18] fs/resctrl: Add resctrl_is_membw() helper Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 05/18] fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 06/18] fs/resctrl: Let bandwidth resources default to min_bw at reset Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-12 1:26 ` sashiko-bot
2026-05-15 13:46 ` Drew Fustini
2026-05-15 13:46 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 08/18] riscv_cbqri: Add capacity controller monitoring " Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-12 1:58 ` sashiko-bot
2026-05-16 3:05 ` Drew Fustini
2026-05-16 3:05 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 09/18] riscv_cbqri: Add bandwidth controller probe and allocation " Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-12 2:29 ` sashiko-bot
2026-05-16 6:15 ` Drew Fustini
2026-05-16 6:15 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 10/18] riscv_cbqri: Add bandwidth controller monitoring " Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-12 4:01 ` sashiko-bot
2026-05-16 15:45 ` Drew Fustini
2026-05-16 15:45 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-12 5:00 ` sashiko-bot
2026-05-16 22:39 ` Drew Fustini
2026-05-16 22:39 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-12 5:42 ` sashiko-bot
2026-05-17 0:33 ` Drew Fustini
2026-05-17 0:33 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-12 21:26 ` sashiko-bot
2026-05-17 1:38 ` Drew Fustini [this message]
2026-05-17 1:38 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-12 21:46 ` sashiko-bot
2026-05-17 5:01 ` Drew Fustini
2026-05-17 5:01 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 17/18] ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC) Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 18/18] riscv: enable resctrl filesystem for Ssqosid Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-12 22:56 ` sashiko-bot
2026-05-17 6:56 ` Drew Fustini
2026-05-17 6:56 ` Drew Fustini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=agkcM6FCKbc9G9Df@thelio \
--to=fustini@kernel.org \
--cc=conor+dt@kernel.org \
--cc=devicetree@vger.kernel.org \
--cc=krzk+dt@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-riscv@lists.infradead.org \
--cc=robh@kernel.org \
--cc=sashiko@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.