From: Drew Fustini <fustini@kernel.org>
To: sashiko@lists.linux.dev
Cc: robh@kernel.org, conor+dt@kernel.org, devicetree@vger.kernel.org,
krzk+dt@kernel.org, linux-riscv@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC v4 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring
Date: Sat, 16 May 2026 15:39:53 -0700 [thread overview]
Message-ID: <agjyOYWZLcnm4PZO@thelio> (raw)
In-Reply-To: <20260512050001.38B24C2BCB0@smtp.kernel.org>
On Tue, May 12, 2026 at 05:00:00AM +0000, sashiko-bot@kernel.org wrote:
> > resctrl_arch_reset_rmid() re-arms CONFIG_EVENT with EVT_ID=Occupancy.
> > CONFIG_EVENT both resets the counter to 0 and selects the event, so
> > re-arming with the same event keeps the MCID counting after reset rather
> > than relying on sticky-last-event semantics that the CBQRI register
> > definition does not guarantee.
>
> The resctrl core only calls mon_event_read() with first=true (which
> triggers the arch reset hook) for Memory Bandwidth Monitoring events,
> never for L3 Occupancy.
>
> Because resctrl_arch_reset_rmid() is never invoked for L3 Occupancy during
> RMID recycle, will the counter fail to re-arm when a new monitoring group
> is created, causing it to stop counting or inherit dirty data from previous
> users?
cbqri_init_mon_counters() pre-configures the MCID slot at domain init,
and occupancy is a current state read rather than an accumulator, so a
recycled MCID returns the live counter. I'll update the commit message
and drop the occupancy branch in resctrl_arch_reset_rmid() since it is
unreachable.
> > +/*
> > + * Per-event controller table. Only events CBQRI can back occupy a
> > + * slot, so other events do not bloat the array.
> > + */
> > +#define CBQRI_MAX_EVENT QOS_L3_OCCUP_EVENT_ID
> > +static struct cbqri_controller *cbqri_resctrl_counters[CBQRI_MAX_EVENT + 1];
>
> This array is allocated and initialized in cbqri_resctrl_pick_counters()
> and nulled out during teardown, but its contents appear to never be accessed
> by any logic in the driver.
>
> Is this array just dead code?
No, cbqri_resctrl_counters[QOS_L3_MBM_TOTAL_EVENT_ID] is read in
cbqri_resctrl.c to decide whether to enable mbm_total_bytes.
> > @@ -205,10 +225,101 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
> > u32 closid, u32 rmid, enum resctrl_event_id eventid,
> > void *arch_priv, u64 *val, void *arch_mon_ctx)
> > {
> > - return -ENODATA;
> > + struct cbqri_resctrl_dom *hw_dom;
> > + struct cbqri_controller *ctrl;
> > + struct rdt_ctrl_domain *d;
> > + u64 ctr_val;
> > + int err;
> > +
> > + resctrl_arch_rmid_read_context_check();
> > +
> > + /*
> > + * Each branch takes a sleeping mutex. Bail if called with IRQs
> > + * disabled (e.g. smp_call_function_any() from nohz_full CPUs).
> > + */
> > + if (irqs_disabled())
> > + return -EIO;
>
> The resctrl core intentionally reads remote domains via
> smp_call_function_any() when the target CPU is nohz_full, running the
> callback in hardirq context.
>
> Will rejecting calls in atomic context completely break cache occupancy
> monitoring for nohz_full CPUs, as any attempt to read their statistics
> will always fail with -EIO?
This matches MPAM's __read_mon(). fs/resctrl explicitly handles -EIO
from this class of drivers when dispatching via smp_call_function_any()
for nohz_full CPUs. I'll update the comment to clarify.
> > + err = resctrl_online_mon_domain(res, &mon_dom->hdr);
> > + if (err)
> > + goto err_listdel;
> > +
> > + err = cbqri_init_mon_counters(ctrl);
>
> During domain initialization, resctrl_online_mon_domain() internally
> calls resctrl_arch_reset_rmid_all(), which loops over all max_rmid to
> issue CONFIG_EVENT MMIO operations.
>
> Immediately after this returns, cbqri_init_mon_counters() is called,
> which loops over ctrl->mcid_count to issue the exact same CONFIG_EVENT
> MMIO operations again.
>
> Since each CBQRI MMIO operation can sleep and take up to 1ms, does
> doing this twice per domain initialization introduce strictly redundant
> and significant latency during CPU hotplug?
Yes. I'll drop cbqri_init_mon_counters() from the L3 mon attach path,
since resctrl_arch_reset_rmid_all() called from
resctrl_online_mon_domain() already covers the same occupancy
configuration.
> > + if (err)
> > + goto err_offline;
> > +
> > + return 0;
> > +
> > +err_offline:
> > + cancel_delayed_work_sync(&mon_dom->cqm_limbo);
> > + cancel_delayed_work_sync(&mon_dom->mbm_over);
>
> The resctrl core only initializes mbm_over if resctrl_is_mbm_enabled()
> is true. Since the CBQRI driver only enables the L3 Occupancy event and
> not MBM, mbm_over remains completely zero-initialized from allocation.
>
> Can calling cancel_delayed_work_sync() on a zeroed structure pass an
> uninitialized timer to del_timer_sync() and trigger kernel panics?
Yes. The fix is to guard the mbm_over cancel on
resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID). resctrl only
INIT_DELAYED_WORK's mbm_over when MBM is enabled, so a system without a
paired BC will no longer cancel a zeroed work struct.
> > +static int cbqri_attach_cpu_to_cap_ctrl(struct cbqri_controller *ctrl,
> > + unsigned int cpu)
> > +{
>
> The hotplug callback cbqri_resctrl_online_cpu() unconditionally skips
> controllers if !ctrl->alloc_capable.
>
> Will this cause controllers that are only capable of monitoring to be
> ignored entirely, meaning no CPUs will ever be attached and no monitoring
> domains will be created on monitoring-only hardware?
This implementation does not support a capacity controller that supports
mon_capable but not alloc_capable. I'll add pr_warn_once() in
cbqri_resctrl_pick_caches() to make the limitation visible.
> > - domain = cbqri_create_ctrl_domain(ctrl, res, cpu, dom_id);
> > - if (IS_ERR(domain))
> > - return PTR_ERR(domain);
> > + if (ctrl->mon_capable && ctrl->cache.cache_level == 3) {
> > + err = cbqri_attach_cpu_to_l3_mon(ctrl, res, cpu);
> > + if (err)
> > + return err;
>
> If cbqri_attach_cpu_to_l3_mon() fails here, the error is returned, but the
> CPU is not removed from the ctrl_domain mask and the domain structure is
> not freed.
>
> Because the CPU hotplug core handles a startup failure by rolling back
> from the previous state, the offline callback for the current state
> (cbqri_resctrl_offline_cpu) is not invoked.
>
> Does this leave the CPU permanently stranded in the ctrl_domain mask and
> leak the newly allocated domain structure?
Yes. I'll make cbqri_attach_cpu_to_cap_ctrl() roll back the
cpumask_set_cpu and tear down the new ctrl_domain when
cbqri_attach_cpu_to_l3_mon() fails.
> > +static void cbqri_detach_cpu_from_l3_mon(struct rdt_resource *res,
> > + unsigned int cpu)
> > +{
> > + struct rdt_l3_mon_domain *mon_dom, *tmp;
> > +
> > + lockdep_assert_held(&cbqri_domain_list_lock);
> > +
> > + list_for_each_entry_safe(mon_dom, tmp, &res->mon_domains, hdr.list) {
> > + if (!cpumask_test_cpu(cpu, &mon_dom->hdr.cpu_mask))
> > + continue;
> > + cpumask_clear_cpu(cpu, &mon_dom->hdr.cpu_mask);
> > + if (cpumask_empty(&mon_dom->hdr.cpu_mask)) {
> > + cancel_delayed_work_sync(&mon_dom->cqm_limbo);
>
> This function is executed as a CPU hotplug callback, meaning the thread
> holds cpus_write_lock(). Concurrently, the cqm_handle_limbo() worker
> function in the resctrl core begins by acquiring cpus_read_lock().
>
> If the worker starts executing while the hotplug thread is active, the
> worker will block on cpus_read_lock(), while the hotplug thread blocks in
> cancel_delayed_work_sync() waiting for the worker to finish.
>
> Could this create an unresolvable AB-BA deadlock that hangs the system
> during CPU offline operations?
Yes, I will move the L3 mon_domain teardown path from
cancel_delayed_work_sync to cancel_delayed_work.
> > + cancel_delayed_work_sync(&mon_dom->mbm_over);
>
> (As mentioned above, mbm_over appears to be uninitialized here as well.)
I'll do the same fix here too.
-Drew
next prev parent reply other threads:[~2026-05-16 22:39 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 01/18] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 02/18] riscv: detect the Ssqosid extension Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 03/18] riscv: add support for srmcfg CSR from " Drew Fustini
2026-05-11 23:52 ` sashiko-bot
2026-05-14 22:24 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 04/18] fs/resctrl: Add resctrl_is_membw() helper Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 05/18] fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 06/18] fs/resctrl: Let bandwidth resources default to min_bw at reset Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
2026-05-12 1:26 ` sashiko-bot
2026-05-15 13:46 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 08/18] riscv_cbqri: Add capacity controller monitoring " Drew Fustini
2026-05-12 1:58 ` sashiko-bot
2026-05-16 3:05 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 09/18] riscv_cbqri: Add bandwidth controller probe and allocation " Drew Fustini
2026-05-12 2:29 ` sashiko-bot
2026-05-16 6:15 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 10/18] riscv_cbqri: Add bandwidth controller monitoring " Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
2026-05-12 4:01 ` sashiko-bot
2026-05-16 15:45 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring Drew Fustini
2026-05-12 5:00 ` sashiko-bot
2026-05-16 22:39 ` Drew Fustini [this message]
2026-05-11 5:11 ` [PATCH RFC v4 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb Drew Fustini
2026-05-12 5:42 ` sashiko-bot
2026-05-17 0:33 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring Drew Fustini
2026-05-12 21:26 ` sashiko-bot
2026-05-17 1:38 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table Drew Fustini
2026-05-12 21:46 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 17/18] ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC) Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 18/18] riscv: enable resctrl filesystem for Ssqosid Drew Fustini
2026-05-12 22:56 ` sashiko-bot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=agjyOYWZLcnm4PZO@thelio \
--to=fustini@kernel.org \
--cc=conor+dt@kernel.org \
--cc=devicetree@vger.kernel.org \
--cc=krzk+dt@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-riscv@lists.infradead.org \
--cc=robh@kernel.org \
--cc=sashiko@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox