Devicetree
 help / color / mirror / Atom feed
From: Drew Fustini <fustini@kernel.org>
To: sashiko@lists.linux.dev
Cc: robh@kernel.org, conor+dt@kernel.org, devicetree@vger.kernel.org,
	krzk+dt@kernel.org, linux-riscv@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC v4 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring
Date: Sat, 16 May 2026 15:39:53 -0700	[thread overview]
Message-ID: <agjyOYWZLcnm4PZO@thelio> (raw)
In-Reply-To: <20260512050001.38B24C2BCB0@smtp.kernel.org>

On Tue, May 12, 2026 at 05:00:00AM +0000, sashiko-bot@kernel.org wrote:
> >     resctrl_arch_reset_rmid() re-arms CONFIG_EVENT with EVT_ID=Occupancy.
> >     CONFIG_EVENT both resets the counter to 0 and selects the event, so
> >     re-arming with the same event keeps the MCID counting after reset rather
> >     than relying on sticky-last-event semantics that the CBQRI register
> >     definition does not guarantee.
> 
> The resctrl core only calls mon_event_read() with first=true (which
> triggers the arch reset hook) for Memory Bandwidth Monitoring events,
> never for L3 Occupancy.
> 
> Because resctrl_arch_reset_rmid() is never invoked for L3 Occupancy during
> RMID recycle, will the counter fail to re-arm when a new monitoring group
> is created, causing it to stop counting or inherit dirty data from previous
> users?

cbqri_init_mon_counters() pre-configures the MCID slot at domain init,
and occupancy is a current state read rather than an accumulator, so a
recycled MCID returns the live counter. I'll update the commit message
and drop the occupancy branch in resctrl_arch_reset_rmid() since it is
unreachable.

> > +/*
> > + * Per-event controller table. Only events CBQRI can back occupy a
> > + * slot, so other events do not bloat the array.
> > + */
> > +#define CBQRI_MAX_EVENT QOS_L3_OCCUP_EVENT_ID
> > +static struct cbqri_controller *cbqri_resctrl_counters[CBQRI_MAX_EVENT + 1];
> 
> This array is allocated and initialized in cbqri_resctrl_pick_counters()
> and nulled out during teardown, but its contents appear to never be accessed
> by any logic in the driver. 
> 
> Is this array just dead code?

No, cbqri_resctrl_counters[QOS_L3_MBM_TOTAL_EVENT_ID] is read in
cbqri_resctrl.c to decide whether to enable mbm_total_bytes. 

> > @@ -205,10 +225,101 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
> >  			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
> >  			   void *arch_priv, u64 *val, void *arch_mon_ctx)
> >  {
> > -	return -ENODATA;
> > +	struct cbqri_resctrl_dom *hw_dom;
> > +	struct cbqri_controller *ctrl;
> > +	struct rdt_ctrl_domain *d;
> > +	u64 ctr_val;
> > +	int err;
> > +
> > +	resctrl_arch_rmid_read_context_check();
> > +
> > +	/*
> > +	 * Each branch takes a sleeping mutex. Bail if called with IRQs
> > +	 * disabled (e.g. smp_call_function_any() from nohz_full CPUs).
> > +	 */
> > +	if (irqs_disabled())
> > +		return -EIO;
> 
> The resctrl core intentionally reads remote domains via
> smp_call_function_any() when the target CPU is nohz_full, running the
> callback in hardirq context.
> 
> Will rejecting calls in atomic context completely break cache occupancy
> monitoring for nohz_full CPUs, as any attempt to read their statistics
> will always fail with -EIO?

This matches MPAM's __read_mon(). fs/resctrl explicitly handles -EIO
from this class of drivers when dispatching via smp_call_function_any()
for nohz_full CPUs. I'll update the comment to clarify.

> > +	err = resctrl_online_mon_domain(res, &mon_dom->hdr);
> > +	if (err)
> > +		goto err_listdel;
> > +
> > +	err = cbqri_init_mon_counters(ctrl);
> 
> During domain initialization, resctrl_online_mon_domain() internally
> calls resctrl_arch_reset_rmid_all(), which loops over all max_rmid to
> issue CONFIG_EVENT MMIO operations.
> 
> Immediately after this returns, cbqri_init_mon_counters() is called,
> which loops over ctrl->mcid_count to issue the exact same CONFIG_EVENT
> MMIO operations again.
> 
> Since each CBQRI MMIO operation can sleep and take up to 1ms, does
> doing this twice per domain initialization introduce strictly redundant
> and significant latency during CPU hotplug?

Yes. I'll drop cbqri_init_mon_counters() from the L3 mon attach path,
since resctrl_arch_reset_rmid_all() called from
resctrl_online_mon_domain() already covers the same occupancy
configuration.

> > +	if (err)
> > +		goto err_offline;
> > +
> > +	return 0;
> > +
> > +err_offline:
> > +	cancel_delayed_work_sync(&mon_dom->cqm_limbo);
> > +	cancel_delayed_work_sync(&mon_dom->mbm_over);
> 
> The resctrl core only initializes mbm_over if resctrl_is_mbm_enabled()
> is true. Since the CBQRI driver only enables the L3 Occupancy event and
> not MBM, mbm_over remains completely zero-initialized from allocation.
> 
> Can calling cancel_delayed_work_sync() on a zeroed structure pass an
> uninitialized timer to del_timer_sync() and trigger kernel panics?

Yes. The fix is to guard the mbm_over cancel on
resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID). resctrl only
INIT_DELAYED_WORK's mbm_over when MBM is enabled, so a system without a
paired BC will no longer cancel a zeroed work struct.

> > +static int cbqri_attach_cpu_to_cap_ctrl(struct cbqri_controller *ctrl,
> > +					unsigned int cpu)
> > +{
> 
> The hotplug callback cbqri_resctrl_online_cpu() unconditionally skips
> controllers if !ctrl->alloc_capable.
> 
> Will this cause controllers that are only capable of monitoring to be
> ignored entirely, meaning no CPUs will ever be attached and no monitoring
> domains will be created on monitoring-only hardware?

This implementation does not support a capacity controller that supports
mon_capable but not alloc_capable. I'll add pr_warn_once() in
cbqri_resctrl_pick_caches() to make the limitation visible. 

> > -	domain = cbqri_create_ctrl_domain(ctrl, res, cpu, dom_id);
> > -	if (IS_ERR(domain))
> > -		return PTR_ERR(domain);
> > +	if (ctrl->mon_capable && ctrl->cache.cache_level == 3) {
> > +		err = cbqri_attach_cpu_to_l3_mon(ctrl, res, cpu);
> > +		if (err)
> > +			return err;
> 
> If cbqri_attach_cpu_to_l3_mon() fails here, the error is returned, but the
> CPU is not removed from the ctrl_domain mask and the domain structure is
> not freed.
> 
> Because the CPU hotplug core handles a startup failure by rolling back
> from the previous state, the offline callback for the current state
> (cbqri_resctrl_offline_cpu) is not invoked.
> 
> Does this leave the CPU permanently stranded in the ctrl_domain mask and
> leak the newly allocated domain structure?

Yes. I'll make cbqri_attach_cpu_to_cap_ctrl() roll back the
cpumask_set_cpu and tear down the new ctrl_domain when
cbqri_attach_cpu_to_l3_mon() fails.

> > +static void cbqri_detach_cpu_from_l3_mon(struct rdt_resource *res,
> > +					 unsigned int cpu)
> > +{
> > +	struct rdt_l3_mon_domain *mon_dom, *tmp;
> > +
> > +	lockdep_assert_held(&cbqri_domain_list_lock);
> > +
> > +	list_for_each_entry_safe(mon_dom, tmp, &res->mon_domains, hdr.list) {
> > +		if (!cpumask_test_cpu(cpu, &mon_dom->hdr.cpu_mask))
> > +			continue;
> > +		cpumask_clear_cpu(cpu, &mon_dom->hdr.cpu_mask);
> > +		if (cpumask_empty(&mon_dom->hdr.cpu_mask)) {
> > +			cancel_delayed_work_sync(&mon_dom->cqm_limbo);
> 
> This function is executed as a CPU hotplug callback, meaning the thread
> holds cpus_write_lock(). Concurrently, the cqm_handle_limbo() worker
> function in the resctrl core begins by acquiring cpus_read_lock().
> 
> If the worker starts executing while the hotplug thread is active, the
> worker will block on cpus_read_lock(), while the hotplug thread blocks in
> cancel_delayed_work_sync() waiting for the worker to finish.
> 
> Could this create an unresolvable AB-BA deadlock that hangs the system
> during CPU offline operations?

Yes, I will move the L3 mon_domain teardown path from
cancel_delayed_work_sync to cancel_delayed_work.

> > +			cancel_delayed_work_sync(&mon_dom->mbm_over);
> 
> (As mentioned above, mbm_over appears to be uninitialized here as well.)

I'll do the same fix here too.

-Drew

  reply	other threads:[~2026-05-16 22:39 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-11  5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
2026-05-11  5:10 ` [PATCH RFC v4 01/18] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
2026-05-11  5:10 ` [PATCH RFC v4 02/18] riscv: detect the Ssqosid extension Drew Fustini
2026-05-11  5:10 ` [PATCH RFC v4 03/18] riscv: add support for srmcfg CSR from " Drew Fustini
2026-05-11 23:52   ` sashiko-bot
2026-05-14 22:24     ` Drew Fustini
2026-05-11  5:11 ` [PATCH RFC v4 04/18] fs/resctrl: Add resctrl_is_membw() helper Drew Fustini
2026-05-11  5:11 ` [PATCH RFC v4 05/18] fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT Drew Fustini
2026-05-11  5:11 ` [PATCH RFC v4 06/18] fs/resctrl: Let bandwidth resources default to min_bw at reset Drew Fustini
2026-05-11  5:11 ` [PATCH RFC v4 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
2026-05-12  1:26   ` sashiko-bot
2026-05-15 13:46     ` Drew Fustini
2026-05-11  5:11 ` [PATCH RFC v4 08/18] riscv_cbqri: Add capacity controller monitoring " Drew Fustini
2026-05-12  1:58   ` sashiko-bot
2026-05-16  3:05     ` Drew Fustini
2026-05-11  5:11 ` [PATCH RFC v4 09/18] riscv_cbqri: Add bandwidth controller probe and allocation " Drew Fustini
2026-05-12  2:29   ` sashiko-bot
2026-05-16  6:15     ` Drew Fustini
2026-05-11  5:11 ` [PATCH RFC v4 10/18] riscv_cbqri: Add bandwidth controller monitoring " Drew Fustini
2026-05-11  5:11 ` [PATCH RFC v4 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
2026-05-12  4:01   ` sashiko-bot
2026-05-16 15:45     ` Drew Fustini
2026-05-11  5:11 ` [PATCH RFC v4 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring Drew Fustini
2026-05-12  5:00   ` sashiko-bot
2026-05-16 22:39     ` Drew Fustini [this message]
2026-05-11  5:11 ` [PATCH RFC v4 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb Drew Fustini
2026-05-12  5:42   ` sashiko-bot
2026-05-17  0:33     ` Drew Fustini
2026-05-11  5:11 ` [PATCH RFC v4 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight Drew Fustini
2026-05-11  5:11 ` [PATCH RFC v4 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring Drew Fustini
2026-05-12 21:26   ` sashiko-bot
2026-05-17  1:38     ` Drew Fustini
2026-05-11  5:11 ` [PATCH RFC v4 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table Drew Fustini
2026-05-12 21:46   ` sashiko-bot
2026-05-11  5:11 ` [PATCH RFC v4 17/18] ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC) Drew Fustini
2026-05-11  5:11 ` [PATCH RFC v4 18/18] riscv: enable resctrl filesystem for Ssqosid Drew Fustini
2026-05-12 22:56   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=agjyOYWZLcnm4PZO@thelio \
    --to=fustini@kernel.org \
    --cc=conor+dt@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=krzk+dt@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=robh@kernel.org \
    --cc=sashiko@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox