From: Drew Fustini <fustini@kernel.org>
To: sashiko@lists.linux.dev
Cc: robh@kernel.org, conor+dt@kernel.org, devicetree@vger.kernel.org,
krzk+dt@kernel.org, linux-kernel@vger.kernel.org,
linux-riscv@lists.infradead.org
Subject: Re: [PATCH RFC v4 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops
Date: Fri, 15 May 2026 06:46:58 -0700 [thread overview]
Message-ID: <agcj0jBGTAllxjCu@tt-p14-dfustini> (raw)
In-Reply-To: <20260512012635.84FCCC2BCB0@smtp.kernel.org>
On Tue, May 12, 2026 at 01:26:34AM +0000, sashiko-bot@kernel.org wrote:
[..]
> > +int cbqri_apply_cache_config(struct cbqri_controller *ctrl, u32 closid,
> > + const struct cbqri_cc_config *cfg)
> > +{
> > + bool need_at_mirror;
> > + u64 saved_cbm = 0;
> > + int err = 0;
> > + u64 reg;
> > +
> > + mutex_lock(&ctrl->lock);
> > +
> > + need_at_mirror = ctrl->cc.supports_alloc_at_code && !cfg->cdp_enabled;
> > +
> > + /*
> > + * Capture the cfg->at half CBM before any write so a partial
> > + * AT-mirror failure can revert and keep the two halves consistent.
> > + */
> > + if (need_at_mirror) {
> > + err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT,
> > + closid, cfg->at);
> > + if (err < 0)
> > + goto out;
> > + saved_cbm = ioread64(ctrl->base + CBQRI_CC_BLOCK_MASK_OFF);
> > + }
>
> To mitigate silent firmware no-ops where the hardware returns success without
> updating the staging register, cbqri_read_cache_config() pre-clears
> the register.
>
> If a silent no-op occurs here during the READ_LIMIT operation, could
> saved_cbm capture stale MMIO data? If a revert is later triggered, does
> this risk writing garbage data back and corrupting the controller's capacity
> allocation state?
I will change it to pre-clear cc_block_mask before the READ_LIMIT so a
silent firmware no-op surfaces as zero rather than stale data.
> > + /* Get the operation status */
> > + *status = FIELD_GET(CBQRI_CONTROL_REGISTERS_STATUS_MASK, reg);
> > +
> > + /*
> > + * Check for the AT support if the register is implemented
> > + * (if not, the status value will remain 0)
> > + */
> > + if (*status != 0) {
> > + /*
> > + * Re-issue operation with AT=CODE so the controller
> > + * latches AT=CODE on supported hardware (or resets it to 0
> > + * on hardware that doesn't). OP must be a defined CBQRI op
> > + * here. OP=0 is a no-op and would silently disable CDP.
> > + */
> > + reg = saved_reg;
> > + FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_OP_MASK, ®, operation);
> > + FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_AT_MASK, ®,
> > + CBQRI_CONTROL_REGISTERS_AT_CODE);
> > + iowrite64(reg, ctrl->base + reg_offset);
>
> When issuing the second probe to verify AT support, the code restores the raw
> boot-time register state via reg = saved_reg.
>
> Does this execute the command against whatever uninitialized RCID value the
> firmware originally left in the register, since the RCID mask is not cleared
> again? If the RCID is out of bounds, could the hardware reject the operation,
> causing the driver to falsely report that supports_alloc_at_code is false?
>
I will change cbqri_probe_feature() to build each write from saved_reg
masked clean of OP, AT, RCID, and EVT_ID before ORing in the intended
fields.
> > +static int cbqri_probe_cc(struct cbqri_controller *ctrl)
> > +{
>
> [ ... ]
>
> > + /*
> > + * Resolve cache_size via cacheinfo. cpus_read_lock satisfies
> > + * lockdep_assert_cpus_held() inside get_cpu_cacheinfo_level(). If
> > + * every cpu_mask member is offline, cache_size stays 0 and the
> > + * controller cannot back occupancy monitoring.
> > + */
> > + cpus_read_lock();
> > + if (!ctrl->cache.cache_size) {
> > + int cpu = cpumask_first_and(&ctrl->cache.cpu_mask, cpu_online_mask);
> > +
> > + if (cpu < nr_cpu_ids) {
> > + struct cacheinfo *ci;
> > +
> > + ci = get_cpu_cacheinfo_level(cpu, ctrl->cache.cache_level);
> > + if (ci)
> > + ctrl->cache.cache_size = ci->size;
> > + }
> > + }
> > + cpus_read_unlock();
>
> A comment in riscv_cbqri_register_controller() states that cache_size is
> filled lazily during probe.
>
> Since cbqri_probe_cc() is invoked synchronously during early boot
> registration, wouldn't CPUs in other clusters still be offline? Because there
> is no lazy re-evaluation mechanism or CPU hotplug hook later on, does this
> permanently leave ctrl->cache.cache_size at 0 and break occupancy monitoring
> calculations for those controllers?
ACPI parsing runs at subsys_initcall which is after smp_init() brings up
the secondary CPUs, so every cpu in cpu_mask is online by the time
cbqri_probe_cc() runs.
>
> > +void cbqri_controller_destroy(struct cbqri_controller *ctrl)
> > +{
> > + kfree(ctrl);
> > +}
>
> When riscv_cbqri_unregister_last() is used to roll back a successful
> registration, it calls this function to free the controller.
>
> Does this leak the I/O mappings and memory regions, since it fails to call
> iounmap(ctrl->base) and release_mem_region() for the resources successfully
> allocated during cbqri_probe_controller()?
I will fix this to mirror the probe error paths. They already clean up
and clear ctrl->base before returning, so destroying a probe-failed
controller does not double free or unmap an already-released region.
> > +int riscv_cbqri_register_controller(const struct cbqri_controller_info *info)
> > +{
> > + struct cbqri_controller *ctrl;
> > + int err;
> > +
> > + if (!info->addr) {
> > + pr_warn("skipping controller with invalid addr=0x0\n");
> > + return -EINVAL;
> > + }
> > +
> > + ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
> > + if (!ctrl)
> > + return -ENOMEM;
> > +
> > + mutex_init(&ctrl->lock);
> > +
> > + ctrl->addr = info->addr;
> > + ctrl->size = info->size;
> > + ctrl->type = info->type;
> > + ctrl->rcid_count = info->rcid_count;
> > + ctrl->mcid_count = info->mcid_count;
>
> The cbqri header explicitly defines CBQRI_MAX_RCID and CBQRI_MAX_MCID as 1024,
> alongside documentation warning that a malformed firmware table claiming a
> large number of IDs could trip the soft-lockup watchdog during per-id MMIO
> init loops.
>
> Is it safe to blindly assign these counts without validating them against the
> safety caps? Could a malformed ACPI or DT table bypass this protection and
> trigger a soft lockup during boot?
CBQRI_MAX_RCID and CBQRI_MAX_MCID are enforced in acpi_parse_rqsc()
which is the only firmware discovery path that exists today. Any value
that reaches riscv_cbqri_register_controller() has already passed that.
-Drew
next prev parent reply other threads:[~2026-05-15 13:47 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 01/18] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 02/18] riscv: detect the Ssqosid extension Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 03/18] riscv: add support for srmcfg CSR from " Drew Fustini
2026-05-11 23:52 ` sashiko-bot
2026-05-14 22:24 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 04/18] fs/resctrl: Add resctrl_is_membw() helper Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 05/18] fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 06/18] fs/resctrl: Let bandwidth resources default to min_bw at reset Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
2026-05-12 1:26 ` sashiko-bot
2026-05-15 13:46 ` Drew Fustini [this message]
2026-05-11 5:11 ` [PATCH RFC v4 08/18] riscv_cbqri: Add capacity controller monitoring " Drew Fustini
2026-05-12 1:58 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 09/18] riscv_cbqri: Add bandwidth controller probe and allocation " Drew Fustini
2026-05-12 2:29 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 10/18] riscv_cbqri: Add bandwidth controller monitoring " Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
2026-05-12 4:01 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring Drew Fustini
2026-05-12 5:00 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb Drew Fustini
2026-05-12 5:42 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring Drew Fustini
2026-05-12 21:26 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table Drew Fustini
2026-05-12 21:46 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 17/18] ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC) Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 18/18] riscv: enable resctrl filesystem for Ssqosid Drew Fustini
2026-05-12 22:56 ` sashiko-bot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=agcj0jBGTAllxjCu@tt-p14-dfustini \
--to=fustini@kernel.org \
--cc=conor+dt@kernel.org \
--cc=devicetree@vger.kernel.org \
--cc=krzk+dt@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-riscv@lists.infradead.org \
--cc=robh@kernel.org \
--cc=sashiko@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox