From: sashiko-bot@kernel.org
To: "Drew Fustini" <fustini@kernel.org>
Cc: krzk+dt@kernel.org, robh@kernel.org, conor+dt@kernel.org,
devicetree@vger.kernel.org
Subject: Re: [PATCH RFC v4 09/18] riscv_cbqri: Add bandwidth controller probe and allocation device ops
Date: Tue, 12 May 2026 02:29:38 +0000 [thread overview]
Message-ID: <20260512022939.683A1C2BCB0@smtp.kernel.org> (raw)
In-Reply-To: <20260510-ssqosid-cbqri-rqsc-v7-0-v4-9-eb53831ef683@kernel.org>
Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
- [High] State divergence on hardware verification failure breaks capacity invariants.
- [High] Unprotected initial `READ_LIMIT` is vulnerable to silent hardware failure and causes cross-RCID state corruption.
- [High] A single transient hardware timeout permanently disables the controller due to a latched `faulted` flag.
- [Medium] Feature probing fails to clear `EVT_ID`, risking false-negative feature detection.
- [Medium] Missing bounds check on `mweight` causes silent truncation and false I/O errors.
--
commit 40fea10bc5e942138ec126160ec0621edcc9673f
Author: Drew Fustini <fustini@kernel.org>
riscv_cbqri: Add bandwidth controller probe and allocation device ops
Add support for CBQRI bandwidth controller (BC) discovery and the two BC
allocation control knobs. Both fields share the bc_bw_alloc register, so
each write needs READ_LIMIT-modify-CONFIG_LIMIT-verify with a sentinel.
> diff --git a/drivers/resctrl/cbqri_devices.c b/drivers/resctrl/cbqri_devices.c
> index e46b02d2c50d7..7e5decd7a6b2e 100644
> --- a/drivers/resctrl/cbqri_devices.c
> +++ b/drivers/resctrl/cbqri_devices.c
> @@ -145,6 +184,44 @@ int cbqri_mon_op(struct cbqri_controller *ctrl, int reg_offset,
> return 0;
> }
>
> +/*
> + * Perform bandwidth allocation control operation on bandwidth controller.
> + * Caller must hold ctrl->lock.
> + */
> +static int cbqri_bc_alloc_op(struct cbqri_controller *ctrl, int operation, int rcid)
> +{
> + int reg_offset = CBQRI_BC_ALLOC_CTL_OFF;
> + int status;
> + u64 reg;
> +
> + lockdep_assert_held(&ctrl->lock);
> +
> + if (ctrl->faulted)
> + return -EIO;
> +
> + if (cbqri_wait_busy_flag(ctrl, reg_offset, ®) < 0) {
If a timeout in cbqri_wait_busy_flag() sets ctrl->faulted = true, will
runtime operations like cbqri_bc_alloc_op() ever execute again? Because
it returns early here when ctrl->faulted is true, the polling function
is never called again to clear the flag, which seems to permanently
disable the controller after a single timeout.
[ ... ]
> +static int cbqri_apply_bc_field(struct cbqri_controller *ctrl, u32 closid,
> + void (*set)(struct cbqri_controller *, u64),
> + u64 (*get)(struct cbqri_controller *),
> + u64 val, bool *committed)
> +{
> + int ret;
> + u64 reg;
> +
> + lockdep_assert_held(&ctrl->lock);
> +
> + /* Load current RCID state so the unmodified field is preserved */
> + ret = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_READ_LIMIT, closid);
> + if (ret < 0)
> + return ret;
Does this initial read need the same sentinel protection as the
verification read? If this READ_LIMIT silently fails, the staging
register might retain data from a previous operation (like a different
RCID). Since set() only modifies the target field, CONFIG_LIMIT could
commit the stale data in the unmodified field, causing cross-RCID state
corruption.
> +
> + set(ctrl, val);
> +
> + ret = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_CONFIG_LIMIT, closid);
> + if (ret < 0)
> + return ret;
> +
> + /*
> + * CONFIG_LIMIT committed. The per-CLOSID software cache must
> + * track hardware regardless of whether the verify below passes.
> + */
> + if (committed)
> + *committed = true;
[ ... ]
> + reg = get(ctrl);
> + if (reg != val) {
> + pr_err_ratelimited("BC field verify mismatch (reg=0x%llx != val=%llu)\n",
> + reg, val);
> + return -EIO;
> + }
> +
> + return 0;
> +}
If the verification fails here and returns -EIO, *committed remains
true. When called from cbqri_apply_rbwb(), this causes the rejected
value to be cached in ctrl->rbwb_cache[closid]. If the requested value
was smaller than the previous one, could this cause the software cache
to underestimate the reserved capacity and allow future allocations to
exceed the MRBWB hardware limit?
[ ... ]
> +int cbqri_apply_mweight_config(struct cbqri_controller *ctrl, u32 closid,
> + u64 mweight)
> +{
> + int ret;
> +
> + mutex_lock(&ctrl->lock);
> + ret = cbqri_apply_bc_field(ctrl, closid,
> + cbqri_set_mweight, cbqri_get_mweight,
> + mweight, NULL);
> + mutex_unlock(&ctrl->lock);
> + return ret;
> +}
What happens if an arbitrarily large mweight is passed in? FIELD_MODIFY
silently truncates values larger than the 8-bit MWEIGHT_MASK. The
verification step in cbqri_apply_bc_field() will then compare the
truncated hardware value against the original untruncated val and
always fail, returning -EIO and logging a hardware failure instead of
returning -EINVAL. Should this have bounds checking similar to
cbqri_apply_rbwb()?
[ ... ]
> @@ -437,6 +664,83 @@ static int cbqri_probe_cc(struct cbqri_controller *ctrl)
> return 0;
> }
>
> +static int cbqri_probe_bc(struct cbqri_controller *ctrl)
> +{
> + bool has_mon_at_code = false;
> + int err, status;
> + u64 reg;
[ ... ]
> + /* Probe monitoring features */
> + err = cbqri_probe_feature(ctrl, CBQRI_BC_MON_CTL_OFF,
> + CBQRI_BC_MON_CTL_OP_READ_COUNTER, &status,
> + &has_mon_at_code);
When probing features, cbqri_probe_feature() modifies the OP mask and
clears RCID_MASK, but it doesn't appear to clear EVT_ID_MASK:
drivers/resctrl/cbqri_devices.c:cbqri_probe_feature() {
...
FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_OP_MASK, ®, operation);
reg &= ~CBQRI_CONTROL_REGISTERS_RCID_MASK;
iowrite64(reg, ctrl->base + reg_offset);
...
}
If the hardware or firmware left an unsupported Event ID in those bits,
could the hardware reject the READ_COUNTER operation, causing the probe
to see STATUS != SUCCESS and incorrectly assume monitoring is not
supported?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260510-ssqosid-cbqri-rqsc-v7-0-v4-0-eb53831ef683@kernel.org?part=9
next prev parent reply other threads:[~2026-05-12 2:29 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 01/18] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 02/18] riscv: detect the Ssqosid extension Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 03/18] riscv: add support for srmcfg CSR from " Drew Fustini
2026-05-11 23:52 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 04/18] fs/resctrl: Add resctrl_is_membw() helper Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 05/18] fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 06/18] fs/resctrl: Let bandwidth resources default to min_bw at reset Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
2026-05-12 1:26 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 08/18] riscv_cbqri: Add capacity controller monitoring " Drew Fustini
2026-05-12 1:58 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 09/18] riscv_cbqri: Add bandwidth controller probe and allocation " Drew Fustini
2026-05-12 2:29 ` sashiko-bot [this message]
2026-05-11 5:11 ` [PATCH RFC v4 10/18] riscv_cbqri: Add bandwidth controller monitoring " Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
2026-05-12 4:01 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring Drew Fustini
2026-05-12 5:00 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb Drew Fustini
2026-05-12 5:42 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring Drew Fustini
2026-05-12 21:26 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table Drew Fustini
2026-05-12 21:46 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 17/18] ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC) Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 18/18] riscv: enable resctrl filesystem for Ssqosid Drew Fustini
2026-05-12 22:56 ` sashiko-bot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260512022939.683A1C2BCB0@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=conor+dt@kernel.org \
--cc=devicetree@vger.kernel.org \
--cc=fustini@kernel.org \
--cc=krzk+dt@kernel.org \
--cc=robh@kernel.org \
--cc=sashiko@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox