Devicetree
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Drew Fustini" <fustini@kernel.org>
Cc: krzk+dt@kernel.org, robh@kernel.org, conor+dt@kernel.org,
	devicetree@vger.kernel.org
Subject: Re: [PATCH RFC v4 09/18] riscv_cbqri: Add bandwidth controller probe and allocation device ops
Date: Tue, 12 May 2026 02:29:38 +0000	[thread overview]
Message-ID: <20260512022939.683A1C2BCB0@smtp.kernel.org> (raw)
In-Reply-To: <20260510-ssqosid-cbqri-rqsc-v7-0-v4-9-eb53831ef683@kernel.org>

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
- [High] State divergence on hardware verification failure breaks capacity invariants.
- [High] Unprotected initial `READ_LIMIT` is vulnerable to silent hardware failure and causes cross-RCID state corruption.
- [High] A single transient hardware timeout permanently disables the controller due to a latched `faulted` flag.
- [Medium] Feature probing fails to clear `EVT_ID`, risking false-negative feature detection.
- [Medium] Missing bounds check on `mweight` causes silent truncation and false I/O errors.
--

commit 40fea10bc5e942138ec126160ec0621edcc9673f
Author: Drew Fustini <fustini@kernel.org>

riscv_cbqri: Add bandwidth controller probe and allocation device ops

Add support for CBQRI bandwidth controller (BC) discovery and the two BC
allocation control knobs. Both fields share the bc_bw_alloc register, so
each write needs READ_LIMIT-modify-CONFIG_LIMIT-verify with a sentinel.

> diff --git a/drivers/resctrl/cbqri_devices.c b/drivers/resctrl/cbqri_devices.c
> index e46b02d2c50d7..7e5decd7a6b2e 100644
> --- a/drivers/resctrl/cbqri_devices.c
> +++ b/drivers/resctrl/cbqri_devices.c
> @@ -145,6 +184,44 @@ int cbqri_mon_op(struct cbqri_controller *ctrl, int reg_offset,
>  	return 0;
>  }
>  
> +/*
> + * Perform bandwidth allocation control operation on bandwidth controller.
> + * Caller must hold ctrl->lock.
> + */
> +static int cbqri_bc_alloc_op(struct cbqri_controller *ctrl, int operation, int rcid)
> +{
> +	int reg_offset = CBQRI_BC_ALLOC_CTL_OFF;
> +	int status;
> +	u64 reg;
> +
> +	lockdep_assert_held(&ctrl->lock);
> +
> +	if (ctrl->faulted)
> +		return -EIO;
> +
> +	if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {

If a timeout in cbqri_wait_busy_flag() sets ctrl->faulted = true, will
runtime operations like cbqri_bc_alloc_op() ever execute again? Because
it returns early here when ctrl->faulted is true, the polling function
is never called again to clear the flag, which seems to permanently
disable the controller after a single timeout.

[ ... ]

> +static int cbqri_apply_bc_field(struct cbqri_controller *ctrl, u32 closid,
> +				void (*set)(struct cbqri_controller *, u64),
> +				u64 (*get)(struct cbqri_controller *),
> +				u64 val, bool *committed)
> +{
> +	int ret;
> +	u64 reg;
> +
> +	lockdep_assert_held(&ctrl->lock);
> +
> +	/* Load current RCID state so the unmodified field is preserved */
> +	ret = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_READ_LIMIT, closid);
> +	if (ret < 0)
> +		return ret;

Does this initial read need the same sentinel protection as the
verification read? If this READ_LIMIT silently fails, the staging
register might retain data from a previous operation (like a different
RCID). Since set() only modifies the target field, CONFIG_LIMIT could
commit the stale data in the unmodified field, causing cross-RCID state
corruption.

> +
> +	set(ctrl, val);
> +
> +	ret = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_CONFIG_LIMIT, closid);
> +	if (ret < 0)
> +		return ret;
> +
> +	/*
> +	 * CONFIG_LIMIT committed. The per-CLOSID software cache must
> +	 * track hardware regardless of whether the verify below passes.
> +	 */
> +	if (committed)
> +		*committed = true;

[ ... ]

> +	reg = get(ctrl);
> +	if (reg != val) {
> +		pr_err_ratelimited("BC field verify mismatch (reg=0x%llx != val=%llu)\n",
> +				   reg, val);
> +		return -EIO;
> +	}
> +
> +	return 0;
> +}

If the verification fails here and returns -EIO, *committed remains
true. When called from cbqri_apply_rbwb(), this causes the rejected
value to be cached in ctrl->rbwb_cache[closid]. If the requested value
was smaller than the previous one, could this cause the software cache
to underestimate the reserved capacity and allow future allocations to
exceed the MRBWB hardware limit?

[ ... ]

> +int cbqri_apply_mweight_config(struct cbqri_controller *ctrl, u32 closid,
> +			       u64 mweight)
> +{
> +	int ret;
> +
> +	mutex_lock(&ctrl->lock);
> +	ret = cbqri_apply_bc_field(ctrl, closid,
> +				   cbqri_set_mweight, cbqri_get_mweight,
> +				   mweight, NULL);
> +	mutex_unlock(&ctrl->lock);
> +	return ret;
> +}

What happens if an arbitrarily large mweight is passed in? FIELD_MODIFY
silently truncates values larger than the 8-bit MWEIGHT_MASK. The
verification step in cbqri_apply_bc_field() will then compare the
truncated hardware value against the original untruncated val and
always fail, returning -EIO and logging a hardware failure instead of
returning -EINVAL. Should this have bounds checking similar to
cbqri_apply_rbwb()?

[ ... ]

> @@ -437,6 +664,83 @@ static int cbqri_probe_cc(struct cbqri_controller *ctrl)
>  	return 0;
>  }
>  
> +static int cbqri_probe_bc(struct cbqri_controller *ctrl)
> +{
> +	bool has_mon_at_code = false;
> +	int err, status;
> +	u64 reg;

[ ... ]

> +	/* Probe monitoring features */
> +	err = cbqri_probe_feature(ctrl, CBQRI_BC_MON_CTL_OFF,
> +				  CBQRI_BC_MON_CTL_OP_READ_COUNTER, &status,
> +				  &has_mon_at_code);

When probing features, cbqri_probe_feature() modifies the OP mask and
clears RCID_MASK, but it doesn't appear to clear EVT_ID_MASK:

drivers/resctrl/cbqri_devices.c:cbqri_probe_feature() {
    ...
    FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_OP_MASK, &reg, operation);
    reg &= ~CBQRI_CONTROL_REGISTERS_RCID_MASK;
    iowrite64(reg, ctrl->base + reg_offset);
    ...
}

If the hardware or firmware left an unsupported Event ID in those bits,
could the hardware reject the READ_COUNTER operation, causing the probe
to see STATUS != SUCCESS and incorrectly assume monitoring is not
supported?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260510-ssqosid-cbqri-rqsc-v7-0-v4-0-eb53831ef683@kernel.org?part=9

  reply	other threads:[~2026-05-12  2:29 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-11  5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
2026-05-11  5:10 ` [PATCH RFC v4 01/18] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
2026-05-11  5:10 ` [PATCH RFC v4 02/18] riscv: detect the Ssqosid extension Drew Fustini
2026-05-11  5:10 ` [PATCH RFC v4 03/18] riscv: add support for srmcfg CSR from " Drew Fustini
2026-05-11 23:52   ` sashiko-bot
2026-05-11  5:11 ` [PATCH RFC v4 04/18] fs/resctrl: Add resctrl_is_membw() helper Drew Fustini
2026-05-11  5:11 ` [PATCH RFC v4 05/18] fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT Drew Fustini
2026-05-11  5:11 ` [PATCH RFC v4 06/18] fs/resctrl: Let bandwidth resources default to min_bw at reset Drew Fustini
2026-05-11  5:11 ` [PATCH RFC v4 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
2026-05-12  1:26   ` sashiko-bot
2026-05-11  5:11 ` [PATCH RFC v4 08/18] riscv_cbqri: Add capacity controller monitoring " Drew Fustini
2026-05-12  1:58   ` sashiko-bot
2026-05-11  5:11 ` [PATCH RFC v4 09/18] riscv_cbqri: Add bandwidth controller probe and allocation " Drew Fustini
2026-05-12  2:29   ` sashiko-bot [this message]
2026-05-11  5:11 ` [PATCH RFC v4 10/18] riscv_cbqri: Add bandwidth controller monitoring " Drew Fustini
2026-05-11  5:11 ` [PATCH RFC v4 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
2026-05-12  4:01   ` sashiko-bot
2026-05-11  5:11 ` [PATCH RFC v4 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring Drew Fustini
2026-05-12  5:00   ` sashiko-bot
2026-05-11  5:11 ` [PATCH RFC v4 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb Drew Fustini
2026-05-12  5:42   ` sashiko-bot
2026-05-11  5:11 ` [PATCH RFC v4 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight Drew Fustini
2026-05-11  5:11 ` [PATCH RFC v4 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring Drew Fustini
2026-05-12 21:26   ` sashiko-bot
2026-05-11  5:11 ` [PATCH RFC v4 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table Drew Fustini
2026-05-12 21:46   ` sashiko-bot
2026-05-11  5:11 ` [PATCH RFC v4 17/18] ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC) Drew Fustini
2026-05-11  5:11 ` [PATCH RFC v4 18/18] riscv: enable resctrl filesystem for Ssqosid Drew Fustini
2026-05-12 22:56   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260512022939.683A1C2BCB0@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=conor+dt@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=fustini@kernel.org \
    --cc=krzk+dt@kernel.org \
    --cc=robh@kernel.org \
    --cc=sashiko@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox