From: Drew Fustini <fustini@kernel.org>
To: sashiko@lists.linux.dev
Cc: krzk+dt@kernel.org, robh@kernel.org, conor+dt@kernel.org,
	devicetree@vger.kernel.org, linux-riscv@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC v4 09/18] riscv_cbqri: Add bandwidth controller probe and allocation device ops
Date: Fri, 15 May 2026 23:15:30 -0700
Message-ID: <aggLggGOeuPTQV3p@thelio>
In-Reply-To: <20260512022939.683A1C2BCB0@smtp.kernel.org>

On Tue, May 12, 2026 at 02:29:38AM +0000, sashiko-bot@kernel.org wrote:
[..]
> > +static int cbqri_bc_alloc_op(struct cbqri_controller *ctrl, int operation, int rcid)
> > +{
> > +	int reg_offset = CBQRI_BC_ALLOC_CTL_OFF;
> > +	int status;
> > +	u64 reg;
> > +
> > +	lockdep_assert_held(&ctrl->lock);
> > +
> > +	if (ctrl->faulted)
> > +		return -EIO;
> > +
> > +	if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
> 
> If a timeout in cbqri_wait_busy_flag() sets ctrl->faulted = true, will
> runtime operations like cbqri_bc_alloc_op() ever execute again? Because
> it returns early here when ctrl->faulted is true, the polling function
> is never called again to clear the flag, which seems to permanently
> disable the controller after a single timeout.

I'll drop the ctrl->faulted early return from cbqri_bc_alloc_op(),
cbqri_cc_alloc_op() and cbqri_mon_op() to avoid locking the controller
out for the lifetime of the driver.
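
For illustration, a minimal userspace sketch of that behaviour. The
mock_* names and the poll limit are hypothetical, and it assumes the
polling helper clears ->faulted again once BUSY is seen clear:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct cbqri_controller. */
struct mock_ctrl {
	bool faulted;     /* set when the last poll timed out */
	int busy_polls;   /* simulated: polls left until BUSY clears */
};

#define MOCK_POLL_LIMIT 4

/* Poll the simulated BUSY flag and record the outcome in ->faulted. */
static int mock_wait_busy(struct mock_ctrl *c)
{
	for (int i = 0; i < MOCK_POLL_LIMIT; i++) {
		if (c->busy_polls == 0) {
			c->faulted = false;
			return 0;
		}
		c->busy_polls--;
	}
	c->faulted = true;
	return -ETIMEDOUT;
}

/* No early return on ->faulted: every call polls the hardware again. */
static int mock_alloc_op(struct mock_ctrl *c)
{
	if (mock_wait_busy(c) < 0)
		return -EIO;
	return 0;
}
```

With this shape a timeout fails only the current operation; the next
call polls again and can recover.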

> > +static int cbqri_apply_bc_field(struct cbqri_controller *ctrl, u32 closid,
> > +				void (*set)(struct cbqri_controller *, u64),
> > +				u64 (*get)(struct cbqri_controller *),
> > +				u64 val, bool *committed)
> > +{
> > +	int ret;
> > +	u64 reg;
> > +
> > +	lockdep_assert_held(&ctrl->lock);
> > +
> > +	/* Load current RCID state so the unmodified field is preserved */
> > +	ret = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_READ_LIMIT, closid);
> > +	if (ret < 0)
> > +		return ret;
> 
> Does this initial read need the same sentinel protection as the
> verification read? If this READ_LIMIT silently fails, the staging
> register might retain data from a previous operation (like a different
> RCID). Since set() only modifies the target field, CONFIG_LIMIT could
> commit the stale data in the unmodified field, causing cross-RCID state
> corruption.

I'll stop relying on the register to preserve the unmodified field. I
will add an mweight_cache alongside rbwb_cache.
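
A rough userspace sketch of composing the staged value from software
state instead of a prior READ_LIMIT. The field layout, the mock_* names
and the cache element types are assumptions for illustration, not the
driver's definitions:

```c
#include <stdint.h>

#define MOCK_NCLOS          4
/* Hypothetical field layout, for illustration only. */
#define MOCK_RBWB_SHIFT     0
#define MOCK_RBWB_MASK      0xffffULL
#define MOCK_MWEIGHT_SHIFT  20
#define MOCK_MWEIGHT_MASK   0xffULL

struct mock_bc {
	uint64_t staging;                    /* simulated staging register */
	uint64_t rbwb_cache[MOCK_NCLOS];     /* last committed Rbwb per RCID */
	uint64_t mweight_cache[MOCK_NCLOS];  /* last committed Mweight per RCID */
};

/* Build the complete register value from both fields. */
static uint64_t mock_compose(uint64_t rbwb, uint64_t mweight)
{
	return ((rbwb & MOCK_RBWB_MASK) << MOCK_RBWB_SHIFT) |
	       ((mweight & MOCK_MWEIGHT_MASK) << MOCK_MWEIGHT_SHIFT);
}

/* Stage a new Mweight; the Rbwb field comes from the cache. */
static void mock_stage_mweight(struct mock_bc *bc, unsigned int closid,
			       uint64_t mweight)
{
	bc->staging = mock_compose(bc->rbwb_cache[closid], mweight);
}

/* Stage a new Rbwb; the Mweight field comes from the cache. */
static void mock_stage_rbwb(struct mock_bc *bc, unsigned int closid,
			    uint64_t rbwb)
{
	bc->staging = mock_compose(rbwb, bc->mweight_cache[closid]);
}
```

Because the whole value is overwritten, whatever a previous operation
left in the staging register cannot leak into the unmodified field.
Updating the caches after a successful commit is left out here.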

> > +	reg = get(ctrl);
> > +	if (reg != val) {
> > +		pr_err_ratelimited("BC field verify mismatch (reg=0x%llx != val=%llu)\n",
> > +				   reg, val);
> > +		return -EIO;
> > +	}
> > +
> > +	return 0;
> > +}
> 
> If the verification fails here and returns -EIO, *committed remains
> true. When called from cbqri_apply_rbwb(), this causes the rejected
> value to be cached in ctrl->rbwb_cache[closid]. If the requested value
> was smaller than the previous one, could this cause the software cache
> to underestimate the reserved capacity and allow future allocations to
> exceed the MRBWB hardware limit?

I'll change the order so that committed is only set to true after the
verify readback succeeds.
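
The ordering can be sketched in userspace as follows. struct mock_hw
and the mock_* helpers are hypothetical; drop_writes simulates a
CONFIG_LIMIT that does not take effect:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one hardware field. */
struct mock_hw {
	uint64_t field;
	bool drop_writes;   /* simulate a write that is silently ignored */
};

static void mock_set(struct mock_hw *hw, uint64_t val)
{
	if (!hw->drop_writes)
		hw->field = val;
}

/* *committed becomes true only once the readback matches val. */
static int mock_apply_field(struct mock_hw *hw, uint64_t val, bool *committed)
{
	if (committed)
		*committed = false;

	mock_set(hw, val);

	if (hw->field != val)
		return -EIO;

	if (committed)
		*committed = true;
	return 0;
}
```

A caller that updates its cache only when *committed is true then never
records a value the hardware rejected.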

> > +int cbqri_apply_mweight_config(struct cbqri_controller *ctrl, u32 closid,
> > +			       u64 mweight)
> > +{
> > +	int ret;
> > +
> > +	mutex_lock(&ctrl->lock);
> > +	ret = cbqri_apply_bc_field(ctrl, closid,
> > +				   cbqri_set_mweight, cbqri_get_mweight,
> > +				   mweight, NULL);
> > +	mutex_unlock(&ctrl->lock);
> > +	return ret;
> > +}
> 
> What happens if an arbitrarily large mweight is passed in? FIELD_MODIFY
> silently truncates values larger than the 8-bit MWEIGHT_MASK. The
> verification step in cbqri_apply_bc_field() will then compare the
> truncated hardware value against the original untruncated val and
> always fail, returning -EIO and logging a hardware failure instead of
> returning -EINVAL. Should this have bounds checking similar to
> cbqri_apply_rbwb()?

I will bound mweight against FIELD_MAX(MWEIGHT_MASK) at the beginning
of cbqri_apply_mweight_config().
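
A small userspace sketch of such a check. MOCK_MWEIGHT_MASK is a
hypothetical 8-bit mask, MOCK_FIELD_MAX() stands in for the kernel's
FIELD_MAX(), and returning -EINVAL follows the review's suggestion
rather than anything settled:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical 8-bit field; the real mask lives in the driver. */
#define MOCK_MWEIGHT_MASK  (0xffULL << 20)

/* Userspace stand-in for FIELD_MAX(): the mask shifted down to bit 0. */
#define MOCK_FIELD_MAX(mask)  ((mask) / ((mask) & (~(mask) + 1)))

/* Reject values that would be truncated by the field write. */
static int mock_check_mweight(uint64_t mweight)
{
	if (mweight > MOCK_FIELD_MAX(MOCK_MWEIGHT_MASK))
		return -EINVAL;
	return 0;
}
```

Checking before any register access keeps an out-of-range request from
showing up later as a verify mismatch.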

> 
> > @@ -437,6 +664,83 @@ static int cbqri_probe_cc(struct cbqri_controller *ctrl)
> >  	return 0;
> >  }
> >  
> > +static int cbqri_probe_bc(struct cbqri_controller *ctrl)
> > +{
> > +	bool has_mon_at_code = false;
> > +	int err, status;
> > +	u64 reg;
> 
> [ ... ]
> 
> > +	/* Probe monitoring features */
> > +	err = cbqri_probe_feature(ctrl, CBQRI_BC_MON_CTL_OFF,
> > +				  CBQRI_BC_MON_CTL_OP_READ_COUNTER, &status,
> > +				  &has_mon_at_code);
> 
> When probing features, cbqri_probe_feature() modifies the OP mask and
> clears RCID_MASK, but it doesn't appear to clear EVT_ID_MASK:
> 
> drivers/resctrl/cbqri_devices.c:cbqri_probe_feature() {
>     ...
>     FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_OP_MASK, &reg, operation);
>     reg &= ~CBQRI_CONTROL_REGISTERS_RCID_MASK;
>     iowrite64(reg, ctrl->base + reg_offset);
>     ...
> }
> 
> If the hardware or firmware left an unsupported Event ID in those bits,
> could the hardware reject the READ_COUNTER operation, causing the probe
> to see STATUS != SUCCESS and incorrectly assume monitoring is not
> supported?

I will clear OP, AT, RCID and EVT_ID on every write in the probe.
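
As a userspace sketch, with made-up bit positions (the driver's
CBQRI_CONTROL_REGISTERS_* masks are the authoritative ones) and assuming
the probe then writes the requested operation into OP:

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define MOCK_OP_SHIFT     0
#define MOCK_OP_MASK      (0x1fULL << MOCK_OP_SHIFT)
#define MOCK_AT_MASK      (0x7ULL << 5)
#define MOCK_RCID_MASK    (0xfffULL << 8)
#define MOCK_EVT_ID_MASK  (0xffULL << 20)

/* Clear every input field, then set only the operation to probe. */
static uint64_t mock_probe_ctl(uint64_t reg, uint64_t operation)
{
	reg &= ~(MOCK_OP_MASK | MOCK_AT_MASK | MOCK_RCID_MASK |
		 MOCK_EVT_ID_MASK);
	reg |= (operation << MOCK_OP_SHIFT) & MOCK_OP_MASK;
	return reg;
}
```

Bits outside the four fields are passed through unchanged, so leftover
state in AT, RCID or EVT_ID cannot influence the probe result.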

-Drew
