* [PATCH net v2] net: airoha: Add retry mechanism to airoha_qdma_set_trtcm_param()
@ 2026-06-22 9:35 Lorenzo Bianconi
2026-06-23 10:53 ` Simon Horman
0 siblings, 1 reply; 3+ messages in thread
From: Lorenzo Bianconi @ 2026-06-22 9:35 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Lorenzo Bianconi
Cc: Leto Liu, linux-arm-kernel, linux-mediatek, netdev, Brown Huang
From: Brown Huang <brown.huang@airoha.com>
CPU accesses QDMA via the bus. When multiple modules are using the bus
simultaneously, CPU access to QDMA may encounter bus timeouts and fails,
resulting in QDMA configuration failures and potentially causing packet
transmission issues. In order to mitigate the issue, introduce a retry
mechanism to airoha_qdma_set_trtcm_param routine in order to ensure the
configuration is correctly applied to the hardware.
Fixes: ef1ca9271313b ("net: airoha: Add sched HTB offload support")
Signed-off-by: Brown Huang <brown.huang@airoha.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
Changes in v2:
- Wait for write configuration to be completed before running
airoha_qdma_get_trtcm_param() in airoha_qdma_set_trtcm_param().
- Link to v1: https://lore.kernel.org/r/20260608-airoha_qdma_set_trtcm_param-retry-fix-v1-1-f07704f0d8c5@kernel.org
---
drivers/net/ethernet/airoha/airoha_eth.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index 3370c3df7c10..bb5c0599a4ee 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -2673,14 +2673,30 @@ static int airoha_qdma_set_trtcm_param(struct airoha_qdma *qdma, int channel,
FIELD_PREP(TRTCM_METER_GROUP_MASK, group) |
FIELD_PREP(TRTCM_PARAM_INDEX_MASK, idx) |
FIELD_PREP(TRTCM_PARAM_RATE_TYPE_MASK, mode);
+ int i;
- airoha_qdma_wr(qdma, REG_TRTCM_DATA_LOW(addr), val);
- airoha_qdma_wr(qdma, REG_TRTCM_CFG_PARAM(addr), config);
+ for (i = 0; i < 100; i++) {
+ u32 data;
- return read_poll_timeout(airoha_qdma_rr, val,
- val & TRTCM_PARAM_RW_DONE_MASK,
- USEC_PER_MSEC, 10 * USEC_PER_MSEC, true,
- qdma, REG_TRTCM_CFG_PARAM(addr));
+ airoha_qdma_wr(qdma, REG_TRTCM_DATA_LOW(addr), val);
+ wmb();
+ airoha_qdma_wr(qdma, REG_TRTCM_CFG_PARAM(addr), config);
+
+ if (read_poll_timeout(airoha_qdma_rr, data,
+ data & TRTCM_PARAM_RW_DONE_MASK,
+ USEC_PER_MSEC, 10 * USEC_PER_MSEC,
+ true, qdma, REG_TRTCM_CFG_PARAM(addr)))
+ return -ETIMEDOUT;
+
+ if (airoha_qdma_get_trtcm_param(qdma, channel, addr, param,
+ mode, &data, NULL))
+ continue;
+
+ if (data == val)
+ return 0;
+ }
+
+ return -EBUSY;
}
static int airoha_qdma_set_trtcm_config(struct airoha_qdma *qdma, int channel,
---
base-commit: d07d80b6a129a44538cda1549b7acf95154fb197
change-id: 20260605-airoha_qdma_set_trtcm_param-retry-fix-a9d2956b9b2f
Best regards,
--
Lorenzo Bianconi <lorenzo@kernel.org>
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH net v2] net: airoha: Add retry mechanism to airoha_qdma_set_trtcm_param()
2026-06-22 9:35 [PATCH net v2] net: airoha: Add retry mechanism to airoha_qdma_set_trtcm_param() Lorenzo Bianconi
@ 2026-06-23 10:53 ` Simon Horman
2026-06-23 20:32 ` Lorenzo Bianconi
0 siblings, 1 reply; 3+ messages in thread
From: Simon Horman @ 2026-06-23 10:53 UTC (permalink / raw)
To: lorenzo
Cc: Simon Horman, andrew+netdev, davem, edumazet, kuba, pabeni,
leto.liu, linux-arm-kernel, linux-mediatek, netdev, brown.huang
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
Simon says: I'd appreciate it if you could look over the first two issues
flagged below in the context of this patch. While the third
and final issue looks like it is in possible separate follow-up
territory.
---
net: airoha: Add retry mechanism to airoha_qdma_set_trtcm_param()
This patch introduces a retry loop to airoha_qdma_set_trtcm_param() to mitigate
bus timeouts when configuring QDMA parameters. It attempts to verify that the
configuration is correctly applied to the hardware.
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 3370c3df7c107..bb5c0599a4eed 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -2673,14 +2673,30 @@ static int airoha_qdma_set_trtcm_param(struct airoha_qdma *qdma, int channel,
> FIELD_PREP(TRTCM_METER_GROUP_MASK, group) |
> FIELD_PREP(TRTCM_PARAM_INDEX_MASK, idx) |
> FIELD_PREP(TRTCM_PARAM_RATE_TYPE_MASK, mode);
> + int i;
>
> - airoha_qdma_wr(qdma, REG_TRTCM_DATA_LOW(addr), val);
> - airoha_qdma_wr(qdma, REG_TRTCM_CFG_PARAM(addr), config);
> + for (i = 0; i < 100; i++) {
> + u32 data;
>
> - return read_poll_timeout(airoha_qdma_rr, val,
> - val & TRTCM_PARAM_RW_DONE_MASK,
> - USEC_PER_MSEC, 10 * USEC_PER_MSEC, true,
> - qdma, REG_TRTCM_CFG_PARAM(addr));
> + airoha_qdma_wr(qdma, REG_TRTCM_DATA_LOW(addr), val);
> + wmb();
> + airoha_qdma_wr(qdma, REG_TRTCM_CFG_PARAM(addr), config);
> +
> + if (read_poll_timeout(airoha_qdma_rr, data,
> + data & TRTCM_PARAM_RW_DONE_MASK,
> + USEC_PER_MSEC, 10 * USEC_PER_MSEC,
> + true, qdma, REG_TRTCM_CFG_PARAM(addr)))
> + return -ETIMEDOUT;
[Severity: High]
If a bus timeout causes the write phase's read_poll_timeout() in
airoha_qdma_set_trtcm_param() to fail, this immediately returns -ETIMEDOUT,
aborting the 100-iteration retry loop.
Doesn't this defeat the mitigation?
> +
> + if (airoha_qdma_get_trtcm_param(qdma, channel, addr, param,
> + mode, &data, NULL))
> + continue;
[Severity: High]
If airoha_qdma_get_trtcm_param() times out during the verification phase, it
indicates the hardware hasn't finished the read command (DONE bit is still 0).
The continue statement then immediately starts the next iteration, writing new
commands to DATA_LOW and CFG_PARAM.
Can writing to command registers while the hardware state machine is actively
busy violate hardware protocols and lead to unrecoverable lockups?
> +
> + if (data == val)
> + return 0;
[Severity: High]
This isn't a bug introduced by this patch, but if the bus drops the command
write to REG_TRTCM_CFG_PARAM, the hardware never executes the command and the
TRTCM_PARAM_RW_DONE_MASK bit remains set from the previous operation.
Consequently, read_poll_timeout() returns an immediate false success.
If both the set and get command writes are dropped by the bus, the get
operation will read back the newly written DATA_LOW register value.
Will the data == val check evaluate to true and falsely report success without
the hardware ever processing the configuration?
> + }
> +
> + return -EBUSY;
> }
>
[ ... ]
> static int airoha_qdma_set_trtcm_config(struct airoha_qdma *qdma, int channel,
> u32 addr, enum trtcm_mode_type mode,
> bool enable, u32 enable_mask)
> {
> u32 val;
>
> if (airoha_qdma_get_trtcm_param(qdma, channel, addr, TRTCM_MISC_MODE,
> mode, &val, NULL))
> return -EINVAL;
[Severity: High]
Standalone calls to airoha_qdma_get_trtcm_param(), like this one in
airoha_qdma_set_trtcm_config(), lack a retry loop entirely.
Does this leave read-modify-write cycles fully vulnerable to the same bus
timeouts the patch attempts to mitigate?
[Severity: High]
This isn't a bug introduced by this patch, but the patch mitigates bus
timeouts in airoha_qdma_set_trtcm_param() while entirely missing
airoha_qdma_set_rl_param().
The latter performs the exact same write-and-poll sequence on identical QDMA
configuration registers:
drivers/net/ethernet/airoha/airoha_eth.c:airoha_qdma_set_rl_param() {
...
return read_poll_timeout(..., val & RATE_LIMIT_PARAM_RW_DONE_MASK, ...);
}
If bus timeouts affect QDMA access broadly as stated in the commit message, are
the rate-limiting configuration paths randomly vulnerable to the same failures?
[Severity: High]
This isn't a bug introduced by this patch, but in airoha_tc_remove_htb_queue(),
hardware teardown incorrectly uses queue + 1 instead of queue:
drivers/net/ethernet/airoha/airoha_eth.c:airoha_tc_remove_htb_queue() {
...
airoha_qdma_set_tx_rate_limit(netdev, queue + 1, 0, 0);
...
}
Does this leave the removed queue's hardware limits permanently active
(resource leak) and inadvertently disable the rate limits for queue + 1,
corrupting the QoS state of another active channel?
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH net v2] net: airoha: Add retry mechanism to airoha_qdma_set_trtcm_param()
2026-06-23 10:53 ` Simon Horman
@ 2026-06-23 20:32 ` Lorenzo Bianconi
0 siblings, 0 replies; 3+ messages in thread
From: Lorenzo Bianconi @ 2026-06-23 20:32 UTC (permalink / raw)
To: Simon Horman
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, leto.liu,
linux-arm-kernel, linux-mediatek, netdev, brown.huang
[-- Attachment #1: Type: text/plain, Size: 5732 bytes --]
On Jun 23, Simon Horman wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://sashiko.dev
>
> Simon says: I'd appreciate it if you could look over the first two issues
> flagged below in the context of this patch. While the third
> and final issue looks like it is in possible separate follow-up
> territory.
>
> ---
> net: airoha: Add retry mechanism to airoha_qdma_set_trtcm_param()
>
> This patch introduces a retry loop to airoha_qdma_set_trtcm_param() to mitigate
> bus timeouts when configuring QDMA parameters. It attempts to verify that the
> configuration is correctly applied to the hardware.
>
> > diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> > index 3370c3df7c107..bb5c0599a4eed 100644
> > --- a/drivers/net/ethernet/airoha/airoha_eth.c
> > +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> > @@ -2673,14 +2673,30 @@ static int airoha_qdma_set_trtcm_param(struct airoha_qdma *qdma, int channel,
> > FIELD_PREP(TRTCM_METER_GROUP_MASK, group) |
> > FIELD_PREP(TRTCM_PARAM_INDEX_MASK, idx) |
> > FIELD_PREP(TRTCM_PARAM_RATE_TYPE_MASK, mode);
> > + int i;
> >
> > - airoha_qdma_wr(qdma, REG_TRTCM_DATA_LOW(addr), val);
> > - airoha_qdma_wr(qdma, REG_TRTCM_CFG_PARAM(addr), config);
> > + for (i = 0; i < 100; i++) {
> > + u32 data;
> >
> > - return read_poll_timeout(airoha_qdma_rr, val,
> > - val & TRTCM_PARAM_RW_DONE_MASK,
> > - USEC_PER_MSEC, 10 * USEC_PER_MSEC, true,
> > - qdma, REG_TRTCM_CFG_PARAM(addr));
> > + airoha_qdma_wr(qdma, REG_TRTCM_DATA_LOW(addr), val);
> > + wmb();
> > + airoha_qdma_wr(qdma, REG_TRTCM_CFG_PARAM(addr), config);
> > +
> > + if (read_poll_timeout(airoha_qdma_rr, data,
> > + data & TRTCM_PARAM_RW_DONE_MASK,
> > + USEC_PER_MSEC, 10 * USEC_PER_MSEC,
> > + true, qdma, REG_TRTCM_CFG_PARAM(addr)))
> > + return -ETIMEDOUT;
>
> [Severity: High]
> If a bus timeout causes the write phase's read_poll_timeout() in
> airoha_qdma_set_trtcm_param() to fail, this immediately returns -ETIMEDOUT,
> aborting the 100-iteration retry loop.
>
> Doesn't this defeat the mitigation?
I agree we just run 'continue' here instead of returing -ETIMEDOUT
>
> > +
> > + if (airoha_qdma_get_trtcm_param(qdma, channel, addr, param,
> > + mode, &data, NULL))
> > + continue;
>
> [Severity: High]
> If airoha_qdma_get_trtcm_param() times out during the verification phase, it
> indicates the hardware hasn't finished the read command (DONE bit is still 0).
>
> The continue statement then immediately starts the next iteration, writing new
> commands to DATA_LOW and CFG_PARAM.
>
> Can writing to command registers while the hardware state machine is actively
> busy violate hardware protocols and lead to unrecoverable lockups?
@Brown and @Leto: can you please comment on this case?
>
> > +
> > + if (data == val)
> > + return 0;
>
> [Severity: High]
> This isn't a bug introduced by this patch, but if the bus drops the command
> write to REG_TRTCM_CFG_PARAM, the hardware never executes the command and the
> TRTCM_PARAM_RW_DONE_MASK bit remains set from the previous operation.
>
> Consequently, read_poll_timeout() returns an immediate false success.
>
> If both the set and get command writes are dropped by the bus, the get
> operation will read back the newly written DATA_LOW register value.
>
> Will the data == val check evaluate to true and falsely report success without
> the hardware ever processing the configuration?
This seems quite a 'pathological' scenario, is it really an issue?
Regards,
Lorenzo
>
> > + }
> > +
> > + return -EBUSY;
> > }
> >
>
> [ ... ]
>
> > static int airoha_qdma_set_trtcm_config(struct airoha_qdma *qdma, int channel,
> > u32 addr, enum trtcm_mode_type mode,
> > bool enable, u32 enable_mask)
> > {
> > u32 val;
> >
> > if (airoha_qdma_get_trtcm_param(qdma, channel, addr, TRTCM_MISC_MODE,
> > mode, &val, NULL))
> > return -EINVAL;
>
> [Severity: High]
> Standalone calls to airoha_qdma_get_trtcm_param(), like this one in
> airoha_qdma_set_trtcm_config(), lack a retry loop entirely.
>
> Does this leave read-modify-write cycles fully vulnerable to the same bus
> timeouts the patch attempts to mitigate?
>
> [Severity: High]
> This isn't a bug introduced by this patch, but the patch mitigates bus
> timeouts in airoha_qdma_set_trtcm_param() while entirely missing
> airoha_qdma_set_rl_param().
>
> The latter performs the exact same write-and-poll sequence on identical QDMA
> configuration registers:
>
> drivers/net/ethernet/airoha/airoha_eth.c:airoha_qdma_set_rl_param() {
> ...
> return read_poll_timeout(..., val & RATE_LIMIT_PARAM_RW_DONE_MASK, ...);
> }
>
> If bus timeouts affect QDMA access broadly as stated in the commit message, are
> the rate-limiting configuration paths randomly vulnerable to the same failures?
>
> [Severity: High]
> This isn't a bug introduced by this patch, but in airoha_tc_remove_htb_queue(),
> hardware teardown incorrectly uses queue + 1 instead of queue:
>
> drivers/net/ethernet/airoha/airoha_eth.c:airoha_tc_remove_htb_queue() {
> ...
> airoha_qdma_set_tx_rate_limit(netdev, queue + 1, 0, 0);
> ...
> }
>
> Does this leave the removed queue's hardware limits permanently active
> (resource leak) and inadvertently disable the rate limits for queue + 1,
> corrupting the QoS state of another active channel?
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-06-23 20:32 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-22 9:35 [PATCH net v2] net: airoha: Add retry mechanism to airoha_qdma_set_trtcm_param() Lorenzo Bianconi
2026-06-23 10:53 ` Simon Horman
2026-06-23 20:32 ` Lorenzo Bianconi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox