On Jun 23, Simon Horman wrote: > This is an AI-generated review of your patch. The human sending this > email has considered the AI review valid, or at least plausible. > Full review at: https://sashiko.dev > > Simon says: I'd appreciate it if you could look over the first two issues > flagged below in the context of this patch. While the third > and final issue looks like it is in possible separate follow-up > territory. > > --- > net: airoha: Add retry mechanism to airoha_qdma_set_trtcm_param() > > This patch introduces a retry loop to airoha_qdma_set_trtcm_param() to mitigate > bus timeouts when configuring QDMA parameters. It attempts to verify that the > configuration is correctly applied to the hardware. > > > diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c > > index 3370c3df7c107..bb5c0599a4eed 100644 > > --- a/drivers/net/ethernet/airoha/airoha_eth.c > > +++ b/drivers/net/ethernet/airoha/airoha_eth.c > > @@ -2673,14 +2673,30 @@ static int airoha_qdma_set_trtcm_param(struct airoha_qdma *qdma, int channel, > > FIELD_PREP(TRTCM_METER_GROUP_MASK, group) | > > FIELD_PREP(TRTCM_PARAM_INDEX_MASK, idx) | > > FIELD_PREP(TRTCM_PARAM_RATE_TYPE_MASK, mode); > > + int i; > > > > - airoha_qdma_wr(qdma, REG_TRTCM_DATA_LOW(addr), val); > > - airoha_qdma_wr(qdma, REG_TRTCM_CFG_PARAM(addr), config); > > + for (i = 0; i < 100; i++) { > > + u32 data; > > > > - return read_poll_timeout(airoha_qdma_rr, val, > > - val & TRTCM_PARAM_RW_DONE_MASK, > > - USEC_PER_MSEC, 10 * USEC_PER_MSEC, true, > > - qdma, REG_TRTCM_CFG_PARAM(addr)); > > + airoha_qdma_wr(qdma, REG_TRTCM_DATA_LOW(addr), val); > > + wmb(); > > + airoha_qdma_wr(qdma, REG_TRTCM_CFG_PARAM(addr), config); > > + > > + if (read_poll_timeout(airoha_qdma_rr, data, > > + data & TRTCM_PARAM_RW_DONE_MASK, > > + USEC_PER_MSEC, 10 * USEC_PER_MSEC, > > + true, qdma, REG_TRTCM_CFG_PARAM(addr))) > > + return -ETIMEDOUT; > > [Severity: High] > If a bus timeout causes the write phase's read_poll_timeout() in > airoha_qdma_set_trtcm_param() to fail, this immediately returns -ETIMEDOUT, > aborting the 100-iteration retry loop. > > Doesn't this defeat the mitigation? I agree we just run 'continue' here instead of returing -ETIMEDOUT > > > + > > + if (airoha_qdma_get_trtcm_param(qdma, channel, addr, param, > > + mode, &data, NULL)) > > + continue; > > [Severity: High] > If airoha_qdma_get_trtcm_param() times out during the verification phase, it > indicates the hardware hasn't finished the read command (DONE bit is still 0). > > The continue statement then immediately starts the next iteration, writing new > commands to DATA_LOW and CFG_PARAM. > > Can writing to command registers while the hardware state machine is actively > busy violate hardware protocols and lead to unrecoverable lockups? @Brown and @Leto: can you please comment on this case? > > > + > > + if (data == val) > > + return 0; > > [Severity: High] > This isn't a bug introduced by this patch, but if the bus drops the command > write to REG_TRTCM_CFG_PARAM, the hardware never executes the command and the > TRTCM_PARAM_RW_DONE_MASK bit remains set from the previous operation. > > Consequently, read_poll_timeout() returns an immediate false success. > > If both the set and get command writes are dropped by the bus, the get > operation will read back the newly written DATA_LOW register value. > > Will the data == val check evaluate to true and falsely report success without > the hardware ever processing the configuration? This seems quite a 'pathological' scenario, is it really an issue? Regards, Lorenzo > > > + } > > + > > + return -EBUSY; > > } > > > > [ ... ] > > > static int airoha_qdma_set_trtcm_config(struct airoha_qdma *qdma, int channel, > > u32 addr, enum trtcm_mode_type mode, > > bool enable, u32 enable_mask) > > { > > u32 val; > > > > if (airoha_qdma_get_trtcm_param(qdma, channel, addr, TRTCM_MISC_MODE, > > mode, &val, NULL)) > > return -EINVAL; > > [Severity: High] > Standalone calls to airoha_qdma_get_trtcm_param(), like this one in > airoha_qdma_set_trtcm_config(), lack a retry loop entirely. > > Does this leave read-modify-write cycles fully vulnerable to the same bus > timeouts the patch attempts to mitigate? > > [Severity: High] > This isn't a bug introduced by this patch, but the patch mitigates bus > timeouts in airoha_qdma_set_trtcm_param() while entirely missing > airoha_qdma_set_rl_param(). > > The latter performs the exact same write-and-poll sequence on identical QDMA > configuration registers: > > drivers/net/ethernet/airoha/airoha_eth.c:airoha_qdma_set_rl_param() { > ... > return read_poll_timeout(..., val & RATE_LIMIT_PARAM_RW_DONE_MASK, ...); > } > > If bus timeouts affect QDMA access broadly as stated in the commit message, are > the rate-limiting configuration paths randomly vulnerable to the same failures? > > [Severity: High] > This isn't a bug introduced by this patch, but in airoha_tc_remove_htb_queue(), > hardware teardown incorrectly uses queue + 1 instead of queue: > > drivers/net/ethernet/airoha/airoha_eth.c:airoha_tc_remove_htb_queue() { > ... > airoha_qdma_set_tx_rate_limit(netdev, queue + 1, 0, 0); > ... > } > > Does this leave the removed queue's hardware limits permanently active > (resource leak) and inadvertently disable the rate limits for queue + 1, > corrupting the QoS state of another active channel?