From: Stephen Hemminger <stephen@networkplumber.org>
To: Zaiyu Wang <zaiyuwang@trustnetic.com>
Cc: dev@dpdk.org, stable@dpdk.org, Jiawen Wu <jiawenwu@trustnetic.com>
Subject: Re: [PATCH v4 14/20] net/txgbe: fix link stability for Amber-Lite backplane mode
Date: Sun, 17 May 2026 16:50:32 -0700 [thread overview]
Message-ID: <20260517165032.059933d3@phoenix.local> (raw)
In-Reply-To: <20260511103604.19724-15-zaiyuwang@trustnetic.com>
On Mon, 11 May 2026 18:35:56 +0800
Zaiyu Wang <zaiyuwang@trustnetic.com> wrote:
> The link was previously configured via firmware, but this approach
> resulted in unstable link behavior. To resolve the issue, re-add the
> PHY configuration flow directly into the driver.
>
> Fixes: ead3616f630d ("net/txgbe: support PHY configuration via SW-FW mailbox")
> Cc: stable@dpdk.org
>
> Signed-off-by: Zaiyu Wang <zaiyuwang@trustnetic.com>
> ---
Lots of AI feedback here:
# Review of DPDK Patch: net/txgbe: fix link stability for 40G NIC
## Summary
This patch re-adds direct PHY configuration flow to the txgbe driver to fix unstable link behavior on 40G NICs. The previous firmware-based configuration approach was causing link stability issues.
## ERRORS
### 1. Use-after-free risk in `txgbe_setup_phy_link_aml40`
**File:** `drivers/net/txgbe/base/txgbe_aml40.c:157-159`
```c
ret_status = txgbe_set_link_to_amlite(hw, speed);
rte_spinlock_unlock(&hw->phy_lock);
if (ret_status == TXGBE_ERR_TIMEOUT)
hw->link_valid = false;
```
The spinlock is unlocked before checking `ret_status`, but if another thread modifies `hw->link_valid` between lines 158 and 160, the state becomes inconsistent. The assignment to `hw->link_valid` should occur before unlocking:
```c
ret_status = txgbe_set_link_to_amlite(hw, speed);
if (ret_status == TXGBE_ERR_TIMEOUT)
hw->link_valid = false;
rte_spinlock_unlock(&hw->phy_lock);
```
### 2. Missing error propagation in `txgbe_e56_rx_rd_second_code_40g`
**File:** `drivers/net/txgbe/base/txgbe_e56.c:1816`
The function declares `status = 0` and returns `status`, but never assigns a failure value even when qsort is called on potentially invalid data. If the timeout in the preceding while loop is reached (line 1825), the SECOND_CODE array may contain incomplete data, but the function still returns success.
### 3. Missing bounds check before array access
**File:** `drivers/net/txgbe/base/txgbe_e56.c:1831`
```c
median = ((N + 1) / 2) - 1;
*SECOND_CODE = RXS_BBCDR_SECOND_ORDER_ST[median];
```
If `N=5`, `median=2` which is valid. However, this code pattern is repeated multiple times (lines 244, 1831, etc.) with `N` as a constant, so it's safe. Nevertheless, adding `RTE_VERIFY(median < ARRAY_SIZE(RXS_BBCDR_SECOND_ORDER_ST))` would make intent explicit.
**Not flagging this as an error** since `N=5` is a fixed constant throughout.
### 4. Timeout return without cleanup in `txgbe_e56_rxs_calib_adapt_seq_40G`
**File:** `drivers/net/txgbe/base/txgbe_e56.c:2475-2481`
```c
if (timer++ > PHYINIT_TIMEOUT) {
rdata = 0;
addr = E56PHY_PMD_CFG_0_ADDR;
rdata = rd32_ephy(hw, addr);
set_fields_e56(&rdata, E56PHY_PMD_CFG_0_RX_EN_CFG, 0x0);
wr32_ephy(hw, addr, rdata);
return TXGBE_ERR_TIMEOUT;
}
```
The function has already configured many registers in the loop `for (i = 0; i < 4; i++)` (starting line 2393). When a timeout occurs on lane 0-2, the function returns immediately without restoring registers on the lanes that were successfully configured. This leaves the hardware in a partially configured state. The cleanup should disable all lanes, not just the one that timed out.
## WARNINGS
### 1. Hardcoded timeout in multiple locations
**File:** `drivers/net/txgbe/base/txgbe_e56.c` (multiple locations)
The `PHYINIT_TIMEOUT` constant is used consistently, but the delays vary (100µs, 500µs, 1000µs, 10ms). For the 500µs delay case (e.g., line 2478), `PHYINIT_TIMEOUT` iterations result in `PHYINIT_TIMEOUT * 500µs` total wait time. If `PHYINIT_TIMEOUT` is intended to be milliseconds, the timeout duration becomes inconsistent across different polling loops. Consider documenting what the timeout value represents (iterations? milliseconds?) and using consistent delay granularity.
### 2. Potentially unreachable code after loop
**File:** `drivers/net/txgbe/base/txgbe_e56.c:2656`
```c
for (j = 0; j < 16; j++) {
// ... ADC adaptation loop
}
/* g. Repeat #a to #f total 16 times */
```
The comment `/* g. Repeat #a to #f total 16 times */` appears *after* the loop that already runs 16 times. This is documentation only, but could be confusing. The comment should be before the loop or removed.
### 3. Inconsistent use of `msleep` vs `usec_delay`
**File:** `drivers/net/txgbe/base/txgbe_e56.c`
The patch uses `msleep()` for delays >= 10ms (lines 181, 3029) and `usec_delay()` for shorter delays (line 1826). However, line 3029 uses `msleep(10)` for 10ms, while line 2707 uses no delay after setting a register. Consider documenting the rationale for sleep vs busy-wait or using a consistent threshold.
### 4. Variable `bypass_ctle` hardcoded but declared as variable
**File:** `drivers/net/txgbe/base/txgbe_e56.c:2396`
```c
u32 bypass_ctle = true;
```
The variable `bypass_ctle` is declared as `u32` but assigned a boolean value, and it's never modified. Either:
- Change to `const bool bypass_ctle = true;` (preferred)
- Or document why it's a runtime variable despite being hardcoded
### 5. Missing validation of speed parameter in initialization functions
**File:** `drivers/net/txgbe/base/txgbe_e56.c:2206`
```c
if (speed == TXGBE_LINK_SPEED_10GB_FULL || speed == TXGBE_LINK_SPEED_40GB_FULL) {
CMVAR_SEC_LOW_TH = S10G_CMVAR_SEC_LOW_TH;
// ...
} else if (speed == TXGBE_LINK_SPEED_25GB_FULL) {
// ...
} else {
DEBUGOUT("Error Speed\n");
return 0; // Returns success despite error
}
```
The function returns 0 (success) when an invalid speed is passed, but logs "Error Speed". This should return an error code like `-EINVAL` or `TXGBE_ERR_PARAM`.
## INFORMATIONAL
### 1. Large function complexity
The function `txgbe_e56_rxs_calib_adapt_seq_40G` spans ~280 lines with deeply nested loops (3-level nesting). Consider refactoring into smaller helper functions for each calibration stage (ADC offset, ADC gain, interleaver adaptation) to improve readability and maintainability.
### 2. Magic numbers without symbolic constants
**File:** `drivers/net/txgbe/base/txgbe_e56.c:2470`
```c
while (EPHY_XFLD(E56G__PMD_CTRL_FSM_RX_STAT_0, ctrl_fsm_rx0_st) != 0x21 ||
```
The value `0x21` (POWERDN_ST) appears in multiple locations (lines 2470, 3145). This is already defined as `E56PHY_RX_POWERDN_ST` elsewhere. Use the symbolic constant consistently.
### 3. Duplicated initialization sequences
The 40G initialization in `txgbe_e56_cfg_40g` (lines 176-566) and the existing 10G/25G code share many similar register sequences. Consider extracting common configuration patterns into helper functions to reduce code duplication and maintenance burden.
### 4. Temperature check frequency
**File:** `drivers/net/txgbe/base/txgbe_e56.c:2253-2256`
The temperature tracking sequence comment states "must be run before the temperature drifts by >5degC" and recommends running every 100ms. However, the patch doesn't add timer-based periodic execution—it only runs during link setup. If temperature tracking is critical for stability, consider documenting that the caller must invoke this periodically.
---
## Positive Observations
1. The patch correctly adds `hw->link_valid` checks in `txgbe_check_mac_link_aml40` to prevent reporting link up when PHY configuration fails (lines 57-60, 80-81).
2. Error paths in timeout scenarios attempt cleanup by disabling RX (e.g., line 2477).
3. The use of median filtering for SECOND_CODE (lines 1829-1831) reduces noise from asynchronous hardware updates—good defensive programming.
next prev parent reply other threads:[~2026-05-17 23:56 UTC|newest]
Thread overview: 105+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-23 3:40 [PATCH 00/18] Wangxun Fixes Zaiyu Wang
2026-04-23 3:40 ` [PATCH 01/18] net/txgbe: remove duplicate xstats counters Zaiyu Wang
2026-04-23 3:40 ` [PATCH 02/18] net/ngbe: " Zaiyu Wang
2026-04-23 3:40 ` [PATCH 03/18] net/ngbe: add missing CDR config for YT PHY Zaiyu Wang
2026-04-23 3:40 ` [PATCH 04/18] net/ngbe: fix VF promiscuous and allmulticast Zaiyu Wang
2026-04-23 3:40 ` [PATCH 05/18] net/txgbe: fix inaccuracy in TX rate limiting Zaiyu Wang
2026-04-23 3:40 ` [PATCH 06/18] net/txgbe: fix link status check condition Zaiyu Wang
2026-04-23 3:40 ` [PATCH 07/18] net/txgbe: fix Tx desc free logic Zaiyu Wang
2026-04-23 3:40 ` [PATCH 08/18] net/txgbe: fix link flow control registers for Amber-Lite Zaiyu Wang
2026-04-23 7:54 ` Jiawen Wu
2026-04-23 3:40 ` [PATCH 09/18] net/txgbe: fix link flow control config for Sapphire Zaiyu Wang
2026-04-23 3:40 ` [PATCH 10/18] net/txgbe: fix a mass of unknown interrupts Zaiyu Wang
2026-04-23 3:40 ` [PATCH 11/18] net/txgbe: fix traffic class priority configuration Zaiyu Wang
2026-04-23 3:40 ` [PATCH 12/18] net/txgbe: fix link stability for 25G NIC Zaiyu Wang
2026-04-23 8:22 ` Jiawen Wu
2026-04-23 3:40 ` [PATCH 13/18] net/txgbe: fix link stability for 40G NIC Zaiyu Wang
2026-04-23 3:40 ` [PATCH 14/18] net/txgbe: fix link stability for Amber-Lite backplane mode Zaiyu Wang
2026-04-23 3:40 ` [PATCH 15/18] net/txgbe: fix FEC mode configuration on 25G NIC Zaiyu Wang
2026-04-23 3:40 ` [PATCH 16/18] net/txgbe: fix SFP module identification Zaiyu Wang
2026-04-23 3:40 ` [PATCH 17/18] net/txgbe: fix get module info operation Zaiyu Wang
2026-04-23 3:40 ` [PATCH 18/18] net/txgbe: fix get eeprom operation Zaiyu Wang
2026-04-24 21:59 ` Stephen Hemminger
2026-04-29 10:24 ` [PATCH v2 00/20] Wangxun Fixes Zaiyu Wang
2026-04-29 10:24 ` [PATCH v2 01/20] net/txgbe: remove duplicate xstats counters Zaiyu Wang
2026-04-29 10:24 ` [PATCH v2 02/20] net/ngbe: " Zaiyu Wang
2026-04-29 10:24 ` [PATCH v2 03/20] net/ngbe: add missing CDR config for YT PHY Zaiyu Wang
2026-04-29 10:24 ` [PATCH v2 04/20] net/ngbe: fix VF promiscuous and allmulticast Zaiyu Wang
2026-04-29 10:24 ` [PATCH v2 05/20] net/txgbe: fix inaccuracy in TX rate limiting Zaiyu Wang
2026-04-29 10:25 ` [PATCH v2 06/20] net/txgbe: fix link status check condition Zaiyu Wang
2026-04-29 10:25 ` [PATCH v2 07/20] net/txgbe: fix Tx desc free logic Zaiyu Wang
2026-04-29 10:25 ` [PATCH v2 08/20] net/txgbe: fix link flow control registers for Amber-Lite Zaiyu Wang
2026-04-29 15:10 ` Stephen Hemminger
2026-04-29 10:25 ` [PATCH v2 09/20] net/txgbe: fix link flow control config for Sapphire Zaiyu Wang
2026-04-29 10:25 ` [PATCH v2 10/20] net/txgbe: fix a mass of unknown interrupts Zaiyu Wang
2026-04-29 10:25 ` [PATCH v2 11/20] net/txgbe: fix traffic class priority configuration Zaiyu Wang
2026-04-29 15:11 ` Stephen Hemminger
2026-05-09 11:06 ` Zaiyu Wang
2026-04-29 10:25 ` [PATCH v2 12/20] net/txgbe: fix link stability for 25G NIC Zaiyu Wang
2026-04-29 15:12 ` Stephen Hemminger
2026-04-29 10:25 ` [PATCH v2 13/20] net/txgbe: fix link stability for 40G NIC Zaiyu Wang
2026-04-29 10:25 ` [PATCH v2 14/20] net/txgbe: fix link stability for Amber-Lite backplane mode Zaiyu Wang
2026-04-29 10:25 ` [PATCH v2 15/20] net/txgbe: fix FEC mode configuration on 25G NIC Zaiyu Wang
2026-04-29 10:25 ` [PATCH v2 16/20] net/txgbe: fix SFP module identification Zaiyu Wang
2026-04-29 10:25 ` [PATCH v2 17/20] net/txgbe: fix get module info operation Zaiyu Wang
2026-04-29 10:25 ` [PATCH v2 18/20] net/txgbe: fix get eeprom operation Zaiyu Wang
2026-04-29 10:25 ` [PATCH v2 19/20] net/txgbe: fix to reset Tx write-back pointer Zaiyu Wang
2026-04-29 10:25 ` [PATCH v2 20/20] net/txgbe: fix to enable Tx desc check Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 00/20] Wangxun Fixes Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 01/20] net/txgbe: remove duplicate xstats counters Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 02/20] net/ngbe: " Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 03/20] net/ngbe: add missing CDR config for YT PHY Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 04/20] net/ngbe: fix VF promiscuous and allmulticast Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 05/20] net/txgbe: fix inaccuracy in TX rate limiting Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 06/20] net/txgbe: fix link status check condition Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 07/20] net/txgbe: fix Tx desc free logic Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 08/20] net/txgbe: fix link flow control registers for Amber-Lite Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 09/20] net/txgbe: fix link flow control config for Sapphire Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 10/20] net/txgbe: fix a mass of unknown interrupts Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 11/20] net/txgbe: fix traffic class priority configuration Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 12/20] net/txgbe: fix link stability for 25G NIC Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 13/20] net/txgbe: fix link stability for 40G NIC Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 14/20] net/txgbe: fix link stability for Amber-Lite backplane mode Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 15/20] net/txgbe: fix FEC mode configuration on 25G NIC Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 16/20] net/txgbe: fix SFP module identification Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 17/20] net/txgbe: fix get module info operation Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 18/20] net/txgbe: fix get eeprom operation Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 19/20] net/txgbe: fix to reset Tx write-back pointer Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 20/20] net/txgbe: fix to enable Tx desc check Zaiyu Wang
2026-05-09 15:44 ` [PATCH v3 00/20] Wangxun Fixes Stephen Hemminger
2026-05-09 17:07 ` Stephen Hemminger
2026-05-11 10:28 ` Zaiyu Wang
2026-05-11 10:35 ` [PATCH v4 " Zaiyu Wang
2026-05-11 10:35 ` [PATCH v4 01/20] net/txgbe: remove duplicate xstats counters Zaiyu Wang
2026-05-11 10:35 ` [PATCH v4 02/20] net/ngbe: " Zaiyu Wang
2026-05-11 10:35 ` [PATCH v4 03/20] net/ngbe: add missing CDR config for YT PHY Zaiyu Wang
2026-05-17 23:37 ` Stephen Hemminger
2026-05-11 10:35 ` [PATCH v4 04/20] net/ngbe: fix VF promiscuous and allmulticast Zaiyu Wang
2026-05-17 23:39 ` Stephen Hemminger
2026-05-11 10:35 ` [PATCH v4 05/20] net/txgbe: fix inaccuracy in Tx rate limiting Zaiyu Wang
2026-05-17 23:40 ` Stephen Hemminger
2026-05-11 10:35 ` [PATCH v4 06/20] net/txgbe: fix link status check condition Zaiyu Wang
2026-05-11 10:35 ` [PATCH v4 07/20] net/txgbe: fix Tx desc free logic Zaiyu Wang
2026-05-17 23:44 ` Stephen Hemminger
2026-05-11 10:35 ` [PATCH v4 08/20] net/txgbe: fix link flow control registers for Amber-Lite Zaiyu Wang
2026-05-11 10:35 ` [PATCH v4 09/20] net/txgbe: fix link flow control config for Sapphire Zaiyu Wang
2026-05-17 23:46 ` Stephen Hemminger
2026-05-11 10:35 ` [PATCH v4 10/20] net/txgbe: fix a mass of unknown interrupts Zaiyu Wang
2026-05-11 10:35 ` [PATCH v4 11/20] net/txgbe: fix traffic class priority configuration Zaiyu Wang
2026-05-11 10:35 ` [PATCH v4 12/20] net/txgbe: fix link stability for 25G NIC Zaiyu Wang
2026-05-17 23:49 ` Stephen Hemminger
2026-05-11 10:35 ` [PATCH v4 13/20] net/txgbe: fix link stability for 40G NIC Zaiyu Wang
2026-05-11 10:35 ` [PATCH v4 14/20] net/txgbe: fix link stability for Amber-Lite backplane mode Zaiyu Wang
2026-05-17 23:50 ` Stephen Hemminger [this message]
2026-05-11 10:35 ` [PATCH v4 15/20] net/txgbe: fix FEC mode configuration on 25G NIC Zaiyu Wang
2026-05-11 10:35 ` [PATCH v4 16/20] net/txgbe: fix SFP module identification Zaiyu Wang
2026-05-17 23:52 ` Stephen Hemminger
2026-05-11 10:35 ` [PATCH v4 17/20] net/txgbe: fix get module info operation Zaiyu Wang
2026-05-17 23:53 ` Stephen Hemminger
2026-05-11 10:36 ` [PATCH v4 18/20] net/txgbe: fix get EEPROM operation Zaiyu Wang
2026-05-17 23:54 ` Stephen Hemminger
2026-05-11 10:36 ` [PATCH v4 19/20] net/txgbe: fix to reset Tx write-back pointer Zaiyu Wang
2026-05-11 10:36 ` [PATCH v4 20/20] net/txgbe: fix to enable Tx desc check Zaiyu Wang
2026-05-17 23:55 ` Stephen Hemminger
2026-05-18 14:54 ` [PATCH v4 00/20] Wangxun Fixes Stephen Hemminger
2026-05-19 6:56 ` Zaiyu Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260517165032.059933d3@phoenix.local \
--to=stephen@networkplumber.org \
--cc=dev@dpdk.org \
--cc=jiawenwu@trustnetic.com \
--cc=stable@dpdk.org \
--cc=zaiyuwang@trustnetic.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox