DPDK-dev Archive on lore.kernel.org
From: Stephen Hemminger <stephen@networkplumber.org>
To: Zaiyu Wang <zaiyuwang@trustnetic.com>
Cc: dev@dpdk.org, stable@dpdk.org, Jiawen Wu <jiawenwu@trustnetic.com>
Subject: Re: [PATCH v4 14/20] net/txgbe: fix link stability for Amber-Lite backplane mode
Date: Sun, 17 May 2026 16:50:32 -0700
Message-ID: <20260517165032.059933d3@phoenix.local>
In-Reply-To: <20260511103604.19724-15-zaiyuwang@trustnetic.com>

On Mon, 11 May 2026 18:35:56 +0800
Zaiyu Wang <zaiyuwang@trustnetic.com> wrote:

> The link was previously configured via firmware, but this approach
> resulted in unstable link behavior. To resolve the issue, re-add the
> PHY configuration flow directly into the driver.
> 
> Fixes: ead3616f630d ("net/txgbe: support PHY configuration via SW-FW mailbox")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Zaiyu Wang <zaiyuwang@trustnetic.com>
> ---

Lots of AI feedback here:


# Review of DPDK Patch: net/txgbe: fix link stability for 40G NIC

## Summary
This patch re-adds direct PHY configuration flow to the txgbe driver to fix unstable link behavior on 40G NICs. The previous firmware-based configuration approach was causing link stability issues.

## ERRORS

### 1. Unsynchronized update of `hw->link_valid` in `txgbe_setup_phy_link_aml40`
**File:** `drivers/net/txgbe/base/txgbe_aml40.c:157-159`

```c
ret_status = txgbe_set_link_to_amlite(hw, speed);
rte_spinlock_unlock(&hw->phy_lock);

if (ret_status == TXGBE_ERR_TIMEOUT)
    hw->link_valid = false;
```

The spinlock is released before `ret_status` is checked, so the write to `hw->link_valid` happens outside the critical section. If another thread reads or modifies `hw->link_valid` between the unlock and the assignment, the state becomes inconsistent. The assignment to `hw->link_valid` should occur before unlocking:

```c
ret_status = txgbe_set_link_to_amlite(hw, speed);
if (ret_status == TXGBE_ERR_TIMEOUT)
    hw->link_valid = false;
rte_spinlock_unlock(&hw->phy_lock);
```
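
As a standalone illustration of the ordering above, the following is a minimal sketch, not driver code. It assumes a C11 `atomic_flag` spin lock in place of `rte_spinlock_t`, and every `sketch_*` / `SKETCH_*` name is hypothetical. The point is only that the flag is written while the lock is still held.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_ERR_TIMEOUT (-1) /* hypothetical stand-in for TXGBE_ERR_TIMEOUT */

/* Hypothetical, reduced stand-in for the driver's hw structure. */
struct sketch_hw {
    atomic_flag phy_lock; /* stands in for the rte_spinlock_t phy_lock */
    bool link_valid;
};

static void sketch_lock(atomic_flag *l)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;
}

static void sketch_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Hypothetical stub for the PHY bring-up: returns the status it is given. */
static int sketch_set_link(struct sketch_hw *hw, int forced_status)
{
    (void)hw;
    return forced_status;
}

/* Configure the link and update link_valid before releasing the lock. */
static int sketch_setup_phy_link(struct sketch_hw *hw, int forced_status)
{
    int ret;

    sketch_lock(&hw->phy_lock);
    ret = sketch_set_link(hw, forced_status);
    if (ret == SKETCH_ERR_TIMEOUT)
        hw->link_valid = false; /* still inside the critical section */
    sketch_unlock(&hw->phy_lock);

    return ret;
}
```

This only helps if every other reader and writer of `link_valid` takes the same lock; whether that holds in the driver is not something this sketch can confirm.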

### 2. Missing error propagation in `txgbe_e56_rx_rd_second_code_40g`
**File:** `drivers/net/txgbe/base/txgbe_e56.c:1816`

The function initializes `status = 0` and returns it, but never assigns a failure value. If the preceding while loop times out (line 1825), the SECOND_CODE array may hold incomplete data, yet `qsort` still runs on it and the function returns success.
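
One way to propagate the failure is to return an error as soon as the poll budget is exhausted, before any sorting happens. The sketch below is illustrative only: the callback type, the poll limit, and all `sketch_*` / `SKETCH_*` names are assumptions, not taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_ERR_TIMEOUT (-1)  /* hypothetical error code */
#define SKETCH_MAX_POLLS   100   /* hypothetical poll budget */

/* Hypothetical sampler: returns true and fills *out when a sample is ready. */
typedef bool (*sketch_sample_fn)(void *ctx, int *out);

/* Collect n samples, or fail if the poll budget runs out first. */
static int sketch_collect(sketch_sample_fn read_sample, void *ctx,
                          int *samples, size_t n)
{
    size_t got = 0;
    unsigned int polls = 0;

    while (got < n) {
        if (polls++ >= SKETCH_MAX_POLLS)
            return SKETCH_ERR_TIMEOUT; /* incomplete data: report failure */
        if (read_sample(ctx, &samples[got]))
            got++;
    }
    return 0;
}

/* Example samplers used to exercise both paths. */
static bool sketch_always_ready(void *ctx, int *out)
{
    (void)ctx;
    *out = 7;
    return true;
}

static bool sketch_never_ready(void *ctx, int *out)
{
    (void)ctx;
    (void)out;
    return false;
}
```

A caller would then sort and take the median only when `sketch_collect()` returns 0.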

### 3. Missing bounds check before array access
**File:** `drivers/net/txgbe/base/txgbe_e56.c:1831`

```c
median = ((N + 1) / 2) - 1;
*SECOND_CODE = RXS_BBCDR_SECOND_ORDER_ST[median];
```

With `N=5`, `median=2`, which is a valid index. The same pattern is repeated in several places (lines 244, 1831, etc.), always with `N` as a constant, so it is safe. Nevertheless, adding `RTE_VERIFY(median < ARRAY_SIZE(RXS_BBCDR_SECOND_ORDER_ST))` would make the intent explicit.

**Not flagging this as an error** since `N=5` is a fixed constant throughout.
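
For reference, the index arithmetic can be checked in isolation. This is a minimal sketch using plain `qsort` and a guard for the empty case; the helper names are hypothetical and the sample values are made up.

```c
#include <stddef.h>
#include <stdlib.h>

/* Three-way integer comparison for qsort, written to avoid overflow. */
static int sketch_cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);
}

/*
 * Sort vals in place and store the (lower) median in *out.
 * Returns 0 on success, -1 if there is nothing to take a median of.
 */
static int sketch_median(int *vals, size_t n, int *out)
{
    size_t median;

    if (n == 0)
        return -1; /* ((0 + 1) / 2) - 1 would wrap around */

    qsort(vals, n, sizeof(vals[0]), sketch_cmp_int);
    median = ((n + 1) / 2) - 1; /* n = 5 gives index 2 */
    *out = vals[median];
    return 0;
}
```

For `n >= 1` the index is always below `n`, so the only input that needs rejecting is an empty array.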

### 4. Timeout return without cleanup in `txgbe_e56_rxs_calib_adapt_seq_40G`
**File:** `drivers/net/txgbe/base/txgbe_e56.c:2475-2481`

```c
if (timer++ > PHYINIT_TIMEOUT) {
    rdata = 0;
    addr  = E56PHY_PMD_CFG_0_ADDR;
    rdata = rd32_ephy(hw, addr);
    set_fields_e56(&rdata, E56PHY_PMD_CFG_0_RX_EN_CFG, 0x0);
    wr32_ephy(hw, addr, rdata);
    return TXGBE_ERR_TIMEOUT;
}
```

The function has already configured many registers in the loop `for (i = 0; i < 4; i++)` (starting line 2393). When a timeout occurs partway through the lanes, the function returns immediately without restoring registers on the lanes that were already configured. This leaves the hardware in a partially configured state. The cleanup should disable all lanes, not just the one that timed out.
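
The shape of an all-lanes cleanup can be sketched without touching real registers. Everything below is hypothetical: a boolean per lane stands in for the RX enable state, and `fail_lane` simulates which lane times out.

```c
#include <stdbool.h>

#define SKETCH_LANES       4
#define SKETCH_ERR_TIMEOUT (-1) /* hypothetical error code */

/* Hypothetical model of per-lane RX enable state. */
struct sketch_phy {
    bool rx_en[SKETCH_LANES];
    int fail_lane; /* lane that times out, or -1 for none */
};

/* Hypothetical per-lane calibration: fails only on fail_lane. */
static int sketch_calibrate_lane(const struct sketch_phy *phy, int lane)
{
    return lane == phy->fail_lane ? SKETCH_ERR_TIMEOUT : 0;
}

/* Undo the enable on every lane, not just the one that failed. */
static void sketch_disable_all_lanes(struct sketch_phy *phy)
{
    for (int i = 0; i < SKETCH_LANES; i++)
        phy->rx_en[i] = false;
}

static int sketch_calibrate_all(struct sketch_phy *phy)
{
    for (int i = 0; i < SKETCH_LANES; i++) {
        phy->rx_en[i] = true;
        if (sketch_calibrate_lane(phy, i) != 0) {
            sketch_disable_all_lanes(phy); /* single cleanup path */
            return SKETCH_ERR_TIMEOUT;
        }
    }
    return 0;
}
```

Routing every failure through one cleanup helper also avoids repeating the disable sequence at each timeout site.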

## WARNINGS

### 1. Hardcoded timeout in multiple locations
**File:** `drivers/net/txgbe/base/txgbe_e56.c` (multiple locations)

The `PHYINIT_TIMEOUT` constant is used consistently, but the delays vary (100µs, 500µs, 1000µs, 10ms). For the 500µs delay case (e.g., line 2478), `PHYINIT_TIMEOUT` iterations result in `PHYINIT_TIMEOUT * 500µs` total wait time. If `PHYINIT_TIMEOUT` is intended to be milliseconds, the timeout duration becomes inconsistent across different polling loops. Consider documenting what the timeout value represents (iterations? milliseconds?) and using consistent delay granularity.
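
If the timeout were meant as a time budget rather than an iteration count, each loop could derive its iteration limit from that budget and its own delay step. A minimal sketch of the arithmetic, with a hypothetical helper name and made-up budgets:

```c
#include <stdint.h>

/*
 * Number of polls needed to cover budget_us when each poll waits step_us,
 * rounded up. Returns 0 for a zero step to avoid dividing by zero.
 */
static uint32_t sketch_poll_count(uint32_t budget_us, uint32_t step_us)
{
    if (step_us == 0)
        return 0;
    return budget_us / step_us + (budget_us % step_us != 0);
}
```

With a 10 ms budget this gives 20 polls at a 500 µs step and 100 polls at a 100 µs step, so the wall-clock timeout stays the same across loops.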

### 2. Misplaced comment after loop
**File:** `drivers/net/txgbe/base/txgbe_e56.c:2656`

```c
for (j = 0; j < 16; j++) {
    // ... ADC adaptation loop
}
/* g. Repeat #a to #f total 16 times */
```

The comment `/* g. Repeat #a to #f total 16 times */` appears *after* the loop that already runs 16 times. This affects documentation only, but it is confusing. Move the comment before the loop or remove it.

### 3. Inconsistent use of `msleep` vs `usec_delay`
**File:** `drivers/net/txgbe/base/txgbe_e56.c`

The patch uses `msleep()` for delays >= 10ms (lines 181, 3029) and `usec_delay()` for shorter delays (line 1826), while line 2707 uses no delay at all after setting a register. Consider documenting the rationale for sleep vs busy-wait, or using a consistent threshold.

### 4. Variable `bypass_ctle` hardcoded but declared as variable
**File:** `drivers/net/txgbe/base/txgbe_e56.c:2396`

```c
u32 bypass_ctle = true;
```

The variable `bypass_ctle` is declared as `u32` but assigned a boolean value, and it's never modified. Either:
- Change to `const bool bypass_ctle = true;` (preferred)
- Or document why it's a runtime variable despite being hardcoded

### 5. Missing validation of speed parameter in initialization functions
**File:** `drivers/net/txgbe/base/txgbe_e56.c:2206`

```c
if (speed == TXGBE_LINK_SPEED_10GB_FULL || speed == TXGBE_LINK_SPEED_40GB_FULL) {
    CMVAR_SEC_LOW_TH = S10G_CMVAR_SEC_LOW_TH;
    // ...
} else if (speed == TXGBE_LINK_SPEED_25GB_FULL) {
    // ...
} else {
    DEBUGOUT("Error Speed\n");
    return 0;  // Returns success despite error
}
```

The function returns 0 (success) when an invalid speed is passed, but logs "Error Speed". This should return an error code like `-EINVAL` or `TXGBE_ERR_PARAM`.
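
A minimal sketch of the suggested behavior, using `-EINVAL` from `<errno.h>`. The enum, the struct, and the threshold values are placeholders invented for illustration; only the control flow mirrors the excerpt above.

```c
#include <errno.h>

/* Hypothetical speed identifiers standing in for TXGBE_LINK_SPEED_*. */
enum sketch_speed {
    SKETCH_SPEED_10G,
    SKETCH_SPEED_25G,
    SKETCH_SPEED_40G
};

/* Placeholder threshold values, not real PHY settings. */
#define SKETCH_S10G_LOW_TH 1
#define SKETCH_S25G_LOW_TH 2

struct sketch_thresholds {
    int cmvar_sec_low_th;
};

/* Select thresholds for a speed; reject anything unrecognized. */
static int sketch_select_thresholds(int speed, struct sketch_thresholds *out)
{
    switch (speed) {
    case SKETCH_SPEED_10G:
    case SKETCH_SPEED_40G:
        out->cmvar_sec_low_th = SKETCH_S10G_LOW_TH;
        return 0;
    case SKETCH_SPEED_25G:
        out->cmvar_sec_low_th = SKETCH_S25G_LOW_TH;
        return 0;
    default:
        return -EINVAL; /* caller can tell the speed was not handled */
    }
}
```

Callers that currently ignore the return value would need to check it for the error to have any effect.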

## INFORMATIONAL

### 1. Large function complexity
The function `txgbe_e56_rxs_calib_adapt_seq_40G` spans ~280 lines with deeply nested loops (3-level nesting). Consider refactoring into smaller helper functions for each calibration stage (ADC offset, ADC gain, interleaver adaptation) to improve readability and maintainability.

### 2. Magic numbers without symbolic constants
**File:** `drivers/net/txgbe/base/txgbe_e56.c:2470`

```c
while (EPHY_XFLD(E56G__PMD_CTRL_FSM_RX_STAT_0, ctrl_fsm_rx0_st) != 0x21 ||
```

The value `0x21` (POWERDN_ST) appears in multiple locations (lines 2470, 3145). This is already defined as `E56PHY_RX_POWERDN_ST` elsewhere. Use the symbolic constant consistently.

### 3. Duplicated initialization sequences
The 40G initialization in `txgbe_e56_cfg_40g` (lines 176-566) and the existing 10G/25G code share many similar register sequences. Consider extracting common configuration patterns into helper functions to reduce code duplication and maintenance burden.

### 4. Temperature check frequency
**File:** `drivers/net/txgbe/base/txgbe_e56.c:2253-2256`

The temperature tracking sequence comment states "must be run before the temperature drifts by >5degC" and recommends running every 100ms. However, the patch doesn't add timer-based periodic execution—it only runs during link setup. If temperature tracking is critical for stability, consider documenting that the caller must invoke this periodically.

---

## Positive Observations
1. The patch correctly adds `hw->link_valid` checks in `txgbe_check_mac_link_aml40` to prevent reporting link up when PHY configuration fails (lines 57-60, 80-81).
2. Error paths in timeout scenarios attempt cleanup by disabling RX (e.g., line 2477).
3. The use of median filtering for SECOND_CODE (lines 1829-1831) reduces noise from asynchronous hardware updates—good defensive programming.

