DPDK-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Stephen Hemminger <stephen@networkplumber.org>
To: Zaiyu Wang <zaiyuwang@trustnetic.com>
Cc: dev@dpdk.org, stable@dpdk.org, Jiawen Wu <jiawenwu@trustnetic.com>
Subject: Re: [PATCH v4 12/20] net/txgbe: fix link stability for 25G NIC
Date: Sun, 17 May 2026 16:49:36 -0700	[thread overview]
Message-ID: <20260517164936.1c83977e@phoenix.local> (raw)
In-Reply-To: <20260511103604.19724-13-zaiyuwang@trustnetic.com>

On Mon, 11 May 2026 18:35:54 +0800
Zaiyu Wang <zaiyuwang@trustnetic.com> wrote:

> The link was previously configured via firmware, but this approach
> resulted in unstable link behavior. To resolve the issue, re-add the
> PHY configuration flow directly into the driver.
> 
> Fixes: ead3616f630d ("net/txgbe: support PHY configuration via SW-FW mailbox")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Zaiyu Wang <zaiyuwang@trustnetic.com>
> ---

Lots of AI review feedback here:

# Review of Patch 12/20: net/txgbe: fix link stability for 25G NIC

## Errors

### 1. Use-after-free potential in txgbe_e56_rxs_calib_adapt_seq (CORRECTNESS)
**File:** `drivers/net/txgbe/base/txgbe_e56.c`  
**Location:** Lines ~1600-1650

The function calls `txgbe_e56_rxs_osc_init_for_temp_track_range(hw, speed)` which can return `TXGBE_ERR_TIMEOUT`. When the timeout path at line ~1115 is taken, it writes to `E56PHY_PMD_CFG_0_ADDR` to disable RX and then returns `-1`. However, the caller `txgbe_e56_rxs_calib_adapt_seq` continues execution and accesses PHY registers without verifying the hardware state is valid. If the initialization timed out, subsequent register accesses may be unreliable.

**Suggested fix:**
```c
status = txgbe_e56_rxs_osc_init_for_temp_track_range(hw, speed);
if (status != 0)
    return status;  /* Propagate timeout or error immediately */
```

### 2. Resource leak: lock not released on error path (CORRECTNESS)
**File:** `drivers/net/txgbe/base/txgbe_aml.c`  
**Location:** Lines 241-247

In `txgbe_setup_phy_link_aml`, `rte_spinlock_lock(&hw->phy_lock)` is acquired at line 241. If `txgbe_set_link_to_amlite` returns `TXGBE_ERR_PHY_INIT_NOT_DONE` at line 246, the code jumps to `out:` at line 264 without releasing the spinlock. The lock is only released on the success path or `TXGBE_ERR_TIMEOUT`.

**Suggested fix:**
```c
rte_spinlock_lock(&hw->phy_lock);
ret_status = txgbe_set_link_to_amlite(hw, speed);
rte_spinlock_unlock(&hw->phy_lock);  /* Always release before checking status */

if (ret_status == TXGBE_ERR_PHY_INIT_NOT_DONE)
    goto out;
```

### 3. Missing error propagation to caller (CORRECTNESS)
**File:** `drivers/net/txgbe/base/txgbe_aml.c`  
**Location:** Lines 280-283

In `txgbe_setup_phy_link_aml`, the function returns `status` which is initialized to 0 and never assigned a failure value, even when `txgbe_set_link_to_amlite` fails or link is not established. The function always returns success regardless of actual outcome.

**Suggested fix:**
```c
if (!link_up) {
    *need_reset = true;
    return TXGBE_ERR_TIMEOUT;  /* Return error when link fails */
}
```

### 4. Double error return value overwrite (CORRECTNESS)
**File:** `drivers/net/txgbe/base/txgbe_e56.c`  
**Location:** Lines 1090-1095, 1157-1162, 1197-1202, 1242-1247

In `txgbe_e56_rxs_osc_init_for_temp_track_range`, the pattern:
```c
if (timer++ > PHYINIT_TIMEOUT) {
    DEBUGOUT("ERROR: ...");
    break;
    return -1;  /* Unreachable: break exits the loop */
}
```
The `return -1` is unreachable. After `break`, execution continues to the next loop iteration check and then proceeds past the loop. The error is not propagated.

**Suggested fix:**
```c
if (timer++ > PHYINIT_TIMEOUT) {
    DEBUGOUT("ERROR: Wait timeout");
    return TXGBE_ERR_TIMEOUT;  /* Remove break, return immediately */
}
```

### 5. Error code dropped without propagation (CORRECTNESS)
**File:** `drivers/net/txgbe/base/txgbe_e56.c`  
**Location:** Lines 1828-1830

In `txgbe_temp_track_seq`:
```c
status = txgbe_e56_get_temp(hw, &temperature);
if (status)
    return 0;  /* Returns success when get_temp failed */
```
Temperature reading failure is silently converted to success. The caller cannot distinguish between a successful temperature track and a failed temperature read.

**Suggested fix:**
```c
status = txgbe_e56_get_temp(hw, &temperature);
if (status)
    return status;  /* Propagate the error */
```

### 6. Integer overflow in bit shift (CORRECTNESS)
**File:** `drivers/net/txgbe/base/txgbe_e56.c`  
**Location:** Lines 19-31 in `set_fields_e56()`

The function performs `(1 << bit_low)` where `bit_low` can be up to 31 (since bit fields in the code span 0-31). The literal `1` is `int` (signed 32-bit). When `bit_low >= 31`, `1 << 31` invokes undefined behavior (signed overflow).

**Suggested fix:**
```c
if (bit_high == bit_low) {
    if (set_value == 0)
        *src_data &= ~(1U << bit_low);  /* Use 1U for unsigned */
    else
        *src_data |= (1U << bit_low);
} else {
    for (i = bit_low; i <= bit_high; i++)
        *src_data &= ~(1U << i);
    *src_data |= (set_value << bit_low);
}
```

### 7. Missing bounds check on array size calculation (CORRECTNESS)
**File:** `drivers/net/txgbe/base/txgbe_e56.c`  
**Location:** Line 1498 in `txgbe_e56_rx_rd_second_code()`

The code calls `qsort(RXS_BBCDR_SECOND_ORDER_ST, array_size, sizeof(int), ...)` where `array_size = ARRAY_SIZE(RXS_BBCDR_SECOND_ORDER_ST)`. The array has 5 elements, but the calculation `array_size = ARRAY_SIZE(...)` trusts the macro. If N changes in the loop initialization (line 1489: `N = 5`), the array could overflow.

**Suggested fix:**
```c
/* Ensure loop count matches array size */
BUILD_ASSERT(N <= ARRAY_SIZE(RXS_BBCDR_SECOND_ORDER_ST));
```

## Warnings

### 1. Variables declared but may be read before initialization (STYLE)
**File:** `drivers/net/txgbe/base/txgbe_aml.c`  
**Location:** Lines 325-327, 355-357

Variables `need_reset` are declared immediately before use, which is acceptable C99 style. However, they are declared in the middle of a conditional block scope, which can reduce clarity when the same variable name is reused in different branches.

**Suggested improvement:**
Declare `need_reset` once at function scope for better readability, though the current usage is technically correct.

### 2. Missing documentation for complex algorithm (STYLE)
**File:** `drivers/net/txgbe/base/txgbe_e56.c`  
**Location:** Lines 902-1264 (txgbe_e56_rxs_osc_init_for_temp_track_range)

This 360-line function implements a complex PHY oscillator calibration sequence with numbered steps in comments (1, 2, 3...). The algorithm is critical for 25G link stability but lacks a high-level overview comment explaining the purpose, temperature dependency, and why specific register sequences are needed.

**Suggested improvement:**
Add a function-level Doxygen comment explaining the calibration procedure and referencing the hardware specification section.

### 3. Magic numbers without named constants (STYLE)
**File:** `drivers/net/txgbe/base/txgbe_e56.c`  
**Location:** Multiple locations (e.g., lines 505, 732, 1000)

Numerous magic numbers appear in register writes (e.g., `0x68c1`, `0x3321`, `0x973e`, `0xccde` at lines 2281-2284) without explanation. These appear to be RS-FEC configuration magic values but are not documented.

**Suggested improvement:**
Define named constants or add inline comments explaining what these values configure:
```c
#define RS_FEC_CONFIG_VAL1  0x68c1  /* RS-FEC configuration register value */
```

## Info

### 1. Deep nesting in calibration sequence
**File:** `drivers/net/txgbe/base/txgbe_e56.c`  
**Location:** Lines 1550-1750 (txgbe_e56_rxs_calib_adapt_seq)

The ADC calibration loop (16 iterations) has 5 levels of nesting. Consider extracting the calibration step into a helper function to improve readability.

### 2. Possible candidate for helper function
**File:** `drivers/net/txgbe/base/txgbe_e56.c`  
**Location:** Lines 1950-2050 (repeated CTLE bypass pattern)

The code to override CTLE bypass for multiple lanes (RXS1, RXS2, RXS3) at lines 1999-2030 is repetitive. This pattern could be extracted into a helper function.

---

## Summary

**Critical Issues:** 7 Errors identified, primarily correctness bugs affecting resource management and error handling.  
**Style Issues:** 3 Warnings - documentation and code organization improvements.  
**Observations:** 2 Info items - code structure suggestions.

The most serious issues are:
1. Lock leak on error path (Error #2)
2. Missing error propagation causing silent failures (Errors #3, #5)
3. Undefined behavior in bit operations (Error #6)
4. Unreachable error returns masking timeouts (Error #4)

============================================================


  reply	other threads:[~2026-05-17 23:56 UTC|newest]

Thread overview: 105+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-23  3:40 [PATCH 00/18] Wangxun Fixes Zaiyu Wang
2026-04-23  3:40 ` [PATCH 01/18] net/txgbe: remove duplicate xstats counters Zaiyu Wang
2026-04-23  3:40 ` [PATCH 02/18] net/ngbe: " Zaiyu Wang
2026-04-23  3:40 ` [PATCH 03/18] net/ngbe: add missing CDR config for YT PHY Zaiyu Wang
2026-04-23  3:40 ` [PATCH 04/18] net/ngbe: fix VF promiscuous and allmulticast Zaiyu Wang
2026-04-23  3:40 ` [PATCH 05/18] net/txgbe: fix inaccuracy in TX rate limiting Zaiyu Wang
2026-04-23  3:40 ` [PATCH 06/18] net/txgbe: fix link status check condition Zaiyu Wang
2026-04-23  3:40 ` [PATCH 07/18] net/txgbe: fix Tx desc free logic Zaiyu Wang
2026-04-23  3:40 ` [PATCH 08/18] net/txgbe: fix link flow control registers for Amber-Lite Zaiyu Wang
2026-04-23  7:54   ` Jiawen Wu
2026-04-23  3:40 ` [PATCH 09/18] net/txgbe: fix link flow control config for Sapphire Zaiyu Wang
2026-04-23  3:40 ` [PATCH 10/18] net/txgbe: fix a mass of unknown interrupts Zaiyu Wang
2026-04-23  3:40 ` [PATCH 11/18] net/txgbe: fix traffic class priority configuration Zaiyu Wang
2026-04-23  3:40 ` [PATCH 12/18] net/txgbe: fix link stability for 25G NIC Zaiyu Wang
2026-04-23  8:22   ` Jiawen Wu
2026-04-23  3:40 ` [PATCH 13/18] net/txgbe: fix link stability for 40G NIC Zaiyu Wang
2026-04-23  3:40 ` [PATCH 14/18] net/txgbe: fix link stability for Amber-Lite backplane mode Zaiyu Wang
2026-04-23  3:40 ` [PATCH 15/18] net/txgbe: fix FEC mode configuration on 25G NIC Zaiyu Wang
2026-04-23  3:40 ` [PATCH 16/18] net/txgbe: fix SFP module identification Zaiyu Wang
2026-04-23  3:40 ` [PATCH 17/18] net/txgbe: fix get module info operation Zaiyu Wang
2026-04-23  3:40 ` [PATCH 18/18] net/txgbe: fix get eeprom operation Zaiyu Wang
2026-04-24 21:59   ` Stephen Hemminger
2026-04-29 10:24 ` [PATCH v2 00/20] Wangxun Fixes Zaiyu Wang
2026-04-29 10:24   ` [PATCH v2 01/20] net/txgbe: remove duplicate xstats counters Zaiyu Wang
2026-04-29 10:24   ` [PATCH v2 02/20] net/ngbe: " Zaiyu Wang
2026-04-29 10:24   ` [PATCH v2 03/20] net/ngbe: add missing CDR config for YT PHY Zaiyu Wang
2026-04-29 10:24   ` [PATCH v2 04/20] net/ngbe: fix VF promiscuous and allmulticast Zaiyu Wang
2026-04-29 10:24   ` [PATCH v2 05/20] net/txgbe: fix inaccuracy in TX rate limiting Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 06/20] net/txgbe: fix link status check condition Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 07/20] net/txgbe: fix Tx desc free logic Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 08/20] net/txgbe: fix link flow control registers for Amber-Lite Zaiyu Wang
2026-04-29 15:10     ` Stephen Hemminger
2026-04-29 10:25   ` [PATCH v2 09/20] net/txgbe: fix link flow control config for Sapphire Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 10/20] net/txgbe: fix a mass of unknown interrupts Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 11/20] net/txgbe: fix traffic class priority configuration Zaiyu Wang
2026-04-29 15:11     ` Stephen Hemminger
2026-05-09 11:06       ` Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 12/20] net/txgbe: fix link stability for 25G NIC Zaiyu Wang
2026-04-29 15:12     ` Stephen Hemminger
2026-04-29 10:25   ` [PATCH v2 13/20] net/txgbe: fix link stability for 40G NIC Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 14/20] net/txgbe: fix link stability for Amber-Lite backplane mode Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 15/20] net/txgbe: fix FEC mode configuration on 25G NIC Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 16/20] net/txgbe: fix SFP module identification Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 17/20] net/txgbe: fix get module info operation Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 18/20] net/txgbe: fix get eeprom operation Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 19/20] net/txgbe: fix to reset Tx write-back pointer Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 20/20] net/txgbe: fix to enable Tx desc check Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 00/20] Wangxun Fixes Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 01/20] net/txgbe: remove duplicate xstats counters Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 02/20] net/ngbe: " Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 03/20] net/ngbe: add missing CDR config for YT PHY Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 04/20] net/ngbe: fix VF promiscuous and allmulticast Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 05/20] net/txgbe: fix inaccuracy in TX rate limiting Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 06/20] net/txgbe: fix link status check condition Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 07/20] net/txgbe: fix Tx desc free logic Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 08/20] net/txgbe: fix link flow control registers for Amber-Lite Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 09/20] net/txgbe: fix link flow control config for Sapphire Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 10/20] net/txgbe: fix a mass of unknown interrupts Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 11/20] net/txgbe: fix traffic class priority configuration Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 12/20] net/txgbe: fix link stability for 25G NIC Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 13/20] net/txgbe: fix link stability for 40G NIC Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 14/20] net/txgbe: fix link stability for Amber-Lite backplane mode Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 15/20] net/txgbe: fix FEC mode configuration on 25G NIC Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 16/20] net/txgbe: fix SFP module identification Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 17/20] net/txgbe: fix get module info operation Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 18/20] net/txgbe: fix get eeprom operation Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 19/20] net/txgbe: fix to reset Tx write-back pointer Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 20/20] net/txgbe: fix to enable Tx desc check Zaiyu Wang
2026-05-09 15:44   ` [PATCH v3 00/20] Wangxun Fixes Stephen Hemminger
2026-05-09 17:07   ` Stephen Hemminger
2026-05-11 10:28     ` Zaiyu Wang
2026-05-11 10:35 ` [PATCH v4 " Zaiyu Wang
2026-05-11 10:35   ` [PATCH v4 01/20] net/txgbe: remove duplicate xstats counters Zaiyu Wang
2026-05-11 10:35   ` [PATCH v4 02/20] net/ngbe: " Zaiyu Wang
2026-05-11 10:35   ` [PATCH v4 03/20] net/ngbe: add missing CDR config for YT PHY Zaiyu Wang
2026-05-17 23:37     ` Stephen Hemminger
2026-05-11 10:35   ` [PATCH v4 04/20] net/ngbe: fix VF promiscuous and allmulticast Zaiyu Wang
2026-05-17 23:39     ` Stephen Hemminger
2026-05-11 10:35   ` [PATCH v4 05/20] net/txgbe: fix inaccuracy in Tx rate limiting Zaiyu Wang
2026-05-17 23:40     ` Stephen Hemminger
2026-05-11 10:35   ` [PATCH v4 06/20] net/txgbe: fix link status check condition Zaiyu Wang
2026-05-11 10:35   ` [PATCH v4 07/20] net/txgbe: fix Tx desc free logic Zaiyu Wang
2026-05-17 23:44     ` Stephen Hemminger
2026-05-11 10:35   ` [PATCH v4 08/20] net/txgbe: fix link flow control registers for Amber-Lite Zaiyu Wang
2026-05-11 10:35   ` [PATCH v4 09/20] net/txgbe: fix link flow control config for Sapphire Zaiyu Wang
2026-05-17 23:46     ` Stephen Hemminger
2026-05-11 10:35   ` [PATCH v4 10/20] net/txgbe: fix a mass of unknown interrupts Zaiyu Wang
2026-05-11 10:35   ` [PATCH v4 11/20] net/txgbe: fix traffic class priority configuration Zaiyu Wang
2026-05-11 10:35   ` [PATCH v4 12/20] net/txgbe: fix link stability for 25G NIC Zaiyu Wang
2026-05-17 23:49     ` Stephen Hemminger [this message]
2026-05-11 10:35   ` [PATCH v4 13/20] net/txgbe: fix link stability for 40G NIC Zaiyu Wang
2026-05-11 10:35   ` [PATCH v4 14/20] net/txgbe: fix link stability for Amber-Lite backplane mode Zaiyu Wang
2026-05-17 23:50     ` Stephen Hemminger
2026-05-11 10:35   ` [PATCH v4 15/20] net/txgbe: fix FEC mode configuration on 25G NIC Zaiyu Wang
2026-05-11 10:35   ` [PATCH v4 16/20] net/txgbe: fix SFP module identification Zaiyu Wang
2026-05-17 23:52     ` Stephen Hemminger
2026-05-11 10:35   ` [PATCH v4 17/20] net/txgbe: fix get module info operation Zaiyu Wang
2026-05-17 23:53     ` Stephen Hemminger
2026-05-11 10:36   ` [PATCH v4 18/20] net/txgbe: fix get EEPROM operation Zaiyu Wang
2026-05-17 23:54     ` Stephen Hemminger
2026-05-11 10:36   ` [PATCH v4 19/20] net/txgbe: fix to reset Tx write-back pointer Zaiyu Wang
2026-05-11 10:36   ` [PATCH v4 20/20] net/txgbe: fix to enable Tx desc check Zaiyu Wang
2026-05-17 23:55     ` Stephen Hemminger
2026-05-18 14:54   ` [PATCH v4 00/20] Wangxun Fixes Stephen Hemminger
2026-05-19  6:56     ` Zaiyu Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260517164936.1c83977e@phoenix.local \
    --to=stephen@networkplumber.org \
    --cc=dev@dpdk.org \
    --cc=jiawenwu@trustnetic.com \
    --cc=stable@dpdk.org \
    --cc=zaiyuwang@trustnetic.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox