DPDK-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Stephen Hemminger <stephen@networkplumber.org>
To: Zaiyu Wang <zaiyuwang@trustnetic.com>
Cc: dev@dpdk.org, stable@dpdk.org, Jiawen Wu <jiawenwu@trustnetic.com>
Subject: Re: [PATCH v4 16/20] net/txgbe: fix SFP module identification
Date: Sun, 17 May 2026 16:52:25 -0700	[thread overview]
Message-ID: <20260517165225.6ba8b774@phoenix.local> (raw)
In-Reply-To: <20260511103604.19724-17-zaiyuwang@trustnetic.com>

On Mon, 11 May 2026 18:35:58 +0800
Zaiyu Wang <zaiyuwang@trustnetic.com> wrote:

> Some optical modules were not correctly recognized due to ambiguous
> classification in the original detection flow. Rework the module
> identification logic to cover all module types. Also narrow the
> I2C lock scope to avoid potential race conditions during module
> access.
> 
> Fixes: ab191e6d9189 ("net/txgbe: support new SFP/QSFP modules")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Zaiyu Wang <zaiyuwang@trustnetic.com>
> ---

Lots of AI feedback here.
Note: not all this is required; like the request for extra release notes is AI being picky.

# Review of Patch 16/20: net/txgbe: fix SFP module identification

## Summary
This patch refactors SFP/QSFP module identification logic and adjusts I2C locking scope. The changes address module type detection and claim to fix race conditions.

---

## Errors

### 1. Resource leak in `txgbe_identify_sfp_module()` - multiple error paths missing unlock

**Error paths that acquire lock but fail to release on error:**

At line 836, `hw->mac.acquire_swfw_sync()` acquires `TXGBE_MNGSEM_SWPHY`. Several error paths after this point fail to release the lock:

```c
err = hw->phy.read_i2c_eeprom(hw, TXGBE_SFF_IDENTIFIER, &identifier);
if (err != 0) {
    /* ... */
    hw->mac.release_swfw_sync(hw, TXGBE_MNGSEM_SWPHY);  // OK - releases
    return TXGBE_ERR_SFP_NOT_PRESENT;
}

if (identifier != TXGBE_SFF_IDENTIFIER_SFP) {
    hw->phy.type = txgbe_phy_sfp_unsupported;
    hw->mac.release_swfw_sync(hw, TXGBE_MNGSEM_SWPHY);  // OK - releases
    return TXGBE_ERR_SFP_NOT_SUPPORTED;
}

err = hw->phy.read_i2c_eeprom(hw, ...);
/* Multiple read_i2c_eeprom calls follow */
```

**After line 894**, there are at least **4 read_i2c_eeprom() calls** (lines ~895-930 in the context) that can fail with `goto out;` - but `goto out;` is removed in this patch. If any of these reads fail now, the function returns immediately **without releasing the lock acquired at line 836**.

Example at line 1000:
```c
err = hw->phy.read_i2c_eeprom(hw, TXGBE_SFF_CABLE_SPEC_COMP, &cable_spec);
if (err != 0)
    goto out;  /* This label is REMOVED - now falls through or returns */
```

The old code had `out:` label that might have handled cleanup. The new code removes this pattern but doesn't add `hw->mac.release_swfw_sync()` calls before each early return.

**Impact:** Lock held indefinitely on I2C read failure, blocking other threads.

**Fix:** Add lock release before every early return after line 836:
```c
err = hw->phy.read_i2c_eeprom(hw, TXGBE_SFF_CABLE_SPEC_COMP, &cable_spec);
if (err != 0) {
    hw->mac.release_swfw_sync(hw, TXGBE_MNGSEM_SWPHY);
    goto out;
}
```

---

### 2. `txgbe_write_i2c_byte()` removes STOP condition - probable hardware protocol violation

**Line 1299 (old) vs 1299 (new):**

Old code:
```c
wr32(hw, TXGBE_I2CDATA, byte_offset | TXGBE_I2CDATA_STOP);
```

New code:
```c
wr32(hw, TXGBE_I2CDATA, byte_offset);
```

The `TXGBE_I2CDATA_STOP` flag is removed from the byte_offset write but remains only on the data write. This changes the I2C transaction sequence:

- Old: Write byte_offset with STOP, write data with STOP
- New: Write byte_offset (no STOP), write data with WRITE+STOP (via `txgbe_i2c_stop()`)

**Analysis:** I2C writes to EEPROMs typically follow: START → device_addr+W → ACK → byte_offset → ACK → data → ACK → STOP. Removing STOP from the byte_offset phase may be correct if the hardware requires a combined write transaction, **but this is a functional change to I2C timing that is not explained in the commit message** and could break module writes.

**Impact:** Could cause I2C write failures or data corruption on SFP/QSFP EEPROMs.

**Fix:** Either:
1. Verify via hardware datasheet that single STOP at end is correct, document in commit message
2. Restore `TXGBE_I2CDATA_STOP` on byte_offset write

---

### 3. Removed validation logic without replacement - SFP vendor checking deleted

**Lines 1025-1070 (old code, removed):** The old `txgbe_identify_qsfp_module()` read vendor OUI bytes, checked for Intel vendor, and enforced "allow_any_sfp" policy. This entire block is deleted with no equivalent in the new code.

**Impact:**
- Unsupported vendor modules are now silently accepted (security/quality risk)
- `hw->allow_unsupported_sfp` flag is no longer checked (user configuration ignored)
- Warning messages for untested modules removed (users not informed of risks)

**This is a functional regression unless the validation moved elsewhere** (not evident in this patch).

**Fix:** Restore vendor OUI check or document why it's safe to remove.

---

## Warnings

### 1. `txgbe_identify_qsfp_module()` GPIO config runs unconditionally

**Lines 1098-1099:**
```c
/* config GPIO before read i2c */
wr32(hw, TXGBE_GPIODATA, TXGBE_GPIOBIT_1);
```

This GPIO write now runs **even when media_type is not fiber_qsfp** (the media check is after the GPIO write at lines 1108-1111). The old code had this GPIO write inside the `if (hw->mac.type == txgbe_mac_aml40)` block, after the media check.

**Impact:** Unnecessary hardware access, potential side effects on non-QSFP ports.

**Fix:** Move GPIO config inside the `if (hw->mac.type == txgbe_mac_aml40)` block or add media_type check first.

---

### 2. Missing error propagation - `txgbe_read_i2c_sff8636()` page select failure ignored

**Line 1215:**
```c
hw->phy.write_i2c_byte(hw, TXGBE_SFF_QSFP_PAGE_SELECT,
                TXGBE_I2C_EEPROM_DEV_ADDR,
                page);
```

The return value from `write_i2c_byte()` is discarded. If page select fails, the subsequent read at line 1218 will read from the **wrong page**, producing incorrect module data.

**Fix:**
```c
s32 err = hw->phy.write_i2c_byte(hw, TXGBE_SFF_QSFP_PAGE_SELECT,
                                 TXGBE_I2C_EEPROM_DEV_ADDR, page);
if (err != 0)
    return err;
```

---

### 3. Implicit comparison on pointer

**Line 1123 (in `txgbe_identify_qsfp_module`):**
```c
if (err)
    return -EBUSY;
```

Should be explicit:
```c
if (err != 0)
    return -EBUSY;
```

---

### 4. Release notes likely required

This patch changes module detection logic and affects supported module types. API behavior changes (vendor validation removed, new module type support) typically require release notes under "Fixed Issues" or "Known Issues" if compatibility changes exist.

---

### 5. Function pointer assignments removed but functions still defined

**Lines 2912-2913 (txgbe_hw.c):**
```c
phy->read_i2c_byte_unlocked = txgbe_read_i2c_byte_unlocked;
phy->write_i2c_byte_unlocked = txgbe_write_i2c_byte_unlocked;
```

These assignments are removed, but the commit message doesn't indicate whether the `*_unlocked` functions are deleted or just unused. If the functions remain defined but unreferenced, they are dead code. If other drivers reference `phy->read_i2c_byte_unlocked` as a function pointer, this breaks them.

---

## Info

### 1. Simplified I2C locking approach is clearer

The removal of separate `*_unlocked` variants and the move to caller-managed locking (acquire once, multiple reads, release) is a good pattern that reduces lock churn.

---

### 2. New SFP type handling for 25G/40G modules

The patch adds proper detection for:
- 25G AOC modules (`txgbe_sfp_type_25g_aoc_core0/1`)
- 40G QSFP types with Ethernet compliance byte checks

This appears to address the "ambiguous classification" mentioned in the commit message.

---

### 3. Consistent use of `read_i2c_sff8636()` helper

The new `txgbe_read_i2c_sff8636()` helper centralizes page-select logic for QSFP modules, which is good for maintainability (despite the error handling issue noted above).

---

## Final Check Summary

**Errors (must fix):**
1. Lock leaks in `txgbe_identify_sfp_module()` on I2C read errors after line 836
2. `TXGBE_I2CDATA_STOP` removal in `txgbe_write_i2c_byte()` changes I2C protocol, not explained/verified
3. Vendor OUI validation removed without replacement - security/quality regression

**Warnings (should fix):**
1. GPIO config runs before media_type check
2. Page select error ignored in `txgbe_read_i2c_sff8636()`
3. Implicit comparison `if (err)`
4. Release notes likely needed for behavior changes
5. Unused function pointer assignments removed - check for dead code


  reply	other threads:[~2026-05-17 23:56 UTC|newest]

Thread overview: 105+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-23  3:40 [PATCH 00/18] Wangxun Fixes Zaiyu Wang
2026-04-23  3:40 ` [PATCH 01/18] net/txgbe: remove duplicate xstats counters Zaiyu Wang
2026-04-23  3:40 ` [PATCH 02/18] net/ngbe: " Zaiyu Wang
2026-04-23  3:40 ` [PATCH 03/18] net/ngbe: add missing CDR config for YT PHY Zaiyu Wang
2026-04-23  3:40 ` [PATCH 04/18] net/ngbe: fix VF promiscuous and allmulticast Zaiyu Wang
2026-04-23  3:40 ` [PATCH 05/18] net/txgbe: fix inaccuracy in TX rate limiting Zaiyu Wang
2026-04-23  3:40 ` [PATCH 06/18] net/txgbe: fix link status check condition Zaiyu Wang
2026-04-23  3:40 ` [PATCH 07/18] net/txgbe: fix Tx desc free logic Zaiyu Wang
2026-04-23  3:40 ` [PATCH 08/18] net/txgbe: fix link flow control registers for Amber-Lite Zaiyu Wang
2026-04-23  7:54   ` Jiawen Wu
2026-04-23  3:40 ` [PATCH 09/18] net/txgbe: fix link flow control config for Sapphire Zaiyu Wang
2026-04-23  3:40 ` [PATCH 10/18] net/txgbe: fix a mass of unknown interrupts Zaiyu Wang
2026-04-23  3:40 ` [PATCH 11/18] net/txgbe: fix traffic class priority configuration Zaiyu Wang
2026-04-23  3:40 ` [PATCH 12/18] net/txgbe: fix link stability for 25G NIC Zaiyu Wang
2026-04-23  8:22   ` Jiawen Wu
2026-04-23  3:40 ` [PATCH 13/18] net/txgbe: fix link stability for 40G NIC Zaiyu Wang
2026-04-23  3:40 ` [PATCH 14/18] net/txgbe: fix link stability for Amber-Lite backplane mode Zaiyu Wang
2026-04-23  3:40 ` [PATCH 15/18] net/txgbe: fix FEC mode configuration on 25G NIC Zaiyu Wang
2026-04-23  3:40 ` [PATCH 16/18] net/txgbe: fix SFP module identification Zaiyu Wang
2026-04-23  3:40 ` [PATCH 17/18] net/txgbe: fix get module info operation Zaiyu Wang
2026-04-23  3:40 ` [PATCH 18/18] net/txgbe: fix get eeprom operation Zaiyu Wang
2026-04-24 21:59   ` Stephen Hemminger
2026-04-29 10:24 ` [PATCH v2 00/20] Wangxun Fixes Zaiyu Wang
2026-04-29 10:24   ` [PATCH v2 01/20] net/txgbe: remove duplicate xstats counters Zaiyu Wang
2026-04-29 10:24   ` [PATCH v2 02/20] net/ngbe: " Zaiyu Wang
2026-04-29 10:24   ` [PATCH v2 03/20] net/ngbe: add missing CDR config for YT PHY Zaiyu Wang
2026-04-29 10:24   ` [PATCH v2 04/20] net/ngbe: fix VF promiscuous and allmulticast Zaiyu Wang
2026-04-29 10:24   ` [PATCH v2 05/20] net/txgbe: fix inaccuracy in TX rate limiting Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 06/20] net/txgbe: fix link status check condition Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 07/20] net/txgbe: fix Tx desc free logic Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 08/20] net/txgbe: fix link flow control registers for Amber-Lite Zaiyu Wang
2026-04-29 15:10     ` Stephen Hemminger
2026-04-29 10:25   ` [PATCH v2 09/20] net/txgbe: fix link flow control config for Sapphire Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 10/20] net/txgbe: fix a mass of unknown interrupts Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 11/20] net/txgbe: fix traffic class priority configuration Zaiyu Wang
2026-04-29 15:11     ` Stephen Hemminger
2026-05-09 11:06       ` Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 12/20] net/txgbe: fix link stability for 25G NIC Zaiyu Wang
2026-04-29 15:12     ` Stephen Hemminger
2026-04-29 10:25   ` [PATCH v2 13/20] net/txgbe: fix link stability for 40G NIC Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 14/20] net/txgbe: fix link stability for Amber-Lite backplane mode Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 15/20] net/txgbe: fix FEC mode configuration on 25G NIC Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 16/20] net/txgbe: fix SFP module identification Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 17/20] net/txgbe: fix get module info operation Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 18/20] net/txgbe: fix get eeprom operation Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 19/20] net/txgbe: fix to reset Tx write-back pointer Zaiyu Wang
2026-04-29 10:25   ` [PATCH v2 20/20] net/txgbe: fix to enable Tx desc check Zaiyu Wang
2026-05-09 11:28 ` [PATCH v3 00/20] Wangxun Fixes Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 01/20] net/txgbe: remove duplicate xstats counters Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 02/20] net/ngbe: " Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 03/20] net/ngbe: add missing CDR config for YT PHY Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 04/20] net/ngbe: fix VF promiscuous and allmulticast Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 05/20] net/txgbe: fix inaccuracy in TX rate limiting Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 06/20] net/txgbe: fix link status check condition Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 07/20] net/txgbe: fix Tx desc free logic Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 08/20] net/txgbe: fix link flow control registers for Amber-Lite Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 09/20] net/txgbe: fix link flow control config for Sapphire Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 10/20] net/txgbe: fix a mass of unknown interrupts Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 11/20] net/txgbe: fix traffic class priority configuration Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 12/20] net/txgbe: fix link stability for 25G NIC Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 13/20] net/txgbe: fix link stability for 40G NIC Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 14/20] net/txgbe: fix link stability for Amber-Lite backplane mode Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 15/20] net/txgbe: fix FEC mode configuration on 25G NIC Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 16/20] net/txgbe: fix SFP module identification Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 17/20] net/txgbe: fix get module info operation Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 18/20] net/txgbe: fix get eeprom operation Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 19/20] net/txgbe: fix to reset Tx write-back pointer Zaiyu Wang
2026-05-09 11:28   ` [PATCH v3 20/20] net/txgbe: fix to enable Tx desc check Zaiyu Wang
2026-05-09 15:44   ` [PATCH v3 00/20] Wangxun Fixes Stephen Hemminger
2026-05-09 17:07   ` Stephen Hemminger
2026-05-11 10:28     ` Zaiyu Wang
2026-05-11 10:35 ` [PATCH v4 " Zaiyu Wang
2026-05-11 10:35   ` [PATCH v4 01/20] net/txgbe: remove duplicate xstats counters Zaiyu Wang
2026-05-11 10:35   ` [PATCH v4 02/20] net/ngbe: " Zaiyu Wang
2026-05-11 10:35   ` [PATCH v4 03/20] net/ngbe: add missing CDR config for YT PHY Zaiyu Wang
2026-05-17 23:37     ` Stephen Hemminger
2026-05-11 10:35   ` [PATCH v4 04/20] net/ngbe: fix VF promiscuous and allmulticast Zaiyu Wang
2026-05-17 23:39     ` Stephen Hemminger
2026-05-11 10:35   ` [PATCH v4 05/20] net/txgbe: fix inaccuracy in Tx rate limiting Zaiyu Wang
2026-05-17 23:40     ` Stephen Hemminger
2026-05-11 10:35   ` [PATCH v4 06/20] net/txgbe: fix link status check condition Zaiyu Wang
2026-05-11 10:35   ` [PATCH v4 07/20] net/txgbe: fix Tx desc free logic Zaiyu Wang
2026-05-17 23:44     ` Stephen Hemminger
2026-05-11 10:35   ` [PATCH v4 08/20] net/txgbe: fix link flow control registers for Amber-Lite Zaiyu Wang
2026-05-11 10:35   ` [PATCH v4 09/20] net/txgbe: fix link flow control config for Sapphire Zaiyu Wang
2026-05-17 23:46     ` Stephen Hemminger
2026-05-11 10:35   ` [PATCH v4 10/20] net/txgbe: fix a mass of unknown interrupts Zaiyu Wang
2026-05-11 10:35   ` [PATCH v4 11/20] net/txgbe: fix traffic class priority configuration Zaiyu Wang
2026-05-11 10:35   ` [PATCH v4 12/20] net/txgbe: fix link stability for 25G NIC Zaiyu Wang
2026-05-17 23:49     ` Stephen Hemminger
2026-05-11 10:35   ` [PATCH v4 13/20] net/txgbe: fix link stability for 40G NIC Zaiyu Wang
2026-05-11 10:35   ` [PATCH v4 14/20] net/txgbe: fix link stability for Amber-Lite backplane mode Zaiyu Wang
2026-05-17 23:50     ` Stephen Hemminger
2026-05-11 10:35   ` [PATCH v4 15/20] net/txgbe: fix FEC mode configuration on 25G NIC Zaiyu Wang
2026-05-11 10:35   ` [PATCH v4 16/20] net/txgbe: fix SFP module identification Zaiyu Wang
2026-05-17 23:52     ` Stephen Hemminger [this message]
2026-05-11 10:35   ` [PATCH v4 17/20] net/txgbe: fix get module info operation Zaiyu Wang
2026-05-17 23:53     ` Stephen Hemminger
2026-05-11 10:36   ` [PATCH v4 18/20] net/txgbe: fix get EEPROM operation Zaiyu Wang
2026-05-17 23:54     ` Stephen Hemminger
2026-05-11 10:36   ` [PATCH v4 19/20] net/txgbe: fix to reset Tx write-back pointer Zaiyu Wang
2026-05-11 10:36   ` [PATCH v4 20/20] net/txgbe: fix to enable Tx desc check Zaiyu Wang
2026-05-17 23:55     ` Stephen Hemminger
2026-05-18 14:54   ` [PATCH v4 00/20] Wangxun Fixes Stephen Hemminger
2026-05-19  6:56     ` Zaiyu Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260517165225.6ba8b774@phoenix.local \
    --to=stephen@networkplumber.org \
    --cc=dev@dpdk.org \
    --cc=jiawenwu@trustnetic.com \
    --cc=stable@dpdk.org \
    --cc=zaiyuwang@trustnetic.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox