Netdev List


* [PATCH iwl-net v2 2/4] iavf: stop removing VLAN filters from PF on interface down
From: Petr Oros @ 2026-04-17 14:29 UTC
  To: netdev
  Cc: Petr Oros, Aleksandr Loktionov, Rafal Romanowski, Tony Nguyen,
	Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Jesse Brandeburg, Mitch Williams,
	Aaron Brown, Przemyslaw Patynowski, Jedrzej Jagielski,
	intel-wired-lan, linux-kernel, jacob.e.keller
In-Reply-To: <cover.1776426683.git.poros@redhat.com>

When a VF goes down, the driver currently sends DEL_VLAN to the PF for
every VLAN filter (ACTIVE -> DISABLE -> send DEL -> INACTIVE), then
re-adds them all on UP (INACTIVE -> ADD -> send ADD -> ADDING ->
ACTIVE). This round-trip is unnecessary because:

 1. The PF disables the VF's queues via VIRTCHNL_OP_DISABLE_QUEUES,
    which already prevents all RX/TX traffic regardless of VLAN filter
    state.

 2. Leaving the VLAN filters in PF HW while the VF is down is
    harmless: packets matching those filters have nowhere to go
    while the queues are disabled.

 3. The DEL+ADD cycle during down/up creates race windows where the
    VLAN filter list is incomplete. With spoofcheck enabled, the PF
    enables TX VLAN filtering on the first non-zero VLAN add, blocking
    traffic for any VLANs not yet re-added.
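The race in point 3 can be modelled in a few lines. This is a hypothetical user-space sketch, not PF driver code: `struct pf_model`, `pf_add_vlan()` and `pf_tx_allowed()` are invented names, and the model assumes the PF holds no VLANs, with TX VLAN filtering off, once the old DEL half of the down/up cycle has completed.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VLANS 8

/* Hypothetical, simplified model of the PF-side VLAN state for one VF.
 * Assumption: after the DEL half of the old down/up cycle the PF holds
 * no VLANs and TX VLAN filtering is off.
 */
struct pf_model {
	bool tx_vlan_filtering;	/* turned on by the first non-zero VLAN add */
	unsigned short vids[MAX_VLANS];
	int nvids;
};

/* Re-add one VLAN, as the UP half of the old cycle did, one at a time. */
static void pf_add_vlan(struct pf_model *pf, unsigned short vid)
{
	assert(pf->nvids < MAX_VLANS);
	pf->vids[pf->nvids++] = vid;
	if (vid != 0)
		pf->tx_vlan_filtering = true;	/* spoofcheck behaviour */
}

/* With filtering on, TX passes only for VLANs already in the table. */
static bool pf_tx_allowed(const struct pf_model *pf, unsigned short vid)
{
	if (!pf->tx_vlan_filtering)
		return true;
	for (int i = 0; i < pf->nvids; i++)
		if (pf->vids[i] == vid)
			return true;
	return false;
}
```

With VLANs 10 and 20 configured, the window is the time between `pf_add_vlan(&pf, 10)` and `pf_add_vlan(&pf, 20)`: `pf_tx_allowed(&pf, 20)` returns false there. Keeping the filters ACTIVE across down/up means the table is never rebuilt, so this window never opens.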

Remove the entire DISABLE/INACTIVE state machinery:
 - Remove IAVF_VLAN_DISABLE and IAVF_VLAN_INACTIVE enum values
 - Remove iavf_restore_filters() and its call from iavf_open()
 - Remove VLAN filter handling from iavf_clear_mac_vlan_filters(),
   rename it to iavf_clear_mac_filters()
 - Remove DEL_VLAN_FILTER scheduling from iavf_down()
 - Remove all DISABLE/INACTIVE handling from iavf_del_vlans()

VLAN filters now stay ACTIVE across down/up cycles. Only explicit
user removal (ndo_vlan_rx_kill_vid) or PF/VF reset triggers VLAN
filter deletion/re-addition.

Fixes: ed1f5b58ea01 ("i40evf: remove VLAN filters on close")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf.h        |  6 +--
 drivers/net/ethernet/intel/iavf/iavf_main.c   | 39 ++-----------------
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 33 +++-------------
 3 files changed, 12 insertions(+), 66 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index 47a862ca5e2c3f..5765715914d6b2 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -159,10 +159,8 @@ enum iavf_vlan_state_t {
 	IAVF_VLAN_INVALID,
 	IAVF_VLAN_ADD,		/* filter needs to be added */
 	IAVF_VLAN_ADDING,	/* ADD sent to PF, waiting for response */
-	IAVF_VLAN_ACTIVE,	/* filter is accepted by PF */
-	IAVF_VLAN_DISABLE,	/* filter needs to be deleted by PF, then marked INACTIVE */
-	IAVF_VLAN_INACTIVE,	/* filter is inactive, we are in IFF_DOWN */
-	IAVF_VLAN_REMOVE,	/* filter needs to be removed from list */
+	IAVF_VLAN_ACTIVE,	/* PF confirmed, filter is in HW */
+	IAVF_VLAN_REMOVE,	/* filter queued for DEL from PF */
 };
 
 struct iavf_vlan_filter {
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index dad001abc9086b..12e102506011a6 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -801,27 +801,6 @@ static void iavf_del_vlan(struct iavf_adapter *adapter, struct iavf_vlan vlan)
 	spin_unlock_bh(&adapter->mac_vlan_list_lock);
 }
 
-/**
- * iavf_restore_filters
- * @adapter: board private structure
- *
- * Restore existing non MAC filters when VF netdev comes back up
- **/
-static void iavf_restore_filters(struct iavf_adapter *adapter)
-{
-	struct iavf_vlan_filter *f;
-
-	/* re-add all VLAN filters */
-	spin_lock_bh(&adapter->mac_vlan_list_lock);
-
-	list_for_each_entry(f, &adapter->vlan_filter_list, list) {
-		if (f->state == IAVF_VLAN_INACTIVE)
-			f->state = IAVF_VLAN_ADD;
-	}
-
-	spin_unlock_bh(&adapter->mac_vlan_list_lock);
-	adapter->aq_required |= IAVF_FLAG_AQ_ADD_VLAN_FILTER;
-}
 
 /**
  * iavf_get_num_vlans_added - get number of VLANs added
@@ -1240,13 +1219,12 @@ static void iavf_up_complete(struct iavf_adapter *adapter)
 }
 
 /**
- * iavf_clear_mac_vlan_filters - Remove mac and vlan filters not sent to PF
- * yet and mark other to be removed.
+ * iavf_clear_mac_filters - Remove MAC filters not sent to PF yet and mark
+ * others to be removed.
  * @adapter: board private structure
  **/
-static void iavf_clear_mac_vlan_filters(struct iavf_adapter *adapter)
+static void iavf_clear_mac_filters(struct iavf_adapter *adapter)
 {
-	struct iavf_vlan_filter *vlf, *vlftmp;
 	struct iavf_mac_filter *f, *ftmp;
 
 	spin_lock_bh(&adapter->mac_vlan_list_lock);
@@ -1265,11 +1243,6 @@ static void iavf_clear_mac_vlan_filters(struct iavf_adapter *adapter)
 		}
 	}
 
-	/* disable all VLAN filters */
-	list_for_each_entry_safe(vlf, vlftmp, &adapter->vlan_filter_list,
-				 list)
-		vlf->state = IAVF_VLAN_DISABLE;
-
 	spin_unlock_bh(&adapter->mac_vlan_list_lock);
 }
 
@@ -1365,7 +1338,7 @@ void iavf_down(struct iavf_adapter *adapter)
 	iavf_napi_disable_all(adapter);
 	iavf_irq_disable(adapter);
 
-	iavf_clear_mac_vlan_filters(adapter);
+	iavf_clear_mac_filters(adapter);
 	iavf_clear_cloud_filters(adapter);
 	iavf_clear_fdir_filters(adapter);
 	iavf_clear_adv_rss_conf(adapter);
@@ -1382,8 +1355,6 @@ void iavf_down(struct iavf_adapter *adapter)
 		 */
 		if (!list_empty(&adapter->mac_filter_list))
 			adapter->aq_required |= IAVF_FLAG_AQ_DEL_MAC_FILTER;
-		if (!list_empty(&adapter->vlan_filter_list))
-			adapter->aq_required |= IAVF_FLAG_AQ_DEL_VLAN_FILTER;
 		if (!list_empty(&adapter->cloud_filter_list))
 			adapter->aq_required |= IAVF_FLAG_AQ_DEL_CLOUD_FILTER;
 		if (!list_empty(&adapter->fdir_list_head))
@@ -4488,8 +4459,6 @@ static int iavf_open(struct net_device *netdev)
 	iavf_add_filter(adapter, adapter->hw.mac.addr);
 	spin_unlock_bh(&adapter->mac_vlan_list_lock);
 
-	/* Restore filters that were removed with IFF_DOWN */
-	iavf_restore_filters(adapter);
 	iavf_restore_fdir_filters(adapter);
 
 	iavf_configure(adapter);
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index 6b06ae872a0cdf..4f197d908124e6 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -911,22 +911,12 @@ void iavf_del_vlans(struct iavf_adapter *adapter)
 	spin_lock_bh(&adapter->mac_vlan_list_lock);
 
 	list_for_each_entry_safe(f, ftmp, &adapter->vlan_filter_list, list) {
-		/* since VLAN capabilities are not allowed, we dont want to send
-		 * a VLAN delete request because it will most likely fail and
-		 * create unnecessary errors/noise, so just free the VLAN
-		 * filters marked for removal to enable bailing out before
-		 * sending a virtchnl message
-		 */
 		if (f->state == IAVF_VLAN_REMOVE &&
 		    !VLAN_FILTERING_ALLOWED(adapter)) {
 			list_del(&f->list);
 			kfree(f);
 			adapter->num_vlan_filters--;
-		} else if (f->state == IAVF_VLAN_DISABLE &&
-		    !VLAN_FILTERING_ALLOWED(adapter)) {
-			f->state = IAVF_VLAN_INACTIVE;
-		} else if (f->state == IAVF_VLAN_REMOVE ||
-			   f->state == IAVF_VLAN_DISABLE) {
+		} else if (f->state == IAVF_VLAN_REMOVE) {
 			count++;
 		}
 	}
@@ -959,13 +949,7 @@ void iavf_del_vlans(struct iavf_adapter *adapter)
 		vvfl->vsi_id = adapter->vsi_res->vsi_id;
 		vvfl->num_elements = count;
 		list_for_each_entry_safe(f, ftmp, &adapter->vlan_filter_list, list) {
-			if (f->state == IAVF_VLAN_DISABLE) {
-				vvfl->vlan_id[i] = f->vlan.vid;
-				f->state = IAVF_VLAN_INACTIVE;
-				i++;
-				if (i == count)
-					break;
-			} else if (f->state == IAVF_VLAN_REMOVE) {
+			if (f->state == IAVF_VLAN_REMOVE) {
 				vvfl->vlan_id[i] = f->vlan.vid;
 				list_del(&f->list);
 				kfree(f);
@@ -1007,8 +991,7 @@ void iavf_del_vlans(struct iavf_adapter *adapter)
 		vvfl_v2->vport_id = adapter->vsi_res->vsi_id;
 		vvfl_v2->num_elements = count;
 		list_for_each_entry_safe(f, ftmp, &adapter->vlan_filter_list, list) {
-			if (f->state == IAVF_VLAN_DISABLE ||
-			    f->state == IAVF_VLAN_REMOVE) {
+			if (f->state == IAVF_VLAN_REMOVE) {
 				struct virtchnl_vlan_supported_caps *filtering_support =
 					&adapter->vlan_v2_caps.filtering.filtering_support;
 				struct virtchnl_vlan *vlan;
@@ -1022,13 +1005,9 @@ void iavf_del_vlans(struct iavf_adapter *adapter)
 				vlan->tci = f->vlan.vid;
 				vlan->tpid = f->vlan.tpid;
 
-				if (f->state == IAVF_VLAN_DISABLE) {
-					f->state = IAVF_VLAN_INACTIVE;
-				} else {
-					list_del(&f->list);
-					kfree(f);
-					adapter->num_vlan_filters--;
-				}
+				list_del(&f->list);
+				kfree(f);
+				adapter->num_vlan_filters--;
 				i++;
 				if (i == count)
 					break;
-- 
2.52.0



* [PATCH iwl-net v2 3/4] iavf: wait for PF confirmation before removing VLAN filters
From: Petr Oros @ 2026-04-17 14:29 UTC
  To: netdev
  Cc: Petr Oros, Aleksandr Loktionov, Rafal Romanowski, Tony Nguyen,
	Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Jesse Brandeburg, Mitch Williams,
	Aaron Brown, Przemyslaw Patynowski, Jedrzej Jagielski,
	intel-wired-lan, linux-kernel, jacob.e.keller
In-Reply-To: <cover.1776426683.git.poros@redhat.com>

The VLAN filter DELETE path was asymmetric with the ADD path: ADD
waits for PF confirmation (ADD -> ADDING -> ACTIVE), but DELETE
immediately frees the filter struct after sending the DEL message
without waiting for the PF response.

This is problematic because:
 - If the PF rejects the DEL, the filter remains in HW but the driver
   has already freed the tracking structure, losing sync.
 - Race conditions between DEL pending and other operations
   (add, reset) cannot be properly resolved if the filter struct
   is already gone.

Add IAVF_VLAN_REMOVING state to make the DELETE path symmetric:

  REMOVE -> REMOVING (send DEL) -> PF confirms -> kfree
                                -> PF rejects  -> ACTIVE

In iavf_del_vlans(), transition filters from REMOVE to REMOVING
instead of immediately freeing them. The new DEL completion handler
in iavf_virtchnl_completion() frees filters on success or reverts
them to ACTIVE on error.
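A user-space sketch of the symmetric DELETE path described above. The enum mirrors the state names from this patch, while `struct vlan_filter`, `del_vlans_send()` and `del_vlans_complete()` are hypothetical stand-ins for `iavf_del_vlans()` and the new completion case. An in-place array compaction replaces `list_del()` + `kfree()`, and message batching (the `i == count` cap) is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors iavf_vlan_state_t after this patch (names shortened). */
enum vlan_state {
	VLAN_INVALID,
	VLAN_ADD,	/* needs to be sent to the PF */
	VLAN_ADDING,	/* ADD sent, waiting for the PF */
	VLAN_ACTIVE,	/* PF confirmed, filter is in HW */
	VLAN_REMOVE,	/* queued for DEL */
	VLAN_REMOVING,	/* DEL sent, waiting for the PF */
};

struct vlan_filter {
	unsigned short vid;
	enum vlan_state state;
};

/* Send side: move queued filters to REMOVING instead of freeing them.
 * Returns how many VIDs would be packed into the DEL message.
 */
static int del_vlans_send(struct vlan_filter *f, size_t n)
{
	int count = 0;

	assert(f != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (f[i].state == VLAN_REMOVE) {
			f[i].state = VLAN_REMOVING;
			count++;
		}
	}
	return count;
}

/* Completion side: on success drop REMOVING filters (compacting the
 * array in place); on error put them back to ACTIVE because the PF kept
 * them. Returns the new number of filters.
 */
static size_t del_vlans_complete(struct vlan_filter *f, size_t n,
				 int v_retval)
{
	size_t out = 0;

	assert(f != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (f[i].state == VLAN_REMOVING) {
			if (!v_retval)
				continue;		/* PF confirmed: drop it */
			f[i].state = VLAN_ACTIVE;	/* PF rejected: keep it */
		}
		f[out++] = f[i];
	}
	return out;
}
```

Calling `del_vlans_complete()` with a non-zero `v_retval` puts every in-flight filter back in ACTIVE, so the driver's list still matches what the PF holds after a rejected DEL.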

Update iavf_add_vlan() to handle the REMOVING state: if a DEL is
pending and the user re-adds the same VLAN, queue it for ADD so
it gets re-programmed after the PF processes the DEL.

The !VLAN_FILTERING_ALLOWED early-exit path still frees filters
directly since no PF message is sent in that case.

Also update iavf_del_vlan() to skip filters already in REMOVING
state: DEL has been sent to PF and the completion handler will
free the filter when PF confirms. Without this guard, the sequence
DEL(pending) -> user-del -> second DEL could cause the PF to return
an error for the second DEL (filter already gone), causing the
completion handler to incorrectly revert a deleted filter back to
ACTIVE.

Fixes: 968996c070ef ("iavf: Fix VLAN_V2 addition/rejection")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf.h        |  1 +
 drivers/net/ethernet/intel/iavf/iavf_main.c   | 13 ++++---
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 37 +++++++++++++------
 3 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index 5765715914d6b2..050f8241ef5e6b 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -161,6 +161,7 @@ enum iavf_vlan_state_t {
 	IAVF_VLAN_ADDING,	/* ADD sent to PF, waiting for response */
 	IAVF_VLAN_ACTIVE,	/* PF confirmed, filter is in HW */
 	IAVF_VLAN_REMOVE,	/* filter queued for DEL from PF */
+	IAVF_VLAN_REMOVING,	/* DEL sent to PF, waiting for response */
 };
 
 struct iavf_vlan_filter {
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 12e102506011a6..d373feee4c7e9c 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -757,10 +757,10 @@ iavf_vlan_filter *iavf_add_vlan(struct iavf_adapter *adapter,
 		adapter->num_vlan_filters++;
 		iavf_schedule_aq_request(adapter, IAVF_FLAG_AQ_ADD_VLAN_FILTER);
 	} else if (f->state == IAVF_VLAN_REMOVE) {
-		/* Re-add the filter since we cannot tell whether the
-		 * pending delete has already been processed by the PF.
-		 * A duplicate add is harmless.
-		 */
+		/* DEL not yet sent to PF, cancel it */
+		f->state = IAVF_VLAN_ACTIVE;
+	} else if (f->state == IAVF_VLAN_REMOVING) {
+		/* DEL already sent to PF, re-add after completion */
 		f->state = IAVF_VLAN_ADD;
 		iavf_schedule_aq_request(adapter,
 					 IAVF_FLAG_AQ_ADD_VLAN_FILTER);
@@ -791,11 +791,14 @@ static void iavf_del_vlan(struct iavf_adapter *adapter, struct iavf_vlan vlan)
 			list_del(&f->list);
 			kfree(f);
 			adapter->num_vlan_filters--;
-		} else {
+		} else if (f->state != IAVF_VLAN_REMOVING) {
 			f->state = IAVF_VLAN_REMOVE;
 			iavf_schedule_aq_request(adapter,
 						 IAVF_FLAG_AQ_DEL_VLAN_FILTER);
 		}
+		/* If REMOVING, DEL is already sent to PF; completion
+		 * handler will free the filter when PF confirms.
+		 */
 	}
 
 	spin_unlock_bh(&adapter->mac_vlan_list_lock);
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index 4f197d908124e6..93ca79c3e3b535 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -948,12 +948,10 @@ void iavf_del_vlans(struct iavf_adapter *adapter)
 
 		vvfl->vsi_id = adapter->vsi_res->vsi_id;
 		vvfl->num_elements = count;
-		list_for_each_entry_safe(f, ftmp, &adapter->vlan_filter_list, list) {
+		list_for_each_entry(f, &adapter->vlan_filter_list, list) {
 			if (f->state == IAVF_VLAN_REMOVE) {
 				vvfl->vlan_id[i] = f->vlan.vid;
-				list_del(&f->list);
-				kfree(f);
-				adapter->num_vlan_filters--;
+				f->state = IAVF_VLAN_REMOVING;
 				i++;
 				if (i == count)
 					break;
@@ -990,7 +988,7 @@ void iavf_del_vlans(struct iavf_adapter *adapter)
 
 		vvfl_v2->vport_id = adapter->vsi_res->vsi_id;
 		vvfl_v2->num_elements = count;
-		list_for_each_entry_safe(f, ftmp, &adapter->vlan_filter_list, list) {
+		list_for_each_entry(f, &adapter->vlan_filter_list, list) {
 			if (f->state == IAVF_VLAN_REMOVE) {
 				struct virtchnl_vlan_supported_caps *filtering_support =
 					&adapter->vlan_v2_caps.filtering.filtering_support;
@@ -1005,9 +1003,7 @@ void iavf_del_vlans(struct iavf_adapter *adapter)
 				vlan->tci = f->vlan.vid;
 				vlan->tpid = f->vlan.tpid;
 
-				list_del(&f->list);
-				kfree(f);
-				adapter->num_vlan_filters--;
+				f->state = IAVF_VLAN_REMOVING;
 				i++;
 				if (i == count)
 					break;
@@ -2370,10 +2366,6 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 			ether_addr_copy(adapter->hw.mac.addr, netdev->dev_addr);
 			wake_up(&adapter->vc_waitqueue);
 			break;
-		case VIRTCHNL_OP_DEL_VLAN:
-			dev_err(&adapter->pdev->dev, "Failed to delete VLAN filter, error %s\n",
-				iavf_stat_str(&adapter->hw, v_retval));
-			break;
 		case VIRTCHNL_OP_DEL_ETH_ADDR:
 			dev_err(&adapter->pdev->dev, "Failed to delete MAC filter, error %s\n",
 				iavf_stat_str(&adapter->hw, v_retval));
@@ -2895,6 +2887,27 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 		spin_unlock_bh(&adapter->mac_vlan_list_lock);
 		}
 		break;
+	case VIRTCHNL_OP_DEL_VLAN:
+	case VIRTCHNL_OP_DEL_VLAN_V2: {
+		struct iavf_vlan_filter *f, *ftmp;
+
+		spin_lock_bh(&adapter->mac_vlan_list_lock);
+		list_for_each_entry_safe(f, ftmp, &adapter->vlan_filter_list,
+					 list) {
+			if (f->state == IAVF_VLAN_REMOVING) {
+				if (v_retval) {
+					/* PF rejected DEL, keep filter */
+					f->state = IAVF_VLAN_ACTIVE;
+				} else {
+					list_del(&f->list);
+					kfree(f);
+					adapter->num_vlan_filters--;
+				}
+			}
+		}
+		spin_unlock_bh(&adapter->mac_vlan_list_lock);
+		}
+		break;
 	case VIRTCHNL_OP_ENABLE_VLAN_STRIPPING:
 		/* PF enabled vlan strip on this VF.
 		 * Update netdev->features if needed to be in sync with ethtool.
-- 
2.52.0



* [PATCH iwl-net v2 4/4] iavf: add VIRTCHNL_OP_ADD_VLAN to success completion handler
From: Petr Oros @ 2026-04-17 14:29 UTC
  To: netdev
  Cc: Petr Oros, Aleksandr Loktionov, Rafal Romanowski, Tony Nguyen,
	Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Jesse Brandeburg, Mitch Williams,
	Aaron Brown, Przemyslaw Patynowski, Jedrzej Jagielski,
	intel-wired-lan, linux-kernel, jacob.e.keller
In-Reply-To: <cover.1776426683.git.poros@redhat.com>

The V1 ADD_VLAN opcode had no success handler; filters sent via V1
stayed in ADDING state permanently.  Add a fallthrough case so V1
filters also transition ADDING -> ACTIVE on PF confirmation.

Critically, add an `if (v_retval) break` guard: the error switch in
iavf_virtchnl_completion() does NOT return after handling errors,
it falls through to the success switch.  Without this guard, a
PF-rejected ADD would incorrectly mark ADDING filters as ACTIVE,
creating a driver/HW mismatch where the driver believes the filter
is installed but the PF never accepted it.

For V2, this is harmless: iavf_vlan_add_reject() in the error
block already kfree'd all ADDING filters, so the success handler
finds nothing to transition.
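A user-space model of the dispatch order described above, assuming a simplified `completion()` with one error stage followed unconditionally by one success stage. `add_reject()` stands in for `iavf_vlan_add_reject()`, the V1 error branch is assumed to only log, and all names are hypothetical. It shows why the success stage needs its own `v_retval` check.

```c
#include <assert.h>
#include <stddef.h>

enum vc_op { OP_ADD_VLAN, OP_ADD_VLAN_V2 };	/* stand-ins for VIRTCHNL_OP_* */
enum vlan_state { VLAN_ADD, VLAN_ADDING, VLAN_ACTIVE };

struct vlan_filter {
	unsigned short vid;
	enum vlan_state state;
};

/* Stand-in for iavf_vlan_add_reject(): drop every ADDING filter. */
static size_t add_reject(struct vlan_filter *f, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++)
		if (f[i].state != VLAN_ADDING)
			f[out++] = f[i];
	return out;
}

/* Two-stage dispatch: the error stage does not return, so control always
 * reaches the success stage, which must therefore check v_retval itself.
 * Returns the new number of filters.
 */
static size_t completion(enum vc_op op, int v_retval,
			 struct vlan_filter *f, size_t n)
{
	assert(f != NULL || n == 0);

	if (v_retval) {
		switch (op) {
		case OP_ADD_VLAN_V2:
			n = add_reject(f, n);
			break;
		default:
			break;	/* V1: assumed to only log the error */
		}
	}

	switch (op) {
	case OP_ADD_VLAN:
	case OP_ADD_VLAN_V2:
		if (v_retval)
			break;	/* the guard added by this patch */
		for (size_t i = 0; i < n; i++)
			if (f[i].state == VLAN_ADDING)
				f[i].state = VLAN_ACTIVE;
		break;
	}
	return n;
}
```

With `v_retval` non-zero, a V1 filter stays in ADDING instead of being promoted to ACTIVE, and a V2 filter is already gone by the time the success stage runs.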

Fixes: 968996c070ef ("iavf: Fix VLAN_V2 addition/rejection")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf_virtchnl.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index 93ca79c3e3b535..4f2defd2331b17 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -2876,9 +2876,13 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 		spin_unlock_bh(&adapter->adv_rss_lock);
 		}
 		break;
+	case VIRTCHNL_OP_ADD_VLAN:
 	case VIRTCHNL_OP_ADD_VLAN_V2: {
 		struct iavf_vlan_filter *f;

+		if (v_retval)
+			break;
+
 		spin_lock_bh(&adapter->mac_vlan_list_lock);
 		list_for_each_entry(f, &adapter->vlan_filter_list, list) {
 			if (f->state == IAVF_VLAN_ADDING)
-- 
2.52.0


* Re: [PATCH v4 1/3] net: dsa: microchip: implement KSZ87xx Module 3 low-loss cable errata
From: Marek Vasut @ 2026-04-17 14:35 UTC
  To: Fidelio Lawson, Woojung Huh, UNGLinuxDriver, Andrew Lunn,
	Vladimir Oltean, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Marek Vasut, Maxime Chevallier, Simon Horman,
	Heiner Kallweit, Russell King
  Cc: netdev, linux-kernel, Fidelio Lawson
In-Reply-To: <20260417-ksz87xx_errata_low_loss_connections-v4-1-6c7044ec4363@exotec.com>

On 4/17/26 2:44 PM, Fidelio Lawson wrote:

[...]

> @@ -1271,6 +1287,29 @@ int ksz8_w_phy(struct ksz_device *dev, u16 phy, u16 reg, u16 val)
>   		if (ret)
>   			return ret;
>   		break;
> +	case PHY_REG_KSZ87XX_SHORT_CABLE:
> +		if (!ksz_is_ksz87xx(dev))
> +			return -EOPNOTSUPP;
> +		ret = ksz87xx_apply_low_loss_preset(dev, !!val);
> +		if (ret)
> +			return ret;
> +		break;
> +	case PHY_REG_KSZ87XX_LPF_BW:
> +		if (!ksz_is_ksz87xx(dev))
> +			return -EOPNOTSUPP;
> +		ret = ksz8_ind_write8(dev, TABLE_LINK_MD, KSZ87XX_REG_PHY_LPF, (u8)val);
> +		if (ret)
> +			return ret;
> +		dev->lpf_bw = val;
> +		break;
> +	case PHY_REG_KSZ87XX_EQ_INIT:
> +		if (!ksz_is_ksz87xx(dev))
> +			return -EOPNOTSUPP;
> +		ret = ksz8_ind_write8(dev, TABLE_LINK_MD, KSZ87XX_REG_DSP_EQ, (u8)val);
Do these values need some check, so that they fall within the correct
range(s) / bitfields before being written into those registers?
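A minimal sketch of the kind of check being asked about, assuming each tunable maps onto a register field described by a bitmask. `ksz_check_field()` and the masks used with it are hypothetical, since the quoted hunk does not show the KSZ87xx field widths. Without such a check, `(u8)val` silently truncates any value above 0xff.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: reject a tunable value that has bits outside the
 * register field described by @mask, instead of truncating it.
 */
static int ksz_check_field(uint16_t val, uint8_t mask)
{
	assert(mask != 0);
	if (val & ~(uint16_t)mask)
		return -EINVAL;
	return 0;
}
```

A caller would run it before `ksz8_ind_write8()` and return its error instead of writing.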


* Re: [PATCH v4 0/3] ksz87xx: add support for low-loss cable equalizer errata
From: Marek Vasut @ 2026-04-17 14:36 UTC
  To: Fidelio Lawson, Woojung Huh, UNGLinuxDriver, Andrew Lunn,
	Vladimir Oltean, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Marek Vasut, Maxime Chevallier, Simon Horman,
	Heiner Kallweit, Russell King
  Cc: netdev, linux-kernel, Fidelio Lawson
In-Reply-To: <20260417-ksz87xx_errata_low_loss_connections-v4-0-6c7044ec4363@exotec.com>

On 4/17/26 2:44 PM, Fidelio Lawson wrote:
> Hello,
> 
> This patch implements the “Module 3: Equalizer fix for short cables” erratum
> described in Microchip document DS80000687C for KSZ87xx switches.
> 
> According to the erratum, the embedded PHY receiver in KSZ87xx switches is
> tuned by default for long, high-loss Ethernet cables. When operating with
> short or low-loss cables (for example CAT5e or CAT6), the PHY equalizer may
> over-amplify the incoming signal, leading to internal distortion and link
> establishment failures.
> 
> Microchip documents two independent mechanisms to mitigate this issue:
> adjusting the receiver low‑pass filter bandwidth and reducing the DSP
> equalizer initial value. These registers are located in the switch’s
> internal LinkMD table and cannot be accessed directly through a
> stand‑alone PHY driver.
> 
> To keep the PHY‑facing API clean, this series models the erratum handling
> as vendor‑specific Clause 22 PHY registers, virtualized by the KSZ8 DSA
> driver. Accesses are intercepted by ksz8_r_phy() / ksz8_w_phy() and
> translated into the appropriate indirect LinkMD register writes. The
> erratum affects the shared PHY analog front‑end and therefore applies
> globally to the switch.
> 
> Based on review feedback, the user‑visible interface is kept deliberately
> simple and predictable:
> 
> - A boolean “short‑cable” PHY tunable applies a documented and
>    conservative preset (LPF bandwidth 62MHz, DSP EQ initial value 0).
>    This is the recommended KISS interface for the common short‑cable
>    scenario.
> 
> - Two additional integer PHY tunables allow advanced or experimental
>    tuning of the LPF bandwidth and the DSP EQ initial value. These
>    controls are orthogonal, have no ordering requirements, and simply
>    override the corresponding setting when written.
> 
> The tunables act as simple setters with no implicit state machine or
> invalid combinations, avoiding surprises for userspace and not relying
> on extended error reporting or netlink ethtool support.
> 
> This series contains:
> 
>    1. Support for the KSZ87xx low‑loss cable erratum in the KSZ8 DSA driver,
>       including the short‑cable preset and orthogonal tuning controls.
> 
>    2. Addition of vendor‑specific PHY tunable identifiers for the
>       short‑cable preset, LPF bandwidth, and DSP EQ initial value.
> 
>    3. Exposure of these tunables through the Micrel PHY driver via
>       get_tunable / set_tunable callbacks.
> 
> This version follows the design agreed upon during v3 review and
> reworks the interface accordingly.
> 
> This series is based on Linux v7.0-rc1.
> 
> Signed-off-by: Fidelio Lawson <fidelio.lawson@exotec.com>
Thank you for working on this. Except for that one nitpick on 1/3, this
looks really good!


* Re: [Intel-wired-lan] [PATCH net v3 1/5] iavf: return EBUSY if reset in progress or not ready during MAC change
From: Przemek Kitszel @ 2026-04-17 14:38 UTC
  To: Jose Ignacio Tornos Martinez
  Cc: intel-wired-lan, netdev, anthony.l.nguyen, davem, edumazet, kuba,
	pabeni
In-Reply-To: <20260414110006.124286-2-jtornosm@redhat.com>

On 4/14/26 13:00, Jose Ignacio Tornos Martinez wrote:
> When a MAC address change is requested while the VF is resetting or still
> initializing, return -EBUSY immediately instead of attempting the
> operation.
> 
> Additionally, during early initialization states (before __IAVF_DOWN),
> the PF may be slow to respond to MAC change requests, causing long
> delays. Only allow MAC changes once the VF reaches __IAVF_DOWN state or
> later, when the watchdog is running and the VF is ready for operations.
> 
> After commit ad7c7b2172c3 ("net: hold netdev instance lock
> during sysfs operations"), MAC changes are called with the netdev lock
> held, so we should not wait with the lock held during reset or
> initialization. This allows the caller to retry or handle the busy state
> appropriately without blocking other operations.

That makes sense, but it could break user scripts. OTOH, with the netdev
lock taken here, the user could be blocked forever, so I think this is a
net positive change.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>

> 
> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
> 
>   drivers/net/ethernet/intel/iavf/iavf_main.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index dad001abc908..67aa14350b1b 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
> @@ -1060,6 +1060,9 @@ static int iavf_set_mac(struct net_device *netdev, void *p)
>   	struct sockaddr *addr = p;
>   	int ret;
>   
> +	if (iavf_is_reset_in_progress(adapter) || adapter->state < __IAVF_DOWN)
> +		return -EBUSY;
> +
>   	if (!is_valid_ether_addr(addr->sa_data))
>   		return -EADDRNOTAVAIL;
>   
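The patch description notes that the caller can then retry. A minimal caller-side sketch of a bounded retry, assuming the operation is wrapped in a function returning 0 or a negative errno; `retry_on_ebusy()`, `struct fake_vf` and `fake_set_mac()` are hypothetical, and a real userspace tool would see `errno == EBUSY` from the failing call rather than a negative return value.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical helper: retry @op while it reports -EBUSY, up to
 * @max_tries attempts, sleeping @delay_ms between attempts.
 */
static int retry_on_ebusy(int (*op)(void *ctx), void *ctx,
			  int max_tries, long delay_ms)
{
	const struct timespec ts = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};
	int ret = -EBUSY;

	assert(max_tries > 0 && delay_ms >= 0);
	for (int i = 0; i < max_tries; i++) {
		ret = op(ctx);
		if (ret != -EBUSY)
			break;
		if (i + 1 < max_tries)	/* no sleep after the last attempt */
			nanosleep(&ts, NULL);
	}
	return ret;
}

/* Example op for illustration: busy for the first @busy_calls calls,
 * standing in for a VF that is still resetting or initializing.
 */
struct fake_vf {
	int busy_calls;
	int calls;
};

static int fake_set_mac(void *ctx)
{
	struct fake_vf *vf = ctx;

	vf->calls++;
	return vf->calls <= vf->busy_calls ? -EBUSY : 0;
}
```

Bounding the attempts keeps a VF that never leaves reset from turning into an endless loop in the caller.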



* Re: [PATCH net v2 1/2] bnge: fix initial HWRM sequence
From: Jakub Kicinski @ 2026-04-17 14:42 UTC
  To: Vikas Gupta
  Cc: davem, edumazet, pabeni, andrew+netdev, horms, netdev,
	linux-kernel, vsrama-krishna.nemani, bhargava.marreddy,
	rajashekar.hudumula, ajit.khaparde, dharmender.garg,
	rahul-rg.gupta
In-Reply-To: <CAHLZf_voEPTrfkuO6pJcchaQaOqJin8m7j-+hwMrjJcGFmJv0A@mail.gmail.com>

On Fri, 17 Apr 2026 11:46:08 +0530 Vikas Gupta wrote:
> > > -err_func_unrgtr:
> > > -     bnge_fw_unregister_dev(bd);
> > > +err_free_ctx_mem:
> > > +     bnge_free_ctx_mem(bd);
> > >       return rc;
> > >  }  
> >
> > This error path appears to have the same regression. If
> > bnge_hwrm_func_drv_rgtr() fails after bnge_func_qcaps() has already
> > configured the backing store, freeing the context memory directly without
> > unregistering might allow the hardware to access freed memory.  
> 
> Even if bnge_hwrm_func_drv_rgtr() fails, it is still safe to free the context
> memory at the host because the driver unloads from this point.

Looking closer, indeed, the way bnge_hwrm_func_drv_unrgtr() is written,
the AI suggestion is pointless. Hopefully you're right, because debugging
FW corrupting host memory after reboot on bnxt is not fun.

> AI reviews appear to ignore logic related to handling context memory
> in the patch.
> I see no valid comments on the patch.

Why is bnge_func_qcaps() allocating context mem? It may be the case
that context mem has to be allocated, but bnge_func_qcaps() doesn't
sound like a function that'd perform such a key part of init.
Why not just move the alloc earlier in bnge_fw_register_dev()?


* Re: [net-next v2 1/5] dt-bindings: net: starfive,jh7110-dwmac: Remove JH8100
From: Andrew Lunn @ 2026-04-17 14:54 UTC
  To: Minda Chen
  Cc: Alexandre Torgue, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Maxime Coquelin,
	Emil Renner Berthing, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, netdev, linux-kernel, linux-stm32, devicetree
In-Reply-To: <20260417024523.107786-2-minda.chen@starfivetech.com>

On Fri, Apr 17, 2026 at 10:45:19AM +0800, Minda Chen wrote:
> Remove JH8100 dt-bindings because do not support it now.

> StarFive have stopped JH8100 developing and will release it
> outside.

Is there a missing "not" in that sentence?

    Andrew

---
pw-bot: cr


* [PATCH iwl-net v7 0/3] ice: fix missing dpll notifications for SW pins
From: Petr Oros @ 2026-04-17 14:59 UTC
  To: netdev
  Cc: Petr Oros, Tony Nguyen, Przemek Kitszel, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Arkadiusz Kubalewski, Jiri Pirko, Vadim Fedorenko,
	Ivan Vecera, Michal Schmidt, Jacob Keller, Aleksandr Loktionov,
	Rinitha S, intel-wired-lan, linux-kernel

The SMA/U.FL pin redesign never propagated dpll notifications to the
software-controlled pin wrappers.  This series fixes that in two steps
plus a prerequisite dpll core change.

Patch 1 exports __dpll_pin_change_ntf() so ice can send peer
notifications from callback context where dpll_lock is already held.

Patch 2 fixes HW-to-SW notification propagation: periodic work now
notifies SW wrappers when their backing CGU input changes, and
phase_offset reporting for SW pins reads the backing pin's value.

Patch 3 adds SW-to-SW peer notification: when SMA or U.FL pin state
changes via PCA9575 routing, the paired pin gets a dpll change event.
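A simplified model of the behaviour described for patches 2 and 3. `struct pin` and its helpers are hypothetical, not the `ice_dpll.c` structures; `ntf_count` stands in for a dpll change event, and the pin being set in `sw_pin_set()` is assumed to be reported by the dpll core, so only its peer is notified there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pin model, not the ice_dpll.c structures. */
struct pin {
	const char *name;
	int state;		/* last state reported to userspace */
	long long phase_offset;
	struct pin *backing;	/* SW pins: the CGU input they wrap */
	struct pin *peer;	/* SW pins: paired SMA <-> U.FL pin */
	int ntf_count;		/* change notifications sent so far */
};

static void pin_change_ntf(struct pin *p)
{
	p->ntf_count++;
}

/* Patch 2 idea: a SW pin reports its backing pin's phase offset. */
static long long sw_pin_phase_offset(const struct pin *sw)
{
	return sw->backing ? sw->backing->phase_offset : sw->phase_offset;
}

/* Patch 2 idea: when periodic work sees a backing input change, notify
 * the backing pin and every SW wrapper built on top of it.
 */
static void periodic_check(struct pin *hw, int new_state,
			   struct pin **sw, int nsw)
{
	assert(hw != NULL && (sw != NULL || nsw == 0));
	if (hw->state == new_state)
		return;
	hw->state = new_state;
	pin_change_ntf(hw);
	for (int i = 0; i < nsw; i++)
		if (sw[i]->backing == hw)
			pin_change_ntf(sw[i]);
}

/* Patch 3 idea: changing one SW pin also changes the routing seen by its
 * paired pin, so the peer gets a change event as well.
 */
static void sw_pin_set(struct pin *sw, int state)
{
	sw->state = state;
	if (sw->peer)
		pin_change_ntf(sw->peer);
}
```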

Ivan Vecera (1):
  dpll: export __dpll_pin_change_ntf() for use under dpll_lock

Petr Oros (2):
  ice: fix missing dpll notifications for SW pins
  ice: add dpll peer notification for paired SMA and U.FL pins

 drivers/dpll/dpll_netlink.c               | 10 +++
 drivers/dpll/dpll_netlink.h               |  2 -
 drivers/net/ethernet/intel/ice/ice_dpll.c | 79 +++++++++++++++++++----
 include/linux/dpll.h                      |  1 +
 4 files changed, 79 insertions(+), 13 deletions(-)

---
v7:
 - split ice patch into two: HW-to-SW notification propagation and
   SW-to-SW peer notification (requested by Jiri Pirko)
 - drop spurious blank line removal in ice_dpll_sma_direction_set()
v6: https://lore.kernel.org/all/20260416113952.389405-1-poros@redhat.com/
v5: https://lore.kernel.org/all/20260409102501.1447628-1-poros@redhat.com/
v4: https://lore.kernel.org/all/20260319205256.998876-1-poros@redhat.com/
v3: https://lore.kernel.org/all/20260220140700.2910174-1-poros@redhat.com/
v2: https://lore.kernel.org/all/20260219131500.2271897-1-poros@redhat.com/
v1: https://lore.kernel.org/all/20260218211414.1411163-1-poros@redhat.com/



* [PATCH iwl-net v7 1/3] dpll: export __dpll_pin_change_ntf() for use under dpll_lock
From: Petr Oros @ 2026-04-17 14:59 UTC
  To: netdev
  Cc: Ivan Vecera, Vadim Fedorenko, Petr Oros, Tony Nguyen,
	Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Arkadiusz Kubalewski,
	Jiri Pirko, Michal Schmidt, Jacob Keller, Aleksandr Loktionov,
	Rinitha S, intel-wired-lan, linux-kernel
In-Reply-To: <20260417145907.696307-1-poros@redhat.com>

From: Ivan Vecera <ivecera@redhat.com>

Export __dpll_pin_change_ntf() so that drivers can send pin change
notifications from within pin callbacks, which are already called
under dpll_lock. Using dpll_pin_change_ntf() in that context would
deadlock.

Add lockdep_assert_held() to catch misuse without the lock held.
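The locking convention can be sketched in user space with a pthread mutex standing in for `dpll_lock` and a plain flag standing in for lockdep. All names are hypothetical, and the flag is a crude single-threaded check. The point is that the `__` variant asserts the lock is held, while the plain variant takes the lock itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for dpll_lock and lockdep_assert_held(). */
static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
static bool model_lock_held;	/* crude, single-threaded ownership flag */
static int events_sent;

static void model_lock_acquire(void)
{
	pthread_mutex_lock(&model_lock);
	model_lock_held = true;
}

static void model_lock_release(void)
{
	model_lock_held = false;
	pthread_mutex_unlock(&model_lock);
}

/* "__" variant: the caller must already hold the lock. */
static int __pin_change_ntf(void)
{
	assert(model_lock_held);	/* like lockdep_assert_held() */
	events_sent++;
	return 0;
}

/* Plain variant: takes the (non-recursive) lock itself. */
static int pin_change_ntf(void)
{
	int ret;

	model_lock_acquire();
	ret = __pin_change_ntf();
	model_lock_release();
	return ret;
}

/* A pin callback, invoked by the core with the lock already held: it
 * must use the "__" variant to notify another pin.
 */
static int pin_state_set_cb(void)
{
	return __pin_change_ntf();
}
```

Calling `pin_change_ntf()` from `pin_state_set_cb()` instead would try to lock a default, non-recursive mutex the thread already holds, which POSIX leaves undefined and which typically deadlocks.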

Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Petr Oros <poros@redhat.com>
---
 drivers/dpll/dpll_netlink.c | 10 ++++++++++
 drivers/dpll/dpll_netlink.h |  2 --
 include/linux/dpll.h        |  1 +
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/dpll/dpll_netlink.c b/drivers/dpll/dpll_netlink.c
index af7ce62ec55ca8..0ff1658c2dc1ba 100644
--- a/drivers/dpll/dpll_netlink.c
+++ b/drivers/dpll/dpll_netlink.c
@@ -900,11 +900,21 @@ int dpll_pin_delete_ntf(struct dpll_pin *pin)
 	return dpll_pin_event_send(DPLL_CMD_PIN_DELETE_NTF, pin);
 }
 
+/**
+ * __dpll_pin_change_ntf - notify that the pin has been changed
+ * @pin: registered pin pointer
+ *
+ * Context: caller must hold dpll_lock. Suitable for use inside pin
+ *          callbacks which are already invoked under dpll_lock.
+ * Return: 0 if succeeds, error code otherwise.
+ */
 int __dpll_pin_change_ntf(struct dpll_pin *pin)
 {
+	lockdep_assert_held(&dpll_lock);
 	dpll_pin_notify(pin, DPLL_PIN_CHANGED);
 	return dpll_pin_event_send(DPLL_CMD_PIN_CHANGE_NTF, pin);
 }
+EXPORT_SYMBOL_GPL(__dpll_pin_change_ntf);
 
 /**
  * dpll_pin_change_ntf - notify that the pin has been changed
diff --git a/drivers/dpll/dpll_netlink.h b/drivers/dpll/dpll_netlink.h
index dd28b56d27c56d..a9cfd55f57fc42 100644
--- a/drivers/dpll/dpll_netlink.h
+++ b/drivers/dpll/dpll_netlink.h
@@ -11,5 +11,3 @@ int dpll_device_delete_ntf(struct dpll_device *dpll);
 int dpll_pin_create_ntf(struct dpll_pin *pin);
 
 int dpll_pin_delete_ntf(struct dpll_pin *pin);
-
-int __dpll_pin_change_ntf(struct dpll_pin *pin);
diff --git a/include/linux/dpll.h b/include/linux/dpll.h
index b7277a8b484d26..f8037f1ab20b60 100644
--- a/include/linux/dpll.h
+++ b/include/linux/dpll.h
@@ -286,6 +286,7 @@ int dpll_pin_ref_sync_pair_add(struct dpll_pin *pin,
 
 int dpll_device_change_ntf(struct dpll_device *dpll);
 
+int __dpll_pin_change_ntf(struct dpll_pin *pin);
 int dpll_pin_change_ntf(struct dpll_pin *pin);
 
 int register_dpll_notifier(struct notifier_block *nb);
-- 
2.52.0


^ permalink raw reply related

* [PATCH iwl-net v7 2/3] ice: fix missing dpll notifications for SW pins
From: Petr Oros @ 2026-04-17 14:59 UTC (permalink / raw)
  To: netdev
  Cc: Petr Oros, Tony Nguyen, Przemek Kitszel, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Arkadiusz Kubalewski, Jiri Pirko, Vadim Fedorenko,
	Ivan Vecera, Michal Schmidt, Jacob Keller, Aleksandr Loktionov,
	Rinitha S, intel-wired-lan, linux-kernel
In-Reply-To: <20260417145907.696307-1-poros@redhat.com>

The SMA/U.FL pin redesign (commit 2dd5d03c77e2 ("ice: redesign dpll
sma/u.fl pins control")) introduced software-controlled pins that wrap
backing CGU input/output pins, but never updated the notification and
data paths to propagate pin events to these SW wrappers.

The periodic work sends dpll_pin_change_ntf() only for direct CGU input
pins.  SW pins that wrap these inputs never receive change or phase
offset notifications, so userspace consumers such as synce4l monitoring
SMA pins via dpll netlink never learn about state transitions or phase
offset updates.  Similarly, ice_dpll_phase_offset_get() reads the SW
pin's own phase_offset field, which is never updated; the PPS monitor
writes to the backing CGU input's field instead.

Fix by introducing ice_dpll_pin_ntf(), a wrapper around
dpll_pin_change_ntf() that also notifies any registered SMA/U.FL pin
whose backing CGU input matches.  Replace all direct
dpll_pin_change_ntf() calls in the periodic notification paths with
this wrapper.  Fix ice_dpll_phase_offset_get() to return the backing
CGU input's phase_offset for input-direction SW pins.
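The fan-out performed by ice_dpll_pin_ntf() can be sketched in plain C as follows. The structures are simplified, hypothetical stand-ins for struct ice_dplls and struct ice_dpll_pin. ICE_DPLL_PIN_SW_NUM is assumed to be 2, and a recording function replaces dpll_pin_change_ntf().

```c
#include <stddef.h>

#define DEMO_PIN_SW_NUM 2 /* assumed value of ICE_DPLL_PIN_SW_NUM */

struct demo_pin { int id; };                   /* stands in for struct dpll_pin */
struct demo_cgu_pin { struct demo_pin *pin; }; /* a backing CGU input */

struct demo_sw_pin {
	struct demo_pin *pin;       /* registered SW pin, or NULL */
	struct demo_cgu_pin *input; /* backing CGU input, or NULL */
};

struct demo_dplls {
	struct demo_sw_pin sma[DEMO_PIN_SW_NUM];
	struct demo_sw_pin ufl[DEMO_PIN_SW_NUM];
};

/* Log of notified pins, replacing dpll_pin_change_ntf(). */
static struct demo_pin *demo_sent[1 + 2 * DEMO_PIN_SW_NUM];
static size_t demo_nsent;

static void demo_change_ntf(struct demo_pin *pin)
{
	demo_sent[demo_nsent++] = pin;
}

/* True if @sw is registered and wraps the CGU input that owns @pin. */
static int demo_wraps(const struct demo_sw_pin *sw, const struct demo_pin *pin)
{
	return sw->pin && sw->input && sw->input->pin == pin;
}

/* Notify @pin itself, then every SMA/U.FL wrapper backed by it. */
void demo_pin_ntf(struct demo_dplls *d, struct demo_pin *pin)
{
	demo_change_ntf(pin);
	for (int i = 0; i < DEMO_PIN_SW_NUM; i++) {
		if (demo_wraps(&d->sma[i], pin))
			demo_change_ntf(d->sma[i].pin);
		if (demo_wraps(&d->ufl[i], pin))
			demo_change_ntf(d->ufl[i].pin);
	}
}
```

A CGU input that no SW pin wraps produces exactly one notification. An input wrapped by both an SMA and a U.FL pin produces three.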

Fixes: 2dd5d03c77e2 ("ice: redesign dpll sma/u.fl pins control")
Signed-off-by: Petr Oros <poros@redhat.com>
---
 drivers/net/ethernet/intel/ice/ice_dpll.c | 47 +++++++++++++++++------
 1 file changed, 36 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_dpll.c b/drivers/net/ethernet/intel/ice/ice_dpll.c
index 3a90a2940fdc6e..11b942b83500fb 100644
--- a/drivers/net/ethernet/intel/ice/ice_dpll.c
+++ b/drivers/net/ethernet/intel/ice/ice_dpll.c
@@ -1963,7 +1963,10 @@ ice_dpll_phase_offset_get(const struct dpll_pin *pin, void *pin_priv,
 				       d->active_input == p->input->pin))
 		*phase_offset = d->phase_offset * ICE_DPLL_PHASE_OFFSET_FACTOR;
 	else if (d->phase_offset_monitor_period)
-		*phase_offset = p->phase_offset * ICE_DPLL_PHASE_OFFSET_FACTOR;
+		*phase_offset = (p->input &&
+				 p->direction == DPLL_PIN_DIRECTION_INPUT ?
+				 p->input->phase_offset :
+				 p->phase_offset) * ICE_DPLL_PHASE_OFFSET_FACTOR;
 	else
 		*phase_offset = 0;
 	mutex_unlock(&pf->dplls.lock);
@@ -2659,6 +2662,27 @@ static u64 ice_generate_clock_id(struct ice_pf *pf)
 	return pci_get_dsn(pf->pdev);
 }
 
+/**
+ * ice_dpll_pin_ntf - notify pin change including any SW pin wrappers
+ * @dplls: pointer to dplls struct
+ * @pin: the dpll_pin that changed
+ *
+ * Send a change notification for @pin and for any registered SMA/U.FL pin
+ * whose backing CGU input matches @pin.
+ */
+static void ice_dpll_pin_ntf(struct ice_dplls *dplls, struct dpll_pin *pin)
+{
+	dpll_pin_change_ntf(pin);
+	for (int i = 0; i < ICE_DPLL_PIN_SW_NUM; i++) {
+		if (dplls->sma[i].pin && dplls->sma[i].input &&
+		    dplls->sma[i].input->pin == pin)
+			dpll_pin_change_ntf(dplls->sma[i].pin);
+		if (dplls->ufl[i].pin && dplls->ufl[i].input &&
+		    dplls->ufl[i].input->pin == pin)
+			dpll_pin_change_ntf(dplls->ufl[i].pin);
+	}
+}
+
 /**
  * ice_dpll_notify_changes - notify dpll subsystem about changes
  * @d: pointer do dpll
@@ -2667,6 +2691,7 @@ static u64 ice_generate_clock_id(struct ice_pf *pf)
  */
 static void ice_dpll_notify_changes(struct ice_dpll *d)
 {
+	struct ice_dplls *dplls = &d->pf->dplls;
 	bool pin_notified = false;
 
 	if (d->prev_dpll_state != d->dpll_state) {
@@ -2675,17 +2700,17 @@ static void ice_dpll_notify_changes(struct ice_dpll *d)
 	}
 	if (d->prev_input != d->active_input) {
 		if (d->prev_input)
-			dpll_pin_change_ntf(d->prev_input);
+			ice_dpll_pin_ntf(dplls, d->prev_input);
 		d->prev_input = d->active_input;
 		if (d->active_input) {
-			dpll_pin_change_ntf(d->active_input);
+			ice_dpll_pin_ntf(dplls, d->active_input);
 			pin_notified = true;
 		}
 	}
 	if (d->prev_phase_offset != d->phase_offset) {
 		d->prev_phase_offset = d->phase_offset;
 		if (!pin_notified && d->active_input)
-			dpll_pin_change_ntf(d->active_input);
+			ice_dpll_pin_ntf(dplls, d->active_input);
 	}
 }
 
@@ -2714,6 +2739,7 @@ static bool ice_dpll_is_pps_phase_monitor(struct ice_pf *pf)
 
 /**
  * ice_dpll_pins_notify_mask - notify dpll subsystem about bulk pin changes
+ * @dplls: pointer to dplls struct
  * @pins: array of ice_dpll_pin pointers registered within dpll subsystem
  * @pin_num: number of pins
  * @phase_offset_ntf_mask: bitmask of pin indexes to notify
@@ -2723,15 +2749,14 @@ static bool ice_dpll_is_pps_phase_monitor(struct ice_pf *pf)
  *
  * Context: Must be called while pf->dplls.lock is released.
  */
-static void ice_dpll_pins_notify_mask(struct ice_dpll_pin *pins,
+static void ice_dpll_pins_notify_mask(struct ice_dplls *dplls,
+				      struct ice_dpll_pin *pins,
 				      u8 pin_num,
 				      u32 phase_offset_ntf_mask)
 {
-	int i = 0;
-
-	for (i = 0; i < pin_num; i++)
-		if (phase_offset_ntf_mask & (1 << i))
-			dpll_pin_change_ntf(pins[i].pin);
+	for (int i = 0; i < pin_num; i++)
+		if (phase_offset_ntf_mask & BIT(i))
+			ice_dpll_pin_ntf(dplls, pins[i].pin);
 }
 
 /**
@@ -2907,7 +2932,7 @@ static void ice_dpll_periodic_work(struct kthread_work *work)
 	ice_dpll_notify_changes(de);
 	ice_dpll_notify_changes(dp);
 	if (phase_offset_ntf)
-		ice_dpll_pins_notify_mask(d->inputs, d->num_inputs,
+		ice_dpll_pins_notify_mask(d, d->inputs, d->num_inputs,
 					  phase_offset_ntf);
 
 resched:
-- 
2.52.0


^ permalink raw reply related

* [PATCH iwl-net v7 3/3] ice: add dpll peer notification for paired SMA and U.FL pins
From: Petr Oros @ 2026-04-17 14:59 UTC (permalink / raw)
  To: netdev
  Cc: Petr Oros, Tony Nguyen, Przemek Kitszel, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Arkadiusz Kubalewski, Jiri Pirko, Vadim Fedorenko,
	Ivan Vecera, Michal Schmidt, Jacob Keller, Aleksandr Loktionov,
	Rinitha S, intel-wired-lan, linux-kernel
In-Reply-To: <20260417145907.696307-1-poros@redhat.com>

SMA and U.FL pins share physical signal paths in pairs (SMA1/U.FL1 and
SMA2/U.FL2).  When one pin's state changes via a PCA9575 GPIO write,
the paired pin's state also changes, but no notification is sent for
the peer pin.  Userspace consumers monitoring the peer through a dpll
netlink subscription never learn about the update.

Add ice_dpll_sw_pin_notify_peer() which sends a change notification for
the paired SW pin.  Call it from ice_dpll_pin_sma_direction_set(),
ice_dpll_sma_pin_state_set(), and ice_dpll_ufl_pin_state_set() after
pf->dplls.lock is released.  Use __dpll_pin_change_ntf() because
dpll_lock is still held by the dpll netlink layer (dpll_pin_pre_doit).
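The peer lookup can be sketched with simplified, hypothetical types. As the text above describes, the sketch pairs SMA n with U.FL n. Unlike the patch, which uses a pointer-range comparison, it tests membership with an equality check against &d->sma[idx]. Both approaches assume idx is a valid index into either array. A counter stands in for __dpll_pin_change_ntf().

```c
#include <stddef.h>

#define DEMO_PIN_SW_NUM 2 /* assumed value of ICE_DPLL_PIN_SW_NUM */

struct demo_sw_pin {
	int idx;   /* position of this pin within its own array */
	void *pin; /* registered dpll pin handle, or NULL if not registered */
};

struct demo_dplls {
	struct demo_sw_pin sma[DEMO_PIN_SW_NUM];
	struct demo_sw_pin ufl[DEMO_PIN_SW_NUM];
};

static int demo_peer_ntf_count; /* stands in for __dpll_pin_change_ntf() */

/* SMA n is paired with U.FL n and vice versa. */
struct demo_sw_pin *demo_sw_pin_peer(struct demo_dplls *d,
				     struct demo_sw_pin *changed)
{
	if (changed == &d->sma[changed->idx])
		return &d->ufl[changed->idx];
	return &d->sma[changed->idx];
}

/* Notify the peer only if it is registered with the dpll core. */
void demo_sw_pin_notify_peer(struct demo_dplls *d, struct demo_sw_pin *changed)
{
	struct demo_sw_pin *peer = demo_sw_pin_peer(d, changed);

	if (peer->pin)
		demo_peer_ntf_count++;
}
```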

Fixes: 2dd5d03c77e2 ("ice: redesign dpll sma/u.fl pins control")
Signed-off-by: Petr Oros <poros@redhat.com>
---
 drivers/net/ethernet/intel/ice/ice_dpll.c | 32 +++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_dpll.c b/drivers/net/ethernet/intel/ice/ice_dpll.c
index 11b942b83500fb..be72a076f7a15c 100644
--- a/drivers/net/ethernet/intel/ice/ice_dpll.c
+++ b/drivers/net/ethernet/intel/ice/ice_dpll.c
@@ -1154,6 +1154,32 @@ ice_dpll_input_state_get(const struct dpll_pin *pin, void *pin_priv,
 				      extack, ICE_DPLL_PIN_TYPE_INPUT);
 }
 
+/**
+ * ice_dpll_sw_pin_notify_peer - notify the paired SW pin after a state change
+ * @d: pointer to dplls struct
+ * @changed: the SW pin that was explicitly changed (already notified by dpll core)
+ *
+ * SMA and U.FL pins share physical signal paths in pairs (SMA1/U.FL1 and
+ * SMA2/U.FL2).  When one pin's routing changes via the PCA9575 GPIO
+ * expander, the paired pin's state may also change.  Send a change
+ * notification for the peer pin so userspace consumers monitoring the
+ * peer via dpll netlink learn about the update.
+ *
+ * Context: Called from dpll_pin_ops callbacks after pf->dplls.lock is
+ *          released.  Uses __dpll_pin_change_ntf() because dpll_lock is
+ *          still held by the dpll netlink layer.
+ */
+static void ice_dpll_sw_pin_notify_peer(struct ice_dplls *d,
+					struct ice_dpll_pin *changed)
+{
+	struct ice_dpll_pin *peer;
+
+	peer = (changed >= d->sma && changed < d->sma + ICE_DPLL_PIN_SW_NUM) ?
+		&d->ufl[changed->idx] : &d->sma[changed->idx];
+	if (peer->pin)
+		__dpll_pin_change_ntf(peer->pin);
+}
+
 /**
  * ice_dpll_sma_direction_set - set direction of SMA pin
  * @p: pointer to a pin
@@ -1344,6 +1370,8 @@ ice_dpll_ufl_pin_state_set(const struct dpll_pin *pin, void *pin_priv,
 
 unlock:
 	mutex_unlock(&pf->dplls.lock);
+	if (!ret)
+		ice_dpll_sw_pin_notify_peer(&pf->dplls, p);
 
 	return ret;
 }
@@ -1462,6 +1490,8 @@ ice_dpll_sma_pin_state_set(const struct dpll_pin *pin, void *pin_priv,
 
 unlock:
 	mutex_unlock(&pf->dplls.lock);
+	if (!ret)
+		ice_dpll_sw_pin_notify_peer(&pf->dplls, sma);
 
 	return ret;
 }
@@ -1657,6 +1687,8 @@ ice_dpll_pin_sma_direction_set(const struct dpll_pin *pin, void *pin_priv,
 	mutex_lock(&pf->dplls.lock);
 	ret = ice_dpll_sma_direction_set(p, direction, extack);
 	mutex_unlock(&pf->dplls.lock);
+	if (!ret)
+		ice_dpll_sw_pin_notify_peer(&pf->dplls, p);
 
 	return ret;
 }
-- 
2.52.0


^ permalink raw reply related

* Re: [PATCH v4 1/3] net: dsa: microchip: implement KSZ87xx Module 3 low-loss cable errata
From: Fidelio LAWSON @ 2026-04-17 15:20 UTC (permalink / raw)
  To: Marek Vasut, Woojung Huh, UNGLinuxDriver, Andrew Lunn,
	Vladimir Oltean, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Marek Vasut, Maxime Chevallier, Simon Horman,
	Heiner Kallweit, Russell King
  Cc: netdev, linux-kernel, Fidelio Lawson
In-Reply-To: <10e325c0-aeb0-47b1-b758-e4f47ff7b004@nabladev.com>

On 4/17/26 16:35, Marek Vasut wrote:
> On 4/17/26 2:44 PM, Fidelio Lawson wrote:
> 
> [...]
> 
>> @@ -1271,6 +1287,29 @@ int ksz8_w_phy(struct ksz_device *dev, u16 phy, u16 reg, u16 val)
>>           if (ret)
>>               return ret;
>>           break;
>> +    case PHY_REG_KSZ87XX_SHORT_CABLE:
>> +        if (!ksz_is_ksz87xx(dev))
>> +            return -EOPNOTSUPP;
>> +        ret = ksz87xx_apply_low_loss_preset(dev, !!val);
>> +        if (ret)
>> +            return ret;
>> +        break;
>> +    case PHY_REG_KSZ87XX_LPF_BW:
>> +        if (!ksz_is_ksz87xx(dev))
>> +            return -EOPNOTSUPP;
>> +        ret = ksz8_ind_write8(dev, TABLE_LINK_MD, KSZ87XX_REG_PHY_LPF, (u8)val);
>> +        if (ret)
>> +            return ret;
>> +        dev->lpf_bw = val;
>> +        break;
>> +    case PHY_REG_KSZ87XX_EQ_INIT:
>> +        if (!ksz_is_ksz87xx(dev))
>> +            return -EOPNOTSUPP;
>> +        ret = ksz8_ind_write8(dev, TABLE_LINK_MD, KSZ87XX_REG_DSP_EQ, (u8)val);
> Do these values need some check, so they would be in the correct 
> range(s) / in the correct bitfields before being written into those 
> registers ?

Yes, I can add validation to ensure that only the documented bitfields
are accepted before writing the registers (bits [7:6] for the LPF
bandwidth and bits [5:0] for the DSP EQ initial value).
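For illustration only, a check of that kind could look like the sketch below. It assumes the caller passes the raw register byte and that any bit outside the documented field is rejected with -EINVAL. The mask and function names are hypothetical. A driver could instead accept a field-relative value and shift it into place with FIELD_PREP().

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical masks for the documented register fields. */
#define DEMO_LPF_BW_MASK 0xC0u /* KSZ87XX_REG_PHY_LPF: bits [7:6] */
#define DEMO_DSP_EQ_MASK 0x3Fu /* KSZ87XX_REG_DSP_EQ: bits [5:0] */

/* Reject any value with bits set outside @mask. Because @val is 16 bits
 * wide, this also catches upper-byte bits that a (u8) cast would drop. */
int demo_check_field(uint16_t val, unsigned int mask)
{
	if (val & ~mask)
		return -EINVAL;
	return 0;
}

int demo_check_lpf_bw(uint16_t val)
{
	return demo_check_field(val, DEMO_LPF_BW_MASK);
}

int demo_check_dsp_eq(uint16_t val)
{
	return demo_check_field(val, DEMO_DSP_EQ_MASK);
}
```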


^ permalink raw reply

* Re: [PATCH v4 1/3] net: dsa: microchip: implement KSZ87xx Module 3 low-loss cable errata
From: Marek Vasut @ 2026-04-17 15:22 UTC (permalink / raw)
  To: Fidelio LAWSON, Woojung Huh, UNGLinuxDriver, Andrew Lunn,
	Vladimir Oltean, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Marek Vasut, Maxime Chevallier, Simon Horman,
	Heiner Kallweit, Russell King
  Cc: netdev, linux-kernel, Fidelio Lawson
In-Reply-To: <56034c9c-fede-4ede-b68d-5ecc484a64cd@gmail.com>

On 4/17/26 5:20 PM, Fidelio LAWSON wrote:
> On 4/17/26 16:35, Marek Vasut wrote:
>> On 4/17/26 2:44 PM, Fidelio Lawson wrote:
>>
>> [...]
>>
>>> @@ -1271,6 +1287,29 @@ int ksz8_w_phy(struct ksz_device *dev, u16 phy, u16 reg, u16 val)
>>>           if (ret)
>>>               return ret;
>>>           break;
>>> +    case PHY_REG_KSZ87XX_SHORT_CABLE:
>>> +        if (!ksz_is_ksz87xx(dev))
>>> +            return -EOPNOTSUPP;
>>> +        ret = ksz87xx_apply_low_loss_preset(dev, !!val);
>>> +        if (ret)
>>> +            return ret;
>>> +        break;
>>> +    case PHY_REG_KSZ87XX_LPF_BW:
>>> +        if (!ksz_is_ksz87xx(dev))
>>> +            return -EOPNOTSUPP;
>>> +        ret = ksz8_ind_write8(dev, TABLE_LINK_MD, KSZ87XX_REG_PHY_LPF, (u8)val);
>>> +        if (ret)
>>> +            return ret;
>>> +        dev->lpf_bw = val;
>>> +        break;
>>> +    case PHY_REG_KSZ87XX_EQ_INIT:
>>> +        if (!ksz_is_ksz87xx(dev))
>>> +            return -EOPNOTSUPP;
>>> +        ret = ksz8_ind_write8(dev, TABLE_LINK_MD, KSZ87XX_REG_DSP_EQ, (u8)val);
>> Do these values need some check, so they would be in the correct 
>> range(s) / in the correct bitfields before being written into those 
>> registers ?
> 
> Yes, I can add validation to ensure that only the documented bitfields
> are accepted before writing the registers (bits [7:6] for the LPF
> bandwidth and bits [5:0] for the DSP EQ initial value).
Yes please, thank you !

^ permalink raw reply

* Re: [PATCH iwl-net v2 0/4] iavf: fix VLAN filter state machine races
From: Przemek Kitszel @ 2026-04-17 15:22 UTC (permalink / raw)
  To: Petr Oros
  Cc: netdev, Tony Nguyen, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Jesse Brandeburg, Mitch Williams,
	Aaron Brown, Przemyslaw Patynowski, Jedrzej Jagielski,
	intel-wired-lan, linux-kernel, jacob.e.keller
In-Reply-To: <cover.1776426683.git.poros@redhat.com>

On 4/17/26 16:29, Petr Oros wrote:
> The iavf VLAN filter state machine has several design issues that lead
> to race conditions between userspace add/del calls and the watchdog
> task's virtchnl processing.  Filters can get lost or leak HW resources,
> especially during interface down/up cycles and namespace moves.
> 

[...]

> 
> v2: Retarget from iwl-next to iwl-net; these are bug fixes.
>      Rebase on current net tree (conflict resolved).
> 
> Petr Oros (4):
>    iavf: rename IAVF_VLAN_IS_NEW to IAVF_VLAN_ADDING
>    iavf: stop removing VLAN filters from PF on interface down
>    iavf: wait for PF confirmation before removing VLAN filters
>    iavf: add VIRTCHNL_OP_ADD_VLAN to success completion handler
> 
>   drivers/net/ethernet/intel/iavf/iavf.h        |  9 +--
>   drivers/net/ethernet/intel/iavf/iavf_main.c   | 52 +++----------
>   .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 76 +++++++++----------
>   3 files changed, 52 insertions(+), 85 deletions(-)
> 

Thank you for the series, it looks good.
Also thanks for the non-obvious details, like changing
list_for_each_entry_safe() -> list_for_each_entry() in places that
no longer alter the list.

for the series:
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>

^ permalink raw reply

* [PATCH net] net: airoha: Fix PPE cpu port configuration for GDM2 loopback path
From: Lorenzo Bianconi @ 2026-04-17 15:24 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Lorenzo Bianconi
  Cc: Simon Horman, linux-arm-kernel, linux-mediatek, netdev

When QoS loopback is enabled for GDM3 or GDM4, incoming packets are
forwarded to GDM2. However, the PPE cpu port for GDM2 is not configured
in this path, causing traffic originating from GDM3/GDM4, which may
be set up as WAN ports backed by QDMA1, to be incorrectly directed
to QDMA0 instead.

Configure the PPE cpu port for GDM2 when QoS loopback is active on
GDM3 or GDM4 to ensure traffic is routed to the correct QDMA instance.

Fixes: 9cd451d414f6 ("net: airoha: Add loopback support for GDM2")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/airoha/airoha_eth.c | 8 ++++++--
 drivers/net/ethernet/airoha/airoha_eth.h | 3 ++-
 drivers/net/ethernet/airoha/airoha_ppe.c | 6 +++---
 3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index e1ab15f1ee7d..d2b7c437a782 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -1727,7 +1727,7 @@ static int airoha_set_gdm2_loopback(struct airoha_gdm_port *port)
 {
 	struct airoha_eth *eth = port->qdma->eth;
 	u32 val, pse_port, chan;
-	int src_port;
+	int i, src_port;
 
 	/* Forward the traffic to the proper GDM port */
 	pse_port = port->id == AIROHA_GDM3_IDX ? FE_PSE_PORT_GDM3
@@ -1769,6 +1769,9 @@ static int airoha_set_gdm2_loopback(struct airoha_gdm_port *port)
 		      SP_CPORT_MASK(val),
 		      __field_prep(SP_CPORT_MASK(val), FE_PSE_PORT_CDM2));
 
+	for (i = 0; i < eth->soc->num_ppe; i++)
+		airoha_ppe_set_cpu_port(port, i, AIROHA_GDM2_IDX);
+
 	if (port->id == AIROHA_GDM4_IDX && airoha_is_7581(eth)) {
 		u32 mask = FC_ID_OF_SRC_PORT_MASK(port->nbq);
 
@@ -1807,7 +1810,8 @@ static int airoha_dev_init(struct net_device *dev)
 	}
 
 	for (i = 0; i < eth->soc->num_ppe; i++)
-		airoha_ppe_set_cpu_port(port, i);
+		airoha_ppe_set_cpu_port(port, i,
+					airoha_get_fe_port(port));
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/airoha/airoha_eth.h b/drivers/net/ethernet/airoha/airoha_eth.h
index 95e557638617..715aa26cbac8 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.h
+++ b/drivers/net/ethernet/airoha/airoha_eth.h
@@ -653,7 +653,8 @@ int airoha_get_fe_port(struct airoha_gdm_port *port);
 bool airoha_is_valid_gdm_port(struct airoha_eth *eth,
 			      struct airoha_gdm_port *port);
 
-void airoha_ppe_set_cpu_port(struct airoha_gdm_port *port, u8 ppe_id);
+void airoha_ppe_set_cpu_port(struct airoha_gdm_port *port, u8 ppe_id,
+			     u8 fport);
 bool airoha_ppe_is_enabled(struct airoha_eth *eth, int index);
 void airoha_ppe_check_skb(struct airoha_ppe_dev *dev, struct sk_buff *skb,
 			  u16 hash, bool rx_wlan);
diff --git a/drivers/net/ethernet/airoha/airoha_ppe.c b/drivers/net/ethernet/airoha/airoha_ppe.c
index 859818676b69..5c9dff6bccd1 100644
--- a/drivers/net/ethernet/airoha/airoha_ppe.c
+++ b/drivers/net/ethernet/airoha/airoha_ppe.c
@@ -85,10 +85,9 @@ static u32 airoha_ppe_get_timestamp(struct airoha_ppe *ppe)
 	return FIELD_GET(AIROHA_FOE_IB1_BIND_TIMESTAMP, timestamp);
 }
 
-void airoha_ppe_set_cpu_port(struct airoha_gdm_port *port, u8 ppe_id)
+void airoha_ppe_set_cpu_port(struct airoha_gdm_port *port, u8 ppe_id, u8 fport)
 {
 	struct airoha_qdma *qdma = port->qdma;
-	u8 fport = airoha_get_fe_port(port);
 	struct airoha_eth *eth = qdma->eth;
 	u8 qdma_id = qdma - &eth->qdma[0];
 	u32 fe_cpu_port;
@@ -182,7 +181,8 @@ static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
 			if (!port)
 				continue;
 
-			airoha_ppe_set_cpu_port(port, i);
+			airoha_ppe_set_cpu_port(port, i,
+						airoha_get_fe_port(port));
 		}
 	}
 }

---
base-commit: 82c21069028c5db3463f851ae8ac9cc2e38a3827
change-id: 20260417-airoha-ppe-cpu-port-for-gdm2-loopback-96b9b52179c1

Best regards,
-- 
Lorenzo Bianconi <lorenzo@kernel.org>


^ permalink raw reply related

* [PATCH v3 net 1/1] net/sched: sch_dualpi2: drain both C-queue and L-queue in dualpi2_change()
From: chia-yu.chang @ 2026-04-17 15:25 UTC (permalink / raw)
  To: victor, hxzene, linux-hardening, kees, gustavoars, jhs, jiri,
	davem, edumazet, kuba, pabeni, linux-kernel, netdev, horms, ij,
	ncardwell, koen.de_schepper, g.white, ingemar.s.johansson,
	mirja.kuehlewind, cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Fix dualpi2_change() to correctly enforce updated limit and memlimit
values after a configuration change of the dualpi2 qdisc.

Before this patch, dualpi2_change() always attempted to dequeue packets
via the root qdisc (C-queue) when reducing backlog or memory usage, and
unconditionally assumed that a valid skb will be returned. When traffic
classification results in packets being queued in the L-queue while the
C-queue is empty, this leads to a NULL skb dereference during limit or
memlimit enforcement.

This is fixed by first dequeuing from the C-queue path if it is
non-empty. Once the C-queue is empty, packets are dequeued directly from
the L-queue. Return values from qdisc_dequeue_internal() are checked for
both queues. When dequeuing from the L-queue, the parent qdisc qlen and
backlog counters are updated explicitly to keep overall qdisc statistics
consistent.
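The drain order can be modeled with a small, hypothetical userspace structure. Here sch_qlen counts packets in both queues, matching the patch comment that L-queue packets are counted in both sch and l_queue. Packets are dropped from the C-queue while it is non-empty, then from the L-queue. The loop stops rather than dereferencing a missing packet. Per-queue backlog and the qdisc_tree_reduce_backlog() accounting are left out of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_QCAP 16 /* fixed FIFO capacity for the sketch */

struct demo_queue {
	unsigned int truesize[DEMO_QCAP]; /* per-packet memory footprint */
	size_t head, len;
};

struct demo_dualpi2 {
	struct demo_queue c, l; /* C-queue and L-queue */
	unsigned int sch_qlen;  /* packets in BOTH queues, like sch->q.qlen */
	unsigned int limit;
	unsigned long memory_used, memory_limit;
	unsigned int drops;
};

/* Caller keeps each queue below DEMO_QCAP packets. */
void demo_enqueue(struct demo_dualpi2 *q, bool l4s, unsigned int truesize)
{
	struct demo_queue *dq = l4s ? &q->l : &q->c;

	dq->truesize[(dq->head + dq->len) % DEMO_QCAP] = truesize;
	dq->len++;
	q->sch_qlen++;
	q->memory_used += truesize;
}

/* Returns false on an empty queue: the "NULL skb" case. */
static bool demo_dequeue(struct demo_queue *dq, unsigned int *truesize)
{
	if (!dq->len)
		return false;
	*truesize = dq->truesize[dq->head];
	dq->head = (dq->head + 1) % DEMO_QCAP;
	dq->len--;
	return true;
}

/* Drop packets until both the packet and memory limits hold again. */
void demo_enforce_limits(struct demo_dualpi2 *q)
{
	while (q->sch_qlen > q->limit || q->memory_used > q->memory_limit) {
		unsigned int sz;
		bool ok = false;

		if (q->sch_qlen > q->l.len)   /* C-queue still has packets */
			ok = demo_dequeue(&q->c, &sz);
		else if (q->l.len)            /* only L-queue packets remain */
			ok = demo_dequeue(&q->l, &sz);

		if (!ok)                      /* nothing to drop: stop, don't crash */
			break;

		q->sch_qlen--;                /* L packets are counted here too */
		q->memory_used -= sz;
		q->drops++;
	}
}
```

In the reported scenario, all queued packets are in the L-queue and the limit is lowered. The sketch then drains the L-queue instead of trying to dequeue from an empty C-queue.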

Fixes: 320d031ad6e4 ("sched: Struct definition and parsing of dualpi2 qdisc")
Reported-by: "Kito Xu (veritas501)" <hxzene@gmail.com>
Closes: https://lore.kernel.org/netdev/20260413075740.2234828-1-hxzene@gmail.com/
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 net/sched/sch_dualpi2.c | 32 ++++++++++++++++++++++++++++----
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_dualpi2.c b/net/sched/sch_dualpi2.c
index fe6f5e889625..241e6a46bd00 100644
--- a/net/sched/sch_dualpi2.c
+++ b/net/sched/sch_dualpi2.c
@@ -868,11 +868,35 @@ static int dualpi2_change(struct Qdisc *sch, struct nlattr *opt,
 	old_backlog = sch->qstats.backlog;
 	while (qdisc_qlen(sch) > sch->limit ||
 	       q->memory_used > q->memory_limit) {
-		struct sk_buff *skb = qdisc_dequeue_internal(sch, true);
+		struct sk_buff *skb = NULL;
 
-		q->memory_used -= skb->truesize;
-		qdisc_qstats_backlog_dec(sch, skb);
-		rtnl_qdisc_drop(skb, sch);
+		if (qdisc_qlen(sch) > qdisc_qlen(q->l_queue)) {
+			skb = qdisc_dequeue_internal(sch, true);
+			if (unlikely(!skb)) {
+				WARN_ON_ONCE(1);
+				break;
+			}
+			q->memory_used -= skb->truesize;
+			rtnl_qdisc_drop(skb, sch);
+		} else if (qdisc_qlen(q->l_queue)) {
+			skb = qdisc_dequeue_internal(q->l_queue, true);
+			if (unlikely(!skb)) {
+				WARN_ON_ONCE(1);
+				break;
+			}
+			/* L-queue packets are counted in both sch and
+			 * l_queue on enqueue; qdisc_dequeue_internal()
+			 * handled l_queue, so we further account for sch.
+			 */
+			--sch->q.qlen;
+			qdisc_qstats_backlog_dec(sch, skb);
+			q->memory_used -= skb->truesize;
+			rtnl_qdisc_drop(skb, q->l_queue);
+			qdisc_qstats_drop(sch);
+		} else {
+			WARN_ON_ONCE(1);
+			break;
+		}
 	}
 	qdisc_tree_reduce_backlog(sch, old_qlen - qdisc_qlen(sch),
 				  old_backlog - sch->qstats.backlog);
-- 
2.34.1


^ permalink raw reply related

* [PATCH v4 net 0/3] ECN offload handling for AccECN series
From: chia-yu.chang @ 2026-04-17 15:26 UTC (permalink / raw)
  To: linyunsheng, andrew+netdev, parav, jasowang, mst, shenjian15,
	salil.mehta, shaojijie, saeedm, tariqt, mbloch, leonro,
	linux-rdma, netdev, davem, edumazet, kuba, pabeni, horms, ij,
	ncardwell, koen.de_schepper, g.white, ingemar.s.johansson,
	mirja.kuehlewind, cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Hello,

Please find the v4 ECN offload handling for AccECN patch series for net.
It aims to avoid potential CWR flag corruption due to RFC3168 ECN offload,
because this flag is part of the ACE signal used for Accurate ECN (RFC9768).

This corresponds to discussions in virtio mailing list:
https://lore.kernel.org/all/20250814120118.81787-1-chia-yu.chang@nokia-bell-labs.com/
There, it was suggested to clarify the semantics of SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN.

A prior submission was made to net-next:
https://lore.kernel.org/all/SJ0PR12MB68066115C5329872316E6B37DC98A@SJ0PR12MB6806.namprd12.prod.outlook.com/
There, it was suggested to submit the first two patches to net.

Best regards,
Chia-Yu

---
v4:
- Fix spacing in commit message of #2 and #3 (Paolo Abeni <pabeni@redhat.com>)
- In v3, questions addressed to the HiSilicon and Mellanox maintainers received no response,
  so we are resubmitting this series to keep the discussion fresh:
  - https://lore.kernel.org/netdev/b796bd57-650a-41d1-8032-f124084634c3@redhat.com/
  - https://lore.kernel.org/netdev/62393422-bc8f-4676-bf3c-4d1be15ab800@redhat.com/

v3:
- Fix commit message title typo
- Separate prior #2 into #2 and #3 to have one patch per NIC (Paolo Abeni <pabeni@redhat.com>)

v2:
- Fix commit header title typo

---
Chia-Yu Chang (3):
  net: update comments for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
  net: mlx5e: fix CWR handling in drivers to preserve ACE signal
  net: hns3: fix CWR handling in drivers to preserve ACE signal

 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c |  4 ++--
 include/linux/skbuff.h                          | 15 ++++++++++++++-
 3 files changed, 17 insertions(+), 4 deletions(-)

-- 
2.34.1


^ permalink raw reply

* [PATCH v4 net 1/3] net: update comments for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
From: chia-yu.chang @ 2026-04-17 15:26 UTC (permalink / raw)
  To: linyunsheng, andrew+netdev, parav, jasowang, mst, shenjian15,
	salil.mehta, shaojijie, saeedm, tariqt, mbloch, leonro,
	linux-rdma, netdev, davem, edumazet, kuba, pabeni, horms, ij,
	ncardwell, koen.de_schepper, g.white, ingemar.s.johansson,
	mirja.kuehlewind, cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang
In-Reply-To: <20260417152642.71674-1-chia-yu.chang@nokia-bell-labs.com>

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

This patch updates the documentation of the ECN-related GSO flags. It
clarifies the limitations of SKB_GSO_TCP_ECN and explains how to preserve
the CWR flag (part of the ACE signal) in the Rx path.

For Tx, SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN are used respectively for
RFC3168 ECN and AccECN (RFC9768). SKB_GSO_TCP_ECN indicates that the
first segment has CWR set, while subsequent segments have CWR cleared.
In contrast, SKB_GSO_TCP_ACCECN means that the segment uses AccECN and
therefore its CWR flag must not be modified during segmentation.

For RX, SKB_GSO_TCP_ECN shall NOT be used, because the stack cannot know
whether the connection uses RFC3168 ECN or AccECN, and RFC3168 ECN
offload may clear the CWR flag and thus corrupt the ACE signal. Instead,
any segment that arrives with CWR set must use the SKB_GSO_TCP_ACCECN
flag to prevent RFC3168 ECN offload logic from clearing the CWR flag.
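The TX and RX rules described above can be summarized in a small sketch. The flag values mirror the skbuff.h hunk below. The helper names are hypothetical, and TX behavior when neither flag is set is outside the scope of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from the skbuff.h hunk in this patch. */
#define DEMO_GSO_TCP_ECN    (1u << 2)
#define DEMO_GSO_TCP_ACCECN (1u << 19)

/* RX (e.g. HW GRO completion): a CWR-marked aggregate gets ACCECN, never
 * ECN, because the receiver cannot tell RFC3168 ECN from AccECN. */
uint32_t demo_rx_gso_type(uint32_t gso_type, bool cwr)
{
	if (cwr)
		gso_type |= DEMO_GSO_TCP_ACCECN;
	return gso_type;
}

/* TX segmentation: CWR value for segment @seg_idx of a super-packet whose
 * header carried CWR = @hdr_cwr. */
bool demo_tx_segment_cwr(uint32_t gso_type, bool hdr_cwr, unsigned int seg_idx)
{
	if (gso_type & DEMO_GSO_TCP_ACCECN)
		return hdr_cwr;                 /* part of ACE: never modified */
	if (gso_type & DEMO_GSO_TCP_ECN)
		return hdr_cwr && seg_idx == 0; /* RFC3168: first segment only */
	return hdr_cwr;                         /* neither flag: not modeled here */
}
```

Checking ACCECN before ECN means that an skb carrying the AccECN marking never has its CWR bit cleared on later segments.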

Co-developed-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 include/linux/skbuff.h | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2bcf78a4de7b..9080a6d508a3 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -671,7 +671,13 @@ enum {
 	/* This indicates the skb is from an untrusted source. */
 	SKB_GSO_DODGY = 1 << 1,
 
-	/* This indicates the tcp segment has CWR set. */
+	/* For TX, this indicates that the first TCP segment has CWR set, and
+	 * any subsequent segment in the same skb has CWR cleared. This flag
+	 * must not be used in RX, because the connection to which the segment
+	 * belongs is not tracked to use RFC3168 or AccECN. Using RFC3168 ECN
+	 * offload may clear CWR and corrupt ACE signal (CWR is part of it).
+	 * Instead, SKB_GSO_TCP_ACCECN shall be used to avoid CWR corruption.
+	 */
 	SKB_GSO_TCP_ECN = 1 << 2,
 
 	__SKB_GSO_TCP_FIXEDID = 1 << 3,
@@ -706,6 +712,13 @@ enum {
 
 	SKB_GSO_FRAGLIST = 1 << 18,
 
+	/* For TX, this indicates that the TCP segment uses the CWR flag as part
+	 * of the ACE signal, and the CWR flag must not be modified in the skb.
+	 * For RX, any incoming segment with CWR set must use this flag so that
+	 * no RFC3168 ECN offload can clear the CWR flag. This is required to
+	 * preserve ACE signal correctness (CWR is part of it) in a forwarding
+	 * scenario, e.g., from one netdevice RX to other netdevice TX
+	 */
 	SKB_GSO_TCP_ACCECN = 1 << 19,
 
 	/* These indirectly map onto the same netdev feature.
-- 
2.34.1


^ permalink raw reply related

* [PATCH v4 net 3/3] net: hns3: fix CWR handling in drivers to preserve ACE signal
From: chia-yu.chang @ 2026-04-17 15:26 UTC (permalink / raw)
  To: linyunsheng, andrew+netdev, parav, jasowang, mst, shenjian15,
	salil.mehta, shaojijie, saeedm, tariqt, mbloch, leonro,
	linux-rdma, netdev, davem, edumazet, kuba, pabeni, horms, ij,
	ncardwell, koen.de_schepper, g.white, ingemar.s.johansson,
	mirja.kuehlewind, cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang
In-Reply-To: <20260417152642.71674-1-chia-yu.chang@nokia-bell-labs.com>

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Currently, hns3 Rx paths set the SKB_GSO_TCP_ECN flag when a TCP segment
has the CWR flag set. This is wrong because SKB_GSO_TCP_ECN is only
valid for RFC3168 ECN on Tx, and using it on Rx allows RFC3168 ECN
offload to clear the CWR flag. As a result, incoming TCP segments
lose their ACE signal integrity required for AccECN (RFC9768),
especially when the packet is forwarded and later re-segmented by GSO.

Fix this by setting SKB_GSO_TCP_ACCECN for any Rx segment with the CWR
flag set. SKB_GSO_TCP_ACCECN ensures that RFC3168 ECN offload will
not clear the CWR flag, therefore preserving the ACE signal.

Fixes: d474d88f88261 ("net: hns3: add hns3_gro_complete for HW GRO process")
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index a3206c97923e..e1b0dba56182 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -3904,7 +3904,7 @@ static int hns3_gro_complete(struct sk_buff *skb, u32 l234info)
 
 	skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count;
 	if (th->cwr)
-		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
+		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ACCECN;
 
 	if (l234info & BIT(HNS3_RXD_GRO_FIXID_B))
 		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_FIXEDID;
-- 
2.34.1


^ permalink raw reply related

* [PATCH v4 net 2/3] net: mlx5e: fix CWR handling in drivers to preserve ACE signal
From: chia-yu.chang @ 2026-04-17 15:26 UTC (permalink / raw)
  To: linyunsheng, andrew+netdev, parav, jasowang, mst, shenjian15,
	salil.mehta, shaojijie, saeedm, tariqt, mbloch, leonro,
	linux-rdma, netdev, davem, edumazet, kuba, pabeni, horms, ij,
	ncardwell, koen.de_schepper, g.white, ingemar.s.johansson,
	mirja.kuehlewind, cheshire, rs.ietf, Jason_Livingood, vidhi_goel
  Cc: Chia-Yu Chang
In-Reply-To: <20260417152642.71674-1-chia-yu.chang@nokia-bell-labs.com>

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Currently, mlx5 Rx paths set the SKB_GSO_TCP_ECN flag when a TCP segment
has the CWR flag set. This is wrong because SKB_GSO_TCP_ECN is only
valid for RFC3168 ECN on Tx, and using it on Rx allows RFC3168 ECN
offload to clear the CWR flag. As a result, incoming TCP segments
may lose their ACE signal integrity required for AccECN (RFC9768),
especially when the packet is forwarded and later re-segmented by GSO.

Fix this by setting SKB_GSO_TCP_ACCECN for any Rx segment with the CWR
flag set. SKB_GSO_TCP_ACCECN ensures that RFC3168 ECN offload will
not clear the CWR flag, thereby preserving the ACE signal.
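To see why clearing CWR damages the ACE signal, here is a small sketch that treats AE, CWR and ECE as the 3-bit ACE counter of RFC9768 (AE as the most significant bit) and computes the change between two segments modulo 8. The model_* helpers are hypothetical, and real AccECN feedback also involves the AccECN TCP option, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* ACE counter built from the TCP header bits AE, CWR and ECE
 * (RFC9768), with AE as the most significant bit: values 0..7. */
static unsigned int model_ace_value(bool ae, bool cwr, bool ece)
{
	return ((unsigned int)ae << 2) | ((unsigned int)cwr << 1) |
	       (unsigned int)ece;
}

/* Change between two successive ACE values, modulo 8 (the counter wraps). */
static unsigned int model_ace_delta(unsigned int prev, unsigned int cur)
{
	assert(prev < 8 && cur < 8);
	return (cur - prev) & 7u;
}
```

If a coalesced segment carried ACE = 6 (AE and CWR set) and a later GSO segment loses CWR, that segment arrives with ACE = 4, and model_ace_delta(6, 4) returns 6: the peer would count six extra CE marks that never happened.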

Fixes: 92552d3abd329 ("net/mlx5e: HW_GRO cqe handler implementation")
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 5b60aa47c75b..9b1c80079532 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1180,7 +1180,7 @@ static void mlx5e_shampo_update_ipv4_tcp_hdr(struct mlx5e_rq *rq, struct iphdr *
 	skb->csum_offset = offsetof(struct tcphdr, check);
 
 	if (tcp->cwr)
-		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
+		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ACCECN;
 }
 
 static void mlx5e_shampo_update_ipv6_tcp_hdr(struct mlx5e_rq *rq, struct ipv6hdr *ipv6,
@@ -1201,7 +1201,7 @@ static void mlx5e_shampo_update_ipv6_tcp_hdr(struct mlx5e_rq *rq, struct ipv6hdr
 	skb->csum_offset = offsetof(struct tcphdr, check);
 
 	if (tcp->cwr)
-		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
+		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ACCECN;
 }
 
 static void mlx5e_shampo_update_hdr(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe, bool match)
-- 
2.34.1


* [PATCH bpf-next] selftests/bpf: drop xdping tool
From: Alexis Lothoré (eBPF Foundation) @ 2026-04-17 15:33 UTC (permalink / raw)
  To: Andrii Nakryiko, Eduard Zingerman, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev
  Cc: ebpf, Bastien Curutchet, Thomas Petazzoni, linux-kernel, bpf,
	linux-kselftest, netdev, Alan Maguire,
	Alexis Lothoré (eBPF Foundation)

As part of a larger cleanup effort in the bpf selftests directory,
tests and scripts are either being converted to the test_progs framework
(so they are executed automatically in bpf CI), or removed if not
relevant for such integration.

The test_xdping.sh script (with the associated xdping.c) acts as an RTT
measurement tool by attaching two small XDP programs to two interfaces.
Converting this test to test_progs may not make much sense:
- RTT measurement does not really fit in the scope of a functional test;
  it is rather about measuring performance.
- there are other existing tests in test_progs that actively validate
  XDP features like program attachment, return value processing, packet
  modification, etc.

Drop test_xdping.sh and the corresponding xdping.c userspace part. Keep
the eBPF part (xdping_kern.c), as it is used by another test integrated
in test_progs (btf_dump).

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
---
 tools/testing/selftests/bpf/.gitignore     |   1 -
 tools/testing/selftests/bpf/Makefile       |   3 -
 tools/testing/selftests/bpf/test_xdping.sh | 103 ------------
 tools/testing/selftests/bpf/xdping.c       | 254 -----------------------------
 4 files changed, 361 deletions(-)

diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index bfdc5518ecc8..986a6389186b 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -21,7 +21,6 @@ test_lirc_mode2_user
 flow_dissector_load
 test_tcpnotify_user
 test_libbpf
-xdping
 test_cpp
 *.d
 *.subskel.h
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 78e60040811e..00a986a7d088 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -111,7 +111,6 @@ TEST_FILES = xsk_prereqs.sh $(wildcard progs/btf_dump_test_case_*.c)
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
 	test_lirc_mode2.sh \
-	test_xdping.sh \
 	test_bpftool_build.sh \
 	test_doc_build.sh \
 	test_xsk.sh \
@@ -134,7 +133,6 @@ TEST_GEN_PROGS_EXTENDED = \
 	xdp_features \
 	xdp_hw_metadata \
 	xdp_synproxy \
-	xdping \
 	xskxceiver
 
 TEST_GEN_FILES += $(TEST_KMODS) liburandom_read.so urandom_read sign-file uprobe_multi
@@ -320,7 +318,6 @@ $(OUTPUT)/test_tcpnotify_user: $(CGROUP_HELPERS) $(TESTING_HELPERS) $(TRACE_HELP
 $(OUTPUT)/test_sock_fields: $(CGROUP_HELPERS) $(TESTING_HELPERS)
 $(OUTPUT)/test_tag: $(TESTING_HELPERS)
 $(OUTPUT)/test_lirc_mode2_user: $(TESTING_HELPERS)
-$(OUTPUT)/xdping: $(TESTING_HELPERS)
 $(OUTPUT)/flow_dissector_load: $(TESTING_HELPERS)
 $(OUTPUT)/test_maps: $(TESTING_HELPERS)
 $(OUTPUT)/test_verifier: $(TESTING_HELPERS) $(CAP_HELPERS) $(UNPRIV_HELPERS)
diff --git a/tools/testing/selftests/bpf/test_xdping.sh b/tools/testing/selftests/bpf/test_xdping.sh
deleted file mode 100755
index c3d82e0a7378..000000000000
--- a/tools/testing/selftests/bpf/test_xdping.sh
+++ /dev/null
@@ -1,103 +0,0 @@
-#!/bin/bash
-# SPDX-License-Identifier: GPL-2.0
-
-# xdping tests
-#   Here we setup and teardown configuration required to run
-#   xdping, exercising its options.
-#
-#   Setup is similar to test_tunnel tests but without the tunnel.
-#
-# Topology:
-# ---------
-#     root namespace   |     tc_ns0 namespace
-#                      |
-#      ----------      |     ----------
-#      |  veth1  | --------- |  veth0  |
-#      ----------    peer    ----------
-#
-# Device Configuration
-# --------------------
-# Root namespace with BPF
-# Device names and addresses:
-#	veth1 IP: 10.1.1.200
-#	xdp added to veth1, xdpings originate from here.
-#
-# Namespace tc_ns0 with BPF
-# Device names and addresses:
-#       veth0 IPv4: 10.1.1.100
-#	For some tests xdping run in server mode here.
-#
-
-readonly TARGET_IP="10.1.1.100"
-readonly TARGET_NS="xdp_ns0"
-
-readonly LOCAL_IP="10.1.1.200"
-
-setup()
-{
-	ip netns add $TARGET_NS
-	ip link add veth0 type veth peer name veth1
-	ip link set veth0 netns $TARGET_NS
-	ip netns exec $TARGET_NS ip addr add ${TARGET_IP}/24 dev veth0
-	ip addr add ${LOCAL_IP}/24 dev veth1
-	ip netns exec $TARGET_NS ip link set veth0 up
-	ip link set veth1 up
-}
-
-cleanup()
-{
-	set +e
-	ip netns delete $TARGET_NS 2>/dev/null
-	ip link del veth1 2>/dev/null
-	if [[ $server_pid -ne 0 ]]; then
-		kill -TERM $server_pid
-	fi
-}
-
-test()
-{
-	client_args="$1"
-	server_args="$2"
-
-	echo "Test client args '$client_args'; server args '$server_args'"
-
-	server_pid=0
-	if [[ -n "$server_args" ]]; then
-		ip netns exec $TARGET_NS ./xdping $server_args &
-		server_pid=$!
-		sleep 10
-	fi
-	./xdping $client_args $TARGET_IP
-
-	if [[ $server_pid -ne 0 ]]; then
-		kill -TERM $server_pid
-		server_pid=0
-	fi
-
-	echo "Test client args '$client_args'; server args '$server_args': PASS"
-}
-
-set -e
-
-server_pid=0
-
-trap cleanup EXIT
-
-setup
-
-for server_args in "" "-I veth0 -s -S" ; do
-	# client in skb mode
-	client_args="-I veth1 -S"
-	test "$client_args" "$server_args"
-
-	# client with count of 10 RTT measurements.
-	client_args="-I veth1 -S -c 10"
-	test "$client_args" "$server_args"
-done
-
-# Test drv mode
-test "-I veth1 -N" "-I veth0 -s -N"
-test "-I veth1 -N -c 10" "-I veth0 -s -N"
-
-echo "OK. All tests passed"
-exit 0
diff --git a/tools/testing/selftests/bpf/xdping.c b/tools/testing/selftests/bpf/xdping.c
deleted file mode 100644
index 9ed8c796645d..000000000000
--- a/tools/testing/selftests/bpf/xdping.c
+++ /dev/null
@@ -1,254 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/* Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved. */
-
-#include <linux/bpf.h>
-#include <linux/if_link.h>
-#include <arpa/inet.h>
-#include <assert.h>
-#include <errno.h>
-#include <signal.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <unistd.h>
-#include <libgen.h>
-#include <net/if.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <netdb.h>
-
-#include "bpf/bpf.h"
-#include "bpf/libbpf.h"
-
-#include "xdping.h"
-#include "testing_helpers.h"
-
-static int ifindex;
-static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
-
-static void cleanup(int sig)
-{
-	bpf_xdp_detach(ifindex, xdp_flags, NULL);
-	if (sig)
-		exit(1);
-}
-
-static int get_stats(int fd, __u16 count, __u32 raddr)
-{
-	struct pinginfo pinginfo = { 0 };
-	char inaddrbuf[INET_ADDRSTRLEN];
-	struct in_addr inaddr;
-	__u16 i;
-
-	inaddr.s_addr = raddr;
-
-	printf("\nXDP RTT data:\n");
-
-	if (bpf_map_lookup_elem(fd, &raddr, &pinginfo)) {
-		perror("bpf_map_lookup elem");
-		return 1;
-	}
-
-	for (i = 0; i < count; i++) {
-		if (pinginfo.times[i] == 0)
-			break;
-
-		printf("64 bytes from %s: icmp_seq=%d ttl=64 time=%#.5f ms\n",
-		       inet_ntop(AF_INET, &inaddr, inaddrbuf,
-				 sizeof(inaddrbuf)),
-		       count + i + 1,
-		       (double)pinginfo.times[i]/1000000);
-	}
-
-	if (i < count) {
-		fprintf(stderr, "Expected %d samples, got %d.\n", count, i);
-		return 1;
-	}
-
-	bpf_map_delete_elem(fd, &raddr);
-
-	return 0;
-}
-
-static void show_usage(const char *prog)
-{
-	fprintf(stderr,
-		"usage: %s [OPTS] -I interface destination\n\n"
-		"OPTS:\n"
-		"    -c count		Stop after sending count requests\n"
-		"			(default %d, max %d)\n"
-		"    -I interface	interface name\n"
-		"    -N			Run in driver mode\n"
-		"    -s			Server mode\n"
-		"    -S			Run in skb mode\n",
-		prog, XDPING_DEFAULT_COUNT, XDPING_MAX_COUNT);
-}
-
-int main(int argc, char **argv)
-{
-	__u32 mode_flags = XDP_FLAGS_DRV_MODE | XDP_FLAGS_SKB_MODE;
-	struct addrinfo *a, hints = { .ai_family = AF_INET };
-	__u16 count = XDPING_DEFAULT_COUNT;
-	struct pinginfo pinginfo = { 0 };
-	const char *optstr = "c:I:NsS";
-	struct bpf_program *main_prog;
-	int prog_fd = -1, map_fd = -1;
-	struct sockaddr_in rin;
-	struct bpf_object *obj;
-	struct bpf_map *map;
-	char *ifname = NULL;
-	char filename[256];
-	int opt, ret = 1;
-	__u32 raddr = 0;
-	int server = 0;
-	char cmd[256];
-
-	while ((opt = getopt(argc, argv, optstr)) != -1) {
-		switch (opt) {
-		case 'c':
-			count = atoi(optarg);
-			if (count < 1 || count > XDPING_MAX_COUNT) {
-				fprintf(stderr,
-					"min count is 1, max count is %d\n",
-					XDPING_MAX_COUNT);
-				return 1;
-			}
-			break;
-		case 'I':
-			ifname = optarg;
-			ifindex = if_nametoindex(ifname);
-			if (!ifindex) {
-				fprintf(stderr, "Could not get interface %s\n",
-					ifname);
-				return 1;
-			}
-			break;
-		case 'N':
-			xdp_flags |= XDP_FLAGS_DRV_MODE;
-			break;
-		case 's':
-			/* use server program */
-			server = 1;
-			break;
-		case 'S':
-			xdp_flags |= XDP_FLAGS_SKB_MODE;
-			break;
-		default:
-			show_usage(basename(argv[0]));
-			return 1;
-		}
-	}
-
-	if (!ifname) {
-		show_usage(basename(argv[0]));
-		return 1;
-	}
-	if (!server && optind == argc) {
-		show_usage(basename(argv[0]));
-		return 1;
-	}
-
-	if ((xdp_flags & mode_flags) == mode_flags) {
-		fprintf(stderr, "-N or -S can be specified, not both.\n");
-		show_usage(basename(argv[0]));
-		return 1;
-	}
-
-	if (!server) {
-		/* Only supports IPv4; see hints initialization above. */
-		if (getaddrinfo(argv[optind], NULL, &hints, &a) || !a) {
-			fprintf(stderr, "Could not resolve %s\n", argv[optind]);
-			return 1;
-		}
-		memcpy(&rin, a->ai_addr, sizeof(rin));
-		raddr = rin.sin_addr.s_addr;
-		freeaddrinfo(a);
-	}
-
-	/* Use libbpf 1.0 API mode */
-	libbpf_set_strict_mode(LIBBPF_STRICT_ALL);
-
-	snprintf(filename, sizeof(filename), "%s_kern.bpf.o", argv[0]);
-
-	if (bpf_prog_test_load(filename, BPF_PROG_TYPE_XDP, &obj, &prog_fd)) {
-		fprintf(stderr, "load of %s failed\n", filename);
-		return 1;
-	}
-
-	main_prog = bpf_object__find_program_by_name(obj,
-						     server ? "xdping_server" : "xdping_client");
-	if (main_prog)
-		prog_fd = bpf_program__fd(main_prog);
-	if (!main_prog || prog_fd < 0) {
-		fprintf(stderr, "could not find xdping program");
-		return 1;
-	}
-
-	map = bpf_object__next_map(obj, NULL);
-	if (map)
-		map_fd = bpf_map__fd(map);
-	if (!map || map_fd < 0) {
-		fprintf(stderr, "Could not find ping map");
-		goto done;
-	}
-
-	signal(SIGINT, cleanup);
-	signal(SIGTERM, cleanup);
-
-	printf("Setting up XDP for %s, please wait...\n", ifname);
-
-	printf("XDP setup disrupts network connectivity, hit Ctrl+C to quit\n");
-
-	if (bpf_xdp_attach(ifindex, prog_fd, xdp_flags, NULL) < 0) {
-		fprintf(stderr, "Link set xdp fd failed for %s\n", ifname);
-		goto done;
-	}
-
-	if (server) {
-		close(prog_fd);
-		close(map_fd);
-		printf("Running server on %s; press Ctrl+C to exit...\n",
-		       ifname);
-		do { } while (1);
-	}
-
-	/* Start xdping-ing from last regular ping reply, e.g. for a count
-	 * of 10 ICMP requests, we start xdping-ing using reply with seq number
-	 * 10.  The reason the last "real" ping RTT is much higher is that
-	 * the ping program sees the ICMP reply associated with the last
-	 * XDP-generated packet, so ping doesn't get a reply until XDP is done.
-	 */
-	pinginfo.seq = htons(count);
-	pinginfo.count = count;
-
-	if (bpf_map_update_elem(map_fd, &raddr, &pinginfo, BPF_ANY)) {
-		fprintf(stderr, "could not communicate with BPF map: %s\n",
-			strerror(errno));
-		cleanup(0);
-		goto done;
-	}
-
-	/* We need to wait for XDP setup to complete. */
-	sleep(10);
-
-	snprintf(cmd, sizeof(cmd), "ping -c %d -I %s %s",
-		 count, ifname, argv[optind]);
-
-	printf("\nNormal ping RTT data\n");
-	printf("[Ignore final RTT; it is distorted by XDP using the reply]\n");
-
-	ret = system(cmd);
-
-	if (!ret)
-		ret = get_stats(map_fd, count, raddr);
-
-	cleanup(0);
-
-done:
-	if (prog_fd > 0)
-		close(prog_fd);
-	if (map_fd > 0)
-		close(map_fd);
-
-	return ret;
-}

---
base-commit: b7fb68124aa80db90394236a9a4a6add12f4425d
change-id: 20260417-xdping-5c2ef5a63899

Best regards,
--  
Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>


* Re: [PATCH net v2 1/2] bnge: fix initial HWRM sequence
From: Vikas Gupta @ 2026-04-17 15:47 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, edumazet, pabeni, andrew+netdev, horms, netdev,
	linux-kernel, vsrama-krishna.nemani, bhargava.marreddy,
	rajashekar.hudumula, ajit.khaparde, dharmender.garg,
	rahul-rg.gupta
In-Reply-To: <20260417074254.42f01fa7@kernel.org>

On Fri, Apr 17, 2026 at 8:12 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 17 Apr 2026 11:46:08 +0530 Vikas Gupta wrote:
> > > > -err_func_unrgtr:
> > > > -     bnge_fw_unregister_dev(bd);
> > > > +err_free_ctx_mem:
> > > > +     bnge_free_ctx_mem(bd);
> > > >       return rc;
> > > >  }
> > >
> > > This error path appears to have the same regression. If
> > > bnge_hwrm_func_drv_rgtr() fails after bnge_func_qcaps() has already
> > > configured the backing store, freeing the context memory directly without
> > > unregistering might allow the hardware to access freed memory.
> >
> > Even if bnge_hwrm_func_drv_rgtr() fails, it is still safe to free the context
> > memory at the host because the driver unloads from this point.
>
> Looking closer, indeed, the way bnge_hwrm_func_drv_unrgtr() is written
> the AI suggestion is pointless. Hopefully you're right cause debugging
> FW corrupting host memory after reboot on bnxt is not fun.
>
> > AI reviews appear to ignore logic related to handling context memory
> > in the patch.
> > I see no valid comments on the patch.
>
> Why is bnge_func_qcaps() allocating context mem? It may be the case
> that context mem has to be allocated but bnge_func_qcaps() doesn't
> sound like a function that'd perform such a key part of init.
> Why not just move the alloc earlier in bnge_fw_register_dev() ?

I agree that bnge_func_qcaps(), which appears to be a query function,
should not allocate memory. I can refactor bnge_func_qcaps() and
move bnge_alloc_ctx_mem() to bnge_fw_register_dev() so that
bnge_func_qcaps() remains solely a query function.
I'll make these changes in v3.

Thanks,
Vikas
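A sketch of the ordering discussed above, under stated assumptions: context memory is allocated in the register path after a query-only capability call, and the error path frees it only once it has been allocated. The struct and all model_* functions are hypothetical stand-ins; the real bnge functions, their signatures and their exact call order are not shown in this thread, and placing the query before the allocation is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the device state; not the bnge structures. */
struct model_dev {
	void *ctx_mem;		/* host memory that firmware would use */
	bool caps_queried;
	bool registered;
	bool fail_qcaps;	/* knobs to inject failures in tests */
	bool fail_rgtr;
};

static int model_alloc_ctx_mem(struct model_dev *d)
{
	d->ctx_mem = malloc(64);
	return d->ctx_mem ? 0 : -1;
}

static void model_free_ctx_mem(struct model_dev *d)
{
	free(d->ctx_mem);
	d->ctx_mem = NULL;
}

/* Query-only: reads capabilities, allocates nothing. */
static int model_func_qcaps(struct model_dev *d)
{
	if (d->fail_qcaps)
		return -1;
	d->caps_queried = true;
	return 0;
}

static int model_drv_rgtr(struct model_dev *d)
{
	if (d->fail_rgtr)
		return -1;
	d->registered = true;
	return 0;
}

/* Allocation lives in the register path; each error label undoes only
 * what succeeded before it. */
static int model_fw_register_dev(struct model_dev *d)
{
	int rc;

	assert(d->ctx_mem == NULL);

	rc = model_func_qcaps(d);
	if (rc)
		return rc;

	rc = model_alloc_ctx_mem(d);
	if (rc)
		return rc;

	rc = model_drv_rgtr(d);
	if (rc)
		goto err_free_ctx_mem;

	return 0;

err_free_ctx_mem:
	model_free_ctx_mem(d);
	return rc;
}
```

Keeping the query function free of allocations means a failure there needs no unwinding at all, which is the separation the thread converges on.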

* [PATCH] fixup! net: dsa: microchip: implement KSZ87xx Module 3 low-loss cable errata
From: Fidelio Lawson @ 2026-04-17 15:50 UTC (permalink / raw)
  To: netdev; +Cc: Marek Vasut, Andrew Lunn, Woojung Huh, Fidelio Lawson
In-Reply-To: <20260417-ksz87xx_errata_low_loss_connections-v4-1-6c7044ec4363@exotec.com>

---
 drivers/net/dsa/microchip/ksz8.c     | 6 ++++++
 drivers/net/dsa/microchip/ksz8_reg.h | 3 +++
 2 files changed, 9 insertions(+)

diff --git a/drivers/net/dsa/microchip/ksz8.c b/drivers/net/dsa/microchip/ksz8.c
index 0f2b8acee80f..62fc59c3da7e 100644
--- a/drivers/net/dsa/microchip/ksz8.c
+++ b/drivers/net/dsa/microchip/ksz8.c
@@ -1297,6 +1297,9 @@ int ksz8_w_phy(struct ksz_device *dev, u16 phy, u16 reg, u16 val)
 	case PHY_REG_KSZ87XX_LPF_BW:
 		if (!ksz_is_ksz87xx(dev))
 			return -EOPNOTSUPP;
+		/* Only accept LPF bandwidth bits [7:6] */
+		if (val & ~KSZ87XX_LPF_VALID_MASK)
+			return -EINVAL;
 		ret = ksz8_ind_write8(dev, TABLE_LINK_MD, KSZ87XX_REG_PHY_LPF, (u8)val);
 		if (ret)
 			return ret;
@@ -1305,6 +1308,9 @@ int ksz8_w_phy(struct ksz_device *dev, u16 phy, u16 reg, u16 val)
 	case PHY_REG_KSZ87XX_EQ_INIT:
 		if (!ksz_is_ksz87xx(dev))
 			return -EOPNOTSUPP;
+		/* Only accept DSP EQ initial value bits [5:0] */
+		if (val & ~KSZ87XX_DSP_EQ_VALID_MASK)
+			return -EINVAL;
 		ret = ksz8_ind_write8(dev, TABLE_LINK_MD, KSZ87XX_REG_DSP_EQ, (u8)val);
 		if (ret)
 			return ret;
diff --git a/drivers/net/dsa/microchip/ksz8_reg.h b/drivers/net/dsa/microchip/ksz8_reg.h
index 5df17c463f7c..cd41214f874e 100644
--- a/drivers/net/dsa/microchip/ksz8_reg.h
+++ b/drivers/net/dsa/microchip/ksz8_reg.h
@@ -206,6 +206,9 @@
 #define KSZ87XX_REG_DSP_EQ			0x08   /* DSP EQ initial value */
 #define KSZ87XX_REG_PHY_LPF			0x4C   /* RX LPF bandwidth */
 
+#define KSZ87XX_DSP_EQ_VALID_MASK	GENMASK(5, 0)
+#define KSZ87XX_LPF_VALID_MASK		GENMASK(7, 6)
+
 /* For KSZ8765. */
 #define PORT_REMOTE_ASYM_PAUSE		BIT(5)
 #define PORT_REMOTE_SYM_PAUSE		BIT(4)
-- 
2.53.0
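The fixup rejects any value with bits outside the field mask before the (u8) register write. A userspace sketch of the same check follows, with a hypothetical MODEL_GENMASK() standing in for the kernel's GENMASK() on 32-bit values and a hypothetical model_check_reg_val() helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace stand-in for GENMASK() on 32-bit values: bits h..l set. */
#define MODEL_GENMASK(h, l)	((~0U << (l)) & (~0U >> (31 - (h))))

#define MODEL_DSP_EQ_VALID_MASK	MODEL_GENMASK(5, 0)	/* 0x3f */
#define MODEL_LPF_VALID_MASK	MODEL_GENMASK(7, 6)	/* 0xc0 */

/* Same test as the fixup: any bit outside the mask is rejected. This
 * also catches bits 8..15, which a later (u8) cast would silently drop. */
static int model_check_reg_val(uint16_t val, unsigned int mask)
{
	assert(mask != 0);	/* a zero mask would reject every value */
	return (val & ~mask) ? -EINVAL : 0;
}
```

For the LPF register, 0x40 and 0xc0 pass while 0x41 and 0x100 return -EINVAL; without the check, 0x100 would have been truncated to 0x00 by the (u8) cast.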


* Re: [PATCH v7 0/5] netem: bug fixes
From: Simon Horman @ 2026-04-17 16:02 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20260415142822.133241-1-stephen@networkplumber.org>

On Wed, Apr 15, 2026 at 07:27:03AM -0700, Stephen Hemminger wrote:
> These bugs were found during an AI-assisted review of sch_netem.c
> during investigation of the packet duplication recursion problem
> addressed in Jamal's series.
> 
> The fixes cover:
> 
>  - probability gaps in the 4-state Markov loss model
>  - queue limit not accounting for reordered packets
>  - PRNG reseeded on every tc change, breaking reproducibility
>  - slot delay configuration not validated for inverted ranges
>  - slot delay arithmetic overflow for ranges above ~2.1 seconds
> 
> v7 - queue limit check Fixes: goes back further to earlier change
>    - use NL_SET_ERR_MSG_ATTR
> 
> Stephen Hemminger (5):
>   net/sched: netem: fix probability gaps in 4-state loss model
>   net/sched: netem: fix queue limit check to include reordered packets
>   net/sched: netem: only reseed PRNG when seed is explicitly provided
>   net/sched: netem: check for invalid slot range
>   net/sched: netem: fix slot delay calculation overflow

To the maintainers: I'd like to ask for more time to complete my review of this series.
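For illustration only, a sketch of the two slot-delay items in the cover letter (inverted ranges and overflow above ~2.1 seconds). It assumes the overflow comes from holding the range width in a signed 32-bit nanosecond value, since INT32_MAX ns is about 2.147 s; the actual netem code and fixes are not reproduced, and every model_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reject inverted (min > max) or negative slot delay ranges. */
static bool model_slot_range_valid(int64_t min_ns, int64_t max_ns)
{
	return min_ns >= 0 && min_ns <= max_ns;
}

/* True when the range width still fits in a signed 32-bit ns value;
 * widths above INT32_MAX (about 2.147 s) would overflow such a type. */
static bool model_range_fits_s32(int64_t min_ns, int64_t max_ns)
{
	return max_ns - min_ns <= INT32_MAX;
}

/* Pick a delay in [min_ns, max_ns] using 64-bit arithmetic throughout;
 * rnd is a caller-supplied random value. */
static int64_t model_slot_delay_ns(int64_t min_ns, int64_t max_ns,
				   uint64_t rnd)
{
	uint64_t span;

	assert(model_slot_range_valid(min_ns, max_ns));
	span = (uint64_t)(max_ns - min_ns) + 1;
	return min_ns + (int64_t)(rnd % span);
}
```

A 0 to 3 s range does not fit the 32-bit width check, but model_slot_delay_ns(0, 3000000000, 2999999999) still returns 2999999999 because the span is computed in 64 bits.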
