* [PATCH net v3 0/5] Fix i40e/ice/iavf VF bonding after netdev lock changes
From: Jose Ignacio Tornos Martinez @ 2026-04-14 11:00 UTC
To: netdev
Cc: intel-wired-lan, jesse.brandeburg, anthony.l.nguyen, davem,
edumazet, kuba, pabeni, Jose Ignacio Tornos Martinez
This series fixes VF bonding failures introduced by commit ad7c7b2172c3
("net: hold netdev instance lock during sysfs operations").
When adding VFs to a bond immediately after setting trust mode, MAC
address changes fail with -EAGAIN, preventing bonding setup. This
affects both i40e (700-series) and ice (800-series) Intel NICs.
The core issue is lock contention: iavf_set_mac() is now called with the
netdev lock held and waits for MAC change completion while holding it.
However, both the watchdog task that sends the request and the adminq_task
that processes PF responses also need this lock, creating a deadlock where
neither can run, causing timeouts.
Additionally, setting VF trust triggers an unnecessary ~10 second VF reset
in the i40e driver that delays bonding setup, even though filter
synchronization happens naturally during normal VF operation. In the ice
driver the delay is shorter, but the reset is equally unnecessary.
This series:
1. Adds a safety guard to prevent MAC changes during reset or early
initialization (before VF is ready)
2. Eliminates unnecessary VF reset when setting trust in i40e
3. Fixes lock contention by polling admin queue synchronously
4. Eliminates unnecessary VF reset when setting trust in ice
5. Refactors virtchnl polling to unify init-time and runtime code paths
The key fix (patch 3/5) implements a synchronous MAC change operation
similar to the approach used for the ndo_change_mtu deadlock fix:
https://lore.kernel.org/intel-wired-lan/20260211191855.1532226-1-poros@redhat.com/
Instead of scheduling work and waiting, it:
- Sends the virtchnl message directly (not via watchdog)
- Polls the admin queue hardware directly for responses
- Processes all messages inline (including non-MAC messages)
- Returns when complete or times out
This allows the operation to complete synchronously while holding
netdev_lock, without relying on watchdog or adminq_task. A new generic
iavf_poll_virtchnl_response() function was introduced for this.
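The polling idea above can be modelled in userspace roughly as follows.
This is only a sketch for illustration: struct fake_adapter, fake_arq_pop(),
fake_completion(), poll_response() and the fake millisecond clock are all
hypothetical names, not driver code. The actual implementation is in
patch 3.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the admin receive queue: an array of opcodes. */
struct fake_arq {
	const int *msgs;
	size_t len;
	size_t next;
};

struct fake_adapter {
	struct fake_arq arq;
	int current_op;		/* 0 plays the role of VIRTCHNL_OP_UNKNOWN */
	unsigned int now_ms;	/* fake clock, advanced by each idle poll */
};

typedef bool (*done_fn)(const struct fake_adapter *, const void *);

/* Pop one message; returns false when the queue is empty. */
static bool fake_arq_pop(struct fake_arq *q, int *op, size_t *pending)
{
	if (q->next >= q->len)
		return false;
	*op = q->msgs[q->next++];
	*pending = q->len - q->next;
	return true;
}

/* Model of completion handling: a reply to the in-flight op clears it. */
static void fake_completion(struct fake_adapter *a, int op)
{
	if (op == a->current_op)
		a->current_op = 0;
}

/* Poll inline until done() is true or the fake deadline passes. */
static int poll_response(struct fake_adapter *a, done_fn done,
			 const void *data, unsigned int timeout_ms)
{
	unsigned int deadline = a->now_ms + timeout_ms;
	size_t pending;
	int op;

	assert(done != NULL);
	do {
		if (done(a, data))
			return 0;
		if (fake_arq_pop(&a->arq, &op, &pending)) {
			fake_completion(a, op);	/* all messages are processed */
			if (pending)
				continue;	/* skip the sleep, more is queued */
		}
		a->now_ms++;			/* stands in for usleep_range() */
	} while (a->now_ms < deadline);

	/* Final check after the deadline before reporting a timeout. */
	return done(a, data) ? 0 : -EAGAIN;
}

/* Example condition: the in-flight operation has been completed. */
static bool op_cleared(const struct fake_adapter *a, const void *data)
{
	(void)data;
	return a->current_op == 0;
}
```

In this model nothing else has to run for the caller to make progress,
which is the property the fix relies on.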
Patch 5 refactors the polling implementation based on feedback from Przemek
Kitszel. It merges the init-time polling (iavf_poll_virtchnl_msg()) and the
runtime polling added in patch 3 (iavf_poll_virtchnl_response()) into the
original iavf_poll_virtchnl_msg() function, which now supports both
behaviors.
I kept the refactoring in a separate patch to make the fix itself easier to
review, and I would prefer to keep it in the net series because it is
tightly coupled with patch 3.
The function can sleep for up to 2.5 seconds polling hardware, but this
is acceptable since netdev_lock is per-device and only serializes
operations on the same interface.
Testing shows VF bonding now works reliably in ~5 seconds vs 15+ seconds
before (i40e), without timeouts or errors (i40e and ice).
Tested on Intel 700-series (i40e) and 800-series (ice) dual-port NICs
with iavf driver.
Thanks to Jan Tluka <jtluka@redhat.com> and Yuying Ma <yuma@redhat.com> for
reporting the issues.
Jose Ignacio Tornos Martinez (5):
iavf: return EBUSY if reset in progress or not ready during MAC change
i40e: skip unnecessary VF reset when setting trust
iavf: send MAC change request synchronously
ice: skip unnecessary VF reset when setting trust
iavf: refactor virtchnl polling to unify init and runtime paths
---
v3:
- Updated patch 3 to address the comments from Przemek Kitszel
- Added patch 5: Refactor to unify polling into iavf_poll_virtchnl_msg()
function (Przemek Kitszel suggestion). It processes messages through
iavf_virtchnl_completion() when appropriate (runtime operations with
timeout; init-time operations continue to return raw messages without
completion processing).
- No changes to patches 1, 2 and 4 from v2
v2: https://lore.kernel.org/netdev/20260407165206.1121317-1-jtornosm@redhat.com/
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 7 ++++++-
drivers/net/ethernet/intel/iavf/iavf.h | 6 +++++-
drivers/net/ethernet/intel/iavf/iavf_main.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------
drivers/net/ethernet/intel/iavf/iavf_virtchnl.c | 162 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------------
drivers/net/ethernet/intel/ice/ice_sriov.c | 13 +++++++++----
5 files changed, 193 insertions(+), 64 deletions(-)
--
2.43.0
* [PATCH net v3 1/5] iavf: return EBUSY if reset in progress or not ready during MAC change
From: Jose Ignacio Tornos Martinez @ 2026-04-14 11:00 UTC
To: netdev
Cc: intel-wired-lan, jesse.brandeburg, anthony.l.nguyen, davem,
edumazet, kuba, pabeni, Jose Ignacio Tornos Martinez
In-Reply-To: <20260414110006.124286-1-jtornosm@redhat.com>
When a MAC address change is requested while the VF is resetting or still
initializing, return -EBUSY immediately instead of attempting the
operation.
Additionally, during early initialization states (before __IAVF_DOWN),
the PF may be slow to respond to MAC change requests, causing long
delays. Only allow MAC changes once the VF reaches __IAVF_DOWN state or
later, when the watchdog is running and the VF is ready for operations.
After commit ad7c7b2172c3 ("net: hold netdev instance lock
during sysfs operations"), MAC changes are called with the netdev lock
held, so we should not wait with the lock held during reset or
initialization. This allows the caller to retry or handle the busy state
appropriately without blocking other operations.
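As a rough illustration of the guard, here is a small userspace model.
The enum below is a shortened, hypothetical state list (only the ordering
relative to the "down" state matters), and struct vf_model and
set_mac_guard() are made-up names, not driver symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, shortened state list; states before VF_DOWN are "early". */
enum vf_state { VF_STARTUP, VF_INIT_CONFIG, VF_DOWN, VF_RUNNING };

struct vf_model {
	enum vf_state state;
	bool reset_in_progress;
};

/* Refuse early with -EBUSY instead of waiting with the lock held. */
static int set_mac_guard(const struct vf_model *vf)
{
	assert(vf != NULL);
	if (vf->reset_in_progress || vf->state < VF_DOWN)
		return -EBUSY;
	return 0;	/* safe to continue with the MAC change */
}
```

A caller that sees -EBUSY can retry later; the guard itself never sleeps.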
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
---
drivers/net/ethernet/intel/iavf/iavf_main.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index dad001abc908..67aa14350b1b 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -1060,6 +1060,9 @@ static int iavf_set_mac(struct net_device *netdev, void *p)
struct sockaddr *addr = p;
int ret;
+ if (iavf_is_reset_in_progress(adapter) || adapter->state < __IAVF_DOWN)
+ return -EBUSY;
+
if (!is_valid_ether_addr(addr->sa_data))
return -EADDRNOTAVAIL;
--
2.53.0
* [PATCH net v3 2/5] i40e: skip unnecessary VF reset when setting trust
From: Jose Ignacio Tornos Martinez @ 2026-04-14 11:00 UTC
To: netdev
Cc: intel-wired-lan, jesse.brandeburg, anthony.l.nguyen, davem,
edumazet, kuba, pabeni, Jose Ignacio Tornos Martinez
In-Reply-To: <20260414110006.124286-1-jtornosm@redhat.com>
When VF trust is changed, i40e_ndo_set_vf_trust() always calls
i40e_vc_reset_vf() to sync MAC/VLAN filters. However, this reset is
only necessary when trust is removed from a VF that has ADQ (Application
Device Queues) cloud filters, which need to be deleted.
In all other cases, the reset causes a ~10 second delay during which:
- VF must reinitialize completely
- Any in-progress operations (like bonding enslave) fail with timeouts
- VF is unavailable
The MAC/VLAN filter sync will happen naturally through the normal VF
operations and doesn't require a forced reset.
Fix by only resetting when actually needed: when removing trust from a
VF that has ADQ cloud filters. For all other trust changes, just update
the trust flag and let normal operation continue.
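The resulting decision can be sketched in userspace as below. This is a
simplified model, not the driver function: struct vf_trust_model and
set_vf_trust_model() are hypothetical names, and the filter deletion and
reset are reduced to a counter and a return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct vf_trust_model {
	bool trusted;
	bool adq_enabled;
	unsigned int cloud_filters;
};

/* Returns true when the trust change required a VF reset. */
static bool set_vf_trust_model(struct vf_trust_model *vf, bool setting)
{
	assert(vf != NULL);
	vf->trusted = setting;

	if (vf->adq_enabled && !vf->trusted) {
		vf->cloud_filters = 0;	/* models i40e_del_all_cloud_filters() */
		return true;		/* models i40e_vc_reset_vf() */
	}
	return false;			/* only the trust flag changed */
}
```

Only one of the four combinations of (adq_enabled, setting) ends in a
reset in this model; granting trust never does.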
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
---
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index a26c3d47ec15..fea267af7afe 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -4987,16 +4987,21 @@ int i40e_ndo_set_vf_trust(struct net_device *netdev, int vf_id, bool setting)
set_bit(__I40E_MACVLAN_SYNC_PENDING, pf->state);
pf->vsi[vf->lan_vsi_idx]->flags |= I40E_VSI_FLAG_FILTER_CHANGED;
- i40e_vc_reset_vf(vf, true);
dev_info(&pf->pdev->dev, "VF %u is now %strusted\n",
vf_id, setting ? "" : "un");
+ /* Only reset VF if we're removing trust and it has ADQ cloud filters.
+ * Cloud filters can only be added when trusted, so they must be
+ * removed when trust is revoked. Other trust changes don't require
+ * reset - MAC/VLAN filter sync happens through normal operation.
+ */
if (vf->adq_enabled) {
if (!vf->trusted) {
dev_info(&pf->pdev->dev,
"VF %u no longer Trusted, deleting all cloud filters\n",
vf_id);
i40e_del_all_cloud_filters(vf);
+ i40e_vc_reset_vf(vf, true);
}
}
--
2.53.0
* [PATCH net v3 3/5] iavf: send MAC change request synchronously
From: Jose Ignacio Tornos Martinez @ 2026-04-14 11:00 UTC
To: netdev
Cc: intel-wired-lan, jesse.brandeburg, anthony.l.nguyen, davem,
edumazet, kuba, pabeni, Jose Ignacio Tornos Martinez, stable
In-Reply-To: <20260414110006.124286-1-jtornosm@redhat.com>
After commit ad7c7b2172c3 ("net: hold netdev instance lock during sysfs
operations"), iavf_set_mac() is called with the netdev instance lock
already held.
The function queues a MAC address change request via
iavf_replace_primary_mac() and then waits for completion. However, in
the current flow, the actual virtchnl message is sent by the watchdog
task, which also needs to acquire the netdev lock to run. Additionally,
the adminq_task which processes virtchnl responses also needs the netdev
lock.
This creates a deadlock scenario:
1. iavf_set_mac() holds netdev lock and waits for MAC change
2. Watchdog needs netdev lock to send the request -> blocked
3. Even if request is sent, adminq_task needs netdev lock to process
PF response -> blocked
4. MAC change times out after 2.5 seconds
5. iavf_set_mac() returns -EAGAIN
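The sequence above can be reproduced with a toy single-threaded model.
This is only an illustration under simplifying assumptions: the lock is a
plain owner field, the two tasks are run from one "tick" function, and all
names (struct lock_model, run_workers(), set_mac_wait()) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum owner { NOBODY, SET_MAC, WATCHDOG, ADMINQ };

struct lock_model {
	enum owner lock_owner;	/* who holds the per-netdev lock */
	bool request_sent;	/* watchdog sent the virtchnl message */
	bool reply_handled;	/* adminq task processed the PF reply */
};

static bool try_lock(struct lock_model *m, enum owner who)
{
	if (m->lock_owner != NOBODY)
		return false;
	m->lock_owner = who;
	return true;
}

static void unlock(struct lock_model *m)
{
	m->lock_owner = NOBODY;
}

/* One scheduling tick for the two worker tasks; both need the lock. */
static void run_workers(struct lock_model *m)
{
	if (!m->request_sent && try_lock(m, WATCHDOG)) {
		m->request_sent = true;
		unlock(m);
	}
	if (m->request_sent && !m->reply_handled && try_lock(m, ADMINQ)) {
		m->reply_handled = true;
		unlock(m);
	}
}

/* Old flow: wait for the workers, optionally while holding the lock. */
static int set_mac_wait(struct lock_model *m, bool caller_holds_lock, int ticks)
{
	int ret = -EAGAIN;

	assert(ticks > 0);
	if (caller_holds_lock)
		m->lock_owner = SET_MAC;

	while (ticks--) {
		run_workers(m);
		if (m->reply_handled) {
			ret = 0;
			break;
		}
	}

	if (caller_holds_lock)
		unlock(m);
	return ret;
}
```

With caller_holds_lock set, neither worker ever gets the lock and the wait
ends in -EAGAIN; without it, the first tick completes the change.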
This particularly affects VFs during bonding setup when multiple VFs are
enslaved in quick succession.
Fix by implementing a synchronous MAC change operation similar to the
approach used in commit fdadbf6e84c4 ("iavf: fix incorrect reset handling
in callbacks").
The solution:
1. Send the virtchnl ADD_ETH_ADDR message directly (not via watchdog)
2. Poll the admin queue hardware directly for responses
3. Process all received messages (including non-MAC messages)
4. Return when MAC change completes or times out
A new generic function iavf_poll_virtchnl_response() is introduced that
can be reused for any future synchronous virtchnl operations. It takes a
callback to check completion, allowing flexible condition checking.
This allows the operation to complete synchronously while holding
netdev_lock, without relying on watchdog or adminq_task. The function
can sleep for up to 2.5 seconds polling hardware, but this is acceptable
since netdev_lock is per-device and only serializes operations on the
same interface.
To support this, change iavf_add_ether_addrs() to return an error code
instead of void, allowing callers to detect failures.
Fixes: ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations")
Cc: stable@vger.kernel.org
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
---
v3: Addressed Przemek Kitszel's comments:
- Moved iavf_poll_virtchnl_response() to iavf_virtchnl.c for reusability
- Changed kdoc to use "Return:" instead of "Returns"
- Changed to do-while loop structure
- Added pending parameter to skip sleep when more messages queued
- Reduced sleep time to 50-75 usec (from 1000-2000, per commit 9e3f23f44f32)
- Added v_opcode parameter for standard completion checking
- Callback parameter takes priority over opcode check
- Made cond_data parameter const
- Final condition check after timeout before returning -EAGAIN
v2: https://lore.kernel.org/netdev/20260407165206.1121317-4-jtornosm@redhat.com/
drivers/net/ethernet/intel/iavf/iavf.h | 7 +-
drivers/net/ethernet/intel/iavf/iavf_main.c | 57 ++++++---
.../net/ethernet/intel/iavf/iavf_virtchnl.c | 111 +++++++++++++++++-
3 files changed, 155 insertions(+), 20 deletions(-)
diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index e9fb0a0919e3..b012a91b0252 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -589,7 +589,7 @@ void iavf_configure_queues(struct iavf_adapter *adapter);
void iavf_enable_queues(struct iavf_adapter *adapter);
void iavf_disable_queues(struct iavf_adapter *adapter);
void iavf_map_queues(struct iavf_adapter *adapter);
-void iavf_add_ether_addrs(struct iavf_adapter *adapter);
+int iavf_add_ether_addrs(struct iavf_adapter *adapter);
void iavf_del_ether_addrs(struct iavf_adapter *adapter);
void iavf_add_vlans(struct iavf_adapter *adapter);
void iavf_del_vlans(struct iavf_adapter *adapter);
@@ -607,6 +607,11 @@ void iavf_disable_vlan_stripping(struct iavf_adapter *adapter);
void iavf_virtchnl_completion(struct iavf_adapter *adapter,
enum virtchnl_ops v_opcode,
enum iavf_status v_retval, u8 *msg, u16 msglen);
+int iavf_poll_virtchnl_response(struct iavf_adapter *adapter,
+ bool (*condition)(struct iavf_adapter *, const void *),
+ const void *cond_data,
+ enum virtchnl_ops v_opcode,
+ unsigned int timeout_ms);
int iavf_config_rss(struct iavf_adapter *adapter);
void iavf_cfg_queues_bw(struct iavf_adapter *adapter);
void iavf_cfg_queues_quanta_size(struct iavf_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 67aa14350b1b..80277d495a8d 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -1047,6 +1047,46 @@ static bool iavf_is_mac_set_handled(struct net_device *netdev,
return ret;
}
+/**
+ * iavf_mac_change_done - Check if MAC change completed
+ * @adapter: board private structure
+ * @data: MAC address being checked (as const void *)
+ *
+ * Callback for iavf_poll_virtchnl_response() to check if MAC change completed.
+ *
+ * Return: true if MAC change completed, false otherwise
+ */
+static bool iavf_mac_change_done(struct iavf_adapter *adapter, const void *data)
+{
+ const u8 *addr = data;
+
+ return iavf_is_mac_set_handled(adapter->netdev, addr);
+}
+
+/**
+ * iavf_set_mac_sync - Synchronously change MAC address
+ * @adapter: board private structure
+ * @addr: MAC address to set
+ *
+ * Sends MAC change request to PF and polls admin queue for response.
+ * Caller must hold netdev_lock. This can sleep for up to 2.5 seconds.
+ *
+ * Return: 0 on success, negative on failure
+ */
+static int iavf_set_mac_sync(struct iavf_adapter *adapter, const u8 *addr)
+{
+ int ret;
+
+ netdev_assert_locked(adapter->netdev);
+
+ ret = iavf_add_ether_addrs(adapter);
+ if (ret)
+ return ret;
+
+ return iavf_poll_virtchnl_response(adapter, iavf_mac_change_done, addr,
+ VIRTCHNL_OP_UNKNOWN, 2500);
+}
+
/**
* iavf_set_mac - NDO callback to set port MAC address
* @netdev: network interface device structure
@@ -1067,26 +1107,13 @@ static int iavf_set_mac(struct net_device *netdev, void *p)
return -EADDRNOTAVAIL;
ret = iavf_replace_primary_mac(adapter, addr->sa_data);
-
if (ret)
return ret;
- ret = wait_event_interruptible_timeout(adapter->vc_waitqueue,
- iavf_is_mac_set_handled(netdev, addr->sa_data),
- msecs_to_jiffies(2500));
-
- /* If ret < 0 then it means wait was interrupted.
- * If ret == 0 then it means we got a timeout.
- * else it means we got response for set MAC from PF,
- * check if netdev MAC was updated to requested MAC,
- * if yes then set MAC succeeded otherwise it failed return -EACCES
- */
- if (ret < 0)
+ ret = iavf_set_mac_sync(adapter, addr->sa_data);
+ if (ret)
return ret;
- if (!ret)
- return -EAGAIN;
-
if (!ether_addr_equal(netdev->dev_addr, addr->sa_data))
return -EACCES;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index a52c100dcbc5..df124f840ddb 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -2,6 +2,7 @@
/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include <linux/net/intel/libie/rx.h>
+#include <net/netdev_lock.h>
#include "iavf.h"
#include "iavf_ptp.h"
@@ -555,8 +556,10 @@ iavf_set_mac_addr_type(struct virtchnl_ether_addr *virtchnl_ether_addr,
* @adapter: adapter structure
*
* Request that the PF add one or more addresses to our filters.
+ *
+ * Return: 0 on success, negative on failure
**/
-void iavf_add_ether_addrs(struct iavf_adapter *adapter)
+int iavf_add_ether_addrs(struct iavf_adapter *adapter)
{
struct virtchnl_ether_addr_list *veal;
struct iavf_mac_filter *f;
@@ -568,7 +571,7 @@ void iavf_add_ether_addrs(struct iavf_adapter *adapter)
/* bail because we already have a command pending */
dev_err(&adapter->pdev->dev, "Cannot add filters, command %d pending\n",
adapter->current_op);
- return;
+ return -EBUSY;
}
spin_lock_bh(&adapter->mac_vlan_list_lock);
@@ -580,7 +583,7 @@ void iavf_add_ether_addrs(struct iavf_adapter *adapter)
if (!count) {
adapter->aq_required &= ~IAVF_FLAG_AQ_ADD_MAC_FILTER;
spin_unlock_bh(&adapter->mac_vlan_list_lock);
- return;
+ return 0;
}
adapter->current_op = VIRTCHNL_OP_ADD_ETH_ADDR;
@@ -595,7 +598,7 @@ void iavf_add_ether_addrs(struct iavf_adapter *adapter)
veal = kzalloc(len, GFP_ATOMIC);
if (!veal) {
spin_unlock_bh(&adapter->mac_vlan_list_lock);
- return;
+ return -ENOMEM;
}
veal->vsi_id = adapter->vsi_res->vsi_id;
@@ -617,6 +620,7 @@ void iavf_add_ether_addrs(struct iavf_adapter *adapter)
iavf_send_pf_msg(adapter, VIRTCHNL_OP_ADD_ETH_ADDR, (u8 *)veal, len);
kfree(veal);
+ return 0;
}
/**
@@ -2956,3 +2960,102 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
} /* switch v_opcode */
adapter->current_op = VIRTCHNL_OP_UNKNOWN;
}
+
+/**
+ * iavf_virtchnl_done - Check if virtchnl operation completed
+ * @adapter: board private structure
+ * @condition: optional callback for custom completion check
+ * (takes priority)
+ * @cond_data: context data for callback
+ * @v_opcode: virtchnl opcode that current_op must reach when no condition
+ * callback is given (typically VIRTCHNL_OP_UNKNOWN); ignored otherwise
+ *
+ * Checks completion status. Callback takes priority if provided. Otherwise
+ * waits for current_op to reach v_opcode (typically VIRTCHNL_OP_UNKNOWN
+ * after completion).
+ *
+ * Return: true if operation completed
+ */
+static inline bool iavf_virtchnl_done(struct iavf_adapter *adapter,
+ bool (*condition)(struct iavf_adapter *, const void *),
+ const void *cond_data,
+ enum virtchnl_ops v_opcode)
+{
+ if (condition)
+ return condition(adapter, cond_data);
+
+ return adapter->current_op == v_opcode;
+}
+
+/**
+ * iavf_poll_virtchnl_response - Poll admin queue for virtchnl response
+ * @adapter: board private structure
+ * @condition: optional callback to check if desired response received
+ * (takes priority)
+ * @cond_data: context data passed to condition callback
+ * @v_opcode: virtchnl opcode to wait for when no condition callback is given
+ * (typically VIRTCHNL_OP_UNKNOWN); ignored otherwise
+ * @timeout_ms: maximum time to wait in milliseconds
+ *
+ * Polls admin queue and processes all messages until condition returns true
+ * or timeout expires. If condition is NULL, waits for current_op to become
+ * v_opcode (typically VIRTCHNL_OP_UNKNOWN after operation completes).
+ * Caller must hold netdev_lock. This can sleep for up to timeout_ms while
+ * polling hardware.
+ *
+ * Return: 0 if the condition is met, -EAGAIN on timeout, -ENOMEM on allocation failure
+ */
+int iavf_poll_virtchnl_response(struct iavf_adapter *adapter,
+ bool (*condition)(struct iavf_adapter *, const void *),
+ const void *cond_data,
+ enum virtchnl_ops v_opcode,
+ unsigned int timeout_ms)
+{
+ struct iavf_hw *hw = &adapter->hw;
+ struct iavf_arq_event_info event;
+ enum virtchnl_ops v_op;
+ enum iavf_status v_ret;
+ unsigned long timeout;
+ u16 pending;
+ int ret;
+
+ netdev_assert_locked(adapter->netdev);
+
+ event.buf_len = IAVF_MAX_AQ_BUF_SIZE;
+ event.msg_buf = kzalloc(event.buf_len, GFP_KERNEL);
+ if (!event.msg_buf)
+ return -ENOMEM;
+
+ timeout = jiffies + msecs_to_jiffies(timeout_ms);
+ do {
+ if (iavf_virtchnl_done(adapter, condition, cond_data, v_opcode)) {
+ ret = 0;
+ goto out;
+ }
+
+ ret = iavf_clean_arq_element(hw, &event, &pending);
+ if (!ret) {
+ v_op = (enum virtchnl_ops)le32_to_cpu(event.desc.cookie_high);
+ v_ret = (enum iavf_status)le32_to_cpu(event.desc.cookie_low);
+
+ iavf_virtchnl_completion(adapter, v_op, v_ret,
+ event.msg_buf, event.msg_len);
+
+ memset(event.msg_buf, 0, IAVF_MAX_AQ_BUF_SIZE);
+
+ if (pending)
+ continue;
+ }
+
+ usleep_range(50, 75);
+ } while (time_before(jiffies, timeout));
+
+ if (iavf_virtchnl_done(adapter, condition, cond_data, v_opcode))
+ ret = 0;
+ else
+ ret = -EAGAIN;
+
+out:
+ kfree(event.msg_buf);
+ return ret;
+}
--
2.53.0
* [PATCH net v3 4/5] ice: skip unnecessary VF reset when setting trust
From: Jose Ignacio Tornos Martinez @ 2026-04-14 11:00 UTC
To: netdev
Cc: intel-wired-lan, jesse.brandeburg, anthony.l.nguyen, davem,
edumazet, kuba, pabeni, Jose Ignacio Tornos Martinez
In-Reply-To: <20260414110006.124286-1-jtornosm@redhat.com>
Similar to the i40e fix, ice_set_vf_trust() unconditionally calls
ice_reset_vf() when the trust setting changes.
The ice driver already has logic to clean up MAC LLDP filters when
removing trust, which is the only operation that requires filter
synchronization. After this cleanup, the VF reset is only necessary if
there were actually filters to remove.
For all other trust state changes (setting trust, or removing trust
when no filters exist), the reset is unnecessary as filter
synchronization happens naturally through normal VF operations.
Fix by only triggering the VF reset when removing trust AND filters
were actually cleaned up (num_mac_lldp was non-zero).
This saves time and avoids needless service disruption when changing VF
trust settings.
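A minimal userspace model of the new ice flow, for illustration only.
struct ice_vf_model and ice_set_trust_model() are hypothetical names; the
filter cleanup is reduced to a counter and the reset to a tally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ice_vf_model {
	bool trusted;
	unsigned int num_mac_lldp;	/* MAC LLDP filters owned by the VF */
	unsigned int resets;		/* how many VF resets were triggered */
};

static void ice_set_trust_model(struct ice_vf_model *vf, bool trusted)
{
	assert(vf != NULL);
	vf->trusted = trusted;

	/* Reset only when trust is removed and filters were cleaned up. */
	if (!trusted && vf->num_mac_lldp) {
		while (vf->num_mac_lldp)
			vf->num_mac_lldp--;	/* models ice_vf_update_mac_lldp_num() */
		vf->resets++;			/* models ice_reset_vf() */
	}
}
```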
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
---
drivers/net/ethernet/intel/ice/ice_sriov.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.c b/drivers/net/ethernet/intel/ice/ice_sriov.c
index 7e00e091756d..23f692b1e86c 100644
--- a/drivers/net/ethernet/intel/ice/ice_sriov.c
+++ b/drivers/net/ethernet/intel/ice/ice_sriov.c
@@ -1399,14 +1399,19 @@ int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted)
mutex_lock(&vf->cfg_lock);
- while (!trusted && vf->num_mac_lldp)
- ice_vf_update_mac_lldp_num(vf, ice_get_vf_vsi(vf), false);
-
vf->trusted = trusted;
- ice_reset_vf(vf, ICE_VF_RESET_NOTIFY);
dev_info(ice_pf_to_dev(pf), "VF %u is now %strusted\n",
vf_id, trusted ? "" : "un");
+ /* Only reset VF if removing trust and there are MAC LLDP filters
+ * to clean up. Reset is needed to ensure filter removal completes.
+ */
+ if (!trusted && vf->num_mac_lldp) {
+ while (vf->num_mac_lldp)
+ ice_vf_update_mac_lldp_num(vf, ice_get_vf_vsi(vf), false);
+ ice_reset_vf(vf, ICE_VF_RESET_NOTIFY);
+ }
+
mutex_unlock(&vf->cfg_lock);
out_put_vf:
--
2.53.0
* [PATCH net v3 5/5] iavf: refactor virtchnl polling into single function
From: Jose Ignacio Tornos Martinez @ 2026-04-14 11:00 UTC
To: netdev
Cc: intel-wired-lan, jesse.brandeburg, anthony.l.nguyen, davem,
edumazet, kuba, pabeni, Jose Ignacio Tornos Martinez,
Przemek Kitszel
In-Reply-To: <20260414110006.124286-1-jtornosm@redhat.com>
Currently, the driver has two separate functions for polling virtchnl
messages from the admin queue:
- iavf_poll_virtchnl_msg() for init-time (no timeout, no completion
handler)
- iavf_poll_virtchnl_response() for runtime (with timeout, calls
completion)
Refactor by enhancing iavf_poll_virtchnl_msg() to handle both use cases:
1. Init-time mode (timeout_ms=0):
- Polls until matching opcode found or queue empty
- Returns raw message data without processing through completion handler
- Exits immediately on empty queue (no sleep/retry)
2. Runtime mode (timeout_ms>0):
- Polls with timeout using condition callback or opcode check
- Processes all messages through iavf_virtchnl_completion()
- Supports custom completion callback (takes priority) or falls back
to checking adapter->current_op against expected opcode
- Uses pending parameter to skip sleep when more messages queued
- Uses 50-75 usec sleep (due to commit 9e3f23f44f32 ("i40e: reduce wait
time for adminq command completion"))
By unifying message handling, both init-time and runtime messages can be
processed through the completion handler when appropriate, ensuring
consistent state updates and maintaining backward compatibility with all
existing call sites.
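The two modes can be sketched in userspace as follows. This is a reduced
model under stated assumptions, not the refactored driver function: the
queue is an array of opcodes, elapsed time is counted in idle polls, the
condition callback is left out, -ENOENT is a made-up stand-in for the
"queue empty" error, and poll_msg_model() and the structs are hypothetical
names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define OP_NONE 0	/* plays the role of VIRTCHNL_OP_UNKNOWN */

struct fake_queue {
	const int *ops;
	size_t len;
	size_t next;
};

struct fake_state {
	int current_op;		/* in-flight operation, OP_NONE when idle */
	unsigned int polls;	/* idle polls so far; stands in for time */
};

/*
 * timeout == 0: init-time mode. Return as soon as op_to_poll is seen or
 *               the queue is empty; messages skip the completion step.
 * timeout  > 0: runtime mode. Every message goes through the completion
 *               step; stop when current_op == op_to_poll or the budget
 *               of idle polls is spent.
 */
static int poll_msg_model(struct fake_queue *q, struct fake_state *s,
			  int op_to_poll, unsigned int timeout)
{
	assert(q != NULL && s != NULL);

	for (;;) {
		if (timeout && s->current_op == op_to_poll)
			return 0;

		if (q->next < q->len) {
			int op = q->ops[q->next++];

			if (!timeout) {
				if (op == op_to_poll)
					return 0;	/* raw message found */
			} else if (op == s->current_op) {
				s->current_op = OP_NONE;	/* completion step */
			}
			continue;
		}

		if (!timeout)
			return -ENOENT;	/* queue empty, no sleep or retry */

		if (++s->polls >= timeout)
			return s->current_op == op_to_poll ? 0 : -EAGAIN;
	}
}
```

Existing init-time callers map to timeout == 0 in this model, so their
behavior on an empty queue stays an immediate error return.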
Suggested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
---
drivers/net/ethernet/intel/iavf/iavf.h | 9 +-
drivers/net/ethernet/intel/iavf/iavf_main.c | 13 +-
.../net/ethernet/intel/iavf/iavf_virtchnl.c | 247 ++++++++----------
3 files changed, 125 insertions(+), 144 deletions(-)
diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index b012a91b0252..9b25c5a65d2a 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -607,11 +607,10 @@ void iavf_disable_vlan_stripping(struct iavf_adapter *adapter);
void iavf_virtchnl_completion(struct iavf_adapter *adapter,
enum virtchnl_ops v_opcode,
enum iavf_status v_retval, u8 *msg, u16 msglen);
-int iavf_poll_virtchnl_response(struct iavf_adapter *adapter,
- bool (*condition)(struct iavf_adapter *, const void *),
- const void *cond_data,
- enum virtchnl_ops v_opcode,
- unsigned int timeout_ms);
+int iavf_poll_virtchnl_msg(struct iavf_hw *hw, struct iavf_arq_event_info *event,
+ enum virtchnl_ops op_to_poll, unsigned int timeout_ms,
+ bool (*condition)(struct iavf_adapter *, const void *),
+ const void *cond_data);
int iavf_config_rss(struct iavf_adapter *adapter);
void iavf_cfg_queues_bw(struct iavf_adapter *adapter);
void iavf_cfg_queues_quanta_size(struct iavf_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 80277d495a8d..b0db15fd8ddb 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -1075,6 +1075,7 @@ static bool iavf_mac_change_done(struct iavf_adapter *adapter, const void *data)
*/
static int iavf_set_mac_sync(struct iavf_adapter *adapter, const u8 *addr)
{
+ struct iavf_arq_event_info event;
int ret;
netdev_assert_locked(adapter->netdev);
@@ -1083,8 +1084,16 @@ static int iavf_set_mac_sync(struct iavf_adapter *adapter, const u8 *addr)
if (ret)
return ret;
- return iavf_poll_virtchnl_response(adapter, iavf_mac_change_done, addr,
- VIRTCHNL_OP_UNKNOWN, 2500);
+ event.buf_len = IAVF_MAX_AQ_BUF_SIZE;
+ event.msg_buf = kzalloc(event.buf_len, GFP_KERNEL);
+ if (!event.msg_buf)
+ return -ENOMEM;
+
+ ret = iavf_poll_virtchnl_msg(&adapter->hw, &event, VIRTCHNL_OP_UNKNOWN,
+ 2500, iavf_mac_change_done, addr);
+
+ kfree(event.msg_buf);
+ return ret;
}
/**
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index df124f840ddb..ef9a251060d9 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -54,55 +54,121 @@ int iavf_send_api_ver(struct iavf_adapter *adapter)
}
/**
- * iavf_poll_virtchnl_msg
+ * iavf_virtchnl_completion_done - Check if virtchnl operation completed
+ * @adapter: adapter structure
+ * @condition: optional callback for custom completion check
+ * @cond_data: context data for callback
+ * @op_to_poll: opcode to check against current_op (if no callback)
+ *
+ * Checks if operation is complete. Callback takes priority if provided,
+ * otherwise checks if current_op matches op_to_poll.
+ *
+ * Return: true if operation completed
+ */
+static inline bool
+iavf_virtchnl_completion_done(struct iavf_adapter *adapter,
+ bool (*condition)(struct iavf_adapter *, const void *),
+ const void *cond_data,
+ enum virtchnl_ops op_to_poll)
+{
+ if (condition)
+ return condition(adapter, cond_data);
+
+ return adapter->current_op == op_to_poll;
+}
+
+/**
+ * iavf_poll_virtchnl_msg - Poll admin queue for virtchnl message
* @hw: HW configuration structure
* @event: event to populate on success
- * @op_to_poll: requested virtchnl op to poll for
+ * @op_to_poll: virtchnl opcode to poll for (used for init-time and runtime
+ * without callback)
+ * @timeout_ms: timeout in milliseconds (0 = no timeout, exit on empty queue)
+ * @condition: optional callback to check custom completion (runtime use,
+ * takes priority over op_to_poll check)
+ * @cond_data: context data for condition callback
+ *
+ * Enhanced polling function that handles both init-time and runtime use cases:
+ * - Init-time: Set op_to_poll, timeout_ms=0, condition=NULL
+ * Polls until matching opcode found or queue empty
+ * - Runtime with callback: Set timeout_ms>0, condition callback, cond_data
+ * Polls with timeout until condition returns true (op_to_poll not used)
+ * - Runtime without callback: Set op_to_poll, timeout_ms>0, condition=NULL
+ * Polls with timeout until adapter->current_op == op_to_poll
+ *
+ * Runtime messages are processed through iavf_virtchnl_completion().
+ * For init-time use, returns 0 with raw message data in event buffer.
+ * For runtime use, returns 0 when completion condition is met.
*
- * Initialize poll for virtchnl msg matching the requested_op. Returns 0
- * if a message of the correct opcode is in the queue or an error code
- * if no message matching the op code is waiting and other failures.
+ * Return: 0 on success, -EAGAIN on timeout, or error code
*/
-static int
-iavf_poll_virtchnl_msg(struct iavf_hw *hw, struct iavf_arq_event_info *event,
- enum virtchnl_ops op_to_poll)
+int iavf_poll_virtchnl_msg(struct iavf_hw *hw, struct iavf_arq_event_info *event,
+ enum virtchnl_ops op_to_poll, unsigned int timeout_ms,
+ bool (*condition)(struct iavf_adapter *, const void *),
+ const void *cond_data)
{
+ struct iavf_adapter *adapter = hw->back;
+ unsigned long timeout = timeout_ms ? jiffies + msecs_to_jiffies(timeout_ms) : 0;
enum virtchnl_ops received_op;
enum iavf_status status;
- u32 v_retval;
+ u32 v_retval = 0;
+ u16 pending;
- while (1) {
- /* When the AQ is empty, iavf_clean_arq_element will return
- * nonzero and this loop will terminate.
- */
- status = iavf_clean_arq_element(hw, event, NULL);
- if (status != IAVF_SUCCESS)
- return iavf_status_to_errno(status);
- received_op =
- (enum virtchnl_ops)le32_to_cpu(event->desc.cookie_high);
+ do {
+ if (timeout_ms && iavf_virtchnl_completion_done(adapter, condition,
+ cond_data, op_to_poll))
+ return 0;
- if (received_op == VIRTCHNL_OP_EVENT) {
- struct iavf_adapter *adapter = hw->back;
- struct virtchnl_pf_event *vpe =
- (struct virtchnl_pf_event *)event->msg_buf;
+ status = iavf_clean_arq_element(hw, event, &pending);
+ if (status == IAVF_SUCCESS) {
+ received_op = (enum virtchnl_ops)le32_to_cpu(event->desc.cookie_high);
- if (vpe->event != VIRTCHNL_EVENT_RESET_IMPENDING)
- continue;
+ /* Handle reset events specially */
+ if (received_op == VIRTCHNL_OP_EVENT) {
+ struct virtchnl_pf_event *vpe =
+ (struct virtchnl_pf_event *)event->msg_buf;
- dev_info(&adapter->pdev->dev, "Reset indication received from the PF\n");
- if (!(adapter->flags & IAVF_FLAG_RESET_PENDING))
- iavf_schedule_reset(adapter,
- IAVF_FLAG_RESET_PENDING);
+ if (vpe->event != VIRTCHNL_EVENT_RESET_IMPENDING)
+ continue;
+
+ dev_info(&adapter->pdev->dev,
+ "Reset indication received from the PF\n");
+ if (!(adapter->flags & IAVF_FLAG_RESET_PENDING))
+ iavf_schedule_reset(adapter,
+ IAVF_FLAG_RESET_PENDING);
+
+ return -EIO;
+ }
+
+ v_retval = le32_to_cpu(event->desc.cookie_low);
+
+ if (!timeout_ms) {
+ if (received_op == op_to_poll)
+ return virtchnl_status_to_errno((enum virtchnl_status_code)
+ v_retval);
+ } else {
+ iavf_virtchnl_completion(adapter, received_op,
+ (enum iavf_status)v_retval,
+ event->msg_buf, event->msg_len);
+ }
+
+ if (pending)
+ continue;
+ } else if (!timeout_ms) {
+ return iavf_status_to_errno(status);
+ }
- return -EIO;
+ if (timeout_ms) {
+ memset(event->msg_buf, 0, IAVF_MAX_AQ_BUF_SIZE);
+ usleep_range(50, 75);
}
- if (op_to_poll == received_op)
- break;
- }
+ } while (!timeout_ms || time_before(jiffies, timeout));
+
+ if (iavf_virtchnl_completion_done(adapter, condition, cond_data, op_to_poll))
+ return 0;
- v_retval = le32_to_cpu(event->desc.cookie_low);
- return virtchnl_status_to_errno((enum virtchnl_status_code)v_retval);
+ return -EAGAIN;
}
/**
@@ -124,7 +190,8 @@ int iavf_verify_api_ver(struct iavf_adapter *adapter)
if (!event.msg_buf)
return -ENOMEM;
- err = iavf_poll_virtchnl_msg(&adapter->hw, &event, VIRTCHNL_OP_VERSION);
+ err = iavf_poll_virtchnl_msg(&adapter->hw, &event, VIRTCHNL_OP_VERSION,
+ 0, NULL, NULL);
if (!err) {
struct virtchnl_version_info *pf_vvi =
(struct virtchnl_version_info *)event.msg_buf;
@@ -294,7 +361,8 @@ int iavf_get_vf_config(struct iavf_adapter *adapter)
if (!event.msg_buf)
return -ENOMEM;
- err = iavf_poll_virtchnl_msg(hw, &event, VIRTCHNL_OP_GET_VF_RESOURCES);
+ err = iavf_poll_virtchnl_msg(hw, &event, VIRTCHNL_OP_GET_VF_RESOURCES,
+ 0, NULL, NULL);
memcpy(adapter->vf_res, event.msg_buf, min(event.msg_len, len));
/* some PFs send more queues than we should have so validate that
@@ -322,7 +390,8 @@ int iavf_get_vf_vlan_v2_caps(struct iavf_adapter *adapter)
return -ENOMEM;
err = iavf_poll_virtchnl_msg(&adapter->hw, &event,
- VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS);
+ VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS,
+ 0, NULL, NULL);
if (!err)
memcpy(&adapter->vlan_v2_caps, event.msg_buf,
min(event.msg_len, len));
@@ -342,7 +411,8 @@ int iavf_get_vf_supported_rxdids(struct iavf_adapter *adapter)
event.buf_len = sizeof(rxdids);
err = iavf_poll_virtchnl_msg(&adapter->hw, &event,
- VIRTCHNL_OP_GET_SUPPORTED_RXDIDS);
+ VIRTCHNL_OP_GET_SUPPORTED_RXDIDS,
+ 0, NULL, NULL);
if (!err)
adapter->supp_rxdids = rxdids;
@@ -359,7 +429,8 @@ int iavf_get_vf_ptp_caps(struct iavf_adapter *adapter)
event.buf_len = sizeof(caps);
err = iavf_poll_virtchnl_msg(&adapter->hw, &event,
- VIRTCHNL_OP_1588_PTP_GET_CAPS);
+ VIRTCHNL_OP_1588_PTP_GET_CAPS,
+ 0, NULL, NULL);
if (!err)
adapter->ptp.hw_caps = caps;
@@ -2961,101 +3032,3 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
adapter->current_op = VIRTCHNL_OP_UNKNOWN;
}
-/**
- * iavf_virtchnl_done - Check if virtchnl operation completed
- * @adapter: board private structure
- * @condition: optional callback for custom completion check
- * (takes priority)
- * @cond_data: context data for callback
- * @v_opcode: virtchnl opcode value we're waiting for if no condition
- * configured (typically VIRTCHNL_OP_UNKNOWN), if condition not used
- *
- * Checks completion status. Callback takes priority if provided. Otherwise
- * waits for current_op to reach v_opcode (typically VIRTCHNL_OP_UNKNOWN
- * after completion).
- *
- * Return: true if operation completed
- */
-static inline bool iavf_virtchnl_done(struct iavf_adapter *adapter,
- bool (*condition)(struct iavf_adapter *, const void *),
- const void *cond_data,
- enum virtchnl_ops v_opcode)
-{
- if (condition)
- return condition(adapter, cond_data);
-
- return adapter->current_op == v_opcode;
-}
-
-/**
- * iavf_poll_virtchnl_response - Poll admin queue for virtchnl response
- * @adapter: board private structure
- * @condition: optional callback to check if desired response received
- * (takes priority)
- * @cond_data: context data passed to condition callback
- * @v_opcode: virtchnl opcode value to wait for if no condition configured
- * (typically VIRTCHNL_OP_UNKNOWN), if condition, not used
- * @timeout_ms: maximum time to wait in milliseconds
- *
- * Polls admin queue and processes all messages until condition returns true
- * or timeout expires. If condition is NULL, waits for current_op to become
- * v_opcode (typically VIRTCHNL_OP_UNKNOWN after operation completes).
- * Caller must hold netdev_lock. This can sleep for up to timeout_ms while
- * polling hardware.
- *
- * Return: 0 on success (condition met), -EAGAIN on timeout or error
- */
-int iavf_poll_virtchnl_response(struct iavf_adapter *adapter,
- bool (*condition)(struct iavf_adapter *, const void *),
- const void *cond_data,
- enum virtchnl_ops v_opcode,
- unsigned int timeout_ms)
-{
- struct iavf_hw *hw = &adapter->hw;
- struct iavf_arq_event_info event;
- enum virtchnl_ops v_op;
- enum iavf_status v_ret;
- unsigned long timeout;
- u16 pending;
- int ret;
-
- netdev_assert_locked(adapter->netdev);
-
- event.buf_len = IAVF_MAX_AQ_BUF_SIZE;
- event.msg_buf = kzalloc(event.buf_len, GFP_KERNEL);
- if (!event.msg_buf)
- return -ENOMEM;
-
- timeout = jiffies + msecs_to_jiffies(timeout_ms);
- do {
- if (iavf_virtchnl_done(adapter, condition, cond_data, v_opcode)) {
- ret = 0;
- goto out;
- }
-
- ret = iavf_clean_arq_element(hw, &event, &pending);
- if (!ret) {
- v_op = (enum virtchnl_ops)le32_to_cpu(event.desc.cookie_high);
- v_ret = (enum iavf_status)le32_to_cpu(event.desc.cookie_low);
-
- iavf_virtchnl_completion(adapter, v_op, v_ret,
- event.msg_buf, event.msg_len);
-
- memset(event.msg_buf, 0, IAVF_MAX_AQ_BUF_SIZE);
-
- if (pending)
- continue;
- }
-
- usleep_range(50, 75);
- } while (time_before(jiffies, timeout));
-
- if (iavf_virtchnl_done(adapter, condition, cond_data, v_opcode))
- ret = 0;
- else
- ret = -EAGAIN;
-
-out:
- kfree(event.msg_buf);
- return ret;
-}
--
2.53.0
^ permalink raw reply related
* Re: [PATCH v3 1/3] net: dsa: microchip: implement KSZ87xx Module 3 low-loss cable errata
From: Marek Vasut @ 2026-04-14 11:05 UTC (permalink / raw)
To: Fidelio Lawson, Woojung Huh, UNGLinuxDriver, Andrew Lunn,
Vladimir Oltean, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Marek Vasut, Maxime Chevallier, Simon Horman,
Heiner Kallweit, Russell King
Cc: netdev, linux-kernel, Fidelio Lawson
In-Reply-To: <20260414-ksz87xx_errata_low_loss_connections-v3-1-0e3838ca98c9@exotec.com>
On 4/14/26 11:12 AM, Fidelio Lawson wrote:
> Implement the "Module 3: Equalizer fix for short cables" erratum from
> Microchip document DS80000687C for KSZ87xx switches.
>
> The issue affects short or low-loss cable links (e.g. CAT5e/CAT6),
> where the PHY receiver equalizer may amplify high-amplitude signals
> excessively, resulting in internal distortion and link establishment
> failures.
>
> KSZ87xx devices require a workaround for the Module 3 low-loss cable
> condition, controlled through the switch TABLE_LINK_MD_V indirect
> registers.
>
> The affected registers are part of the switch address space and are not
> directly accessible from the PHY driver. To keep the PHY-facing API
> clean and avoid leaking switch-specific details, model this errata
> control as vendor-specific Clause 22 PHY registers.
>
> A vendor-specific Clause 22 PHY register is introduced as a mode
> selector in PHY_REG_LOW_LOSS_CTRL, and ksz8_r_phy() / ksz8_w_phy()
> translate accesses to these bits into the appropriate indirect
> TABLE_LINK_MD_V accesses.
>
> The control register defines the following modes:
> 0: disabled (default behavior)
> 1: EQ training workaround
> 2: LPF 90 MHz
> 3: LPF 62 MHz
> 4: LPF 55 MHz
> 5: LPF 44 MHz
I may not fully understand this, but aren't the EQ and LPF settings
orthogonal ?
^ permalink raw reply
* RE: [PATCH iwl-next v2 1/2] idpf: remove conditonal MBX deinit from idpf_vc_core_deinit()
From: Loktionov, Aleksandr @ 2026-04-14 11:07 UTC (permalink / raw)
To: Tantilov, Emil S, intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, Kitszel, Przemyslaw, Bhat, Jay,
Barrera, Ivan D, Zaremba, Larysa, Nguyen, Anthony L,
andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com, Lobakin, Aleksander,
linux-pci@vger.kernel.org, Chittim, Madhu, decot@google.com,
willemb@google.com, sheenamo@google.com, lukas@wunner.de
In-Reply-To: <20260414031631.2107-2-emil.s.tantilov@intel.com>
> -----Original Message-----
> From: Tantilov, Emil S <emil.s.tantilov@intel.com>
> Sent: Tuesday, April 14, 2026 5:17 AM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; Bhat, Jay <jay.bhat@intel.com>;
> Barrera, Ivan D <ivan.d.barrera@intel.com>; Loktionov, Aleksandr
> <aleksandr.loktionov@intel.com>; Zaremba, Larysa
> <larysa.zaremba@intel.com>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; andrew+netdev@lunn.ch;
> davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com; Lobakin, Aleksander <aleksander.lobakin@intel.com>;
> linux-pci@vger.kernel.org; Chittim, Madhu <madhu.chittim@intel.com>;
> decot@google.com; willemb@google.com; sheenamo@google.com;
> lukas@wunner.de
> Subject: [PATCH iwl-next v2 1/2] idpf: remove conditonal MBX deinit
> from idpf_vc_core_deinit()
"conditonal" -> "conditional"
Everything else looks fine
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
>
> Previously it was assumed that idpf_vc_core_deinit() is always being
> called during reset handling, with remove being an exception. Ideally
> the driver needs to communicate the changes to FW in all instances
> where the MBX is not already disabled. Remove the remove_in_prog check
> from
> idpf_vc_core_deinit() as the MBX was already disabled while handling
> the reset via libie_ctlq_xn_shutdown() by the service task. This is
> also needed by the following patch, introducing PCI callbacks support.
>
> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
> Reviewed-by: Jay Bhat <jay.bhat@intel.com>
> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
> ---
> drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 11 +----------
> 1 file changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> index 129c8f6b0faa..fceaf3ec1cd4 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> @@ -3229,24 +3229,15 @@ int idpf_vc_core_init(struct idpf_adapter
> *adapter)
> */
> void idpf_vc_core_deinit(struct idpf_adapter *adapter) {
> - bool remove_in_prog;
> -
> if (!test_bit(IDPF_VC_CORE_INIT, adapter->flags))
> return;
>
> - /* Avoid transaction timeouts when called during reset */
> - remove_in_prog = test_bit(IDPF_REMOVE_IN_PROG, adapter->flags);
> - if (!remove_in_prog)
> - idpf_deinit_dflt_mbx(adapter);
> -
> idpf_ptp_release(adapter);
> idpf_deinit_task(adapter);
> idpf_idc_deinit_core_aux_device(adapter);
> idpf_rel_rx_pt_lkup(adapter);
> idpf_intr_rel(adapter);
> -
> - if (remove_in_prog)
> - idpf_deinit_dflt_mbx(adapter);
> + idpf_deinit_dflt_mbx(adapter);
>
> cancel_delayed_work_sync(&adapter->serv_task);
>
> --
> 2.37.3
^ permalink raw reply
* RE: [PATCH iwl-next v2 2/2] idpf: implement pci error handlers
From: Loktionov, Aleksandr @ 2026-04-14 11:09 UTC (permalink / raw)
To: Tantilov, Emil S, intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, Kitszel, Przemyslaw, Bhat, Jay,
Barrera, Ivan D, Zaremba, Larysa, Nguyen, Anthony L,
andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com, Lobakin, Aleksander,
linux-pci@vger.kernel.org, Chittim, Madhu, decot@google.com,
willemb@google.com, sheenamo@google.com, lukas@wunner.de
In-Reply-To: <20260414031631.2107-3-emil.s.tantilov@intel.com>
> -----Original Message-----
> From: Tantilov, Emil S <emil.s.tantilov@intel.com>
> Sent: Tuesday, April 14, 2026 5:17 AM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; Bhat, Jay <jay.bhat@intel.com>;
> Barrera, Ivan D <ivan.d.barrera@intel.com>; Loktionov, Aleksandr
> <aleksandr.loktionov@intel.com>; Zaremba, Larysa
> <larysa.zaremba@intel.com>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; andrew+netdev@lunn.ch;
> davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com; Lobakin, Aleksander <aleksander.lobakin@intel.com>;
> linux-pci@vger.kernel.org; Chittim, Madhu <madhu.chittim@intel.com>;
> decot@google.com; willemb@google.com; sheenamo@google.com;
> lukas@wunner.de
> Subject: [PATCH iwl-next v2 2/2] idpf: implement pci error handlers
>
> Add callbacks to handle PCI errors and FLR reset. When preparing to
> handle reset on the bus, the driver must stop all operations that can
> lead to MMIO access in order to prevent HW errors. To accomplish this
> introduce helper
> idpf_reset_prepare() that gets called prior to FLR or when PCI error
> is detected. Upon resume the recovery is done through the existing
> reset path by starting the event task.
>
> The following callbacks are implemented:
> .reset_prepare runs the first portion of the generic reset path
> leading up to the part where we wait for the reset to complete.
> .reset_done/resume runs the recovery part of the reset handling.
> .error_detected is the callback dealing with PCI errors, similar to
> the prepare call, we stop all operations, prior to attempting a
> recovery.
> .slot_reset is the callback attempting to restore the device, provided
> a PCI reset was initiated by the AER driver.
>
> Whereas previously the init logic guaranteed netdevs during reset, the
> addition of idpf_detach_and_close() to the PCI callbacks flow makes it
> possible for the function to be called without netdevs. Add check to
> avoid NULL pointer dereference in that case.
>
> Co-developed-by: Alan Brady <alan.brady@intel.com>
> Signed-off-by: Alan Brady <alan.brady@intel.com>
> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
> Reviewed-by: Jay Bhat <jay.bhat@intel.com>
> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
> ---
> drivers/net/ethernet/intel/idpf/idpf.h | 3 +
> drivers/net/ethernet/intel/idpf/idpf_lib.c | 13 ++-
> drivers/net/ethernet/intel/idpf/idpf_main.c | 112 ++++++++++++++++++++
> 3 files changed, 126 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/idpf/idpf.h
> b/drivers/net/ethernet/intel/idpf/idpf.h
> index 1d0e32e47e87..164d2f3e233a 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf.h
> +++ b/drivers/net/ethernet/intel/idpf/idpf.h
> @@ -88,6 +88,7 @@ enum idpf_state {
> * @IDPF_REMOVE_IN_PROG: Driver remove in progress
> * @IDPF_MB_INTR_MODE: Mailbox in interrupt mode
> * @IDPF_VC_CORE_INIT: virtchnl core has been init
> + * @IDPF_PCI_CB_RESET: Reset via the PCI callbacks
> * @IDPF_FLAGS_NBITS: Must be last
> */
> enum idpf_flags {
> @@ -97,6 +98,7 @@ enum idpf_flags {
> IDPF_REMOVE_IN_PROG,
> IDPF_MB_INTR_MODE,
> IDPF_VC_CORE_INIT,
...
> +/**
> + * idpf_pci_err_resume - Resume operations after PCI error recovery
> + * @pdev: PCI device struct
> + */
> +static void idpf_pci_err_resume(struct pci_dev *pdev) {
> + struct idpf_adapter *adapter = pci_get_drvdata(pdev);
> +
> + /* Force a PFR when resuming from PCI error. */
> + if (test_and_set_bit(IDPF_PCI_CB_RESET, adapter->flags))
> + adapter->dev_ops.reg_ops.trigger_reset(adapter,
> IDPF_HR_FUNC_RESET);
You say "Force a PFR", but PFR is only triggered on the AER path, not on the FLR path.
Everything else looks fine
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> +
> + queue_delayed_work(adapter->vc_event_wq,
> + &adapter->vc_event_task,
> + msecs_to_jiffies(300));
> +}
...
> };
> module_pci_driver(idpf_driver);
> --
> 2.37.3
^ permalink raw reply
* Re: [PATCH 1/1] net: strparser: fix skb_head leak in strp_abort_strp()
From: patchwork-bot+netdevbpf @ 2026-04-14 11:10 UTC (permalink / raw)
To: Ren Wei
Cc: netdev, davem, edumazet, kuba, pabeni, horms, nate.karstens, sd,
linux, Julia.Lawall, tom, yifanwucs, tomapufckgml, yuantan098,
bird, rakukuip
In-Reply-To: <ade3857a9404999ce9a1c27ec523efc896072678.1775482694.git.rakukuip@gmail.com>
Hello:
This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:
On Sat, 11 Apr 2026 23:10:10 +0800 you wrote:
> From: Luxiao Xu <rakukuip@gmail.com>
>
> When the stream parser is aborted, for example after a message assembly timeout,
> it can still hold a reference to a partially assembled message in
> strp->skb_head.
>
> That skb is not released in strp_abort_strp(), which leaks the partially
> assembled message and can be triggered repeatedly to exhaust memory.
>
> [...]
Here is the summary with links:
- [1/1] net: strparser: fix skb_head leak in strp_abort_strp()
https://git.kernel.org/netdev/net/c/fe72340daaf1
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH v2] netfilter: nfnetlink_osf: fix null-ptr-deref in nf_osf_ttl
From: Florian Westphal @ 2026-04-14 11:14 UTC (permalink / raw)
To: Kito Xu (veritas501)
Cc: pablo, coreteam, davem, edumazet, ffmancera, horms, kuba,
linux-kernel, netdev, netfilter-devel, pabeni, phil
In-Reply-To: <20260414104900.2617863-1-hxzene@gmail.com>
Kito Xu (veritas501) <hxzene@gmail.com> wrote:
> nf_osf_ttl() calls __in_dev_get_rcu(skb->dev) and passes the result
> to in_dev_for_each_ifa_rcu() without checking for NULL. When the
> receiving device has no IPv4 configuration (ip_ptr is NULL),
> __in_dev_get_rcu() returns NULL and in_dev_for_each_ifa_rcu()
> dereferences it unconditionally, causing a kernel crash.
>
> This can happen when a packet arrives on a device that has had its
> IPv4 configuration removed (e.g., MTU set below IPV4_MIN_MTU causing
> inetdev_destroy) or on a device that was never assigned an IPv4
> address, while an xt_osf or nft_osf rule with TTL_LESS mode is
> active and the packet TTL exceeds the fingerprint TTL.
>
> Add a NULL check for in_dev before using it. When in_dev is NULL,
> return 0 (no match) since source-address locality cannot be
> determined without IPv4 addresses on the device.
>
> KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
> RIP: 0010:nf_osf_match_one+0x204/0xa70
> Call Trace:
> <IRQ>
> nf_osf_match+0x2f8/0x780
> xt_osf_match_packet+0x11c/0x1f0
> ipt_do_table+0x7fe/0x12b0
> nf_hook_slow+0xac/0x1e0
> ip_rcv+0x123/0x370
> __netif_receive_skb_one_core+0x166/0x1b0
> process_backlog+0x197/0x590
> __napi_poll+0xa1/0x540
> net_rx_action+0x401/0xd80
> handle_softirqs+0x19f/0x610
> </IRQ>
>
> Fixes: a218dc82f0b5 ("netfilter: nft_osf: Add ttl option support")
> Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
> Signed-off-by: Kito Xu (veritas501) <hxzene@gmail.com>
The other __in_dev_get_rcu() callers in netfilter check the return value, so:
Reviewed-by: Florian Westphal <fw@strlen.de>
^ permalink raw reply
* [PATCH v2 nf] netfilter: nf_flow_table_ip: Introduce nf_flow_vlan_push()
From: Eric Woudstra @ 2026-04-14 11:21 UTC (permalink / raw)
To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Eric Woudstra
Cc: netfilter-devel, netdev
Calling skb_reset_mac_header() before calling skb_vlan_push() does
remove the error:
"skb_vlan_push got skb with skb->data not at mac header (offset 18)"
But the inner vlan tag is still not inserted correctly.
skb_vlan_push() uses __vlan_insert_inner_tag() to insert the tag
at offset ETH_HLEN. But the inner tag should only be pushed, without
offset, similar to nf_flow_pppoe_push().
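
To illustrate the difference described above, here is a minimal userspace
sketch, not the kernel code. struct pkt, put_be16() and vlan_push() are
hypothetical stand-ins for the few sk_buff fields and helpers involved. When
an out-of-band tag is already set, the new tag is written directly in front
of the current data pointer, with no ETH_HLEN offset; otherwise it is only
recorded out of band.

```c
#include <stddef.h>
#include <stdint.h>

#define VLAN_HLEN 4

/* Hypothetical stand-in for the sk_buff fields this example needs. */
struct pkt {
	uint8_t buf[64];
	size_t data;          /* offset of the current (network) header */
	uint16_t protocol;    /* ethertype of what starts at data, host order */
	int has_accel_tag;    /* an out-of-band tag is already recorded */
	uint16_t accel_proto;
	uint16_t accel_id;
};

/* Store a 16-bit value in network byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/* Returns 0 on success, -1 if there is no headroom for an in-line tag. */
static int vlan_push(struct pkt *p, uint16_t proto, uint16_t id)
{
	if (p->has_accel_tag) {
		/* A tag is already held out of band: write this one in-line,
		 * immediately in front of data.
		 */
		if (p->data < VLAN_HLEN)
			return -1;

		p->data -= VLAN_HLEN;
		put_be16(&p->buf[p->data], id);               /* TCI */
		put_be16(&p->buf[p->data + 2], p->protocol);  /* encapsulated proto */
		p->protocol = proto;
	} else {
		/* First tag: only record it, packet bytes stay untouched. */
		p->has_accel_tag = 1;
		p->accel_proto = proto;
		p->accel_id = id;
	}
	return 0;
}
```

In this model, two pushes on a packet with protocol 0x0800 leave the first tag
out of band and move data back by VLAN_HLEN for the second, so the packet then
starts with that tag's TCI followed by 0x0800.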
Fixes: c653d5a78f34 ("netfilter: flowtable: inline vlan encapsulation in xmit path")
Fixes: a3aca98aec9a ("netfilter: nf_flow_table_ip: reset mac header before vlan push")
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
net/netfilter/nf_flow_table_ip.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index fd56d663cb5b..0086f8a1a0d6 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -544,6 +544,26 @@ static int nf_flow_offload_forward(struct nf_flowtable_ctx *ctx,
return 1;
}
+static int nf_flow_vlan_push(struct sk_buff *skb, __be16 proto, u16 id)
+{
+ if (skb_vlan_tag_present(skb)) {
+ struct vlan_hdr *vhdr;
+
+ if (skb_cow_head(skb, VLAN_HLEN))
+ return -1;
+
+ __skb_push(skb, VLAN_HLEN);
+ skb_reset_network_header(skb);
+ vhdr = (struct vlan_hdr *)(skb->data);
+ vhdr->h_vlan_TCI = htons(id);
+ vhdr->h_vlan_encapsulated_proto = skb->protocol;
+ skb->protocol = proto;
+ } else {
+ __vlan_hwaccel_put_tag(skb, proto, id);
+ }
+ return 0;
+}
+
static int nf_flow_pppoe_push(struct sk_buff *skb, u16 id)
{
int data_len = skb->len + sizeof(__be16);
@@ -738,9 +758,8 @@ static int nf_flow_encap_push(struct sk_buff *skb,
switch (tuple->encap[i].proto) {
case htons(ETH_P_8021Q):
case htons(ETH_P_8021AD):
- skb_reset_mac_header(skb);
- if (skb_vlan_push(skb, tuple->encap[i].proto,
- tuple->encap[i].id) < 0)
+ if (nf_flow_vlan_push(skb, tuple->encap[i].proto,
+ tuple->encap[i].id) < 0)
return -1;
break;
case htons(ETH_P_PPP_SES):
--
2.53.0
^ permalink raw reply related
* [PATCH bpf-next 0/2] bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks
From: KaFai Wan @ 2026-04-14 11:23 UTC (permalink / raw)
To: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
yonghong.song, jolsa, shuah, kafai.wan, sdf, netdev, linux-kernel,
bpf, linux-kselftest
This small patchset avoids infinite recursion in bpf_skops_hdr_opt_len()
triggered by a TCP_NODELAY setsockopt.
---
KaFai Wan (2):
bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks
selftests/bpf: Cover TCP_NODELAY in hdr opt callback
net/ipv4/tcp.c | 5 ++-
.../bpf/prog_tests/tcp_hdr_options.c | 34 +++++++++++++++++++
.../bpf/progs/test_misc_tcp_hdr_options.c | 18 ++++++++++
3 files changed, 56 insertions(+), 1 deletion(-)
--
2.43.0
^ permalink raw reply
* [PATCH bpf-next 1/2] bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks
From: KaFai Wan @ 2026-04-14 11:23 UTC (permalink / raw)
To: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
yonghong.song, jolsa, shuah, kafai.wan, sdf, netdev, linux-kernel,
bpf, linux-kselftest
Cc: Quan Sun, Yinhao Hu, Kaiyan Mei
In-Reply-To: <20260414112310.1285783-1-kafai.wan@linux.dev>
A BPF_SOCK_OPS program can enable
BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call
bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB.
That reaches __tcp_sock_set_nodelay(), which may call
tcp_push_pending_frames(). The transmit path then computes TCP
options again, re-enters bpf_skops_hdr_opt_len(), and invokes the
same BPF callback recursively. This can loop until the kernel
stack overflows.
TCP_NODELAY is not safe from the header option callback context.
Reject it with -EOPNOTSUPP when TCP header option callbacks are
enabled on the socket, so the callback cannot recurse back into
tcp_push_pending_frames() through do_tcp_setsockopt().
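
The call chain above can be modelled in a few lines of userspace C. This is
only a sketch under simplifying assumptions: struct sock_model,
HDR_OPT_CB_FLAG and the three functions are hypothetical stand-ins for the
socket, BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG, bpf_skops_hdr_opt_len(),
tcp_push_pending_frames() and the TCP_NODELAY case of do_tcp_setsockopt().
The "BPF program" is hard-coded to request TCP_NODELAY from the callback.

```c
#include <errno.h>

#define HDR_OPT_CB_FLAG 0x1 /* models BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG */

/* Hypothetical model of the socket state relevant to the recursion. */
struct sock_model {
	unsigned int cb_flags;
	int nodelay;
	int depth;     /* current nesting of hdr_opt_len() */
	int max_depth; /* deepest nesting seen so far */
};

static int set_nodelay(struct sock_model *sk, int val);

/* Models the header option length callback calling back into setsockopt. */
static void hdr_opt_len(struct sock_model *sk)
{
	if (!(sk->cb_flags & HDR_OPT_CB_FLAG))
		return;

	sk->depth++;
	if (sk->depth > sk->max_depth)
		sk->max_depth = sk->depth;
	(void)set_nodelay(sk, 1); /* what the BPF program asks for */
	sk->depth--;
}

/* Models the transmit path recomputing TCP options. */
static void push_pending_frames(struct sock_model *sk)
{
	hdr_opt_len(sk);
}

/* Models the TCP_NODELAY case with the check from this patch applied. */
static int set_nodelay(struct sock_model *sk, int val)
{
	if (val && (sk->cb_flags & HDR_OPT_CB_FLAG))
		return -EOPNOTSUPP;

	sk->nodelay = val;
	if (val)
		push_pending_frames(sk);
	return 0;
}
```

With the check in place the callback nests at most once in this model.
Dropping the first two lines of set_nodelay() makes push_pending_frames()
recurse without bound, which is the stack overflow described above.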
Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
net/ipv4/tcp.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 202a4e57a218..7ac4c98be19d 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4004,7 +4004,10 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
switch (optname) {
case TCP_NODELAY:
- __tcp_sock_set_nodelay(sk, val);
+ if (val && BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG))
+ err = -EOPNOTSUPP;
+ else
+ __tcp_sock_set_nodelay(sk, val);
break;
case TCP_THIN_LINEAR_TIMEOUTS:
--
2.43.0
^ permalink raw reply related
* [PATCH bpf-next 2/2] selftests/bpf: Cover TCP_NODELAY in hdr opt callback
From: KaFai Wan @ 2026-04-14 11:23 UTC (permalink / raw)
To: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
yonghong.song, jolsa, shuah, kafai.wan, sdf, netdev, linux-kernel,
bpf, linux-kselftest
In-Reply-To: <20260414112310.1285783-1-kafai.wan@linux.dev>
Add a sockops test program that enables
BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG on connection setup and calls
bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB.
Exercise the connection by sending data after the socket is
established. Before the fix, this setup can recurse through
tcp_push_pending_frames() and bpf_skops_hdr_opt_len() until the
kernel hits a stack guard page. After the fix, the connection
continues to make forward progress and the data exchange completes.
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
.../bpf/prog_tests/tcp_hdr_options.c | 34 +++++++++++++++++++
.../bpf/progs/test_misc_tcp_hdr_options.c | 18 ++++++++++
2 files changed, 52 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
index 56685fc03c7e..f361f9c7bf59 100644
--- a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
+++ b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
@@ -513,6 +513,39 @@ static void misc(void)
bpf_link__destroy(link);
}
+static void hdr_sockopt(void)
+{
+ const char send_msg[] = "MISC!!!";
+ char recv_msg[sizeof(send_msg)];
+ const unsigned int nr_data = 2;
+ struct bpf_link *link;
+ struct sk_fds sk_fds;
+ int i, ret;
+
+ link = bpf_program__attach_cgroup(misc_skel->progs.misc_hdr_sockopt, cg_fd);
+ if (!ASSERT_OK_PTR(link, "attach_cgroup(misc_hdr_sockopt)"))
+ return;
+
+ if (sk_fds_connect(&sk_fds, false)) {
+ bpf_link__destroy(link);
+ return;
+ }
+
+ for (i = 0; i < nr_data; i++) {
+ ret = send(sk_fds.active_fd, send_msg, sizeof(send_msg), 0);
+ if (!ASSERT_EQ(ret, sizeof(send_msg), "send(msg)"))
+ goto check_linum;
+
+ ret = read(sk_fds.passive_fd, recv_msg, sizeof(recv_msg));
+ if (!ASSERT_EQ(ret, sizeof(send_msg), "read(msg)"))
+ goto check_linum;
+ }
+
+check_linum:
+ sk_fds_close(&sk_fds);
+ bpf_link__destroy(link);
+}
+
struct test {
const char *desc;
void (*run)(void);
@@ -526,6 +559,7 @@ static struct test tests[] = {
DEF_TEST(fastopen_estab),
DEF_TEST(fin),
DEF_TEST(misc),
+ DEF_TEST(hdr_sockopt),
};
void test_tcp_hdr_options(void)
diff --git a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
index d487153a839d..e1dc7246193e 100644
--- a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
+++ b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
@@ -326,4 +326,22 @@ int misc_estab(struct bpf_sock_ops *skops)
return CG_OK;
}
+SEC("sockops")
+int misc_hdr_sockopt(struct bpf_sock_ops *skops)
+{
+ int true_val = 1;
+
+ switch (skops->op) {
+ case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
+ case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
+ set_hdr_cb_flags(skops, 0);
+ break;
+ case BPF_SOCK_OPS_HDR_OPT_LEN_CB:
+ bpf_setsockopt(skops, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+ break;
+ }
+
+ return 0;
+}
+
char _license[] SEC("license") = "GPL";
--
2.43.0
^ permalink raw reply related
* [PATCH AUTOSEL 6.19-6.12] net: sfp: add quirks for Hisense and HSGQ GPON ONT SFP modules
From: Sasha Levin @ 2026-04-14 11:24 UTC (permalink / raw)
To: patches, stable
Cc: John Pavlick, Russell King (Oracle), Marcin Nita, Jakub Kicinski,
Sasha Levin, linux, andrew, hkallweit1, davem, edumazet, pabeni,
netdev, linux-kernel
In-Reply-To: <20260414112509.410217-1-sashal@kernel.org>
From: John Pavlick <jspavlick@posteo.net>
[ Upstream commit 95aca8602ef70ffd3d971675751c81826e124f90 ]
Several GPON ONT SFP sticks based on Realtek RTL960x report
1000BASE-LX at 1300MBd in their EEPROM but can operate at 2500base-X.
On hosts capable of 2500base-X (e.g. Banana Pi R3 / MT7986), the
kernel negotiates only 1G because it trusts the incorrect EEPROM data.
Add quirks for:
- Hisense-Leox LXT-010S-H
- Hisense ZNID-GPON-2311NA
- HSGQ HSGQ-XPON-Stick
Each quirk advertises 2500base-X and ignores TX_FAULT during the
module's ~40s Linux boot time.
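
As a rough picture of how such a table-driven quirk is selected, here is a
self-contained userspace sketch. It is not the sfp.c code: struct quirk,
the QUIRK_* flag names, lookup_quirks() and the plain strcmp() matching are
assumptions made for illustration, as is the split of each module name into a
vendor string and a part string.

```c
#include <stddef.h>
#include <string.h>

#define QUIRK_2500BASEX       0x1 /* advertise 2500base-X */
#define QUIRK_IGNORE_TX_FAULT 0x2 /* ignore TX_FAULT while the module boots */

/* Hypothetical quirk entry keyed by EEPROM vendor and part strings. */
struct quirk {
	const char *vendor;
	const char *part;
	unsigned int flags;
};

static const struct quirk quirks[] = {
	{ "Hisense-Leox", "LXT-010S-H",
	  QUIRK_2500BASEX | QUIRK_IGNORE_TX_FAULT },
	{ "Hisense", "ZNID-GPON-2311NA",
	  QUIRK_2500BASEX | QUIRK_IGNORE_TX_FAULT },
	{ "HSGQ", "HSGQ-XPON-Stick",
	  QUIRK_2500BASEX | QUIRK_IGNORE_TX_FAULT },
};

/* Return the flags for a matching module, or 0 if no entry matches. */
static unsigned int lookup_quirks(const char *vendor, const char *part)
{
	size_t i;

	for (i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
		if (!strcmp(quirks[i].vendor, vendor) &&
		    !strcmp(quirks[i].part, part))
			return quirks[i].flags;
	}
	return 0;
}
```

A module that matches no entry gets no flags, so in this sketch the table
changes nothing for other hardware.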
Tested on Banana Pi R3 (MT7986) with OpenWrt 25.12.1, confirmed
2.5Gbps link and full throughput with flow offloading.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Suggested-by: Marcin Nita <marcin.nita@leolabs.pl>
Signed-off-by: John Pavlick <jspavlick@posteo.net>
Link: https://patch.msgid.link/20260406132321.72563-1-jspavlick@posteo.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Now I have all the information needed. Let me compile my full analysis.
---
## Complete Analysis
### PHASE 1: COMMIT MESSAGE FORENSICS
**Step 1.1: Subject Line**
- Subsystem: `net: sfp:`
- Action verb: "add" (quirks)
- Summary: Adding hardware quirks for Hisense and HSGQ GPON ONT SFP
modules
- Record: [net: sfp] [add] [hardware quirks for GPON ONT SFP modules
with incorrect EEPROM data]
**Step 1.2: Tags**
- Reviewed-by: Russell King (Oracle) — the SFP subsystem maintainer
- Suggested-by: Marcin Nita — suggested investigating sfp.c quirks as a
solution
- Signed-off-by: John Pavlick (author)
- Link:
https://patch.msgid.link/20260406132321.72563-1-jspavlick@posteo.net
- Signed-off-by: Jakub Kicinski (netdev maintainer, applied the patch)
- No Cc: stable (expected — that's why we're reviewing)
- No Fixes: tag (expected — this is a quirk addition, not a code fix)
- Record: Notable: Russell King, the SFP subsystem maintainer/author,
reviewed this. Strong quality signal.
**Step 1.3: Commit Body**
- Bug: GPON ONT SFP sticks report 1000BASE-LX / 1300MBd in EEPROM but
actually support 2500base-X
- Symptom: Kernel negotiates only 1G because it trusts incorrect EEPROM
data
- Affected hardware: Hisense-Leox LXT-010S-H, Hisense ZNID-GPON-2311NA,
HSGQ HSGQ-XPON-Stick
- All based on Realtek RTL960x chipset
- Tested: Banana Pi R3 (MT7986) with OpenWrt 25.12.1, confirmed 2.5Gbps
link
- TX_FAULT quirk needed during module's ~40s Linux boot time
- Record: Real-world hardware problem limiting link speed. Users get 1G
instead of 2.5G.
**Step 1.4: Hidden Bug Fix Detection**
- This is not a "hidden" bug fix — it is an explicit hardware quirk
addition to work around incorrect EEPROM data. This falls squarely
into the "QUIRKS and WORKAROUNDS" exception category for stable.
### PHASE 2: DIFF ANALYSIS
**Step 2.1: Inventory**
- Files changed: 1 (`drivers/net/phy/sfp.c`)
- Lines added: 16 (including comments)
- Lines removed: 0
- Functions modified: None — only the `sfp_quirks[]` static const array
is extended
- Scope: Single-file, table-only addition
- Record: Extremely contained — 3 new entries in a quirk table, with
explanatory comments.
**Step 2.2: Code Flow Change**
- Before: These three SFP modules (Hisense-Leox LXT-010S-H, Hisense
ZNID-GPON-2311NA, HSGQ HSGQ-XPON-Stick) have no quirk entries, so the
kernel reads their EEPROM data literally and negotiates 1G
- After: These modules are matched by vendor/part strings and:
1. `sfp_quirk_2500basex` enables 2500base-X mode advertisement
2. `sfp_fixup_ignore_tx_fault` ignores the TX_FAULT signal during boot
**Step 2.3: Bug Mechanism**
- Category: Hardware workaround (h)
- The modules have incorrect EEPROM data (report 1000BASE-LX but support
2500base-X)
- The quirks use the exact same pattern as many existing entries (e.g.,
HUAWEI MA5671A, FS GPON-ONU-34-20BI)
- Record: Hardware quirk — identical pattern to existing accepted
entries.
**Step 2.4: Fix Quality**
- Obviously correct: Uses exact same macro and functions as ~10 other
existing entries
- Minimal/surgical: Only adds data to a static table; no logic changes
- Regression risk: Zero for users without these modules (quirks matched
by vendor/part string)
- For users WITH these modules: enables 2.5G link (improvement) and
ignores TX_FAULT during boot
- Record: Highest possible confidence — data-only addition using
established infrastructure.
### PHASE 3: GIT HISTORY INVESTIGATION
**Step 3.1: Blame**
- The `sfp_quirks[]` table was introduced by Russell King in commit
23571c7b9643 (2022-09-13)
- The `sfp_quirk_2500basex` function and `sfp_fixup_ignore_tx_fault`
function have existed since at least v6.1
- Record: Infrastructure is mature and present in all active stable
trees.
**Step 3.2: Fixes Tag**
- No Fixes: tag (expected for quirk additions). N/A.
**Step 3.3: File History**
- SFP quirk additions are extremely frequent — ~17 quirk-related commits
since v6.6
- This is a well-established pattern in the kernel community
- Record: Standalone commit, no dependencies on other patches.
**Step 3.4: Author**
- John Pavlick is a community contributor (not subsystem maintainer)
- But the patch was reviewed by Russell King (SFP subsystem
author/maintainer) and applied by Jakub Kicinski (netdev maintainer)
- Record: Properly reviewed by the right maintainers.
**Step 3.5: Dependencies**
- The patch uses `SFP_QUIRK()` macro, `sfp_quirk_2500basex`, and
`sfp_fixup_ignore_tx_fault`
- All three exist in v6.1, v6.6, and v6.12 stable trees (verified)
- Record: No dependencies. Completely self-contained.
### PHASE 4: MAILING LIST RESEARCH
**Step 4.1: Original Patch Discussion**
- Found via lore: v3 of the patch, submitted 2026-04-06
- v1→v2: Added Suggested-by tag
- v2→v3: Fixed inaccurate commit message about MT7986 SerDes
capabilities
- Applied by Jakub Kicinski to netdev/net.git (main) as commit
95aca8602ef7
- Record: Clean submission history, no objections.
**Step 4.2: Reviewers**
- Russell King (Oracle) — SFP subsystem maintainer — Reviewed-by
- Applied by Jakub Kicinski — netdev maintainer
- Record: Reviewed by the right people.
**Step 4.3-4.5: Bug Reports / Related / Stable Discussion**
- No formal bug report — this is a hardware enablement quirk
- The underlying problem is that these GPON SFP sticks' EEPROM
incorrectly reports capabilities
- No stable-specific discussion found; no prior nomination
### PHASE 5: CODE SEMANTIC ANALYSIS
**Step 5.1-5.4: Functions**
- No functions are modified. Only static data (the `sfp_quirks[]` array)
is extended.
- The quirk matching happens in `sfp_lookup_quirk()` which iterates the
table and matches vendor/part strings
- The matched `sfp_quirk_2500basex` and `sfp_fixup_ignore_tx_fault`
functions are called during SFP module insertion
- Record: No code flow changes — purely data-driven matching.
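The table-driven matching described above can be sketched in plain C. This is a simplified userspace model, not the kernel's code: `struct quirk`, `quirks[]`, and `lookup_quirk()` are hypothetical names, the two integer flags merely stand in for the `sfp_quirk_2500basex` and `sfp_fixup_ignore_tx_fault` callbacks, and exact `strcmp()` matching is an assumption that sidesteps how the kernel compares the EEPROM vendor/part fields.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the kernel's quirk entry. */
struct quirk {
	const char *vendor;
	const char *part;
	int support_2500basex;	/* stands in for sfp_quirk_2500basex */
	int ignore_tx_fault;	/* stands in for sfp_fixup_ignore_tx_fault */
};

/* The three modules named in the patch, as data-only table entries. */
static const struct quirk quirks[] = {
	{ "Hisense-Leox", "LXT-010S-H",       1, 1 },
	{ "Hisense",      "ZNID-GPON-2311NA", 1, 1 },
	{ "HSGQ",         "HSGQ-XPON-Stick",  1, 1 },
};

/* Return the matching entry, or NULL when the module has no quirk. */
const struct quirk *lookup_quirk(const char *vendor, const char *part)
{
	for (size_t i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
		if (strcmp(quirks[i].vendor, vendor) == 0 &&
		    strcmp(quirks[i].part, part) == 0)
			return &quirks[i];
	}
	return NULL;
}
```

Because both strings must match, a module that is not listed gets `NULL` and keeps its default behavior, which is why the addition carries no risk for other hardware.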
**Step 5.5: Similar Patterns**
- Exact same pattern used by:
- HUAWEI MA5671A (sfp_quirk_2500basex + sfp_fixup_ignore_tx_fault)
- FS GPON-ONU-34-20BI (sfp_quirk_2500basex +
sfp_fixup_ignore_tx_fault)
- ALCATELLUCENT G010SP (sfp_quirk_2500basex +
sfp_fixup_ignore_tx_fault)
- Record: Identical pattern to multiple existing accepted entries.
### PHASE 6: STABLE TREE ANALYSIS
**Step 6.1: Code Existence in Stable**
- `sfp_quirk_2500basex` exists in v6.1, v6.6, v6.12 (verified)
- `sfp_fixup_ignore_tx_fault` exists in v6.1, v6.6, v6.12 (verified)
- `SFP_QUIRK()` 4-argument macro exists in all stable trees (verified)
- Record: All needed infrastructure exists in all active stable trees.
**Step 6.2: Backport Complications**
- Minor context difference: In mainline, HUAWEI MA5671A uses
`sfp_fixup_ignore_tx_fault_and_los` (changed by commit 9f9c31bacaae),
while in v6.6 and v6.12 it still uses `sfp_fixup_ignore_tx_fault`.
This affects the context lines around the insertion point.
- The Lantech entries also differ (SFP_QUIRK_S vs SFP_QUIRK_M,
additional 8330-265D entry)
- This means the patch will need minor context adjustment (fuzz or
manual resolution) for older trees
- Record: Expected minor context conflicts, trivially resolvable.
**Step 6.3: Related Fixes Already in Stable**
- No — these specific modules (Hisense-Leox, Hisense ZNID, HSGQ) have no
existing quirks in any tree.
### PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT
**Step 7.1: Subsystem Criticality**
- Subsystem: net/phy (SFP transceiver support)
- Criticality: IMPORTANT — SFP modules are used in many networking
setups, particularly in GPON/fiber deployments and embedded/router
platforms (OpenWrt, etc.)
- Record: [net/phy/sfp] [IMPORTANT]
**Step 7.2: Subsystem Activity**
- Very active — 31 changes since v6.6, including many quirk additions
- SFP quirk additions to stable are a well-established practice
- Record: Actively maintained, frequent quirk additions.
### PHASE 8: IMPACT AND RISK ASSESSMENT
**Step 8.1: Affected Users**
- Users of Hisense-Leox LXT-010S-H, Hisense ZNID-GPON-2311NA, and HSGQ
HSGQ-XPON-Stick SFP modules
- These are GPON ONT SFP sticks commonly used in fiber-to-the-home
setups and by OpenWrt users
- Record: Driver-specific, but affects a real user population in the
fiber networking community.
**Step 8.2: Trigger Conditions**
- Every time these SFP modules are inserted into a host capable of
2500base-X
- 100% reproducible — the EEPROM always reports wrong data
- Record: Deterministic, always triggers, no race or timing dependency.
**Step 8.3: Failure Mode**
- Without quirk: Link operates at 1G instead of 2.5G — loss of 60%
bandwidth
- This is a functional issue, not a crash or security issue
- Severity: MEDIUM (hardware not working at full capability)
- Record: Performance/capability limitation, not crash or corruption.
**Step 8.4: Risk-Benefit**
- BENEFIT: HIGH — enables proper 2.5G operation for these GPON SFP
modules
- RISK: VERY LOW — data-only addition, zero impact on users without
these specific modules, uses well-tested infrastructure
- Record: Very favorable risk-benefit ratio.
### PHASE 9: FINAL SYNTHESIS
**Step 9.1: Evidence Summary**
FOR backporting:
- Falls squarely into the "SFP/NETWORK QUIRKS" exception category
(explicitly called out as YES for stable)
- 16 lines added, zero lines removed — purely additive data
- Uses exact same pattern as many existing entries already in stable
trees
- All required infrastructure (macros, functions) exists in all active
stable trees
- Reviewed by Russell King (SFP subsystem maintainer)
- Applied by Jakub Kicinski (netdev maintainer)
- Tested on real hardware with confirmed results
- Zero regression risk for unaffected users
- Fixes real hardware limitation for GPON fiber users
AGAINST backporting:
- Not a crash/security/corruption fix — it's hardware enablement
- Minor context conflicts needed for older stable trees
- No explicit Cc: stable nomination (but this is expected per our
guidelines)
**Step 9.2: Stable Rules Checklist**
1. Obviously correct and tested? YES — data-only addition, reviewed by
subsystem maintainer, tested on hardware
2. Fixes a real bug? YES — SFP modules operate at wrong speed due to
incorrect EEPROM
3. Important issue? MEDIUM — hardware not operating at full capability
4. Small and contained? YES — 16 lines in one file, data-only
5. No new features or APIs? CORRECT — no new features, just device quirk
entries
6. Can apply to stable? YES — with minor context adjustment
**Step 9.3: Exception Categories**
This is a **hardware quirk/workaround** — one of the explicitly listed
exception categories that is "YES for stable" even though it technically
"adds code." SFP quirks are specifically called out in the analysis
guidelines.
**Step 9.4: Decision**
This is a textbook SFP quirk addition. The guidelines explicitly state:
"SFP_QUIRK_* for optical modules with broken behavior... These are all
YES for stable." The patch is minimal, uses existing infrastructure
available in all stable trees, is reviewed by the subsystem maintainer,
was tested on real hardware, and carries essentially zero regression
risk.
### Verification
- [Phase 1] Parsed tags: Reviewed-by Russell King (SFP maintainer),
Signed-off-by Jakub Kicinski (netdev maintainer), Suggested-by Marcin
Nita, Link to lore
- [Phase 2] Diff analysis: 16 lines added (3 SFP_QUIRK entries +
comments) to static const sfp_quirks[] array, zero lines removed,
single file
- [Phase 3] git blame: sfp_quirks[] table introduced by Russell King in
23571c7b9643 (2022-09-13), infrastructure present since v6.1
- [Phase 3] Verified sfp_quirk_2500basex exists in v6.1 (line 355), v6.6
(line 399), v6.12 (line 424)
- [Phase 3] Verified sfp_fixup_ignore_tx_fault exists in v6.1 (line
325), v6.6 (line 348), v6.12 (line 358)
- [Phase 3] Verified SFP_QUIRK() 4-argument macro exists in v6.1, v6.6,
v6.12
- [Phase 3] No dependencies — standalone commit confirmed
- [Phase 4] Lore thread found: v3 submission, applied to netdev/net, no
NAKs or concerns
- [Phase 4] b4 dig found original submission; Russell King CC'd and
provided Reviewed-by
- [Phase 4] Patch went through v1→v2→v3, applied version is v3 (latest)
- [Phase 5] No function modifications — only static data table extended
- [Phase 6] Minor context conflict expected: HUAWEI MA5671A entry uses
sfp_fixup_ignore_tx_fault in v6.6/v6.12 but
sfp_fixup_ignore_tx_fault_and_los in mainline; Lantech entries differ
(SFP_QUIRK_M vs SFP_QUIRK_S)
- [Phase 6] Confirmed all needed infrastructure exists in all active
stable trees
- [Phase 7] SFP subsystem is actively maintained with frequent quirk
additions
- [Phase 8] Zero regression risk for unaffected users; deterministic
2.5G enablement for affected hardware
**YES**
drivers/net/phy/sfp.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index 7a85b758fb1e6..c62e3f364ea73 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -543,6 +543,22 @@ static const struct sfp_quirk sfp_quirks[] = {
SFP_QUIRK("HUAWEI", "MA5671A", sfp_quirk_2500basex,
sfp_fixup_ignore_tx_fault_and_los),
+ // Hisense LXT-010S-H is a GPON ONT SFP (sold as LEOX LXT-010S-H) that
+ // can operate at 2500base-X, but reports 1000BASE-LX / 1300MBd in its
+ // EEPROM
+ SFP_QUIRK("Hisense-Leox", "LXT-010S-H", sfp_quirk_2500basex,
+ sfp_fixup_ignore_tx_fault),
+
+ // Hisense ZNID-GPON-2311NA can operate at 2500base-X, but reports
+ // 1000BASE-LX / 1300MBd in its EEPROM
+ SFP_QUIRK("Hisense", "ZNID-GPON-2311NA", sfp_quirk_2500basex,
+ sfp_fixup_ignore_tx_fault),
+
+ // HSGQ HSGQ-XPON-Stick can operate at 2500base-X, but reports
+ // 1000BASE-LX / 1300MBd in its EEPROM
+ SFP_QUIRK("HSGQ", "HSGQ-XPON-Stick", sfp_quirk_2500basex,
+ sfp_fixup_ignore_tx_fault),
+
// Lantech 8330-262D-E and 8330-265D can operate at 2500base-X, but
// incorrectly report 2500MBd NRZ in their EEPROM.
// Some 8330-265D modules have inverted LOS, while all of them report
--
2.53.0
* Re: Re: [PATCH,net-next] tcp: Add TCP ROCCET congestion control module.
From: Lukas Prause @ 2026-04-14 11:23 UTC (permalink / raw)
To: Neal Cardwell, Tim Fuechsel
Cc: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Kuniyuki Iwashima, linux-kernel,
netdev
In-Reply-To: <CADVnQymmsispHew4-frsuBBfObZHdSbH+jfP-9aSW1HguK_N4A@mail.gmail.com>
Thanks for the very detailed review of our code.
We will incorporate your comments regarding documentation and variable
usage into a new version of our code.
> Please reference figures in the paper and mention specific concrete
> numerical examples of latency reductions to quantify these statements.
Figures 5 and 6 show the performance of ROCCET in stationary and mobile
scenarios (https://arxiv.org/pdf/2510.25281). In the analyzed scenario,
we have observed a lower sRTT with ROCCET than with BBRv3 and CUBIC. The
observed throughput was marginally lower than that of BBRv3, but still
on a similar level. A detailed quantitative evaluation can be found in
the paper in sections VI and VII.
> Can you please elaborate on this statement here? AFAICT from figures 7
> and 8 in https://arxiv.org/pdf/2510.25281 it seems ROCCET is
> essentially starved by CUBIC when sharing a bottleneck with CUBIC when
> the bottleneck has 2*BDP or more of buffering. AFAICT it sounds like
> ROCCET does have "fairness issues when sharing a link with TCP CUBIC"?
Our main use case is a connection where the bottleneck link is in the
cellular network, where the bottleneck queue is typically not shared
between flows. "Fairness" between flows is being implemented by the base
station's scheduler. In this scenario, ROCCET achieves its objective to
not "bloat" its own queue.
We have performed additional fairness experiments in non-cellular
networks (figures 7 and 8). Here we show that even when used in other
types of networks, ROCCET does not cause harm (see
https://dl.acm.org/doi/10.1145/3365609.3365855) to other congestion control.
> Please specify what side effect or side effects ROCCET is claiming to
> solve (presumably bufferbloat?).
The side effect we observe in cellular networks is that, in particular,
for loss-based congestion control, the cwnd often gets 'frozen' at a
size that is too large for the BDP of the current link. This effect is
caused by the TCP cwnd validation, which at some point stops increasing
the cwnd because it assumes that the sender is application-limited.
However, this often leads to a cwnd size that is too large for the link,
but too small to cause a congestion event by overfilling the buffer. The
result is a standing queue that causes permanently high RTTs. Figure 2
in the paper (https://arxiv.org/pdf/2510.25281) shows the described
behaviour for a single TCP CUBIC flow.
> Expressed in isolation like this, that sounds potentially dangerous.
> Please mention what signal(s) ROCCET uses to exit slow start if it's
> not using loss.
>
> In addition, from reading the code AFAICT the connection does use loss
> to exit slow start (see my remarks below in this message). So AFAICT
> this summary seems inaccurate, or at least misleading?
You are right, the summary is misleading. In the code we submitted,
there are three conditions for exiting slow start:
The first one is packet loss (as you already mentioned, without a cwnd
reduction). The second is when the srRTT calculated by ROCCET exceeds an
upper bound and the ACK rate, sampled in 100ms time intervals, differs by
10 segments. The third one is when the growth of the cwnd is stopped by the
TCP cwnd validation (which considers the connection as
application-limited).
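As a rough sketch of the three exit conditions listed above (not the submitted module code): the struct, its field names, and the reading of "differs by 10 segments" as "at least 10" are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed threshold: "differs by 10 segments" read as "at least 10". */
#define ACK_RATE_DIFF_SEGS 10u

/* Hypothetical per-connection snapshot; field names are illustrative only. */
struct ss_state {
	bool     loss_seen;		/* packet loss was observed */
	uint32_t srrtt;			/* srRTT value computed by ROCCET */
	uint32_t srrtt_upper;		/* upper bound for srRTT */
	uint32_t ack_rate_diff;		/* ACK rate change between 100 ms samples, in segments */
	bool     cwnd_growth_stopped;	/* cwnd validation treats the flow as app-limited */
};

/* Return true if any of the three described conditions holds. */
bool should_exit_slow_start(const struct ss_state *s)
{
	if (s->loss_seen)
		return true;		/* condition 1: packet loss */
	if (s->srrtt > s->srrtt_upper &&
	    s->ack_rate_diff >= ACK_RATE_DIFF_SEGS)
		return true;		/* condition 2: srRTT and ACK rate */
	if (s->cwnd_growth_stopped)
		return true;		/* condition 3: cwnd validation */
	return false;
}
```

Note that the second condition needs both the srRTT bound and the ACK rate difference; a high srRTT alone does not end slow start in this sketch.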
> If no lower RTT is found for 10 seconds, the algorithm interpolates
> the `min_rtt` upwards towards the current RTT.
>
> + If the path is persistently congested (e.g., a large buffer is
> constantly full), the `min_rtt` baseline will drift up.
>
> + This makes the algorithm less sensitive to queueing delay over
> time, potentially defeating the purpose of reducing bufferbloat in the
> long run. Contrast this with BBR, which actively drains the queue
> (using the ProbeRTT mechanism) to try to find the true physical
> minimum RTT.
>
> Can you please add a comment explaining why the ROCCET algorithm takes
> this approach, and how the algorithm expects to avoid queues that
> ratchet ever higher?
We added this functionality for the edge case of long-lived fat flows,
which are experiencing routing changes, to detect a higher base RTT.
Since this functionality is disabled by default and can also cause
problems with min_RTT detection, we have decided to remove it.
The measurement results in our paper have been obtained with this
functionality disabled.
> Here, `cnt` is incremented by `1` on every call, regardless of the
> `acked` value (number of packets ACKed in this event).
You are right, we will change this.
> + With the default `ack_rate_diff_ca` of `200`, this condition will
> become true for $sum_cwnd * 100 / sum_acked >= 200$, i.e.
> $num_acks_per_round * 100 >= 200$. So AFAICT we expect this condition
> to be true if there are 2 or more ACKs in a round trip. This makes
> `bw_limit_detect` effectively a no-op or always-on trigger rather than
> a true detector of queue growth or bandwidth limits.
The purpose of this part of the code was to detect an increasing queue
by monitoring data sent and acknowledged in combination with an
increasing sRTT over 5 RTT time intervals. In the steady state of a TCP
connection, the sending rate of the TCP sender should be equal to the
receiver's ack rate, due to TCP self-clocking. The idea behind this code
was to check if the cwnd is still correlated to the sending rate. If
this is not the case and we also observe increasing RTTs, we assume the
TCP sender is filling a buffer. However, we have made a mistake when
calculating sum_cwnd:
We are accumulating the cwnd on each ack event, instead of each RTT,
which, as you mentioned, would make more sense. Because this leads to
the erroneous behaviour that you described, we will remove this part of
the code for now until we have evaluated the intended implementation.
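To make the mistake concrete, here is a small standalone model of one round trip. It is an illustration only: the function names are hypothetical, and it assumes a constant cwnd, `num_acks > 0`, and ACKs that each cover `cwnd / num_acks` segments.

```c
#include <stdint.h>

/*
 * Ratio sum_cwnd * 100 / sum_acked when cwnd is added on every ACK
 * (the behaviour of the submitted code, as described above).
 * Assumes num_acks > 0 and cwnd is a non-zero multiple of num_acks.
 */
uint64_t ratio_accumulate_per_ack(uint32_t cwnd, uint32_t num_acks)
{
	uint64_t sum_cwnd = 0, sum_acked = 0;
	uint32_t acked = cwnd / num_acks;

	for (uint32_t i = 0; i < num_acks; i++) {
		sum_cwnd += cwnd;	/* added once per ACK event */
		sum_acked += acked;
	}
	return sum_cwnd * 100 / sum_acked;
}

/* Same ratio when cwnd is added only once per round trip. */
uint64_t ratio_accumulate_per_rtt(uint32_t cwnd, uint32_t num_acks)
{
	uint64_t sum_cwnd = cwnd;	/* added once per RTT */
	uint64_t sum_acked = 0;
	uint32_t acked = cwnd / num_acks;

	for (uint32_t i = 0; i < num_acks; i++)
		sum_acked += acked;
	return sum_cwnd * 100 / sum_acked;
}
```

With per-ACK accumulation the ratio is simply 100 times the number of ACKs in the round, so a threshold of 200 is reached as soon as there are two ACKs, regardless of queue growth. With per-RTT accumulation the ratio stays at 100 in this steady-state model.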
> Did the experiments in the paper use the approach documented in the
> paper, or the approach documented in this code? They are very
> different, AFAICT.
The experiments were performed using the submitted code. This means that
the mentioned code snippet always evaluates to true, so that ROCCET only
reacts to changes in latency, which is different from what we described
in the paper.
> Having a module parameter to ignore loss in this way makes it too easy
> for users to cause excessive congestion. I would urge you to remove
> that module parameter. Researchers can add that sort of mechanism in
> their own code for research.
That is true, we will remove this part of the implementation.
Thanks,
Lukas
* Re: [PATCH v5] net: caif: fix stack out-of-bounds write in cfctrl_link_setup()
From: Simon Horman @ 2026-04-14 11:29 UTC (permalink / raw)
To: Paolo Abeni
Cc: Kangzheng Gu, davem, edumazet, kuba, kees, thorsten.blum, arnd,
sjur.brandeland, netdev, linux-kernel, stable
In-Reply-To: <255224dc-0a55-4a0c-95f3-b84d4c6b3897@redhat.com>
On Mon, Apr 13, 2026 at 11:30:53AM +0200, Paolo Abeni wrote:
> On 4/12/26 3:57 PM, Simon Horman wrote:
> > I am wondering if it would be best to follow the pattern for
> > writing linkparam.u.utility.name elsewhere in this function.
> > That:
> > 1. Uses a somewhat more succinct loop control structure
> > 2. Silently truncates input without updating cmdrsp if overrun would occur
> >
> > Something like this (compile tested only!):
> >
> > diff --git a/net/caif/cfctrl.c b/net/caif/cfctrl.c
> > index c6cc2bfed65d..ba184c11386e 100644
> > --- a/net/caif/cfctrl.c
> > +++ b/net/caif/cfctrl.c
> > @@ -15,6 +15,7 @@
> > #include <net/caif/cfctrl.h>
> >
> > #define container_obj(layr) container_of(layr, struct cfctrl, serv.layer)
> > +#define RFM_VOLUME_LEN 20
> > #define UTILITY_NAME_LENGTH 16
> > #define CFPKT_CTRL_PKT_LEN 20
> >
> > @@ -414,10 +415,11 @@ static int cfctrl_link_setup(struct cfctrl *cfctrl, struct cfpkt *pkt, u8 cmdrsp
> > */
> > linkparam.u.rfm.connid = cfpkt_extr_head_u32(pkt);
> > cp = (u8 *) linkparam.u.rfm.volume;
> > - for (tmp = cfpkt_extr_head_u8(pkt);
> > - cfpkt_more(pkt) && tmp != '\0';
> > - tmp = cfpkt_extr_head_u8(pkt))
> > + caif_assert(sizeof(linkparam.u.rfm.volume) >= RFM_VOLUME_LEN);
> > + for(i = 0; i < RFM_VOLUME_LEN - 1 && cfpkt_more(pkt); i++) {
> > + tmp = cfpkt_extr_head_u8(pkt);
> > *cp++ = tmp;
> > + }
> > *cp = '\0';
> >
> > if (CFCTRL_ERR_BIT & cmdrsp)
>
> I agree that the code suggested by Simon is clearer. Note that AFAICS it
> lacks an additional `tmp!= '\0'` check to break the loop, but even with
> that added it should be preferable.
Sorry, I left out the `tmp != '\0'` check.
That was unintentional and I agree it should be there.
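For reference, a userspace sketch of the loop shape being discussed, with both the length bound and the NUL check in place. It is not the CAIF code: `copy_bounded_name()` and its byte-array input are hypothetical stand-ins for the `cfpkt_more()` / `cfpkt_extr_head_u8()` packet accessors.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy bytes from src into dst until a NUL byte is consumed, the input
 * is exhausted, or dst has room only for the terminator. dst is always
 * NUL-terminated when dst_cap > 0. Returns the number of bytes copied,
 * not counting the terminator.
 */
size_t copy_bounded_name(const uint8_t *src, size_t src_len,
			 char *dst, size_t dst_cap)
{
	size_t in = 0, out = 0;

	if (dst_cap == 0)
		return 0;

	while (out < dst_cap - 1 && in < src_len) {
		uint8_t tmp = src[in++];

		if (tmp == '\0')	/* the check that was left out */
			break;
		dst[out++] = (char)tmp;
	}
	dst[out] = '\0';
	return out;
}
```

Over-long input is silently truncated here, matching the behavior proposed for the utility name handling.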
* Re: [PATCH net 1/1] net: caif: clear client service pointer on teardown
From: patchwork-bot+netdevbpf @ 2026-04-14 11:30 UTC (permalink / raw)
To: Ren Wei
Cc: netdev, davem, edumazet, kuba, pabeni, horms, sjur.brandeland,
yifanwucs, tomapufckgml, yuantan098, bird, enjou1224z, zcliangcn
In-Reply-To: <9f3d37847c0037568aae698ca23cd47c6691acb0.1775897577.git.zcliangcn@gmail.com>
Hello:
This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:
On Sat, 11 Apr 2026 23:10:26 +0800 you wrote:
> From: Zhengchuan Liang <zcliangcn@gmail.com>
>
> `caif_connect()` can tear down an existing client after remote shutdown by
> calling `caif_disconnect_client()` followed by `caif_free_client()`.
> `caif_free_client()` releases the service layer referenced by
> `adap_layer->dn`, but leaves that pointer stale.
>
> [...]
Here is the summary with links:
- [net,1/1] net: caif: clear client service pointer on teardown
https://git.kernel.org/netdev/net/c/f7cf8ece8cee
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* RE: [Intel-wired-lan] [PATCH iwl-net v2] idpf: fix xdp crash in soft reset error path
From: Holda, Patryk @ 2026-04-14 11:36 UTC (permalink / raw)
To: Simon Horman, Tantilov, Emil S
Cc: daniel@iogearbox.net, ast@kernel.org, willemb@google.com,
stable@vger.kernel.org, decot@google.com, bpf@vger.kernel.org,
Nguyen, Anthony L, Kitszel, Przemyslaw,
intel-wired-lan@lists.osuosl.org, edumazet@google.com,
netdev@vger.kernel.org, pabeni@redhat.com, andrew+netdev@lunn.ch,
kuba@kernel.org, davem@davemloft.net, sdf@fomichev.me,
Loktionov, Aleksandr, Lobakin, Aleksander,
john.fastabend@gmail.com, hawk@kernel.org
In-Reply-To: <20260321091753.GT74886@horms.kernel.org>
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Simon Horman
> Sent: Saturday, March 21, 2026 10:18 AM
> To: Tantilov, Emil S <emil.s.tantilov@intel.com>
> Cc: daniel@iogearbox.net; ast@kernel.org; willemb@google.com;
> stable@vger.kernel.org; decot@google.com; bpf@vger.kernel.org; Nguyen,
> Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; intel-wired-lan@lists.osuosl.org;
> edumazet@google.com; netdev@vger.kernel.org; pabeni@redhat.com;
> andrew+netdev@lunn.ch; kuba@kernel.org; davem@davemloft.net;
> sdf@fomichev.me; Loktionov, Aleksandr <aleksandr.loktionov@intel.com>;
> Lobakin, Aleksander <aleksander.lobakin@intel.com>;
> john.fastabend@gmail.com; hawk@kernel.org
> Subject: Re: [Intel-wired-lan] [PATCH iwl-net v2] idpf: fix xdp crash in soft
> reset error path
>
> On Fri, Mar 20, 2026 at 02:35:42PM -0700, Tantilov, Emil S wrote:
> >
> >
> > On 3/20/2026 10:48 AM, Simon Horman wrote:
> > > This is an AI-generated review of your patch. The human sending this
> > > email has considered the AI review valid, or at least plausible.
> > >
> > > For local reproduction steps see
> > > https://netdev-ai.bots.linux.dev/ai-local.html
> > > ---
> > > idpf: fix xdp crash in soft reset error path
> > >
> > > This commit fixes a NULL pointer dereference that occurs when
> > > idpf_vport_open() fails during soft reset. The fix restores
> > > vport->xdp_prog in the error path and updates the restart check in
> > > idpf_xsk_pool_setup() to use IDPF_VPORT_UP instead of netif_running().
> > >
> > > > Fixes: 3d57b2c00f09 ("idpf: add XSk pool initialization")
> > >
> > > The Fixes: tag may not be targeting the correct commit. The primary
> > > bug being fixed is the NULL pointer dereference crash in
> > > idpf_xdp_setup_prog() that occurs when soft reset fails and
> > > vport->xdp_prog is not restored.
> >
> > This is not exactly true, this is just one of the instances that would
> > cause idpf_qp_switch() to be called. [1]
> >
> > >
> > > Looking at the git history, this missing restoration was introduced
> > > in commit 705457e7211f ("idpf: implement XDP_SETUP_PROG in ndo_bpf
> > > for splitq"), where idpf_xdp_setup_prog() was first implemented.
> > > While commit 3d57b2c00f09 introduced the secondary issue with the
> > > netif_running() check, the main crash fix addresses code from commit
> > > 705457e7211f.
> > >
> > > Should the Fixes: tag point to commit 705457e7211f instead?
> >
> > The reason I chose commit 3d57b2c00f09 is because it is the commit
> > introducing the function where the crash is occurring:
> > [ 3179.284770] RIP: 0010:idpf_find_rxq_vec+0x17/0x30 [idpf] ...
> > [ 3179.291937] Call Trace:
> > [ 3179.292392] <TASK>
> > [ 3179.292843] idpf_qp_switch+0x25/0x820 [idpf]
> >
> > The setting of the restart variable is where the above commits "meet",
> > in that both conditions - netif_running() and idpf_xdp_enabled() [1]
> > can be wrong:
> > https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue.git/tree/drivers/net/ethernet/intel/idpf/xsk.c#n571
> >
> > which would end up calling idpf_qp_switch() instead of taking the
> > alternate path:
> > restart = idpf_xdp_enabled(vport) && netif_running(vport->netdev);
> > if (!restart)
> > goto pool;
> >
> > Which was introduced by 3d57b2c00f09.
>
> Thanks for the clarification.
> I agree that using 3d57b2c00f09 makes sense.
>
> ...
Tested-by: Patryk Holda <patryk.holda@intel.com>
* RE: [Intel-wired-lan] [PATCH iwl-net 1/2] idpf: do not enable XDP if queue based scheduling is not supported
From: Holda, Patryk @ 2026-04-14 11:37 UTC (permalink / raw)
To: Hay, Joshua A, intel-wired-lan@lists.osuosl.org; +Cc: netdev@vger.kernel.org
In-Reply-To: <20260406233236.3585504-2-joshua.a.hay@intel.com>
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Joshua Hay
> Sent: Tuesday, April 7, 2026 1:33 AM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org
> Subject: [Intel-wired-lan] [PATCH iwl-net 1/2] idpf: do not enable XDP if
> queue based scheduling is not supported
>
> The current XDP implementation uses queue based scheduling for its TxQs.
> If the FW does not advertise support for queue based scheduling, do not
> enable XDP. Add the missing capability check at the start of the XDP
> configuration. This will temporarily break XDP while a flow based
> implementation is worked on, as well as while FWs with queue based by
> default are rolled out.
>
> Fixes: 705457e7211f ("idpf: implement XDP_SETUP_PROG in ndo_bpf for
> splitq")
> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
> ---
> drivers/net/ethernet/intel/idpf/xdp.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/idpf/xdp.c
> b/drivers/net/ethernet/intel/idpf/xdp.c
> index 18a6e7062863..9c3bdb193684 100644
> --- a/drivers/net/ethernet/intel/idpf/xdp.c
> +++ b/drivers/net/ethernet/intel/idpf/xdp.c
> @@ -511,6 +511,13 @@ int idpf_xdp(struct net_device *dev, struct
> netdev_bpf *xdp)
> if (!idpf_is_queue_model_split(vport->dflt_qv_rsrc.txq_model))
> goto notsupp;
>
> + if (!idpf_is_cap_ena(vport->adapter, IDPF_OTHER_CAPS,
> + VIRTCHNL2_CAP_SPLITQ_QSCHED)) {
> + NL_SET_ERR_MSG_MOD(xdp->extack,
> + "Device does not support requested XDP Tx
> scheduling mode");
> + goto notsupp;
> + }
> +
> switch (xdp->command) {
> case XDP_SETUP_PROG:
> ret = idpf_xdp_setup_prog(vport, xdp);
> --
> 2.39.2
Tested-by: Patryk Holda <patryk.holda@intel.com>
* Re: [PATCH v2 nf] netfilter: nf_flow_table_ip: Introduce nf_flow_vlan_push()
From: Pablo Neira Ayuso @ 2026-04-14 11:38 UTC (permalink / raw)
To: Eric Woudstra
Cc: Florian Westphal, Phil Sutter, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, netfilter-devel,
netdev
In-Reply-To: <20260414112120.248744-1-ericwouds@gmail.com>
On Tue, Apr 14, 2026 at 01:21:20PM +0200, Eric Woudstra wrote:
> Calling skb_reset_mac_header() before calling skb_vlan_push() does
> remove the error:
>
> "skb_vlan_push got skb with skb->data not at mac header (offset 18)"
>
> But the inner vlan tag is still not inserted correctly.
>
> skb_vlan_push() uses __vlan_insert_inner_tag() to insert the tag
> at offset ETH_HLEN. But the inner tag should only be pushed, without
> offset, similar to nf_flow_pppoe_push().
It is double-tagged VLAN that is broken, right? I observed this once,
but I have been tied up with a few other things.
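To illustrate what "pushed, without offset" means for the inner tag, here is a minimal userspace model of the in-band branch only. It is a sketch, not kernel code: `struct pkt` and `vlan_push_inner()` are hypothetical, the fixed buffer stands in for skb headroom, and the hardware-accelerated branch of the patch is not modelled.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#define VLAN_HLEN 4	/* 2 bytes TCI + 2 bytes encapsulated protocol */

/* Hypothetical minimal packet: data points into buf, headroom lies before it. */
struct pkt {
	uint8_t  buf[64];
	uint8_t *data;
	uint16_t protocol;	/* network byte order, like skb->protocol */
};

/*
 * Prepend a VLAN header directly at the current data pointer.
 * Returns 0 on success, -1 if there is not enough headroom.
 */
int vlan_push_inner(struct pkt *p, uint16_t proto_be, uint16_t id)
{
	uint16_t tci = htons(id);

	if (p->data - p->buf < VLAN_HLEN)
		return -1;

	p->data -= VLAN_HLEN;			/* push: no ETH_HLEN offset */
	memcpy(p->data, &tci, 2);		/* TCI first */
	memcpy(p->data + 2, &p->protocol, 2);	/* then the old protocol */
	p->protocol = proto_be;			/* packet now starts with a tag */
	return 0;
}
```

The new header lands immediately in front of the existing payload, and the previous protocol value becomes the encapsulated protocol of the tag.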
> Fixes: c653d5a78f34 ("netfilter: flowtable: inline vlan encapsulation in xmit path")
> Fixes: a3aca98aec9a ("netfilter: nf_flow_table_ip: reset mac header before vlan push")
> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
>
> ---
>
> net/netfilter/nf_flow_table_ip.c | 25 ++++++++++++++++++++++---
> 1 file changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
> index fd56d663cb5b..0086f8a1a0d6 100644
> --- a/net/netfilter/nf_flow_table_ip.c
> +++ b/net/netfilter/nf_flow_table_ip.c
> @@ -544,6 +544,26 @@ static int nf_flow_offload_forward(struct nf_flowtable_ctx *ctx,
> return 1;
> }
>
> +static int nf_flow_vlan_push(struct sk_buff *skb, __be16 proto, u16 id)
> +{
> + if (skb_vlan_tag_present(skb)) {
> + struct vlan_hdr *vhdr;
> +
> + if (skb_cow_head(skb, VLAN_HLEN))
> + return -1;
> +
> + __skb_push(skb, VLAN_HLEN);
> + skb_reset_network_header(skb);
> + vhdr = (struct vlan_hdr *)(skb->data);
> + vhdr->h_vlan_TCI = htons(id);
> + vhdr->h_vlan_encapsulated_proto = skb->protocol;
> + skb->protocol = proto;
> + } else {
> + __vlan_hwaccel_put_tag(skb, proto, id);
> + }
> + return 0;
> +}
> +
> static int nf_flow_pppoe_push(struct sk_buff *skb, u16 id)
> {
> int data_len = skb->len + sizeof(__be16);
> @@ -738,9 +758,8 @@ static int nf_flow_encap_push(struct sk_buff *skb,
> switch (tuple->encap[i].proto) {
> case htons(ETH_P_8021Q):
> case htons(ETH_P_8021AD):
> - skb_reset_mac_header(skb);
> - if (skb_vlan_push(skb, tuple->encap[i].proto,
> - tuple->encap[i].id) < 0)
> + if (nf_flow_vlan_push(skb, tuple->encap[i].proto,
> + tuple->encap[i].id) < 0)
> return -1;
> break;
> case htons(ETH_P_PPP_SES):
> --
> 2.53.0
>
* Re: [PATCH net] net: airoha: Fix VIP configuration for AN7583 SoC
From: patchwork-bot+netdevbpf @ 2026-04-14 11:40 UTC (permalink / raw)
To: Lorenzo Bianconi
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, horms,
linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260412-airoha-7583-vip-fix-v1-1-c35e02b054bb@kernel.org>
Hello:
This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:
On Sun, 12 Apr 2026 09:57:29 +0200 you wrote:
> EN7581 and AN7583 SoCs have different VIP definitions. Introduce
> get_vip_port callback in airoha_eth_soc_data struct in order to take
> into account EN7581 and AN7583 VIP register layout and definition
> differences.
> Introduce nbq parameter in airoha_gdm_port struct. At the moment nbq
> is set statically to value previously used in airhoha_set_gdm2_loopback
> routine and it will be read from device tree in subsequent patches.
>
> [...]
Here is the summary with links:
- [net] net: airoha: Fix VIP configuration for AN7583 SoC
https://git.kernel.org/netdev/net/c/1acdfbdb516b
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* RE: [Intel-wired-lan] [PATCH net v3 2/5] i40e: skip unnecessary VF reset when setting trust
From: Loktionov, Aleksandr @ 2026-04-14 11:41 UTC (permalink / raw)
To: Jose Ignacio Tornos Martinez, netdev@vger.kernel.org
Cc: intel-wired-lan@lists.osuosl.org, jesse.brandeburg@intel.com,
Nguyen, Anthony L, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com
In-Reply-To: <20260414110006.124286-3-jtornosm@redhat.com>
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Jose Ignacio Tornos Martinez
> Sent: Tuesday, April 14, 2026 1:00 PM
> To: netdev@vger.kernel.org
> Cc: intel-wired-lan@lists.osuosl.org; jesse.brandeburg@intel.com;
> Nguyen, Anthony L <anthony.l.nguyen@intel.com>; davem@davemloft.net;
> edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; Jose Ignacio
> Tornos Martinez <jtornosm@redhat.com>
> Subject: [Intel-wired-lan] [PATCH net v3 2/5] i40e: skip unnecessary
> VF reset when setting trust
>
> When VF trust is changed, i40e_ndo_set_vf_trust() always calls
> i40e_vc_reset_vf() to sync MAC/VLAN filters. However, this reset is
> only necessary when trust is removed from a VF that has ADQ (Application
> Device Queues) filters, which need to be deleted.
>
> In all other cases, the reset causes a ~10 second delay during which:
> - VF must reinitialize completely
> - Any in-progress operations (like bonding enslave) fail with timeouts
> - VF is unavailable
>
> The MAC/VLAN filter sync will happen naturally through the normal VF
> operations and doesn't require a forced reset.
>
> Fix by only resetting when actually needed: when removing trust from a
> VF that has ADQ cloud filters. For all other trust changes, just
> update the trust flag and let normal operation continue.
>
> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
> ---
> drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> index a26c3d47ec15..fea267af7afe 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> @@ -4987,16 +4987,21 @@ int i40e_ndo_set_vf_trust(struct net_device *netdev, int vf_id, bool setting)
> set_bit(__I40E_MACVLAN_SYNC_PENDING, pf->state);
> pf->vsi[vf->lan_vsi_idx]->flags |=
> I40E_VSI_FLAG_FILTER_CHANGED;
>
> - i40e_vc_reset_vf(vf, true);
> dev_info(&pf->pdev->dev, "VF %u is now %strusted\n",
> vf_id, setting ? "" : "un");
>
> + /* Only reset VF if we're removing trust and it has ADQ cloud filters.
> + * Cloud filters can only be added when trusted, so they must be
> + * removed when trust is revoked. Other trust changes don't require
> + * reset - MAC/VLAN filter sync happens through normal operation.
> + */
> if (vf->adq_enabled) {
> if (!vf->trusted) {
> dev_info(&pf->pdev->dev,
> "VF %u no longer Trusted, deleting all cloud filters\n",
> vf_id);
> i40e_del_all_cloud_filters(vf);
> + i40e_vc_reset_vf(vf, true);
> }
> }
>
> --
> 2.53.0
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
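
The reset condition described in the quoted patch can be modelled outside the
kernel. The following is only a minimal sketch: struct i40e_vf_model, its
fields num_cloud_filters and resets, and i40e_vf_model_set_trust() are
hypothetical stand-ins invented for illustration, not driver code. It mirrors
just the part of i40e_ndo_set_vf_trust() visible in the diff and leaves out
locking, VF lookup and the MACVLAN sync flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct i40e_vf; only what the diff touches. */
struct i40e_vf_model {
	bool trusted;
	bool adq_enabled;
	unsigned int num_cloud_filters;	/* assumed counter, for illustration */
	unsigned int resets;		/* counts modelled i40e_vc_reset_vf() calls */
};

/*
 * Models the patched tail of i40e_ndo_set_vf_trust().
 * Returns true when the trust change triggered a VF reset.
 */
static bool i40e_vf_model_set_trust(struct i40e_vf_model *vf, bool setting)
{
	assert(vf != NULL);

	vf->trusted = setting;

	/* Reset only when trust is removed from a VF with ADQ enabled. */
	if (vf->adq_enabled && !vf->trusted) {
		vf->num_cloud_filters = 0;	/* i40e_del_all_cloud_filters() */
		vf->resets++;			/* i40e_vc_reset_vf(vf, true) */
		return true;
	}

	/* Every other trust change only updates the flag. */
	return false;
}
```

In this model, granting trust, or revoking it from a VF without ADQ, never
increments resets, which corresponds to the case where the ~10 second delay
is avoided.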
^ permalink raw reply
* RE: [Intel-wired-lan] [PATCH net v2 4/4] ice: skip unnecessary VF reset when setting trust
From: Loktionov, Aleksandr @ 2026-04-14 11:41 UTC (permalink / raw)
To: Jose Ignacio Tornos Martinez, netdev@vger.kernel.org
Cc: intel-wired-lan@lists.osuosl.org, jesse.brandeburg@intel.com,
Nguyen, Anthony L, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com
In-Reply-To: <20260407165206.1121317-5-jtornosm@redhat.com>
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Jose Ignacio Tornos Martinez
> Sent: Tuesday, April 7, 2026 6:52 PM
> To: netdev@vger.kernel.org
> Cc: intel-wired-lan@lists.osuosl.org; jesse.brandeburg@intel.com;
> Nguyen, Anthony L <anthony.l.nguyen@intel.com>; davem@davemloft.net;
> edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; Jose Ignacio
> Tornos Martinez <jtornosm@redhat.com>
> Subject: [Intel-wired-lan] [PATCH net v2 4/4] ice: skip unnecessary VF
> reset when setting trust
>
> Similar to the i40e fix, ice_set_vf_trust() unconditionally calls
> ice_reset_vf() when the trust setting changes.
>
> The ice driver already has logic to clean up MAC LLDP filters when
> removing trust, which is the only operation that requires filter
> synchronization. After this cleanup, the VF reset is only necessary if
> there were actually filters to remove.
>
> For all other trust state changes (setting trust, or removing trust
> when no filters exist), the reset is unnecessary as filter
> synchronization happens naturally through normal VF operations.
>
> Fix by only triggering the VF reset when removing trust AND filters
> were actually cleaned up (num_mac_lldp was non-zero).
>
> This avoids the reset delay and the resulting service disruption when a
> VF trust change does not require any filter cleanup.
>
> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
> ---
> drivers/net/ethernet/intel/ice/ice_sriov.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.c b/drivers/net/ethernet/intel/ice/ice_sriov.c
> index 7e00e091756d..23f692b1e86c 100644
> --- a/drivers/net/ethernet/intel/ice/ice_sriov.c
> +++ b/drivers/net/ethernet/intel/ice/ice_sriov.c
> @@ -1399,14 +1399,19 @@ int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted)
>
> mutex_lock(&vf->cfg_lock);
>
> - while (!trusted && vf->num_mac_lldp)
> - ice_vf_update_mac_lldp_num(vf, ice_get_vf_vsi(vf), false);
> -
> vf->trusted = trusted;
> - ice_reset_vf(vf, ICE_VF_RESET_NOTIFY);
> dev_info(ice_pf_to_dev(pf), "VF %u is now %strusted\n",
> vf_id, trusted ? "" : "un");
>
> + /* Only reset VF if removing trust and there are MAC LLDP filters
> + * to clean up. Reset is needed to ensure filter removal completes.
> + */
> + if (!trusted && vf->num_mac_lldp) {
> + while (vf->num_mac_lldp)
> + ice_vf_update_mac_lldp_num(vf, ice_get_vf_vsi(vf), false);
> + ice_reset_vf(vf, ICE_VF_RESET_NOTIFY);
> + }
> +
> mutex_unlock(&vf->cfg_lock);
>
> out_put_vf:
> --
> 2.53.0
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
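
The ice variant of the condition can be sketched the same way. This is a
standalone model under stated assumptions, not the driver: struct
ice_vf_model, its resets counter and ice_vf_model_set_trust() are
hypothetical, and the sketch assumes that each
ice_vf_update_mac_lldp_num(..., false) call lowers num_mac_lldp by one, which
the quoted loop implies but the diff does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ice_vf; only what the diff touches. */
struct ice_vf_model {
	bool trusted;
	unsigned int num_mac_lldp;	/* MAC LLDP filters owned by the VF */
	unsigned int resets;		/* counts modelled ice_reset_vf() calls */
};

/*
 * Models the patched body of ice_set_vf_trust() between mutex_lock() and
 * mutex_unlock(). Returns true when a VF reset was triggered.
 */
static bool ice_vf_model_set_trust(struct ice_vf_model *vf, bool trusted)
{
	assert(vf != NULL);

	vf->trusted = trusted;

	/* Reset only when trust is removed AND filters had to be cleaned up. */
	if (!trusted && vf->num_mac_lldp) {
		while (vf->num_mac_lldp)
			vf->num_mac_lldp--;	/* assumed effect of the update call */
		vf->resets++;			/* ice_reset_vf(vf, ICE_VF_RESET_NOTIFY) */
		return true;
	}

	return false;
}
```

Compared with the pre-patch code, which cleaned up the filters and then reset
unconditionally, the model resets only on the single path where filters were
actually removed.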
^ permalink raw reply