* [PATCH iwl-net v2 0/2] ice: fix DFLT Rx rule handling for promisc and switchdev
@ 2026-06-22 11:34 Petr Oros
2026-06-22 11:34 ` [PATCH iwl-net v2 1/2] ice: skip per-VLAN promisc rules when default VSI Rx rule is set Petr Oros
2026-06-22 11:34 ` [PATCH iwl-net v2 2/2] ice: preserve uplink DFLT Rx rule on switchdev release Petr Oros
0 siblings, 2 replies; 5+ messages in thread
From: Petr Oros @ 2026-06-22 11:34 UTC (permalink / raw)
To: netdev; +Cc: Petr Oros
Two fixes for the uplink default VSI Rx rule (DFLT) on E810 when the
netdev is in IFF_PROMISC.
Patch 1 drops the redundant per-VLAN promisc expansion that exhausts
the FLU pool on a wide VLAN trunk across several PFs.
Patch 2 keeps the DFLT Rx rule across a switchdev teardown instead of
clobbering the promisc state the operator asked for.
Changes since v1:
- Patch 2: reworked to avoid the service task entirely. v1 scheduled a
filter sync in ice_eswitch_disable_switchdev(); that work could run
after ice_remove() freed the uplink VSI (use-after-free) and was not
guaranteed to fire if ice_set_rx_mode() never ran again. v2 keeps or
drops the DFLT Rx rule synchronously in ice_eswitch_release_env() by
testing the live netdev->flags IFF_PROMISC, the same value
ice_cfg_vlan_pruning() already keys on. No service task is scheduled
and no symbol is exported. Dropped Aleksandr's Reviewed-by since the
fix mechanism changed.
- Patch 1: no functional changes, collected Aleksandr's Reviewed-by.
Link to v1:
https://lore.kernel.org/all/cover.1781786935.git.poros@redhat.com/
Petr Oros (2):
ice: skip per-VLAN promisc rules when default VSI Rx rule is set
ice: preserve uplink DFLT Rx rule on switchdev release
drivers/net/ethernet/intel/ice/ice_eswitch.c | 18 +++-
drivers/net/ethernet/intel/ice/ice_main.c | 90 +++++++++++++++-----
2 files changed, 84 insertions(+), 24 deletions(-)
--
2.53.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH iwl-net v2 1/2] ice: skip per-VLAN promisc rules when default VSI Rx rule is set
2026-06-22 11:34 [PATCH iwl-net v2 0/2] ice: fix DFLT Rx rule handling for promisc and switchdev Petr Oros
@ 2026-06-22 11:34 ` Petr Oros
2026-06-23 16:25 ` Simon Horman
2026-06-22 11:34 ` [PATCH iwl-net v2 2/2] ice: preserve uplink DFLT Rx rule on switchdev release Petr Oros
1 sibling, 1 reply; 5+ messages in thread
From: Petr Oros @ 2026-06-22 11:34 UTC (permalink / raw)
To: netdev; +Cc: Petr Oros, Aleksandr Loktionov
When an ice port is part of a vlan-filtering bridge with a wide VLAN
trunk and the netdev is in IFF_PROMISC (typical for bond slaves
attached to a bridge), the driver installs per-VLAN
ICE_SW_LKUP_PROMISC_VLAN entries (recipe 9) in addition to the broad
ICE_SW_LKUP_DFLT VSI Rx rule (recipe 5). Each per-VLAN rule consumes
one Flow Lookup Unit (FLU) entry from a fixed hardware pool of "up to
32K FLU entries" per device, documented in the E810 datasheet
(613875-009 section 7.8.10, Table 7-18, page 1015).
With three active PFs sharing one switch context and a bridge trunk of
vid 2-4094, the configuration would require roughly
3 PFs * 4093 VLANs * 3 rules per VLAN per PF ~= 36,800 rules
which exceeds the 32K FLU budget. Firmware then responds to further
Add Switch Rules requests with AQ retval 0x10 (LIBIE_AQ_RC_ENOSPC) and
the user-visible failure surfaces as
ice 0000:5c:00.1: Failed to set VSI 14 as the default forwarding
VSI, error -5
ice 0000:5c:00.1 ens1f1: Error -5 setting default VSI 14 Rx rule
After a switch context has been driven into overrun, subsequent
retries can come back as AQ retval 0x2 (LIBIE_AQ_RC_ENOENT), which has
misled triage attempts toward a perceived recipe binding defect
rather than a capacity issue.
When the DFLT VSI Rx rule is in place it catches every packet on the
lport regardless of VLAN tag, so the per-VLAN PROMISC_VLAN expansion
is redundant. The recipe 4 VLAN prune entries are still installed
per VLAN and continue to track the allowed VID set, but the
IFF_PROMISC sync path disables their enforcement on the VSI via
vlan_ops->dis_rx_filtering() before ice_set_promisc() runs.
ena_rx_filtering() is restored when IFF_PROMISC is cleared.
Skip the per-VLAN expansion at the two call sites that drive it:
ice_set_promisc() falls through to ice_fltr_set_vsi_promisc() and
ice_vlan_rx_add_vid() omits the per-VLAN ICE_MCAST_VLAN_PROMISC_BITS
add. Plain IFF_ALLMULTI without an installed DFLT VSI rule is
unchanged and still installs per-VLAN multicast promisc rules.
Both checks use ice_is_vsi_dflt_vsi() which inspects the recipe
filter list for an installed DFLT rule on this VSI, not
netdev->flags & IFF_PROMISC. The HW-state predicate avoids two
regression vectors that a user-intent predicate would introduce:
1. ice_lag_is_switchdev_running() short-circuits ice_set_dflt_vsi()
to return 0 without installing the DFLT rule for a PF in
switchdev LAG mode. An IFF_PROMISC-only check would also
suppress the per-VLAN fallback, leaving the PF with no rule.
2. When ice_set_dflt_vsi() returns a non-EEXIST error (FLU
exhausted, switch context divergence), the driver clears
IFF_PROMISC from vsi->current_netdev_flags but the netdev's own
flags retain IFF_PROMISC. The user-intent predicate would still
suppress the per-VLAN fallback even though DFLT failed to
install.
The predicate is install-time only. The IFF_PROMISC off path closes
the lifecycle gap in ice_vsi_exit_dflt_promisc(): for an IFF_ALLMULTI
VSI with VLANs it reinstates the per-VID rules before clearing the
default rule, so multicast coverage never lapses. If that AQ call
fails the default rule is left in place, ice_vsi_exit_dflt_promisc()
returns the error, and the sync_fltr pass bails with
vsi->current_netdev_flags |= IFF_PROMISC; the current/netdev flag
mismatch re-fires the IFF_PROMISC off path on the next sync. Clearing
the default rule first would instead expose a window where neither
the default rule nor the per-VID rules carry multicast.
If ice_clear_dflt_vsi() fails after the per-VID rules were reinstated
they are deliberately not rolled back. Clearing the default rule is a
removal that frees an FLU entry rather than allocating one, so it
cannot fail for lack of space; a failure is a transient AdminQ error.
The per-VID rules are the steady state for an IFF_ALLMULTI VLAN VSI,
so the only redundant entry left behind is the single un-removed
default rule, not the per-VID set. The retry re-enters this path,
ice_fltr_set_vlan_vsi_promisc() returns -EEXIST for the rules that
already exist so nothing is reallocated, and the default rule is
removed on the next attempt. Rolling the per-VID rules back here would
instead churn thousands of removes and re-adds on every retry.
After the default rule is gone the vid=0 PROMISC rule that paired
with it is redundant and is dropped, but only to reclaim a filter
entry, so a failure there is logged and does not abort the
transition.
ice_set_vsi_promisc() and ice_clear_vsi_promisc() dispatch the
recipe based on whether ICE_PROMISC_VLAN_RX/TX bits are present in
the mask: with the bits set, recipe ICE_SW_LKUP_PROMISC_VLAN is
used; otherwise ICE_SW_LKUP_PROMISC. The else branch in
ice_set_promisc() installs the vid=0 rule in ICE_SW_LKUP_PROMISC.
Because ice_clear_promisc() with VLANs present adds the VLAN bits
and would search ICE_SW_LKUP_PROMISC_VLAN, the recipe mismatch
would leave the vid=0 ICE_SW_LKUP_PROMISC rule orphaned when VLANs
are configured. This is a single stale rule, not a per-cycle leak:
re-adding it on the next promisc on returns -EEXIST rather than
allocating a new entry. The set-time recipe is not recorded, so
ice_clear_promisc() clears both recipes; clearing a rule that is not
present succeeds, both clears run unconditionally, and the first
error is returned.
The two VLAN-0 recipe transition blocks in ice_vlan_rx_add_vid()
and ice_vlan_rx_kill_vid() that promote / demote the vid=0 rule
between ICE_SW_LKUP_PROMISC and ICE_SW_LKUP_PROMISC_VLAN are
likewise guarded by !ice_is_vsi_dflt_vsi(). With DFLT in place the
vid=0 rule already covers every VID and a recipe swap would only
install a redundant rule.
Lab reproduction on an E810-C with the same firmware family (4.80,
NVM 1.3805.0, DDP 1.3.43.0) using four PFs in vlan-filtering bridges
with vid 2-4094 and the slaves brought to IFF_PROMISC before the
bridge VLAN bulk add:
before fix: ~12,279 AQ Add Switch Rules per PF, ENOSPC and ENOENT
responses in dmesg, DFLT VSI Rx rule install fails on
the affected PF
after fix: ~4,093 AQ Add Switch Rules per PF, no AQ errors, DFLT
VSI Rx rule installs on every PF
The 66.7% reduction in installed switch rules per PF matches the
expected per-VLAN saving: a single DFLT rule replaces the per-VID
PROMISC_VLAN expansion.
Functional regression test with vid 2-100 trunk between two ice
ports through the lab switch (40/40 PASS, 0 AQ errors, 0 ENOSPC
at 4093-VID customer scale):
vid 50 unicast, vid 100 unicast, vid 50 broadcast ARP,
vid 100 multicast IPv6 ND
vid 200/500/1500/4000 isolation (out-of-trunk) and untagged not
leaked: 0 packets reach any bridge endpoint
IGMP/MLD snooping, Jumbo MTU 9000, reserved-multicast STP BPDU
IFF_PROMISC + IFF_ALLMULTI transition (off while allmulti stays)
Regression reproducer for commit 1273f89578f2 ("ice: Fix broken
IFF_ALLMULTI handling"): allmulti on -> add vid -> allmulti off
-> allmulti on plus the orphan-rule Scenario 2; both converge
with no stale rules
100-VID, 1000-VID, 4093-VID stress cycles (5/3/2 iterations each)
switchdev mode toggle preserves IFF_PROMISC pruning state across
the session (vid 999 multicast received before and after the
legacy -> switchdev -> legacy cycle)
SR-IOV: VFs unaffected because ice_set_promisc() early-returns
for non-PF VSI and VF representors do not register
ndo_vlan_rx_add_vid
Fixes: 1273f89578f2 ("ice: Fix broken IFF_ALLMULTI handling")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Petr Oros <poros@redhat.com>
---
v2:
- No functional changes; collected the Reviewed-by.
v1: https://lore.kernel.org/all/89efbea9831175e6f57e9fe8557f7a0e48e050b7.1781786935.git.poros@redhat.com/
---
drivers/net/ethernet/intel/ice/ice_main.c | 90 ++++++++++++++++++-----
1 file changed, 70 insertions(+), 20 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 6d24056c247cf4..af8df81fc45623 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -274,7 +274,8 @@ static int ice_set_promisc(struct ice_vsi *vsi, u8 promisc_m)
if (vsi->type != ICE_VSI_PF)
return 0;
- if (ice_vsi_has_non_zero_vlans(vsi)) {
+ /* skip per-VID expansion; the DFLT Rx rule already covers every VID */
+ if (ice_vsi_has_non_zero_vlans(vsi) && !ice_is_vsi_dflt_vsi(vsi)) {
promisc_m |= (ICE_PROMISC_VLAN_RX | ICE_PROMISC_VLAN_TX);
status = ice_fltr_set_vlan_vsi_promisc(&vsi->back->hw, vsi,
promisc_m);
@@ -304,9 +305,19 @@ static int ice_clear_promisc(struct ice_vsi *vsi, u8 promisc_m)
return 0;
if (ice_vsi_has_non_zero_vlans(vsi)) {
- promisc_m |= (ICE_PROMISC_VLAN_RX | ICE_PROMISC_VLAN_TX);
+ int vid0_status;
+
+ /* set time used either recipe (per-VID PROMISC_VLAN, or vid=0
+ * PROMISC via the ice_set_promisc() else branch), so clear
+ * both; clearing an absent rule succeeds
+ */
status = ice_fltr_clear_vlan_vsi_promisc(&vsi->back->hw, vsi,
- promisc_m);
+ promisc_m | ICE_PROMISC_VLAN_RX |
+ ICE_PROMISC_VLAN_TX);
+ vid0_status = ice_fltr_clear_vsi_promisc(&vsi->back->hw,
+ vsi->idx, promisc_m, 0);
+ if (!status)
+ status = vid0_status;
} else {
status = ice_fltr_clear_vsi_promisc(&vsi->back->hw, vsi->idx,
promisc_m, 0);
@@ -317,6 +328,49 @@ static int ice_clear_promisc(struct ice_vsi *vsi, u8 promisc_m)
return status;
}
+/**
+ * ice_vsi_exit_dflt_promisc - drop the default VSI Rx rule on promisc off
+ * @vsi: the VSI leaving promiscuous mode
+ *
+ * For an IFF_ALLMULTI VSI with VLANs the per-VID multicast rules are
+ * reinstated before the default rule is cleared so coverage never lapses;
+ * the then redundant vid=0 rule is dropped best-effort. The callees log
+ * their own failures, so error returns are not re-logged here.
+ *
+ * Return: 0 on success, negative on error with the default rule left in place.
+ */
+static int ice_vsi_exit_dflt_promisc(struct ice_vsi *vsi)
+{
+ struct ice_vsi_vlan_ops *vlan_ops = ice_get_compat_vsi_vlan_ops(vsi);
+ struct net_device *netdev = vsi->netdev;
+ struct ice_hw *hw = &vsi->back->hw;
+ bool restore_mc;
+ int err;
+
+ restore_mc = (vsi->current_netdev_flags & IFF_ALLMULTI) &&
+ ice_vsi_has_non_zero_vlans(vsi);
+
+ if (restore_mc) {
+ err = ice_fltr_set_vlan_vsi_promisc(hw, vsi,
+ ICE_MCAST_VLAN_PROMISC_BITS);
+ if (err && err != -EEXIST)
+ return err;
+ }
+
+ err = ice_clear_dflt_vsi(vsi);
+ if (err)
+ return err;
+
+ if (netdev->features & NETIF_F_HW_VLAN_CTAG_FILTER)
+ vlan_ops->ena_rx_filtering(vsi);
+
+ if (restore_mc)
+ ice_fltr_clear_vsi_promisc(hw, vsi->idx, ICE_MCAST_PROMISC_BITS,
+ 0);
+
+ return 0;
+}
+
/**
* ice_vsi_sync_fltr - Update the VSI filter list to the HW
* @vsi: ptr to the VSI
@@ -442,17 +496,12 @@ static int ice_vsi_sync_fltr(struct ice_vsi *vsi)
} else {
/* Clear Rx filter to remove traffic from wire */
if (ice_is_vsi_dflt_vsi(vsi)) {
- err = ice_clear_dflt_vsi(vsi);
+ err = ice_vsi_exit_dflt_promisc(vsi);
if (err) {
- netdev_err(netdev, "Error %d clearing default VSI %i Rx rule\n",
- err, vsi->vsi_num);
vsi->current_netdev_flags |=
IFF_PROMISC;
goto out_promisc;
}
- if (vsi->netdev->features &
- NETIF_F_HW_VLAN_CTAG_FILTER)
- vlan_ops->ena_rx_filtering(vsi);
}
/* disable allmulti here, but only if allmulti is not
@@ -3676,10 +3725,9 @@ int ice_vlan_rx_add_vid(struct net_device *netdev, __be16 proto, u16 vid)
while (test_and_set_bit(ICE_CFG_BUSY, vsi->state))
usleep_range(1000, 2000);
- /* Add multicast promisc rule for the VLAN ID to be added if
- * all-multicast is currently enabled.
- */
- if (vsi->current_netdev_flags & IFF_ALLMULTI) {
+ /* skip the per-VID rule when the DFLT Rx rule already covers this VID */
+ if ((vsi->current_netdev_flags & IFF_ALLMULTI) &&
+ !ice_is_vsi_dflt_vsi(vsi)) {
ret = ice_fltr_set_vsi_promisc(&vsi->back->hw, vsi->idx,
ICE_MCAST_VLAN_PROMISC_BITS,
vid);
@@ -3697,11 +3745,12 @@ int ice_vlan_rx_add_vid(struct net_device *netdev, __be16 proto, u16 vid)
if (ret)
goto finish;
- /* If all-multicast is currently enabled and this VLAN ID is only one
- * besides VLAN-0 we have to update look-up type of multicast promisc
- * rule for VLAN-0 from ICE_SW_LKUP_PROMISC to ICE_SW_LKUP_PROMISC_VLAN.
+ /* On the first non-zero VLAN, promote the VLAN-0 multicast promisc
+ * rule from ICE_SW_LKUP_PROMISC to ICE_SW_LKUP_PROMISC_VLAN. Skip when
+ * the DFLT Rx rule is installed; it already covers every VID.
*/
if ((vsi->current_netdev_flags & IFF_ALLMULTI) &&
+ !ice_is_vsi_dflt_vsi(vsi) &&
ice_vsi_num_non_zero_vlans(vsi) == 1) {
ice_fltr_clear_vsi_promisc(&vsi->back->hw, vsi->idx,
ICE_MCAST_PROMISC_BITS, 0);
@@ -3764,11 +3813,12 @@ int ice_vlan_rx_kill_vid(struct net_device *netdev, __be16 proto, u16 vid)
ICE_MCAST_VLAN_PROMISC_BITS, vid);
if (!ice_vsi_has_non_zero_vlans(vsi)) {
- /* Update look-up type of multicast promisc rule for VLAN 0
- * from ICE_SW_LKUP_PROMISC_VLAN to ICE_SW_LKUP_PROMISC when
- * all-multicast is enabled and VLAN 0 is the only VLAN rule.
+ /* Last non-zero VLAN gone: demote the VLAN-0 multicast promisc
+ * rule back to ICE_SW_LKUP_PROMISC. Skip when the DFLT Rx rule
+ * is installed; no recipe swap is needed.
*/
- if (vsi->current_netdev_flags & IFF_ALLMULTI) {
+ if ((vsi->current_netdev_flags & IFF_ALLMULTI) &&
+ !ice_is_vsi_dflt_vsi(vsi)) {
ice_fltr_clear_vsi_promisc(&vsi->back->hw, vsi->idx,
ICE_MCAST_VLAN_PROMISC_BITS,
0);
--
2.53.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH iwl-net v2 2/2] ice: preserve uplink DFLT Rx rule on switchdev release
2026-06-22 11:34 [PATCH iwl-net v2 0/2] ice: fix DFLT Rx rule handling for promisc and switchdev Petr Oros
2026-06-22 11:34 ` [PATCH iwl-net v2 1/2] ice: skip per-VLAN promisc rules when default VSI Rx rule is set Petr Oros
@ 2026-06-22 11:34 ` Petr Oros
2026-06-23 16:25 ` Simon Horman
1 sibling, 1 reply; 5+ messages in thread
From: Petr Oros @ 2026-06-22 11:34 UTC (permalink / raw)
To: netdev; +Cc: Petr Oros
ice_eswitch_setup_env() calls ice_set_dflt_vsi() to install the
ICE_SW_LKUP_DFLT Rx rule on the uplink VSI. The helper returns 0 even
when the rule is already in place, so the call is a no-op if
ice_vsi_sync_fltr() had previously installed the DFLT rule in response
to IFF_PROMISC on the uplink netdev. ice_remove_vsi_fltr() called
earlier in ice_eswitch_setup_env() does not affect this rule because
ice_remove_vsi_lkup_fltr() lacks a case for ICE_SW_LKUP_DFLT and falls
into its default branch which only logs. Switchdev mode then adds an
ICE_FLTR_TX leg via ice_cfg_dflt_vsi() on the same VSI handle.
ice_eswitch_release_env() unconditionally removed both the Rx and Tx
DFLT rules. When the Rx DFLT was installed by ice_vsi_sync_fltr()
before the switchdev session started, this clobbered promisc state the
operator had asked for: the DFLT Rx rule disappeared while IFF_PROMISC
was still set on the netdev, and the IFF_PROMISC sync path was not
retriggered, so the uplink ended the session without the catch-all
rule the netdev flags requested.
Skip the Rx DFLT removal when the uplink is promiscuous, both in
ice_eswitch_release_env() and in the err_def_tx unwind of
ice_eswitch_setup_env(). The Tx leg installed by switchdev is always
removed since switchdev owns it.
Test the live netdev->flags for this decision. The ena_rx_filtering()
call right above in ice_eswitch_release_env() reaches
ice_cfg_vlan_pruning(), which already keys on the live netdev->flags
IFF_PROMISC bit, so reusing the same value keeps the preserved DFLT
rule and the VLAN pruning state mutually consistent across every
promisc transition, including one the operator made while switchdev
ran: ice_set_rx_mode() is gated off for the uplink during the session,
so such a change never reaches the filter sync, but it is reflected in
netdev->flags and is therefore honored here on release.
Fixes: 1a1c40df2e80 ("ice: set and release switchdev environment")
Signed-off-by: Petr Oros <poros@redhat.com>
---
v2:
- Reworked the fix to avoid the service task entirely. v1 scheduled a
filter sync in ice_eswitch_disable_switchdev() to reconcile the uplink
DFLT Rx rule; that work could run after ice_remove() freed the uplink
VSI (use-after-free) and was not guaranteed to fire if ice_set_rx_mode()
never ran again. v2 keeps or drops the DFLT Rx rule synchronously in
ice_eswitch_release_env() (and the setup_env error unwind) by testing
the live netdev->flags IFF_PROMISC, the same value ice_cfg_vlan_pruning()
already keys on, so the preserved rule and the pruning state stay
consistent. No service task is scheduled and no symbol is exported.
- Dropped the Reviewed-by since the fix mechanism changed.
v1: https://lore.kernel.org/all/deef5756e534ef06c12d910c5305d3fd205d30a0.1781786935.git.poros@redhat.com/
---
drivers/net/ethernet/intel/ice/ice_eswitch.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_eswitch.c b/drivers/net/ethernet/intel/ice/ice_eswitch.c
index 2e4f0969035f77..48273ef9f69dc8 100644
--- a/drivers/net/ethernet/intel/ice/ice_eswitch.c
+++ b/drivers/net/ethernet/intel/ice/ice_eswitch.c
@@ -66,8 +66,10 @@ static int ice_eswitch_setup_env(struct ice_pf *pf)
ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, false,
ICE_FLTR_TX);
err_def_tx:
- ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, false,
- ICE_FLTR_RX);
+ /* keep the Rx DFLT rule if the uplink is promiscuous (see release_env) */
+ if (!(uplink_vsi->netdev->flags & IFF_PROMISC))
+ ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx,
+ false, ICE_FLTR_RX);
err_def_rx:
ice_vsi_del_vlan_zero(uplink_vsi);
err_vlan_zero:
@@ -278,8 +280,16 @@ static void ice_eswitch_release_env(struct ice_pf *pf)
vlan_ops->ena_rx_filtering(uplink_vsi);
ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, false,
ICE_FLTR_TX);
- ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, false,
- ICE_FLTR_RX);
+
+ /* Keep the Rx DFLT rule if the uplink is promiscuous; it must outlive
+ * the session. Test the live netdev->flags, the same value
+ * ena_rx_filtering() -> ice_cfg_vlan_pruning() above keys its decision
+ * on, so the preserved DFLT rule and the pruning state stay consistent.
+ */
+ if (!(uplink_vsi->netdev->flags & IFF_PROMISC))
+ ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx,
+ false, ICE_FLTR_RX);
+
ice_fltr_add_mac_and_broadcast(uplink_vsi,
uplink_vsi->port_info->mac.perm_addr,
ICE_FWD_TO_VSI);
--
2.53.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH iwl-net v2 1/2] ice: skip per-VLAN promisc rules when default VSI Rx rule is set
2026-06-22 11:34 ` [PATCH iwl-net v2 1/2] ice: skip per-VLAN promisc rules when default VSI Rx rule is set Petr Oros
@ 2026-06-23 16:25 ` Simon Horman
0 siblings, 0 replies; 5+ messages in thread
From: Simon Horman @ 2026-06-23 16:25 UTC (permalink / raw)
To: poros; +Cc: Simon Horman, netdev, aleksandr.loktionov
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
Subject: ice: skip per-VLAN promisc rules when default VSI Rx rule is set
This patch optimizes the number of hardware switch rules used when an ice
port is part of a vlan-filtering bridge. It avoids installing per-VLAN
promiscuous rules if the broad default VSI Rx rule is already catching
all traffic.
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index e2fbe111f8496..efb1f536de6c7 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
[ ... ]
> @@ -442,17 +496,12 @@ static int ice_vsi_sync_fltr(struct ice_vsi *vsi)
[Severity: High]
This is a pre-existing issue, but does MAC filter overflow fail to force
promiscuous mode, resulting in silent packet drops?
When the hardware MAC filter limit is reached (LIBIE_AQ_RC_ENOSPC), the driver
attempts to fall back to promiscuous mode by setting promisc_forced_on = true.
However, the execution flow then reaches a check for IFF_PROMISC:
if (vsi->current_netdev_flags & IFF_PROMISC) {
/* Apply Rx filter rule to get traffic from wire */
...
} else {
/* Clear Rx filter to remove traffic from wire */
Since the user never explicitly requested IFF_PROMISC, this evaluates to false,
causing the driver to take the else branch. The else branch actively disables
the default VSI rule and clears promiscuous mode instead of enabling it. The
fallback feature appears to be non-functional, leading to lost traffic for
newly added MACs.
[Severity: High]
This is a pre-existing issue, but could this function prevent retries of
incomplete configurations and break multicast promiscuity if any default
VSI is already active?
In ice_vsi_sync_fltr(), the setup of multicast promiscuity and disabling of
Rx filtering is gated:
if (!ice_is_dflt_vsi_in_use(vsi->port_info)) {
err = ice_set_dflt_vsi(vsi);
...
err = ice_set_promisc(vsi, ICE_MCAST_PROMISC_BITS);
if (err)
goto out_promisc;
}
If ice_set_dflt_vsi() succeeds but the subsequent ice_set_promisc() fails,
the function jumps to out_promisc to trigger a retry. On the next cycle,
ice_is_dflt_vsi_in_use() returns true because this VSI is now the default,
causing the entire block to be skipped and leaving the VSI misconfigured.
Furthermore, if another VSI (such as a VF) is already the default VSI,
the PF VSI entirely skips setting multicast promiscuity, breaking multicast
reception in promiscuous mode.
> } else {
> /* Clear Rx filter to remove traffic from wire */
> if (ice_is_vsi_dflt_vsi(vsi)) {
> - err = ice_clear_dflt_vsi(vsi);
> + err = ice_vsi_exit_dflt_promisc(vsi);
> if (err) {
> - netdev_err(netdev, "Error %d clearing default VSI %i Rx rule\n",
> - err, vsi->vsi_num);
> vsi->current_netdev_flags |=
> IFF_PROMISC;
> goto out_promisc;
> }
> - if (vsi->netdev->features &
> - NETIF_F_HW_VLAN_CTAG_FILTER)
> - vlan_ops->ena_rx_filtering(vsi);
> }
[ ... ]
> @@ -3675,10 +3724,9 @@ int ice_vlan_rx_add_vid(struct net_device *netdev, __be16 proto, u16 vid)
> while (test_and_set_bit(ICE_CFG_BUSY, vsi->state))
> usleep_range(1000, 2000);
[Severity: High]
This is a pre-existing issue, but isn't there a risk of TOCTOU races with
this ad-hoc lock pattern lacking memory barriers?
The driver spins on ICE_CFG_BUSY using test_and_set_bit() and usleep_range()
to guard VSI filter synchronization. The lock is subsequently dropped using
clear_bit(ICE_CFG_BUSY, vsi->state) at the exit label.
Because clear_bit() lacks a memory release barrier, the CPU or compiler can
reorder memory stores from the critical section to occur after the flag is
cleared, leading to data races and corruption of shared filter state on
weakly ordered architectures.
> - /* Add multicast promisc rule for the VLAN ID to be added if
> - * all-multicast is currently enabled.
> - */
> - if (vsi->current_netdev_flags & IFF_ALLMULTI) {
> + /* skip the per-VID rule when the DFLT Rx rule already covers this VID */
> + if ((vsi->current_netdev_flags & IFF_ALLMULTI) &&
> + !ice_is_vsi_dflt_vsi(vsi)) {
> ret = ice_fltr_set_vsi_promisc(&vsi->back->hw, vsi->idx,
> ICE_MCAST_VLAN_PROMISC_BITS,
> vid);
> @@ -3696,11 +3744,12 @@ int ice_vlan_rx_add_vid(struct net_device *netdev, __be16 proto, u16 vid)
> if (ret)
> goto finish;
[Severity: High]
This is a pre-existing issue, but what happens if vlan_ops->add_vlan()
fails here? Is the hardware switch rule leaked?
When adding a VLAN to an interface with IFF_ALLMULTI enabled, this function
first allocates a promiscuous switch rule in hardware via
ice_fltr_set_vsi_promisc().
If this succeeds but the subsequent call to vlan_ops->add_vlan() fails
(e.g., due to ENOSPC or memory allocation failure), the function returns
early without rolling back the promiscuous rule. Because the network stack
considers the VLAN addition to have failed, ice_vlan_rx_kill_vid() will
never be called for this VID, permanently leaking a Flow Lookup Unit (FLU).
> - /* If all-multicast is currently enabled and this VLAN ID is only one
> - * besides VLAN-0 we have to update look-up type of multicast promisc
> - * rule for VLAN-0 from ICE_SW_LKUP_PROMISC to ICE_SW_LKUP_PROMISC_VLAN.
> + /* On the first non-zero VLAN, promote the VLAN-0 multicast promisc
> + * rule from ICE_SW_LKUP_PROMISC to ICE_SW_LKUP_PROMISC_VLAN. Skip when
> + * the DFLT Rx rule is installed; it already covers every VID.
> */
> if ((vsi->current_netdev_flags & IFF_ALLMULTI) &&
> + !ice_is_vsi_dflt_vsi(vsi) &&
> ice_vsi_num_non_zero_vlans(vsi) == 1) {
[ ... ]
> @@ -3763,11 +3812,12 @@ int ice_vlan_rx_kill_vid(struct net_device *netdev, __be16 proto, u16 vid)
> ICE_MCAST_VLAN_PROMISC_BITS, vid);
>
> if (!ice_vsi_has_non_zero_vlans(vsi)) {
> - /* Update look-up type of multicast promisc rule for VLAN 0
> - * from ICE_SW_LKUP_PROMISC_VLAN to ICE_SW_LKUP_PROMISC when
> - * all-multicast is enabled and VLAN 0 is the only VLAN rule.
> + /* Last non-zero VLAN gone: demote the VLAN-0 multicast promisc
> + * rule back to ICE_SW_LKUP_PROMISC. Skip when the DFLT Rx rule
> + * is installed; no recipe swap is needed.
> */
> - if (vsi->current_netdev_flags & IFF_ALLMULTI) {
> + if ((vsi->current_netdev_flags & IFF_ALLMULTI) &&
> + !ice_is_vsi_dflt_vsi(vsi)) {
[Severity: High]
Will this modification cause a stale switch rule (ICE_SW_LKUP_PROMISC_VLAN for
vid=0) to be permanently leaked if the last VLAN is removed while the DFLT VSI
rule is active?
This patch skips demoting the vid=0 multicast promiscuous rule from
ICE_SW_LKUP_PROMISC_VLAN back to ICE_SW_LKUP_PROMISC when the last VLAN is
removed, if ice_is_vsi_dflt_vsi(vsi) is true.
However, if the interface later drops IFF_PROMISC (removing the DFLT VSI rule)
and then drops IFF_ALLMULTI, ice_clear_promisc() is called. Because the
interface now has zero VLANs, ice_clear_promisc() executes its else branch:
} else {
status = ice_fltr_clear_vsi_promisc(&vsi->back->hw, vsi->idx,
promisc_m, 0);
}
This only searches and clears rules from the ICE_SW_LKUP_PROMISC recipe. The
stranded rule in the ICE_SW_LKUP_PROMISC_VLAN recipe is never cleared,
resulting in a permanent hardware switch rule leak.
> ice_fltr_clear_vsi_promisc(&vsi->back->hw, vsi->idx,
> ICE_MCAST_VLAN_PROMISC_BITS,
> 0);
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH iwl-net v2 2/2] ice: preserve uplink DFLT Rx rule on switchdev release
2026-06-22 11:34 ` [PATCH iwl-net v2 2/2] ice: preserve uplink DFLT Rx rule on switchdev release Petr Oros
@ 2026-06-23 16:25 ` Simon Horman
0 siblings, 0 replies; 5+ messages in thread
From: Simon Horman @ 2026-06-23 16:25 UTC (permalink / raw)
To: poros; +Cc: Simon Horman, netdev, aleksandr.loktionov
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
ice: preserve uplink DFLT Rx rule on switchdev release
This commit preserves the default Rx rule on the uplink VSI when releasing
the switchdev environment if the interface is in promiscuous mode. This
prevents the catch-all rule from being removed while the netdev flags still
request promiscuity.
> diff --git a/drivers/net/ethernet/intel/ice/ice_eswitch.c b/drivers/net/ethernet/intel/ice/ice_eswitch.c
> index 2e4f0969035f7..48273ef9f69dc 100644
> --- a/drivers/net/ethernet/intel/ice/ice_eswitch.c
> +++ b/drivers/net/ethernet/intel/ice/ice_eswitch.c
[ ... ]
> @@ -278,8 +280,16 @@ static void ice_eswitch_release_env(struct ice_pf *pf)
> vlan_ops->ena_rx_filtering(uplink_vsi);
[Severity: High]
Could this lockless read of netdev->flags cause a regression due to a TOCTOU
race?
The ena_rx_filtering() call eventually reaches ice_cfg_vlan_pruning(),
which reads uplink_vsi->netdev->flags & IFF_PROMISC locklessly to decide
whether to enable VLAN pruning.
> ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, false,
> ICE_FLTR_TX);
> - ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, false,
> - ICE_FLTR_RX);
> +
> + /* Keep the Rx DFLT rule if the uplink is promiscuous; it must outlive
> + * the session. Test the live netdev->flags, the same value
> + * ena_rx_filtering() -> ice_cfg_vlan_pruning() above keys its decision
> + * on, so the preserved DFLT rule and the pruning state stay consistent.
> + */
> + if (!(uplink_vsi->netdev->flags & IFF_PROMISC))
... and then the exact same flags are read locklessly again right here.
Because setup/teardown paths are protected by devl_lock rather than
rtnl_lock, could a concurrent dev_change_flags() holding rtnl_lock toggle
IFF_PROMISC between the two checks?
If IFF_PROMISC changes from 0 to 1 in this window, the code would incorrectly
enable VLAN pruning (seeing 0) and then incorrectly retain the DFLT_RX rule
(seeing 1), leaving the hardware with VLAN pruning enabled during promiscuous
mode.
[Severity: High]
Does preserving the Rx DFLT rule here cause a regression by permanently
breaking multicast promiscuity configuration if IFF_PROMISC was set?
During the switchdev session, ice_remove_vsi_fltr() removes the
ICE_SW_LKUP_PROMISC rules. By manually preserving the DFLT rule here, any
future synchronization via ice_vsi_sync_fltr() will evaluate
!ice_is_dflt_vsi_in_use(vsi->port_info) as false:
ice_main.c:ice_vsi_sync_fltr() {
...
if (!ice_is_dflt_vsi_in_use(vsi->port_info)) {
err = ice_set_dflt_vsi(vsi);
...
err = ice_set_promisc(vsi, ICE_MCAST_PROMISC_BITS);
}
...
}
Because it is skipped, ICE_MCAST_PROMISC_BITS is never configured, leaving
the VSI in unicast promiscuous mode but silently dropping multicast traffic
destined for other MAC addresses.
> + ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx,
> + false, ICE_FLTR_RX);
> +
> ice_fltr_add_mac_and_broadcast(uplink_vsi,
> uplink_vsi->port_info->mac.perm_addr,
> ICE_FWD_TO_VSI);
[Severity: High]
This is a pre-existing issue, but does this teardown function fail to
synchronize hardware state for previously joined multicast groups and
secondary unicast addresses?
When entering switchdev mode, __dev_uc_unsync() and __dev_mc_unsync() clear
the synchronization state of OS-level MAC addresses, and
ice_remove_vsi_fltr() removes the actual hardware filters. While switchdev
runs, calls to ndo_set_rx_mode return early and do not update hardware.
When returning to legacy mode, this function restores only the primary MAC
and broadcast addresses, but doesn't seem to invoke netif_set_rx_mode() or
equivalent to resynchronize the deferred Rx filters from the netdev to the
hardware. Could this cause the hardware to silently drop packets for
previously joined groups until the interface is bounced?
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-06-23 16:32 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-22 11:34 [PATCH iwl-net v2 0/2] ice: fix DFLT Rx rule handling for promisc and switchdev Petr Oros
2026-06-22 11:34 ` [PATCH iwl-net v2 1/2] ice: skip per-VLAN promisc rules when default VSI Rx rule is set Petr Oros
2026-06-23 16:25 ` Simon Horman
2026-06-22 11:34 ` [PATCH iwl-net v2 2/2] ice: preserve uplink DFLT Rx rule on switchdev release Petr Oros
2026-06-23 16:25 ` Simon Horman
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.