* [PATCH iwl-net 0/2] ice: fix DFLT Rx rule handling for promisc and switchdev @ 2026-06-18 15:09 Petr Oros 2026-06-18 15:09 ` [PATCH iwl-net 1/2] ice: skip per-VLAN promisc rules when default VSI Rx rule is set Petr Oros 2026-06-18 15:09 ` [PATCH iwl-net 2/2] ice: preserve uplink DFLT Rx rule on switchdev release Petr Oros 0 siblings, 2 replies; 5+ messages in thread From: Petr Oros @ 2026-06-18 15:09 UTC (permalink / raw) To: netdev Cc: Petr Oros, Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alice Michael, Jacob Keller, Ivan Vecera, Michal Swiatkowski, Grzegorz Nitka, intel-wired-lan, linux-kernel Two fixes for the uplink default VSI Rx rule (DFLT) on E810 when the netdev is in IFF_PROMISC. Patch 1 drops the redundant per-VLAN promisc expansion that exhausts the FLU pool on a wide VLAN trunk across several PFs. Patch 2 keeps the DFLT Rx rule across a switchdev teardown instead of clobbering the promisc state the operator asked for. Lab tested on E810-C: functional, VLAN isolation, IFF_ALLMULTI regression, stress/flap and switchdev-toggle suites pass with no AQ errors, and the FLU pool stays under its ceiling with all four PFs loaded. Petr Oros (2): ice: skip per-VLAN promisc rules when default VSI Rx rule is set ice: preserve uplink DFLT Rx rule on switchdev release drivers/net/ethernet/intel/ice/ice_eswitch.c | 32 ++++++- drivers/net/ethernet/intel/ice/ice_main.c | 90 +++++++++++++++----- 2 files changed, 98 insertions(+), 24 deletions(-) -- 2.53.0 ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH iwl-net 1/2] ice: skip per-VLAN promisc rules when default VSI Rx rule is set 2026-06-18 15:09 [PATCH iwl-net 0/2] ice: fix DFLT Rx rule handling for promisc and switchdev Petr Oros @ 2026-06-18 15:09 ` Petr Oros 2026-06-18 16:01 ` [Intel-wired-lan] " Loktionov, Aleksandr 2026-06-18 15:09 ` [PATCH iwl-net 2/2] ice: preserve uplink DFLT Rx rule on switchdev release Petr Oros 1 sibling, 1 reply; 5+ messages in thread From: Petr Oros @ 2026-06-18 15:09 UTC (permalink / raw) To: netdev Cc: Petr Oros, Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alice Michael, Jacob Keller, Ivan Vecera, Michal Swiatkowski, Grzegorz Nitka, intel-wired-lan, linux-kernel When an ice port is part of a vlan-filtering bridge with a wide VLAN trunk and the netdev is in IFF_PROMISC (typical for bond slaves attached to a bridge), the driver installs per-VLAN ICE_SW_LKUP_PROMISC_VLAN entries (recipe 9) in addition to the broad ICE_SW_LKUP_DFLT VSI Rx rule (recipe 5). Each per-VLAN rule consumes one Flow Lookup Unit (FLU) entry from a fixed hardware pool of "up to 32K FLU entries" per device, documented in the E810 datasheet (613875-009 section 7.8.10, Table 7-18, page 1015). With three active PFs sharing one switch context and a bridge trunk of vid 2-4094, the configuration would require roughly 3 PFs * 4093 VLANs * 3 rules per VLAN per PF ~= 36,800 rules which exceeds the 32K FLU budget. Firmware then responds to further Add Switch Rules requests with AQ retval 0x10 (LIBIE_AQ_RC_ENOSPC) and the user-visible failure surfaces as ice 0000:5c:00.1: Failed to set VSI 14 as the default forwarding VSI, error -5 ice 0000:5c:00.1 ens1f1: Error -5 setting default VSI 14 Rx rule After a switch context has been driven into overrun, subsequent retries can come back as AQ retval 0x2 (LIBIE_AQ_RC_ENOENT), which has misled triage attempts toward a perceived recipe binding defect rather than a capacity issue. When the DFLT VSI Rx rule is in place it catches every packet on the lport regardless of VLAN tag, so the per-VLAN PROMISC_VLAN expansion is redundant. The recipe 4 VLAN prune entries are still installed per VLAN and continue to track the allowed VID set, but the IFF_PROMISC sync path disables their enforcement on the VSI via vlan_ops->dis_rx_filtering() before ice_set_promisc() runs. ena_rx_filtering() is restored when IFF_PROMISC is cleared. Skip the per-VLAN expansion at the two call sites that drive it: ice_set_promisc() falls through to ice_fltr_set_vsi_promisc() and ice_vlan_rx_add_vid() omits the per-VLAN ICE_MCAST_VLAN_PROMISC_BITS add. Plain IFF_ALLMULTI without an installed DFLT VSI rule is unchanged and still installs per-VLAN multicast promisc rules. Both checks use ice_is_vsi_dflt_vsi() which inspects the recipe filter list for an installed DFLT rule on this VSI, not netdev->flags & IFF_PROMISC. The HW-state predicate avoids two regression vectors that a user-intent predicate would introduce: 1. ice_lag_is_switchdev_running() short-circuits ice_set_dflt_vsi() to return 0 without installing the DFLT rule for a PF in switchdev LAG mode. An IFF_PROMISC-only check would also suppress the per-VLAN fallback, leaving the PF with no rule. 2. When ice_set_dflt_vsi() returns a non-EEXIST error (FLU exhausted, switch context divergence), the driver clears IFF_PROMISC from vsi->current_netdev_flags but the netdev's own flags retain IFF_PROMISC. The user-intent predicate would still suppress the per-VLAN fallback even though DFLT failed to install. The predicate is install-time only. The IFF_PROMISC off path closes the lifecycle gap in ice_vsi_exit_dflt_promisc(): for an IFF_ALLMULTI VSI with VLANs it reinstates the per-VID rules before clearing the default rule, so multicast coverage never lapses. If that AQ call fails the default rule is left in place, ice_vsi_exit_dflt_promisc() returns the error, and the sync_fltr pass bails with vsi->current_netdev_flags |= IFF_PROMISC; the current/netdev flag mismatch re-fires the IFF_PROMISC off path on the next sync. Clearing the default rule first would instead expose a window where neither the default rule nor the per-VID rules carry multicast. If ice_clear_dflt_vsi() fails after the per-VID rules were reinstated they are deliberately not rolled back. Clearing the default rule is a removal that frees an FLU entry rather than allocating one, so it cannot fail for lack of space; a failure is a transient AdminQ error. The per-VID rules are the steady state for an IFF_ALLMULTI VLAN VSI, so the only redundant entry left behind is the single un-removed default rule, not the per-VID set. The retry re-enters this path, ice_fltr_set_vlan_vsi_promisc() returns -EEXIST for the rules that already exist so nothing is reallocated, and the default rule is removed on the next attempt. Rolling the per-VID rules back here would instead churn thousands of removes and re-adds on every retry. After the default rule is gone the vid=0 PROMISC rule that paired with it is redundant and is dropped, but only to reclaim a filter entry, so a failure there is logged and does not abort the transition. ice_set_vsi_promisc() and ice_clear_vsi_promisc() dispatch the recipe based on whether ICE_PROMISC_VLAN_RX/TX bits are present in the mask: with the bits set, recipe ICE_SW_LKUP_PROMISC_VLAN is used; otherwise ICE_SW_LKUP_PROMISC. The else branch in ice_set_promisc() installs the vid=0 rule in ICE_SW_LKUP_PROMISC. Because ice_clear_promisc() with VLANs present adds the VLAN bits and would search ICE_SW_LKUP_PROMISC_VLAN, the recipe mismatch would leave the vid=0 ICE_SW_LKUP_PROMISC rule orphaned when VLANs are configured. This is a single stale rule, not a per-cycle leak: re-adding it on the next promisc on returns -EEXIST rather than allocating a new entry. The set-time recipe is not recorded, so ice_clear_promisc() clears both recipes; clearing a rule that is not present succeeds, both clears run unconditionally, and the first error is returned. The two VLAN-0 recipe transition blocks in ice_vlan_rx_add_vid() and ice_vlan_rx_kill_vid() that promote / demote the vid=0 rule between ICE_SW_LKUP_PROMISC and ICE_SW_LKUP_PROMISC_VLAN are likewise guarded by !ice_is_vsi_dflt_vsi(). With DFLT in place the vid=0 rule already covers every VID and a recipe swap would only install a redundant rule. Lab reproduction on an E810-C with the same firmware family (4.80, NVM 1.3805.0, DDP 1.3.43.0) using four PFs in vlan-filtering bridges with vid 2-4094 and the slaves brought to IFF_PROMISC before the bridge VLAN bulk add: before fix: ~12,279 AQ Add Switch Rules per PF, ENOSPC and ENOENT responses in dmesg, DFLT VSI Rx rule install fails on the affected PF after fix: ~4,093 AQ Add Switch Rules per PF, no AQ errors, DFLT VSI Rx rule installs on every PF The 66.7% reduction in installed switch rules per PF matches the expected per-VLAN saving: a single DFLT rule replaces the per-VID PROMISC_VLAN expansion. Functional regression test with vid 2-100 trunk between two ice ports through the lab switch (40/40 PASS, 0 AQ errors, 0 ENOSPC at 4093-VID customer scale): vid 50 unicast, vid 100 unicast, vid 50 broadcast ARP, vid 100 multicast IPv6 ND vid 200/500/1500/4000 isolation (out-of-trunk) and untagged not leaked: 0 packets reach any bridge endpoint IGMP/MLD snooping, Jumbo MTU 9000, reserved-multicast STP BPDU IFF_PROMISC + IFF_ALLMULTI transition (off while allmulti stays) Regression reproducer for commit 1273f89578f2 ("ice: Fix broken IFF_ALLMULTI handling"): allmulti on -> add vid -> allmulti off -> allmulti on plus the orphan-rule Scenario 2; both converge with no stale rules 100-VID, 1000-VID, 4093-VID stress cycles (5/3/2 iterations each) switchdev mode toggle preserves IFF_PROMISC pruning state across the session (vid 999 multicast received before and after the legacy -> switchdev -> legacy cycle) SR-IOV: VFs unaffected because ice_set_promisc() early-returns for non-PF VSI and VF representors do not register ndo_vlan_rx_add_vid Fixes: 1273f89578f2 ("ice: Fix broken IFF_ALLMULTI handling") Signed-off-by: Petr Oros <poros@redhat.com> --- drivers/net/ethernet/intel/ice/ice_main.c | 90 ++++++++++++++++++----- 1 file changed, 70 insertions(+), 20 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 6d24056c247cf4..af8df81fc45623 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -274,7 +274,8 @@ static int ice_set_promisc(struct ice_vsi *vsi, u8 promisc_m) if (vsi->type != ICE_VSI_PF) return 0; - if (ice_vsi_has_non_zero_vlans(vsi)) { + /* skip per-VID expansion; the DFLT Rx rule already covers every VID */ + if (ice_vsi_has_non_zero_vlans(vsi) && !ice_is_vsi_dflt_vsi(vsi)) { promisc_m |= (ICE_PROMISC_VLAN_RX | ICE_PROMISC_VLAN_TX); status = ice_fltr_set_vlan_vsi_promisc(&vsi->back->hw, vsi, promisc_m); @@ -304,9 +305,19 @@ static int ice_clear_promisc(struct ice_vsi *vsi, u8 promisc_m) return 0; if (ice_vsi_has_non_zero_vlans(vsi)) { - promisc_m |= (ICE_PROMISC_VLAN_RX | ICE_PROMISC_VLAN_TX); + int vid0_status; + + /* set time used either recipe (per-VID PROMISC_VLAN, or vid=0 + * PROMISC via the ice_set_promisc() else branch), so clear + * both; clearing an absent rule succeeds + */ status = ice_fltr_clear_vlan_vsi_promisc(&vsi->back->hw, vsi, - promisc_m); + promisc_m | ICE_PROMISC_VLAN_RX | + ICE_PROMISC_VLAN_TX); + vid0_status = ice_fltr_clear_vsi_promisc(&vsi->back->hw, + vsi->idx, promisc_m, 0); + if (!status) + status = vid0_status; } else { status = ice_fltr_clear_vsi_promisc(&vsi->back->hw, vsi->idx, promisc_m, 0); @@ -317,6 +328,49 @@ static int ice_clear_promisc(struct ice_vsi *vsi, u8 promisc_m) return status; } +/** + * ice_vsi_exit_dflt_promisc - drop the default VSI Rx rule on promisc off + * @vsi: the VSI leaving promiscuous mode + * + * For an IFF_ALLMULTI VSI with VLANs the per-VID multicast rules are + * reinstated before the default rule is cleared so coverage never lapses; + * the then redundant vid=0 rule is dropped best-effort. The callees log + * their own failures, so error returns are not re-logged here. + * + * Return: 0 on success, negative on error with the default rule left in place. + */ +static int ice_vsi_exit_dflt_promisc(struct ice_vsi *vsi) +{ + struct ice_vsi_vlan_ops *vlan_ops = ice_get_compat_vsi_vlan_ops(vsi); + struct net_device *netdev = vsi->netdev; + struct ice_hw *hw = &vsi->back->hw; + bool restore_mc; + int err; + + restore_mc = (vsi->current_netdev_flags & IFF_ALLMULTI) && + ice_vsi_has_non_zero_vlans(vsi); + + if (restore_mc) { + err = ice_fltr_set_vlan_vsi_promisc(hw, vsi, + ICE_MCAST_VLAN_PROMISC_BITS); + if (err && err != -EEXIST) + return err; + } + + err = ice_clear_dflt_vsi(vsi); + if (err) + return err; + + if (netdev->features & NETIF_F_HW_VLAN_CTAG_FILTER) + vlan_ops->ena_rx_filtering(vsi); + + if (restore_mc) + ice_fltr_clear_vsi_promisc(hw, vsi->idx, ICE_MCAST_PROMISC_BITS, + 0); + + return 0; +} + /** * ice_vsi_sync_fltr - Update the VSI filter list to the HW * @vsi: ptr to the VSI @@ -442,17 +496,12 @@ static int ice_vsi_sync_fltr(struct ice_vsi *vsi) } else { /* Clear Rx filter to remove traffic from wire */ if (ice_is_vsi_dflt_vsi(vsi)) { - err = ice_clear_dflt_vsi(vsi); + err = ice_vsi_exit_dflt_promisc(vsi); if (err) { - netdev_err(netdev, "Error %d clearing default VSI %i Rx rule\n", - err, vsi->vsi_num); vsi->current_netdev_flags |= IFF_PROMISC; goto out_promisc; } - if (vsi->netdev->features & - NETIF_F_HW_VLAN_CTAG_FILTER) - vlan_ops->ena_rx_filtering(vsi); } /* disable allmulti here, but only if allmulti is not @@ -3676,10 +3725,9 @@ int ice_vlan_rx_add_vid(struct net_device *netdev, __be16 proto, u16 vid) while (test_and_set_bit(ICE_CFG_BUSY, vsi->state)) usleep_range(1000, 2000); - /* Add multicast promisc rule for the VLAN ID to be added if - * all-multicast is currently enabled. - */ - if (vsi->current_netdev_flags & IFF_ALLMULTI) { + /* skip the per-VID rule when the DFLT Rx rule already covers this VID */ + if ((vsi->current_netdev_flags & IFF_ALLMULTI) && + !ice_is_vsi_dflt_vsi(vsi)) { ret = ice_fltr_set_vsi_promisc(&vsi->back->hw, vsi->idx, ICE_MCAST_VLAN_PROMISC_BITS, vid); @@ -3697,11 +3745,12 @@ int ice_vlan_rx_add_vid(struct net_device *netdev, __be16 proto, u16 vid) if (ret) goto finish; - /* If all-multicast is currently enabled and this VLAN ID is only one - * besides VLAN-0 we have to update look-up type of multicast promisc - * rule for VLAN-0 from ICE_SW_LKUP_PROMISC to ICE_SW_LKUP_PROMISC_VLAN. + /* On the first non-zero VLAN, promote the VLAN-0 multicast promisc + * rule from ICE_SW_LKUP_PROMISC to ICE_SW_LKUP_PROMISC_VLAN. Skip when + * the DFLT Rx rule is installed; it already covers every VID. */ if ((vsi->current_netdev_flags & IFF_ALLMULTI) && + !ice_is_vsi_dflt_vsi(vsi) && ice_vsi_num_non_zero_vlans(vsi) == 1) { ice_fltr_clear_vsi_promisc(&vsi->back->hw, vsi->idx, ICE_MCAST_PROMISC_BITS, 0); @@ -3764,11 +3813,12 @@ int ice_vlan_rx_kill_vid(struct net_device *netdev, __be16 proto, u16 vid) ICE_MCAST_VLAN_PROMISC_BITS, vid); if (!ice_vsi_has_non_zero_vlans(vsi)) { - /* Update look-up type of multicast promisc rule for VLAN 0 - * from ICE_SW_LKUP_PROMISC_VLAN to ICE_SW_LKUP_PROMISC when - * all-multicast is enabled and VLAN 0 is the only VLAN rule. + /* Last non-zero VLAN gone: demote the VLAN-0 multicast promisc + * rule back to ICE_SW_LKUP_PROMISC. Skip when the DFLT Rx rule + * is installed; no recipe swap is needed. */ - if (vsi->current_netdev_flags & IFF_ALLMULTI) { + if ((vsi->current_netdev_flags & IFF_ALLMULTI) && + !ice_is_vsi_dflt_vsi(vsi)) { ice_fltr_clear_vsi_promisc(&vsi->back->hw, vsi->idx, ICE_MCAST_VLAN_PROMISC_BITS, 0); -- 2.53.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* RE: [Intel-wired-lan] [PATCH iwl-net 1/2] ice: skip per-VLAN promisc rules when default VSI Rx rule is set 2026-06-18 15:09 ` [PATCH iwl-net 1/2] ice: skip per-VLAN promisc rules when default VSI Rx rule is set Petr Oros @ 2026-06-18 16:01 ` Loktionov, Aleksandr 0 siblings, 0 replies; 5+ messages in thread From: Loktionov, Aleksandr @ 2026-06-18 16:01 UTC (permalink / raw) To: Oros, Petr, netdev@vger.kernel.org Cc: Vecera, Ivan, Alice Michael, Kitszel, Przemyslaw, Eric Dumazet, linux-kernel@vger.kernel.org, Andrew Lunn, Nguyen, Anthony L, Michal Swiatkowski, Keller, Jacob E, Jakub Kicinski, Paolo Abeni, David S. Miller, intel-wired-lan@lists.osuosl.org > -----Original Message----- > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf > Of Petr Oros > Sent: Thursday, June 18, 2026 5:09 PM > To: netdev@vger.kernel.org > Cc: Vecera, Ivan <ivecera@redhat.com>; Alice Michael > <alice.michael@intel.com>; Kitszel, Przemyslaw > <przemyslaw.kitszel@intel.com>; Eric Dumazet <edumazet@google.com>; > linux-kernel@vger.kernel.org; Andrew Lunn <andrew+netdev@lunn.ch>; > Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Michal Swiatkowski > <michal.swiatkowski@linux.intel.com>; Keller, Jacob E > <jacob.e.keller@intel.com>; Jakub Kicinski <kuba@kernel.org>; Paolo > Abeni <pabeni@redhat.com>; David S. Miller <davem@davemloft.net>; > intel-wired-lan@lists.osuosl.org > Subject: [Intel-wired-lan] [PATCH iwl-net 1/2] ice: skip per-VLAN > promisc rules when default VSI Rx rule is set > > When an ice port is part of a vlan-filtering bridge with a wide VLAN > trunk and the netdev is in IFF_PROMISC (typical for bond slaves > attached to a bridge), the driver installs per-VLAN > ICE_SW_LKUP_PROMISC_VLAN entries (recipe 9) in addition to the broad > ICE_SW_LKUP_DFLT VSI Rx rule (recipe 5). Each per-VLAN rule consumes > one Flow Lookup Unit (FLU) entry from a fixed hardware pool of "up to > 32K FLU entries" per device, documented in the E810 datasheet > (613875-009 section 7.8.10, Table 7-18, page 1015). > > With three active PFs sharing one switch context and a bridge trunk of > vid 2-4094, the configuration would require roughly > > 3 PFs * 4093 VLANs * 3 rules per VLAN per PF ~= 36,800 rules > > which exceeds the 32K FLU budget. Firmware then responds to further > Add Switch Rules requests with AQ retval 0x10 (LIBIE_AQ_RC_ENOSPC) and > the user-visible failure surfaces as > > ice 0000:5c:00.1: Failed to set VSI 14 as the default forwarding > VSI, error -5 > ice 0000:5c:00.1 ens1f1: Error -5 setting default VSI 14 Rx rule > > After a switch context has been driven into overrun, subsequent > retries can come back as AQ retval 0x2 (LIBIE_AQ_RC_ENOENT), which has > misled triage attempts toward a perceived recipe binding defect rather > than a capacity issue. > > When the DFLT VSI Rx rule is in place it catches every packet on the > lport regardless of VLAN tag, so the per-VLAN PROMISC_VLAN expansion > is redundant. The recipe 4 VLAN prune entries are still installed per > VLAN and continue to track the allowed VID set, but the IFF_PROMISC > sync path disables their enforcement on the VSI via > vlan_ops->dis_rx_filtering() before ice_set_promisc() runs. > ena_rx_filtering() is restored when IFF_PROMISC is cleared. > > Skip the per-VLAN expansion at the two call sites that drive it: > ice_set_promisc() falls through to ice_fltr_set_vsi_promisc() and > ice_vlan_rx_add_vid() omits the per-VLAN ICE_MCAST_VLAN_PROMISC_BITS > add. Plain IFF_ALLMULTI without an installed DFLT VSI rule is > unchanged and still installs per-VLAN multicast promisc rules. > > Both checks use ice_is_vsi_dflt_vsi() which inspects the recipe filter > list for an installed DFLT rule on this VSI, not > netdev->flags & IFF_PROMISC. The HW-state predicate avoids two > regression vectors that a user-intent predicate would introduce: > > 1. ice_lag_is_switchdev_running() short-circuits ice_set_dflt_vsi() > to return 0 without installing the DFLT rule for a PF in > switchdev LAG mode. An IFF_PROMISC-only check would also > suppress the per-VLAN fallback, leaving the PF with no rule. > > 2. When ice_set_dflt_vsi() returns a non-EEXIST error (FLU > exhausted, switch context divergence), the driver clears > IFF_PROMISC from vsi->current_netdev_flags but the netdev's own > flags retain IFF_PROMISC. The user-intent predicate would still > suppress the per-VLAN fallback even though DFLT failed to > install. > > The predicate is install-time only. The IFF_PROMISC off path closes > the lifecycle gap in ice_vsi_exit_dflt_promisc(): for an IFF_ALLMULTI > VSI with VLANs it reinstates the per-VID rules before clearing the > default rule, so multicast coverage never lapses. If that AQ call > fails the default rule is left in place, ice_vsi_exit_dflt_promisc() > returns the error, and the sync_fltr pass bails with > vsi->current_netdev_flags |= IFF_PROMISC; the current/netdev flag > mismatch re-fires the IFF_PROMISC off path on the next sync. Clearing > the default rule first would instead expose a window where neither the > default rule nor the per-VID rules carry multicast. > > If ice_clear_dflt_vsi() fails after the per-VID rules were reinstated > they are deliberately not rolled back. Clearing the default rule is a > removal that frees an FLU entry rather than allocating one, so it > cannot fail for lack of space; a failure is a transient AdminQ error. > The per-VID rules are the steady state for an IFF_ALLMULTI VLAN VSI, > so the only redundant entry left behind is the single un-removed > default rule, not the per-VID set. The retry re-enters this path, > ice_fltr_set_vlan_vsi_promisc() returns -EEXIST for the rules that > already exist so nothing is reallocated, and the default rule is > removed on the next attempt. Rolling the per-VID rules back here would > instead churn thousands of removes and re-adds on every retry. > > After the default rule is gone the vid=0 PROMISC rule that paired with > it is redundant and is dropped, but only to reclaim a filter entry, so > a failure there is logged and does not abort the transition. > > ice_set_vsi_promisc() and ice_clear_vsi_promisc() dispatch the recipe > based on whether ICE_PROMISC_VLAN_RX/TX bits are present in the mask: > with the bits set, recipe ICE_SW_LKUP_PROMISC_VLAN is used; otherwise > ICE_SW_LKUP_PROMISC. The else branch in > ice_set_promisc() installs the vid=0 rule in ICE_SW_LKUP_PROMISC. > Because ice_clear_promisc() with VLANs present adds the VLAN bits and > would search ICE_SW_LKUP_PROMISC_VLAN, the recipe mismatch would leave > the vid=0 ICE_SW_LKUP_PROMISC rule orphaned when VLANs are configured. > This is a single stale rule, not a per-cycle leak: > re-adding it on the next promisc on returns -EEXIST rather than > allocating a new entry. The set-time recipe is not recorded, so > ice_clear_promisc() clears both recipes; clearing a rule that is not > present succeeds, both clears run unconditionally, and the first error > is returned. > > The two VLAN-0 recipe transition blocks in ice_vlan_rx_add_vid() and > ice_vlan_rx_kill_vid() that promote / demote the vid=0 rule between > ICE_SW_LKUP_PROMISC and ICE_SW_LKUP_PROMISC_VLAN are likewise guarded > by !ice_is_vsi_dflt_vsi(). With DFLT in place the > vid=0 rule already covers every VID and a recipe swap would only > install a redundant rule. > > Lab reproduction on an E810-C with the same firmware family (4.80, NVM > 1.3805.0, DDP 1.3.43.0) using four PFs in vlan-filtering bridges with > vid 2-4094 and the slaves brought to IFF_PROMISC before the bridge > VLAN bulk add: > > before fix: ~12,279 AQ Add Switch Rules per PF, ENOSPC and ENOENT > responses in dmesg, DFLT VSI Rx rule install fails on > the affected PF > after fix: ~4,093 AQ Add Switch Rules per PF, no AQ errors, DFLT > VSI Rx rule installs on every PF > > The 66.7% reduction in installed switch rules per PF matches the > expected per-VLAN saving: a single DFLT rule replaces the per-VID > PROMISC_VLAN expansion. > > Functional regression test with vid 2-100 trunk between two ice ports > through the lab switch (40/40 PASS, 0 AQ errors, 0 ENOSPC at 4093-VID > customer scale): > > vid 50 unicast, vid 100 unicast, vid 50 broadcast ARP, > vid 100 multicast IPv6 ND > vid 200/500/1500/4000 isolation (out-of-trunk) and untagged not > leaked: 0 packets reach any bridge endpoint > IGMP/MLD snooping, Jumbo MTU 9000, reserved-multicast STP BPDU > IFF_PROMISC + IFF_ALLMULTI transition (off while allmulti stays) > Regression reproducer for commit 1273f89578f2 ("ice: Fix broken > IFF_ALLMULTI handling"): allmulti on -> add vid -> allmulti off > -> allmulti on plus the orphan-rule Scenario 2; both converge > with no stale rules > 100-VID, 1000-VID, 4093-VID stress cycles (5/3/2 iterations each) > switchdev mode toggle preserves IFF_PROMISC pruning state across > the session (vid 999 multicast received before and after the > legacy -> switchdev -> legacy cycle) > SR-IOV: VFs unaffected because ice_set_promisc() early-returns > for non-PF VSI and VF representors do not register > ndo_vlan_rx_add_vid > > Fixes: 1273f89578f2 ("ice: Fix broken IFF_ALLMULTI handling") > Signed-off-by: Petr Oros <poros@redhat.com> > --- > drivers/net/ethernet/intel/ice/ice_main.c | 90 ++++++++++++++++++---- > - > 1 file changed, 70 insertions(+), 20 deletions(-) > > diff --git a/drivers/net/ethernet/intel/ice/ice_main.c > b/drivers/net/ethernet/intel/ice/ice_main.c > index 6d24056c247cf4..af8df81fc45623 100644 > --- a/drivers/net/ethernet/intel/ice/ice_main.c > +++ b/drivers/net/ethernet/intel/ice/ice_main.c > @@ -274,7 +274,8 @@ static int ice_set_promisc(struct ice_vsi *vsi, u8 > promisc_m) > if (vsi->type != ICE_VSI_PF) > return 0; > > - if (ice_vsi_has_non_zero_vlans(vsi)) { > + /* skip per-VID expansion; the DFLT Rx rule already covers > every VID */ > + if (ice_vsi_has_non_zero_vlans(vsi) && > !ice_is_vsi_dflt_vsi(vsi)) { > promisc_m |= (ICE_PROMISC_VLAN_RX | > ICE_PROMISC_VLAN_TX); > status = ice_fltr_set_vlan_vsi_promisc(&vsi->back->hw, > vsi, > promisc_m); > @@ -304,9 +305,19 @@ static int ice_clear_promisc(struct ice_vsi *vsi, > u8 promisc_m) > return 0; > > if (ice_vsi_has_non_zero_vlans(vsi)) { ... > ice_fltr_clear_vsi_promisc(&vsi->back->hw, vsi- > >idx, > > ICE_MCAST_VLAN_PROMISC_BITS, > 0); > -- > 2.53.0 Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH iwl-net 2/2] ice: preserve uplink DFLT Rx rule on switchdev release 2026-06-18 15:09 [PATCH iwl-net 0/2] ice: fix DFLT Rx rule handling for promisc and switchdev Petr Oros 2026-06-18 15:09 ` [PATCH iwl-net 1/2] ice: skip per-VLAN promisc rules when default VSI Rx rule is set Petr Oros @ 2026-06-18 15:09 ` Petr Oros 2026-06-18 16:02 ` [Intel-wired-lan] " Loktionov, Aleksandr 1 sibling, 1 reply; 5+ messages in thread From: Petr Oros @ 2026-06-18 15:09 UTC (permalink / raw) To: netdev Cc: Petr Oros, Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alice Michael, Jacob Keller, Ivan Vecera, Michal Swiatkowski, Grzegorz Nitka, intel-wired-lan, linux-kernel ice_eswitch_setup_env() calls ice_set_dflt_vsi() to install the ICE_SW_LKUP_DFLT Rx rule on the uplink VSI. The helper returns 0 even when the rule is already in place, so the call is a no-op if ice_vsi_sync_fltr() had previously installed the DFLT rule in response to IFF_PROMISC on the uplink netdev. ice_remove_vsi_fltr() called earlier in ice_eswitch_setup_env() does not affect this rule because ice_remove_vsi_lkup_fltr() lacks a case for ICE_SW_LKUP_DFLT and falls into its default branch which only logs. Switchdev mode then adds an ICE_FLTR_TX leg via ice_cfg_dflt_vsi() on the same VSI handle. ice_eswitch_release_env() unconditionally removed both the Rx and Tx DFLT rules. When the Rx DFLT was installed by ice_vsi_sync_fltr() before the switchdev session started, this clobbered promisc state the operator had asked for: the DFLT Rx rule disappeared while IFF_PROMISC was still set on the netdev, and the IFF_PROMISC sync path was not retriggered, so the uplink ended the session without the catch-all rule the netdev flags requested. Skip the Rx DFLT removal when the uplink is still promiscuous, both in ice_eswitch_release_env() and in the err_def_tx unwind of ice_eswitch_setup_env(). The Tx leg installed by switchdev is always removed since switchdev owns it. The ena_rx_filtering() call earlier in ice_eswitch_release_env() is left unconditional: it calls ice_cfg_vlan_pruning(), which returns without enabling pruning while the netdev is in IFF_PROMISC, so it cannot re-enable VLAN pruning under the preserved DFLT rule and drop tagged traffic. Pruning is re-enabled later, when the IFF_PROMISC sync path runs after promisc is actually cleared. Use vsi->current_netdev_flags rather than the live netdev->flags for this test. netdev->flags is written under RTNL by dev_change_flags(), while ice_eswitch_release_env() runs under devl_lock, so reading it here would be a TOCTOU against a concurrent promisc change. The IFF_PROMISC bit of current_netdev_flags is written only under ICE_CFG_BUSY by ice_vsi_sync_fltr(), and ice_set_rx_mode() gates that sync off for the uplink while ice_is_switchdev_running() is true. The bit is therefore frozen for the whole session and stable when release_env reads it. Because the sync is gated off during the session, a promisc change the operator makes while switchdev runs never reaches ice_vsi_sync_fltr(): current_netdev_flags keeps the value captured before the session while netdev->flags carries the new one. Once switchdev is torn down and pf->eswitch.is_running is cleared, schedule a filter sync from ice_eswitch_disable_switchdev() so the suppressed change is replayed and the DFLT Rx rule is reconciled with the current netdev flags. This also closes the window where release_env kept the rule based on the frozen flag but the operator had since cleared IFF_PROMISC. Fixes: 1a1c40df2e80 ("ice: set and release switchdev environment") Signed-off-by: Petr Oros <poros@redhat.com> --- drivers/net/ethernet/intel/ice/ice_eswitch.c | 32 +++++++++++++++++--- 1 file changed, 28 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_eswitch.c b/drivers/net/ethernet/intel/ice/ice_eswitch.c index 2e4f0969035f77..b6073fc2375019 100644 --- a/drivers/net/ethernet/intel/ice/ice_eswitch.c +++ b/drivers/net/ethernet/intel/ice/ice_eswitch.c @@ -66,8 +66,10 @@ static int ice_eswitch_setup_env(struct ice_pf *pf) ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, false, ICE_FLTR_TX); err_def_tx: - ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, false, - ICE_FLTR_RX); + /* keep the Rx DFLT rule if still promiscuous (see release_env) */ + if (!(uplink_vsi->current_netdev_flags & IFF_PROMISC)) + ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, + false, ICE_FLTR_RX); err_def_rx: ice_vsi_del_vlan_zero(uplink_vsi); err_vlan_zero: @@ -275,11 +277,23 @@ static void ice_eswitch_release_env(struct ice_pf *pf) vlan_ops = ice_get_compat_vsi_vlan_ops(uplink_vsi); ice_vsi_update_local_lb(uplink_vsi, false); + /* No-op while IFF_PROMISC is set: ice_cfg_vlan_pruning() self-gates on + * it, so this cannot re-enable VLAN pruning under a preserved DFLT rule. + */ vlan_ops->ena_rx_filtering(uplink_vsi); ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, false, ICE_FLTR_TX); - ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, false, - ICE_FLTR_RX); + + /* Keep the Rx DFLT rule if the uplink is still promiscuous; it must + * outlive the session. current_netdev_flags is used because its + * IFF_PROMISC bit only changes under ice_vsi_sync_fltr(), gated off + * during switchdev, so the read cannot race the RTNL netdev->flags. + * Any change made during the session is replayed on teardown. + */ + if (!(uplink_vsi->current_netdev_flags & IFF_PROMISC)) + ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, + false, ICE_FLTR_RX); + ice_fltr_add_mac_and_broadcast(uplink_vsi, uplink_vsi->port_info->mac.perm_addr, ICE_FWD_TO_VSI); @@ -327,10 +341,20 @@ static int ice_eswitch_enable_switchdev(struct ice_pf *pf) */ static void ice_eswitch_disable_switchdev(struct ice_pf *pf) { + struct ice_vsi *uplink_vsi = pf->eswitch.uplink_vsi; + ice_eswitch_br_offloads_deinit(pf); ice_eswitch_release_env(pf); pf->eswitch.is_running = false; + + /* ice_set_rx_mode() was gated off during the session; replay a filter + * sync so any suppressed promisc change reconciles the DFLT Rx rule. + */ + set_bit(ICE_VSI_UMAC_FLTR_CHANGED, uplink_vsi->state); + set_bit(ICE_VSI_MMAC_FLTR_CHANGED, uplink_vsi->state); + set_bit(ICE_FLAG_FLTR_SYNC, pf->flags); + ice_service_task_schedule(pf); } /** -- 2.53.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* RE: [Intel-wired-lan] [PATCH iwl-net 2/2] ice: preserve uplink DFLT Rx rule on switchdev release 2026-06-18 15:09 ` [PATCH iwl-net 2/2] ice: preserve uplink DFLT Rx rule on switchdev release Petr Oros @ 2026-06-18 16:02 ` Loktionov, Aleksandr 0 siblings, 0 replies; 5+ messages in thread From: Loktionov, Aleksandr @ 2026-06-18 16:02 UTC (permalink / raw) To: Oros, Petr, netdev@vger.kernel.org Cc: Vecera, Ivan, Alice Michael, Kitszel, Przemyslaw, Eric Dumazet, linux-kernel@vger.kernel.org, Andrew Lunn, Nguyen, Anthony L, Michal Swiatkowski, Keller, Jacob E, Jakub Kicinski, Paolo Abeni, David S. Miller, intel-wired-lan@lists.osuosl.org > -----Original Message----- > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf > Of Petr Oros > Sent: Thursday, June 18, 2026 5:09 PM > To: netdev@vger.kernel.org > Cc: Vecera, Ivan <ivecera@redhat.com>; Alice Michael > <alice.michael@intel.com>; Kitszel, Przemyslaw > <przemyslaw.kitszel@intel.com>; Eric Dumazet <edumazet@google.com>; > linux-kernel@vger.kernel.org; Andrew Lunn <andrew+netdev@lunn.ch>; > Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Michal Swiatkowski > <michal.swiatkowski@linux.intel.com>; Keller, Jacob E > <jacob.e.keller@intel.com>; Jakub Kicinski <kuba@kernel.org>; Paolo > Abeni <pabeni@redhat.com>; David S. Miller <davem@davemloft.net>; > intel-wired-lan@lists.osuosl.org > Subject: [Intel-wired-lan] [PATCH iwl-net 2/2] ice: preserve uplink > DFLT Rx rule on switchdev release > > ice_eswitch_setup_env() calls ice_set_dflt_vsi() to install the > ICE_SW_LKUP_DFLT Rx rule on the uplink VSI. The helper returns 0 even > when the rule is already in place, so the call is a no-op if > ice_vsi_sync_fltr() had previously installed the DFLT rule in response > to IFF_PROMISC on the uplink netdev. ice_remove_vsi_fltr() called > earlier in ice_eswitch_setup_env() does not affect this rule because > ice_remove_vsi_lkup_fltr() lacks a case for ICE_SW_LKUP_DFLT and falls > into its default branch which only logs. Switchdev mode then adds an > ICE_FLTR_TX leg via ice_cfg_dflt_vsi() on the same VSI handle. > > ice_eswitch_release_env() unconditionally removed both the Rx and Tx > DFLT rules. When the Rx DFLT was installed by ice_vsi_sync_fltr() > before the switchdev session started, this clobbered promisc state the > operator had asked for: the DFLT Rx rule disappeared while IFF_PROMISC > was still set on the netdev, and the IFF_PROMISC sync path was not > retriggered, so the uplink ended the session without the catch-all > rule the netdev flags requested. > > Skip the Rx DFLT removal when the uplink is still promiscuous, both in > ice_eswitch_release_env() and in the err_def_tx unwind of > ice_eswitch_setup_env(). The Tx leg installed by switchdev is always > removed since switchdev owns it. > > The ena_rx_filtering() call earlier in ice_eswitch_release_env() is > left unconditional: it calls ice_cfg_vlan_pruning(), which returns > without enabling pruning while the netdev is in IFF_PROMISC, so it > cannot re-enable VLAN pruning under the preserved DFLT rule and drop > tagged traffic. Pruning is re-enabled later, when the IFF_PROMISC sync > path runs after promisc is actually cleared. > > Use vsi->current_netdev_flags rather than the live netdev->flags for > this test. netdev->flags is written under RTNL by dev_change_flags(), > while ice_eswitch_release_env() runs under devl_lock, so reading it > here would be a TOCTOU against a concurrent promisc change. The > IFF_PROMISC bit of current_netdev_flags is written only under > ICE_CFG_BUSY by ice_vsi_sync_fltr(), and ice_set_rx_mode() gates that > sync off for the uplink while ice_is_switchdev_running() is true. The > bit is therefore frozen for the whole session and stable when > release_env reads it. > > Because the sync is gated off during the session, a promisc change the > operator makes while switchdev runs never reaches ice_vsi_sync_fltr(): > current_netdev_flags keeps the value captured before the session while > netdev->flags carries the new one. Once switchdev is torn down and > pf->eswitch.is_running is cleared, schedule a filter sync from > ice_eswitch_disable_switchdev() so the suppressed change is replayed > and the DFLT Rx rule is reconciled with the current netdev flags. This > also closes the window where release_env kept the rule based on the > frozen flag but the operator had since cleared IFF_PROMISC. > > Fixes: 1a1c40df2e80 ("ice: set and release switchdev environment") > Signed-off-by: Petr Oros <poros@redhat.com> > --- > drivers/net/ethernet/intel/ice/ice_eswitch.c | 32 +++++++++++++++++-- > - > 1 file changed, 28 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/ethernet/intel/ice/ice_eswitch.c > b/drivers/net/ethernet/intel/ice/ice_eswitch.c > index 2e4f0969035f77..b6073fc2375019 100644 > --- a/drivers/net/ethernet/intel/ice/ice_eswitch.c > +++ b/drivers/net/ethernet/intel/ice/ice_eswitch.c > @@ -66,8 +66,10 @@ static int ice_eswitch_setup_env(struct ice_pf *pf) > ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, false, > ICE_FLTR_TX); > err_def_tx: > - ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, false, > - ICE_FLTR_RX); > + /* keep the Rx DFLT rule if still promiscuous (see release_env) > */ > + if (!(uplink_vsi->current_netdev_flags & IFF_PROMISC)) > + ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, > + false, ICE_FLTR_RX); > err_def_rx: > ice_vsi_del_vlan_zero(uplink_vsi); > err_vlan_zero: > @@ -275,11 +277,23 @@ static void ice_eswitch_release_env(struct > ice_pf *pf) > vlan_ops = ice_get_compat_vsi_vlan_ops(uplink_vsi); > > ice_vsi_update_local_lb(uplink_vsi, false); > + /* No-op while IFF_PROMISC is set: ice_cfg_vlan_pruning() self- > gates on > + * it, so this cannot re-enable VLAN pruning under a preserved > DFLT rule. > + */ > vlan_ops->ena_rx_filtering(uplink_vsi); > ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, false, > ICE_FLTR_TX); > - ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, false, > - ICE_FLTR_RX); > + > + /* Keep the Rx DFLT rule if the uplink is still promiscuous; it > must > + * outlive the session. current_netdev_flags is used because > its > + * IFF_PROMISC bit only changes under ice_vsi_sync_fltr(), > gated off > + * during switchdev, so the read cannot race the RTNL netdev- > >flags. > + * Any change made during the session is replayed on teardown. > + */ > + if (!(uplink_vsi->current_netdev_flags & IFF_PROMISC)) > + ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, > + false, ICE_FLTR_RX); > + > ice_fltr_add_mac_and_broadcast(uplink_vsi, > uplink_vsi->port_info- > >mac.perm_addr, > ICE_FWD_TO_VSI); > @@ -327,10 +341,20 @@ static int ice_eswitch_enable_switchdev(struct > ice_pf *pf) > */ > static void ice_eswitch_disable_switchdev(struct ice_pf *pf) { > + struct ice_vsi *uplink_vsi = pf->eswitch.uplink_vsi; > + > ice_eswitch_br_offloads_deinit(pf); > ice_eswitch_release_env(pf); > > pf->eswitch.is_running = false; > + > + /* ice_set_rx_mode() was gated off during the session; replay a > filter > + * sync so any suppressed promisc change reconciles the DFLT Rx > rule. > + */ > + set_bit(ICE_VSI_UMAC_FLTR_CHANGED, uplink_vsi->state); > + set_bit(ICE_VSI_MMAC_FLTR_CHANGED, uplink_vsi->state); > + set_bit(ICE_FLAG_FLTR_SYNC, pf->flags); > + ice_service_task_schedule(pf); > } > > /** > -- > 2.53.0 Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-06-18 16:02 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-18 15:09 [PATCH iwl-net 0/2] ice: fix DFLT Rx rule handling for promisc and switchdev Petr Oros 2026-06-18 15:09 ` [PATCH iwl-net 1/2] ice: skip per-VLAN promisc rules when default VSI Rx rule is set Petr Oros 2026-06-18 16:01 ` [Intel-wired-lan] " Loktionov, Aleksandr 2026-06-18 15:09 ` [PATCH iwl-net 2/2] ice: preserve uplink DFLT Rx rule on switchdev release Petr Oros 2026-06-18 16:02 ` [Intel-wired-lan] " Loktionov, Aleksandr
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox