* [PATCH iwl-net v2 1/2] ice: skip per-VLAN promisc rules when default VSI Rx rule is set
2026-06-22 11:34 [PATCH iwl-net v2 0/2] ice: fix DFLT Rx rule handling for promisc and switchdev Petr Oros
@ 2026-06-22 11:34 ` Petr Oros
2026-06-23 16:25 ` Simon Horman
2026-06-22 11:34 ` [PATCH iwl-net v2 2/2] ice: preserve uplink DFLT Rx rule on switchdev release Petr Oros
1 sibling, 1 reply; 5+ messages in thread
From: Petr Oros @ 2026-06-22 11:34 UTC (permalink / raw)
To: netdev; +Cc: Petr Oros, Aleksandr Loktionov
When an ice port is part of a vlan-filtering bridge with a wide VLAN
trunk and the netdev is in IFF_PROMISC (typical for bond slaves
attached to a bridge), the driver installs per-VLAN
ICE_SW_LKUP_PROMISC_VLAN entries (recipe 9) in addition to the broad
ICE_SW_LKUP_DFLT VSI Rx rule (recipe 5). Each per-VLAN rule consumes
one Flow Lookup Unit (FLU) entry from a fixed hardware pool of "up to
32K FLU entries" per device, documented in the E810 datasheet
(613875-009 section 7.8.10, Table 7-18, page 1015).
With three active PFs sharing one switch context and a bridge trunk of
vid 2-4094, the configuration would require roughly
3 PFs * 4093 VLANs * 3 rules per VLAN per PF ~= 36,800 rules
which exceeds the 32K FLU budget. Firmware then responds to further
Add Switch Rules requests with AQ retval 0x10 (LIBIE_AQ_RC_ENOSPC) and
the user-visible failure surfaces as
ice 0000:5c:00.1: Failed to set VSI 14 as the default forwarding
VSI, error -5
ice 0000:5c:00.1 ens1f1: Error -5 setting default VSI 14 Rx rule
After a switch context has been driven into overrun, subsequent
retries can come back as AQ retval 0x2 (LIBIE_AQ_RC_ENOENT), which has
misled triage attempts toward a perceived recipe binding defect
rather than a capacity issue.
When the DFLT VSI Rx rule is in place it catches every packet on the
lport regardless of VLAN tag, so the per-VLAN PROMISC_VLAN expansion
is redundant. The recipe 4 VLAN prune entries are still installed
per VLAN and continue to track the allowed VID set, but the
IFF_PROMISC sync path disables their enforcement on the VSI via
vlan_ops->dis_rx_filtering() before ice_set_promisc() runs.
ena_rx_filtering() is restored when IFF_PROMISC is cleared.
Skip the per-VLAN expansion at the two call sites that drive it:
ice_set_promisc() falls through to ice_fltr_set_vsi_promisc() and
ice_vlan_rx_add_vid() omits the per-VLAN ICE_MCAST_VLAN_PROMISC_BITS
add. Plain IFF_ALLMULTI without an installed DFLT VSI rule is
unchanged and still installs per-VLAN multicast promisc rules.
Both checks use ice_is_vsi_dflt_vsi() which inspects the recipe
filter list for an installed DFLT rule on this VSI, not
netdev->flags & IFF_PROMISC. The HW-state predicate avoids two
regression vectors that a user-intent predicate would introduce:
1. ice_lag_is_switchdev_running() short-circuits ice_set_dflt_vsi()
to return 0 without installing the DFLT rule for a PF in
switchdev LAG mode. An IFF_PROMISC-only check would also
suppress the per-VLAN fallback, leaving the PF with no rule.
2. When ice_set_dflt_vsi() returns a non-EEXIST error (FLU
exhausted, switch context divergence), the driver clears
IFF_PROMISC from vsi->current_netdev_flags but the netdev's own
flags retain IFF_PROMISC. The user-intent predicate would still
suppress the per-VLAN fallback even though DFLT failed to
install.
The predicate is install-time only. The IFF_PROMISC off path closes
the lifecycle gap in ice_vsi_exit_dflt_promisc(): for an IFF_ALLMULTI
VSI with VLANs it reinstates the per-VID rules before clearing the
default rule, so multicast coverage never lapses. If that AQ call
fails the default rule is left in place, ice_vsi_exit_dflt_promisc()
returns the error, and the sync_fltr pass bails with
vsi->current_netdev_flags |= IFF_PROMISC; the current/netdev flag
mismatch re-fires the IFF_PROMISC off path on the next sync. Clearing
the default rule first would instead expose a window where neither
the default rule nor the per-VID rules carry multicast.
If ice_clear_dflt_vsi() fails after the per-VID rules were reinstated
they are deliberately not rolled back. Clearing the default rule is a
removal that frees an FLU entry rather than allocating one, so it
cannot fail for lack of space; a failure is a transient AdminQ error.
The per-VID rules are the steady state for an IFF_ALLMULTI VLAN VSI,
so the only redundant entry left behind is the single un-removed
default rule, not the per-VID set. The retry re-enters this path,
ice_fltr_set_vlan_vsi_promisc() returns -EEXIST for the rules that
already exist so nothing is reallocated, and the default rule is
removed on the next attempt. Rolling the per-VID rules back here would
instead churn thousands of removes and re-adds on every retry.
After the default rule is gone the vid=0 PROMISC rule that paired
with it is redundant and is dropped, but only to reclaim a filter
entry, so a failure there is logged and does not abort the
transition.
ice_set_vsi_promisc() and ice_clear_vsi_promisc() dispatch the
recipe based on whether ICE_PROMISC_VLAN_RX/TX bits are present in
the mask: with the bits set, recipe ICE_SW_LKUP_PROMISC_VLAN is
used; otherwise ICE_SW_LKUP_PROMISC. The else branch in
ice_set_promisc() installs the vid=0 rule in ICE_SW_LKUP_PROMISC.
Because ice_clear_promisc() with VLANs present adds the VLAN bits
and would search ICE_SW_LKUP_PROMISC_VLAN, the recipe mismatch
would leave the vid=0 ICE_SW_LKUP_PROMISC rule orphaned when VLANs
are configured. This is a single stale rule, not a per-cycle leak:
re-adding it on the next promisc on returns -EEXIST rather than
allocating a new entry. The set-time recipe is not recorded, so
ice_clear_promisc() clears both recipes; clearing a rule that is not
present succeeds, both clears run unconditionally, and the first
error is returned.
The two VLAN-0 recipe transition blocks in ice_vlan_rx_add_vid()
and ice_vlan_rx_kill_vid() that promote / demote the vid=0 rule
between ICE_SW_LKUP_PROMISC and ICE_SW_LKUP_PROMISC_VLAN are
likewise guarded by !ice_is_vsi_dflt_vsi(). With DFLT in place the
vid=0 rule already covers every VID and a recipe swap would only
install a redundant rule.
Lab reproduction on an E810-C with the same firmware family (4.80,
NVM 1.3805.0, DDP 1.3.43.0) using four PFs in vlan-filtering bridges
with vid 2-4094 and the slaves brought to IFF_PROMISC before the
bridge VLAN bulk add:
before fix: ~12,279 AQ Add Switch Rules per PF, ENOSPC and ENOENT
responses in dmesg, DFLT VSI Rx rule install fails on
the affected PF
after fix: ~4,093 AQ Add Switch Rules per PF, no AQ errors, DFLT
VSI Rx rule installs on every PF
The 66.7% reduction in installed switch rules per PF matches the
expected per-VLAN saving: a single DFLT rule replaces the per-VID
PROMISC_VLAN expansion.
Functional regression test with vid 2-100 trunk between two ice
ports through the lab switch (40/40 PASS, 0 AQ errors, 0 ENOSPC
at 4093-VID customer scale):
vid 50 unicast, vid 100 unicast, vid 50 broadcast ARP,
vid 100 multicast IPv6 ND
vid 200/500/1500/4000 isolation (out-of-trunk) and untagged not
leaked: 0 packets reach any bridge endpoint
IGMP/MLD snooping, Jumbo MTU 9000, reserved-multicast STP BPDU
IFF_PROMISC + IFF_ALLMULTI transition (off while allmulti stays)
Regression reproducer for commit 1273f89578f2 ("ice: Fix broken
IFF_ALLMULTI handling"): allmulti on -> add vid -> allmulti off
-> allmulti on plus the orphan-rule Scenario 2; both converge
with no stale rules
100-VID, 1000-VID, 4093-VID stress cycles (5/3/2 iterations each)
switchdev mode toggle preserves IFF_PROMISC pruning state across
the session (vid 999 multicast received before and after the
legacy -> switchdev -> legacy cycle)
SR-IOV: VFs unaffected because ice_set_promisc() early-returns
for non-PF VSI and VF representors do not register
ndo_vlan_rx_add_vid
Fixes: 1273f89578f2 ("ice: Fix broken IFF_ALLMULTI handling")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Petr Oros <poros@redhat.com>
---
v2:
- No functional changes; collected the Reviewed-by.
v1: https://lore.kernel.org/all/89efbea9831175e6f57e9fe8557f7a0e48e050b7.1781786935.git.poros@redhat.com/
---
drivers/net/ethernet/intel/ice/ice_main.c | 90 ++++++++++++++++++-----
1 file changed, 70 insertions(+), 20 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 6d24056c247cf4..af8df81fc45623 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -274,7 +274,8 @@ static int ice_set_promisc(struct ice_vsi *vsi, u8 promisc_m)
if (vsi->type != ICE_VSI_PF)
return 0;
- if (ice_vsi_has_non_zero_vlans(vsi)) {
+ /* skip per-VID expansion; the DFLT Rx rule already covers every VID */
+ if (ice_vsi_has_non_zero_vlans(vsi) && !ice_is_vsi_dflt_vsi(vsi)) {
promisc_m |= (ICE_PROMISC_VLAN_RX | ICE_PROMISC_VLAN_TX);
status = ice_fltr_set_vlan_vsi_promisc(&vsi->back->hw, vsi,
promisc_m);
@@ -304,9 +305,19 @@ static int ice_clear_promisc(struct ice_vsi *vsi, u8 promisc_m)
return 0;
if (ice_vsi_has_non_zero_vlans(vsi)) {
- promisc_m |= (ICE_PROMISC_VLAN_RX | ICE_PROMISC_VLAN_TX);
+ int vid0_status;
+
+ /* set time used either recipe (per-VID PROMISC_VLAN, or vid=0
+ * PROMISC via the ice_set_promisc() else branch), so clear
+ * both; clearing an absent rule succeeds
+ */
status = ice_fltr_clear_vlan_vsi_promisc(&vsi->back->hw, vsi,
- promisc_m);
+ promisc_m | ICE_PROMISC_VLAN_RX |
+ ICE_PROMISC_VLAN_TX);
+ vid0_status = ice_fltr_clear_vsi_promisc(&vsi->back->hw,
+ vsi->idx, promisc_m, 0);
+ if (!status)
+ status = vid0_status;
} else {
status = ice_fltr_clear_vsi_promisc(&vsi->back->hw, vsi->idx,
promisc_m, 0);
@@ -317,6 +328,49 @@ static int ice_clear_promisc(struct ice_vsi *vsi, u8 promisc_m)
return status;
}
+/**
+ * ice_vsi_exit_dflt_promisc - drop the default VSI Rx rule on promisc off
+ * @vsi: the VSI leaving promiscuous mode
+ *
+ * For an IFF_ALLMULTI VSI with VLANs the per-VID multicast rules are
+ * reinstated before the default rule is cleared so coverage never lapses;
+ * the then redundant vid=0 rule is dropped best-effort. The callees log
+ * their own failures, so error returns are not re-logged here.
+ *
+ * Return: 0 on success, negative on error with the default rule left in place.
+ */
+static int ice_vsi_exit_dflt_promisc(struct ice_vsi *vsi)
+{
+ struct ice_vsi_vlan_ops *vlan_ops = ice_get_compat_vsi_vlan_ops(vsi);
+ struct net_device *netdev = vsi->netdev;
+ struct ice_hw *hw = &vsi->back->hw;
+ bool restore_mc;
+ int err;
+
+ restore_mc = (vsi->current_netdev_flags & IFF_ALLMULTI) &&
+ ice_vsi_has_non_zero_vlans(vsi);
+
+ if (restore_mc) {
+ err = ice_fltr_set_vlan_vsi_promisc(hw, vsi,
+ ICE_MCAST_VLAN_PROMISC_BITS);
+ if (err && err != -EEXIST)
+ return err;
+ }
+
+ err = ice_clear_dflt_vsi(vsi);
+ if (err)
+ return err;
+
+ if (netdev->features & NETIF_F_HW_VLAN_CTAG_FILTER)
+ vlan_ops->ena_rx_filtering(vsi);
+
+ if (restore_mc)
+ ice_fltr_clear_vsi_promisc(hw, vsi->idx, ICE_MCAST_PROMISC_BITS,
+ 0);
+
+ return 0;
+}
+
/**
* ice_vsi_sync_fltr - Update the VSI filter list to the HW
* @vsi: ptr to the VSI
@@ -442,17 +496,12 @@ static int ice_vsi_sync_fltr(struct ice_vsi *vsi)
} else {
/* Clear Rx filter to remove traffic from wire */
if (ice_is_vsi_dflt_vsi(vsi)) {
- err = ice_clear_dflt_vsi(vsi);
+ err = ice_vsi_exit_dflt_promisc(vsi);
if (err) {
- netdev_err(netdev, "Error %d clearing default VSI %i Rx rule\n",
- err, vsi->vsi_num);
vsi->current_netdev_flags |=
IFF_PROMISC;
goto out_promisc;
}
- if (vsi->netdev->features &
- NETIF_F_HW_VLAN_CTAG_FILTER)
- vlan_ops->ena_rx_filtering(vsi);
}
/* disable allmulti here, but only if allmulti is not
@@ -3676,10 +3725,9 @@ int ice_vlan_rx_add_vid(struct net_device *netdev, __be16 proto, u16 vid)
while (test_and_set_bit(ICE_CFG_BUSY, vsi->state))
usleep_range(1000, 2000);
- /* Add multicast promisc rule for the VLAN ID to be added if
- * all-multicast is currently enabled.
- */
- if (vsi->current_netdev_flags & IFF_ALLMULTI) {
+ /* skip the per-VID rule when the DFLT Rx rule already covers this VID */
+ if ((vsi->current_netdev_flags & IFF_ALLMULTI) &&
+ !ice_is_vsi_dflt_vsi(vsi)) {
ret = ice_fltr_set_vsi_promisc(&vsi->back->hw, vsi->idx,
ICE_MCAST_VLAN_PROMISC_BITS,
vid);
@@ -3697,11 +3745,12 @@ int ice_vlan_rx_add_vid(struct net_device *netdev, __be16 proto, u16 vid)
if (ret)
goto finish;
- /* If all-multicast is currently enabled and this VLAN ID is only one
- * besides VLAN-0 we have to update look-up type of multicast promisc
- * rule for VLAN-0 from ICE_SW_LKUP_PROMISC to ICE_SW_LKUP_PROMISC_VLAN.
+ /* On the first non-zero VLAN, promote the VLAN-0 multicast promisc
+ * rule from ICE_SW_LKUP_PROMISC to ICE_SW_LKUP_PROMISC_VLAN. Skip when
+ * the DFLT Rx rule is installed; it already covers every VID.
*/
if ((vsi->current_netdev_flags & IFF_ALLMULTI) &&
+ !ice_is_vsi_dflt_vsi(vsi) &&
ice_vsi_num_non_zero_vlans(vsi) == 1) {
ice_fltr_clear_vsi_promisc(&vsi->back->hw, vsi->idx,
ICE_MCAST_PROMISC_BITS, 0);
@@ -3764,11 +3813,12 @@ int ice_vlan_rx_kill_vid(struct net_device *netdev, __be16 proto, u16 vid)
ICE_MCAST_VLAN_PROMISC_BITS, vid);
if (!ice_vsi_has_non_zero_vlans(vsi)) {
- /* Update look-up type of multicast promisc rule for VLAN 0
- * from ICE_SW_LKUP_PROMISC_VLAN to ICE_SW_LKUP_PROMISC when
- * all-multicast is enabled and VLAN 0 is the only VLAN rule.
+ /* Last non-zero VLAN gone: demote the VLAN-0 multicast promisc
+ * rule back to ICE_SW_LKUP_PROMISC. Skip when the DFLT Rx rule
+ * is installed; no recipe swap is needed.
*/
- if (vsi->current_netdev_flags & IFF_ALLMULTI) {
+ if ((vsi->current_netdev_flags & IFF_ALLMULTI) &&
+ !ice_is_vsi_dflt_vsi(vsi)) {
ice_fltr_clear_vsi_promisc(&vsi->back->hw, vsi->idx,
ICE_MCAST_VLAN_PROMISC_BITS,
0);
--
2.53.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH iwl-net v2 2/2] ice: preserve uplink DFLT Rx rule on switchdev release
2026-06-22 11:34 [PATCH iwl-net v2 0/2] ice: fix DFLT Rx rule handling for promisc and switchdev Petr Oros
2026-06-22 11:34 ` [PATCH iwl-net v2 1/2] ice: skip per-VLAN promisc rules when default VSI Rx rule is set Petr Oros
@ 2026-06-22 11:34 ` Petr Oros
2026-06-23 16:25 ` Simon Horman
1 sibling, 1 reply; 5+ messages in thread
From: Petr Oros @ 2026-06-22 11:34 UTC (permalink / raw)
To: netdev; +Cc: Petr Oros
ice_eswitch_setup_env() calls ice_set_dflt_vsi() to install the
ICE_SW_LKUP_DFLT Rx rule on the uplink VSI. The helper returns 0 even
when the rule is already in place, so the call is a no-op if
ice_vsi_sync_fltr() had previously installed the DFLT rule in response
to IFF_PROMISC on the uplink netdev. ice_remove_vsi_fltr() called
earlier in ice_eswitch_setup_env() does not affect this rule because
ice_remove_vsi_lkup_fltr() lacks a case for ICE_SW_LKUP_DFLT and falls
into its default branch which only logs. Switchdev mode then adds an
ICE_FLTR_TX leg via ice_cfg_dflt_vsi() on the same VSI handle.
ice_eswitch_release_env() unconditionally removed both the Rx and Tx
DFLT rules. When the Rx DFLT was installed by ice_vsi_sync_fltr()
before the switchdev session started, this clobbered promisc state the
operator had asked for: the DFLT Rx rule disappeared while IFF_PROMISC
was still set on the netdev, and the IFF_PROMISC sync path was not
retriggered, so the uplink ended the session without the catch-all
rule the netdev flags requested.
Skip the Rx DFLT removal when the uplink is promiscuous, both in
ice_eswitch_release_env() and in the err_def_tx unwind of
ice_eswitch_setup_env(). The Tx leg installed by switchdev is always
removed since switchdev owns it.
Test the live netdev->flags for this decision. The ena_rx_filtering()
call right above in ice_eswitch_release_env() reaches
ice_cfg_vlan_pruning(), which already keys on the live netdev->flags
IFF_PROMISC bit, so reusing the same value keeps the preserved DFLT
rule and the VLAN pruning state mutually consistent across every
promisc transition, including one the operator made while switchdev
ran: ice_set_rx_mode() is gated off for the uplink during the session,
so such a change never reaches the filter sync, but it is reflected in
netdev->flags and is therefore honored here on release.
Fixes: 1a1c40df2e80 ("ice: set and release switchdev environment")
Signed-off-by: Petr Oros <poros@redhat.com>
---
v2:
- Reworked the fix to avoid the service task entirely. v1 scheduled a
filter sync in ice_eswitch_disable_switchdev() to reconcile the uplink
DFLT Rx rule; that work could run after ice_remove() freed the uplink
VSI (use-after-free) and was not guaranteed to fire if ice_set_rx_mode()
never ran again. v2 keeps or drops the DFLT Rx rule synchronously in
ice_eswitch_release_env() (and the setup_env error unwind) by testing
the live netdev->flags IFF_PROMISC, the same value ice_cfg_vlan_pruning()
already keys on, so the preserved rule and the pruning state stay
consistent. No service task is scheduled and no symbol is exported.
- Dropped the Reviewed-by since the fix mechanism changed.
v1: https://lore.kernel.org/all/deef5756e534ef06c12d910c5305d3fd205d30a0.1781786935.git.poros@redhat.com/
---
drivers/net/ethernet/intel/ice/ice_eswitch.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_eswitch.c b/drivers/net/ethernet/intel/ice/ice_eswitch.c
index 2e4f0969035f77..48273ef9f69dc8 100644
--- a/drivers/net/ethernet/intel/ice/ice_eswitch.c
+++ b/drivers/net/ethernet/intel/ice/ice_eswitch.c
@@ -66,8 +66,10 @@ static int ice_eswitch_setup_env(struct ice_pf *pf)
ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, false,
ICE_FLTR_TX);
err_def_tx:
- ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, false,
- ICE_FLTR_RX);
+ /* keep the Rx DFLT rule if the uplink is promiscuous (see release_env) */
+ if (!(uplink_vsi->netdev->flags & IFF_PROMISC))
+ ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx,
+ false, ICE_FLTR_RX);
err_def_rx:
ice_vsi_del_vlan_zero(uplink_vsi);
err_vlan_zero:
@@ -278,8 +280,16 @@ static void ice_eswitch_release_env(struct ice_pf *pf)
vlan_ops->ena_rx_filtering(uplink_vsi);
ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, false,
ICE_FLTR_TX);
- ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx, false,
- ICE_FLTR_RX);
+
+ /* Keep the Rx DFLT rule if the uplink is promiscuous; it must outlive
+ * the session. Test the live netdev->flags, the same value
+ * ena_rx_filtering() -> ice_cfg_vlan_pruning() above keys its decision
+ * on, so the preserved DFLT rule and the pruning state stay consistent.
+ */
+ if (!(uplink_vsi->netdev->flags & IFF_PROMISC))
+ ice_cfg_dflt_vsi(uplink_vsi->port_info, uplink_vsi->idx,
+ false, ICE_FLTR_RX);
+
ice_fltr_add_mac_and_broadcast(uplink_vsi,
uplink_vsi->port_info->mac.perm_addr,
ICE_FWD_TO_VSI);
--
2.53.0
^ permalink raw reply related [flat|nested] 5+ messages in thread