From: Petr Oros <poros@redhat.com>
To: netdev@vger.kernel.org
Cc: Petr Oros <poros@redhat.com>,
Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Subject: [PATCH iwl-net v2 1/2] ice: skip per-VLAN promisc rules when default VSI Rx rule is set
Date: Mon, 22 Jun 2026 13:34:27 +0200 [thread overview]
Message-ID: <20260622113428.2565255-2-poros@redhat.com> (raw)
In-Reply-To: <20260622113428.2565255-1-poros@redhat.com>
When an ice port is part of a vlan-filtering bridge with a wide VLAN
trunk and the netdev is in IFF_PROMISC (typical for bond slaves
attached to a bridge), the driver installs per-VLAN
ICE_SW_LKUP_PROMISC_VLAN entries (recipe 9) in addition to the broad
ICE_SW_LKUP_DFLT VSI Rx rule (recipe 5). Each per-VLAN rule consumes
one Flow Lookup Unit (FLU) entry from a fixed hardware pool of "up to
32K FLU entries" per device, documented in the E810 datasheet
(613875-009 section 7.8.10, Table 7-18, page 1015).
With three active PFs sharing one switch context and a bridge trunk of
vid 2-4094, the configuration would require roughly
3 PFs * 4093 VLANs * 3 rules per VLAN per PF ~= 36,800 rules
which exceeds the 32K FLU budget. Firmware then responds to further
Add Switch Rules requests with AQ retval 0x10 (LIBIE_AQ_RC_ENOSPC) and
the user-visible failure surfaces as
ice 0000:5c:00.1: Failed to set VSI 14 as the default forwarding
VSI, error -5
ice 0000:5c:00.1 ens1f1: Error -5 setting default VSI 14 Rx rule
After a switch context has been driven into overrun, subsequent
retries can come back as AQ retval 0x2 (LIBIE_AQ_RC_ENOENT), which has
misled triage attempts toward a perceived recipe binding defect
rather than a capacity issue.
When the DFLT VSI Rx rule is in place it catches every packet on the
lport regardless of VLAN tag, so the per-VLAN PROMISC_VLAN expansion
is redundant. The recipe 4 VLAN prune entries are still installed
per VLAN and continue to track the allowed VID set, but the
IFF_PROMISC sync path disables their enforcement on the VSI via
vlan_ops->dis_rx_filtering() before ice_set_promisc() runs.
ena_rx_filtering() is restored when IFF_PROMISC is cleared.
Skip the per-VLAN expansion at the two call sites that drive it:
ice_set_promisc() falls through to ice_fltr_set_vsi_promisc() and
ice_vlan_rx_add_vid() omits the per-VLAN ICE_MCAST_VLAN_PROMISC_BITS
add. Plain IFF_ALLMULTI without an installed DFLT VSI rule is
unchanged and still installs per-VLAN multicast promisc rules.
Both checks use ice_is_vsi_dflt_vsi() which inspects the recipe
filter list for an installed DFLT rule on this VSI, not
netdev->flags & IFF_PROMISC. The HW-state predicate avoids two
regression vectors that a user-intent predicate would introduce:
1. ice_lag_is_switchdev_running() short-circuits ice_set_dflt_vsi()
to return 0 without installing the DFLT rule for a PF in
switchdev LAG mode. An IFF_PROMISC-only check would also
suppress the per-VLAN fallback, leaving the PF with no rule.
2. When ice_set_dflt_vsi() returns a non-EEXIST error (FLU
exhausted, switch context divergence), the driver clears
IFF_PROMISC from vsi->current_netdev_flags but the netdev's own
flags retain IFF_PROMISC. The user-intent predicate would still
suppress the per-VLAN fallback even though DFLT failed to
install.
The predicate is install-time only. The IFF_PROMISC off path closes
the lifecycle gap in ice_vsi_exit_dflt_promisc(): for an IFF_ALLMULTI
VSI with VLANs it reinstates the per-VID rules before clearing the
default rule, so multicast coverage never lapses. If that AQ call
fails the default rule is left in place, ice_vsi_exit_dflt_promisc()
returns the error, and the sync_fltr pass bails with
vsi->current_netdev_flags |= IFF_PROMISC; the current/netdev flag
mismatch re-fires the IFF_PROMISC off path on the next sync. Clearing
the default rule first would instead expose a window where neither
the default rule nor the per-VID rules carry multicast.
If ice_clear_dflt_vsi() fails after the per-VID rules were reinstated
they are deliberately not rolled back. Clearing the default rule is a
removal that frees an FLU entry rather than allocating one, so it
cannot fail for lack of space; a failure is a transient AdminQ error.
The per-VID rules are the steady state for an IFF_ALLMULTI VLAN VSI,
so the only redundant entry left behind is the single un-removed
default rule, not the per-VID set. The retry re-enters this path,
ice_fltr_set_vlan_vsi_promisc() returns -EEXIST for the rules that
already exist so nothing is reallocated, and the default rule is
removed on the next attempt. Rolling the per-VID rules back here would
instead churn thousands of removes and re-adds on every retry.
After the default rule is gone the vid=0 PROMISC rule that paired
with it is redundant and is dropped, but only to reclaim a filter
entry, so a failure there is logged and does not abort the
transition.
ice_set_vsi_promisc() and ice_clear_vsi_promisc() dispatch the
recipe based on whether ICE_PROMISC_VLAN_RX/TX bits are present in
the mask: with the bits set, recipe ICE_SW_LKUP_PROMISC_VLAN is
used; otherwise ICE_SW_LKUP_PROMISC. The else branch in
ice_set_promisc() installs the vid=0 rule in ICE_SW_LKUP_PROMISC.
Because ice_clear_promisc() with VLANs present adds the VLAN bits
and would search ICE_SW_LKUP_PROMISC_VLAN, the recipe mismatch
would leave the vid=0 ICE_SW_LKUP_PROMISC rule orphaned when VLANs
are configured. This is a single stale rule, not a per-cycle leak:
re-adding it on the next promisc on returns -EEXIST rather than
allocating a new entry. The set-time recipe is not recorded, so
ice_clear_promisc() clears both recipes; clearing a rule that is not
present succeeds, both clears run unconditionally, and the first
error is returned.
The two VLAN-0 recipe transition blocks in ice_vlan_rx_add_vid()
and ice_vlan_rx_kill_vid() that promote / demote the vid=0 rule
between ICE_SW_LKUP_PROMISC and ICE_SW_LKUP_PROMISC_VLAN are
likewise guarded by !ice_is_vsi_dflt_vsi(). With DFLT in place the
vid=0 rule already covers every VID and a recipe swap would only
install a redundant rule.
Lab reproduction on an E810-C with the same firmware family (4.80,
NVM 1.3805.0, DDP 1.3.43.0) using four PFs in vlan-filtering bridges
with vid 2-4094 and the slaves brought to IFF_PROMISC before the
bridge VLAN bulk add:
before fix: ~12,279 AQ Add Switch Rules per PF, ENOSPC and ENOENT
responses in dmesg, DFLT VSI Rx rule install fails on
the affected PF
after fix: ~4,093 AQ Add Switch Rules per PF, no AQ errors, DFLT
VSI Rx rule installs on every PF
The 66.7% reduction in installed switch rules per PF matches the
expected per-VLAN saving: a single DFLT rule replaces the per-VID
PROMISC_VLAN expansion.
Functional regression test with vid 2-100 trunk between two ice
ports through the lab switch (40/40 PASS, 0 AQ errors, 0 ENOSPC
at 4093-VID customer scale):
vid 50 unicast, vid 100 unicast, vid 50 broadcast ARP,
vid 100 multicast IPv6 ND
vid 200/500/1500/4000 isolation (out-of-trunk) and untagged not
leaked: 0 packets reach any bridge endpoint
IGMP/MLD snooping, Jumbo MTU 9000, reserved-multicast STP BPDU
IFF_PROMISC + IFF_ALLMULTI transition (off while allmulti stays)
Regression reproducer for commit 1273f89578f2 ("ice: Fix broken
IFF_ALLMULTI handling"): allmulti on -> add vid -> allmulti off
-> allmulti on plus the orphan-rule Scenario 2; both converge
with no stale rules
100-VID, 1000-VID, 4093-VID stress cycles (5/3/2 iterations each)
switchdev mode toggle preserves IFF_PROMISC pruning state across
the session (vid 999 multicast received before and after the
legacy -> switchdev -> legacy cycle)
SR-IOV: VFs unaffected because ice_set_promisc() early-returns
for non-PF VSI and VF representors do not register
ndo_vlan_rx_add_vid
Fixes: 1273f89578f2 ("ice: Fix broken IFF_ALLMULTI handling")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Petr Oros <poros@redhat.com>
---
v2:
- No functional changes; collected the Reviewed-by.
v1: https://lore.kernel.org/all/89efbea9831175e6f57e9fe8557f7a0e48e050b7.1781786935.git.poros@redhat.com/
---
drivers/net/ethernet/intel/ice/ice_main.c | 90 ++++++++++++++++++-----
1 file changed, 70 insertions(+), 20 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 6d24056c247cf4..af8df81fc45623 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -274,7 +274,8 @@ static int ice_set_promisc(struct ice_vsi *vsi, u8 promisc_m)
if (vsi->type != ICE_VSI_PF)
return 0;
- if (ice_vsi_has_non_zero_vlans(vsi)) {
+ /* skip per-VID expansion; the DFLT Rx rule already covers every VID */
+ if (ice_vsi_has_non_zero_vlans(vsi) && !ice_is_vsi_dflt_vsi(vsi)) {
promisc_m |= (ICE_PROMISC_VLAN_RX | ICE_PROMISC_VLAN_TX);
status = ice_fltr_set_vlan_vsi_promisc(&vsi->back->hw, vsi,
promisc_m);
@@ -304,9 +305,19 @@ static int ice_clear_promisc(struct ice_vsi *vsi, u8 promisc_m)
return 0;
if (ice_vsi_has_non_zero_vlans(vsi)) {
- promisc_m |= (ICE_PROMISC_VLAN_RX | ICE_PROMISC_VLAN_TX);
+ int vid0_status;
+
+ /* set time used either recipe (per-VID PROMISC_VLAN, or vid=0
+ * PROMISC via the ice_set_promisc() else branch), so clear
+ * both; clearing an absent rule succeeds
+ */
status = ice_fltr_clear_vlan_vsi_promisc(&vsi->back->hw, vsi,
- promisc_m);
+ promisc_m | ICE_PROMISC_VLAN_RX |
+ ICE_PROMISC_VLAN_TX);
+ vid0_status = ice_fltr_clear_vsi_promisc(&vsi->back->hw,
+ vsi->idx, promisc_m, 0);
+ if (!status)
+ status = vid0_status;
} else {
status = ice_fltr_clear_vsi_promisc(&vsi->back->hw, vsi->idx,
promisc_m, 0);
@@ -317,6 +328,49 @@ static int ice_clear_promisc(struct ice_vsi *vsi, u8 promisc_m)
return status;
}
+/**
+ * ice_vsi_exit_dflt_promisc - drop the default VSI Rx rule on promisc off
+ * @vsi: the VSI leaving promiscuous mode
+ *
+ * For an IFF_ALLMULTI VSI with VLANs the per-VID multicast rules are
+ * reinstated before the default rule is cleared so coverage never lapses;
+ * the then redundant vid=0 rule is dropped best-effort. The callees log
+ * their own failures, so error returns are not re-logged here.
+ *
+ * Return: 0 on success, negative on error with the default rule left in place.
+ */
+static int ice_vsi_exit_dflt_promisc(struct ice_vsi *vsi)
+{
+ struct ice_vsi_vlan_ops *vlan_ops = ice_get_compat_vsi_vlan_ops(vsi);
+ struct net_device *netdev = vsi->netdev;
+ struct ice_hw *hw = &vsi->back->hw;
+ bool restore_mc;
+ int err;
+
+ restore_mc = (vsi->current_netdev_flags & IFF_ALLMULTI) &&
+ ice_vsi_has_non_zero_vlans(vsi);
+
+ if (restore_mc) {
+ err = ice_fltr_set_vlan_vsi_promisc(hw, vsi,
+ ICE_MCAST_VLAN_PROMISC_BITS);
+ if (err && err != -EEXIST)
+ return err;
+ }
+
+ err = ice_clear_dflt_vsi(vsi);
+ if (err)
+ return err;
+
+ if (netdev->features & NETIF_F_HW_VLAN_CTAG_FILTER)
+ vlan_ops->ena_rx_filtering(vsi);
+
+ if (restore_mc)
+ ice_fltr_clear_vsi_promisc(hw, vsi->idx, ICE_MCAST_PROMISC_BITS,
+ 0);
+
+ return 0;
+}
+
/**
* ice_vsi_sync_fltr - Update the VSI filter list to the HW
* @vsi: ptr to the VSI
@@ -442,17 +496,12 @@ static int ice_vsi_sync_fltr(struct ice_vsi *vsi)
} else {
/* Clear Rx filter to remove traffic from wire */
if (ice_is_vsi_dflt_vsi(vsi)) {
- err = ice_clear_dflt_vsi(vsi);
+ err = ice_vsi_exit_dflt_promisc(vsi);
if (err) {
- netdev_err(netdev, "Error %d clearing default VSI %i Rx rule\n",
- err, vsi->vsi_num);
vsi->current_netdev_flags |=
IFF_PROMISC;
goto out_promisc;
}
- if (vsi->netdev->features &
- NETIF_F_HW_VLAN_CTAG_FILTER)
- vlan_ops->ena_rx_filtering(vsi);
}
/* disable allmulti here, but only if allmulti is not
@@ -3676,10 +3725,9 @@ int ice_vlan_rx_add_vid(struct net_device *netdev, __be16 proto, u16 vid)
while (test_and_set_bit(ICE_CFG_BUSY, vsi->state))
usleep_range(1000, 2000);
- /* Add multicast promisc rule for the VLAN ID to be added if
- * all-multicast is currently enabled.
- */
- if (vsi->current_netdev_flags & IFF_ALLMULTI) {
+ /* skip the per-VID rule when the DFLT Rx rule already covers this VID */
+ if ((vsi->current_netdev_flags & IFF_ALLMULTI) &&
+ !ice_is_vsi_dflt_vsi(vsi)) {
ret = ice_fltr_set_vsi_promisc(&vsi->back->hw, vsi->idx,
ICE_MCAST_VLAN_PROMISC_BITS,
vid);
@@ -3697,11 +3745,12 @@ int ice_vlan_rx_add_vid(struct net_device *netdev, __be16 proto, u16 vid)
if (ret)
goto finish;
- /* If all-multicast is currently enabled and this VLAN ID is only one
- * besides VLAN-0 we have to update look-up type of multicast promisc
- * rule for VLAN-0 from ICE_SW_LKUP_PROMISC to ICE_SW_LKUP_PROMISC_VLAN.
+ /* On the first non-zero VLAN, promote the VLAN-0 multicast promisc
+ * rule from ICE_SW_LKUP_PROMISC to ICE_SW_LKUP_PROMISC_VLAN. Skip when
+ * the DFLT Rx rule is installed; it already covers every VID.
*/
if ((vsi->current_netdev_flags & IFF_ALLMULTI) &&
+ !ice_is_vsi_dflt_vsi(vsi) &&
ice_vsi_num_non_zero_vlans(vsi) == 1) {
ice_fltr_clear_vsi_promisc(&vsi->back->hw, vsi->idx,
ICE_MCAST_PROMISC_BITS, 0);
@@ -3764,11 +3813,12 @@ int ice_vlan_rx_kill_vid(struct net_device *netdev, __be16 proto, u16 vid)
ICE_MCAST_VLAN_PROMISC_BITS, vid);
if (!ice_vsi_has_non_zero_vlans(vsi)) {
- /* Update look-up type of multicast promisc rule for VLAN 0
- * from ICE_SW_LKUP_PROMISC_VLAN to ICE_SW_LKUP_PROMISC when
- * all-multicast is enabled and VLAN 0 is the only VLAN rule.
+ /* Last non-zero VLAN gone: demote the VLAN-0 multicast promisc
+ * rule back to ICE_SW_LKUP_PROMISC. Skip when the DFLT Rx rule
+ * is installed; no recipe swap is needed.
*/
- if (vsi->current_netdev_flags & IFF_ALLMULTI) {
+ if ((vsi->current_netdev_flags & IFF_ALLMULTI) &&
+ !ice_is_vsi_dflt_vsi(vsi)) {
ice_fltr_clear_vsi_promisc(&vsi->back->hw, vsi->idx,
ICE_MCAST_VLAN_PROMISC_BITS,
0);
--
2.53.0
next prev parent reply other threads:[~2026-06-22 11:34 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-22 11:34 [PATCH iwl-net v2 0/2] ice: fix DFLT Rx rule handling for promisc and switchdev Petr Oros
2026-06-22 11:34 ` Petr Oros [this message]
2026-06-23 16:25 ` [PATCH iwl-net v2 1/2] ice: skip per-VLAN promisc rules when default VSI Rx rule is set Simon Horman
2026-06-22 11:34 ` [PATCH iwl-net v2 2/2] ice: preserve uplink DFLT Rx rule on switchdev release Petr Oros
2026-06-23 16:25 ` Simon Horman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260622113428.2565255-2-poros@redhat.com \
--to=poros@redhat.com \
--cc=aleksandr.loktionov@intel.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox