public inbox for netdev@vger.kernel.org
* [BUG] mlx5: VLAN-aware bridge drops all traffic in legacy eswitch mode without promiscuous
@ 2026-04-24 11:07 bryan
  2026-04-27 13:55 ` Dragos Tatulea
  0 siblings, 1 reply; 4+ messages in thread
From: bryan @ 2026-04-24 11:07 UTC
  To: netdev; +Cc: saeedm, tariqt

Good day,

I wanted to check whether there is an open bug report or known fix in
progress for an issue that has been affecting mlx5 users (specifically
ConnectX-4 Lx, but likely broader from what I have seen others
report) since at least 2021:

When an mlx5 interface is added as a port to a VLAN-aware Linux bridge
(bridge-vlan-aware yes / vlan_filtering 1) in legacy eswitch mode, all
traffic stops passing through the bridge. Both tagged and untagged
traffic is affected. The same configuration works correctly with
non-mlx5 NICs (tested with Intel and Chelsio cards).
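
As a rough sketch of the configuration described above, the setup could
look like the following. The interface name, bridge name and VLAN ID are
placeholders chosen for illustration, not values from this report, and
the helper only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only: interface, bridge name and VLAN ID are placeholders,
# not values taken from the report.

# Print a command; execute it only when APPLY=1 (requires root).
run() {
    echo "$*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

# Usage: repro_bridge <iface> [bridge] [vid]
repro_bridge() {
    iface="$1"
    bridge="${2:-br0}"
    vid="${3:-100}"

    # Create a VLAN-aware bridge (vlan_filtering 1).
    run ip link add name "$bridge" type bridge vlan_filtering 1
    # Attach the mlx5 port to the bridge and bring both links up.
    run ip link set dev "$iface" master "$bridge"
    run ip link set dev "$iface" up
    run ip link set dev "$bridge" up
    # Allow one tagged VLAN on the port.
    run bridge vlan add dev "$iface" vid "$vid"
}
```

Calling `repro_bridge enp1s0f0` prints the five commands as a dry run;
`APPLY=1 repro_bridge enp1s0f0` as root would actually apply them.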

The only known workarounds are:
1. Enable promiscuous mode on the interface (ip link set dev <iface>
promisc on), which bypasses hardware VLAN filtering but has security
and performance implications. This is what I am doing on my systems at
the moment.
2. Switch the eswitch to switchdev mode. A kernel panic in that mode was
fixed in February 2023 (net/mlx5e: Fix crash unsetting rx-vlan-filter
in switchdev mode), but switchdev mode introduces other issues, including
MDB errors, and is not suitable for all configurations.
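
The two workarounds above could be sketched as follows. The interface
name and the PCI address are hypothetical placeholders, and the helper
only prints the commands unless APPLY=1 is set. This is an illustration
of the workarounds as described, not a recommended fix.

```shell
#!/bin/sh
# Sketch only: the interface name and PCI address passed in are
# placeholders, not values taken from the report.

# Print a command; execute it only when APPLY=1 (requires root).
run() {
    echo "$*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

# Workaround 1: put the bridge port into promiscuous mode.
# Usage: workaround_promisc <iface>
workaround_promisc() {
    run ip link set dev "$1" promisc on
}

# Workaround 2: show the current eswitch mode, then switch to switchdev.
# Usage: workaround_switchdev <pci-address>, e.g. 0000:03:00.0
workaround_switchdev() {
    run devlink dev eswitch show "pci/$1"
    run devlink dev eswitch set "pci/$1" mode switchdev
}
```

For example, `workaround_promisc enp1s0f0` prints the promiscuous-mode
command, and `workaround_switchdev 0000:03:00.0` prints the two devlink
commands for that (hypothetical) device.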

Based on reports I have seen from others in forums, this appears to have
been introduced somewhere around kernel 6.1-6.5, possibly related to a
commit that changed promiscuous mode efficiency in mlx5_core. I was not
using this hardware at the time, and cannot confirm firsthand. The
NVIDIA out-of-tree MLNX_EN driver does not exhibit this behavior in
legacy eswitch mode, which strongly suggests this is a regression in
the upstream mlx5 driver rather than a firmware or hardware issue. I do
not have first-hand experience with the mlx5 driver ever working
correctly - the idea that it did historically work correctly is based
purely on the reports of others (and the existence of old setup guides
that do not mention needing to try either of these workarounds.)

If it helps at all, I have tried various firmware versions on
ConnectX-4 Lx cards, ranging from an old release from 2017 all the way
up to the latest 14_32_1912. There has been no difference in behaviour
with regard to this issue.

This is well documented in community forums but, as far as I have been
able to find, has not been formally reported to netdev. My apologies in
advance if this has been reported and I wasn't able to locate it. Here
are a few forum examples where this is discussed among other affected
users:

- NVIDIA Developer Forum (opened 2021, unresolved):
  https://forums.developer.nvidia.com/t/vlan-aware-linux-bridging-is-not-functional-on-connectx4lx-card-unless-manually-put-in-promiscuous-mode/206083

- Proxmox Forum thread (2023, ongoing):
  https://forum.proxmox.com/threads/mellanox-connectx-4-lx-and-brigde-vlan-aware-on-proxmox-8-0-1.130902/

- Community writeup with analysis:
  https://www.apalrd.net/posts/2023/tip_mellanox/

Has anyone bisected this or is there a fix already in progress that I
did not find? This affects a fairly common hypervisor configuration
(VLAN-aware bridge for VM networking) and the workarounds are not
conducive to production use.



Thank you for your time,

Bryan Pliscott


Thread overview: 4+ messages
2026-04-24 11:07 [BUG] mlx5: VLAN-aware bridge drops all traffic in legacy eswitch mode without promiscuous bryan
2026-04-27 13:55 ` Dragos Tatulea
2026-04-27 21:10   ` bryan
2026-04-28 11:32     ` Dragos Tatulea
