From: bryan <bpliscott@gmail.com>
To: Dragos Tatulea <dtatulea@nvidia.com>, netdev@vger.kernel.org
Cc: saeedm@nvidia.com, tariqt@nvidia.com
Subject: Re: [BUG] mlx5: VLAN-aware bridge drops all traffic in legacy eswitch mode without promiscuous
Date: Mon, 27 Apr 2026 16:10:28 -0500 [thread overview]
Message-ID: <5d5524e8077bc3f169ab5ce6ea267d344efd3336.camel@gmail.com> (raw)
In-Reply-To: <1126aa35-1924-492f-8d7f-072c0dec9bde@nvidia.com>
Here is one example config (sanitized). The promisc setting on the link
is what allows traffic to pass - if I disable promisc, traffic stops.
These are single-port CX4Lx cards. nic0 is the physical interface; no
VFs are configured, and SR-IOV has been disabled as part of testing and
troubleshooting. Kernel is currently 6.17:
auto lo
iface lo inet loopback

auto nic0
iface nic0 inet manual
    up ip link set nic0 promisc on

auto vmbr0
iface vmbr0 inet manual
    bridge-ports nic0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-10 555

auto vmbr0.555
iface vmbr0.555 inet static
    address 192.168.1.123/24
    gateway 192.168.1.1

iface nic3 inet manual
iface nic1 inet manual
iface nic2 inet manual
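Since you asked for a repro script below: here is the config above
expressed as standalone iproute2 commands, as best I can translate it.
This is a sketch, not run verbatim - it assumes nic0 exists and must be
run as root on a scratch machine:

```shell
#!/bin/sh
# Sketch of the ifupdown config above as iproute2 commands.
set -e

IFACE=nic0
ip link set "$IFACE" up
# Note: promisc deliberately NOT set here. With
# "ip link set $IFACE promisc on" traffic flows; without it, it stops.

# VLAN-aware bridge, STP off, forwarding delay 0, same VIDs as
# "bridge-vids 2-10 555".
ip link add vmbr0 type bridge vlan_filtering 1 stp_state 0 forward_delay 0
ip link set "$IFACE" master vmbr0
for vid in $(seq 2 10) 555; do
    bridge vlan add vid "$vid" dev "$IFACE"
    bridge vlan add vid "$vid" dev vmbr0 self
done

# Management VLAN interface on top of the bridge.
ip link add link vmbr0 name vmbr0.555 type vlan id 555
ip addr add 192.168.1.123/24 dev vmbr0.555
ip link set vmbr0 up
ip link set vmbr0.555 up
ip route add default via 192.168.1.1
```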
This was ported over to use the new nic# bindings; before that it used
the standard enps0np0 naming. No difference in behaviour.
>Is this even with one vlan? I ran a flow on a CX4LX pair with one vlan
>and vlan_filtering set and traffic seems to be flowing normally.
I have not tested with literally only one VLAN, as that is not the use
case here, but I can absolutely test it if that would help. Would you
like me to remove every VLAN except 555 from the interface and leave
the rest of the config as-is?
> This last link seems the only one that provides some extra data. From
> it I can see that the amount of VLAN ids > what the FW supports. This
> could result in loss of traffic for the vlan ids > 512. Do you also see
> in your dmesg these kinds of errors:
>
> mlx5_core 0000:19:00.1: mlx5e_vport_context_update_vlans:179:(pid
> 13470): netdev vlans list size (4080) > (512) max vport list size, some
> vlans will be dropped
>
> This is not a bug, simply a limit being reached.
Considering I am operating with only 10 VLANs, as can be seen in my
config, I do not think that is my issue. I am aware there is quite a
bit of noise in those threads - they are just forum posts - but that
limit does not appear to be related to my issue, or to the issue others
report there. I get no such messages or warnings. Additionally, reports
about that limit describe only some VLANs being dropped; in my case
(and as others have reported) all VLANs are dropped.
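For anyone following along, this is a straightforward way to check for
that particular warning (the grep strings just match the message you
quoted; no output means the 512-entry limit was never hit):

```shell
# Look for the vport VLAN list overflow warning quoted above.
dmesg | grep -i 'max vport list size'
# Or across reboots, via the kernel journal:
journalctl -k | grep -i 'mlx5e_vport_context_update_vlans'
```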
>eth2 is a PF in legacy switchdev mode.
It was my understanding that legacy mode and switchdev mode were two
independent modes, with legacy handled in software and switchdev using
the eSwitch on the NIC itself - please excuse my ignorance if that is
not the case. Could you specify whether you used switchdev mode or
legacy mode? Switchdev mode DOES function as a workaround and passes
traffic here (but in my case it results in system instability after a
time).
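For clarity on my side, this is how I read back and switch the mode
with devlink (the PCI address is just an example; substitute the PF's
address from lspci):

```shell
# Query the current eswitch mode for the PF (example PCI address).
devlink dev eswitch show pci/0000:19:00.0
# Reports either "mode legacy" or "mode switchdev".

# The workaround I mentioned: switching the PF to switchdev mode.
devlink dev eswitch set pci/0000:19:00.0 mode switchdev
```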
Thank You,
Bryan
On Mon, 2026-04-27 at 15:55 +0200, Dragos Tatulea wrote:
> Hi,
>
> On 24.04.26 13:07, bryan wrote:
> > Good day,
> >
> > I wanted to check whether there is an open bug report or known fix
> > in
> > progress for an issue that has been affecting mlx5 users
> > (specifically
> > ConnectX-4 Lx, but likely broader from what I have seen others
> > reporting) since at least 2021:
> >
> > When an mlx5 interface is added as a port to a VLAN-aware Linux
> > bridge
> > (bridge-vlan-aware yes / vlan_filtering 1) in legacy eswitch mode,
> > all
> > traffic stops passing through the bridge. Both tagged and untagged
> > traffic is affected. The same configuration works correctly with
> > non-
> > mlx5 NICs (tested Intel, Chelsio cards).
> >
> Is this even with one vlan? I ran a flow on a CX4LX pair with one
> vlan
> and vlan_filtering set and traffic seems to be flowing normally.
> Something like:
>
> # IFACE=eth2
> # VID=100
> # ip link add br0 type bridge vlan_filtering 1
> # ip link set "$IFACE" master br0
> # bridge vlan add vid "$VID" dev "$IFACE"
> # bridge vlan add vid "$VID" dev br0 self
> # ip link add link br0 name "br0.$VID" type vlan id "$VID"
> # ip addr add 10.0.0.1/24 dev br0
> # ip addr add "10.0.$VID.1/24" dev "br0.$VID"
> # ip link set "$IFACE" up
> # ip link set br0 up
> # ip link set "br0.$VID" up
>
> From the other side where I have a similar setup I can ping
> br0.100.
>
> Tested on a CX4LX with FW version 28.48.1000 and kernel 6.18.
> eth2 is a PF in legacy switchdev mode.
>
> > [...]
> > This is well documented in community forums but does not appear to
> > have
> > been formally reported to netdev that I have been able to find. My
> > apologies in advance if this has been reported and I wasn't able to
> > locate it. Here are a couple of forum examples where this is
> > discussed
> > among other affected users:
> >
> > - NVIDIA Developer Forum (opened 2021, unresolved):
> >
> > https://forums.developer.nvidia.com/t/vlan-aware-linux-bridging-is-not-functional-on-connectx4lx-card-unless-manually-put-in-promiscuous-mode/206083
> >
> > - Proxmox Forum thread (2023, ongoing):
> >
> > https://forum.proxmox.com/threads/mellanox-connectx-4-lx-and-brigde-vlan-aware-on-proxmox-8-0-1.130902/
> >
> > - Community writeup with analysis:
> > https://www.apalrd.net/posts/2023/tip_mellanox/
> >
> This last link seems the only one that provides some extra data. From
> it
> I can see that the amount of VLAN ids > what the FW supports. This
> could
> result in loss of traffic for the vlan ids > 512. Do you also see in
> your dmesg these kinds of errors:
>
> mlx5_core 0000:19:00.1: mlx5e_vport_context_update_vlans:179:(pid
> 13470): netdev vlans list size (4080) > (512) max vport list size,
> some vlans will be dropped
>
> This is not a bug, simply a limit being reached.
>
> > Has anyone bisected this or is there a fix already in progress that
> > I
> > did not find? This affects a fairly common hypervisor configuration
> > (VLAN-aware bridge for VM networking) and the workarounds are not
> > conducive to production use.
> >
> Could you provide a short repro script for this? Not being able to
> reproduce the issue makes it hard to check :).
>
> Thanks,
> Dragos
2026-04-24 11:07 [BUG] mlx5: VLAN-aware bridge drops all traffic in legacy eswitch mode without promiscuous bryan
2026-04-27 13:55 ` Dragos Tatulea
2026-04-27 21:10 ` bryan [this message]