All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Vladimir Oltean <olteanv@gmail.com>, Andrew Lunn <andrew@lunn.ch>
Cc: netdev@vger.kernel.org
Subject: Re: [BUG] 6.14: WARNING: CPU: 0 PID: 478 at net/bridge/br_vlan.c:433 nbp_vlan_flush+0xc0/0xc4
Date: Fri, 11 Apr 2025 22:51:19 +0100	[thread overview]
Message-ID: <Z_mO15XmCYj8BIB8@shell.armlinux.org.uk> (raw)
In-Reply-To: <20250411212325.knn3a3id3p7oidug@skbuf>

On Sat, Apr 12, 2025 at 12:23:25AM +0300, Vladimir Oltean wrote:
> On Fri, Apr 11, 2025 at 09:51:39PM +0100, Russell King (Oracle) wrote:
> > On Fri, Apr 11, 2025 at 09:48:22PM +0100, Russell King (Oracle) wrote:
> > > On Fri, Apr 11, 2025 at 09:49:02PM +0300, Vladimir Oltean wrote:
> > > > From 508d912b5f6b56c3f588b1bf28d3caed9e30db1b Mon Sep 17 00:00:00 2001
> > > > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > > Date: Fri, 11 Apr 2025 21:38:52 +0300
> > > > Subject: [PATCH] net: dsa: mv88e6xxx: fix -ENOENT while deleting user port
> > > >  VLANs
> > > > 
> > > > Russell King reports that on the ZII dev rev B, deleting a bridge VLAN
> > > > from a user port fails with -ENOENT:
> > > > https://lore.kernel.org/netdev/Z_lQXNP0s5-IiJzd@shell.armlinux.org.uk/
> > > > 
> > > > This comes from mv88e6xxx_port_vlan_leave() -> mv88e6xxx_mst_put(),
> > > > which tries to find an MST entry in &chip->msts associated with the SID,
> > > > but fails and returns -ENOENT as such.
> > > > 
> > > > But we know that this chip does not support MST at all, so that is not
> > > > surprising. The question is why does the guard in mv88e6xxx_mst_put()
> > > > not exit early:
> > > > 
> > > > 	if (!sid)
> > > > 		return 0;
> > > > 
> > > > And the answer seems to be simple: the sid comes from vlan.sid which
> > > > supposedly was previously populated by mv88e6xxx_vtu_loadpurge().
> > > > But some chip->info->ops->vtu_loadpurge() implementations do not look at
> > > > vlan.sid at all, for example see mv88e6185_g1_vtu_loadpurge().
> > > 
> > > This paragraph isn't accurate. It's actually:
> > > 
> > > mv88e6xxx_port_vlan_leave()
> > > {
> > > 	struct mv88e6xxx_vtu_entry vlan;
> > > 
> > > 	err = mv88e6xxx_vtu_get(chip, vid, &vlan);
> > > 
> > > and _this_ leaves vlan.sid uninitialised when mv88e6xxx_vtu_get()
> > > ends up calling mv88e6185_g1_vtu_getnext().
> 
> Correct, vtu_getnext() reads the SID and vtu_loadpurge() writes it.
> I got carried away when I found a plausible explanation for the issue,
> and I was in too much of a haste to post it (plus, I had no equipment to
> test).
> 
> > > I posioned to vlan (using 0xde) and then hexdump'd it after this call,
> > > and got:
> > > 
> > > [   50.748068] mv88e6085 mdio_mux-0.4:00: p9 dsa_port_do_vlan_del vid 1
> > > [   50.754802] e0b61b08: 01 00 02 00 de 01 de 03 03 03 03 03 03 03 03 03
> > > [   50.761343] e0b61b18: 00 de de 00 00 00 00 00 00 00 00 00 00 de de de
> > > [   50.767855] mv88e6085 mdio_mux-0.4:00: p9 vid 1 valid 0 (0-10)
> > > [   50.773943] mv88e6085 mdio_mux-0.4:00: p9 !user err=-2
> > > 
> > > Note byte 4, which is the sid, is the poison value.
> > > 
> > > So, should mv88e6xxx_vtu_get(), being the first caller of the iterator,
> > > clear vlan entirely before calling chip->info->ops->vtu_getnext()
> > > rather than just initialising a few fields? Or should
> > > mv88e6185_g1_vtu_getnext() ensure that entry->sid is set to zero?
> > 
> > Or maybe test mv88e6xxx_has_stu() before calling mv88e6xxx_mst_put() ?
> > 
> > If mv88e6xxx_has_stu() is not sufficient, then mv88e6xxx_vlan_msti_set()
> > is another site where mv88e6xxx_vtu_get() is used followed by use of
> > vlan.sid.
> 
> mv88e6xxx_has_stu() is sufficient, the question is whether it is necessary.
> 
> Testing for sid == 0 covers all cases of a non-bridge VLAN or a bridge
> VLAN mapped to the default MSTI. For some chips, SID 0 is valid and
> installed by mv88e6xxx_stu_setup(). A chip which does not support the
> STU would implicitly only support mapping all VLANs to the default MSTI,
> so although SID 0 is not valid, the behavior coincidentally is the same.
> I'm not a huge fan of coincidence, being explicit is more helpful to a
> human reader.
> 
> In my opinion, I would opt for both changes. To be symmetric with
> mv88e6xxx_mst_get() which has mv88e6xxx_has_stu() inside, I would also
> like mv88e6xxx_mst_put() to have mv88e6xxx_has_stu() inside. But that
> means the caller will have to dereference vlan.sid, which means it will
> access uninitialized memory, which is not nice even if it ignores it
> later. So I would also add the memset() in mv88e6xxx_vtu_get(), prior to
> the chip->info->ops->vtu_getnext() call.

That sounds sensible.

Andrew might be able to test next week - I believe he had a ZII dev
rev B platform, which is what I discovered this on.

Unfortunately, the ZII dev rev B platform doesn't lend itself to being
remotely controlled because of the need to disconnect a network cable
each time it boots to ensure DHCP and root-NFS uses the non-switch
port. Also, the barebox boot loader configuration can't be changed as
there's no facility to write updates to non-voltage storage.

To trigger the VLAN issue, I have:

/etc/network/interfaces.d/br0:
auto br0
iface br0 inet manual
        bridge-ports lan0 lan1 lan2 lan3 lan4 lan5 lan6 lan7 optical2
        bridge-maxwait 0

and I boot the board, and then ifdown br0. That's basically it...

For the other issue, I was doing:

# cd /sys/bus/mdio_bus/drivers/mv88e6085
# echo mdio_mux-0.4:00 > unbind

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

      reply	other threads:[~2025-04-11 21:51 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-11 17:24 [BUG] 6.14: WARNING: CPU: 0 PID: 478 at net/bridge/br_vlan.c:433 nbp_vlan_flush+0xc0/0xc4 Russell King (Oracle)
2025-04-11 18:49 ` Vladimir Oltean
2025-04-11 20:48   ` Russell King (Oracle)
2025-04-11 20:51     ` Russell King (Oracle)
2025-04-11 21:23       ` Vladimir Oltean
2025-04-11 21:51         ` Russell King (Oracle) [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z_mO15XmCYj8BIB8@shell.armlinux.org.uk \
    --to=linux@armlinux.org.uk \
    --cc=andrew@lunn.ch \
    --cc=netdev@vger.kernel.org \
    --cc=olteanv@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.