netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [BUG] 6.14: WARNING: CPU: 0 PID: 478 at net/bridge/br_vlan.c:433 nbp_vlan_flush+0xc0/0xc4
@ 2025-04-11 17:24 Russell King (Oracle)
  2025-04-11 18:49 ` Vladimir Oltean
  0 siblings, 1 reply; 6+ messages in thread
From: Russell King (Oracle) @ 2025-04-11 17:24 UTC (permalink / raw)
  To: netdev

Hi,

When executing:

# ifdown br0

on the ZII dev rev B platform with br0 being a bridge between mv88e6xxx
DSA ports, the following was spewed:

[  628.418720] br0: port 9(optical2) failed to delete vlan 1: -ENOENT
[  628.425297] ------------[ cut here ]------------
[  628.430124] WARNING: CPU: 0 PID: 478 at net/bridge/br_vlan.c:433 nbp_vlan_flush+0xc0/0xc4
[  628.438446] Modules linked in: caam_jr ofpart caamhash_desc reset_gpio caamalg_desc tag_dsa crypto_engine cmdlinepart authenc libdes i2c_mux_pca954x mv88e6xxx at24 lm75 spi_nor mtd dsa_core eeprom_93xx46 caam vf610_adc error industrialio_triggered_buffer fsl_edma kfifo_buf virt_dma spi_gpio spi_bitbang sfp iio_hwmon sff mdio_mux_gpio industrialio mdio_i2c mdio_mux rpcsec_gss_krb5 auth_rpcgss
[  628.473585] CPU: 0 UID: 0 PID: 478 Comm: brctl Not tainted 6.14.0+ #965
[  628.473621] Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
[  628.473634] Call trace: 
[  628.473655] [<c0009c44>] (unwind_backtrace) from [<c0022b78>] (show_stack+0x10/0x14)
[  628.473740] [<c0022b78>] (show_stack) from [<c0019b5c>] (dump_stack_lvl+0x50/0x64)
[  628.473814] [<c0019b5c>] (dump_stack_lvl) from [<c0043cd4>] (__warn+0x80/0x128)
[  628.473879] [<c0043cd4>] (__warn) from [<c0043ee4>] (warn_slowpath_fmt+0x168/0x16c)
[  628.473927] [<c0043ee4>] (warn_slowpath_fmt) from [<c09b8a8c>] (nbp_vlan_flush+0xc0/0xc4)
[  628.473982] [<c09b8a8c>] (nbp_vlan_flush) from [<c099d21c>] (del_nbp+0xc4/0x2c0)
[  628.474050] [<c099d21c>] (del_nbp) from [<c099de30>] (br_del_if+0x30/0x94)
[  628.474100] [<c099de30>] (br_del_if) from [<c099f438>] (br_ioctl_stub+0xe4/0x38c)
[  628.474155] [<c099f438>] (br_ioctl_stub) from [<c07d3084>] (br_ioctl_call+0x5c/0x94)
[  628.474224] [<c07d3084>] (br_ioctl_call) from [<c0835284>] (dev_ifsioc+0x360/0x61c)
[  628.474303] [<c0835284>] (dev_ifsioc) from [<c0835868>] (dev_ioctl+0x328/0x634)
[  628.474357] [<c0835868>] (dev_ioctl) from [<c07d3564>] (sock_ioctl+0x4a8/0x4ec)
[  628.474410] [<c07d3564>] (sock_ioctl) from [<c0276a80>] (sys_ioctl+0x49c/0xc28)
[  628.474483] [<c0276a80>] (sys_ioctl) from [<c0008320>] (ret_fast_syscall+0x0/0x54)
[  628.474530] Exception stack(0xe0961fa8 to 0xe0961ff0)
[  628.474557] 1fa0:                   007350f0 b6cb0968 00000003 000089a3 becd8c2c 000000d0
[  628.474582] 1fc0: 007350f0 b6cb0968 00000007 00000036 becd8ed4 00000000 00735000 00000000
[  628.474601] 1fe0: b6c22f01 becd8c14 00723085 b6c22f08
[  628.621830] ---[ end trace 0000000000000000 ]---

Unfortunately, because I'm concentrating on PTP stuff right now, I don't
have time to investigate this beyond reporting this, and it's highly
probable that after Sunday, it's going to be a few months before I can
do any further testing.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] 6.14: WARNING: CPU: 0 PID: 478 at net/bridge/br_vlan.c:433 nbp_vlan_flush+0xc0/0xc4
  2025-04-11 17:24 [BUG] 6.14: WARNING: CPU: 0 PID: 478 at net/bridge/br_vlan.c:433 nbp_vlan_flush+0xc0/0xc4 Russell King (Oracle)
@ 2025-04-11 18:49 ` Vladimir Oltean
  2025-04-11 20:48   ` Russell King (Oracle)
  0 siblings, 1 reply; 6+ messages in thread
From: Vladimir Oltean @ 2025-04-11 18:49 UTC (permalink / raw)
  To: Russell King (Oracle); +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 830 bytes --]

On Fri, Apr 11, 2025 at 06:24:44PM +0100, Russell King (Oracle) wrote:
> Hi,
> 
> When executing:
> 
> # ifdown br0
> 
> on the ZII dev rev B platform with br0 being a bridge between mv88e6xxx
> DSA ports, the following was spewed:
> 
> [  628.418720] br0: port 9(optical2) failed to delete vlan 1: -ENOENT

(trimming the rest, which is just the bridge complaining that &vg->vlan_list
is not empty, but __vlan_del() aborted the VLAN deletion due to
br_switchdev_port_vlan_del() returning an error, and left the VLAN in
the VLAN group's list. So that part is expected and is just a symptom,
we should focus on why DSA returns -ENOENT when requesting to delete
VLAN 1 from the CPU port).

Please test the attached patch. This is more speculative, but I've run
all the options in my mind and this is the only thing that makes sense.

[-- Attachment #2: 0001-net-dsa-mv88e6xxx-fix-ENOENT-while-deleting-user-por.patch --]
[-- Type: text/x-diff, Size: 2058 bytes --]

From 508d912b5f6b56c3f588b1bf28d3caed9e30db1b Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Fri, 11 Apr 2025 21:38:52 +0300
Subject: [PATCH] net: dsa: mv88e6xxx: fix -ENOENT while deleting user port
 VLANs

Russell King reports that on the ZII dev rev B, deleting a bridge VLAN
from a user port fails with -ENOENT:
https://lore.kernel.org/netdev/Z_lQXNP0s5-IiJzd@shell.armlinux.org.uk/

This comes from mv88e6xxx_port_vlan_leave() -> mv88e6xxx_mst_put(),
which tries to find an MST entry in &chip->msts associated with the SID,
but fails and returns -ENOENT as such.

But we know that this chip does not support MST at all, so that is not
surprising. The question is why does the guard in mv88e6xxx_mst_put()
not exit early:

	if (!sid)
		return 0;

And the answer seems to be simple: the sid comes from vlan.sid which
supposedly was previously populated by mv88e6xxx_vtu_loadpurge().
But some chip->info->ops->vtu_loadpurge() implementations do not look at
vlan.sid at all, for example see mv88e6185_g1_vtu_loadpurge().

It was probably intended for the on-stack struct mv88e6xxx_vtu_entry
vlan entry to be zero-initialized, because currently it looks like we're
looking at a garbage sid which is just residual stack memory. So
zero-initialize this to avoid MST operations on switches which don't
support MST.

Fixes: acaf4d2e36b3 ("net: dsa: mv88e6xxx: MST Offloading")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 29a89ab4b789..c94c228434fc 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2706,7 +2706,7 @@ static int mv88e6xxx_port_vlan_add(struct dsa_switch *ds, int port,
 static int mv88e6xxx_port_vlan_leave(struct mv88e6xxx_chip *chip,
 				     int port, u16 vid)
 {
-	struct mv88e6xxx_vtu_entry vlan;
+	struct mv88e6xxx_vtu_entry vlan = {};
 	int i, err;
 
 	if (!vid)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [BUG] 6.14: WARNING: CPU: 0 PID: 478 at net/bridge/br_vlan.c:433 nbp_vlan_flush+0xc0/0xc4
  2025-04-11 18:49 ` Vladimir Oltean
@ 2025-04-11 20:48   ` Russell King (Oracle)
  2025-04-11 20:51     ` Russell King (Oracle)
  0 siblings, 1 reply; 6+ messages in thread
From: Russell King (Oracle) @ 2025-04-11 20:48 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: netdev

On Fri, Apr 11, 2025 at 09:49:02PM +0300, Vladimir Oltean wrote:
> From 508d912b5f6b56c3f588b1bf28d3caed9e30db1b Mon Sep 17 00:00:00 2001
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> Date: Fri, 11 Apr 2025 21:38:52 +0300
> Subject: [PATCH] net: dsa: mv88e6xxx: fix -ENOENT while deleting user port
>  VLANs
> 
> Russell King reports that on the ZII dev rev B, deleting a bridge VLAN
> from a user port fails with -ENOENT:
> https://lore.kernel.org/netdev/Z_lQXNP0s5-IiJzd@shell.armlinux.org.uk/
> 
> This comes from mv88e6xxx_port_vlan_leave() -> mv88e6xxx_mst_put(),
> which tries to find an MST entry in &chip->msts associated with the SID,
> but fails and returns -ENOENT as such.
> 
> But we know that this chip does not support MST at all, so that is not
> surprising. The question is why does the guard in mv88e6xxx_mst_put()
> not exit early:
> 
> 	if (!sid)
> 		return 0;
> 
> And the answer seems to be simple: the sid comes from vlan.sid which
> supposedly was previously populated by mv88e6xxx_vtu_loadpurge().
> But some chip->info->ops->vtu_loadpurge() implementations do not look at
> vlan.sid at all, for example see mv88e6185_g1_vtu_loadpurge().

This paragraph isn't accurate. It's actually:

mv88e6xxx_port_vlan_leave()
{
	struct mv88e6xxx_vtu_entry vlan;

	err = mv88e6xxx_vtu_get(chip, vid, &vlan);

and _this_ leaves vlan.sid uninitialised when mv88e6xxx_vtu_get()
ends up calling mv88e6185_g1_vtu_getnext().

I posioned to vlan (using 0xde) and then hexdump'd it after this call,
and got:

[   50.748068] mv88e6085 mdio_mux-0.4:00: p9 dsa_port_do_vlan_del vid 1
[   50.754802] e0b61b08: 01 00 02 00 de 01 de 03 03 03 03 03 03 03 03 03
[   50.761343] e0b61b18: 00 de de 00 00 00 00 00 00 00 00 00 00 de de de
[   50.767855] mv88e6085 mdio_mux-0.4:00: p9 vid 1 valid 0 (0-10)
[   50.773943] mv88e6085 mdio_mux-0.4:00: p9 !user err=-2

Note byte 4, which is the sid, is the poison value.

So, should mv88e6xxx_vtu_get(), being the first caller of the iterator,
clear vlan entirely before calling chip->info->ops->vtu_getnext()
rather than just initialising a few fields? Or should
mv88e6185_g1_vtu_getnext() ensure that entry->sid is set to zero?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] 6.14: WARNING: CPU: 0 PID: 478 at net/bridge/br_vlan.c:433 nbp_vlan_flush+0xc0/0xc4
  2025-04-11 20:48   ` Russell King (Oracle)
@ 2025-04-11 20:51     ` Russell King (Oracle)
  2025-04-11 21:23       ` Vladimir Oltean
  0 siblings, 1 reply; 6+ messages in thread
From: Russell King (Oracle) @ 2025-04-11 20:51 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: netdev

On Fri, Apr 11, 2025 at 09:48:22PM +0100, Russell King (Oracle) wrote:
> On Fri, Apr 11, 2025 at 09:49:02PM +0300, Vladimir Oltean wrote:
> > From 508d912b5f6b56c3f588b1bf28d3caed9e30db1b Mon Sep 17 00:00:00 2001
> > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > Date: Fri, 11 Apr 2025 21:38:52 +0300
> > Subject: [PATCH] net: dsa: mv88e6xxx: fix -ENOENT while deleting user port
> >  VLANs
> > 
> > Russell King reports that on the ZII dev rev B, deleting a bridge VLAN
> > from a user port fails with -ENOENT:
> > https://lore.kernel.org/netdev/Z_lQXNP0s5-IiJzd@shell.armlinux.org.uk/
> > 
> > This comes from mv88e6xxx_port_vlan_leave() -> mv88e6xxx_mst_put(),
> > which tries to find an MST entry in &chip->msts associated with the SID,
> > but fails and returns -ENOENT as such.
> > 
> > But we know that this chip does not support MST at all, so that is not
> > surprising. The question is why does the guard in mv88e6xxx_mst_put()
> > not exit early:
> > 
> > 	if (!sid)
> > 		return 0;
> > 
> > And the answer seems to be simple: the sid comes from vlan.sid which
> > supposedly was previously populated by mv88e6xxx_vtu_loadpurge().
> > But some chip->info->ops->vtu_loadpurge() implementations do not look at
> > vlan.sid at all, for example see mv88e6185_g1_vtu_loadpurge().
> 
> This paragraph isn't accurate. It's actually:
> 
> mv88e6xxx_port_vlan_leave()
> {
> 	struct mv88e6xxx_vtu_entry vlan;
> 
> 	err = mv88e6xxx_vtu_get(chip, vid, &vlan);
> 
> and _this_ leaves vlan.sid uninitialised when mv88e6xxx_vtu_get()
> ends up calling mv88e6185_g1_vtu_getnext().
> 
> I posioned to vlan (using 0xde) and then hexdump'd it after this call,
> and got:
> 
> [   50.748068] mv88e6085 mdio_mux-0.4:00: p9 dsa_port_do_vlan_del vid 1
> [   50.754802] e0b61b08: 01 00 02 00 de 01 de 03 03 03 03 03 03 03 03 03
> [   50.761343] e0b61b18: 00 de de 00 00 00 00 00 00 00 00 00 00 de de de
> [   50.767855] mv88e6085 mdio_mux-0.4:00: p9 vid 1 valid 0 (0-10)
> [   50.773943] mv88e6085 mdio_mux-0.4:00: p9 !user err=-2
> 
> Note byte 4, which is the sid, is the poison value.
> 
> So, should mv88e6xxx_vtu_get(), being the first caller of the iterator,
> clear vlan entirely before calling chip->info->ops->vtu_getnext()
> rather than just initialising a few fields? Or should
> mv88e6185_g1_vtu_getnext() ensure that entry->sid is set to zero?

Or maybe test mv88e6xxx_has_stu() before calling mv88e6xxx_mst_put() ?

If mv88e6xxx_has_stu() is not sufficient, then mv88e6xxx_vlan_msti_set()
is another site where mv88e6xxx_vtu_get() is used followed by use of
vlan.sid.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] 6.14: WARNING: CPU: 0 PID: 478 at net/bridge/br_vlan.c:433 nbp_vlan_flush+0xc0/0xc4
  2025-04-11 20:51     ` Russell King (Oracle)
@ 2025-04-11 21:23       ` Vladimir Oltean
  2025-04-11 21:51         ` Russell King (Oracle)
  0 siblings, 1 reply; 6+ messages in thread
From: Vladimir Oltean @ 2025-04-11 21:23 UTC (permalink / raw)
  To: Russell King (Oracle); +Cc: netdev

On Fri, Apr 11, 2025 at 09:51:39PM +0100, Russell King (Oracle) wrote:
> On Fri, Apr 11, 2025 at 09:48:22PM +0100, Russell King (Oracle) wrote:
> > On Fri, Apr 11, 2025 at 09:49:02PM +0300, Vladimir Oltean wrote:
> > > From 508d912b5f6b56c3f588b1bf28d3caed9e30db1b Mon Sep 17 00:00:00 2001
> > > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > Date: Fri, 11 Apr 2025 21:38:52 +0300
> > > Subject: [PATCH] net: dsa: mv88e6xxx: fix -ENOENT while deleting user port
> > >  VLANs
> > > 
> > > Russell King reports that on the ZII dev rev B, deleting a bridge VLAN
> > > from a user port fails with -ENOENT:
> > > https://lore.kernel.org/netdev/Z_lQXNP0s5-IiJzd@shell.armlinux.org.uk/
> > > 
> > > This comes from mv88e6xxx_port_vlan_leave() -> mv88e6xxx_mst_put(),
> > > which tries to find an MST entry in &chip->msts associated with the SID,
> > > but fails and returns -ENOENT as such.
> > > 
> > > But we know that this chip does not support MST at all, so that is not
> > > surprising. The question is why does the guard in mv88e6xxx_mst_put()
> > > not exit early:
> > > 
> > > 	if (!sid)
> > > 		return 0;
> > > 
> > > And the answer seems to be simple: the sid comes from vlan.sid which
> > > supposedly was previously populated by mv88e6xxx_vtu_loadpurge().
> > > But some chip->info->ops->vtu_loadpurge() implementations do not look at
> > > vlan.sid at all, for example see mv88e6185_g1_vtu_loadpurge().
> > 
> > This paragraph isn't accurate. It's actually:
> > 
> > mv88e6xxx_port_vlan_leave()
> > {
> > 	struct mv88e6xxx_vtu_entry vlan;
> > 
> > 	err = mv88e6xxx_vtu_get(chip, vid, &vlan);
> > 
> > and _this_ leaves vlan.sid uninitialised when mv88e6xxx_vtu_get()
> > ends up calling mv88e6185_g1_vtu_getnext().

Correct, vtu_getnext() reads the SID and vtu_loadpurge() writes it.
I got carried away when I found a plausible explanation for the issue,
and I was in too much of a haste to post it (plus, I had no equipment to
test).

> > I posioned to vlan (using 0xde) and then hexdump'd it after this call,
> > and got:
> > 
> > [   50.748068] mv88e6085 mdio_mux-0.4:00: p9 dsa_port_do_vlan_del vid 1
> > [   50.754802] e0b61b08: 01 00 02 00 de 01 de 03 03 03 03 03 03 03 03 03
> > [   50.761343] e0b61b18: 00 de de 00 00 00 00 00 00 00 00 00 00 de de de
> > [   50.767855] mv88e6085 mdio_mux-0.4:00: p9 vid 1 valid 0 (0-10)
> > [   50.773943] mv88e6085 mdio_mux-0.4:00: p9 !user err=-2
> > 
> > Note byte 4, which is the sid, is the poison value.
> > 
> > So, should mv88e6xxx_vtu_get(), being the first caller of the iterator,
> > clear vlan entirely before calling chip->info->ops->vtu_getnext()
> > rather than just initialising a few fields? Or should
> > mv88e6185_g1_vtu_getnext() ensure that entry->sid is set to zero?
> 
> Or maybe test mv88e6xxx_has_stu() before calling mv88e6xxx_mst_put() ?
> 
> If mv88e6xxx_has_stu() is not sufficient, then mv88e6xxx_vlan_msti_set()
> is another site where mv88e6xxx_vtu_get() is used followed by use of
> vlan.sid.

mv88e6xxx_has_stu() is sufficient, the question is whether it is necessary.

Testing for sid == 0 covers all cases of a non-bridge VLAN or a bridge
VLAN mapped to the default MSTI. For some chips, SID 0 is valid and
installed by mv88e6xxx_stu_setup(). A chip which does not support the
STU would implicitly only support mapping all VLANs to the default MSTI,
so although SID 0 is not valid, the behavior coincidentally is the same.
I'm not a huge fan of coincidence, being explicit is more helpful to a
human reader.

In my opinion, I would opt for both changes. To be symmetric with
mv88e6xxx_mst_get() which has mv88e6xxx_has_stu() inside, I would also
like mv88e6xxx_mst_put() to have mv88e6xxx_has_stu() inside. But that
means the caller will have to dereference vlan.sid, which means it will
access uninitialized memory, which is not nice even if it ignores it
later. So I would also add the memset() in mv88e6xxx_vtu_get(), prior to
the chip->info->ops->vtu_getnext() call.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] 6.14: WARNING: CPU: 0 PID: 478 at net/bridge/br_vlan.c:433 nbp_vlan_flush+0xc0/0xc4
  2025-04-11 21:23       ` Vladimir Oltean
@ 2025-04-11 21:51         ` Russell King (Oracle)
  0 siblings, 0 replies; 6+ messages in thread
From: Russell King (Oracle) @ 2025-04-11 21:51 UTC (permalink / raw)
  To: Vladimir Oltean, Andrew Lunn; +Cc: netdev

On Sat, Apr 12, 2025 at 12:23:25AM +0300, Vladimir Oltean wrote:
> On Fri, Apr 11, 2025 at 09:51:39PM +0100, Russell King (Oracle) wrote:
> > On Fri, Apr 11, 2025 at 09:48:22PM +0100, Russell King (Oracle) wrote:
> > > On Fri, Apr 11, 2025 at 09:49:02PM +0300, Vladimir Oltean wrote:
> > > > From 508d912b5f6b56c3f588b1bf28d3caed9e30db1b Mon Sep 17 00:00:00 2001
> > > > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > > Date: Fri, 11 Apr 2025 21:38:52 +0300
> > > > Subject: [PATCH] net: dsa: mv88e6xxx: fix -ENOENT while deleting user port
> > > >  VLANs
> > > > 
> > > > Russell King reports that on the ZII dev rev B, deleting a bridge VLAN
> > > > from a user port fails with -ENOENT:
> > > > https://lore.kernel.org/netdev/Z_lQXNP0s5-IiJzd@shell.armlinux.org.uk/
> > > > 
> > > > This comes from mv88e6xxx_port_vlan_leave() -> mv88e6xxx_mst_put(),
> > > > which tries to find an MST entry in &chip->msts associated with the SID,
> > > > but fails and returns -ENOENT as such.
> > > > 
> > > > But we know that this chip does not support MST at all, so that is not
> > > > surprising. The question is why does the guard in mv88e6xxx_mst_put()
> > > > not exit early:
> > > > 
> > > > 	if (!sid)
> > > > 		return 0;
> > > > 
> > > > And the answer seems to be simple: the sid comes from vlan.sid which
> > > > supposedly was previously populated by mv88e6xxx_vtu_loadpurge().
> > > > But some chip->info->ops->vtu_loadpurge() implementations do not look at
> > > > vlan.sid at all, for example see mv88e6185_g1_vtu_loadpurge().
> > > 
> > > This paragraph isn't accurate. It's actually:
> > > 
> > > mv88e6xxx_port_vlan_leave()
> > > {
> > > 	struct mv88e6xxx_vtu_entry vlan;
> > > 
> > > 	err = mv88e6xxx_vtu_get(chip, vid, &vlan);
> > > 
> > > and _this_ leaves vlan.sid uninitialised when mv88e6xxx_vtu_get()
> > > ends up calling mv88e6185_g1_vtu_getnext().
> 
> Correct, vtu_getnext() reads the SID and vtu_loadpurge() writes it.
> I got carried away when I found a plausible explanation for the issue,
> and I was in too much of a haste to post it (plus, I had no equipment to
> test).
> 
> > > I posioned to vlan (using 0xde) and then hexdump'd it after this call,
> > > and got:
> > > 
> > > [   50.748068] mv88e6085 mdio_mux-0.4:00: p9 dsa_port_do_vlan_del vid 1
> > > [   50.754802] e0b61b08: 01 00 02 00 de 01 de 03 03 03 03 03 03 03 03 03
> > > [   50.761343] e0b61b18: 00 de de 00 00 00 00 00 00 00 00 00 00 de de de
> > > [   50.767855] mv88e6085 mdio_mux-0.4:00: p9 vid 1 valid 0 (0-10)
> > > [   50.773943] mv88e6085 mdio_mux-0.4:00: p9 !user err=-2
> > > 
> > > Note byte 4, which is the sid, is the poison value.
> > > 
> > > So, should mv88e6xxx_vtu_get(), being the first caller of the iterator,
> > > clear vlan entirely before calling chip->info->ops->vtu_getnext()
> > > rather than just initialising a few fields? Or should
> > > mv88e6185_g1_vtu_getnext() ensure that entry->sid is set to zero?
> > 
> > Or maybe test mv88e6xxx_has_stu() before calling mv88e6xxx_mst_put() ?
> > 
> > If mv88e6xxx_has_stu() is not sufficient, then mv88e6xxx_vlan_msti_set()
> > is another site where mv88e6xxx_vtu_get() is used followed by use of
> > vlan.sid.
> 
> mv88e6xxx_has_stu() is sufficient, the question is whether it is necessary.
> 
> Testing for sid == 0 covers all cases of a non-bridge VLAN or a bridge
> VLAN mapped to the default MSTI. For some chips, SID 0 is valid and
> installed by mv88e6xxx_stu_setup(). A chip which does not support the
> STU would implicitly only support mapping all VLANs to the default MSTI,
> so although SID 0 is not valid, the behavior coincidentally is the same.
> I'm not a huge fan of coincidence, being explicit is more helpful to a
> human reader.
> 
> In my opinion, I would opt for both changes. To be symmetric with
> mv88e6xxx_mst_get() which has mv88e6xxx_has_stu() inside, I would also
> like mv88e6xxx_mst_put() to have mv88e6xxx_has_stu() inside. But that
> means the caller will have to dereference vlan.sid, which means it will
> access uninitialized memory, which is not nice even if it ignores it
> later. So I would also add the memset() in mv88e6xxx_vtu_get(), prior to
> the chip->info->ops->vtu_getnext() call.

That sounds sensible.

Andrew might be able to test next week - I believe he had a ZII dev
rev B platform, which is what I discovered this on.

Unfortunately, the ZII dev rev B platform doesn't lend itself to being
remotely controlled because of the need to disconnect a network cable
each time it boots to ensure DHCP and root-NFS uses the non-switch
port. Also, the barebox boot loader configuration can't be changed as
there's no facility to write updates to non-voltage storage.

To trigger the VLAN issue, I have:

/etc/network/interfaces.d/br0:
auto br0
iface br0 inet manual
        bridge-ports lan0 lan1 lan2 lan3 lan4 lan5 lan6 lan7 optical2
        bridge-maxwait 0

and I boot the board, and then ifdown br0. That's basically it...

For the other issue, I was doing:

# cd /sys/bus/mdio_bus/drivers/mv88e6085
# echo mdio_mux-0.4:00 > unbind

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-04-11 21:51 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-11 17:24 [BUG] 6.14: WARNING: CPU: 0 PID: 478 at net/bridge/br_vlan.c:433 nbp_vlan_flush+0xc0/0xc4 Russell King (Oracle)
2025-04-11 18:49 ` Vladimir Oltean
2025-04-11 20:48   ` Russell King (Oracle)
2025-04-11 20:51     ` Russell King (Oracle)
2025-04-11 21:23       ` Vladimir Oltean
2025-04-11 21:51         ` Russell King (Oracle)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).