netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH/RFC net] net: dsa: lantiq_gswip: honor dsa_db passed to port_fdb_{add,del}
@ 2025-08-09 22:35 Daniel Golle
  2025-08-10 13:06 ` Vladimir Oltean
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Golle @ 2025-08-09 22:35 UTC (permalink / raw)
  To: Hauke Mehrtens, Andrew Lunn, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Arkadi Sharshevsky,
	Florian Fainelli, netdev, linux-kernel
  Cc: Andreas Schirm, Alexander Sverdlin, Lukas Stockmann, John Crispin

Commit c9eb3e0f8701 ("net: dsa: Add support for learning FDB through
notification") added a dev_close() call "to indicate inconsistent
situation" when we could not delete an FDB entry from the port. In case
of the lantiq_gswip driver this is problematic on standalone ports for
which all calls to either .port_fdb_add() or .port_fdb_del() would just
always return -EINVAL as adding or removing FDB entries is currently
only supported for ports which are a member of a bridge.

As since commit c26933639b54 ("net: dsa: request drivers to perform FDB
isolation") the dsa_db is passed to the .port_fdb_add() or
.port_fdb_del() calls we can use that to set the FID accordingly,
similar to how it was for bridge ports, and to FID 0 for standalone
ports. In order for FID 0 to work at all we also need to set bit 1 in
val[1], so always set it.

This solution was found in a downstream driver provided by MaxLinear
(which is the current owner of the former Lantiq switch IP) under
GPL-2.0. Import the implementation and the copyright headers from that
driver.

Fixes: c9eb3e0f8701 ("net: dsa: Add support for learning FDB through notification")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
 drivers/net/dsa/lantiq_gswip.c | 55 ++++++++++++++++++++--------------
 1 file changed, 33 insertions(+), 22 deletions(-)

diff --git a/drivers/net/dsa/lantiq_gswip.c b/drivers/net/dsa/lantiq_gswip.c
index 6eb3140d4044..fed86b2d78fc 100644
--- a/drivers/net/dsa/lantiq_gswip.c
+++ b/drivers/net/dsa/lantiq_gswip.c
@@ -2,9 +2,11 @@
 /*
  * Lantiq / Intel GSWIP switch driver for VRX200, xRX300 and xRX330 SoCs
  *
- * Copyright (C) 2010 Lantiq Deutschland
- * Copyright (C) 2012 John Crispin <john@phrozen.org>
+ * Copyright (C) 2023 - 2024 MaxLinear Inc.
+ * Copyright (C) 2022 Snap One, LLC.  All rights reserved.
  * Copyright (C) 2017 - 2019 Hauke Mehrtens <hauke@hauke-m.de>
+ * Copyright (C) 2012 John Crispin <john@phrozen.org>
+ * Copyright (C) 2010 Lantiq Deutschland
  *
  * The VLAN and bridge model the GSWIP hardware uses does not directly
  * matches the model DSA uses.
@@ -239,6 +241,7 @@
 #define  GSWIP_TABLE_MAC_BRIDGE_KEY3_FID	GENMASK(5, 0)	/* Filtering identifier */
 #define  GSWIP_TABLE_MAC_BRIDGE_VAL0_PORT	GENMASK(7, 4)	/* Port on learned entries */
 #define  GSWIP_TABLE_MAC_BRIDGE_VAL1_STATIC	BIT(0)		/* Static, non-aging entry */
+#define  GSWIP_TABLE_MAC_BRIDGE_VAL1_VALID	BIT(1)		/* Valid bit */
 
 #define XRX200_GPHY_FW_ALIGN	(16 * 1024)
 
@@ -1349,30 +1352,37 @@ static void gswip_port_stp_state_set(struct dsa_switch *ds, int port, u8 state)
 }
 
 static int gswip_port_fdb(struct dsa_switch *ds, int port,
-			  const unsigned char *addr, u16 vid, bool add)
+			  const unsigned char *addr, u16 vid, struct dsa_db db,
+			  bool add)
 {
-	struct net_device *bridge = dsa_port_bridge_dev_get(dsa_to_port(ds, port));
 	struct gswip_priv *priv = ds->priv;
 	struct gswip_pce_table_entry mac_bridge = {0,};
-	unsigned int max_ports = priv->hw_info->max_ports;
 	int fid = -1;
-	int i;
 	int err;
+	int i;
 
-	if (!bridge)
-		return -EINVAL;
-
-	for (i = max_ports; i < ARRAY_SIZE(priv->vlans); i++) {
-		if (priv->vlans[i].bridge == bridge) {
-			fid = priv->vlans[i].fid;
-			break;
+	switch (db.type) {
+	case DSA_DB_BRIDGE:
+		for (i = 0; i < ARRAY_SIZE(priv->vlans); i++) {
+			if (priv->vlans[i].bridge == db.bridge.dev) {
+				fid = priv->vlans[i].fid;
+				break;
+			}
 		}
-	}
-
-	if (fid == -1) {
-		dev_err(priv->dev, "no FID found for bridge %s\n",
-			bridge->name);
-		return -EINVAL;
+		if (fid == -1) {
+			dev_err(priv->dev, "Port %d not part of a bridge\n", port);
+			return -EINVAL;
+		}
+		break;
+	case DSA_DB_PORT:
+		if (dsa_is_cpu_port(ds, port) &&
+			dsa_fdb_present_in_other_db(ds, port, addr, vid, db))
+			return 0;
+		/* FID of a standalone / single port bridge */
+		fid = 0;
+		break;
+	default:
+		return -EOPNOTSUPP;
 	}
 
 	mac_bridge.table = GSWIP_TABLE_MAC_BRIDGE;
@@ -1382,7 +1392,8 @@ static int gswip_port_fdb(struct dsa_switch *ds, int port,
 	mac_bridge.key[2] = addr[1] | (addr[0] << 8);
 	mac_bridge.key[3] = FIELD_PREP(GSWIP_TABLE_MAC_BRIDGE_KEY3_FID, fid);
 	mac_bridge.val[0] = add ? BIT(port) : 0; /* port map */
-	mac_bridge.val[1] = GSWIP_TABLE_MAC_BRIDGE_VAL1_STATIC;
+	mac_bridge.val[1] = add ? (GSWIP_TABLE_MAC_BRIDGE_VAL1_STATIC |
+				   GSWIP_TABLE_MAC_BRIDGE_VAL1_VALID) : 0;
 	mac_bridge.valid = add;
 
 	err = gswip_pce_table_entry_write(priv, &mac_bridge);
@@ -1396,14 +1407,14 @@ static int gswip_port_fdb_add(struct dsa_switch *ds, int port,
 			      const unsigned char *addr, u16 vid,
 			      struct dsa_db db)
 {
-	return gswip_port_fdb(ds, port, addr, vid, true);
+	return gswip_port_fdb(ds, port, addr, vid, db, true);
 }
 
 static int gswip_port_fdb_del(struct dsa_switch *ds, int port,
 			      const unsigned char *addr, u16 vid,
 			      struct dsa_db db)
 {
-	return gswip_port_fdb(ds, port, addr, vid, false);
+	return gswip_port_fdb(ds, port, addr, vid, db, false);
 }
 
 static int gswip_port_fdb_dump(struct dsa_switch *ds, int port,
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFC net] net: dsa: lantiq_gswip: honor dsa_db passed to port_fdb_{add,del}
  2025-08-09 22:35 [PATCH/RFC net] net: dsa: lantiq_gswip: honor dsa_db passed to port_fdb_{add,del} Daniel Golle
@ 2025-08-10 13:06 ` Vladimir Oltean
  2025-08-10 14:48   ` Daniel Golle
  0 siblings, 1 reply; 8+ messages in thread
From: Vladimir Oltean @ 2025-08-10 13:06 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Hauke Mehrtens, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Arkadi Sharshevsky, Florian Fainelli,
	netdev, linux-kernel, Andreas Schirm, Alexander Sverdlin,
	Lukas Stockmann, John Crispin

On Sat, Aug 09, 2025 at 11:35:28PM +0100, Daniel Golle wrote:
> Commit c9eb3e0f8701 ("net: dsa: Add support for learning FDB through
> notification") added a dev_close() call "to indicate inconsistent
> situation" when we could not delete an FDB entry from the port. In case
> of the lantiq_gswip driver this is problematic on standalone ports for
> which all calls to either .port_fdb_add() or .port_fdb_del() would just
> always return -EINVAL as adding or removing FDB entries is currently
> only supported for ports which are a member of a bridge.
> 
> As since commit c26933639b54 ("net: dsa: request drivers to perform FDB
> isolation") the dsa_db is passed to the .port_fdb_add() or
> .port_fdb_del() calls we can use that to set the FID accordingly,
> similar to how it was for bridge ports, and to FID 0 for standalone
> ports. In order for FID 0 to work at all we also need to set bit 1 in
> val[1], so always set it.
> 
> This solution was found in a downstream driver provided by MaxLinear
> (which is the current owner of the former Lantiq switch IP) under
> GPL-2.0. Import the implementation and the copyright headers from that
> driver.
> 
> Fixes: c9eb3e0f8701 ("net: dsa: Add support for learning FDB through notification")
> Signed-off-by: Daniel Golle <daniel@makrotopia.org>
> ---

1. The dev_close() call was removed in commit 2fd186501b1c ("net: dsa:
   be louder when a non-legacy FDB operation fails"); what kernel are you
   seeing failures on?

2. The call paths which set DSA_DB_PORT should be all guarded by
   dsa_switch_supports_uc_filtering(), which the gswip driver doesn't
   fulfill (it's missing ds->fdb_isolation). Can you put a dump_stack()
   in the DSA_DB_PORT handler and let me know where it's called from?

3. You haven't actually explained the context that leads to
   gswip_port_fdb() returning -EINVAL. It would be great to have that as
   a starting point, perhaps a dump_stack() in the unmodified code could
   reveal more.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFC net] net: dsa: lantiq_gswip: honor dsa_db passed to port_fdb_{add,del}
  2025-08-10 13:06 ` Vladimir Oltean
@ 2025-08-10 14:48   ` Daniel Golle
  2025-08-10 16:32     ` Vladimir Oltean
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Golle @ 2025-08-10 14:48 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Hauke Mehrtens, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Arkadi Sharshevsky, Florian Fainelli,
	netdev, linux-kernel, Andreas Schirm, Alexander Sverdlin,
	Lukas Stockmann, John Crispin

Hi Vladimir,

thank you for spending parts of your weekend dealing with this problem ;)

On Sun, Aug 10, 2025 at 04:06:37PM +0300, Vladimir Oltean wrote:
> On Sat, Aug 09, 2025 at 11:35:28PM +0100, Daniel Golle wrote:
> > Commit c9eb3e0f8701 ("net: dsa: Add support for learning FDB through
> > notification") added a dev_close() call "to indicate inconsistent
> > situation" when we could not delete an FDB entry from the port. In case
> > of the lantiq_gswip driver this is problematic on standalone ports for
> > which all calls to either .port_fdb_add() or .port_fdb_del() would just
> > always return -EINVAL as adding or removing FDB entries is currently
> > only supported for ports which are a member of a bridge.
> > 
> > As since commit c26933639b54 ("net: dsa: request drivers to perform FDB
> > isolation") the dsa_db is passed to the .port_fdb_add() or
> > .port_fdb_del() calls we can use that to set the FID accordingly,
> > similar to how it was for bridge ports, and to FID 0 for standalone
> > ports. In order for FID 0 to work at all we also need to set bit 1 in
> > val[1], so always set it.
> > 
> > This solution was found in a downstream driver provided by MaxLinear
> > (which is the current owner of the former Lantiq switch IP) under
> > GPL-2.0. Import the implementation and the copyright headers from that
> > driver.
> > 
> > Fixes: c9eb3e0f8701 ("net: dsa: Add support for learning FDB through notification")
> > Signed-off-by: Daniel Golle <daniel@makrotopia.org>
> > ---
> 
> 1. The dev_close() call was removed in commit 2fd186501b1c ("net: dsa:
>    be louder when a non-legacy FDB operation fails"); what kernel are you
>    seeing failures on?

I'm working on net-next at this moment, trying historic OpenWrt releases
on that hardware I noticed the problem has been introduced somewhere
between Linxu 5.10 and Linux 5.15, but it is still present up to net-next
as of today. That's why I concluded c9eb3e0f8701 would be the original
cause.

> 
> 2. The call paths which set DSA_DB_PORT should be all guarded by
>    dsa_switch_supports_uc_filtering(), which the gswip driver doesn't
>    fulfill (it's missing ds->fdb_isolation). Can you put a dump_stack()
>    in the DSA_DB_PORT handler and let me know where it's called from?

In case you meant the driver implementation of .port_fdb_add() and
.port_fdb_del(), see below. If you meant somewhere else, please let me
know exactly where you want the dump_stack() to be placed.

> 
> 3. You haven't actually explained the context that leads to
>    gswip_port_fdb() returning -EINVAL. It would be great to have that as
>    a starting point, perhaps a dump_stack() in the unmodified code could
>    reveal more.

So instead of the fix I have now applied this simple patch to reveal the
stacktrace which leads to the return of -EINVAL:

diff --git a/drivers/net/dsa/lantiq_gswip.c b/drivers/net/dsa/lantiq_gswip.c
index 6eb3140d4044..befca64cce6d 100644
--- a/drivers/net/dsa/lantiq_gswip.c
+++ b/drivers/net/dsa/lantiq_gswip.c
@@ -1359,7 +1359,7 @@ static int gswip_port_fdb(struct dsa_switch *ds, int port,
 	int i;
 	int err;
 
-	if (!bridge)
+	if (WARN_ON(!bridge))
 		return -EINVAL;
 
 	for (i = max_ports; i < ARRAY_SIZE(priv->vlans); i++) {
---

Unfortunately that doesn't tell much:
[   52.910000] ------------[ cut here ]------------
[   52.910000] WARNING: CPU: 0 PID: 12 at drivers/net/dsa/lantiq_gswip.c:1362 0xc026c364
[   52.920000] Modules linked in: tag_gswip lantiq_gswip
[   52.930000] CPU: 0 UID: 0 PID: 12 Comm: kworker/u8:0 Not tainted 6.16.0+ #0 NONE 
[   52.930000] Hardware name: o2 Box 6431
[   52.930000] Workqueue: dsa_ordered dsa_user_switchdev_event_work
[   52.930000] Stack : 8187db3c 8187db10 00000000 000affff 807ec348 81c05200 81bd5e00 6473615f
[   52.930000]         6f726465 72656400 00000000 00000000 00000000 00000001 8187daf8 8183ddc0
[   52.930000]         00000000 00000000 808edd50 8187d920 ffffefff 00000000 00000098 0000009a
[   52.930000]         8187d92c 0000009a 809fd7d8 fffffffb 00000001 00000000 808edd50 00000000
[   52.930000]         00000000 c026c364 00000000 81cb6154 00000003 fffc2513 00000000 80eb0000
[   52.930000]         ...
[   52.930000] Call Trace:
[   52.930000] [<800167cc>] show_stack+0x28/0xf0
[   52.930000] [<8000d234>] dump_stack_lvl+0x70/0xb0
[   52.930000] [<8003bb58>] __warn+0x9c/0x114
[   52.930000] [<8003bcf4>] warn_slowpath_fmt+0x124/0x17c
[   52.930000] [<c026c364>] 0xc026c364
[   52.930000] 
[   52.930000] ---[ end trace 0000000000000000 ]---

So my next attempt is this

diff --git a/net/dsa/user.c b/net/dsa/user.c
index f59d66f0975d..34d67486ec2f 100644
--- a/net/dsa/user.c
+++ b/net/dsa/user.c
@@ -3649,7 +3649,7 @@ static void dsa_user_switchdev_event_work(struct work_struct *work)
 			err = dsa_port_lag_fdb_add(dp, addr, vid);
 		else
 			err = dsa_port_fdb_add(dp, addr, vid);
-		if (err) {
+		if (WARN_ON(err)) {
 			dev_err(ds->dev,
 				"port %d failed to add %pM vid %d to fdb: %d\n",
 				dp->index, addr, vid, err);
@@ -3665,7 +3665,7 @@ static void dsa_user_switchdev_event_work(struct work_struct *work)
 			err = dsa_port_lag_fdb_del(dp, addr, vid);
 		else
 			err = dsa_port_fdb_del(dp, addr, vid);
-		if (err) {
+		if (WARN_ON(err)) {
 			dev_err(ds->dev,
 				"port %d failed to delete %pM vid %d from fdb: %d\n",
 				dp->index, addr, vid, err);
---

That lead to a little more revealing result:
[   48.930000] ------------[ cut here ]------------
[   48.930000] WARNING: CPU: 0 PID: 31 at net/dsa/user.c:3652 dsa_user_switchdev_event_work+0x1e8/0x210
[   48.940000] Modules linked in: tag_gswip lantiq_gswip
[   48.940000] CPU: 0 UID: 0 PID: 31 Comm: kworker/u8:2 Tainted: G        W           6.16.0+ #0 NONE 
[   48.940000] Tainted: [W]=WARN
[   48.940000] Hardware name: o2 Box 6431
[   48.940000] Workqueue: dsa_ordered dsa_user_switchdev_event_work
[   48.940000] Stack : 81c2dd24 81c2dcf8 00000000 000affff 807ec348 81c6b000 81bcc200 6473615f
[   48.940000]         6f726465 72656400 00000000 00000000 00000000 00000001 81c2dce0 8183b840
[   48.940000]         00000000 00000000 808edd50 81c2db08 ffffefff 00000000 00000191 00000193
[   48.940000]         81c2db14 00000193 809fd7d8 fffffffb 00000001 00000000 808edd50 00000000
[   48.940000]         00000000 807ec530 00000000 81804420 00000003 fffc5b17 000a3b9f 00000001
[   48.940000]         ...
[   48.940000] Call Trace:
[   48.940000] [<800167cc>] show_stack+0x28/0xf0
[   48.940000] [<8000d234>] dump_stack_lvl+0x70/0xb0
[   48.940000] [<8003bb58>] __warn+0x9c/0x114
[   48.940000] [<8003bcf4>] warn_slowpath_fmt+0x124/0x17c
[   48.940000] [<807ec530>] dsa_user_switchdev_event_work+0x1e8/0x210
[   48.940000] [<8005892c>] process_one_work+0x1c4/0x3e0
[   48.940000] [<8005a1ac>] worker_thread+0x318/0x488
[   48.940000] [<80062000>] kthread+0x118/0x258
[   48.940000] [<80011db8>] ret_from_kernel_thread+0x14/0x1c
[   48.940000] 
[   48.940000] ---[ end trace 0000000000000000 ]---

So I went ahead and instead put the WARN_ON() in the function which schedules the work.

diff --git a/net/dsa/user.c b/net/dsa/user.c
index f59d66f0975d..1b56c0a5d5d6 100644
--- a/net/dsa/user.c
+++ b/net/dsa/user.c
@@ -3748,6 +3748,7 @@ static int dsa_user_fdb_event(struct net_device *dev,
 		   orig_dev->name, fdb_info->addr, fdb_info->vid,
 		   host_addr ? " as host address" : "");
 
+	WARN_ON(1);
 	INIT_WORK(&switchdev_work->work, dsa_user_switchdev_event_work);
 	switchdev_work->event = event;
	switchdev_work->dev = dev;
---

It is now a bit difficult to tell which of the calls actually return
-EINVAL, but counting the number of stackdumps and the number of -EINVAL
error messages seem to match:

[   65.500000] ------------[ cut here ]------------
[   65.500000] WARNING: CPU: 0 PID: 1238 at net/dsa/user.c:3751 dsa_user_fdb_event+0x110/0x1c8
[   65.510000] Modules linked in: tag_gswip lantiq_gswip
[   65.510000] CPU: 0 UID: 0 PID: 1238 Comm: netifd Not tainted 6.16.0+ #0 NONE 
[   65.510000] Hardware name: o2 Box 6431
[   65.510000] Stack : 81f5b8b4 00000096 00000000 00000001 00000000 00000000 00000000 00000000
[   65.510000]         00000000 00000000 00000000 00000000 00000000 00000001 81f5b870 81839f40
[   65.510000]         00000000 00000000 808edd50 81f5b708 ffffefff 00000000 00000096 00000098
[   65.510000]         81f5b714 00000098 809fd7d8 fffffff9 00000001 00000000 808edd50 00000000
[   65.510000]         00000000 807ed128 00000000 807eb87c 00000003 fffc249b 000726df 00000001
[   65.510000]         ...
[   65.510000] Call Trace:
[   65.510000] [<800167cc>] show_stack+0x28/0xf0
[   65.510000] [<8000d234>] dump_stack_lvl+0x70/0xb0
[   65.510000] [<8003bb58>] __warn+0x9c/0x114
[   65.510000] [<8003bcf4>] warn_slowpath_fmt+0x124/0x17c
[   65.510000] [<807ed128>] dsa_user_fdb_event+0x110/0x1c8
[   65.510000] [<807f89b8>] __switchdev_handle_fdb_event_to_device+0x138/0x228
[   65.510000] [<807f8ad8>] switchdev_handle_fdb_event_to_device+0x30/0x48
[   65.510000] [<807ec328>] dsa_user_switchdev_event+0x90/0xb0
[   65.510000] [<807c43c0>] br_switchdev_fdb_replay+0xd0/0x138
[   65.510000] [<807c4de8>] br_switchdev_port_offload+0x240/0x39c
[   65.510000] [<80799b6c>] br_switchdev_blocking_event+0x80/0xec
[   65.510000] [<80065e20>] raw_notifier_call_chain+0x48/0x88
[   65.510000] [<807f83e0>] switchdev_bridge_port_offload+0x5c/0xd0
[   65.510000] [<807e4c90>] dsa_port_bridge_join+0x170/0x410
[   65.510000] [<807ed5fc>] dsa_user_changeupper.part.0+0x40/0x180
[   65.510000] [<807f0ac0>] dsa_user_netdevice_event+0x5b4/0xc34
[   65.510000] [<80065e20>] raw_notifier_call_chain+0x48/0x88
[   65.510000] [<805edeec>] __netdev_upper_dev_link+0x1bc/0x450
[   65.510000] [<805ee1dc>] netdevbmaster_upper_dev_link+0x2c/0x38
[   65.510000] [<807a055c>] br_add_if+0x494/0x890
[   65.510000] [<807a20cc>] br_ioctl_stub+0x238/0x514
[   65.510000] [<805bc06c>] br_ioctl_call+0x58/0xa8
[   65.510000] [<80255a60>] sys_ioctl+0x4a8/0xa60
[   65.510000] [<8002204c>] syscall_common+0x34/0x58
[   65.510000] 
[   65.510000] ---[ end trace 0000000000000000 ]---
[   65.710000] gswip 1e108000.switch lan1: entered promiscuous mode
[   65.710000] ------------[ cut here ]------------
[   65.720000] WARNING: CPU: 0 PID: 1238 at net/dsa/user.c:3751 dsa_user_fdb_event+0x110/0x1c8
[   65.730000] Modules linked in: tag_gswip lantiq_gswip
[   65.730000] CPU: 0 UID: 0 PID: 1238 Comm: netifd Tainted: G        W           6.16.0+ #0 NONE 
[   65.730000] Tainted: [W]=WARN
[   65.730000] Hardware name: o2 Box 6431
[   65.730000] Stack : 81f5bb04 8003b4bc 00000000 00000001 00000000 00000000 00000000 00000000
[   65.730000]         00000000 00000000 00000000 00000000 00000000 00000001 81f5bac0 81839f40
[   65.730000]         00000000 00000000 808edd50 81f5b958 ffffefff 00000000 000000be 000000c0
[   65.730000]         81f5b964 000000c0 809fd7d8 fffffff9 00000001 00000000 808edd50 00000000
[   65.730000]         00000000 807ed128 00000000 807eb87c 00000003 fffc2d13 00000000 80eb0000
[   65.730000]         ...
[   65.730000] Call Trace:
[   65.730000] [<800167cc>] show_stack+0x28/0xf0
[   65.730000] [<8000d234>] dump_stack_lvl+0x70/0xb0
[   65.730000] [<8003bb58>] __warn+0x9c/0x114
[   65.730000] [<8003bcf4>] warn_slowpath_fmt+0x124/0x17c
[   65.730000] [<807ed128>] dsa_user_fdb_event+0x110/0x1c8
[   65.730000] [<807f89b8>] __switchdev_handle_fdb_event_to_device+0x138/0x228
[   65.730000] [<807f8ad8>] switchdev_handle_fdb_event_to_device+0x30/0x48
[   65.730000] [<807ec328>] dsa_user_switchdev_event+0x90/0xb0
[   65.730000] [<80065ea8>] atomic_notifier_call_chain+0x48/0x88
[   65.730000] [<807c47f0>] br_switchdev_fdb_notify+0xf0/0x11c
[   65.730000] [<8079b33c>] fdb_notify+0x110/0x158
[   65.730000] [<8079c1d0>] fdb_add_local+0x124/0x140
[   65.730000] [<8079ce64>] br_fdb_add_local+0x4c/0x7c
[   65.730000] [<807a0670>] br_add_if+0x5a8/0x890
[   65.730000] [<807a20cc>] br_ioctl_stub+0x238/0x514
[   65.730000] [<805bc06c>] br_ioctl_call+0x58/0xa8
[   65.730000] [<80255a60>] sys_ioctl+0x4a8/0xa60
[   65.730000] [<8002204c>] syscall_common+0x34/0x58
[   65.730000] 
[   65.730000] ---[ end trace 0000000000000000 ]---
[   65.890000] ------------[ cut here ]------------
[   65.900000] WARNING: CPU: 0 PID: 1238 at net/dsa/user.c:3751 dsa_user_fdb_event+0x110/0x1c8
[   65.900000] Modules linked in: tag_gswip lantiq_gswip
[   65.910000] CPU: 0 UID: 0 PID: 1238 Comm: netifd Tainted: G        W           6.16.0+ #0 NONE 
[   65.910000] Tainted: [W]=WARN
[   65.910000] Hardware name: o2 Box 6431
[   65.910000] Stack : 81f5ba34 8003b4bc 00000000 00000001 00000000 00000000 00000000 00000000
[   65.910000]         00000000 00000000 00000000 00000000 00000000 00000001 81f5b9f0 81839f40
[   65.910000]         00000000 00000000 808edd50 81f5b888 ffffefff 00000000 000000df 000000e1
[   65.910000]         81f5b894 000000e1 809fd7d8 fffffff9 00000001 00000000 808edd50 00000000
[   65.910000]         00000000 807ed128 00000000 807eb87c 00000003 fffc33df 00000000 80eb0000
[   65.910000]         ...
[   65.910000] Call Trace:
[   65.910000] [<800167cc>] show_stack+0x28/0xf0
[   65.910000] [<8000d234>] dump_stack_lvl+0x70/0xb0
[   65.910000] [<8003bb58>] __warn+0x9c/0x114
[   65.910000] [<8003bcf4>] warn_slowpath_fmt+0x124/0x17c
[   65.910000] [<807ed128>] dsa_user_fdb_event+0x110/0x1c8
[   65.910000] [<807f89b8>] __switchdev_handle_fdb_event_to_device+0x138/0x228
[   65.910000] [<807f8ad8>] switchdev_handle_fdb_event_to_device+0x30/0x48
[   65.910000] [<807ec328>] dsa_user_switchdev_event+0x90/0xb0
[   65.910000] [<80065ea8>] atomic_notifier_call_chain+0x48/0x88
[   65.910000] [<807c47f0>] br_switchdev_fdb_notify+0xf0/0x11c
[   65.910000] [<8079b33c>] fdb_notify+0x110/0x158
[   65.910000] [<8079c1d0>] fdb_add_local+0x124/0x140
[   65.910000] [<8079ce64>] br_fdb_add_local+0x4c/0x7c
[   65.910000] [<807bee58>] __vlan_add+0xa0/0x93c
[   65.910000] [<807bfc04>] nbp_vlan_add+0x134/0x1fc
[   65.910000] [<807c0750>] nbp_vlan_init+0x120/0x178
[   65.910000] [<807a06ac>] br_add_if+0x5e4/0x890
[   65.910000] [<807a20cc>] br_ioctl_stub+0x238/0x514
[   65.910000] [<805bc06c>] br_ioctl_call+0x58/0xa8
[   65.910000] [<80255a60>] sys_ioctl+0x4a8/0xa60
[   65.910000] [<8002204c>] syscall_common+0x34/0x58
[   65.910000] 
[   65.910000] ---[ end trace 0000000000000000 ]---
[   66.080000] ------------[ cut here ]------------
[   66.090000] WARNING: CPU: 0 PID: 1238 at net/dsa/user.c:3751 dsa_user_fdb_event+0x110/0x1c8
[   66.100000] Modules linked in: tag_gswip lantiq_gswip
[   66.100000] CPU: 0 UID: 0 PID: 1238 Comm: netifd Tainted: G        W           6.16.0+ #0 NONE 
[   66.100000] Tainted: [W]=WARN
[   66.100000] Hardware name: o2 Box 6431
[   66.100000] Stack : 81f5baa4 8003b4bc 00000000 00000001 00000000 00000000 00000000 00000000
[   66.100000]         00000000 00000000 00000000 00000000 00000000 00000001 81f5ba60 81839f40
[   66.100000]         00000000 00000000 808edd50 81f5b8f8 ffffefff 00000000 00000103 00000105
[   66.100000]         81f5b904 00000105 809fd7d8 fffffff9 00000001 00000000 808edd50 00000000
[   66.100000]         00000000 807ed128 00000000 807eb87c 00000003 fffc3b2b 00000000 80eb0000
[   66.100000]         ...
[   66.100000] Call Trace:
[   66.100000] [<800167cc>] show_stack+0x28/0xf0
[   66.100000] [<8000d234>] dump_stack_lvl+0x70/0xb0
[   66.100000] [<8003bb58>] __warn+0x9c/0x114
[   66.100000] [<8003bcf4>] warn_slowpath_fmt+0x124/0x17c
[   66.100000] [<807ed128>] dsa_user_fdb_event+0x110/0x1c8
[   66.100000] [<807f89b8>] __switchdev_handle_fdb_event_to_device+0x138/0x228
[   66.100000] [<807f8ad8>] switchdev_handle_fdb_event_to_device+0x30/0x48
[   66.100000] [<807ec328>] dsa_user_switchdev_event+0x90/0xb0
[   66.100000] [<80065ea8>] atomic_notifier_call_chain+0x48/0x88
[   66.100000] [<807c4810>] br_switchdev_fdb_notify+0x110/0x11c
[   66.100000] [<8079b33c>] fdb_notify+0x110/0x158
[   66.100000] [<8079bdb4>] fdb_delete+0x1e4/0x310
[   66.100000] [<8079c59c>] br_fdb_change_mac_address+0x180/0x188
[   66.100000] [<807a46f0>] br_stp_change_bridge_id+0x4c/0x1c8
[   66.100000] [<807a496c>] br_stp_recalculate_bridge_id+0x100/0x148
[   66.100000] [<807a06c4>] br_add_if+0x5fc/0x890
[   66.100000] [<807a20cc>] br_ioctl_stub+0x238/0x514
[   66.100000] [<805bc06c>] br_ioctl_call+0x58/0xa8
[   66.100000] [<80255a60>] sys_ioctl+0x4a8/0xa60
[   66.100000] [<8002204c>] syscall_common+0x34/0x58
[   66.100000] 
[   66.100000] ---[ end trace 0000000000000000 ]---
[   66.300000] gswip 1e108000.switch: port 3 failed to add 6a:94:c2:xx:xx:xx vid 1 to fdb: -22
[   66.300000] gswip 1e108000.switch: port 3 failed to add 1a:f8:a8:xx:xx:xx vid 0 to fdb: -22
[   66.320000] gswip 1e108000.switch: port 3 failed to add 1a:f8:a8:xx:xx:xx vid 1 to fdb: -22
[   66.320000] gswip 1e108000.switch: port 3 failed to delete 6a:94:c2:xx:xx:xx vid 1 from fdb: -2

So the problem is apparently that at the point of calling br_add_if() the
port obviously isn't (yet) a member of the bridge and hence
dsa_port_bridge_dev_get() would still return NULL at this point, which
then causes gswip_port_fdb() to return -EINVAL.


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFC net] net: dsa: lantiq_gswip: honor dsa_db passed to port_fdb_{add,del}
  2025-08-10 14:48   ` Daniel Golle
@ 2025-08-10 16:32     ` Vladimir Oltean
  2025-08-10 16:54       ` Daniel Golle
  0 siblings, 1 reply; 8+ messages in thread
From: Vladimir Oltean @ 2025-08-10 16:32 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Hauke Mehrtens, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Arkadi Sharshevsky, Florian Fainelli,
	netdev, linux-kernel, Andreas Schirm, Alexander Sverdlin,
	Lukas Stockmann, John Crispin

On Sun, Aug 10, 2025 at 03:48:30PM +0100, Daniel Golle wrote:
> [   66.300000] gswip 1e108000.switch: port 3 failed to add 6a:94:c2:xx:xx:xx vid 1 to fdb: -22
> [   66.300000] gswip 1e108000.switch: port 3 failed to add 1a:f8:a8:xx:xx:xx vid 0 to fdb: -22
> [   66.320000] gswip 1e108000.switch: port 3 failed to add 1a:f8:a8:xx:xx:xx vid 1 to fdb: -22
> [   66.320000] gswip 1e108000.switch: port 3 failed to delete 6a:94:c2:xx:xx:xx vid 1 from fdb: -2
> 
> So the problem is apparently that at the point of calling br_add_if() the
> port obviously isn't (yet) a member of the bridge and hence
> dsa_port_bridge_dev_get() would still return NULL at this point, which
> then causes gswip_port_fdb() to return -EINVAL.

Nope, this theory is false because the user port _is_ a member of the
bridge when it processes the SWITCHDEV_FDB_ADD_TO_DEVICE events.

There are 2 cases for handling these events. One is handling past events
which were missed and are re-generated during FDB replay:

[   65.510000] [<807ed128>] dsa_user_fdb_event+0x110/0x1c8
[   65.510000] [<807f89b8>] __switchdev_handle_fdb_event_to_device+0x138/0x228
[   65.510000] [<807f8ad8>] switchdev_handle_fdb_event_to_device+0x30/0x48
[   65.510000] [<807ec328>] dsa_user_switchdev_event+0x90/0xb0
[   65.510000] [<807c43c0>] br_switchdev_fdb_replay+0xd0/0x138
[   65.510000] [<807c4de8>] br_switchdev_port_offload+0x240/0x39c
[   65.510000] [<80799b6c>] br_switchdev_blocking_event+0x80/0xec
[   65.510000] [<80065e20>] raw_notifier_call_chain+0x48/0x88
[   65.510000] [<807f83e0>] switchdev_bridge_port_offload+0x5c/0xd0
[   65.510000] [<807e4c90>] dsa_port_bridge_join+0x170/0x410
[   65.510000] [<807ed5fc>] dsa_user_changeupper.part.0+0x40/0x180
[   65.510000] [<807f0ac0>] dsa_user_netdevice_event+0x5b4/0xc34
[   65.510000] [<80065e20>] raw_notifier_call_chain+0x48/0x88
[   65.510000] [<805edeec>] __netdev_upper_dev_link+0x1bc/0x450
[   65.510000] [<805ee1dc>] netdev_master_upper_dev_link+0x2c/0x38
[   65.510000] [<807a055c>] br_add_if+0x494/0x890

If you look at the order of operations, you'll see that:

int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br,
			 struct netlink_ext_ack *extack)
{
	...
	err = dsa_port_bridge_create(dp, br, extack); // this sets dp->bridge
	if (err)
		return err;

	brport_dev = dsa_port_to_bridge_port(dp);

	info.bridge = *dp->bridge;
	err = dsa_broadcast(DSA_NOTIFIER_BRIDGE_JOIN, &info); // this calls ds->ops->port_bridge_join()
	if (err)
		goto out_rollback;

	/* Drivers which support bridge TX forwarding should set this */
	dp->bridge->tx_fwd_offload = info.tx_fwd_offload;

	err = switchdev_bridge_port_offload(brport_dev, dev, dp,
					    &dsa_user_switchdev_notifier,
					    &dsa_user_switchdev_blocking_notifier,
					    dp->bridge->tx_fwd_offload, extack); // this calls br_switchdev_fdb_replay()
	if (err)
		goto out_rollback_unbridge;
	...
}

by the time br_switchdev_fdb_replay() is called, dp->bridge correctly
reflects the bridge that is generating the events.

The problem is not a race condition, the problem is that the driver does
not correctly handle host FDB entries.

The truly revealing step is to uncomment this:

	netdev_dbg(dev, "%s FDB entry towards %s, addr %pM vid %d%s\n",
		   event == SWITCHDEV_FDB_ADD_TO_DEVICE ? "Adding" : "Deleting",
		   orig_dev->name, fdb_info->addr, fdb_info->vid,
		   host_addr ? " as host address" : "");

and see it will print "as host address", meaning dsa_port_bridge_host_fdb_add()
will be called.

At the DSA cross-chip notifier layer, this generates a DSA_NOTIFIER_HOST_FDB_ADD
event rather than the port-level DSA_NOTIFIER_FDB_ADD. The major difference in
handling is that HOST_FDB_ADD events are matched by the *CPU* port, see
dsa_port_host_address_match().

The CPU port is not part of the bridge that generated the host FDB entry,
only the user port it services is.

The problem originates, roughly speaking, since commit 10fae4ac89ce
("net: dsa: include bridge addresses which are local in the host fdb
list"), which first appeared in v5.14. We rolled out a new feature using
existing API, and didn't notice the gswip driver wouldn't tolerate
ds->ops->port_fdb_add() getting called on the CPU port.

Anyway, your intuition for fixing this properly was somewhat correct.
Even if gswip does not implement FDB isolation, it is correct to look at
the dsa_db :: bridge member to see which bridge originated the host FDB
entry. That was its exact purpose, as documented in section "Address
databases" of Documentation/networking/dsa/dsa.rst. Even if cpu_dp->bridge
is NULL (as expected), drivers have to know on behalf of which user port
(member of a bridge) the CPU port is filtering an entry towards the host.
This is because CPU and DSA ports are servicing multiple address databases.

The only problem is that the API you're making use for fixing this was
introduced in commit c26933639b54 ("net: dsa: request drivers to perform
FDB isolation"), first appeared in v5.18. Thus, you can't fix an issue
in linux-5.15.y using this method, unless the API change is backported.

Alternatively, for stable kernels you could suppress the error and
return 0 in the gswip driver, with the appropriate comment that the API
doesn't communicate the bridge ID for host FDB entries, and thus, the
driver just won't filter them. Then, you could develop the solution
further for net-next, keeping in mind how things are supposed to work.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFC net] net: dsa: lantiq_gswip: honor dsa_db passed to port_fdb_{add,del}
  2025-08-10 16:32     ` Vladimir Oltean
@ 2025-08-10 16:54       ` Daniel Golle
  2025-08-10 21:02         ` Vladimir Oltean
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Golle @ 2025-08-10 16:54 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Hauke Mehrtens, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Arkadi Sharshevsky, Florian Fainelli,
	netdev, linux-kernel, Andreas Schirm, Alexander Sverdlin,
	Lukas Stockmann, John Crispin

On Sun, Aug 10, 2025 at 07:32:29PM +0300, Vladimir Oltean wrote:
> On Sun, Aug 10, 2025 at 03:48:30PM +0100, Daniel Golle wrote:
> > [   66.300000] gswip 1e108000.switch: port 3 failed to add 6a:94:c2:xx:xx:xx vid 1 to fdb: -22
> > [   66.300000] gswip 1e108000.switch: port 3 failed to add 1a:f8:a8:xx:xx:xx vid 0 to fdb: -22
> > [   66.320000] gswip 1e108000.switch: port 3 failed to add 1a:f8:a8:xx:xx:xx vid 1 to fdb: -22
> > [   66.320000] gswip 1e108000.switch: port 3 failed to delete 6a:94:c2:xx:xx:xx vid 1 from fdb: -2
> > 
> > So the problem is apparently that at the point of calling br_add_if() the
> > port obviously isn't (yet) a member of the bridge and hence
> > dsa_port_bridge_dev_get() would still return NULL at this point, which
> > then causes gswip_port_fdb() to return -EINVAL.
> 
> Nope, this theory is false because the user port _is_ a member of the
> bridge when it processes the SWITCHDEV_FDB_ADD_TO_DEVICE events.
> 
> There are 2 cases for handling these events. One is handling past events
> which were missed and are re-generated during FDB replay:
> 
> [   65.510000] [<807ed128>] dsa_user_fdb_event+0x110/0x1c8
> [   65.510000] [<807f89b8>] __switchdev_handle_fdb_event_to_device+0x138/0x228
> [   65.510000] [<807f8ad8>] switchdev_handle_fdb_event_to_device+0x30/0x48
> [   65.510000] [<807ec328>] dsa_user_switchdev_event+0x90/0xb0
> [   65.510000] [<807c43c0>] br_switchdev_fdb_replay+0xd0/0x138
> [   65.510000] [<807c4de8>] br_switchdev_port_offload+0x240/0x39c
> [   65.510000] [<80799b6c>] br_switchdev_blocking_event+0x80/0xec
> [   65.510000] [<80065e20>] raw_notifier_call_chain+0x48/0x88
> [   65.510000] [<807f83e0>] switchdev_bridge_port_offload+0x5c/0xd0
> [   65.510000] [<807e4c90>] dsa_port_bridge_join+0x170/0x410
> [   65.510000] [<807ed5fc>] dsa_user_changeupper.part.0+0x40/0x180
> [   65.510000] [<807f0ac0>] dsa_user_netdevice_event+0x5b4/0xc34
> [   65.510000] [<80065e20>] raw_notifier_call_chain+0x48/0x88
> [   65.510000] [<805edeec>] __netdev_upper_dev_link+0x1bc/0x450
> [   65.510000] [<805ee1dc>] netdev_master_upper_dev_link+0x2c/0x38
> [   65.510000] [<807a055c>] br_add_if+0x494/0x890
> 
> If you look at the order of operations, you'll see that:
> 
> int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br,
> 			 struct netlink_ext_ack *extack)
> {
> 	...
> 	err = dsa_port_bridge_create(dp, br, extack); // this sets dp->bridge
> 	if (err)
> 		return err;
> 
> 	brport_dev = dsa_port_to_bridge_port(dp);
> 
> 	info.bridge = *dp->bridge;
> 	err = dsa_broadcast(DSA_NOTIFIER_BRIDGE_JOIN, &info); // this calls ds->ops->port_bridge_join()
> 	if (err)
> 		goto out_rollback;
> 
> 	/* Drivers which support bridge TX forwarding should set this */
> 	dp->bridge->tx_fwd_offload = info.tx_fwd_offload;
> 
> 	err = switchdev_bridge_port_offload(brport_dev, dev, dp,
> 					    &dsa_user_switchdev_notifier,
> 					    &dsa_user_switchdev_blocking_notifier,
> 					    dp->bridge->tx_fwd_offload, extack); // this calls br_switchdev_fdb_replay()
> 	if (err)
> 		goto out_rollback_unbridge;
> 	...
> }
> 
> by the time br_switchdev_fdb_replay() is called, dp->bridge correctly
> reflects the bridge that is generating the events.
> 
> The problem is not a race condition, the problem is that the driver does
> not correctly handle host FDB entries.
> 
> The truly revealing step is to uncomment this:
> 
> 	netdev_dbg(dev, "%s FDB entry towards %s, addr %pM vid %d%s\n",
> 		   event == SWITCHDEV_FDB_ADD_TO_DEVICE ? "Adding" : "Deleting",
> 		   orig_dev->name, fdb_info->addr, fdb_info->vid,
> 		   host_addr ? " as host address" : "");
> 
> and see it will print "as host address", meaning dsa_port_bridge_host_fdb_add()
> will be called.

Thank you for explaining the details, it makes perfect sense to me now.

> 
> At the DSA cross-chip notifier layer, this generates a DSA_NOTIFIER_HOST_FDB_ADD
> event rather than the port-level DSA_NOTIFIER_FDB_ADD. The major difference in
> handling is that HOST_FDB_ADD events are matched by the *CPU* port, see
> dsa_port_host_address_match().
> 
> The CPU port is not part of the bridge that generated the host FDB entry,
> only the user port it services is.
> 
> The problem originates, roughly speaking, since commit 10fae4ac89ce
> ("net: dsa: include bridge addresses which are local in the host fdb
> list"), which first appeared in v5.14. We rolled out a new feature using
> existing API, and didn't notice the gswip driver wouldn't tolerate
> ds->ops->port_fdb_add() getting called on the CPU port.
> 
> Anyway, your intuition for fixing this properly was somewhat correct.
> Even if gswip does not implement FDB isolation, it is correct to look at
> the dsa_db :: bridge member to see which bridge originated the host FDB
> entry. That was its exact purpose, as documented in section "Address
> databases" of Documentation/networking/dsa/dsa.rst. Even if cpu_dp->bridge
> is NULL (as expected), drivers have to know on behalf of which user port
> (member of a bridge) the CPU port is filtering an entry towards the host.
> This is because CPU and DSA ports are servicing multiple address databases.
> 
> The only problem is that the API you're making use for fixing this was
> introduced in commit c26933639b54 ("net: dsa: request drivers to perform
> FDB isolation"), first appeared in v5.18. Thus, you can't fix an issue
> in linux-5.15.y using this method, unless the API change is backported.
> 
> Alternatively, for stable kernels you could suppress the error and
> return 0 in the gswip driver, with the appropriate comment that the API
> doesn't communicate the bridge ID for host FDB entries, and thus, the
> driver just won't filter them. Then, you could develop the solution
> further for net-next, keeping in mind how things are supposed to work.

As it would be nice to have the proper fix backported at least all the way
down to linux-6.1.y, do you think it would be ok to have that solution
I proposed (and picked from the GPL-2.0 licensed vendor driver) applied to
the 'net' tree (with a more appropriate Fixes: tag and commit description,
obviously) and either just not fix it for linux-5.15.y, or only there
replace the 'return -EINVAL;' with a 'dev_warn(...); return 0;'?

In fact, commit c26933639b54 ("net: dsa: request drivers to perform FDB
isolation") also touches drivers/net/dsa/lantiq_gswip.c and does add
struct dsa_db as parameter for the .port_fdb_{add,del} ops. Would it be
ok to hence target that commit in the Fixes: tag?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFC net] net: dsa: lantiq_gswip: honor dsa_db passed to port_fdb_{add,del}
  2025-08-10 16:54       ` Daniel Golle
@ 2025-08-10 21:02         ` Vladimir Oltean
  2025-08-11 15:32           ` Vladimir Oltean
  0 siblings, 1 reply; 8+ messages in thread
From: Vladimir Oltean @ 2025-08-10 21:02 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Hauke Mehrtens, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Arkadi Sharshevsky, Florian Fainelli,
	netdev, linux-kernel, Andreas Schirm, Alexander Sverdlin,
	Lukas Stockmann, John Crispin

On Sun, Aug 10, 2025 at 05:54:55PM +0100, Daniel Golle wrote:
> As it would be nice to have the proper fix backported at least all the way
> down to linux-6.1.y, do you think it would be ok to have that solution
> I proposed (and picked from the GPL-2.0 licensed vendor driver) applied to
> the 'net' tree (with a more appropriate Fixes: tag and commit description,
> obviously) and either just not fix it for linux-5.15.y, or only there
> replace the 'return -EINVAL;' with a 'dev_warn(...); return 0;'?
> 
> In fact, commit c26933639b54 ("net: dsa: request drivers to perform FDB
> isolation") also touches drivers/net/dsa/lantiq_gswip.c and does add
> struct dsa_db as parameter for the .port_fdb_{add,del} ops. Would it be
> ok to hence target that commit in the Fixes: tag?

Generally the commit from the Fixes: tag is the one which "git bisect"
points to, when monitoring the described problem.

If the problem are the error messages, the proper fix which restores
previous behavior would be returning zero with no further logging.

If the problem is the lack of host FDB entries from the hardware, the
new logic to be properly tested in a context where it makes a real difference.
I suggest tools/testing/selftests/net/forwarding/local_termination.sh
once dsa_switch_supports_uc_filtering() returns true. Otherwise, the CPU
port must be in the flood mask of bridged user ports in order to receive
unknown traffic, and for the basic usage, I don't see much practical
benefit from adding host FDB entries to hardware. One situation is when
there are 2 bridged ports A and B, and the station connected to A wants
to talk to the CPU (the bridge net device), what will happen is the
station connected to port B will also see this traffic due to flooding.
Anyway, that behavior has existed since forever for drivers which don't
learn the MAC SA from host-transmitted packets, and don't have functional
software-based host RX filtering.

So I'm concerned that if you try to push solution B for problem A, and
test it only for problem A's circumstances and current requirements,
this will create the premise for a never-ending rabbit hole of further
fixups onto the new logic, sent to stable kernels.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFC net] net: dsa: lantiq_gswip: honor dsa_db passed to port_fdb_{add,del}
  2025-08-10 21:02         ` Vladimir Oltean
@ 2025-08-11 15:32           ` Vladimir Oltean
  2025-08-12  1:23             ` Daniel Golle
  0 siblings, 1 reply; 8+ messages in thread
From: Vladimir Oltean @ 2025-08-11 15:32 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Hauke Mehrtens, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Arkadi Sharshevsky, Florian Fainelli,
	netdev, linux-kernel, Andreas Schirm, Alexander Sverdlin,
	Lukas Stockmann, John Crispin

Hi Daniel,

On Mon, Aug 11, 2025 at 12:02:00AM +0300, Vladimir Oltean wrote:
> I suggest tools/testing/selftests/net/forwarding/local_termination.sh
> once dsa_switch_supports_uc_filtering() returns true.

Since you're working with the lantiq_gswip driver which receives
relatively few patches...

I would like to submit this patch to remove the legacy behavior:

diff --git a/drivers/net/dsa/lantiq_gswip.c b/drivers/net/dsa/lantiq_gswip.c
index 6b8a5101b0e7..7e11f198ff2b 100644
--- a/drivers/net/dsa/lantiq_gswip.c
+++ b/drivers/net/dsa/lantiq_gswip.c
@@ -886,8 +886,6 @@ static int gswip_setup(struct dsa_switch *ds)

 	ds->mtu_enforcement_ingress = true;

-	ds->configure_vlan_while_not_filtering = false;
-
 	return 0;
 }

however I'm sure that the driver will break, so I have more, in an
attempt to avoid that :)

Would you please look at the patches I've prepared on this branch and
reviewing with extra info you might have / giving them a test, one by one?
I was only able to compile-test them. I also lack proper documentation
(which I'm sure you lack too), I only saw the "developer resources" from
Martin Blumenstingl's Github (which lack actual PCE register descriptions)
https://github.com/xdarklight/ltq-upstream-status
and the Maxlinear PRPLOS code at
https://github.com/maxlinear/linux/tree/UPDK_9.1.90/drivers/net/datapath/gswip/switchcore/src
(which I think is what you were also referencing)

The branch over net-next is here:
https://github.com/vladimiroltean/linux/commits/lantiq-gswip/

Thanks!

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFC net] net: dsa: lantiq_gswip: honor dsa_db passed to port_fdb_{add,del}
  2025-08-11 15:32           ` Vladimir Oltean
@ 2025-08-12  1:23             ` Daniel Golle
  0 siblings, 0 replies; 8+ messages in thread
From: Daniel Golle @ 2025-08-12  1:23 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Hauke Mehrtens, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Arkadi Sharshevsky, Florian Fainelli,
	netdev, linux-kernel, Andreas Schirm, Alexander Sverdlin,
	Lukas Stockmann, John Crispin

On Mon, Aug 11, 2025 at 06:32:42PM +0300, Vladimir Oltean wrote:
> Hi Daniel,
> 
> On Mon, Aug 11, 2025 at 12:02:00AM +0300, Vladimir Oltean wrote:
> > I suggest tools/testing/selftests/net/forwarding/local_termination.sh
> > once dsa_switch_supports_uc_filtering() returns true.
> 
> Since you're working with the lantiq_gswip driver which receives
> relatively few patches...
> 
> I would like to submit this patch to remove the legacy behavior:
> 
> diff --git a/drivers/net/dsa/lantiq_gswip.c b/drivers/net/dsa/lantiq_gswip.c
> index 6b8a5101b0e7..7e11f198ff2b 100644
> --- a/drivers/net/dsa/lantiq_gswip.c
> +++ b/drivers/net/dsa/lantiq_gswip.c
> @@ -886,8 +886,6 @@ static int gswip_setup(struct dsa_switch *ds)
> 
>  	ds->mtu_enforcement_ingress = true;
> 
> -	ds->configure_vlan_while_not_filtering = false;
> -
>  	return 0;
>  }
> 
> however I'm sure that the driver will break, so I have more, in an
> attempt to avoid that :)

This honourable endeavour deserves my full support :)

> 
> Would you please look at the patches I've prepared on this branch and
> reviewing with extra info you might have / giving them a test, one by one?
> I was only able to compile-test them. I also lack proper documentation
> (which I'm sure you lack too), I only saw the "developer resources" from
> Martin Blumenstingl's Github (which lack actual PCE register descriptions)
> https://github.com/xdarklight/ltq-upstream-status
> and the Maxlinear PRPLOS code at
> https://github.com/maxlinear/linux/tree/UPDK_9.1.90/drivers/net/datapath/gswip/switchcore/src
> (which I think is what you were also referencing)

There is also a DSA driver for the newer standalone GSW1xx switch ICs
which are connected to a CPU using either MDIO or SPI for management and
2500Base-X/SGMII or RGMII/RMII for the datapath. I'm working on merging
that with the existing lantiq_gswip driver as those newer switch ICs have
a lot in common with the older Lantiq/Intel in-SoC switches.

You can see my work-in-progress first cleaning up the reference mxl-gsw1xx
driver and then preparing the lantiq_gswip driver for a potential merge
with that driver here:

https://github.com/dangowrt/linux/commits/mxl-gsw1xx-cleanup/

The old Lantiq VRV20x turned out to be difficult to even undergo proper testing
using tools/test/selftest/drivers/net/dsa scripts due to most boards coming with
only 64 MiB or DDR2 RAM -- even with nothing else running and the root
filesystem on-flash tcpdump quickly causes oom when trying to run
local_termination.sh.

In the next days I'm going to receive two boards with 256 MiB of RAM which
I found used for little money on ebay which will allow me to at least test
GSWIP 2.1 (xrx200) and GSWIP 2.2 (xrx330).
I don't think any of those boards comes with an old enough xrx200 SoC to
still contain the GSWIP 2.0 IP...

> 
> The branch over net-next is here:
> https://github.com/vladimiroltean/linux/commits/lantiq-gswip/

Nice, I'll happily take a look tomorrow and can do basic testing on my
TL-W8970 (VRV200, but only 64 MiB of DDR2 RAM), and hopefully the full
DSA selftest suite (or rather the part of it which covers features
actually supported by the driver/hardware) starting with
local_termination.sh once I receive the more resourceful boards with
VRV200 and VRV330 in the next couple of days.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-08-12  1:23 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-09 22:35 [PATCH/RFC net] net: dsa: lantiq_gswip: honor dsa_db passed to port_fdb_{add,del} Daniel Golle
2025-08-10 13:06 ` Vladimir Oltean
2025-08-10 14:48   ` Daniel Golle
2025-08-10 16:32     ` Vladimir Oltean
2025-08-10 16:54       ` Daniel Golle
2025-08-10 21:02         ` Vladimir Oltean
2025-08-11 15:32           ` Vladimir Oltean
2025-08-12  1:23             ` Daniel Golle

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).