netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] net: dsa: qca8k: forbid management frames access to internal PHYs if another device is on the MDIO bus
@ 2025-04-25 15:13 Marek Behún
  2025-04-25 16:04 ` Andrew Lunn
  0 siblings, 1 reply; 6+ messages in thread
From: Marek Behún @ 2025-04-25 15:13 UTC (permalink / raw)
  To: netdev
  Cc: Marek Behún, Andrew Lunn, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Russell King (Oracle),
	Rob Herring (Arm), Javier Carrasco, Wojciech Drewek,
	Matthias Schiffer, Christian Marangi, Marek Mojík

Completely forbid access to the internal switch PHYs via management
frames if there is another MDIO device on the MDIO bus besides the QCA8K
switch.

It seems that when internal PHYs are accessed via management frames,
internally the switch core translates these requests to MDIO accesses
and communicates with the internal PHYs via the MDIO bus. This
communication leaks outside on the MDC and MDIO pins. If there is
another PHY device connected on the MDIO bus besides the QCA8K switch,
and the kernel tries to communicate with the PHY at the same time, the
communication gets corrupted.

This makes the WAN PHY break on the Turris 1.x device.

In commit 526c8ee04bdbd4d8 ("net: dsa: qca8k: fix potential MDIO bus
conflict when accessing internal PHYs via management frames") we tried
to solve this issue by locking access to the MDIO bus when accessing
internal PHYs via management frames. It seems that the bug still
prevails though, and we are not able to determine why. Therefore this
seems the only viable fix for now.

Fixes: 526c8ee04bdbd4d8 ("net: dsa: qca8k: fix potential MDIO bus conflict when accessing internal PHYs via management frames")
Signed-off-by: Marek Behún <kabel@kernel.org>
---
 drivers/net/dsa/qca/qca8k-8xxx.c | 53 ++++++++++++++++++++++++--------
 drivers/net/dsa/qca/qca8k.h      |  1 +
 2 files changed, 42 insertions(+), 12 deletions(-)

diff --git a/drivers/net/dsa/qca/qca8k-8xxx.c b/drivers/net/dsa/qca/qca8k-8xxx.c
index a36b8b07030e..7f9fd3fc0d75 100644
--- a/drivers/net/dsa/qca/qca8k-8xxx.c
+++ b/drivers/net/dsa/qca/qca8k-8xxx.c
@@ -633,6 +633,9 @@ qca8k_phy_eth_command(struct qca8k_priv *priv, bool read, int phy,
 	if (regnum >= QCA8K_MDIO_MASTER_MAX_REG)
 		return -EINVAL;
 
+	if (!priv->can_access_phys_over_mgmt)
+		return -EBUSY;
+
 	mgmt_eth_data = &priv->mgmt_eth_data;
 
 	write_val = QCA8K_MDIO_MASTER_BUSY | QCA8K_MDIO_MASTER_EN |
@@ -666,15 +669,6 @@ qca8k_phy_eth_command(struct qca8k_priv *priv, bool read, int phy,
 		goto err_read_skb;
 	}
 
-	/* It seems that accessing the switch's internal PHYs via management
-	 * packets still uses the MDIO bus within the switch internally, and
-	 * these accesses can conflict with external MDIO accesses to other
-	 * devices on the MDIO bus.
-	 * We therefore need to lock the MDIO bus onto which the switch is
-	 * connected.
-	 */
-	mutex_lock_nested(&priv->bus->mdio_lock, MDIO_MUTEX_NESTED);
-
 	/* Actually start the request:
 	 * 1. Send mdio master packet
 	 * 2. Busy Wait for mdio master command
@@ -687,7 +681,6 @@ qca8k_phy_eth_command(struct qca8k_priv *priv, bool read, int phy,
 	mgmt_conduit = priv->mgmt_conduit;
 	if (!mgmt_conduit) {
 		mutex_unlock(&mgmt_eth_data->mutex);
-		mutex_unlock(&priv->bus->mdio_lock);
 		ret = -EINVAL;
 		goto err_mgmt_conduit;
 	}
@@ -775,7 +768,6 @@ qca8k_phy_eth_command(struct qca8k_priv *priv, bool read, int phy,
 				    QCA8K_ETHERNET_TIMEOUT);
 
 	mutex_unlock(&mgmt_eth_data->mutex);
-	mutex_unlock(&priv->bus->mdio_lock);
 
 	return ret;
 
@@ -943,6 +935,39 @@ qca8k_legacy_mdio_read(struct mii_bus *slave_bus, int port, int regnum)
 	return qca8k_internal_mdio_read(slave_bus, port, regnum);
 }
 
+static int qca8k_mdio_determine_access_over_eth(struct qca8k_priv *priv)
+{
+	struct device_node *parent = of_get_parent(priv->dev->of_node);
+	int addr = to_mdio_device(priv->dev)->addr;
+	bool result = true;
+	u32 reg;
+	int ret;
+
+	/* It seems that accessing the switch's internal PHYs via management
+	 * packets still uses the MDIO bus within the switch internally, and
+	 * these accesses can conflict with external MDIO accesses to other
+	 * devices on the MDIO bus.
+	 *
+	 * Determine whether there are other devices on the MDIO bus besides
+	 * the switch, and if there are, forbid access to the internal PHYs
+	 * via management frames.
+	 */
+	for_each_available_child_of_node_scoped(parent, sibling) {
+		ret = of_property_read_u32(sibling, "reg", &reg);
+		if (ret)
+			return ret;
+
+		if (reg != addr) {
+			result = false;
+			break;
+		}
+	}
+
+	priv->can_access_phys_over_mgmt = result;
+
+	return 0;
+}
+
 static int
 qca8k_mdio_register(struct qca8k_priv *priv)
 {
@@ -950,7 +975,11 @@ qca8k_mdio_register(struct qca8k_priv *priv)
 	struct device *dev = ds->dev;
 	struct device_node *mdio;
 	struct mii_bus *bus;
-	int ret = 0;
+	int ret;
+
+	ret = qca8k_mdio_determine_access_over_eth(priv);
+	if (ret)
+		return ret;
 
 	mdio = of_get_child_by_name(dev->of_node, "mdio");
 	if (mdio && !of_device_is_available(mdio))
diff --git a/drivers/net/dsa/qca/qca8k.h b/drivers/net/dsa/qca/qca8k.h
index d046679265fa..346f79f81fe5 100644
--- a/drivers/net/dsa/qca/qca8k.h
+++ b/drivers/net/dsa/qca/qca8k.h
@@ -463,6 +463,7 @@ struct qca8k_priv {
 	struct net_device *mgmt_conduit; /* Track if mdio/mib Ethernet is available */
 	struct qca8k_mgmt_eth_data mgmt_eth_data;
 	struct qca8k_mib_eth_data mib_eth_data;
+	bool can_access_phys_over_mgmt;
 	struct qca8k_mdio_cache mdio_cache;
 	struct qca8k_pcs pcs_port_0;
 	struct qca8k_pcs pcs_port_6;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net] net: dsa: qca8k: forbid management frames access to internal PHYs if another device is on the MDIO bus
  2025-04-25 15:13 [PATCH net] net: dsa: qca8k: forbid management frames access to internal PHYs if another device is on the MDIO bus Marek Behún
@ 2025-04-25 16:04 ` Andrew Lunn
  2025-04-25 16:29   ` Marek Behún
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Lunn @ 2025-04-25 16:04 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Vladimir Oltean, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Russell King (Oracle),
	Rob Herring (Arm), Javier Carrasco, Wojciech Drewek,
	Matthias Schiffer, Christian Marangi, Marek Mojík

On Fri, Apr 25, 2025 at 05:13:09PM +0200, Marek Behún wrote:
> Completely forbid access to the internal switch PHYs via management
> frames if there is another MDIO device on the MDIO bus besides the QCA8K
> switch.
> 
> It seems that when internal PHYs are accessed via management frames,
> internally the switch core translates these requests to MDIO accesses
> and communicates with the internal PHYs via the MDIO bus. This
> communication leaks outside on the MDC and MDIO pins. If there is
> another PHY device connected on the MDIO bus besides the QCA8K switch,
> and the kernel tries to communicate with the PHY at the same time, the
> communication gets corrupted.
> 
> This makes the WAN PHY break on the Turris 1.x device.

Let me see if i understand the architecture correctly...

You have a host MDIO bus, which has both the QCA8K and a PHY on it.

Within the QCA8K there are a number of PHYs. Looking at qca8k-8xxx.c,
there is qca8k_mdio_write() and qca8k_mdio_read() which use a register
within the switch to access an MDIO bus for the internal PHYs. The
assumption is that the internal MDIO is isolated from the host MDIO
bus. That should be easy to prove, just read register 2 and 3 from all
32 addresses of the host MDIO bus, and make sure you don't find the
internal PHYs.

The issue is that when you use MDIO over Ethernet frames, both the
internal and external bus see the bus transaction. This happens
asynchronously to whatever the host MDIO bus driver is doing, and you
sometimes get collisions. This in theory should affect both the PHY
and the switch on the bus, but maybe because of the address being
accessed, you don't notice any issue with the switch?

The assumption is that the switch hardware interpreting the MDIO over
Ethernet is driving both the internal and external MDIO bus. Again,
this should be easy to prove, read ID registers 2 and 3 from the
address the external PHY is on, and see what you get.

I guess the real question here is, has it been proven that there are
actually two MDIO busses, not one bus with two masters? I could
theoretically see an architecture where the switch is not managed over
MDIO at all. It has an EEPROM to do sufficient configuration that you
can do all the management using Ethernet frames. The MDIO over
Ethernet then allow you to manage any external PHYs.

	 Andrew

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] net: dsa: qca8k: forbid management frames access to internal PHYs if another device is on the MDIO bus
  2025-04-25 16:04 ` Andrew Lunn
@ 2025-04-25 16:29   ` Marek Behún
  2025-04-25 16:53     ` Andrew Lunn
  0 siblings, 1 reply; 6+ messages in thread
From: Marek Behún @ 2025-04-25 16:29 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Marek Behún, netdev, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Russell King (Oracle),
	Rob Herring (Arm), Javier Carrasco, Wojciech Drewek,
	Matthias Schiffer, Christian Marangi, Marek Mojík

On Fri, Apr 25, 2025 at 06:04:43PM +0200, Andrew Lunn wrote:
> On Fri, Apr 25, 2025 at 05:13:09PM +0200, Marek Behún wrote:
> > Completely forbid access to the internal switch PHYs via management
> > frames if there is another MDIO device on the MDIO bus besides the QCA8K
> > switch.
> > 
> > It seems that when internal PHYs are accessed via management frames,
> > internally the switch core translates these requests to MDIO accesses
> > and communicates with the internal PHYs via the MDIO bus. This
> > communication leaks outside on the MDC and MDIO pins. If there is
> > another PHY device connected on the MDIO bus besides the QCA8K switch,
> > and the kernel tries to communicate with the PHY at the same time, the
> > communication gets corrupted.
> > 
> > This makes the WAN PHY break on the Turris 1.x device.
> 
> Let me see if i understand the architecture correctly...
> 
> You have a host MDIO bus, which has both the QCA8K and a PHY on it.
> 
> Within the QCA8K there are a number of PHYs. Looking at qca8k-8xxx.c,
> there is qca8k_mdio_write() and qca8k_mdio_read() which use a register
> within the switch to access an MDIO bus for the internal PHYs. The
> assumption is that the internal MDIO is isolated from the host MDIO
> bus. That should be easy to prove, just read register 2 and 3 from all
> 32 addresses of the host MDIO bus, and make sure you don't find the
> internal PHYs.

I did not think of doing this test to determine whether the buses are
isolated when I debugged this issue and came up with commit
526c8ee04bdbd4d8.

I assumed that the buses are not isolated because I saw activity on the
bus on an oscilloscope when accessing the PHYs over Ethernet frames.

Please look at the commit message of 526c8ee04bdbd4d8
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/net/dsa/qca?id=526c8ee04bdbd4d8

> The issue is that when you use MDIO over Ethernet frames, both the
> internal and external bus see the bus transaction. This happens
> asynchronously to whatever the host MDIO bus driver is doing, and you
> sometimes get collisions. This in theory should affect both the PHY
> and the switch on the bus, but maybe because of the address being
> accessed, you don't notice any issue with the switch?

Yes, maybe, but also maybe because after the above mentioned commit,
the access to the MDIO bus was locked when accessing over Ethernet
frames. That was how I tried to fix this in 526c8ee04bdbd4d8.

> The assumption is that the switch hardware interpreting the MDIO over
> Ethernet is driving both the internal and external MDIO bus. Again,
> this should be easy to prove, read ID registers 2 and 3 from the
> address the external PHY is on, and see what you get.

I can try this test, but the result will change the need for this
patch.

> I guess the real question here is, has it been proven that there are
> actually two MDIO busses, not one bus with two masters? I could
> theoretically see an architecture where the switch is not managed over
> MDIO at all. It has an EEPROM to do sufficient configuration that you
> can do all the management using Ethernet frames. The MDIO over
> Ethernet then allow you to manage any external PHYs.

From the ASCII art in commit 526c8ee04bdbd4d8 you can clearly see that
I just assumed there is just one MDIO bus, because I saw activity on
an oscilloscope. I didn't test it the way you now suggested. I also
didn't think of the consequences for the driver design and
device-trees. If we prove that there is just one MDIO bus with two
masters instead of two separate buses that leak some noise in some
situations, the situation will become more complicated...

Marek

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] net: dsa: qca8k: forbid management frames access to internal PHYs if another device is on the MDIO bus
  2025-04-25 16:29   ` Marek Behún
@ 2025-04-25 16:53     ` Andrew Lunn
  2025-04-25 17:25       ` Marek Behún
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Lunn @ 2025-04-25 16:53 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Vladimir Oltean, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Russell King (Oracle),
	Rob Herring (Arm), Javier Carrasco, Wojciech Drewek,
	Matthias Schiffer, Christian Marangi, Marek Mojík

> >From the ASCII art in commit 526c8ee04bdbd4d8 you can clearly see that
> I just assumed there is just one MDIO bus, because I saw activity on
> an oscilloscope. I didn't test it the way you now suggested. I also
> didn't think of the consequences for the driver design and
> device-trees. If we prove that there is just one MDIO bus with two
> masters instead of two separate buses that leak some noise in some
> situations, the situation will become more complicated...

It is not really that more complex. Some of the mv88e6xxx devices have
a similar architecture.

You need to throw away MDIO over Ethernet and stop using the switches
master. Because it is async, it cannot be used. The switch MDIO bus
driver then just issues bus transactions on the host MDIO bus, using
mdiobus_read_nested().

	Andrew

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] net: dsa: qca8k: forbid management frames access to internal PHYs if another device is on the MDIO bus
  2025-04-25 16:53     ` Andrew Lunn
@ 2025-04-25 17:25       ` Marek Behún
  2025-04-25 19:11         ` Andrew Lunn
  0 siblings, 1 reply; 6+ messages in thread
From: Marek Behún @ 2025-04-25 17:25 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Marek Behún, netdev, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Russell King (Oracle),
	Rob Herring (Arm), Javier Carrasco, Wojciech Drewek,
	Matthias Schiffer, Christian Marangi, Marek Mojík

On Fri, Apr 25, 2025 at 06:53:21PM +0200, Andrew Lunn wrote:
> > >From the ASCII art in commit 526c8ee04bdbd4d8 you can clearly see that
> > I just assumed there is just one MDIO bus, because I saw activity on
> > an oscilloscope. I didn't test it the way you now suggested. I also
> > didn't think of the consequences for the driver design and
> > device-trees. If we prove that there is just one MDIO bus with two
> > masters instead of two separate buses that leak some noise in some
> > situations, the situation will become more complicated...
> 
> It is not really that more complex. Some of the mv88e6xxx devices have
> a similar architecture.
> 
> You need to throw away MDIO over Ethernet and stop using the switches
> master. Because it is async, it cannot be used. The switch MDIO bus
> driver then just issues bus transactions on the host MDIO bus, using
> mdiobus_read_nested().

But this is just in case when there is a separate device on the external
bus besides the switch, right?

The access over ethernet frames needs to stay because it speeds up
DSA operations. The problem is only on Turris 1.x because it breaks
WAN PHY.

Marek

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] net: dsa: qca8k: forbid management frames access to internal PHYs if another device is on the MDIO bus
  2025-04-25 17:25       ` Marek Behún
@ 2025-04-25 19:11         ` Andrew Lunn
  0 siblings, 0 replies; 6+ messages in thread
From: Andrew Lunn @ 2025-04-25 19:11 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Vladimir Oltean, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Russell King (Oracle),
	Rob Herring (Arm), Javier Carrasco, Wojciech Drewek,
	Matthias Schiffer, Christian Marangi, Marek Mojík

On Fri, Apr 25, 2025 at 07:25:04PM +0200, Marek Behún wrote:
> On Fri, Apr 25, 2025 at 06:53:21PM +0200, Andrew Lunn wrote:
> > > >From the ASCII art in commit 526c8ee04bdbd4d8 you can clearly see that
> > > I just assumed there is just one MDIO bus, because I saw activity on
> > > an oscilloscope. I didn't test it the way you now suggested. I also
> > > didn't think of the consequences for the driver design and
> > > device-trees. If we prove that there is just one MDIO bus with two
> > > masters instead of two separate buses that leak some noise in some
> > > situations, the situation will become more complicated...
> > 
> > It is not really that more complex. Some of the mv88e6xxx devices have
> > a similar architecture.
> > 
> > You need to throw away MDIO over Ethernet and stop using the switches
> > master. Because it is async, it cannot be used. The switch MDIO bus
> > driver then just issues bus transactions on the host MDIO bus, using
> > mdiobus_read_nested().
> 
> But this is just in case when there is a separate device on the external
> bus besides the switch, right?
> 
> The access over ethernet frames needs to stay because it speeds up
> DSA operations. The problem is only on Turris 1.x because it breaks
> WAN PHY.

You can keep access to other registers, and statistics etc via
Ethernet frames, but MDIO other Ethernet is going to be an issue.

I suppose you could take the Linux MDIO bus lock, transmit the MDIO
over Ethernet command to the switch, wait for the reply, and then
unlock the MDIO bus. That will prevent Linux accessing the PHY or
switch while there is an outstanding MDIO over Ethernet requests. But
it seems simpler to just do all MDIO over the host MDIO bus. So long
as you are not bit banging, it won't be much slower.

	Andrew

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-04-25 19:11 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-25 15:13 [PATCH net] net: dsa: qca8k: forbid management frames access to internal PHYs if another device is on the MDIO bus Marek Behún
2025-04-25 16:04 ` Andrew Lunn
2025-04-25 16:29   ` Marek Behún
2025-04-25 16:53     ` Andrew Lunn
2025-04-25 17:25       ` Marek Behún
2025-04-25 19:11         ` Andrew Lunn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).