netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net 0/2] net: dsa: qca8k: fix qca8k driver for Turris 1.x
@ 2023-10-02 10:46 Marek Behún
  2023-10-02 10:46 ` [PATCH net 1/2] net: dsa: qca8k: fix regmap bulk read/write methods on big endian systems Marek Behún
  2023-10-02 10:46 ` [PATCH net 2/2] net: dsa: qca8k: fix potential MDIO bus conflict when accessing internal PHYs via management frames Marek Behún
  0 siblings, 2 replies; 8+ messages in thread
From: Marek Behún @ 2023-10-02 10:46 UTC (permalink / raw)
  To: Christian Marangi, David S. Miller, Paolo Abeni, netdev; +Cc: Marek Behún

Hi,

this patch series contains two fixes to commits that broke qca8k driver
on the Turris 1.x router.

The first patch is another fix of qca8k's regmap implementation on
big-endian systems.

The second patch locks the MDIO bus even when accessing switch internal
PHYs via ethernet management frames, since it seems that internally the
switch still uses MDIO transfers and these can leak outside of the
switch back to the MDIO bus, potentially conflicting with the WAN PHY
register accesses.

Marek

Marek Behún (2):
  net: dsa: qca8k: fix regmap bulk read/write methods on big endian
    systems
  net: dsa: qca8k: fix potential MDIO bus conflict when accessing
    internal PHYs via management frames

 drivers/net/dsa/qca/qca8k-8xxx.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

-- 
2.41.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH net 1/2] net: dsa: qca8k: fix regmap bulk read/write methods on big endian systems
  2023-10-02 10:46 [PATCH net 0/2] net: dsa: qca8k: fix qca8k driver for Turris 1.x Marek Behún
@ 2023-10-02 10:46 ` Marek Behún
  2023-10-02 12:03   ` Christian Marangi
  2023-10-03 15:22   ` Simon Horman
  2023-10-02 10:46 ` [PATCH net 2/2] net: dsa: qca8k: fix potential MDIO bus conflict when accessing internal PHYs via management frames Marek Behún
  1 sibling, 2 replies; 8+ messages in thread
From: Marek Behún @ 2023-10-02 10:46 UTC (permalink / raw)
  To: Christian Marangi, David S. Miller, Paolo Abeni, netdev; +Cc: Marek Behún

Commit c766e077d927 ("net: dsa: qca8k: convert to regmap read/write
API") introduced bulk read/write methods to qca8k's regmap.

The regmap bulk read/write methods get the register address in a buffer
passed as a void pointer parameter (the same buffer contains also the
read/written values). The register address occupies only as many bytes
as it requires at the beginning of this buffer. For example if the
.reg_bits member in regmap_config is 16 (as is the case for this
driver), the register address occupies only the first 2 bytes in this
buffer, so it can be cast to u16.

But the original commit implementing these bulk read/write methods cast
the buffer to u32:
  u32 reg = *(u32 *)reg_buf & U16_MAX;
taking the first 4 bytes. This works on little endian systems where the
first 2 bytes of the buffer correnspond to the low 16-bits, but it
obviously cannot work on big endian systems.

Fix this by casting the beginning of the buffer to u16 as
   u32 reg = *(u16 *)reg_buf;

Fixes: c766e077d927 ("net: dsa: qca8k: convert to regmap read/write API")
Signed-off-by: Marek Behún <kabel@kernel.org>
---
 drivers/net/dsa/qca/qca8k-8xxx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/qca/qca8k-8xxx.c b/drivers/net/dsa/qca/qca8k-8xxx.c
index de1dc22cf683..d2df30640269 100644
--- a/drivers/net/dsa/qca/qca8k-8xxx.c
+++ b/drivers/net/dsa/qca/qca8k-8xxx.c
@@ -505,8 +505,8 @@ qca8k_bulk_read(void *ctx, const void *reg_buf, size_t reg_len,
 		void *val_buf, size_t val_len)
 {
 	int i, count = val_len / sizeof(u32), ret;
-	u32 reg = *(u32 *)reg_buf & U16_MAX;
 	struct qca8k_priv *priv = ctx;
+	u32 reg = *(u16 *)reg_buf;
 
 	if (priv->mgmt_master &&
 	    !qca8k_read_eth(priv, reg, val_buf, val_len))
@@ -527,8 +527,8 @@ qca8k_bulk_gather_write(void *ctx, const void *reg_buf, size_t reg_len,
 			const void *val_buf, size_t val_len)
 {
 	int i, count = val_len / sizeof(u32), ret;
-	u32 reg = *(u32 *)reg_buf & U16_MAX;
 	struct qca8k_priv *priv = ctx;
+	u32 reg = *(u16 *)reg_buf;
 	u32 *val = (u32 *)val_buf;
 
 	if (priv->mgmt_master &&
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net 2/2] net: dsa: qca8k: fix potential MDIO bus conflict when accessing internal PHYs via management frames
  2023-10-02 10:46 [PATCH net 0/2] net: dsa: qca8k: fix qca8k driver for Turris 1.x Marek Behún
  2023-10-02 10:46 ` [PATCH net 1/2] net: dsa: qca8k: fix regmap bulk read/write methods on big endian systems Marek Behún
@ 2023-10-02 10:46 ` Marek Behún
  2023-10-02 12:11   ` Christian Marangi
  1 sibling, 1 reply; 8+ messages in thread
From: Marek Behún @ 2023-10-02 10:46 UTC (permalink / raw)
  To: Christian Marangi, David S. Miller, Paolo Abeni, netdev; +Cc: Marek Behún

Besides the QCA8337 switch the Turris 1.x device has on it's MDIO bus
also Micron ethernet PHY (dedicated to the WAN port).

We've been experiencing a strange behavior of the WAN ethernet
interface, wherein the WAN PHY started timing out the MDIO accesses, for
example when the interface was brought down and then back up.

Bisecting led to commit 2cd548566384 ("net: dsa: qca8k: add support for
phy read/write with mgmt Ethernet"), which added support to access the
QCA8337 switch's internal PHYs via management ethernet frames.

Connecting the MDIO bus pins onto an oscilloscope, I was able to see
that the MDIO bus was active whenever a request to read/write an
internal PHY register was done via an management ethernet frame.

My theory is that when the switch core always communicates with the
internal PHYs via the MDIO bus, even when externally we request the
access via ethernet. This MDIO bus is the same one via which the switch
and internal PHYs are accessible to the board, and the board may have
other devices connected on this bus. An ASCII illustration may give more
insight:

           +---------+
      +----|         |
      |    | WAN PHY |
      | +--|         |
      | |  +---------+
      | |
      | |  +----------------------------------+
      | |  | QCA8337                          |
MDC   | |  |                        +-------+ |
------o-+--|--------o------------o--|       | |
MDIO    |  |        |            |  | PHY 1 |-|--to RJ45
--------o--|---o----+---------o--+--|       | |
           |   |    |         |  |  +-------+ |
	   | +-------------+  |  o--|       | |
	   | | MDIO MDC    |  |  |  | PHY 2 |-|--to RJ45
eth1	   | |             |  o--+--|       | |
-----------|-|port0        |  |  |  +-------+ |
           | |             |  |  o--|       | |
	   | | switch core |  |  |  | PHY 3 |-|--to RJ45
           | +-------------+  o--+--|       | |
	   |                  |  |  +-------+ |
	   |                  |  o--|  ...  | |
	   +----------------------------------+

When we send a request to read an internal PHY register via an ethernet
management frame via eth1, the switch core receives the ethernet frame
on port 0 and then communicates with the internal PHY via MDIO. At this
time, other potential devices, such as the WAN PHY on Turris 1.x, cannot
use the MDIO bus, since it may cause a bus conflict.

Fix this issue by locking the MDIO bus even when we are accessing the
PHY registers via ethernet management frames.

Fixes: 2cd548566384 ("net: dsa: qca8k: add support for phy read/write with mgmt Ethernet")
Signed-off-by: Marek Behún <kabel@kernel.org>
---
 drivers/net/dsa/qca/qca8k-8xxx.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/dsa/qca/qca8k-8xxx.c b/drivers/net/dsa/qca/qca8k-8xxx.c
index d2df30640269..4ce68e655a63 100644
--- a/drivers/net/dsa/qca/qca8k-8xxx.c
+++ b/drivers/net/dsa/qca/qca8k-8xxx.c
@@ -666,6 +666,15 @@ qca8k_phy_eth_command(struct qca8k_priv *priv, bool read, int phy,
 		goto err_read_skb;
 	}
 
+	/* It seems that accessing the switch's internal PHYs via management
+	 * packets still uses the MDIO bus within the switch internally, and
+	 * these accesses can conflict with external MDIO accesses to other
+	 * devices on the MDIO bus.
+	 * We therefore need to lock the MDIO bus onto which the switch is
+	 * connected.
+	 */
+	mutex_lock(&priv->bus->mdio_lock);
+
 	/* Actually start the request:
 	 * 1. Send mdio master packet
 	 * 2. Busy Wait for mdio master command
@@ -678,6 +687,7 @@ qca8k_phy_eth_command(struct qca8k_priv *priv, bool read, int phy,
 	mgmt_master = priv->mgmt_master;
 	if (!mgmt_master) {
 		mutex_unlock(&mgmt_eth_data->mutex);
+		mutex_unlock(&priv->bus->mdio_lock);
 		ret = -EINVAL;
 		goto err_mgmt_master;
 	}
@@ -765,6 +775,7 @@ qca8k_phy_eth_command(struct qca8k_priv *priv, bool read, int phy,
 				    QCA8K_ETHERNET_TIMEOUT);
 
 	mutex_unlock(&mgmt_eth_data->mutex);
+	mutex_unlock(&priv->bus->mdio_lock);
 
 	return ret;
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net 1/2] net: dsa: qca8k: fix regmap bulk read/write methods on big endian systems
  2023-10-02 10:46 ` [PATCH net 1/2] net: dsa: qca8k: fix regmap bulk read/write methods on big endian systems Marek Behún
@ 2023-10-02 12:03   ` Christian Marangi
  2023-10-03 15:22   ` Simon Horman
  1 sibling, 0 replies; 8+ messages in thread
From: Christian Marangi @ 2023-10-02 12:03 UTC (permalink / raw)
  To: Marek Behún; +Cc: David S. Miller, Paolo Abeni, netdev

On Mon, Oct 02, 2023 at 12:46:11PM +0200, Marek Behún wrote:
> Commit c766e077d927 ("net: dsa: qca8k: convert to regmap read/write
> API") introduced bulk read/write methods to qca8k's regmap.
> 
> The regmap bulk read/write methods get the register address in a buffer
> passed as a void pointer parameter (the same buffer contains also the
> read/written values). The register address occupies only as many bytes
> as it requires at the beginning of this buffer. For example if the
> .reg_bits member in regmap_config is 16 (as is the case for this
> driver), the register address occupies only the first 2 bytes in this
> buffer, so it can be cast to u16.
> 
> But the original commit implementing these bulk read/write methods cast
> the buffer to u32:
>   u32 reg = *(u32 *)reg_buf & U16_MAX;
> taking the first 4 bytes. This works on little endian systems where the
> first 2 bytes of the buffer correnspond to the low 16-bits, but it
> obviously cannot work on big endian systems.
> 
> Fix this by casting the beginning of the buffer to u16 as
>    u32 reg = *(u16 *)reg_buf;
> 
> Fixes: c766e077d927 ("net: dsa: qca8k: convert to regmap read/write API")
> Signed-off-by: Marek Behún <kabel@kernel.org>

Reviewed-by: Christian Marangi <ansuelsmth@gmail.com>

-- 
	Ansuel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net 2/2] net: dsa: qca8k: fix potential MDIO bus conflict when accessing internal PHYs via management frames
  2023-10-02 10:46 ` [PATCH net 2/2] net: dsa: qca8k: fix potential MDIO bus conflict when accessing internal PHYs via management frames Marek Behún
@ 2023-10-02 12:11   ` Christian Marangi
  2023-10-03 10:05     ` Marek Behún
  0 siblings, 1 reply; 8+ messages in thread
From: Christian Marangi @ 2023-10-02 12:11 UTC (permalink / raw)
  To: Marek Behún; +Cc: David S. Miller, Paolo Abeni, netdev

On Mon, Oct 02, 2023 at 12:46:12PM +0200, Marek Behún wrote:
> Besides the QCA8337 switch the Turris 1.x device has on it's MDIO bus
> also Micron ethernet PHY (dedicated to the WAN port).
> 
> We've been experiencing a strange behavior of the WAN ethernet
> interface, wherein the WAN PHY started timing out the MDIO accesses, for
> example when the interface was brought down and then back up.
> 
> Bisecting led to commit 2cd548566384 ("net: dsa: qca8k: add support for
> phy read/write with mgmt Ethernet"), which added support to access the
> QCA8337 switch's internal PHYs via management ethernet frames.
> 
> Connecting the MDIO bus pins onto an oscilloscope, I was able to see
> that the MDIO bus was active whenever a request to read/write an
> internal PHY register was done via an management ethernet frame.
> 
> My theory is that when the switch core always communicates with the
> internal PHYs via the MDIO bus, even when externally we request the
> access via ethernet. This MDIO bus is the same one via which the switch
> and internal PHYs are accessible to the board, and the board may have
> other devices connected on this bus. An ASCII illustration may give more
> insight:
> 
>            +---------+
>       +----|         |
>       |    | WAN PHY |
>       | +--|         |
>       | |  +---------+
>       | |
>       | |  +----------------------------------+
>       | |  | QCA8337                          |
> MDC   | |  |                        +-------+ |
> ------o-+--|--------o------------o--|       | |
> MDIO    |  |        |            |  | PHY 1 |-|--to RJ45
> --------o--|---o----+---------o--+--|       | |
>            |   |    |         |  |  +-------+ |
> 	   | +-------------+  |  o--|       | |
> 	   | | MDIO MDC    |  |  |  | PHY 2 |-|--to RJ45
> eth1	   | |             |  o--+--|       | |
> -----------|-|port0        |  |  |  +-------+ |
>            | |             |  |  o--|       | |
> 	   | | switch core |  |  |  | PHY 3 |-|--to RJ45
>            | +-------------+  o--+--|       | |
> 	   |                  |  |  +-------+ |
> 	   |                  |  o--|  ...  | |
> 	   +----------------------------------+
> 
> When we send a request to read an internal PHY register via an ethernet
> management frame via eth1, the switch core receives the ethernet frame
> on port 0 and then communicates with the internal PHY via MDIO. At this
> time, other potential devices, such as the WAN PHY on Turris 1.x, cannot
> use the MDIO bus, since it may cause a bus conflict.
> 
> Fix this issue by locking the MDIO bus even when we are accessing the
> PHY registers via ethernet management frames.
> 
> Fixes: 2cd548566384 ("net: dsa: qca8k: add support for phy read/write with mgmt Ethernet")
> Signed-off-by: Marek Behún <kabel@kernel.org>

Just some comments (micro-optimization) and one question.

Wonder if the extra lock would result in a bit of overhead for simple
implementation where the switch is the only thing connected to the MDIO.

It's just an idea and probably not even something to consider (since
probably the overhead is so little that it's not worth it)

But we might consider to add some logic in the MDIO setup function to
check if the MDIO have other PHY connected and enable this lock (and
make this optional with an if and a bool like require_mdio_locking)

If we don't account for this, yes the lock should have been there from
the start and this is correct. (we can make it optional only in the case
where only the switch is connected as it would be the only user and
everything is already locked by the eth_mgmt lock)

> ---
>  drivers/net/dsa/qca/qca8k-8xxx.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/net/dsa/qca/qca8k-8xxx.c b/drivers/net/dsa/qca/qca8k-8xxx.c
> index d2df30640269..4ce68e655a63 100644
> --- a/drivers/net/dsa/qca/qca8k-8xxx.c
> +++ b/drivers/net/dsa/qca/qca8k-8xxx.c
> @@ -666,6 +666,15 @@ qca8k_phy_eth_command(struct qca8k_priv *priv, bool read, int phy,
>  		goto err_read_skb;
>  	}
>  
> +	/* It seems that accessing the switch's internal PHYs via management
> +	 * packets still uses the MDIO bus within the switch internally, and
> +	 * these accesses can conflict with external MDIO accesses to other
> +	 * devices on the MDIO bus.
> +	 * We therefore need to lock the MDIO bus onto which the switch is
> +	 * connected.
> +	 */
> +	mutex_lock(&priv->bus->mdio_lock);
> +

Please move this down before the first dev_queue_xmit. (we can save a
few cycle where locking is not needed)

Also should we use mutex_lock_nested?

>  	/* Actually start the request:
>  	 * 1. Send mdio master packet
>  	 * 2. Busy Wait for mdio master command
> @@ -678,6 +687,7 @@ qca8k_phy_eth_command(struct qca8k_priv *priv, bool read, int phy,
>  	mgmt_master = priv->mgmt_master;
>  	if (!mgmt_master) {
>  		mutex_unlock(&mgmt_eth_data->mutex);
> +		mutex_unlock(&priv->bus->mdio_lock);
>  		ret = -EINVAL;
>  		goto err_mgmt_master;
>  	}
> @@ -765,6 +775,7 @@ qca8k_phy_eth_command(struct qca8k_priv *priv, bool read, int phy,
>  				    QCA8K_ETHERNET_TIMEOUT);
>  
>  	mutex_unlock(&mgmt_eth_data->mutex);
> +	mutex_unlock(&priv->bus->mdio_lock);
>  
>  	return ret;
>  
> -- 
> 2.41.0
> 

-- 
	Ansuel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net 2/2] net: dsa: qca8k: fix potential MDIO bus conflict when accessing internal PHYs via management frames
  2023-10-02 12:11   ` Christian Marangi
@ 2023-10-03 10:05     ` Marek Behún
  2023-10-03 10:45       ` Christian Marangi
  0 siblings, 1 reply; 8+ messages in thread
From: Marek Behún @ 2023-10-03 10:05 UTC (permalink / raw)
  To: Christian Marangi; +Cc: David S. Miller, Paolo Abeni, netdev

On Mon, 2 Oct 2023 14:11:43 +0200
Christian Marangi <ansuelsmth@gmail.com> wrote:

> On Mon, Oct 02, 2023 at 12:46:12PM +0200, Marek Behún wrote:
> > Besides the QCA8337 switch the Turris 1.x device has on it's MDIO bus
> > also Micron ethernet PHY (dedicated to the WAN port).
> > 
> > We've been experiencing a strange behavior of the WAN ethernet
> > interface, wherein the WAN PHY started timing out the MDIO accesses, for
> > example when the interface was brought down and then back up.
> > 
> > Bisecting led to commit 2cd548566384 ("net: dsa: qca8k: add support for
> > phy read/write with mgmt Ethernet"), which added support to access the
> > QCA8337 switch's internal PHYs via management ethernet frames.
> > 
> > Connecting the MDIO bus pins onto an oscilloscope, I was able to see
> > that the MDIO bus was active whenever a request to read/write an
> > internal PHY register was done via an management ethernet frame.
> > 
> > My theory is that when the switch core always communicates with the
> > internal PHYs via the MDIO bus, even when externally we request the
> > access via ethernet. This MDIO bus is the same one via which the switch
> > and internal PHYs are accessible to the board, and the board may have
> > other devices connected on this bus. An ASCII illustration may give more
> > insight:
> > 
> >            +---------+
> >       +----|         |
> >       |    | WAN PHY |
> >       | +--|         |
> >       | |  +---------+
> >       | |
> >       | |  +----------------------------------+
> >       | |  | QCA8337                          |
> > MDC   | |  |                        +-------+ |
> > ------o-+--|--------o------------o--|       | |
> > MDIO    |  |        |            |  | PHY 1 |-|--to RJ45
> > --------o--|---o----+---------o--+--|       | |
> >            |   |    |         |  |  +-------+ |
> > 	   | +-------------+  |  o--|       | |
> > 	   | | MDIO MDC    |  |  |  | PHY 2 |-|--to RJ45
> > eth1	   | |             |  o--+--|       | |
> > -----------|-|port0        |  |  |  +-------+ |
> >            | |             |  |  o--|       | |
> > 	   | | switch core |  |  |  | PHY 3 |-|--to RJ45
> >            | +-------------+  o--+--|       | |
> > 	   |                  |  |  +-------+ |
> > 	   |                  |  o--|  ...  | |
> > 	   +----------------------------------+
> > 
> > When we send a request to read an internal PHY register via an ethernet
> > management frame via eth1, the switch core receives the ethernet frame
> > on port 0 and then communicates with the internal PHY via MDIO. At this
> > time, other potential devices, such as the WAN PHY on Turris 1.x, cannot
> > use the MDIO bus, since it may cause a bus conflict.
> > 
> > Fix this issue by locking the MDIO bus even when we are accessing the
> > PHY registers via ethernet management frames.
> > 
> > Fixes: 2cd548566384 ("net: dsa: qca8k: add support for phy read/write with mgmt Ethernet")
> > Signed-off-by: Marek Behún <kabel@kernel.org>  
> 
> Just some comments (micro-optimization) and one question.
> 
> Wonder if the extra lock would result in a bit of overhead for simple
> implementation where the switch is the only thing connected to the MDIO.
> 
> It's just an idea and probably not even something to consider (since
> probably the overhead is so little that it's not worth it)
> 
> But we might consider to add some logic in the MDIO setup function to
> check if the MDIO have other PHY connected and enable this lock (and
> make this optional with an if and a bool like require_mdio_locking)
> 
> If we don't account for this, yes the lock should have been there from
> the start and this is correct. (we can make it optional only in the case
> where only the switch is connected as it would be the only user and
> everything is already locked by the eth_mgmt lock)

I don't think we should do that. It is possible that a PHY may be
registered during the time that the mutex is locked, even if the PHY is
not defined in device-tree. A driver may be probed that calls
mdiobus_scan, which will cause transactions on the MDIO bus. Currently
there are no such drivers in kernel, but they may be in the future.

Anyway, this is a regression fix, it should be merged. If you want to
optimize it, I think it should be done afterwards in net-next.

> > ---
> >  drivers/net/dsa/qca/qca8k-8xxx.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/drivers/net/dsa/qca/qca8k-8xxx.c b/drivers/net/dsa/qca/qca8k-8xxx.c
> > index d2df30640269..4ce68e655a63 100644
> > --- a/drivers/net/dsa/qca/qca8k-8xxx.c
> > +++ b/drivers/net/dsa/qca/qca8k-8xxx.c
> > @@ -666,6 +666,15 @@ qca8k_phy_eth_command(struct qca8k_priv *priv, bool read, int phy,
> >  		goto err_read_skb;
> >  	}
> >  
> > +	/* It seems that accessing the switch's internal PHYs via management
> > +	 * packets still uses the MDIO bus within the switch internally, and
> > +	 * these accesses can conflict with external MDIO accesses to other
> > +	 * devices on the MDIO bus.
> > +	 * We therefore need to lock the MDIO bus onto which the switch is
> > +	 * connected.
> > +	 */
> > +	mutex_lock(&priv->bus->mdio_lock);
> > +  
> 
> Please move this down before the first dev_queue_xmit. (we can save a
> few cycle where locking is not needed)

I put it before the mgmt lock for the following reason: if I first lock
the mgmt_eth_data and only then the MDIO bus mutex, and a MDIO
transaction is being done on another device, the mgmt_eth_data mutex is
unnecessarily locked for a longer time (since MDIO is slow). I thought
that the whole point of register writes via ethernet frames was to make
it faster. If another part of the driver wants to read/write a
switch register, it should not be unnecessarily slowed down because a
MDIO transaction to a unrelated device.

Illustration when MDIO mutex is locked before first skb queue, as you
suggested:

  WAN PHY driver	qca8k PHY read		qca8k reg read

  mdio mutex locked
  reading		eth mutex locked
  reading		mdio mutex lock
  reading		waiting			eth mutex lock
  reading		waiting			waiting
  reading		waiting			waiting
  mdio mutex unlocked	waiting			waiting
			mdio mutex locked	waiting
			reading			waiting
			mdio mutex unlocked	waiting
			eth mutex unlocked	waiting
						eth mutex locked
						reading
						eth mutex unlocked

Illustration when MDIO mutex is locked before eth mutex:

  WAN PHY driver	qca8k PHY read		qca8k reg read

  mdio mutex locked
  reading		mdio mutex lock
  reading		waiting			eth mutex locked
  reading		waiting			reading
  reading		waiting			eth mutex unlocked
  reading		waiting
  mdio mutex unlocked   waiting
			mdio mutex locked
			eth mutex locked
			reading
			eth mutex unlocked
			mdio mutex unlocked

Notice how in the second illustration the qca8k register read is not
slowed by the mdio mutex.

> Also should we use mutex_lock_nested?

That would allow some MDIO bus reads, for example if someone called
mdiobus_read() on the bus. We specifically want to completely avoid 
this. We are not doing any nested reads on the MDIO bus here, so no,
we should not be using mutex_lock_nested().

Marek

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net 2/2] net: dsa: qca8k: fix potential MDIO bus conflict when accessing internal PHYs via management frames
  2023-10-03 10:05     ` Marek Behún
@ 2023-10-03 10:45       ` Christian Marangi
  0 siblings, 0 replies; 8+ messages in thread
From: Christian Marangi @ 2023-10-03 10:45 UTC (permalink / raw)
  To: Marek Behún; +Cc: David S. Miller, Paolo Abeni, netdev

On Tue, Oct 03, 2023 at 12:05:10PM +0200, Marek Behún wrote:
> On Mon, 2 Oct 2023 14:11:43 +0200
> Christian Marangi <ansuelsmth@gmail.com> wrote:
> 
> > On Mon, Oct 02, 2023 at 12:46:12PM +0200, Marek Behún wrote:
> > > Besides the QCA8337 switch the Turris 1.x device has on it's MDIO bus
> > > also Micron ethernet PHY (dedicated to the WAN port).
> > > 
> > > We've been experiencing a strange behavior of the WAN ethernet
> > > interface, wherein the WAN PHY started timing out the MDIO accesses, for
> > > example when the interface was brought down and then back up.
> > > 
> > > Bisecting led to commit 2cd548566384 ("net: dsa: qca8k: add support for
> > > phy read/write with mgmt Ethernet"), which added support to access the
> > > QCA8337 switch's internal PHYs via management ethernet frames.
> > > 
> > > Connecting the MDIO bus pins onto an oscilloscope, I was able to see
> > > that the MDIO bus was active whenever a request to read/write an
> > > internal PHY register was done via an management ethernet frame.
> > > 
> > > My theory is that when the switch core always communicates with the
> > > internal PHYs via the MDIO bus, even when externally we request the
> > > access via ethernet. This MDIO bus is the same one via which the switch
> > > and internal PHYs are accessible to the board, and the board may have
> > > other devices connected on this bus. An ASCII illustration may give more
> > > insight:
> > > 
> > >            +---------+
> > >       +----|         |
> > >       |    | WAN PHY |
> > >       | +--|         |
> > >       | |  +---------+
> > >       | |
> > >       | |  +----------------------------------+
> > >       | |  | QCA8337                          |
> > > MDC   | |  |                        +-------+ |
> > > ------o-+--|--------o------------o--|       | |
> > > MDIO    |  |        |            |  | PHY 1 |-|--to RJ45
> > > --------o--|---o----+---------o--+--|       | |
> > >            |   |    |         |  |  +-------+ |
> > > 	   | +-------------+  |  o--|       | |
> > > 	   | | MDIO MDC    |  |  |  | PHY 2 |-|--to RJ45
> > > eth1	   | |             |  o--+--|       | |
> > > -----------|-|port0        |  |  |  +-------+ |
> > >            | |             |  |  o--|       | |
> > > 	   | | switch core |  |  |  | PHY 3 |-|--to RJ45
> > >            | +-------------+  o--+--|       | |
> > > 	   |                  |  |  +-------+ |
> > > 	   |                  |  o--|  ...  | |
> > > 	   +----------------------------------+
> > > 
> > > When we send a request to read an internal PHY register via an ethernet
> > > management frame via eth1, the switch core receives the ethernet frame
> > > on port 0 and then communicates with the internal PHY via MDIO. At this
> > > time, other potential devices, such as the WAN PHY on Turris 1.x, cannot
> > > use the MDIO bus, since it may cause a bus conflict.
> > > 
> > > Fix this issue by locking the MDIO bus even when we are accessing the
> > > PHY registers via ethernet management frames.
> > > 
> > > Fixes: 2cd548566384 ("net: dsa: qca8k: add support for phy read/write with mgmt Ethernet")
> > > Signed-off-by: Marek Behún <kabel@kernel.org>  
> > 
> > Just some comments (micro-optimization) and one question.
> > 
> > Wonder if the extra lock would result in a bit of overhead for simple
> > implementation where the switch is the only thing connected to the MDIO.
> > 
> > It's just an idea and probably not even something to consider (since
> > probably the overhead is so little that it's not worth it)
> > 
> > But we might consider to add some logic in the MDIO setup function to
> > check if the MDIO have other PHY connected and enable this lock (and
> > make this optional with an if and a bool like require_mdio_locking)
> > 
> > If we don't account for this, yes the lock should have been there from
> > the start and this is correct. (we can make it optional only in the case
> > where only the switch is connected as it would be the only user and
> > everything is already locked by the eth_mgmt lock)
> 
> I don't think we should do that. It is possible that a PHY may be
> registered during the time that the mutex is locked, even if the PHY is
> not defined in device-tree. A driver may be probed that calls
> mdiobus_scan, which will cause transactions on the MDIO bus. Currently
> there are no such drivers in kernel, but they may be in the future.
>

Yep was just an idea, happy it was trashed with correct explaination. It
would have added extra logic and more bloat to the code, totally not
worth for lots of reason. Also yep not doable with the problem of PHY
not declared in DT.

> Anyway, this is a regression fix, it should be merged. If you want to
> optimize it, I think it should be done afterwards in net-next.
> 

Nha, was just to discuss chance to improve this patch directly without
adding additional commit later.

> > > ---
> > >  drivers/net/dsa/qca/qca8k-8xxx.c | 11 +++++++++++
> > >  1 file changed, 11 insertions(+)
> > > 
> > > diff --git a/drivers/net/dsa/qca/qca8k-8xxx.c b/drivers/net/dsa/qca/qca8k-8xxx.c
> > > index d2df30640269..4ce68e655a63 100644
> > > --- a/drivers/net/dsa/qca/qca8k-8xxx.c
> > > +++ b/drivers/net/dsa/qca/qca8k-8xxx.c
> > > @@ -666,6 +666,15 @@ qca8k_phy_eth_command(struct qca8k_priv *priv, bool read, int phy,
> > >  		goto err_read_skb;
> > >  	}
> > >  
> > > +	/* It seems that accessing the switch's internal PHYs via management
> > > +	 * packets still uses the MDIO bus within the switch internally, and
> > > +	 * these accesses can conflict with external MDIO accesses to other
> > > +	 * devices on the MDIO bus.
> > > +	 * We therefore need to lock the MDIO bus onto which the switch is
> > > +	 * connected.
> > > +	 */
> > > +	mutex_lock(&priv->bus->mdio_lock);
> > > +  
> > 
> > Please move this down before the first dev_queue_xmit. (we can save a
> > few cycle where locking is not needed)
> 
> I put it before the mgmt lock for the following reason: if I first lock
> the mgmt_eth_data and only then the MDIO bus mutex, and a MDIO
> transaction is being done on another device, the mgmt_eth_data mutex is
> unnecessarily locked for a longer time (since MDIO is slow). I thought
> that the whole point of register writes via ethernet frames was to make
> it faster. If another part of the driver wants to read/write a
> switch register, it should not be unnecessarily slowed down because a
> MDIO transaction to a unrelated device.
> 
> Illustration when MDIO mutex is locked before first skb queue, as you
> suggested:
> 
>   WAN PHY driver	qca8k PHY read		qca8k reg read
> 
>   mdio mutex locked
>   reading		eth mutex locked
>   reading		mdio mutex lock
>   reading		waiting			eth mutex lock
>   reading		waiting			waiting
>   reading		waiting			waiting
>   mdio mutex unlocked	waiting			waiting
> 			mdio mutex locked	waiting
> 			reading			waiting
> 			mdio mutex unlocked	waiting
> 			eth mutex unlocked	waiting
> 						eth mutex locked
> 						reading
> 						eth mutex unlocked
> 
> Illustration when MDIO mutex is locked before eth mutex:
> 
>   WAN PHY driver	qca8k PHY read		qca8k reg read
> 
>   mdio mutex locked
>   reading		mdio mutex lock
>   reading		waiting			eth mutex locked
>   reading		waiting			reading
>   reading		waiting			eth mutex unlocked
>   reading		waiting
>   mdio mutex unlocked   waiting
> 			mdio mutex locked
> 			eth mutex locked
> 			reading
> 			eth mutex unlocked
> 			mdio mutex unlocked
> 
> Notice how in the second illustration the qca8k register read is not
> slowed by the mdio mutex.
> 

Thanks for the nice table. I didn't think that mgmt eth is much faster
and moving the lock down would result is worse perf.

> > Also should we use mutex_lock_nested?
> 
> That would allow some MDIO bus reads, for example if someone called
> mdiobus_read() on the bus. We specifically want to completely avoid 
> this. We are not doing any nested reads on the MDIO bus here, so no,
> we should not be using mutex_lock_nested().
> 

Reviewed-by: Christian Marangi <ansuelsmth@gmail.com>

-- 
	Ansuel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net 1/2] net: dsa: qca8k: fix regmap bulk read/write methods on big endian systems
  2023-10-02 10:46 ` [PATCH net 1/2] net: dsa: qca8k: fix regmap bulk read/write methods on big endian systems Marek Behún
  2023-10-02 12:03   ` Christian Marangi
@ 2023-10-03 15:22   ` Simon Horman
  1 sibling, 0 replies; 8+ messages in thread
From: Simon Horman @ 2023-10-03 15:22 UTC (permalink / raw)
  To: Marek Behún; +Cc: Christian Marangi, David S. Miller, Paolo Abeni, netdev

On Mon, Oct 02, 2023 at 12:46:11PM +0200, Marek Behún wrote:
> Commit c766e077d927 ("net: dsa: qca8k: convert to regmap read/write
> API") introduced bulk read/write methods to qca8k's regmap.
> 
> The regmap bulk read/write methods get the register address in a buffer
> passed as a void pointer parameter (the same buffer contains also the
> read/written values). The register address occupies only as many bytes
> as it requires at the beginning of this buffer. For example if the
> .reg_bits member in regmap_config is 16 (as is the case for this
> driver), the register address occupies only the first 2 bytes in this
> buffer, so it can be cast to u16.
> 
> But the original commit implementing these bulk read/write methods cast
> the buffer to u32:
>   u32 reg = *(u32 *)reg_buf & U16_MAX;
> taking the first 4 bytes. This works on little endian systems where the
> first 2 bytes of the buffer correnspond to the low 16-bits, but it

nit: correspond

> obviously cannot work on big endian systems.
> 
> Fix this by casting the beginning of the buffer to u16 as
>    u32 reg = *(u16 *)reg_buf;
> 
> Fixes: c766e077d927 ("net: dsa: qca8k: convert to regmap read/write API")
> Signed-off-by: Marek Behún <kabel@kernel.org>
> ---
>  drivers/net/dsa/qca/qca8k-8xxx.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/dsa/qca/qca8k-8xxx.c b/drivers/net/dsa/qca/qca8k-8xxx.c
> index de1dc22cf683..d2df30640269 100644
> --- a/drivers/net/dsa/qca/qca8k-8xxx.c
> +++ b/drivers/net/dsa/qca/qca8k-8xxx.c
> @@ -505,8 +505,8 @@ qca8k_bulk_read(void *ctx, const void *reg_buf, size_t reg_len,
>  		void *val_buf, size_t val_len)
>  {
>  	int i, count = val_len / sizeof(u32), ret;
> -	u32 reg = *(u32 *)reg_buf & U16_MAX;
>  	struct qca8k_priv *priv = ctx;
> +	u32 reg = *(u16 *)reg_buf;
>  
>  	if (priv->mgmt_master &&
>  	    !qca8k_read_eth(priv, reg, val_buf, val_len))
> @@ -527,8 +527,8 @@ qca8k_bulk_gather_write(void *ctx, const void *reg_buf, size_t reg_len,
>  			const void *val_buf, size_t val_len)
>  {
>  	int i, count = val_len / sizeof(u32), ret;
> -	u32 reg = *(u32 *)reg_buf & U16_MAX;
>  	struct qca8k_priv *priv = ctx;
> +	u32 reg = *(u16 *)reg_buf;
>  	u32 *val = (u32 *)val_buf;
>  
>  	if (priv->mgmt_master &&
> -- 
> 2.41.0
> 
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-10-03 15:22 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-02 10:46 [PATCH net 0/2] net: dsa: qca8k: fix qca8k driver for Turris 1.x Marek Behún
2023-10-02 10:46 ` [PATCH net 1/2] net: dsa: qca8k: fix regmap bulk read/write methods on big endian systems Marek Behún
2023-10-02 12:03   ` Christian Marangi
2023-10-03 15:22   ` Simon Horman
2023-10-02 10:46 ` [PATCH net 2/2] net: dsa: qca8k: fix potential MDIO bus conflict when accessing internal PHYs via management frames Marek Behún
2023-10-02 12:11   ` Christian Marangi
2023-10-03 10:05     ` Marek Behún
2023-10-03 10:45       ` Christian Marangi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).