From: "Marek Behún" <kabel@kernel.org>
To: Christian Marangi <ansuelsmth@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
Paolo Abeni <pabeni@redhat.com>,
netdev@vger.kernel.org
Subject: Re: [PATCH net 2/2] net: dsa: qca8k: fix potential MDIO bus conflict when accessing internal PHYs via management frames
Date: Tue, 3 Oct 2023 12:05:10 +0200 [thread overview]
Message-ID: <20231003120510.6abd08af@dellmb> (raw)
In-Reply-To: <651ab382.df0a0220.e74df.fc51@mx.google.com>
On Mon, 2 Oct 2023 14:11:43 +0200
Christian Marangi <ansuelsmth@gmail.com> wrote:
> On Mon, Oct 02, 2023 at 12:46:12PM +0200, Marek Behún wrote:
> > Besides the QCA8337 switch the Turris 1.x device has on it's MDIO bus
> > also Micron ethernet PHY (dedicated to the WAN port).
> >
> > We've been experiencing a strange behavior of the WAN ethernet
> > interface, wherein the WAN PHY started timing out the MDIO accesses, for
> > example when the interface was brought down and then back up.
> >
> > Bisecting led to commit 2cd548566384 ("net: dsa: qca8k: add support for
> > phy read/write with mgmt Ethernet"), which added support to access the
> > QCA8337 switch's internal PHYs via management ethernet frames.
> >
> > Connecting the MDIO bus pins onto an oscilloscope, I was able to see
> > that the MDIO bus was active whenever a request to read/write an
> > internal PHY register was done via an management ethernet frame.
> >
> > My theory is that when the switch core always communicates with the
> > internal PHYs via the MDIO bus, even when externally we request the
> > access via ethernet. This MDIO bus is the same one via which the switch
> > and internal PHYs are accessible to the board, and the board may have
> > other devices connected on this bus. An ASCII illustration may give more
> > insight:
> >
> > +---------+
> > +----| |
> > | | WAN PHY |
> > | +--| |
> > | | +---------+
> > | |
> > | | +----------------------------------+
> > | | | QCA8337 |
> > MDC | | | +-------+ |
> > ------o-+--|--------o------------o--| | |
> > MDIO | | | | | PHY 1 |-|--to RJ45
> > --------o--|---o----+---------o--+--| | |
> > | | | | | +-------+ |
> > | +-------------+ | o--| | |
> > | | MDIO MDC | | | | PHY 2 |-|--to RJ45
> > eth1 | | | o--+--| | |
> > -----------|-|port0 | | | +-------+ |
> > | | | | o--| | |
> > | | switch core | | | | PHY 3 |-|--to RJ45
> > | +-------------+ o--+--| | |
> > | | | +-------+ |
> > | | o--| ... | |
> > +----------------------------------+
> >
> > When we send a request to read an internal PHY register via an ethernet
> > management frame via eth1, the switch core receives the ethernet frame
> > on port 0 and then communicates with the internal PHY via MDIO. At this
> > time, other potential devices, such as the WAN PHY on Turris 1.x, cannot
> > use the MDIO bus, since it may cause a bus conflict.
> >
> > Fix this issue by locking the MDIO bus even when we are accessing the
> > PHY registers via ethernet management frames.
> >
> > Fixes: 2cd548566384 ("net: dsa: qca8k: add support for phy read/write with mgmt Ethernet")
> > Signed-off-by: Marek Behún <kabel@kernel.org>
>
> Just some comments (micro-optimization) and one question.
>
> Wonder if the extra lock would result in a bit of overhead for simple
> implementation where the switch is the only thing connected to the MDIO.
>
> It's just an idea and probably not even something to consider (since
> probably the overhead is so little that it's not worth it)
>
> But we might consider to add some logic in the MDIO setup function to
> check if the MDIO have other PHY connected and enable this lock (and
> make this optional with an if and a bool like require_mdio_locking)
>
> If we don't account for this, yes the lock should have been there from
> the start and this is correct. (we can make it optional only in the case
> where only the switch is connected as it would be the only user and
> everything is already locked by the eth_mgmt lock)
I don't think we should do that. It is possible that a PHY may be
registered during the time that the mutex is locked, even if the PHY is
not defined in device-tree. A driver may be probed that calls
mdiobus_scan, which will cause transactions on the MDIO bus. Currently
there are no such drivers in kernel, but they may be in the future.
Anyway, this is a regression fix, it should be merged. If you want to
optimize it, I think it should be done afterwards in net-next.
> > ---
> > drivers/net/dsa/qca/qca8k-8xxx.c | 11 +++++++++++
> > 1 file changed, 11 insertions(+)
> >
> > diff --git a/drivers/net/dsa/qca/qca8k-8xxx.c b/drivers/net/dsa/qca/qca8k-8xxx.c
> > index d2df30640269..4ce68e655a63 100644
> > --- a/drivers/net/dsa/qca/qca8k-8xxx.c
> > +++ b/drivers/net/dsa/qca/qca8k-8xxx.c
> > @@ -666,6 +666,15 @@ qca8k_phy_eth_command(struct qca8k_priv *priv, bool read, int phy,
> > goto err_read_skb;
> > }
> >
> > + /* It seems that accessing the switch's internal PHYs via management
> > + * packets still uses the MDIO bus within the switch internally, and
> > + * these accesses can conflict with external MDIO accesses to other
> > + * devices on the MDIO bus.
> > + * We therefore need to lock the MDIO bus onto which the switch is
> > + * connected.
> > + */
> > + mutex_lock(&priv->bus->mdio_lock);
> > +
>
> Please move this down before the first dev_queue_xmit. (we can save a
> few cycle where locking is not needed)
I put it before the mgmt lock for the following reason: if I first lock
the mgmt_eth_data and only then the MDIO bus mutex, and a MDIO
transaction is being done on another device, the mgmt_eth_data mutex is
unnecessarily locked for a longer time (since MDIO is slow). I thought
that the whole point of register writes via ethernet frames was to make
it faster. If another part of the driver wants to read/write a
switch register, it should not be unnecessarily slowed down because a
MDIO transaction to a unrelated device.
Illustration when MDIO mutex is locked before first skb queue, as you
suggested:
WAN PHY driver qca8k PHY read qca8k reg read
mdio mutex locked
reading eth mutex locked
reading mdio mutex lock
reading waiting eth mutex lock
reading waiting waiting
reading waiting waiting
mdio mutex unlocked waiting waiting
mdio mutex locked waiting
reading waiting
mdio mutex unlocked waiting
eth mutex unlocked waiting
eth mutex locked
reading
eth mutex unlocked
Illustration when MDIO mutex is locked before eth mutex:
WAN PHY driver qca8k PHY read qca8k reg read
mdio mutex locked
reading mdio mutex lock
reading waiting eth mutex locked
reading waiting reading
reading waiting eth mutex unlocked
reading waiting
mdio mutex unlocked waiting
mdio mutex locked
eth mutex locked
reading
eth mutex unlocked
mdio mutex unlocked
Notice how in the second illustration the qca8k register read is not
slowed by the mdio mutex.
> Also should we use mutex_lock_nested?
That would allow some MDIO bus reads, for example if someone called
mdiobus_read() on the bus. We specifically want to completely avoid
this. We are not doing any nested reads on the MDIO bus here, so no,
we should not be using mutex_lock_nested().
Marek
next prev parent reply other threads:[~2023-10-03 10:05 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-02 10:46 [PATCH net 0/2] net: dsa: qca8k: fix qca8k driver for Turris 1.x Marek Behún
2023-10-02 10:46 ` [PATCH net 1/2] net: dsa: qca8k: fix regmap bulk read/write methods on big endian systems Marek Behún
2023-10-02 12:03 ` Christian Marangi
2023-10-03 15:22 ` Simon Horman
2023-10-02 10:46 ` [PATCH net 2/2] net: dsa: qca8k: fix potential MDIO bus conflict when accessing internal PHYs via management frames Marek Behún
2023-10-02 12:11 ` Christian Marangi
2023-10-03 10:05 ` Marek Behún [this message]
2023-10-03 10:45 ` Christian Marangi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231003120510.6abd08af@dellmb \
--to=kabel@kernel.org \
--cc=ansuelsmth@gmail.com \
--cc=davem@davemloft.net \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).