public inbox for netdev@vger.kernel.org
From: Andrew Lunn <andrew@lunn.ch>
To: Francesco Dolcini <francesco.dolcini@toradex.com>
Cc: Joakim Zhang <qiangqing.zhang@nxp.com>,
	netdev@vger.kernel.org, Andy Duan <fugang.duan@nxp.com>,
	Heiner Kallweit <hkallweit1@gmail.com>,
	Russell King <linux@armlinux.org.uk>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Fabio Estevam <festevam@gmail.com>,
	Tim Harvey <tharvey@gateworks.com>,
	Chris Healy <cphealy@gmail.com>
Subject: Re: FEC MDIO read timeout on linkup
Date: Thu, 5 May 2022 19:41:00 +0200
Message-ID: <YnQMLLXsldQt5Pve@lunn.ch>
In-Reply-To: <20220505082901.GA195398@francesco-nb.int.toradex.com>

On Thu, May 05, 2022 at 10:29:01AM +0200, Francesco Dolcini wrote:
> Hello Andrew and all, I believe I finally found the problem and I'm
> preparing a patch for it.
> 
> On Wed, May 04, 2022 at 12:17:59AM +0200, Andrew Lunn wrote:
> > > I'm wondering whether this could be related to
> > > fec_enet_adjust_link()->fec_restart() running during a
> > > fec_enet_mdio_read(), with one of the many register writes in
> > > fec_restart() causing the issue, maybe while resetting the FEC?
> > > Does this make any sense?
> > 
> > phylib is 'single threaded', in that only one thing will be active at
> > once for a PHY. While fec_enet_adjust_link() is being called, there
> > will not be any read/writes occurring for that PHY.
> 
> I think this is not the whole story here. We can have a PHY interrupt
> handler that runs in its own context and could be doing an MDIO
> transaction at the same time, and this is exactly my case.
> 
> Thread 1 (phylib WQ)       | Thread 2 (phy interrupt)
>                            |
>                            | phy_interrupt()            <-- PHY IRQ
>                            |  handle_interrupt()
>                            |   phy_read()
>                            |   phy_trigger_machine()
>                            |    --> schedule WQ
>                            |
>                            |
> phy_state_machine()        |
>  phy_check_link_status()   |
>   phy_link_change()        |
>    phydev->adjust_link()   |
>     fec_enet_adjust_link() |
>      --> FEC reset         | phy_interrupt()            <-- PHY IRQ
>                            |  phy_read()
>                            |
> 
> To confirm this I added a spinlock to detect the race condition,
> using just a trylock and a WARN_ON(1) when taking the lock fails.
> When the "MDIO read timeout" occurs, acquiring the spinlock fails.
> 
> This also agrees with the fact that polling the PHY instead of using
> the interrupt works just fine.

Yes, that makes sense.

But I would fix this differently. The interrupt handler runs in a
threaded interrupt, so it can use a mutex. So it should actually take
the phy mutex (phydev->lock).

Please try this:

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index beb2b66da132..7d3a64d04820 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -970,8 +970,13 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
 {
        struct phy_device *phydev = phy_dat;
        struct phy_driver *drv = phydev->drv;
+       irqreturn_t ret;
 
-       return drv->handle_interrupt(phydev);
+       mutex_lock(&phydev->lock);
+       ret = drv->handle_interrupt(phydev);
+       mutex_unlock(&phydev->lock);
+
+       return ret;
 }
 
That will stop it running in parallel to the adjust_link callback, or
anything else in phylib.
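
As a rough userspace illustration of the idea (not kernel code, and not
part of the patch above), the sketch below models the two contexts with
pthreads. Everything in it is hypothetical: struct fake_phy, touch_hw()
and count_overlaps() are made-up names, the "lock" member only stands in
for phydev->lock, and the "detector" member imitates the trylock-based
race check described in the quoted mail, counting overlaps instead of
calling WARN_ON(1). It assumes the state machine path already runs with
the phy mutex held.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct phy_device; only the locking is modelled. */
struct fake_phy {
    pthread_mutex_t lock;      /* plays the role of phydev->lock */
    pthread_mutex_t detector;  /* trylock-only race detector */
    atomic_int overlaps;       /* times the detector was already held */
    int irq_takes_lock;        /* 1 = behave like the patched handler */
    int iterations;
};

/* Simulated MDIO access or MAC reset: record an overlap if one is in flight. */
static void touch_hw(struct fake_phy *phy)
{
    if (pthread_mutex_trylock(&phy->detector) != 0) {
        atomic_fetch_add(&phy->overlaps, 1);
        return;
    }
    for (volatile int i = 0; i < 1000; i++)
        ;  /* pretend the hardware access takes a while */
    pthread_mutex_unlock(&phy->detector);
}

/* Models the state machine calling adjust_link() with the phy mutex held. */
static void *state_machine_thread(void *arg)
{
    struct fake_phy *phy = arg;

    for (int i = 0; i < phy->iterations; i++) {
        pthread_mutex_lock(&phy->lock);
        touch_hw(phy);
        pthread_mutex_unlock(&phy->lock);
    }
    return NULL;
}

/* Models the threaded PHY interrupt handler, with or without the mutex. */
static void *irq_thread(void *arg)
{
    struct fake_phy *phy = arg;

    for (int i = 0; i < phy->iterations; i++) {
        if (phy->irq_takes_lock)
            pthread_mutex_lock(&phy->lock);
        touch_hw(phy);
        if (phy->irq_takes_lock)
            pthread_mutex_unlock(&phy->lock);
    }
    return NULL;
}

/* Runs both threads to completion and returns the number of overlaps seen. */
int count_overlaps(int irq_takes_lock, int iterations)
{
    struct fake_phy phy = {
        .irq_takes_lock = irq_takes_lock,
        .iterations = iterations,
    };
    pthread_t sm, irq;

    pthread_mutex_init(&phy.lock, NULL);
    pthread_mutex_init(&phy.detector, NULL);
    atomic_init(&phy.overlaps, 0);

    pthread_create(&sm, NULL, state_machine_thread, &phy);
    pthread_create(&irq, NULL, irq_thread, &phy);
    pthread_join(sm, NULL);
    pthread_join(irq, NULL);

    pthread_mutex_destroy(&phy.lock);
    pthread_mutex_destroy(&phy.detector);
    return atomic_load(&phy.overlaps);
}
```

With irq_takes_lock set to 1, both threads only reach touch_hw() while
holding the same mutex, so count_overlaps() should always return 0. With
it set to 0, the result depends on scheduling and may or may not be
non-zero, much like the intermittent timeout in the report.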

	 Andrew
