public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Francesco Dolcini <francesco.dolcini@toradex.com>
To: Andrew Lunn <andrew@lunn.ch>,
	Joakim Zhang <qiangqing.zhang@nxp.com>,
	netdev@vger.kernel.org
Cc: Francesco Dolcini <francesco.dolcini@toradex.com>,
	Andy Duan <fugang.duan@nxp.com>,
	Heiner Kallweit <hkallweit1@gmail.com>,
	Russell King <linux@armlinux.org.uk>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Fabio Estevam <festevam@gmail.com>,
	Tim Harvey <tharvey@gateworks.com>,
	Chris Healy <cphealy@gmail.com>
Subject: Re: FEC MDIO read timeout on linkup
Date: Thu, 5 May 2022 10:29:01 +0200	[thread overview]
Message-ID: <20220505082901.GA195398@francesco-nb.int.toradex.com> (raw)
In-Reply-To: <YnGqF4/040/Y9RjS@lunn.ch>

Hello Andrew and all, I believe I finally found the problem and I'm
preparing a patch for it.

On Wed, May 04, 2022 at 12:17:59AM +0200, Andrew Lunn wrote:
> > I'm wondering could this be related to
> > fec_enet_adjust_link()->fec_restart() during a fec_enet_mdio_read()
> > and one of the many register write in fec_restart() just creates the
> > issue, maybe while resetting the FEC? Does this makes any sense?
> 
> phylib is 'single threaded', in that only one thing will be active at
> once for a PHY. While fec_enet_adjust_link() is being called, there
> will not be any read/writes occurring for that PHY.

I think this is not the whole story here. We can have a phy interrupt
handler that runs in its own context and it could be doing a MDIO
transaction, and this is exactly my case.

Thread 1 (phylib WQ)       | Thread 2 (phy interrupt)
                           |
                           | phy_interrupt()            <-- PHY IRQ
	                   |  handle_interrupt()
	                   |   phy_read()
	                   |   phy_trigger_machine()
	                   |    --> schedule WQ
                           |
	                   |
phy_state_machine()        |                        
 phy_check_link_status()   |
  phy_link_change()        |
   phydev->adjust_link()   |
    fec_enet_adjust_link() | 
     --> FEC reset         | phy_interrupt()            <-- PHY IRQ
	                   |  phy_read()
	 	           |

To confirm this I have added a spinlock to detect this race condition
with just a trylock and a WARN_ON(1) when the locking is failing. On
"MDIO read timeout" acquiring the spinlock fails.

This is also in agreement with the fact that polling the PHY instead of
having the interrupt is working just fine.

For my specific problem just taking the MDIO lock in
fec_enet_adjust_link() should do the work, however there are other
code path in the FEC that could create this issues:
 - fec_enet_set_pauseparam() => fec_stop()/fec_restart()
 - fec_enet_close() => fec_stop() 
 - fec_set_features() => fec_stop()
 - fec_suspend() => fec_stop()

1. Should all of those be protected? Are these real issues?
2. Which fixes tag to use? This seems likely a quite old issue, not
introduced with the MII polling.

Francesco


  reply	other threads:[~2022-05-05  8:29 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-22 15:26 FEC MDIO read timeout on linkup Francesco Dolcini
2022-04-22 15:55 ` Fabio Estevam
2022-04-22 16:04   ` Francesco Dolcini
2022-04-29 15:15     ` Francesco Dolcini
2022-05-02 17:05 ` Francesco Dolcini
2022-05-02 18:21   ` Andrew Lunn
2022-05-02 18:25     ` Francesco Dolcini
2022-05-02 18:24   ` Andrew Lunn
2022-05-02 18:34     ` Francesco Dolcini
2022-05-03 16:13       ` Francesco Dolcini
2022-05-03 22:17         ` Andrew Lunn
2022-05-05  8:29           ` Francesco Dolcini [this message]
2022-05-05 17:41             ` Andrew Lunn
2022-05-05 17:54               ` Francesco Dolcini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220505082901.GA195398@francesco-nb.int.toradex.com \
    --to=francesco.dolcini@toradex.com \
    --cc=andrew@lunn.ch \
    --cc=cphealy@gmail.com \
    --cc=davem@davemloft.net \
    --cc=festevam@gmail.com \
    --cc=fugang.duan@nxp.com \
    --cc=hkallweit1@gmail.com \
    --cc=kuba@kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=qiangqing.zhang@nxp.com \
    --cc=tharvey@gateworks.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox