Re: Potential issue with f5e64032a799 "net: phy: fix resume handling" - Russell King

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Russell King - ARM Linux <linux@armlinux.org.uk>
To: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>,
	Andrew Lunn <andrew@lunn.ch>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: Potential issue with f5e64032a799 "net: phy: fix resume handling"
Date: Tue, 6 Feb 2018 11:00:13 +0000	[thread overview]
Message-ID: <20180206110012.GJ9418@n2100.armlinux.org.uk> (raw)
In-Reply-To: <928cdba7-bc44-92af-9763-3a14da036289@gmail.com>

On Mon, Feb 05, 2018 at 10:48:55PM +0100, Heiner Kallweit wrote:
> Am 04.02.2018 um 03:48 schrieb Florian Fainelli:
> > 
> > 
> > On 02/03/2018 03:58 PM, Heiner Kallweit wrote:
> >> Am 03.02.2018 um 21:17 schrieb Andrew Lunn:
> >>> On Sat, Feb 03, 2018 at 05:41:54PM +0100, Heiner Kallweit wrote:
> >>>> This commit forces callers of phy_resume() and phy_suspend() to hold
> >>>> mutex phydev->lock. This was done for calls to phy_resume() and
> >>>> phy_suspend() in phylib, however there are more callers in network
> >>>> drivers. I'd assume that these other calls issue a warning now
> >>>> because of the lock not being held.
> >>>> So is there something I miss or would this have to be fixed?
> >>>
> >>> Hi Heiner
> >>>
> >>> This is a good point.
> >>>
> >>> Yes, it looks like some fixes are needed. But what exactly?
> >>>
> >>> The phy state machine will suspend and resume the phy is you call
> >>> phy_stop() and phy_start() in the MAC suspend and resume functions.
> >>>
> >> AFAICS phy_stop() doesn't suspend the PHY. It just sets the state
> >> to PHY_HALTED and (at least if we're not in polling mode) doesn't
> >> call phy_suspend(). Maybe a call to phy_trigger_machine() is
> >> needed like in phy_start() ? Then the state machine would call
> >> phy_suspend(), provided the link is still up.
> > 
> > Right, phy_stop() merely just moves the state machine to PHY_HALTED and
> > this is actually a great source of problems which I tried to address here:
> > 
> > https://www.mail-archive.com/netdev@vger.kernel.org/msg196061.html
> > 
> > because phy_stop() is not a synchronous call, so when it returns the
> > state machine might still be running (it can take up to a 1 HZ depending
> > on when you called phy_stop()) and so if you took that as a
> > synchronization point to e.g: turn of your Ethernet MAC/MDIO bus clocks,
> > you will likely see problems. phy_stop_machine() would provide that
> > synchronization point, but is not currently exported, despite being a
> > global symbol. This patch series above is all well and good, except that
> > Geert reported issues with suspend/resume interactions which I have not
> > been able to track down.
> > 
> > We should most definitively try to consolidate the different PHY
> > suspend/resume within the Ethernet MAC suspend/resume implementation and
> > document exactly what the appropriate behavior must be under the
> > following circumstances:
> > 
> > - when to call phy_stop() + phy_stop_machine()
> > - when to call phy_suspend() (if the network interface does do not WoL)
> > - when to call phy_resume() (if needed, actually, it usually is not)
> > - when to call phy_start()
> > 
> 
> I think phy_start() / phy_start_machine() / phy_start_interrupts()
> belong together and we may call the latter two functions from phy_start().
> Same for stop.
> 
> This would mean:
> - Remove call to phy_start_interrupts() from phy_connect_direct()
> - Call phy_start_machine() and phy_start_interrupts() from phy_start()
> - mdio_bus_phy_suspend() calls phy_stop()
> Same for stop, plus: phy_error() calls phy_stop().
> 
> In this setup a second call to phy_stop() wouldn't hurt because state
> is PHY_HALTED already and phy_stop() is a no-op.
> 
> A functional change would be that interrupts are disabled during system
> suspend (except WoL because we don't suspend the PHY is this case). 
> 
> These are first thoughts and therefore it's fine if you totally disagree ..
> I didn't test this yet, it's only a "Gedankenexperiment" so far.
> 
> When talking about suspend/resume I think we talked about system suspend /
> resume. However I think we need to consider also runtime pm.
> If a link is down the network driver may decide to runtime-suspend the PHY
> (power it down). In case of runtime pm I'd say we need to keep irq and
> workqueue active to be able to react if a cable is plugged in and the PHY
> wakes up automatically and establishes a link.

Maybe a better solution now would be to restore phy_resume()'s lock-
taking behaviour, and provide a lockless __phy_resume() which can be
used internally within phylib.  This means drivers using phy_resume()
would see no change.  Maybe something like (untested):

 drivers/net/phy/phy.c        |  2 +-
 drivers/net/phy/phy_device.c | 17 ++++++++++++++---
 include/linux/phy.h          |  1 +
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index f3313a129531..4574d02dce93 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -819,7 +819,7 @@ void phy_start(struct phy_device *phydev)
 		break;
 	case PHY_HALTED:
 		/* if phy was suspended, bring the physical link up again */
-		phy_resume(phydev);
+		__phy_resume(phydev);
 
 		/* make sure interrupts are re-enabled for the PHY */
 		if (phydev->irq != PHY_POLL) {
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index b13eed21c87d..ae0f9306bbdc 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -136,7 +136,7 @@ static int mdio_bus_phy_resume(struct device *dev)
 		goto no_resume;
 
 	mutex_lock(&phydev->lock);
-	ret = phy_resume(phydev);
+	ret = __phy_resume(phydev);
 	mutex_unlock(&phydev->lock);
 	if (ret < 0)
 		return ret;
@@ -1042,7 +1042,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 		goto error;
 
 	mutex_lock(&phydev->lock);
-	phy_resume(phydev);
+	__phy_resume(phydev);
 	mutex_unlock(&phydev->lock);
 	phy_led_triggers_register(phydev);
 
@@ -1172,7 +1172,7 @@ int phy_suspend(struct phy_device *phydev)
 }
 EXPORT_SYMBOL(phy_suspend);
 
-int phy_resume(struct phy_device *phydev)
+int __phy_resume(struct phy_device *phydev)
 {
 	struct phy_driver *phydrv = to_phy_driver(phydev->mdio.dev.driver);
 	int ret = 0;
@@ -1189,6 +1189,17 @@ int phy_resume(struct phy_device *phydev)
 
 	return ret;
 }
+
+int phy_resume(struct phy_device *phydev)
+{
+	int ret;
+
+	mutex_lock(&phydev->lock);
+	ret = __phy-resume(phydev);
+	mutex_unlock(&phydev->lock);
+
+	return ret;
+}
 EXPORT_SYMBOL(phy_resume);
 
 int phy_loopback(struct phy_device *phydev, bool enable)
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 5a0c3e53e7c2..8f82bd64f82d 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -923,6 +923,7 @@ static inline void phy_device_free(struct phy_device *phydev) { }
 void phy_device_remove(struct phy_device *phydev);
 int phy_init_hw(struct phy_device *phydev);
 int phy_suspend(struct phy_device *phydev);
+int __phy_resume(struct phy_device *phydev);
 int phy_resume(struct phy_device *phydev);
 int phy_loopback(struct phy_device *phydev, bool enable);
 struct phy_device *phy_attach(struct net_device *dev, const char *bus_id,


-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

next prev parent reply	other threads:[~2018-02-06 11:00 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-03 16:41 Potential issue with f5e64032a799 "net: phy: fix resume handling" Heiner Kallweit
2018-02-03 20:17 ` Andrew Lunn
2018-02-03 23:58   ` Heiner Kallweit
2018-02-04  2:48     ` Florian Fainelli
2018-02-05 21:48       ` Heiner Kallweit
2018-02-06 11:00         ` Russell King - ARM Linux [this message]
2018-02-06 12:55           ` Andrew Lunn
2018-02-07 20:56       ` handling of phy_stop() and phy_stop_machine() in phylib Heiner Kallweit
2018-02-07 21:13         ` Russell King - ARM Linux
2018-02-07 23:03           ` Florian Fainelli
2018-02-25 13:00   ` Potential issue with f5e64032a799 "net: phy: fix resume handling" Heiner Kallweit
2018-02-25 16:38     ` Andrew Lunn

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:f3313a12953 dfblob:4574d02dce9 dfblob:b13eed21c87
dfblob:ae0f9306bbd dfblob:5a0c3e53e7c dfblob:8f82bd64f82 )
 OR (
bs:"Re: Potential issue with f5e64032a799 "
bs:"net: phy: fix resume handling" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180206110012.GJ9418@n2100.armlinux.org.uk \
    --to=linux@armlinux.org.uk \
    --cc=andrew@lunn.ch \
    --cc=f.fainelli@gmail.com \
    --cc=hkallweit1@gmail.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.