netdev.vger.kernel.org archive mirror
From: Heiner Kallweit <hkallweit1@gmail.com>
To: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Andrew Lunn <andrew@lunn.ch>,
	John David Anglin <dave.anglin@bell.net>,
	Vivien Didelot <vivien.didelot@savoirfairelinux.com>,
	Florian Fainelli <f.fainelli@gmail.com>,
	netdev@vger.kernel.org
Subject: Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
Date: Tue, 12 Feb 2019 21:11:55 +0100
Message-ID: <bef1e78d-4c02-68d5-ca1a-380f47ee2647@gmail.com>
In-Reply-To: <20190212163017.lwstmgtyw76cwrd7@shell.armlinux.org.uk>

On 12.02.2019 17:30, Russell King - ARM Linux admin wrote:
> On Tue, Feb 12, 2019 at 07:51:05AM +0100, Heiner Kallweit wrote:
>> On 12.02.2019 04:58, Andrew Lunn wrote:
>>> That change means we don't check whether the PHY device caused an
>>> interrupt when its state is below PHY_UP.
>>>
>>> What I'm seeing is that the PHY is interrupting pretty early on after
>>> a reboot when the previous boot had the interface up.
>>>
>> So does this mean that the interrupts are not properly masked/disabled
>> when going down for reboot? At least in net-next, we enable
>> interrupts only in phy_start().
> 
> Looking at Linus' tree as opposed to net-next, things do look rather
> broken wrt interrupts:
> 
> +-phy_attach_direct
>   `-phydev->state = PHY_READY
> +-phy_prepare_link
> +-phy_start_machine
>   `-phy_trigger_machine()
> `-phy_start_interrupts
>   +-request_threaded_irq()
>   `-phy_enable_interrupts()
>     +-phy_clear_interrupt()
>     `-phy_config_interrupt(, PHY_INTERRUPT_ENABLED)
> 
> At this point, the PHY is then able to generate interrupts, which,
> because phy_start() has not been called and phy_interrupt() checks
> that phydev->state >= PHY_UP, get ignored by the interrupt handler
> exactly as Andrew is finding.
> 
> So it looks like 5.0-rc already needs this to be fixed.
> 
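To make the ordering problem concrete, here is a minimal userspace C sketch. It is a hypothetical model, not phylib code: the struct, the enum and every model_* / replay_* function are invented for illustration, and it assumes the PHY latches its interrupt status until the status register is read. The handler mirrors the phydev->state >= PHY_UP gate described above, while attach already enables interrupts as in the 5.0-rc call graph.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the ordering problem; names echo phylib,
 * but none of this is kernel code. */
enum model_state { MODEL_DOWN, MODEL_READY, MODEL_UP };

struct model_phy {
	enum model_state state;
	bool irq_enabled;  /* PHY is allowed to signal interrupts */
	bool irq_pending;  /* assumption: latched until the status is read */
};

/* phy_attach_direct() + phy_start_interrupts() analogue: the PHY is
 * only READY, yet its interrupts are already enabled. */
static void model_attach(struct model_phy *phy)
{
	phy->state = MODEL_READY;
	phy->irq_pending = false;  /* phy_clear_interrupt() analogue */
	phy->irq_enabled = true;   /* phy_config_interrupt(ENABLED) analogue */
}

/* Something changes on the link: the PHY latches an interrupt. */
static void model_link_event(struct model_phy *phy)
{
	if (phy->irq_enabled)
		phy->irq_pending = true;
}

/* phy_interrupt() analogue: returns true if the event was consumed.
 * Before UP it bails out without reading (acking) the status. */
static bool model_interrupt(struct model_phy *phy)
{
	if (phy->state < MODEL_UP || !phy->irq_pending)
		return false;
	phy->irq_pending = false;
	return true;
}

/* Attach, link event, interrupt, all before phy_start().
 * Returns true if the event was ignored and is still latched. */
bool replay_early_interrupt(void)
{
	struct model_phy phy = { MODEL_DOWN, false, false };
	bool handled;

	model_attach(&phy);
	model_link_event(&phy);
	handled = model_interrupt(&phy);  /* ignored: state is only READY */
	return !handled && phy.irq_pending;
}

/* Same event after a phy_start() analogue; returns true if consumed. */
bool replay_after_start(void)
{
	struct model_phy phy = { MODEL_DOWN, false, false };

	model_attach(&phy);
	phy.state = MODEL_UP;  /* phy_start() analogue */
	model_link_event(&phy);
	return model_interrupt(&phy);
}
```

Under these assumptions, replay_early_interrupt() returns true: the gated handler never acknowledges the event, so it stays latched. replay_after_start() shows the same event being consumed once the state has reached UP.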
> In looking at this, I came across this chunk of code:
> 
> static inline bool __phy_is_started(struct phy_device *phydev)
> {
>         WARN_ON(!mutex_is_locked(&phydev->lock));
> 
>         return phydev->state >= PHY_UP;
> }
> 
> /**
>  * phy_is_started - Convenience function to check whether PHY is started
>  * @phydev: The phy_device struct
>  */
> static inline bool phy_is_started(struct phy_device *phydev)
> {
>         bool started;
> 
>         mutex_lock(&phydev->lock);
>         started = __phy_is_started(phydev);
>         mutex_unlock(&phydev->lock);
> 
>         return started;
> }
> 
> which looks to me like over-complication.  The mutex locking there is
> completely pointless - what are you trying to achieve with it?
> 
Even though this code is new, it is something of a phylib tradition that
every access (read or write) to phydev->state is protected by this lock.
I once wondered whether it's actually needed, but so far I haven't spent
any effort on challenging it. It seems that now the time has come.

> Let's go through this.  The above is exactly equivalent to:
> 
> bool phy_is_started(struct phy_device *phydev)
> {
> 	enum phy_state state;
> 
> 	mutex_lock(&phydev->lock);
> 	state = phydev->state;
> 	mutex_unlock(&phydev->lock);
> 
> 	return state >= PHY_UP;
> }
> 
> since when we do the test is irrelevant.  The architectures that Linux
> runs on are single-copy atomic for naturally aligned, word-sized
> accesses, which means that reading phydev->state itself is an atomic
> operation.  So the mutex locking around it does not add to the
> atomicity of the entire operation.
> 
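The equivalence can be illustrated in userspace C11, where a relaxed atomic load plays the role of a single-copy atomic read. This is a hypothetical model: all names are invented here, and pthread and stdatomic stand in for the kernel primitives. Wrapping the single load in a mutex yields the same answer as the bare load, and neither prevents the state from changing before the caller acts on it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model; ordering chosen so that "started" means >= M_UP. */
enum model_state { M_DOWN, M_READY, M_UP, M_RUNNING };

struct model_phy {
	pthread_mutex_t lock;
	_Atomic int state;  /* C11 stand-in for a single-copy atomic int */
};

/* Shaped like phy_is_started(): lock, read, unlock, compare. */
static bool started_locked(struct model_phy *p)
{
	int s;

	pthread_mutex_lock(&p->lock);
	s = atomic_load_explicit(&p->state, memory_order_relaxed);
	pthread_mutex_unlock(&p->lock);
	return s >= M_UP;  /* p->state may already differ by now */
}

/* Same result without the lock: the load alone is already indivisible. */
static bool started_lockless(struct model_phy *p)
{
	return atomic_load_explicit(&p->state, memory_order_relaxed) >= M_UP;
}

/* Returns true if both variants agree for every state value. */
bool variants_agree(void)
{
	struct model_phy p;
	bool agree = true;

	pthread_mutex_init(&p.lock, NULL);
	atomic_init(&p.state, M_DOWN);
	for (int s = M_DOWN; s <= M_RUNNING; s++) {
		atomic_store(&p.state, s);
		agree = agree && (started_locked(&p) == started_lockless(&p));
	}
	pthread_mutex_destroy(&p.lock);
	return agree;
}
```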
> However, whether the entire operation is safe depends on what you do
> with the result of this function.  For example, let's take this code
> at the end of phy_state_machine():
> 
>         if (phy_polling_mode(phydev) && phy_is_started(phydev))
>                 phy_queue_state_machine(phydev, PHY_STATE_TIME);
> 
> state = PHY_UP
> 		thread 0			thread 1
> 						phy_disconnect()
> 						+-phy_is_started()
> 		phy_is_started()                |
> 						`-phy_stop()
> 						  +-phydev->state = PHY_HALTED
> 						  `-phy_stop_machine()
> 						    `-cancel_delayed_work_sync()
> 		phy_queue_state_machine()
> 		`-mod_delayed_work()
> 
Thanks for describing this scenario, I'll have a closer look at it.

> At this point, the phydev->state_queue work has been added back onto
> the system workqueue despite phy_stop_machine() having been called and
> cancel_delayed_work_sync() having been called on it.
> 
> The original code in 4.20 did not have this race condition.
> 
> Basically, the lock inside phy_is_started() does nothing useful, and
> I'd say it is dangerously misleading.
> 
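The check-then-act race in the diagram, and one possible way to close it, can be replayed deterministically in a small userspace C model. Everything here is hypothetical: the struct, the M_* states and the helper functions are invented, an atomic_bool stands in for the delayed work being pending, and queue_if_started() is only one plausible fix (doing the test and the queueing in a single critical section), not necessarily what phylib adopted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model; HALTED sits below UP so "started" is >= M_UP. */
enum model_state { M_DOWN, M_READY, M_HALTED, M_UP, M_RUNNING };

struct model_phy {
	pthread_mutex_t lock;     /* stands in for phydev->lock */
	enum model_state state;   /* protected by lock after init */
	atomic_bool work_queued;  /* stands in for the pending delayed work */
};

static void model_init(struct model_phy *p)
{
	pthread_mutex_init(&p->lock, NULL);
	p->state = M_UP;
	atomic_init(&p->work_queued, false);
}

/* phy_is_started() shape: the answer is stale once the lock is dropped. */
static bool racy_is_started(struct model_phy *p)
{
	bool started;

	pthread_mutex_lock(&p->lock);
	started = p->state >= M_UP;
	pthread_mutex_unlock(&p->lock);
	return started;
}

/* phy_stop() + phy_stop_machine() shape: halt under the lock, then cancel
 * outside it (cancel_delayed_work_sync() must not be called while holding
 * a lock that the work function also takes). */
static void model_stop(struct model_phy *p)
{
	pthread_mutex_lock(&p->lock);
	p->state = M_HALTED;
	pthread_mutex_unlock(&p->lock);
	atomic_store(&p->work_queued, false);
}

/* One possible fix: test and queue inside a single critical section. */
static void queue_if_started(struct model_phy *p)
{
	pthread_mutex_lock(&p->lock);
	if (p->state >= M_UP)
		atomic_store(&p->work_queued, true);
	pthread_mutex_unlock(&p->lock);
}

/* Replays the interleaving in the diagram; returns true if work is
 * left queued after stop. */
bool replay_racy(void)
{
	struct model_phy p;
	bool started, queued;

	model_init(&p);
	started = racy_is_started(&p);          /* thread 0: sees UP */
	model_stop(&p);                         /* thread 1: HALTED, cancel */
	if (started)
		atomic_store(&p.work_queued, true); /* thread 0: stale requeue */
	queued = atomic_load(&p.work_queued);
	pthread_mutex_destroy(&p.lock);
	return queued;
}

/* Same two actors using the fix, in either order; returns true if work
 * is left queued after stop. */
bool replay_fixed(bool queue_first)
{
	struct model_phy p;
	bool queued;

	model_init(&p);
	if (queue_first) {
		queue_if_started(&p);  /* queued, then cancelled by stop */
		model_stop(&p);
	} else {
		model_stop(&p);
		queue_if_started(&p);  /* sees HALTED, does nothing */
	}
	queued = atomic_load(&p.work_queued);
	pthread_mutex_destroy(&p.lock);
	return queued;
}
```

With the stale check, replay_racy() returns true, i.e. work survives the stop. With queue_if_started(), replay_fixed() returns false in both orders: either the queueing completes before stop's critical section and the later cancel removes it, or it runs afterwards and sees HALTED.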

Heiner
