Date:   Tue, 12 Feb 2019 22:55:05 +0000
From:   Russell King - ARM Linux admin <linux@armlinux.org.uk>
To:     Heiner Kallweit <hkallweit1@gmail.com>
Cc:     Andrew Lunn <andrew@lunn.ch>,
        John David Anglin <dave.anglin@bell.net>,
        Vivien Didelot <vivien.didelot@savoirfairelinux.com>,
        Florian Fainelli <f.fainelli@gmail.com>, netdev@vger.kernel.org
Subject: Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are
 handled prior to exit
Message-ID: <20190212225505.hoodvbnnru6dliu7@shell.armlinux.org.uk>

On Tue, Feb 12, 2019 at 09:54:55PM +0100, Heiner Kallweit wrote:
> On 12.02.2019 17:30, Russell King - ARM Linux admin wrote:
> > On Tue, Feb 12, 2019 at 07:51:05AM +0100, Heiner Kallweit wrote:
> >> On 12.02.2019 04:58, Andrew Lunn wrote:
> >>> That change means we don't check the PHY device if it caused an
> >>> interrupt when its state is less than UP.
> >>>
> >>> What i'm seeing is that the PHY is interrupting pretty early on after
> >>> a reboot when the previous boot had the interface up.
> >>>
> >> So this means that when going down for reboot the interrupts are not
> >> properly masked / disabled? Because (at least for net-next) we enable
> >> interrupts in phy_start() only.
> > 
> [..]
> > In looking at this, I came across this chunk of code:
> > 
> > static inline bool __phy_is_started(struct phy_device *phydev)
> > {
> >         WARN_ON(!mutex_is_locked(&phydev->lock));
> > 
> >         return phydev->state >= PHY_UP;
> > }
> > 
> > /**
> >  * phy_is_started - Convenience function to check whether PHY is started
> >  * @phydev: The phy_device struct
> >  */
> > static inline bool phy_is_started(struct phy_device *phydev)
> > {
> >         bool started;
> > 
> >         mutex_lock(&phydev->lock);
> >         started = __phy_is_started(phydev);
> >         mutex_unlock(&phydev->lock);
> > 
> >         return started;
> > }
> > 
> > which looks to me like over-complication.  The mutex locking there is
> > completely pointless - what are you trying to achieve with it?
> > 
> > Let's go through this.  The above is exactly equivalent to:
> > 
> > bool phy_is_started(struct phy_device *phydev)
> > {
> > 	int state;
> > 
> > 	mutex_lock(&phydev->lock);
> > 	state = phydev->state;
> > 	mutex_unlock(&phydev->lock);
> > 
> > 	return state >= PHY_UP;
> > }
> > 
> > since when we do the test is irrelevant.  Architectures that Linux
> > runs on are single-copy atomic, which means that reading phydev->state
> > itself is an atomic operation.  So, the mutex locking around that
> > doesn't add to the atomicity of the entire operation.
> > 
> > Now, whether the entire operation is safe depends on what you do with
> > the rest of this function.  For example, let's take this code at the
> > end of phy_state_machine():
> > 
> >         if (phy_polling_mode(phydev) && phy_is_started(phydev))
> >                 phy_queue_state_machine(phydev, PHY_STATE_TIME);
> > 
> > state = PHY_UP
> > 		thread 0			thread 1
> > 						phy_disconnect()
> > 						+-phy_is_started()
> > 		phy_is_started()                |
> > 						`-phy_stop()
> > 						  +-phydev->state = PHY_HALTED
> > 						  `-phy_stop_machine()
> > 						    `-cancel_delayed_work_sync()
> > 		phy_queue_state_machine()
> > 		`-mod_delayed_work()
> > 
> > At this point, phydev->state_queue has been added back onto the
> > system workqueue even though phy_stop_machine() has already called
> > cancel_delayed_work_sync() on it.
> > 
> > The original code in 4.20 did not have this race condition.
> > 
> > Basically, the lock inside phy_is_started() does nothing useful, and
> > I'd say is dangerously misleading.
> > 
> Then the idea would be to first remove the locking from phy_is_started()
> and, in a second step, do the following to prevent the described race
> (phy_stop() takes phydev->lock too).
> 
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index c1ed03800..69dc64a4d 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -957,8 +957,10 @@ void phy_state_machine(struct work_struct *work)
>          * state machine would be pointless and possibly error prone when
>          * called from phy_disconnect() synchronously.
>          */
> +       mutex_lock(&phydev->lock);
>         if (phy_polling_mode(phydev) && phy_is_started(phydev))
>                 phy_queue_state_machine(phydev, PHY_STATE_TIME);
> +       mutex_unlock(&phydev->lock);
>  }

Yep, that approach would certainly be better.  I didn't exhaustively
audit the 5.0-rc code though.
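
To make the reasoning concrete, here is a minimal userspace sketch of the
pattern, using pthreads rather than kernel primitives.  It is not phylib
code: every model_* name is made up for illustration, and work_queued is
only a stand-in for the delayed work being pending.  The point it shows is
that the state check and the re-queue sit in one critical section, under
the same lock that the stop path takes before changing the state.

```c
#include <pthread.h>
#include <stdbool.h>

/* Userspace model only; all model_* names are hypothetical. */
enum model_state { MODEL_HALTED, MODEL_UP };

struct model_phy {
	pthread_mutex_t lock;
	enum model_state state;
	bool work_queued;	/* stands in for a pending delayed work item */
};

static void model_init(struct model_phy *p)
{
	pthread_mutex_init(&p->lock, NULL);
	p->state = MODEL_UP;
	p->work_queued = false;
}

/*
 * Check the state and re-queue as a single critical section, so a
 * concurrent stop cannot slip in between the test and the queueing.
 * Returns true if the work was queued.
 */
static bool model_requeue_if_started(struct model_phy *p)
{
	bool queued = false;

	pthread_mutex_lock(&p->lock);
	if (p->state >= MODEL_UP) {
		p->work_queued = true;
		queued = true;
	}
	pthread_mutex_unlock(&p->lock);

	return queued;
}

/*
 * Stop path: change the state under the lock first, then cancel any
 * pending work.  The cancel step takes the lock here only to keep the
 * model free of data races on work_queued.
 */
static void model_stop(struct model_phy *p)
{
	pthread_mutex_lock(&p->lock);
	p->state = MODEL_HALTED;
	pthread_mutex_unlock(&p->lock);

	pthread_mutex_lock(&p->lock);
	p->work_queued = false;
	pthread_mutex_unlock(&p->lock);
}
```

In this model, whichever way model_requeue_if_started() and model_stop()
interleave, work_queued is false once model_stop() has returned: a re-queue
that wins the lock first is cancelled afterwards, and one that loses sees
MODEL_HALTED and does nothing.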
