On Tue, Sep 05, 2023 at 02:48:13PM +0100, Russell King (Oracle) wrote: > On Tue, Sep 05, 2023 at 04:49:31PM +0800, Jijie Shao wrote: > > We note there are several times lock during phy_state_machine(). The first > > is to handle phydev state. It's noting that a competition of phydev lock > > happend again if phy_check_link_status() returns an error. Why we don't > > held lock until changing state to PHY_ERROR if phy_check_link_status() > > returns an error? > > You are quite correct that isn't very good. We can easily get rid of > some of this mess, but I don't think all which leaves it still open to > the race you describe. > > The problem is phy_suspend(). > > First, it calls phy_ethtool_get_wol() which takes the phydev lock. This > can be dealt with if we save the state at probe time, and then update > the state when phy_ethtool_set_wol() is called. > > Second, phy_suspend() calls ->suspend without holding the phydev lock, > and holding the lock while calling that may not be safe. Having had a > brief look over the implementations (but not delving into any PTP > function they may call) does seem to suggest that shouldn't be a big > problem, but I don't know whether holding the phydev lock while calling > PTP functions is likely to cause issues. > > However, looking at that has lead me to the conclusion that there is a > lot of duplication of WoL condition testing. phy_suspend() already > avoids calling ->suspend() if either phy_ethtool_get_wol() indicates > that WoL is enabled, or if the netdev says WoL is enabled. > > Many of the ->suspend() implementations, for example, > lan88xx_suspend(), dp83822_suspend(), etc test some kind of flag to > determine whether WoL is enabled and thus seem to be re-implementing > what phy_suspend() is already doing. I think there is scope to delete > this code from several drivers. > > The easy bits are the patches I've attached to this email. These > won't on their own be sufficient to close the race you've identified > due to the phy_suspend() issue, but are the best we can do until we > can work out what to do about that. Having looked deeper at this, I think there may be a solution. See these follow-on patches. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!