* [PATCH] phy: fix possible double lock calling link changed handler
@ 2021-11-23 2:55 Alessandro B Maurici
2021-11-23 4:18 ` Andrew Lunn
0 siblings, 1 reply; 11+ messages in thread
From: Alessandro B Maurici @ 2021-11-23 2:55 UTC (permalink / raw)
To: netdev
Cc: Andrew Lunn, Heiner Kallweit, Russell King, David S. Miller,
Alessandro B Maurici
From: Alessandro B Maurici <abmaurici@gmail.com>
Releases the phy lock before calling phy_link_change to avoid any worker
thread lockup. Some network drivers(eg Microchip's LAN743x), make a call to
phy_ethtool_get_link_ksettings inside the link change handler, and, due to
the commit c10a485c3de5 ("phy: phy_ethtool_ksettings_get: Lock the phy for
consistency"), this will cause a lockup.
As that mutex call is needed for consistency, we need to release the lock,
if previously locked, before calling the handler to prevent issues.
Signed-off-by: Alessandro B Maurici <abmaurici@gmail.com>
---
drivers/net/phy/phy.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index beb2b66da132..0914e339e623 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -58,13 +58,31 @@ static const char *phy_state_to_str(enum phy_state st)
static void phy_link_up(struct phy_device *phydev)
{
+ bool must_relock = false;
+
+ if (mutex_is_locked(&phydev->lock)) {
+ must_relock = true;
+ mutex_unlock(&phydev->lock);
+ }
phydev->phy_link_change(phydev, true);
+ if (must_relock)
+ mutex_lock(&phydev->lock);
+
phy_led_trigger_change_speed(phydev);
}
static void phy_link_down(struct phy_device *phydev)
{
+ bool must_relock = false;
+
+ if (mutex_is_locked(&phydev->lock)) {
+ must_relock = true;
+ mutex_unlock(&phydev->lock);
+ }
phydev->phy_link_change(phydev, false);
+ if (must_relock)
+ mutex_lock(&phydev->lock);
+
phy_led_trigger_change_speed(phydev);
}
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH] phy: fix possible double lock calling link changed handler 2021-11-23 2:55 [PATCH] phy: fix possible double lock calling link changed handler Alessandro B Maurici @ 2021-11-23 4:18 ` Andrew Lunn 2021-11-23 4:49 ` Alessandro B Maurici 0 siblings, 1 reply; 11+ messages in thread From: Andrew Lunn @ 2021-11-23 4:18 UTC (permalink / raw) To: Alessandro B Maurici Cc: netdev, Heiner Kallweit, Russell King, David S. Miller On Mon, Nov 22, 2021 at 11:55:48PM -0300, Alessandro B Maurici wrote: > From: Alessandro B Maurici <abmaurici@gmail.com> > > Releases the phy lock before calling phy_link_change to avoid any worker > thread lockup. Some network drivers(eg Microchip's LAN743x), make a call to > phy_ethtool_get_link_ksettings inside the link change handler I think we need to take a step back here and answer the question, why does it call phy_ethtool_get_link_ksettings in the link change handler. I'm not aware of any other MAC driver which does this. Andrew ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] phy: fix possible double lock calling link changed handler 2021-11-23 4:18 ` Andrew Lunn @ 2021-11-23 4:49 ` Alessandro B Maurici 2021-11-23 8:21 ` Heiner Kallweit 2021-11-23 14:09 ` Andrew Lunn 0 siblings, 2 replies; 11+ messages in thread From: Alessandro B Maurici @ 2021-11-23 4:49 UTC (permalink / raw) To: Andrew Lunn; +Cc: netdev, Heiner Kallweit, Russell King, David S. Miller On Tue, 23 Nov 2021 05:18:14 +0100 Andrew Lunn <andrew@lunn.ch> wrote: > On Mon, Nov 22, 2021 at 11:55:48PM -0300, Alessandro B Maurici wrote: > > From: Alessandro B Maurici <abmaurici@gmail.com> > > > > Releases the phy lock before calling phy_link_change to avoid any worker > > thread lockup. Some network drivers(eg Microchip's LAN743x), make a call to > > phy_ethtool_get_link_ksettings inside the link change handler > > I think we need to take a step back here and answer the question, why > does it call phy_ethtool_get_link_ksettings in the link change > handler. I'm not aware of any other MAC driver which does this. > > Andrew I agree, the use in the lan743x seems related to the PTP, that driver seems to be the only one using it, at least in the Linus tree. I think that driver could be patched as there are other ways to do it, but my take on the problem itself is that the PHY device interface opens a way to break the flow and this behavior does not seem to be documented, so, instead of documenting a possible harmful interface while in the callback, we should just get rid of the problem itself, and calling a callback without any locks held seems to be a good alternative. This is also a non critical performance path and the additional code would not impact much, of course it makes the stuff less nice to look at. The patch also has an additional check for the lock, since there is a function that is not calling the lock explicitly and has a warn if the lock is not held at the start, so I put it there to be extra safe. Alessandro ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] phy: fix possible double lock calling link changed handler 2021-11-23 4:49 ` Alessandro B Maurici @ 2021-11-23 8:21 ` Heiner Kallweit 2021-11-23 14:11 ` Andrew Lunn 2021-11-23 14:09 ` Andrew Lunn 1 sibling, 1 reply; 11+ messages in thread From: Heiner Kallweit @ 2021-11-23 8:21 UTC (permalink / raw) To: Alessandro B Maurici, Andrew Lunn; +Cc: netdev, Russell King, David S. Miller On 23.11.2021 05:49, Alessandro B Maurici wrote: > On Tue, 23 Nov 2021 05:18:14 +0100 > Andrew Lunn <andrew@lunn.ch> wrote: > >> On Mon, Nov 22, 2021 at 11:55:48PM -0300, Alessandro B Maurici wrote: >>> From: Alessandro B Maurici <abmaurici@gmail.com> >>> >>> Releases the phy lock before calling phy_link_change to avoid any worker >>> thread lockup. Some network drivers(eg Microchip's LAN743x), make a call to >>> phy_ethtool_get_link_ksettings inside the link change handler >> >> I think we need to take a step back here and answer the question, why >> does it call phy_ethtool_get_link_ksettings in the link change >> handler. I'm not aware of any other MAC driver which does this. >> >> Andrew > > I agree, the use in the lan743x seems related to the PTP, that driver seems > to be the only one using it, at least in the Linus tree. > I think that driver could be patched as there are other ways to do it, > but my take on the problem itself is that the PHY device interface opens > a way to break the flow and this behavior does not seem to be documented, > so, instead of documenting a possible harmful interface while in the callback, > we should just get rid of the problem itself, and calling a callback without > any locks held seems to be a good alternative. > This is also a non critical performance path and the additional code > would not impact much, of course it makes the stuff less nice to look at. > The patch also has an additional check for the lock, since there is a > function that is not calling the lock explicitly and has a warn if the lock > is not held at the start, so I put it there to be extra safe. > > Alessandro > Seeing the following code snippet in lan743x_phy_link_status_change() I wonder why it doesn't use phydev->speed and phydev->duplex directly. The current code seems to include unneeded overhead. phy_ethtool_get_link_ksettings(netdev, &ksettings); local_advertisement = linkmode_adv_to_mii_adv_t(phydev->advertising); remote_advertisement = linkmode_adv_to_mii_adv_t(phydev->lp_advertising); lan743x_phy_update_flowcontrol(adapter, ksettings.base.duplex, local_advertisement, remote_advertisement); lan743x_ptp_update_latency(adapter, ksettings.base.speed); ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] phy: fix possible double lock calling link changed handler 2021-11-23 8:21 ` Heiner Kallweit @ 2021-11-23 14:11 ` Andrew Lunn 2021-11-23 16:06 ` Alessandro B Maurici 0 siblings, 1 reply; 11+ messages in thread From: Andrew Lunn @ 2021-11-23 14:11 UTC (permalink / raw) To: Heiner Kallweit Cc: Alessandro B Maurici, netdev, Russell King, David S. Miller > Seeing the following code snippet in lan743x_phy_link_status_change() > I wonder why it doesn't use phydev->speed and phydev->duplex directly. > The current code seems to include unneeded overhead. Yes, that is the change i would make. When adding the extra locks i missed that a driver was doing something like this. I will check all other callers to see if they are using it in odd contexts. Andrew ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] phy: fix possible double lock calling link changed handler 2021-11-23 14:11 ` Andrew Lunn @ 2021-11-23 16:06 ` Alessandro B Maurici 2021-11-23 20:32 ` Heiner Kallweit 0 siblings, 1 reply; 11+ messages in thread From: Alessandro B Maurici @ 2021-11-23 16:06 UTC (permalink / raw) To: Andrew Lunn; +Cc: Heiner Kallweit, netdev, Russell King, David S. Miller On Tue, 23 Nov 2021 15:11:22 +0100 Andrew Lunn <andrew@lunn.ch> wrote: > Yes, that is the change i would make. When adding the extra locks i > missed that a driver was doing something like this. I will check all > other callers to see if they are using it in odd contexts. > > Andrew Andrew, this kinda of implementation is really hard to get in a fast review, fortunately I happen to be testing one lan743x board with a 5.10.79 kernel that had the new locks in place, and noticed that really fast, but I wrongly assumed that call was okayish since the driver was on stable. If you need to do some testing I will still have the hardware with me for some time. Alessandro ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] phy: fix possible double lock calling link changed handler 2021-11-23 16:06 ` Alessandro B Maurici @ 2021-11-23 20:32 ` Heiner Kallweit 2021-11-23 22:31 ` Alessandro B Maurici 0 siblings, 1 reply; 11+ messages in thread From: Heiner Kallweit @ 2021-11-23 20:32 UTC (permalink / raw) To: Alessandro B Maurici; +Cc: netdev, Russell King, David S. Miller, Andrew Lunn On 23.11.2021 17:06, Alessandro B Maurici wrote: > On Tue, 23 Nov 2021 15:11:22 +0100 > Andrew Lunn <andrew@lunn.ch> wrote: > >> Yes, that is the change i would make. When adding the extra locks i >> missed that a driver was doing something like this. I will check all >> other callers to see if they are using it in odd contexts. >> >> Andrew > > Andrew, this kinda of implementation is really hard to get in a fast review, > fortunately I happen to be testing one lan743x board with a 5.10.79 kernel > that had the new locks in place, and noticed that really fast, but I wrongly > assumed that call was okayish since the driver was on stable. > If you need to do some testing I will still have the hardware with me for > some time. > > Alessandro > Great that you have test hw, could you please test the following patch? The duplex argument of lan743x_phy_update_flowcontrol() seems to be some leftover, it isn't used and can be removed. diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c index 4fc97823b..7d7647481 100644 --- a/drivers/net/ethernet/microchip/lan743x_main.c +++ b/drivers/net/ethernet/microchip/lan743x_main.c @@ -914,8 +914,7 @@ static int lan743x_phy_reset(struct lan743x_adapter *adapter) } static void lan743x_phy_update_flowcontrol(struct lan743x_adapter *adapter, - u8 duplex, u16 local_adv, - u16 remote_adv) + u16 local_adv, u16 remote_adv) { struct lan743x_phy *phy = &adapter->phy; u8 cap; @@ -943,7 +942,6 @@ static void lan743x_phy_link_status_change(struct net_device *netdev) phy_print_status(phydev); if (phydev->state == PHY_RUNNING) { - struct ethtool_link_ksettings ksettings; int remote_advertisement = 0; int local_advertisement = 0; @@ -980,18 +978,14 @@ static void lan743x_phy_link_status_change(struct net_device *netdev) } lan743x_csr_write(adapter, MAC_CR, data); - memset(&ksettings, 0, sizeof(ksettings)); - phy_ethtool_get_link_ksettings(netdev, &ksettings); local_advertisement = linkmode_adv_to_mii_adv_t(phydev->advertising); remote_advertisement = linkmode_adv_to_mii_adv_t(phydev->lp_advertising); - lan743x_phy_update_flowcontrol(adapter, - ksettings.base.duplex, - local_advertisement, + lan743x_phy_update_flowcontrol(adapter, local_advertisement, remote_advertisement); - lan743x_ptp_update_latency(adapter, ksettings.base.speed); + lan743x_ptp_update_latency(adapter, phydev->speed); } } -- 2.34.0 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] phy: fix possible double lock calling link changed handler 2021-11-23 20:32 ` Heiner Kallweit @ 2021-11-23 22:31 ` Alessandro B Maurici 0 siblings, 0 replies; 11+ messages in thread From: Alessandro B Maurici @ 2021-11-23 22:31 UTC (permalink / raw) To: Heiner Kallweit; +Cc: netdev, Russell King, David S. Miller, Andrew Lunn On Tue, 23 Nov 2021 21:32:56 +0100 Heiner Kallweit <hkallweit1@gmail.com> wrote: > Great that you have test hw, could you please test the following patch? > The duplex argument of lan743x_phy_update_flowcontrol() seems to be some > leftover, it isn't used and can be removed. > > > diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c > index 4fc97823b..7d7647481 100644 > --- a/drivers/net/ethernet/microchip/lan743x_main.c > +++ b/drivers/net/ethernet/microchip/lan743x_main.c > @@ -914,8 +914,7 @@ static int lan743x_phy_reset(struct lan743x_adapter *adapter) > } > > static void lan743x_phy_update_flowcontrol(struct lan743x_adapter *adapter, > - u8 duplex, u16 local_adv, > - u16 remote_adv) > + u16 local_adv, u16 remote_adv) > { > struct lan743x_phy *phy = &adapter->phy; > u8 cap; > @@ -943,7 +942,6 @@ static void lan743x_phy_link_status_change(struct net_device *netdev) > > phy_print_status(phydev); > if (phydev->state == PHY_RUNNING) { > - struct ethtool_link_ksettings ksettings; > int remote_advertisement = 0; > int local_advertisement = 0; > > @@ -980,18 +978,14 @@ static void lan743x_phy_link_status_change(struct net_device *netdev) > } > lan743x_csr_write(adapter, MAC_CR, data); > > - memset(&ksettings, 0, sizeof(ksettings)); > - phy_ethtool_get_link_ksettings(netdev, &ksettings); > local_advertisement = > linkmode_adv_to_mii_adv_t(phydev->advertising); > remote_advertisement = > linkmode_adv_to_mii_adv_t(phydev->lp_advertising); > > - lan743x_phy_update_flowcontrol(adapter, > - ksettings.base.duplex, > - local_advertisement, > + lan743x_phy_update_flowcontrol(adapter, local_advertisement, > remote_advertisement); > - lan743x_ptp_update_latency(adapter, ksettings.base.speed); > + lan743x_ptp_update_latency(adapter, phydev->speed); > } > } > Patch tested and working as expected. Alessandro ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] phy: fix possible double lock calling link changed handler 2021-11-23 4:49 ` Alessandro B Maurici 2021-11-23 8:21 ` Heiner Kallweit @ 2021-11-23 14:09 ` Andrew Lunn 2021-11-23 14:14 ` Russell King (Oracle) 2021-11-23 15:58 ` Alessandro B Maurici 1 sibling, 2 replies; 11+ messages in thread From: Andrew Lunn @ 2021-11-23 14:09 UTC (permalink / raw) To: Alessandro B Maurici Cc: netdev, Heiner Kallweit, Russell King, David S. Miller On Tue, Nov 23, 2021 at 01:49:46AM -0300, Alessandro B Maurici wrote: > On Tue, 23 Nov 2021 05:18:14 +0100 > Andrew Lunn <andrew@lunn.ch> wrote: > > > On Mon, Nov 22, 2021 at 11:55:48PM -0300, Alessandro B Maurici wrote: > > > From: Alessandro B Maurici <abmaurici@gmail.com> > > > > > > Releases the phy lock before calling phy_link_change to avoid any worker > > > thread lockup. Some network drivers(eg Microchip's LAN743x), make a call to > > > phy_ethtool_get_link_ksettings inside the link change handler > > > > I think we need to take a step back here and answer the question, why > > does it call phy_ethtool_get_link_ksettings in the link change > > handler. I'm not aware of any other MAC driver which does this. > > > > Andrew > > I agree, the use in the lan743x seems related to the PTP, that driver seems > to be the only one using it, at least in the Linus tree. > I think that driver could be patched as there are other ways to do it, > but my take on the problem itself is that the PHY device interface opens > a way to break the flow and this behavior does not seem to be documented, > so, instead of documenting a possible harmful interface while in the callback, > we should just get rid of the problem itself, and calling a callback without > any locks held seems to be a good alternative. That is a really bad alternative. It is only because the lock is held can the MAC driver actually trust anything passed to it. The callback needs phydev->speed, phydev->duplex, etc, and they can change at any time when the lock is not held. The values can be inconsistent with each other, etc, unless the lock is held. The callback has always had the lock held, so is safe. However, recently a few bugs have been reported and fixed for functions like phy_ethtool_get_link_ksettings() and phy_ethtool_set_link_ksettings() where they have accessed phydev members without the lock and got inconsistent values in race condition. These are hard race conditions to reproduce, but a deadlock like this is very obvious, easy to fix. I would also say that _ethtool_ in the function name is also a good hit this is intended to be used for an ethtool callback. Lets remove the inappropriate use of phy_ethtool_get_link_ksettings() here. Andrew ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] phy: fix possible double lock calling link changed handler 2021-11-23 14:09 ` Andrew Lunn @ 2021-11-23 14:14 ` Russell King (Oracle) 2021-11-23 15:58 ` Alessandro B Maurici 1 sibling, 0 replies; 11+ messages in thread From: Russell King (Oracle) @ 2021-11-23 14:14 UTC (permalink / raw) To: Andrew Lunn Cc: Alessandro B Maurici, netdev, Heiner Kallweit, David S. Miller On Tue, Nov 23, 2021 at 03:09:04PM +0100, Andrew Lunn wrote: > On Tue, Nov 23, 2021 at 01:49:46AM -0300, Alessandro B Maurici wrote: > > On Tue, 23 Nov 2021 05:18:14 +0100 > > Andrew Lunn <andrew@lunn.ch> wrote: > > > > > On Mon, Nov 22, 2021 at 11:55:48PM -0300, Alessandro B Maurici wrote: > > > > From: Alessandro B Maurici <abmaurici@gmail.com> > > > > > > > > Releases the phy lock before calling phy_link_change to avoid any worker > > > > thread lockup. Some network drivers(eg Microchip's LAN743x), make a call to > > > > phy_ethtool_get_link_ksettings inside the link change handler > > > > > > I think we need to take a step back here and answer the question, why > > > does it call phy_ethtool_get_link_ksettings in the link change > > > handler. I'm not aware of any other MAC driver which does this. > > > > > > Andrew > > > > I agree, the use in the lan743x seems related to the PTP, that driver seems > > to be the only one using it, at least in the Linus tree. > > I think that driver could be patched as there are other ways to do it, > > but my take on the problem itself is that the PHY device interface opens > > a way to break the flow and this behavior does not seem to be documented, > > so, instead of documenting a possible harmful interface while in the callback, > > we should just get rid of the problem itself, and calling a callback without > > any locks held seems to be a good alternative. > > That is a really bad alternative. It is only because the lock is held > can the MAC driver actually trust anything passed to it. The callback > needs phydev->speed, phydev->duplex, etc, and they can change at any > time when the lock is not held. The values can be inconsistent with > each other, etc, unless the lock is held. > > The callback has always had the lock held, so is safe. However, > recently a few bugs have been reported and fixed for functions like > phy_ethtool_get_link_ksettings() and phy_ethtool_set_link_ksettings() > where they have accessed phydev members without the lock and got > inconsistent values in race condition. These are hard race conditions > to reproduce, but a deadlock like this is very obvious, easy to fix. I > would also say that _ethtool_ in the function name is also a good hit > this is intended to be used for an ethtool callback. > > Lets remove the inappropriate use of phy_ethtool_get_link_ksettings() > here. 100% agreed. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] phy: fix possible double lock calling link changed handler 2021-11-23 14:09 ` Andrew Lunn 2021-11-23 14:14 ` Russell King (Oracle) @ 2021-11-23 15:58 ` Alessandro B Maurici 1 sibling, 0 replies; 11+ messages in thread From: Alessandro B Maurici @ 2021-11-23 15:58 UTC (permalink / raw) To: Andrew Lunn; +Cc: netdev, Heiner Kallweit, Russell King, David S. Miller On Tue, 23 Nov 2021 15:09:04 +0100 Andrew Lunn <andrew@lunn.ch> wrote: > The callback has always had the lock held, so is safe. However, > recently a few bugs have been reported and fixed for functions like > phy_ethtool_get_link_ksettings() and phy_ethtool_set_link_ksettings() > where they have accessed phydev members without the lock and got > inconsistent values in race condition. These are hard race conditions > to reproduce, but a deadlock like this is very obvious, easy to fix. I > would also say that _ethtool_ in the function name is also a good hit > this is intended to be used for an ethtool callback. > > Lets remove the inappropriate use of phy_ethtool_get_link_ksettings() > here. > > Andrew Yes, I was under the impression because the lan743x driver used that way, this was an expected use case, and that why the patch, but you are 100% correct that the phy_dev information sent to the call back would be unprotected if used with that patch. My mistake. Alessandro ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2021-11-23 22:31 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-11-23 2:55 [PATCH] phy: fix possible double lock calling link changed handler Alessandro B Maurici 2021-11-23 4:18 ` Andrew Lunn 2021-11-23 4:49 ` Alessandro B Maurici 2021-11-23 8:21 ` Heiner Kallweit 2021-11-23 14:11 ` Andrew Lunn 2021-11-23 16:06 ` Alessandro B Maurici 2021-11-23 20:32 ` Heiner Kallweit 2021-11-23 22:31 ` Alessandro B Maurici 2021-11-23 14:09 ` Andrew Lunn 2021-11-23 14:14 ` Russell King (Oracle) 2021-11-23 15:58 ` Alessandro B Maurici
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).