netdev.vger.kernel.org archive mirror
* [PATCH net-next v2] net: phy: Fix suspicious rcu_dereference usage
@ 2025-01-17 17:36 Kory Maincent
  2025-01-17 19:06 ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Kory Maincent @ 2025-01-17 17:36 UTC (permalink / raw)
  To: Kory Maincent, David S. Miller, netdev, linux-kernel
  Cc: Claudiu Beznea, thomas.petazzoni, Andrew Lunn, Heiner Kallweit,
	Russell King, Eric Dumazet, Jakub Kicinski, Paolo Abeni

The phy_detach function can be called with or without the rtnl lock held.
When the rtnl lock is not held, using rtnl_dereference() triggers a
warning due to the lack of lock context.

Add an rcu_read_lock() to ensure the lock is acquired and to maintain
synchronization.

Tested-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Reported-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Closes: https://lore.kernel.org/netdev/4c6419d8-c06b-495c-b987-d66c2e1ff848@tuxon.dev/
Fixes: 35f7cad1743e ("net: Add the possibility to support a selected hwtstamp in netdevice")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
---

Changes in v2:
- Add a missing ;
---
 drivers/net/phy/phy_device.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 5b34d39d1d52..3eeee7cba923 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -2001,12 +2001,14 @@ void phy_detach(struct phy_device *phydev)
 	if (dev) {
 		struct hwtstamp_provider *hwprov;
 
-		hwprov = rtnl_dereference(dev->hwprov);
+		rcu_read_lock();
+		hwprov = rcu_dereference(dev->hwprov);
 		/* Disable timestamp if it is the one selected */
 		if (hwprov && hwprov->phydev == phydev) {
 			rcu_assign_pointer(dev->hwprov, NULL);
 			kfree_rcu(hwprov, rcu_head);
 		}
+		rcu_read_unlock();
 
 		phydev->attached_dev->phydev = NULL;
 		phydev->attached_dev = NULL;
-- 
2.34.1



* Re: [PATCH net-next v2] net: phy: Fix suspicious rcu_dereference usage
  2025-01-17 17:36 [PATCH net-next v2] net: phy: Fix suspicious rcu_dereference usage Kory Maincent
@ 2025-01-17 19:06 ` Eric Dumazet
  2025-01-17 22:16   ` Kory Maincent
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2025-01-17 19:06 UTC (permalink / raw)
  To: Kory Maincent
  Cc: David S. Miller, netdev, linux-kernel, Claudiu Beznea,
	thomas.petazzoni, Andrew Lunn, Heiner Kallweit, Russell King,
	Jakub Kicinski, Paolo Abeni

On Fri, Jan 17, 2025 at 6:36 PM Kory Maincent <kory.maincent@bootlin.com> wrote:
>
> The phy_detach function can be called with or without the rtnl lock held.
> When the rtnl lock is not held, using rtnl_dereference() triggers a
> warning due to the lack of lock context.
>
> Add an rcu_read_lock() to ensure the lock is acquired and to maintain
> synchronization.
>
> Tested-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> Reported-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> Closes: https://lore.kernel.org/netdev/4c6419d8-c06b-495c-b987-d66c2e1ff848@tuxon.dev/
> Fixes: 35f7cad1743e ("net: Add the possibility to support a selected hwtstamp in netdevice")
> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
> ---
>
> Changes in v2:
> - Add a missing ;
> ---
>  drivers/net/phy/phy_device.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 5b34d39d1d52..3eeee7cba923 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -2001,12 +2001,14 @@ void phy_detach(struct phy_device *phydev)
>         if (dev) {
>                 struct hwtstamp_provider *hwprov;
>
> -               hwprov = rtnl_dereference(dev->hwprov);
> +               rcu_read_lock();
> +               hwprov = rcu_dereference(dev->hwprov);
>                 /* Disable timestamp if it is the one selected */
>                 if (hwprov && hwprov->phydev == phydev) {
>                         rcu_assign_pointer(dev->hwprov, NULL);
>                         kfree_rcu(hwprov, rcu_head);
>                 }
> +               rcu_read_unlock();
>
>                 phydev->attached_dev->phydev = NULL;
>                 phydev->attached_dev = NULL;
> --
> 2.34.1
>

If not protected by RTNL, what prevents two threads from calling this
function at the same time, thus attempting to kfree_rcu() the same
pointer twice?
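
The concern can be modelled in userspace. The following is a minimal
sketch, not kernel code: a pthread mutex stands in for RTNL, `selected`
stands in for dev->hwprov, and free() stands in for kfree_rcu(). All
names in it (cfg_lock, detach_model, run_detach_race) are hypothetical.
It only illustrates that when the check and the clear happen under one
lock, at most one of two concurrent callers can see a non-NULL pointer
and release it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Userspace stand-ins; none of these are kernel APIs. */
struct provider {
	int id;
};

static pthread_mutex_t cfg_lock = PTHREAD_MUTEX_INITIALIZER; /* models RTNL */
static struct provider *selected;	/* models dev->hwprov */
static int free_count;			/* how many callers released the provider */

/* Check-and-clear under the lock: only one caller can observe non-NULL. */
static void *detach_model(void *arg)
{
	struct provider *p;

	(void)arg;
	pthread_mutex_lock(&cfg_lock);
	p = selected;
	if (p) {
		selected = NULL;
		free_count++;
	}
	pthread_mutex_unlock(&cfg_lock);

	free(p);	/* free(NULL) is a no-op for the losing caller */
	return NULL;
}

/* Run two concurrent detach calls; return how many times the provider was freed. */
static int run_detach_race(void)
{
	pthread_t a, b;

	free_count = 0;
	selected = malloc(sizeof(*selected));
	assert(selected != NULL);

	pthread_create(&a, NULL, detach_model, NULL);
	pthread_create(&b, NULL, detach_model, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	return free_count;
}
```

Calling run_detach_race() returns 1 regardless of which thread wins.
Without the lock around the check and the clear, both threads could
read the same non-NULL pointer and both release it, which is the double
kfree_rcu() scenario asked about above. Build with -pthread.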


* Re: [PATCH net-next v2] net: phy: Fix suspicious rcu_dereference usage
  2025-01-17 19:06 ` Eric Dumazet
@ 2025-01-17 22:16   ` Kory Maincent
  2025-01-18  3:07     ` Jakub Kicinski
  0 siblings, 1 reply; 9+ messages in thread
From: Kory Maincent @ 2025-01-17 22:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, netdev, linux-kernel, Claudiu Beznea,
	thomas.petazzoni, Andrew Lunn, Heiner Kallweit, Russell King,
	Jakub Kicinski, Paolo Abeni

On Fri, 17 Jan 2025 20:06:28 +0100
Eric Dumazet <edumazet@google.com> wrote:

> On Fri, Jan 17, 2025 at 6:36 PM Kory Maincent <kory.maincent@bootlin.com>
> wrote:
> >
> > The phy_detach function can be called with or without the rtnl lock held.
> > When the rtnl lock is not held, using rtnl_dereference() triggers a
> > warning due to the lack of lock context.
> >
> > Add an rcu_read_lock() to ensure the lock is acquired and to maintain
> > synchronization.
> >
> > Tested-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> > Reported-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> > Closes: https://lore.kernel.org/netdev/4c6419d8-c06b-495c-b987-d66c2e1ff848@tuxon.dev/
> > Fixes: 35f7cad1743e ("net: Add the possibility to support a selected hwtstamp in netdevice")
> > Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
> > ---
> >
> > Changes in v2:
> > - Add a missing ;
> > ---
> >  drivers/net/phy/phy_device.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> > index 5b34d39d1d52..3eeee7cba923 100644
> > --- a/drivers/net/phy/phy_device.c
> > +++ b/drivers/net/phy/phy_device.c
> > @@ -2001,12 +2001,14 @@ void phy_detach(struct phy_device *phydev)
> >         if (dev) {
> >                 struct hwtstamp_provider *hwprov;
> >
> > -               hwprov = rtnl_dereference(dev->hwprov);
> > +               rcu_read_lock();
> > +               hwprov = rcu_dereference(dev->hwprov);
> >                 /* Disable timestamp if it is the one selected */
> >                 if (hwprov && hwprov->phydev == phydev) {
> >                         rcu_assign_pointer(dev->hwprov, NULL);
> >                         kfree_rcu(hwprov, rcu_head);
> >                 }
> > +               rcu_read_unlock();
> >
> >                 phydev->attached_dev->phydev = NULL;
> >                 phydev->attached_dev = NULL;
> > --
> > 2.34.1
> >  
> 
> If not protected by RTNL, what prevents two threads from calling this
> function at the same time,
> thus attempting to kfree_rcu() the same pointer twice ?

I don't think this function can be called simultaneously from two threads.
If it could, we would already have seen several issues with the phydev
pointer. But maybe I am wrong.

The rcu_lock here is to prevent concurrent dev->hwprov pointer modification done
under rtnl_lock in net/ethtool/tsconfig.c.

Regards,
-- 
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com


* Re: [PATCH net-next v2] net: phy: Fix suspicious rcu_dereference usage
  2025-01-17 22:16   ` Kory Maincent
@ 2025-01-18  3:07     ` Jakub Kicinski
  2025-01-19 12:45       ` Kory Maincent
  2025-01-20  9:37       ` Kory Maincent
  0 siblings, 2 replies; 9+ messages in thread
From: Jakub Kicinski @ 2025-01-18  3:07 UTC (permalink / raw)
  To: Kory Maincent
  Cc: Eric Dumazet, David S. Miller, netdev, linux-kernel,
	Claudiu Beznea, thomas.petazzoni, Andrew Lunn, Heiner Kallweit,
	Russell King, Paolo Abeni

On Fri, 17 Jan 2025 23:16:59 +0100 Kory Maincent wrote:
> > If not protected by RTNL, what prevents two threads from calling this
> > function at the same time,
> > thus attempting to kfree_rcu() the same pointer twice ?  
> 
> I don't think this function can be called simultaneously from two threads,
> if this were the case we would have already seen several issues with the phydev
> pointer. But maybe I am wrong.
> 
> The rcu_lock here is to prevent concurrent dev->hwprov pointer modification done
> under rtnl_lock in net/ethtool/tsconfig.c.

I could also be wrong, but I don't recall being told that suspend path
can't race with anything else. So I think ravb should probably take
rtnl_lock or some such when it's shutting itself down?

If I'm wrong I think we should mention this is from suspend and
add Claudiu's stack trace to the commit msg.
-- 
pw-bot: cr


* Re: [PATCH net-next v2] net: phy: Fix suspicious rcu_dereference usage
  2025-01-18  3:07     ` Jakub Kicinski
@ 2025-01-19 12:45       ` Kory Maincent
  2025-01-19 14:27         ` Russell King (Oracle)
  2025-01-20  9:37       ` Kory Maincent
  1 sibling, 1 reply; 9+ messages in thread
From: Kory Maincent @ 2025-01-19 12:45 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Eric Dumazet, David S. Miller, netdev, linux-kernel,
	Claudiu Beznea, thomas.petazzoni, Andrew Lunn, Heiner Kallweit,
	Russell King, Paolo Abeni

On Fri, 17 Jan 2025 19:07:20 -0800
Jakub Kicinski <kuba@kernel.org> wrote:

> On Fri, 17 Jan 2025 23:16:59 +0100 Kory Maincent wrote:
> > > If not protected by RTNL, what prevents two threads from calling this
> > > function at the same time,
> > > thus attempting to kfree_rcu() the same pointer twice ?    
> > 
> > I don't think this function can be called simultaneously from two threads,
> > if this were the case we would have already seen several issues with the
> > phydev pointer. But maybe I am wrong.
> > 
> > The rcu_lock here is to prevent concurrent dev->hwprov pointer modification
> > done under rtnl_lock in net/ethtool/tsconfig.c.  
> 
> I could also be wrong, but I don't recall being told that suspend path
> can't race with anything else. So I think ravb should probably take
> rtnl_lock or some such when it's shutting itself down?

Should we add an ASSERT_RTNL call in the phy_detach function? (Maybe
also in phy_attach to be consistent)
Even so, I think it may raise lots of warnings from other NIC drivers.
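
To illustrate what such an assertion would flag, here is a small
userspace model. It is a sketch under stated assumptions, not kernel
code: the real ASSERT_RTNL() is provided by <linux/rtnetlink.h> and
emits a kernel warning when RTNL is not held, whereas every name below
(model_rtnl_lock, MODEL_ASSERT_RTNL, model_phy_detach, ...) is a
hypothetical stand-in, and the lock is reduced to a plain flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-ins only; the real helpers live in <linux/rtnetlink.h>. */
static bool model_rtnl_held;	/* models rtnl_is_locked() */
static int model_warnings;	/* number of failed lock assertions */

static void model_rtnl_lock(void)
{
	model_rtnl_held = true;
}

static void model_rtnl_unlock(void)
{
	assert(model_rtnl_held);	/* unlocking an unheld lock is a bug */
	model_rtnl_held = false;
}

/* Models ASSERT_RTNL(): complain, but keep going, if the lock is not held. */
#define MODEL_ASSERT_RTNL()						\
	do {								\
		if (!model_rtnl_held) {					\
			model_warnings++;				\
			fprintf(stderr, "RTNL not held at %s:%d\n",	\
				__FILE__, __LINE__);			\
		}							\
	} while (0)

static void model_phy_detach(void)
{
	MODEL_ASSERT_RTNL();
	/* ... teardown would go here ... */
}

/* One caller that holds the lock and one that does not (e.g. a suspend path). */
static int model_count_warnings(void)
{
	model_warnings = 0;

	model_rtnl_lock();
	model_phy_detach();	/* lock held: no warning */
	model_rtnl_unlock();

	model_phy_detach();	/* lock not held: one warning */
	return model_warnings;
}
```

model_count_warnings() returns 1: only the unlocked call trips the
check. In the real kernel, every driver path that reaches phy_detach()
without RTNL would produce such a warning, which is why the number of
affected callers matters.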
 
> If I'm wrong I think we should mention this is from suspend and
> add Claudiu's stack trace to the commit msg.

Ack.

Regards,
-- 
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com


* Re: [PATCH net-next v2] net: phy: Fix suspicious rcu_dereference usage
  2025-01-19 12:45       ` Kory Maincent
@ 2025-01-19 14:27         ` Russell King (Oracle)
  2025-01-19 16:27           ` Kory Maincent
  0 siblings, 1 reply; 9+ messages in thread
From: Russell King (Oracle) @ 2025-01-19 14:27 UTC (permalink / raw)
  To: Kory Maincent
  Cc: Jakub Kicinski, Eric Dumazet, David S. Miller, netdev,
	linux-kernel, Claudiu Beznea, thomas.petazzoni, Andrew Lunn,
	Heiner Kallweit, Paolo Abeni

On Sun, Jan 19, 2025 at 01:45:18PM +0100, Kory Maincent wrote:
> On Fri, 17 Jan 2025 19:07:20 -0800
> Jakub Kicinski <kuba@kernel.org> wrote:
> 
> > On Fri, 17 Jan 2025 23:16:59 +0100 Kory Maincent wrote:
> > > > If not protected by RTNL, what prevents two threads from calling this
> > > > function at the same time,
> > > > thus attempting to kfree_rcu() the same pointer twice ?    
> > > 
> > > I don't think this function can be called simultaneously from two threads,
> > > if this were the case we would have already seen several issues with the
> > > phydev pointer. But maybe I am wrong.
> > > 
> > > The rcu_lock here is to prevent concurrent dev->hwprov pointer modification
> > > done under rtnl_lock in net/ethtool/tsconfig.c.  
> > 
> > I could also be wrong, but I don't recall being told that suspend path
> > can't race with anything else. So I think ravb should probably take
> > rtnl_lock or some such when it's shutting itself down?
> 
> Should we add an ASSERT_RTNL call in the phy_detach function? (Maybe
> also in phy_attach to be consistent)
> Even so, I think it may raise lots of warnings from other NIC drivers.

How many drivers use phy_detach() ?

The answer is... phylink, bcm genet and xgbe.

Of the phylink ones:

1. phylink_connect_phy() - for use by drivers. This had better be
   called _before_ the netdev is registered (without rtnl) or
   from .ndo_open that holds the RTNL.

2. phylink_fwnode_phy_connect() - same as above.

3. phylink_sfp_config_phy(), called from the SFP code, and its state
   machines. It will be holding RTNL, because it is only safe to
   attach and detach PHYs from a registered netdev while holding RTNL.

I haven't looked any further.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


* Re: [PATCH net-next v2] net: phy: Fix suspicious rcu_dereference usage
  2025-01-19 14:27         ` Russell King (Oracle)
@ 2025-01-19 16:27           ` Kory Maincent
  0 siblings, 0 replies; 9+ messages in thread
From: Kory Maincent @ 2025-01-19 16:27 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Jakub Kicinski, Eric Dumazet, David S. Miller, netdev,
	linux-kernel, Claudiu Beznea, thomas.petazzoni, Andrew Lunn,
	Heiner Kallweit, Paolo Abeni

On Sun, 19 Jan 2025 14:27:53 +0000
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Sun, Jan 19, 2025 at 01:45:18PM +0100, Kory Maincent wrote:
> > On Fri, 17 Jan 2025 19:07:20 -0800
> > Jakub Kicinski <kuba@kernel.org> wrote:
> >   
> > > On Fri, 17 Jan 2025 23:16:59 +0100 Kory Maincent wrote:  
>  [...]  
>  [...]  
> > > 
> > > I could also be wrong, but I don't recall being told that suspend path
> > > can't race with anything else. So I think ravb should probably take
> > > rtnl_lock or some such when it's shutting itself down?
> > 
> > Should we add an ASSERT_RTNL call in the phy_detach function? (Maybe
> > also in phy_attach to be consistent)
> > Even so, I think it may raise lots of warnings from other NIC drivers.
> 
> How many drivers use phy_detach() ?
> 
> The answer is... phylink, bcm genet and xgbe.

phy_detach() is also called by phy_disconnect(), which is used far more
widely by network drivers.

> Of the phylink ones:
> 
> 1. phylink_connect_phy() - for use by drivers. This had better be
>    called _before_ the netdev is registered (without rtnl) or
>    from .ndo_open that holds the RTNL.
> 
> 2. phylink_fwnode_phy_connect() - same as above.
> 
> 3. phylink_sfp_config_phy(), called from the SFP code, and its state
>    machines. It will be holding RTNL, because it is only safe to
>    attach and detach PHYs from a registered netdev while holding RTNL.
> 
> I haven't looked any further.
> 



-- 
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com


* Re: [PATCH net-next v2] net: phy: Fix suspicious rcu_dereference usage
  2025-01-18  3:07     ` Jakub Kicinski
  2025-01-19 12:45       ` Kory Maincent
@ 2025-01-20  9:37       ` Kory Maincent
  2025-01-20 10:31         ` Russell King (Oracle)
  1 sibling, 1 reply; 9+ messages in thread
From: Kory Maincent @ 2025-01-20  9:37 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Eric Dumazet, David S. Miller, netdev, linux-kernel,
	Claudiu Beznea, thomas.petazzoni, Andrew Lunn, Heiner Kallweit,
	Russell King, Paolo Abeni

On Fri, 17 Jan 2025 19:07:20 -0800
Jakub Kicinski <kuba@kernel.org> wrote:

> On Fri, 17 Jan 2025 23:16:59 +0100 Kory Maincent wrote:
> > > If not protected by RTNL, what prevents two threads from calling this
> > > function at the same time,
> > > thus attempting to kfree_rcu() the same pointer twice ?    
> > 
> > I don't think this function can be called simultaneously from two threads,
> > if this were the case we would have already seen several issues with the
> > phydev pointer. But maybe I am wrong.
> > 
> > The rcu_lock here is to prevent concurrent dev->hwprov pointer modification
> > done under rtnl_lock in net/ethtool/tsconfig.c.  
> 
> I could also be wrong, but I don't recall being told that suspend path
> can't race with anything else. So I think ravb should probably take
> rtnl_lock or some such when it's shutting itself down?
> 
> If I'm wrong I think we should mention this is from suspend and
> add Claudiu's stack trace to the commit msg.

Is it ok if I send the v3 fix in net-next even if it is closed?

Regards,
-- 
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com


* Re: [PATCH net-next v2] net: phy: Fix suspicious rcu_dereference usage
  2025-01-20  9:37       ` Kory Maincent
@ 2025-01-20 10:31         ` Russell King (Oracle)
  0 siblings, 0 replies; 9+ messages in thread
From: Russell King (Oracle) @ 2025-01-20 10:31 UTC (permalink / raw)
  To: Kory Maincent
  Cc: Jakub Kicinski, Eric Dumazet, David S. Miller, netdev,
	linux-kernel, Claudiu Beznea, thomas.petazzoni, Andrew Lunn,
	Heiner Kallweit, Paolo Abeni

On Mon, Jan 20, 2025 at 10:37:22AM +0100, Kory Maincent wrote:
> On Fri, 17 Jan 2025 19:07:20 -0800
> Jakub Kicinski <kuba@kernel.org> wrote:
> 
> > On Fri, 17 Jan 2025 23:16:59 +0100 Kory Maincent wrote:
> > > > If not protected by RTNL, what prevents two threads from calling this
> > > > function at the same time,
> > > > thus attempting to kfree_rcu() the same pointer twice ?    
> > > 
> > > I don't think this function can be called simultaneously from two threads,
> > > if this were the case we would have already seen several issues with the
> > > phydev pointer. But maybe I am wrong.
> > > 
> > > The rcu_lock here is to prevent concurrent dev->hwprov pointer modification
> > > done under rtnl_lock in net/ethtool/tsconfig.c.  
> > 
> > I could also be wrong, but I don't recall being told that suspend path
> > can't race with anything else. So I think ravb should probably take
> > rtnl_lock or some such when it's shutting itself down?
> > 
> > If I'm wrong I think we should mention this is from suspend and
> > add Claudiu's stack trace to the commit msg.
> 
> Is it ok if I send the v3 fix in net-next even if it is closed?

In general, fixes are still accepted into net-next if the pull request
hasn't been sent and the code that is being fixed is only in net-next.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

