devicetree.vger.kernel.org archive mirror
* [PATCH v3 0/9] Phy, mdiobus, and netdev struct device fixes
@ 2015-09-24 19:17 Russell King - ARM Linux
  2015-09-24 19:18 ` [PATCH 5/9] of_mdio: fix MDIO phy device refcounting Russell King
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Russell King - ARM Linux @ 2015-09-24 19:17 UTC (permalink / raw)
  To: Florian Fainelli, David Miller
  Cc: Thomas Petazzoni, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Sunil Goutham, Robert Richter, Frank Rowand,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Rob Herring, Michal Simek,
	netdev-u79uwXL29TY76Z2rM5mHXA, Soren Brinkmann,
	Iyappan Subramanian, Grant Likely, Li Yang, Keyur Chudgar,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hi,

The third version of this series fixes the build error which David
identified, and drops the broken changes for the Cavium Thunder BGX
ethernet driver as this driver requires some complex changes to
resolve the leakage - and this is best done by people who can test
the driver.

Compared to v2, the only patch which has changed is patch 6
  "net: fix phy refcounting in a bunch of drivers"

I _think_ I've been able to build-test all the drivers touched by
that patch to some degree now, though several of them needed the
Kconfig hacked to allow it (not all had || COMPILE_TEST clause on
their dependencies.)

Previous cover letters below:

This is the second version of the series, with the comments David had
on the first patch fixed up.  Original series description with updated
diffstat below.

While looking at the DSA code, I noticed we have a
of_find_net_device_by_node(), and it looks like users of that are
similarly buggy - it looks like net/dsa/dsa.c is the only user.  Fix
that too.

Hi,

While looking at the phy code, I identified a number of weaknesses
where refcounting on device structures was being leaked, where
modules could be removed while in-use, and where the fixed-phy could
end up having unintended consequences caused by incorrect calls to
fixed_phy_update_state().

This patch series resolves those issues, some of which were discovered
with testing on an Armada 388 board.  Not all patches are fully tested,
particularly the one which touches several network drivers.

When resolving the struct device refcounting problems, several different
solutions were considered before settling on the implementation here -
one of the considerations was to avoid touching many network drivers.
The solution here is:

	phy_attach*() - takes a refcount
	phy_detach*() - drops the phy_attach refcount

Provided drivers always attach and detach their phys, which they should
already be doing, this should change nothing, even if they leak a refcount.

	of_phy_find_device() and of_* functions which use that take
	a refcount.  Arrange for this refcount to be dropped once
	the phy is attached.

This is the reason why the previous change is important - we can't drop
this refcount taken by of_phy_find_device() until something else holds
a reference on the device.  This resolves the leaked refcount caused by
using of_phy_connect() or of_phy_attach().
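
As a rough illustration of the refcount handoff described above, here is a
userspace sketch in C. It is not kernel code: every model_* name is
hypothetical and a plain int stands in for the struct device refcount. It
only shows why the lookup reference can be dropped straight after the
attach call, whether or not the attach succeeded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the struct device refcount. */
struct model_device {
	int refcount;
};

struct model_phy {
	struct model_device dev;
	int attached;
};

void model_get(struct model_device *d)
{
	d->refcount++;
}

void model_put(struct model_device *d)
{
	assert(d->refcount > 0);
	d->refcount--;
}

/* Lookup hands back the PHY with one extra reference held for the caller. */
struct model_phy *model_find(struct model_phy *phy)
{
	if (phy)
		model_get(&phy->dev);
	return phy;
}

/* Attach takes its own reference, but only when it succeeds. */
int model_attach(struct model_phy *phy, int fail)
{
	if (fail)
		return -1;
	model_get(&phy->dev);
	phy->attached = 1;
	return 0;
}

/* Detach drops the reference taken by a successful attach. */
void model_detach(struct model_phy *phy)
{
	phy->attached = 0;
	model_put(&phy->dev);
}

/* Find, attach, then drop the lookup reference whatever attach returned. */
struct model_phy *model_of_attach(struct model_phy *phy, int fail)
{
	struct model_phy *found = model_find(phy);
	int ret;

	if (!found)
		return NULL;

	ret = model_attach(found, fail);
	model_put(&found->dev);

	return ret ? NULL : found;
}
```

In this model a failed attach leaves the count where it started, and a
successful attach leaves exactly one extra reference, which the matching
detach releases.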

Even without the above changes, these drivers are leaking by calling
of_phy_find_device().  These drivers are addressed by adding the
appropriate release of that refcount.

The mdiobus code also suffered from the same kind of leak, but thankfully
this only happened in one place - the mdio-mux code.

I also found that the try_module_get() in the phy layer code was utterly
useless: phydev->dev.driver was guaranteed to always be NULL, so
try_module_get() was always being called with a NULL argument.  I proved
this with my SFP code, which declares its own MDIO bus - the module use
count was never incremented irrespective of how I set the MDIO bus up.
This allowed the MDIO bus code to be removed from the kernel while there
were still PHYs attached to it.
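
The failure mode above can be modelled in userspace C. This is a sketch
under assumptions, not the kernel implementation: the model_* names are
hypothetical, and the helper simply copies the behaviour described here,
where a NULL module pointer reports success without counting anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a loadable module and its use count. */
struct model_module {
	int use_count;
	bool live;
};

/* A NULL module reports success but pins nothing. */
bool model_try_module_get(struct model_module *mod)
{
	if (!mod)
		return true;
	if (!mod->live)
		return false;
	mod->use_count++;
	return true;
}

void model_module_put(struct model_module *mod)
{
	if (!mod)
		return;
	assert(mod->use_count > 0);
	mod->use_count--;
}

struct model_driver {
	struct model_module *owner;
};

struct model_device {
	struct model_driver *driver;
};

/*
 * Finding the owner through the driver pointer silently does nothing
 * when no driver is bound to the device.
 */
bool model_pin_via_driver(struct model_device *dev)
{
	struct model_module *owner = dev->driver ? dev->driver->owner : NULL;

	return model_try_module_get(owner);
}
```

With a device whose driver pointer is NULL, model_pin_via_driver() returns
true while the use count stays at zero, which matches the observation that
the module use count was never incremented.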

One other bug was discovered: while using in-band-status with mvneta, it
was found that if a real phy is attached with in-band-status enabled,
and another ethernet interface is using the fixed-phy infrastructure, the
interface using the fixed-phy infrastructure is configured according to
the other interface using the in-band-status - which is caused by the
fixed-phy code not verifying that the phy_device passed in is actually
a fixed-phy device, rather than a real MDIO phy.
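
A userspace sketch of this class of bug follows. It assumes, purely for
illustration, that the fixed-phy table is searched by PHY address alone;
all model_* names are hypothetical. The checked variant shows one possible
guard, comparing the PHY's bus against the fixed bus before touching the
table. It is not presented as the change made in this series.

```c
#include <assert.h>
#include <stddef.h>

struct model_bus {
	int id;
};

struct model_phy {
	struct model_bus *bus;
	int addr;
};

struct model_fixed_phy {
	int addr;
	int link;
};

/* Hypothetical fixed-phy bus: one bus plus a table of emulated PHYs. */
struct model_fixed_bus {
	struct model_bus bus;
	struct model_fixed_phy *phys;
	size_t nphys;
};

/* Unchecked: any PHY with a matching address updates the fixed entry. */
int model_update_by_addr(struct model_fixed_bus *fmb,
			 const struct model_phy *phy, int link)
{
	size_t i;

	assert(fmb != NULL);
	if (!phy || !phy->bus)
		return -1;

	for (i = 0; i < fmb->nphys; i++) {
		if (fmb->phys[i].addr == phy->addr) {
			fmb->phys[i].link = link;
			return 0;
		}
	}
	return -1;
}

/* Checked: refuse PHYs that do not sit on the fixed bus. */
int model_update_checked(struct model_fixed_bus *fmb,
			 const struct model_phy *phy, int link)
{
	assert(fmb != NULL);
	if (!phy || phy->bus != &fmb->bus)
		return -1;

	return model_update_by_addr(fmb, phy, link);
}
```

A real PHY at address 0 on another bus changes the fixed entry at address 0
through the unchecked path, and is rejected by the checked one.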

Lastly, having mdio_bus reversing phy_device_register() internals seems
like a layering violation - it's trivial to move that code to the phy
device layer.

 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 24 ++++++----
 drivers/net/ethernet/freescale/gianfar.c       |  6 ++-
 drivers/net/ethernet/freescale/ucc_geth.c      |  8 +++-
 drivers/net/ethernet/marvell/mvneta.c          |  2 +
 drivers/net/ethernet/xilinx/xilinx_emaclite.c  |  2 +
 drivers/net/phy/fixed_phy.c                    |  2 +-
 drivers/net/phy/mdio-mux.c                     | 19 +++++---
 drivers/net/phy/mdio_bus.c                     | 24 ++++++----
 drivers/net/phy/phy_device.c                   | 62 ++++++++++++++++++++------
 drivers/of/of_mdio.c                           | 27 +++++++++--
 include/linux/phy.h                            |  6 ++-
 net/core/net-sysfs.c                           |  9 ++++
 net/dsa/dsa.c                                  | 41 ++++++++++++++---
 13 files changed, 181 insertions(+), 51 deletions(-)

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH 5/9] of_mdio: fix MDIO phy device refcounting
  2015-09-24 19:17 [PATCH v3 0/9] Phy, mdiobus, and netdev struct device fixes Russell King - ARM Linux
@ 2015-09-24 19:18 ` Russell King
  2015-09-24 22:20   ` Rob Herring
  2015-09-24 21:57 ` [PATCH v3 0/9] Phy, mdiobus, and netdev struct device fixes Andrew Lunn
  2015-09-25  1:39 ` Florian Fainelli
  2 siblings, 1 reply; 13+ messages in thread
From: Russell King @ 2015-09-24 19:18 UTC (permalink / raw)
  To: Jason Cooper, Thomas Petazzoni
  Cc: Florian Fainelli, Rob Herring, Frank Rowand, Grant Likely, netdev,
	devicetree

bus_find_device() is defined as:

 * This is similar to the bus_for_each_dev() function above, but it
 * returns a reference to a device that is 'found' for later use, as
 * determined by the @match callback.

and it does indeed return a reference-counted pointer to the device:

        while ((dev = next_device(&i)))
                if (match(dev, data) && get_device(dev))
                                        ^^^^^^^^^^^^^^^
                        break;
        klist_iter_exit(&i);
        return dev;

What that means is that when we're done with the struct device, we must
drop that reference.  Neither of_phy_connect() nor of_phy_attach() did
this when phy_connect_direct() or phy_attach_direct() failed.

With our previous patch, phy_connect_direct() and phy_attach_direct()
take a new refcount on the phy device when successful, so we can drop
our local reference immediately after these functions, whether or not
they succeeded.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 drivers/of/of_mdio.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
index 1350fa25cdb0..a87a868fed64 100644
--- a/drivers/of/of_mdio.c
+++ b/drivers/of/of_mdio.c
@@ -197,7 +197,8 @@ static int of_phy_match(struct device *dev, void *phy_np)
  * of_phy_find_device - Give a PHY node, find the phy_device
  * @phy_np: Pointer to the phy's device tree node
  *
- * Returns a pointer to the phy_device.
+ * If successful, returns a pointer to the phy_device with the embedded
+ * struct device refcount incremented by one, or NULL on failure.
  */
 struct phy_device *of_phy_find_device(struct device_node *phy_np)
 {
@@ -217,7 +218,9 @@ EXPORT_SYMBOL(of_phy_find_device);
  * @hndlr: Link state callback for the network device
  * @iface: PHY data interface type
  *
- * Returns a pointer to the phy_device if successful.  NULL otherwise
+ * If successful, returns a pointer to the phy_device with the embedded
+ * struct device refcount incremented by one, or NULL on failure. The
+ * refcount must be dropped by calling phy_disconnect() or phy_detach().
  */
 struct phy_device *of_phy_connect(struct net_device *dev,
 				  struct device_node *phy_np,
@@ -225,13 +228,19 @@ struct phy_device *of_phy_connect(struct net_device *dev,
 				  phy_interface_t iface)
 {
 	struct phy_device *phy = of_phy_find_device(phy_np);
+	int ret;
 
 	if (!phy)
 		return NULL;
 
 	phy->dev_flags = flags;
 
-	return phy_connect_direct(dev, phy, hndlr, iface) ? NULL : phy;
+	ret = phy_connect_direct(dev, phy, hndlr, iface);
+
+	/* refcount is held by phy_connect_direct() on success */
+	put_device(&phy->dev);
+
+	return ret ? NULL : phy;
 }
 EXPORT_SYMBOL(of_phy_connect);
 
@@ -241,17 +250,27 @@ EXPORT_SYMBOL(of_phy_connect);
  * @phy_np: Node pointer for the PHY
  * @flags: flags to pass to the PHY
  * @iface: PHY data interface type
+ *
+ * If successful, returns a pointer to the phy_device with the embedded
+ * struct device refcount incremented by one, or NULL on failure. The
+ * refcount must be dropped by calling phy_disconnect() or phy_detach().
  */
 struct phy_device *of_phy_attach(struct net_device *dev,
 				 struct device_node *phy_np, u32 flags,
 				 phy_interface_t iface)
 {
 	struct phy_device *phy = of_phy_find_device(phy_np);
+	int ret;
 
 	if (!phy)
 		return NULL;
 
-	return phy_attach_direct(dev, phy, flags, iface) ? NULL : phy;
+	ret = phy_attach_direct(dev, phy, flags, iface);
+
+	/* refcount is held by phy_attach_direct() on success */
+	put_device(&phy->dev);
+
+	return ret ? NULL : phy;
 }
 EXPORT_SYMBOL(of_phy_attach);
 
-- 
2.1.0


* Re: [PATCH v3 0/9] Phy, mdiobus, and netdev struct device fixes
  2015-09-24 19:17 [PATCH v3 0/9] Phy, mdiobus, and netdev struct device fixes Russell King - ARM Linux
  2015-09-24 19:18 ` [PATCH 5/9] of_mdio: fix MDIO phy device refcounting Russell King
@ 2015-09-24 21:57 ` Andrew Lunn
  2015-09-24 22:15   ` Russell King - ARM Linux
  2015-09-24 22:15   ` David Miller
  2015-09-25  1:39 ` Florian Fainelli
  2 siblings, 2 replies; 13+ messages in thread
From: Andrew Lunn @ 2015-09-24 21:57 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Florian Fainelli, David Miller, Thomas Petazzoni, devicetree,
	Sunil Goutham, Robert Richter, Frank Rowand, linuxppc-dev,
	linux-kernel, Rob Herring, Michal Simek, netdev, Soren Brinkmann,
	Iyappan Subramanian, Grant Likely, Li Yang, Keyur Chudgar,
	linux-arm-kernel

...

> While looking at the DSA code, I noticed we have a
> of_find_net_device_by_node(), and it looks like users of that are
> similarly buggy - it looks like net/dsa/dsa.c is the only user.  Fix
> that too.

...
 
> The mdiobus code also suffered from the same kind of leak, but thankfully
> this only happened in one place - the mdio-mux code.

Hi Russell

I tested both of these with my board. It is a Freescale Vybrid, using
the FEC ethernet driver, and i have three switches attached, using
mdio-mux to give three mdio busses.

No obvious regressions, my board boots, the switches are all present
and correct. I built the FEC driver as a module, and it won't unload:

 kernel:unregister_netdevice: waiting for eth1 to become free. Usage count = 1
unregister_netdevice: waiting for eth1 to become free. Usage count = 1

i assume because DSA holds a reference. I've not tried a fully module
build, DSA has issues with that.

Tested-by: Andrew Lunn <andrew@lunn.ch>

Thanks
	Andrew


* Re: [PATCH v3 0/9] Phy, mdiobus, and netdev struct device fixes
  2015-09-24 21:57 ` [PATCH v3 0/9] Phy, mdiobus, and netdev struct device fixes Andrew Lunn
@ 2015-09-24 22:15   ` Russell King - ARM Linux
       [not found]     ` <20150924221541.GF21513-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
  2015-09-24 22:15   ` David Miller
  1 sibling, 1 reply; 13+ messages in thread
From: Russell King - ARM Linux @ 2015-09-24 22:15 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, David Miller, Thomas Petazzoni, devicetree,
	Sunil Goutham, Robert Richter, Frank Rowand, linuxppc-dev,
	linux-kernel, Rob Herring, Michal Simek, netdev, Soren Brinkmann,
	Iyappan Subramanian, Grant Likely, Li Yang, Keyur Chudgar,
	linux-arm-kernel

On Thu, Sep 24, 2015 at 11:57:31PM +0200, Andrew Lunn wrote:
> Hi Russell
> 
> I tested both of these with my board. It is a Freescale Vybrid, using
> the FEC ethernet driver, and i have three switches attached, using
> mdio-mux to give three mdio busses.
> 
> No obvious regressions, my board boots, the switches are all present
> and correct. I built the FEC driver as a module, and it won't unload:
> 
>  kernel:unregister_netdevice: waiting for eth1 to become free. Usage count = 1
> unregister_netdevice: waiting for eth1 to become free. Usage count = 1
> 
> i assume because DSA holds a reference. I've not tried a fully module
> build, DSA has issues with that.
> 
> Tested-by: Andrew Lunn <andrew@lunn.ch>

Thanks for testing.  Please could you confirm whether the same behaviour
is observed without the patches, just to make absolutely sure that isn't
a regression.

However, I think you are correct - I'm unable to locate where in the
DSA code:
- dst->master_dev's dev_hold() is undone (hence a reference left)
- dst is freed - dsa_probe() allocates it using kzalloc(), but
  dsa_remove() and its children don't free this structure.

There's no notifier which detects whether the underlying device has
gone away - it registers a netdev notifier (dsa_slave_netdevice_event)
but this only deals with slave devices, not the master device.

Thanks.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


* Re: [PATCH v3 0/9] Phy, mdiobus, and netdev struct device fixes
  2015-09-24 21:57 ` [PATCH v3 0/9] Phy, mdiobus, and netdev struct device fixes Andrew Lunn
  2015-09-24 22:15   ` Russell King - ARM Linux
@ 2015-09-24 22:15   ` David Miller
       [not found]     ` <20150924.151554.619662567057050978.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  1 sibling, 1 reply; 13+ messages in thread
From: David Miller @ 2015-09-24 22:15 UTC (permalink / raw)
  To: andrew
  Cc: linux, f.fainelli, thomas.petazzoni, devicetree, sgoutham, rric,
	frowand.list, linuxppc-dev, linux-kernel, robh+dt, michal.simek,
	netdev, soren.brinkmann, isubramanian, grant.likely, leoli,
	kchudgar, linux-arm-kernel

From: Andrew Lunn <andrew@lunn.ch>
Date: Thu, 24 Sep 2015 23:57:31 +0200

> I built the FEC driver as a module, and it won't unload:
> 
>  kernel:unregister_netdevice: waiting for eth1 to become free. Usage count = 1
> unregister_netdevice: waiting for eth1 to become free. Usage count = 1
> 
> i assume because DSA holds a reference. I've not tried a fully module
> build, DSA has issues with that.
> 
> Tested-by: Andrew Lunn <andrew@lunn.ch>

So, is this a regression?

Please don't provide a "Tested-by: " tag if you encounter a new
problem which could have been introduced by the changes in question.
That _REALLY_ screws everything up for me.

Thanks.


* Re: [PATCH 5/9] of_mdio: fix MDIO phy device refcounting
  2015-09-24 19:18 ` [PATCH 5/9] of_mdio: fix MDIO phy device refcounting Russell King
@ 2015-09-24 22:20   ` Rob Herring
  0 siblings, 0 replies; 13+ messages in thread
From: Rob Herring @ 2015-09-24 22:20 UTC (permalink / raw)
  To: Russell King
  Cc: Jason Cooper, Thomas Petazzoni, Florian Fainelli, Rob Herring,
	Frank Rowand, Grant Likely, netdev, devicetree@vger.kernel.org



On Thu, Sep 24, 2015 at 2:18 PM, Russell King <rmk+kernel@arm.linux.org.uk> wrote:
> bus_find_device() is defined as:
>
>  * This is similar to the bus_for_each_dev() function above, but it
>  * returns a reference to a device that is 'found' for later use, as
>  * determined by the @match callback.
>
> and it does indeed return a reference-counted pointer to the device:
>
>         while ((dev = next_device(&i)))
>                 if (match(dev, data) && get_device(dev))
>                                         ^^^^^^^^^^^^^^^
>                         break;
>         klist_iter_exit(&i);
>         return dev;
>
> What that means is that when we're done with the struct device, we must
> drop that reference.  Neither of_phy_connect() nor of_phy_attach() did
> this when phy_connect_direct() or phy_attach_direct() failed.
>
> With our previous patch, phy_connect_direct() and phy_attach_direct()
> take a new refcount on the phy device when successful, so we can drop
> our local reference immediately after these functions, whether or not
> they succeeded.
>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> ---
>  drivers/of/of_mdio.c | 27 +++++++++++++++++++++++----
>  1 file changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
> index 1350fa25cdb0..a87a868fed64 100644
> --- a/drivers/of/of_mdio.c
> +++ b/drivers/of/of_mdio.c
> @@ -197,7 +197,8 @@ static int of_phy_match(struct device *dev, void *phy_np)
>   * of_phy_find_device - Give a PHY node, find the phy_device
>   * @phy_np: Pointer to the phy's device tree node
>   *
> - * Returns a pointer to the phy_device.
> + * If successful, returns a pointer to the phy_device with the embedded
> + * struct device refcount incremented by one, or NULL on failure.
>   */
>  struct phy_device *of_phy_find_device(struct device_node *phy_np)
>  {
> @@ -217,7 +218,9 @@ EXPORT_SYMBOL(of_phy_find_device);
>   * @hndlr: Link state callback for the network device
>   * @iface: PHY data interface type
>   *
> - * Returns a pointer to the phy_device if successful.  NULL otherwise
> + * If successful, returns a pointer to the phy_device with the embedded
> + * struct device refcount incremented by one, or NULL on failure. The
> + * refcount must be dropped by calling phy_disconnect() or phy_detach().
>   */
>  struct phy_device *of_phy_connect(struct net_device *dev,
>                                   struct device_node *phy_np,
> @@ -225,13 +228,19 @@ struct phy_device *of_phy_connect(struct net_device *dev,
>                                   phy_interface_t iface)
>  {
>         struct phy_device *phy = of_phy_find_device(phy_np);
> +       int ret;
>
>         if (!phy)
>                 return NULL;
>
>         phy->dev_flags = flags;
>
> -       return phy_connect_direct(dev, phy, hndlr, iface) ? NULL : phy;
> +       ret = phy_connect_direct(dev, phy, hndlr, iface);
> +
> +       /* refcount is held by phy_connect_direct() on success */
> +       put_device(&phy->dev);
> +
> +       return ret ? NULL : phy;
>  }
>  EXPORT_SYMBOL(of_phy_connect);
>
> @@ -241,17 +250,27 @@ EXPORT_SYMBOL(of_phy_connect);
>   * @phy_np: Node pointer for the PHY
>   * @flags: flags to pass to the PHY
>   * @iface: PHY data interface type
> + *
> + * If successful, returns a pointer to the phy_device with the embedded
> + * struct device refcount incremented by one, or NULL on failure. The
> + * refcount must be dropped by calling phy_disconnect() or phy_detach().
>   */
>  struct phy_device *of_phy_attach(struct net_device *dev,
>                                  struct device_node *phy_np, u32 flags,
>                                  phy_interface_t iface)
>  {
>         struct phy_device *phy = of_phy_find_device(phy_np);
> +       int ret;
>
>         if (!phy)
>                 return NULL;
>
> -       return phy_attach_direct(dev, phy, flags, iface) ? NULL : phy;
> +       ret = phy_attach_direct(dev, phy, flags, iface);
> +
> +       /* refcount is held by phy_attach_direct() on success */
> +       put_device(&phy->dev);
> +
> +       return ret ? NULL : phy;
>  }
>  EXPORT_SYMBOL(of_phy_attach);
>
> --
> 2.1.0
>


* Re: [PATCH v3 0/9] Phy, mdiobus, and netdev struct device fixes
       [not found]     ` <20150924.151554.619662567057050978.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2015-09-24 22:26       ` Andrew Lunn
       [not found]         ` <20150924222654.GG20825-g2DYL2Zd6BY@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Lunn @ 2015-09-24 22:26 UTC (permalink / raw)
  To: David Miller
  Cc: linux-lFZ/pmaqli7XmaaqVzeoHQ, f.fainelli-Re5JQEeQqe8AvxtiuMwx3w,
	thomas.petazzoni-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	sgoutham-YGCgFSpz5w/QT0dZR+AlfA, rric-DgEjT+Ai2ygdnm+yROfE0A,
	frowand.list-Re5JQEeQqe8AvxtiuMwx3w,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	michal.simek-gjFFaj9aHVfQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	soren.brinkmann-gjFFaj9aHVfQT0dZR+AlfA, isubramanian-qTEPVZfXA3Y,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A, leoli-KZfg59tc24xl57MIdRCFDg,
	kchudgar-qTEPVZfXA3Y,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thu, Sep 24, 2015 at 03:15:54PM -0700, David Miller wrote:
> From: Andrew Lunn <andrew-g2DYL2Zd6BY@public.gmane.org>
> Date: Thu, 24 Sep 2015 23:57:31 +0200
> 
> > I built the FEC driver as a module, and it won't unload:
> > 
> >  kernel:unregister_netdevice: waiting for eth1 to become free. Usage count = 1
> > unregister_netdevice: waiting for eth1 to become free. Usage count = 1
> > 
> > i assume because DSA holds a reference. I've not tried a fully module
> > build, DSA has issues with that.
> > 
> > Tested-by: Andrew Lunn <andrew-g2DYL2Zd6BY@public.gmane.org>
> 
> So, is this a regression?

Sorry, worded that badly. Since DSA is still active, it should not be
possible to unload the FEC driver. DSA should have a reference to it,
and mdio-mux also should have a reference to the mdio bus of the FEC
driver.

As Russell requested, i will re-test without his patches, just to make
sure.

	Andrew

* Re: [PATCH v3 0/9] Phy, mdiobus, and netdev struct device fixes
       [not found]     ` <20150924221541.GF21513-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
@ 2015-09-24 22:50       ` Andrew Lunn
  2015-09-24 23:33         ` Russell King - ARM Linux
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Lunn @ 2015-09-24 22:50 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Florian Fainelli, David Miller, Thomas Petazzoni,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Sunil Goutham, Robert Richter,
	Frank Rowand, linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Rob Herring, Michal Simek,
	netdev-u79uwXL29TY76Z2rM5mHXA, Soren Brinkmann,
	Iyappan Subramanian, Grant Likely, Li Yang, Keyur Chudgar,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

> Thanks for testing.  Please could you confirm whether the same behaviour
> is observed without the patches, just to make absolutely sure that isn't
> a regression.

So i tested this now.

I have two FEC interfaces. One is my main access interface, and the
second is used by DSA to access switches. With your patches, the
module Used by count is equal to the number of interfaces which are
up.

Without your patches, the count is always 0.

When i try to remove the fec module, without your patches, but DSA
still using the interface, i get the same

 kernel:unregister_netdevice: waiting for eth1 to become free. Usage count = 1

as with your patch. So this is not a regression.

   Andrew

* Re: [PATCH v3 0/9] Phy, mdiobus, and netdev struct device fixes
       [not found]         ` <20150924222654.GG20825-g2DYL2Zd6BY@public.gmane.org>
@ 2015-09-24 22:51           ` David Miller
  2015-09-24 23:12             ` Russell King - ARM Linux
  0 siblings, 1 reply; 13+ messages in thread
From: David Miller @ 2015-09-24 22:51 UTC (permalink / raw)
  To: andrew-g2DYL2Zd6BY
  Cc: linux-lFZ/pmaqli7XmaaqVzeoHQ, f.fainelli-Re5JQEeQqe8AvxtiuMwx3w,
	thomas.petazzoni-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	sgoutham-YGCgFSpz5w/QT0dZR+AlfA, rric-DgEjT+Ai2ygdnm+yROfE0A,
	frowand.list-Re5JQEeQqe8AvxtiuMwx3w,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	michal.simek-gjFFaj9aHVfQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	soren.brinkmann-gjFFaj9aHVfQT0dZR+AlfA, isubramanian-qTEPVZfXA3Y,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A, leoli-KZfg59tc24xl57MIdRCFDg,
	kchudgar-qTEPVZfXA3Y,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

From: Andrew Lunn <andrew-g2DYL2Zd6BY@public.gmane.org>
Date: Fri, 25 Sep 2015 00:26:54 +0200

> On Thu, Sep 24, 2015 at 03:15:54PM -0700, David Miller wrote:
>> From: Andrew Lunn <andrew-g2DYL2Zd6BY@public.gmane.org>
>> Date: Thu, 24 Sep 2015 23:57:31 +0200
>> 
>> > I built the FEC driver as a module, and it won't unload:
>> > 
>> >  kernel:unregister_netdevice: waiting for eth1 to become free. Usage count = 1
>> > unregister_netdevice: waiting for eth1 to become free. Usage count = 1
>> > 
>> > i assume because DSA holds a reference. I've not tried a fully module
>> > build, DSA has issues with that.
>> > 
>> > Tested-by: Andrew Lunn <andrew-g2DYL2Zd6BY@public.gmane.org>
>> 
>> So, is this a regression?
> 
> Sorry, worded that badly. Since DSA is still active, it should not be
> possible to unload the FEC driver. DSA should have a reference to it,
> and mdio-mux also should have a reference to the mdio bus of the FEC
> driver.
> 
> As Russell requested, i will re-test without his patches, just to make
> sure.

Something needs to hold onto the underlying device refcount of a DSA
blob so that an unload can't even be attempted in that state.

* Re: [PATCH v3 0/9] Phy, mdiobus, and netdev struct device fixes
  2015-09-24 22:51           ` David Miller
@ 2015-09-24 23:12             ` Russell King - ARM Linux
  0 siblings, 0 replies; 13+ messages in thread
From: Russell King - ARM Linux @ 2015-09-24 23:12 UTC (permalink / raw)
  To: David Miller
  Cc: andrew, f.fainelli, thomas.petazzoni, devicetree, sgoutham, rric,
	frowand.list, linuxppc-dev, linux-kernel, robh+dt, michal.simek,
	netdev, soren.brinkmann, isubramanian, grant.likely, leoli,
	kchudgar, linux-arm-kernel

On Thu, Sep 24, 2015 at 03:51:37PM -0700, David Miller wrote:
> From: Andrew Lunn <andrew@lunn.ch>
> Date: Fri, 25 Sep 2015 00:26:54 +0200
> 
> > On Thu, Sep 24, 2015 at 03:15:54PM -0700, David Miller wrote:
> >> From: Andrew Lunn <andrew@lunn.ch>
> >> Date: Thu, 24 Sep 2015 23:57:31 +0200
> >> 
> >> > I built the FEC driver as a module, and it won't unload:
> >> > 
> >> >  kernel:unregister_netdevice: waiting for eth1 to become free. Usage count = 1
> >> > unregister_netdevice: waiting for eth1 to become free. Usage count = 1
> >> > 
> >> > i assume because DSA holds a reference. I've not tried a fully module
> >> > build, DSA has issues with that.
> >> > 
> >> > Tested-by: Andrew Lunn <andrew@lunn.ch>
> >> 
> >> So, is this a regression?
> > 
> > Sorry, worded that badly. Since DSA is still active, it should not be
> > possible to unload the FEC driver. DSA should have a reference to it,
> > and mdio-mux also should have a reference to the mdio bus of the FEC
> > driver.
> > 
> > As Russell requested, i will re-test without his patches, just to make
> > sure.
> 
> Something needs to hold onto the underlying device refcount of a DSA
> blob so that an unload can't even be attempted in that state.

Holding a reference on a struct device does _not_ stop the device
being unbound or the module driving it being removed.  It merely
stops the struct device from being freed before all references have
been dropped.

Devices are always free to be unbound from their bound drivers
irrespective of the struct device refcount.  Even taking a reference
on the module (via try_module_get()) does not stop this.
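
The distinction can be shown with a small userspace model in C. This is a
sketch with hypothetical model_* names, not the driver core itself: a
reference count decides when the structure is released, while the driver
pointer can be cleared at any time regardless of that count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_driver {
	const char *name;
};

/* Hypothetical stand-in for struct device. */
struct model_device {
	int refcount;
	const struct model_driver *driver;
	bool released;
};

void model_get_device(struct model_device *dev)
{
	assert(!dev->released);
	dev->refcount++;
}

/* The structure is released only when the last reference is dropped. */
void model_put_device(struct model_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->released = true;
}

/* Unbinding ignores the refcount entirely. */
void model_unbind(struct model_device *dev)
{
	dev->driver = NULL;
}

/* A held reference keeps the memory valid, not the driver bound. */
bool model_can_call_driver(const struct model_device *dev)
{
	return !dev->released && dev->driver != NULL;
}
```

After model_unbind(), a holder of a reference still has a valid structure
but no driver behind it, so anything that calls through the driver has to
cope with that case separately.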

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


* Re: [PATCH v3 0/9] Phy, mdiobus, and netdev struct device fixes
  2015-09-24 22:50       ` Andrew Lunn
@ 2015-09-24 23:33         ` Russell King - ARM Linux
  0 siblings, 0 replies; 13+ messages in thread
From: Russell King - ARM Linux @ 2015-09-24 23:33 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, David Miller, Thomas Petazzoni, devicetree,
	Sunil Goutham, Robert Richter, Frank Rowand, linuxppc-dev,
	linux-kernel, Rob Herring, Michal Simek, netdev, Soren Brinkmann,
	Iyappan Subramanian, Grant Likely, Li Yang, Keyur Chudgar,
	linux-arm-kernel

On Fri, Sep 25, 2015 at 12:50:33AM +0200, Andrew Lunn wrote:
> > Thanks for testing.  Please could you confirm whether the same behaviour
> > is observed without the patches, just to make absolutely sure that isn't
> > a regression.
> 
> So i tested this now.
> 
> I have two FEC interfaces. One is my main access interface, and the
> second is used by DSA to access switches. With your patches, the
> module Used by count is equal to the number of interfaces which are
> up.
> 
> Without your patches, the count is always 0.

That will be as a result of the MDIO bus module refcounting patch -
"phy: fix mdiobus module safety".  The code prior to that patch was
totally useless and ineffectual - it might as well not even have
been present, because it would never have any effect.  bus_module
would always be NULL in phy_attach_direct().

While my change makes the code start to work as originally intended,
it's still unsafe: there's nothing to stop you manually unbinding the
driver providing the MDIO bus from the struct device.  The driver
will then free the resources it claimed in its probe function, which
may include the MMIO mapping for the MDIO bus accessor functions.

If the accessors are then called, despite keeping the mdio bus, phy,
etc data structures properly refcounted, the kernel will oops when
the (many) MDIO bus drivers hit the free'd MMIO mapping.  This is,
unfortunately, just another pre-existing bug in this code.

To stop that, we need some way to say "this MDIO bus has been removed,
prevent further access" and that needs to be done in a race free way.
Right now, that doesn't exist.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


* Re: [PATCH v3 0/9] Phy, mdiobus, and netdev struct device fixes
  2015-09-24 19:17 [PATCH v3 0/9] Phy, mdiobus, and netdev struct device fixes Russell King - ARM Linux
  2015-09-24 19:18 ` [PATCH 5/9] of_mdio: fix MDIO phy device refcounting Russell King
  2015-09-24 21:57 ` [PATCH v3 0/9] Phy, mdiobus, and netdev struct device fixes Andrew Lunn
@ 2015-09-25  1:39 ` Florian Fainelli
       [not found]   ` <5604A5EC.7060401-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2 siblings, 1 reply; 13+ messages in thread
From: Florian Fainelli @ 2015-09-25  1:39 UTC (permalink / raw)
  To: Russell King - ARM Linux, David Miller
  Cc: Thomas Petazzoni, devicetree, Sunil Goutham, Robert Richter,
	Frank Rowand, linuxppc-dev, linux-kernel, Rob Herring,
	Michal Simek, netdev, Soren Brinkmann, Iyappan Subramanian,
	Grant Likely, Li Yang, Keyur Chudgar, linux-arm-kernel

On 24/09/15 12:17, Russell King - ARM Linux wrote:
> Hi,
> 
> The third version of this series fixes the build error which David
> identified, and drops the broken changes for the Cavium Thunder BGX
> ethernet driver as this driver requires some complex changes to
> resolve the leakage - and this is best done by people who can test
> the driver.
> 
> Compared to v2, the only patch which has changed is patch 6
>   "net: fix phy refcounting in a bunch of drivers"
> 
> I _think_ I've been able to build-test all the drivers touched by
> that patch to some degree now, though several of them needed the
> Kconfig hacked to allow it (not all had || COMPILE_TEST clause on
> their dependencies.)

Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>

Thanks for fixing that.

> 
> Previous cover letters below:
> 
> This is the second version of the series, with the comments David had
> on the first patch fixed up.  Original series description with updated
> diffstat below.
> 
> While looking at the DSA code, I noticed we have a
> of_find_net_device_by_node(), and it looks like users of that are
> similarly buggy - it looks like net/dsa/dsa.c is the only user.  Fix
> that too.
> 
> Hi,
> 
> While looking at the phy code, I identified a number of weaknesses
> where refcounting on device structures was being leaked, where
> modules could be removed while in-use, and where the fixed-phy could
> end up having unintended consequences caused by incorrect calls to
> fixed_phy_update_state().
> 
> This patch series resolves those issues, some of which were discovered
> with testing on an Armada 388 board.  Not all patches are fully tested,
> particularly the one which touches several network drivers.
> 
> When resolving the struct device refcounting problems, several different
> solutions were considered before settling on the implementation here -
> one of the considerations was to avoid touching many network drivers.
> The solution here is:
> 
> 	phy_attach*() - takes a refcount
> 	phy_detach*() - drops the phy_attach refcount
> 
> Provided drivers always attach and detach their phys, which they should
> already be doing, this should change nothing, even if they leak a refcount.
> 
> 	of_phy_find_device() and of_* functions which use that take
> 	a refcount.  Arrange for this refcount to be dropped once
> 	the phy is attached.
> 
> This is the reason why the previous change is important - we can't drop
> this refcount taken by of_phy_find_device() until something else holds
> a reference on the device.  This resolves the leaked refcount caused by
> using of_phy_connect() or of_phy_attach().
> 
> Even without the above changes, these drivers are leaking by calling
> of_phy_find_device().  These drivers are addressed by adding the
> appropriate release of that refcount.
> 
> The mdiobus code also suffered from the same kind of leak, but thankfully
> this only happened in one place - the mdio-mux code.
> 
> I also found that the try_module_get() in the phy layer code was utterly
> useless: phydev->dev.driver was guaranteed to always be NULL, so
> try_module_get() was always being called with a NULL argument.  I proved
> this with my SFP code, which declares its own MDIO bus - the module use
> count was never incremented irrespective of how I set the MDIO bus up.
> This allowed the MDIO bus code to be removed from the kernel while there
> were still PHYs attached to it.
> 
> One other bug was discovered: while using in-band-status with mvneta, it
> was found that if a real phy is attached with in-band-status enabled,
> and another ethernet interface is using the fixed-phy infrastructure, the
> interface using the fixed-phy infrastructure is configured according to
> the other interface using the in-band-status - which is caused by the
> fixed-phy code not verifying that the phy_device passed in is actually
> a fixed-phy device, rather than a real MDIO phy.
> 
> Lastly, having mdio_bus reversing phy_device_register() internals seems
> like a layering violation - it's trivial to move that code to the phy
> device layer.
> 
>  drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 24 ++++++----
>  drivers/net/ethernet/freescale/gianfar.c       |  6 ++-
>  drivers/net/ethernet/freescale/ucc_geth.c      |  8 +++-
>  drivers/net/ethernet/marvell/mvneta.c          |  2 +
>  drivers/net/ethernet/xilinx/xilinx_emaclite.c  |  2 +
>  drivers/net/phy/fixed_phy.c                    |  2 +-
>  drivers/net/phy/mdio-mux.c                     | 19 +++++---
>  drivers/net/phy/mdio_bus.c                     | 24 ++++++----
>  drivers/net/phy/phy_device.c                   | 62 ++++++++++++++++++++------
>  drivers/of/of_mdio.c                           | 27 +++++++++--
>  include/linux/phy.h                            |  6 ++-
>  net/core/net-sysfs.c                           |  9 ++++
>  net/dsa/dsa.c                                  | 41 ++++++++++++++---
>  13 files changed, 181 insertions(+), 51 deletions(-)
> 


-- 
Florian


* Re: [PATCH v3 0/9] Phy, mdiobus, and netdev struct device fixes
       [not found]   ` <5604A5EC.7060401-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-09-25  6:05     ` David Miller
  0 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2015-09-25  6:05 UTC (permalink / raw)
  To: f.fainelli-Re5JQEeQqe8AvxtiuMwx3w
  Cc: linux-lFZ/pmaqli7XmaaqVzeoHQ,
	thomas.petazzoni-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	sgoutham-YGCgFSpz5w/QT0dZR+AlfA, rric-DgEjT+Ai2ygdnm+yROfE0A,
	frowand.list-Re5JQEeQqe8AvxtiuMwx3w,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	michal.simek-gjFFaj9aHVfQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	soren.brinkmann-gjFFaj9aHVfQT0dZR+AlfA, isubramanian-qTEPVZfXA3Y,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A, leoli-KZfg59tc24xl57MIdRCFDg,
	kchudgar-qTEPVZfXA3Y,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

From: Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date: Thu, 24 Sep 2015 18:39:56 -0700

> On 24/09/15 12:17, Russell King - ARM Linux wrote:
>> Hi,
>> 
>> The third version of this series fixes the build error which David
>> identified, and drops the broken changes for the Cavium Thunder BGX
>> ethernet driver as this driver requires some complex changes to
>> resolve the leakage - and this is best done by people who can test
>> the driver.
>> 
>> Compared to v2, the only patch which has changed is patch 6
>>   "net: fix phy refcounting in a bunch of drivers"
>> 
>> I _think_ I've been able to build-test all the drivers touched by
>> that patch to some degree now, though several of them needed the
>> Kconfig hacked to allow it (not all had || COMPILE_TEST clause on
>> their dependencies.)
> 
> Tested-by: Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Reviewed-by: Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> 
> Thanks for fixing that.

Series applied, thanks.


end of thread, other threads:[~2015-09-25  6:05 UTC | newest]

Thread overview: 13+ messages
2015-09-24 19:17 [PATCH v3 0/9] Phy, mdiobus, and netdev struct device fixes Russell King - ARM Linux
2015-09-24 19:18 ` [PATCH 5/9] of_mdio: fix MDIO phy device refcounting Russell King
2015-09-24 22:20   ` Rob Herring
2015-09-24 21:57 ` [PATCH v3 0/9] Phy, mdiobus, and netdev struct device fixes Andrew Lunn
2015-09-24 22:15   ` Russell King - ARM Linux
     [not found]     ` <20150924221541.GF21513-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
2015-09-24 22:50       ` Andrew Lunn
2015-09-24 23:33         ` Russell King - ARM Linux
2015-09-24 22:15   ` David Miller
     [not found]     ` <20150924.151554.619662567057050978.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2015-09-24 22:26       ` Andrew Lunn
     [not found]         ` <20150924222654.GG20825-g2DYL2Zd6BY@public.gmane.org>
2015-09-24 22:51           ` David Miller
2015-09-24 23:12             ` Russell King - ARM Linux
2015-09-25  1:39 ` Florian Fainelli
     [not found]   ` <5604A5EC.7060401-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-09-25  6:05     ` David Miller
