Netdev List
 help / color / mirror / Atom feed
* [PATCH net-next v2 1/3] arm64: dts: fsl: ls1028a: Add PCI IERC node and ENETC endpoints
From: Claudiu Manoil @ 2019-02-15 10:10 UTC (permalink / raw)
  To: Shawn Guo, Li Yang, David S . Miller
  Cc: alexandru.marginean, linux-arm-kernel, devicetree, netdev,
	linux-kernel
In-Reply-To: <1550225414-12125-1-git-send-email-claudiu.manoil@nxp.com>

The LS1028A SoC features a PCI Integrated Endpoint Root Complex
(IERC) defining several integrated PCI devices, including the ENETC
ethernet controller integrated endpoints (IEPs). The IERC implements
ECAM (Enhanced Configuration Access Mechanism) to provide access
to the PCIe config space of the IEPs. This means the the IEPs
(including ENETC) do not support the standard PCIe BARs, instead
the Enhanced Allocation (EA) capability structures in the ECAM space
are used to fix the base addresses in the system, and the PCI
subsystem uses these structures for device enumeration and discovery.
The "ranges" entries contain basic information from these EA capabily
structures required by the kernel for device enumeration.

The current patch also enables the first 2 ENETC PFs (Physiscal
Functions) and the associated VFs (Virtual Functions), 2 VFs for
each PF.  Each of these ENETC PFs has an external ethernet port
on the LS1028A SoC.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
---
v2 - none

 arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi | 33 ++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
index a8cf92a..7f5a8e6 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
@@ -335,5 +335,38 @@
 				     <GIC_SPI 206 IRQ_TYPE_LEVEL_HIGH>, <GIC_SPI 207 IRQ_TYPE_LEVEL_HIGH>,
 				     <GIC_SPI 208 IRQ_TYPE_LEVEL_HIGH>, <GIC_SPI 209 IRQ_TYPE_LEVEL_HIGH>;
 		};
+
+		pcie@1f0000000 { /* Integrated Endpoint Root Complex */
+			compatible = "pci-host-ecam-generic";
+			reg = <0x01 0xf0000000 0x0 0x100000>;
+			#address-cells = <3>;
+			#size-cells = <2>;
+			#interrupt-cells = <1>;
+			msi-parent = <&its>;
+			device_type = "pci";
+			bus-range = <0x0 0x0>;
+			dma-coherent;
+			msi-map = <0 &its 0x17 0xe>;
+			iommu-map = <0 &smmu 0x17 0xe>;
+				  /* PF0-6 BAR0 - non-prefetchable memory */
+			ranges = <0x82000000 0x0 0x00000000  0x1 0xf8000000  0x0 0x160000
+				  /* PF0-6 BAR2 - prefetchable memory */
+				  0xc2000000 0x0 0x00000000  0x1 0xf8160000  0x0 0x070000
+				  /* PF0: VF0-1 BAR0 - non-prefetchable memory */
+				  0x82000000 0x0 0x00000000  0x1 0xf81d0000  0x0 0x020000
+				  /* PF0: VF0-1 BAR2 - prefetchable memory */
+				  0xc2000000 0x0 0x00000000  0x1 0xf81f0000  0x0 0x020000
+				  /* PF1: VF0-1 BAR0 - non-prefetchable memory */
+				  0x82000000 0x0 0x00000000  0x1 0xf8210000  0x0 0x020000
+				  /* PF1: VF0-1 BAR2 - prefetchable memory */
+				  0xc2000000 0x0 0x00000000  0x1 0xf8230000  0x0 0x020000>;
+
+			enetc_port0: pci@0,0 {
+				reg = <0x000000 0 0 0 0>;
+			};
+			enetc_port1: pci@0,1 {
+				reg = <0x000100 0 0 0 0>;
+			};
+		};
 	};
 };
-- 
2.7.4


^ permalink raw reply related

* [PATCH net-next v2 2/3] arm64: dts: fsl: ls1028a-rdb: Add ENETC external eth ports for the LS1028A RDB board
From: Claudiu Manoil @ 2019-02-15 10:10 UTC (permalink / raw)
  To: Shawn Guo, Li Yang, David S . Miller
  Cc: alexandru.marginean, linux-arm-kernel, devicetree, netdev,
	linux-kernel
In-Reply-To: <1550225414-12125-1-git-send-email-claudiu.manoil@nxp.com>

The LS1028A RDB board features an Atheros PHY connected over
SGMII to the ENETC PF0 (or Port0).  ENETC Port1 (PF1) has no
external connection on this board, so it can be disabled for now.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
---
v2 - added a mdio node as parent for the phy node

 arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts b/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts
index fdeb417..f86b054 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts
@@ -71,3 +71,20 @@
 &duart1 {
 	status = "okay";
 };
+
+&enetc_port0 {
+	phy-handle = <&sgmii_phy0>;
+	phy-connection-type = "sgmii";
+
+	mdio {
+		#address-cells = <1>;
+		#size-cells = <0>;
+		sgmii_phy0: ethernet-phy@2 {
+			reg = <0x2>;
+		};
+	};
+};
+
+&enetc_port1 {
+	status = "disabled";
+};
-- 
2.7.4


^ permalink raw reply related

* [PATCH net-next v2 3/3] enetc: Add ENETC PF level external MDIO support
From: Claudiu Manoil @ 2019-02-15 10:10 UTC (permalink / raw)
  To: Shawn Guo, Li Yang, David S . Miller
  Cc: alexandru.marginean, linux-arm-kernel, devicetree, netdev,
	linux-kernel
In-Reply-To: <1550225414-12125-1-git-send-email-claudiu.manoil@nxp.com>

Each ENETC PF has its own MDIO interface, the corresponding
MDIO registers are mapped in the ENETC's Port register block.
The current patch adds a driver for these PF level MDIO buses,
so that each PF can manage directly its own external link.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
---
v2 - used readx_poll_timeout()
   - added mdio node child to the port node
   - added code comments, removed redundant err messages,
     minor code cleanup

 drivers/net/ethernet/freescale/enetc/Makefile     |   3 +-
 drivers/net/ethernet/freescale/enetc/enetc_mdio.c | 199 ++++++++++++++++++++++
 drivers/net/ethernet/freescale/enetc/enetc_pf.c   |  12 ++
 drivers/net/ethernet/freescale/enetc/enetc_pf.h   |   6 +
 4 files changed, 219 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/freescale/enetc/enetc_mdio.c

diff --git a/drivers/net/ethernet/freescale/enetc/Makefile b/drivers/net/ethernet/freescale/enetc/Makefile
index 6976602..7139e41 100644
--- a/drivers/net/ethernet/freescale/enetc/Makefile
+++ b/drivers/net/ethernet/freescale/enetc/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_FSL_ENETC) += fsl-enetc.o
-fsl-enetc-$(CONFIG_FSL_ENETC) += enetc.o enetc_cbdr.o enetc_ethtool.o
+fsl-enetc-$(CONFIG_FSL_ENETC) += enetc.o enetc_cbdr.o enetc_ethtool.o \
+				 enetc_mdio.o
 fsl-enetc-$(CONFIG_PCI_IOV) += enetc_msg.o
 fsl-enetc-objs := enetc_pf.o $(fsl-enetc-y)
 
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_mdio.c b/drivers/net/ethernet/freescale/enetc/enetc_mdio.c
new file mode 100644
index 0000000..f715487
--- /dev/null
+++ b/drivers/net/ethernet/freescale/enetc/enetc_mdio.c
@@ -0,0 +1,199 @@
+// SPDX-License-Identifier: (GPL-2.0+ OR BSD-3-Clause)
+/* Copyright 2019 NXP */
+
+#include <linux/mdio.h>
+#include <linux/of_mdio.h>
+#include <linux/iopoll.h>
+#include <linux/of.h>
+
+#include "enetc_pf.h"
+
+struct enetc_mdio_regs {
+	u32	mdio_cfg;	/* MDIO configuration and status */
+	u32	mdio_ctl;	/* MDIO control */
+	u32	mdio_data;	/* MDIO data */
+	u32	mdio_addr;	/* MDIO address */
+};
+
+#define bus_to_enetc_regs(bus)	(struct enetc_mdio_regs __iomem *)((bus)->priv)
+
+#define ENETC_MDIO_REG_OFFSET	0x1c00
+#define ENETC_MDC_DIV		258
+
+#define MDIO_CFG_CLKDIV(x)	((((x) >> 1) & 0xff) << 8)
+#define MDIO_CFG_BSY		BIT(0)
+#define MDIO_CFG_RD_ER		BIT(1)
+#define MDIO_CFG_ENC45		BIT(6)
+ /* external MDIO only - driven on neg MDC edge */
+#define MDIO_CFG_NEG		BIT(23)
+
+#define MDIO_CTL_DEV_ADDR(x)	((x) & 0x1f)
+#define MDIO_CTL_PORT_ADDR(x)	(((x) & 0x1f) << 5)
+#define MDIO_CTL_READ		BIT(15)
+#define MDIO_DATA(x)		((x) & 0xffff)
+
+#define TIMEOUT	1000
+static int enetc_mdio_wait_complete(struct enetc_mdio_regs __iomem *regs)
+{
+	u32 val;
+
+	return readx_poll_timeout(enetc_rd_reg, &regs->mdio_cfg, val,
+				  !(val & MDIO_CFG_BSY), 10, 10 * TIMEOUT);
+}
+
+static int enetc_mdio_write(struct mii_bus *bus, int phy_id, int regnum,
+			    u16 value)
+{
+	struct enetc_mdio_regs __iomem *regs = bus_to_enetc_regs(bus);
+	u32 mdio_ctl, mdio_cfg;
+	u16 dev_addr;
+	int ret;
+
+	mdio_cfg = MDIO_CFG_CLKDIV(ENETC_MDC_DIV) | MDIO_CFG_NEG;
+	if (regnum & MII_ADDR_C45) {
+		dev_addr = (regnum >> 16) & 0x1f;
+		mdio_cfg |= MDIO_CFG_ENC45;
+	} else {
+		/* clause 22 (ie 1G) */
+		dev_addr = regnum & 0x1f;
+		mdio_cfg &= ~MDIO_CFG_ENC45;
+	}
+
+	enetc_wr_reg(&regs->mdio_cfg, mdio_cfg);
+
+	ret = enetc_mdio_wait_complete(regs);
+	if (ret)
+		return ret;
+
+	/* set port and dev addr */
+	mdio_ctl = MDIO_CTL_PORT_ADDR(phy_id) | MDIO_CTL_DEV_ADDR(dev_addr);
+	enetc_wr_reg(&regs->mdio_ctl, mdio_ctl);
+
+	/* set the register address */
+	if (regnum & MII_ADDR_C45) {
+		enetc_wr_reg(&regs->mdio_addr, regnum & 0xffff);
+
+		ret = enetc_mdio_wait_complete(regs);
+		if (ret)
+			return ret;
+	}
+
+	/* write the value */
+	enetc_wr_reg(&regs->mdio_data, MDIO_DATA(value));
+
+	ret = enetc_mdio_wait_complete(regs);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static int enetc_mdio_read(struct mii_bus *bus, int phy_id, int regnum)
+{
+	struct enetc_mdio_regs __iomem *regs = bus_to_enetc_regs(bus);
+	u32 mdio_ctl, mdio_cfg;
+	u16 dev_addr, value;
+	int ret;
+
+	mdio_cfg = MDIO_CFG_CLKDIV(ENETC_MDC_DIV) | MDIO_CFG_NEG;
+	if (regnum & MII_ADDR_C45) {
+		dev_addr = (regnum >> 16) & 0x1f;
+		mdio_cfg |= MDIO_CFG_ENC45;
+	} else {
+		dev_addr = regnum & 0x1f;
+		mdio_cfg &= ~MDIO_CFG_ENC45;
+	}
+
+	enetc_wr_reg(&regs->mdio_cfg, mdio_cfg);
+
+	ret = enetc_mdio_wait_complete(regs);
+	if (ret)
+		return ret;
+
+	/* set port and device addr */
+	mdio_ctl = MDIO_CTL_PORT_ADDR(phy_id) | MDIO_CTL_DEV_ADDR(dev_addr);
+	enetc_wr_reg(&regs->mdio_ctl, mdio_ctl);
+
+	/* set the register address */
+	if (regnum & MII_ADDR_C45) {
+		enetc_wr_reg(&regs->mdio_addr, regnum & 0xffff);
+
+		ret = enetc_mdio_wait_complete(regs);
+		if (ret)
+			return ret;
+	}
+
+	/* initiate the read */
+	enetc_wr_reg(&regs->mdio_ctl, mdio_ctl | MDIO_CTL_READ);
+
+	ret = enetc_mdio_wait_complete(regs);
+	if (ret)
+		return ret;
+
+	/* return all Fs if nothing was there */
+	if (enetc_rd_reg(&regs->mdio_cfg) & MDIO_CFG_RD_ER) {
+		dev_err(&bus->dev,
+			"Error while reading PHY%d reg at %d.%hhu\n",
+			phy_id, dev_addr, regnum);
+		return 0xffff;
+	}
+
+	value = enetc_rd_reg(&regs->mdio_data) & 0xffff;
+
+	return value;
+}
+
+int enetc_mdio_probe(struct enetc_pf *pf)
+{
+	struct device *dev = &pf->si->pdev->dev;
+	struct enetc_mdio_regs __iomem *regs;
+	struct device_node *np;
+	struct mii_bus *bus;
+	int ret;
+
+	bus = mdiobus_alloc_size(sizeof(regs));
+	if (!bus)
+		return -ENOMEM;
+
+	bus->name = "Freescale ENETC MDIO Bus";
+	bus->read = enetc_mdio_read;
+	bus->write = enetc_mdio_write;
+	bus->parent = dev;
+	snprintf(bus->id, MII_BUS_ID_SIZE, "%s", dev_name(dev));
+
+	/* store the enetc mdio base address for this bus */
+	regs = pf->si->hw.port + ENETC_MDIO_REG_OFFSET;
+	bus->priv = regs;
+
+	np = of_get_child_by_name(dev->of_node, "mdio");
+	if (!np) {
+		dev_err(dev, "MDIO node missing\n");
+		ret = -EINVAL;
+		goto err_registration;
+	}
+
+	ret = of_mdiobus_register(bus, np);
+	if (ret) {
+		of_node_put(np);
+		dev_err(dev, "cannot register MDIO bus\n");
+		goto err_registration;
+	}
+
+	of_node_put(np);
+	pf->mdio = bus;
+
+	return 0;
+
+err_registration:
+	mdiobus_free(bus);
+
+	return ret;
+}
+
+void enetc_mdio_remove(struct enetc_pf *pf)
+{
+	if (pf->mdio) {
+		mdiobus_unregister(pf->mdio);
+		mdiobus_free(pf->mdio);
+	}
+}
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index 7d28f5e..15876a6 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -746,6 +746,7 @@ static void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
 
 static int enetc_of_get_phy(struct enetc_ndev_priv *priv)
 {
+	struct enetc_pf *pf = enetc_si_priv(priv->si);
 	struct device_node *np = priv->dev->of_node;
 	int err;
 
@@ -770,12 +771,22 @@ static int enetc_of_get_phy(struct enetc_ndev_priv *priv)
 		priv->phy_node = of_node_get(np);
 	}
 
+	if (!of_phy_is_fixed_link(np)) {
+		err = enetc_mdio_probe(pf);
+		if (err) {
+			of_node_put(priv->phy_node);
+			return err;
+		}
+	}
+
 	priv->if_mode = of_get_phy_mode(np);
 	if (priv->if_mode < 0) {
 		dev_err(priv->dev, "missing phy type\n");
 		of_node_put(priv->phy_node);
 		if (of_phy_is_fixed_link(np))
 			of_phy_deregister_fixed_link(np);
+		else
+			enetc_mdio_remove(pf);
 
 		return -EINVAL;
 	}
@@ -898,6 +909,7 @@ static void enetc_pf_remove(struct pci_dev *pdev)
 
 	unregister_netdev(si->ndev);
 
+	enetc_mdio_remove(pf);
 	enetc_of_put_phy(priv);
 
 	enetc_free_msix(priv);
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.h b/drivers/net/ethernet/freescale/enetc/enetc_pf.h
index 2061ae5..10dd1b5 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.h
@@ -42,8 +42,14 @@ struct enetc_pf {
 	char vlan_promisc_simap; /* bitmap of SIs in VLAN promisc mode */
 	DECLARE_BITMAP(vlan_ht_filter, ENETC_VLAN_HT_SIZE);
 	DECLARE_BITMAP(active_vlans, VLAN_N_VID);
+
+	struct mii_bus *mdio; /* saved for cleanup */
 };
 
 int enetc_msg_psi_init(struct enetc_pf *pf);
 void enetc_msg_psi_free(struct enetc_pf *pf);
 void enetc_msg_handle_rxmsg(struct enetc_pf *pf, int mbox_id, u16 *status);
+
+/* MDIO */
+int enetc_mdio_probe(struct enetc_pf *pf);
+void enetc_mdio_remove(struct enetc_pf *pf);
-- 
2.7.4


^ permalink raw reply related

* [PATCH net-next v2 0/3] enetc: Add mdio support and device tree nodes
From: Claudiu Manoil @ 2019-02-15 10:10 UTC (permalink / raw)
  To: Shawn Guo, Li Yang, David S . Miller
  Cc: alexandru.marginean, linux-arm-kernel, devicetree, netdev,
	linux-kernel

This is the missing part to enable PCI probing of the ENETC ethernet
ports on the LS1028A SoC and external traffic on the LS1028A RDB board.
It's one of the first items on the TODO list for the recently merged
ENETC ethernet driver.

Claudiu Manoil (3):
  arm64: dts: fsl: ls1028a: Add PCI IERC node and ENETC endpoints
  arm64: dts: fsl: ls1028a-rdb: Add ENETC external eth ports for the
    LS1028A RDB board
  enetc: Add ENETC PF level external MDIO support

 arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts |  17 ++
 arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi    |  33 ++++
 drivers/net/ethernet/freescale/enetc/Makefile     |   3 +-
 drivers/net/ethernet/freescale/enetc/enetc_mdio.c | 199 ++++++++++++++++++++++
 drivers/net/ethernet/freescale/enetc/enetc_pf.c   |  12 ++
 drivers/net/ethernet/freescale/enetc/enetc_pf.h   |   6 +
 6 files changed, 269 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/freescale/enetc/enetc_mdio.c

-- 
2.7.4


^ permalink raw reply

* Re: [PATCH] net: phy: at803x: disable delay only for RGMII mode
From: Vinod Koul @ 2019-02-15 10:16 UTC (permalink / raw)
  To: Peter Ujfalusi
  Cc: David S Miller, linux-arm-msm, Bjorn Andersson, netdev,
	Niklas Cassel, Andrew Lunn, Florian Fainelli, Nori, Sekhar,
	Marc Gonzalez
In-Reply-To: <89c39811-b100-564a-da40-abf46e450c95@ti.com>

On 13-02-19, 09:02, Peter Ujfalusi wrote:
> Hi Vinod,
> 
> On 12/02/2019 16.19, Vinod Koul wrote:
> > Per "Documentation/devicetree/bindings/net/ethernet.txt" RGMII mode
> > should not have delay in PHY whereas RGMII_ID and RGMII_RXID/RGMII_TXID
> > can have delay in phy.
> > 
> > So disable the delay only for RGMII mode and disable for other modes
> s/disable for other modes/enable for other modes

oops

> 
> Works fine on am335x-evmsk:
> Tested-by: Peter Ujfalusi <peter.ujfalusi@ti.com>

Thanks for quick testing..

> 
> and few comment
> 
> > Fixes: cd28d1d6e52e: ("net: phy: at803x: Disable phy delay for RGMII mode")
> > Reported-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
> > Signed-off-by: Vinod Koul <vkoul@kernel.org>
> > ---
> >  drivers/net/phy/at803x.c | 33 ++++++++++++++++++++++++++-------
> >  1 file changed, 26 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
> > index 8ff12938ab47..7b54b54e3316 100644
> > --- a/drivers/net/phy/at803x.c
> > +++ b/drivers/net/phy/at803x.c
> > @@ -110,6 +110,18 @@ static int at803x_debug_reg_mask(struct phy_device *phydev, u16 reg,
> >  	return phy_write(phydev, AT803X_DEBUG_DATA, val);
> >  }
> >  
> > +static inline int at803x_enable_rx_delay(struct phy_device *phydev)
> > +{
> > +	return at803x_debug_reg_mask(phydev, AT803X_DEBUG_REG_0, 0,
> > +				     AT803X_DEBUG_RX_CLK_DLY_EN);
> > +}
> 
> static inline int at803x_select_rx_delay(struct phy_device *phydev,
> 					 bool enable)
> {
> }
> 
> > +
> > +static inline int at803x_enable_tx_delay(struct phy_device *phydev)
> > +{
> > +	return at803x_debug_reg_mask(phydev, AT803X_DEBUG_REG_5, 0,
> > +				     AT803X_DEBUG_TX_CLK_DLY_EN);
> > +}
> 
> static inline int at803x_select_tx_delay(struct phy_device *phydev,
> 					 bool enable)
> {
> }
> 
> perhaps?

Given that we would again branch off on enable, I see this as no better
case so I will keep existing

> 
> > +
> >  static inline int at803x_disable_rx_delay(struct phy_device *phydev)
> >  {
> >  	return at803x_debug_reg_mask(phydev, AT803X_DEBUG_REG_0,
> > @@ -255,18 +267,25 @@ static int at803x_config_init(struct phy_device *phydev)
> >  	if (ret < 0)
> >  		return ret;
> >  
> > -	if (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID ||
> > -			phydev->interface == PHY_INTERFACE_MODE_RGMII_ID ||
> > -			phydev->interface == PHY_INTERFACE_MODE_RGMII) {
> > +	if (phydev->interface == PHY_INTERFACE_MODE_RGMII) {
> >  		ret = at803x_disable_rx_delay(phydev);
> >  		if (ret < 0)
> >  			return ret;
> > +		ret = at803x_disable_tx_delay(phydev);
> > +		if (ret < 0)
> > +			return ret;
> 
> is it a possibility to have PHY_INTERFACE_MODE_RGMII and other RGMII_ID
> || RGMII_TXID || RGMII_RXID set at the same time?

Nope that would not make sense

> if not you can just return from here, no need to check for other RGMII
> modes?

for RGMII yes indeed..

-- 
~Vinod

^ permalink raw reply

* ip xfrm policy, dir out vs dir fwd
From: Oleg @ 2019-02-15 10:11 UTC (permalink / raw)
  To: netdev

  Hi, all.

I don't understand why i need to create dir out policy for transit
ipsec traffic?

For example(conf from 192.168.77.1; it acts as a gateway between world and
private network behind 192.168.77.35):

ip xfrm policy add src 192.168.77.35 dst 0.0.0.0/0 dir fwd tmpl src 192.168.77.35 dst 192.168.77.1 proto esp reqid 1 mode tunnel
ip xfrm policy add src 0.0.0.0/0 dst 192.168.77.35 dir fwd tmpl src 192.168.77.1 dst 192.168.77.35 proto esp reqid 2 mode tunnel

doesn't work. But:

ip xfrm policy add src 192.168.77.35 dst 0.0.0.0/0 dir fwd tmpl src 192.168.77.35 dst 192.168.77.1 proto esp reqid 1 mode tunnel
ip xfrm policy add src 0.0.0.0/0 dst 192.168.77.35 dir out tmpl src 192.168.77.1 dst 192.168.77.35 proto esp reqid 2 mode tunnel

works well.

May be anybody can help me with this?

Thanks!

-- 
Олег Неманов (Oleg Nemanov)

^ permalink raw reply

* Re: [PATCH net-next 1/3] devlink: add flash update command
From: Jiri Pirko @ 2019-02-15 10:10 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers, mkubecek, andrew
In-Reply-To: <20190214214046.19182-2-jakub.kicinski@netronome.com>

Thu, Feb 14, 2019 at 10:40:44PM CET, jakub.kicinski@netronome.com wrote:
>Add devlink flash update command. Advanced NICs have firmware
>stored in flash and often cryptographically secured. Updating
>that flash is handled by management firmware. Ethtool has a
>flash update command which served us well, however, it has two
>shortcomings:
> - it takes rtnl_lock unnecessarily - really flash update has
>   nothing to do with networking, so using a networking device
>   as a handle is suboptimal, which leads us to the second one:
> - it requires a functioning netdev - in case device enters an
>   error state and can't spawn a netdev (e.g. communication
>   with the device fails) there is no netdev to use as a handle
>   for flashing.
>
>Devlink already has the ability to report the firmware versions,
>now with the ability to update the firmware/flash we will be
>able to recover devices in bad state.
>
>To enable updates of sub-components of the FW allow passing
>component name.  This name should correspond to one of the
>versions reported in devlink info.
>
>v1: - replace target id with component name (Jiri).
>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* Re: [PATCH net-next 2/3] ethtool: add compat for flash update
From: Jiri Pirko @ 2019-02-15 10:12 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers, mkubecek, andrew
In-Reply-To: <20190214214046.19182-3-jakub.kicinski@netronome.com>

Thu, Feb 14, 2019 at 10:40:45PM CET, jakub.kicinski@netronome.com wrote:
>If driver does not support ethtool flash update operation
>call into devlink.
>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* [PATCH net-next] net: sched: sch_api: set an error msg when qdisc_alloc_handle() fails
From: Ivan Vecera @ 2019-02-15 10:23 UTC (permalink / raw)
  To: netdev

This patch sets an error message in extack when the number of qdisc
handles exceeds the maximum. Also the error-code ENOSPC is more
appropriate than ENOMEM in this situation.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
---
 net/sched/sch_api.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 2283924fb56d..b8a388e4bcc4 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1201,9 +1201,11 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
 	} else {
 		if (handle == 0) {
 			handle = qdisc_alloc_handle(dev);
-			err = -ENOMEM;
-			if (handle == 0)
+			if (handle == 0) {
+				NL_SET_ERR_MSG(extack, "Maximum number of qdisc handles was exceeded");
+				err = -ENOSPC;
 				goto err_out3;
+			}
 		}
 		if (!netif_is_multiqueue(dev))
 			sch->flags |= TCQ_F_ONETXQUEUE;
-- 
2.19.2


^ permalink raw reply related

* Re: [PATCH net-next 3/3] nfp: devlink: allow flashing the device via devlink
From: Jiri Pirko @ 2019-02-15 10:15 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers, mkubecek, andrew
In-Reply-To: <20190214214046.19182-4-jakub.kicinski@netronome.com>

Thu, Feb 14, 2019 at 10:40:46PM CET, jakub.kicinski@netronome.com wrote:
>Devlink now allows updating device flash.  Implement this
>callback.
>
>Compared to ethtool update we no longer have to release
>the networking locks - devlink doesn't take them.
>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>---
> .../net/ethernet/netronome/nfp/nfp_devlink.c  | 10 +++++
> drivers/net/ethernet/netronome/nfp/nfp_main.c | 41 +++++++++++++++++++
> drivers/net/ethernet/netronome/nfp/nfp_main.h |  2 +
> .../ethernet/netronome/nfp/nfp_net_ethtool.c  | 35 ++--------------
> 4 files changed, 56 insertions(+), 32 deletions(-)
>
>diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
>index 080a301f379e..db2da99f6aa7 100644
>--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
>+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
>@@ -330,6 +330,15 @@ nfp_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
> 	return err;
> }
> 
>+static int
>+nfp_devlink_flash_update(struct devlink *devlink, const char *path,

devlink code calles "path" a "file_name". It is good to be consistent in
this.

Other than this, the patch looks fine:


>+			 const char *component, struct netlink_ext_ack *extack)
>+{
>+	if (component)
>+		return -EOPNOTSUPP;
>+	return nfp_flash_update_common(devlink_priv(devlink), path, extack);
>+}
>+
> const struct devlink_ops nfp_devlink_ops = {
> 	.port_split		= nfp_devlink_port_split,
> 	.port_unsplit		= nfp_devlink_port_unsplit,
>@@ -338,6 +347,7 @@ const struct devlink_ops nfp_devlink_ops = {
> 	.eswitch_mode_get	= nfp_devlink_eswitch_mode_get,
> 	.eswitch_mode_set	= nfp_devlink_eswitch_mode_set,
> 	.info_get		= nfp_devlink_info_get,
>+	.flash_update		= nfp_devlink_flash_update,
> };
> 
> int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port)
>diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c
>index 6c10e8d119e4..f4c8776e42b6 100644
>--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
>+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
>@@ -300,6 +300,47 @@ static int nfp_pcie_sriov_configure(struct pci_dev *pdev, int num_vfs)
> 		return nfp_pcie_sriov_enable(pdev, num_vfs);
> }
> 
>+int nfp_flash_update_common(struct nfp_pf *pf, const char *path,
>+			    struct netlink_ext_ack *extack)
>+{
>+	struct device *dev = &pf->pdev->dev;
>+	const struct firmware *fw;
>+	struct nfp_nsp *nsp;
>+	int err;
>+
>+	nsp = nfp_nsp_open(pf->cpp);
>+	if (IS_ERR(nsp)) {
>+		err = PTR_ERR(nsp);
>+		if (extack)
>+			NL_SET_ERR_MSG_MOD(extack, "can't access NSP");
>+		else
>+			dev_err(dev, "Failed to access the NSP: %d\n", err);
>+		return err;
>+	}
>+
>+	err = request_firmware_direct(&fw, path, dev);
>+	if (err) {
>+		NL_SET_ERR_MSG_MOD(extack,
>+				   "unable to read flash file from disk");
>+		goto exit_close_nsp;
>+	}
>+
>+	dev_info(dev, "Please be patient while writing flash image: %s\n",
>+		 path);
>+
>+	err = nfp_nsp_write_flash(nsp, fw);
>+	if (err < 0)
>+		goto exit_release_fw;
>+	dev_info(dev, "Finished writing flash image\n");
>+	err = 0;
>+
>+exit_release_fw:
>+	release_firmware(fw);
>+exit_close_nsp:
>+	nfp_nsp_close(nsp);
>+	return err;
>+}
>+
> static const struct firmware *
> nfp_net_fw_request(struct pci_dev *pdev, struct nfp_pf *pf, const char *name)
> {
>diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.h b/drivers/net/ethernet/netronome/nfp/nfp_main.h
>index a3613a2e0aa5..b7211f200d22 100644
>--- a/drivers/net/ethernet/netronome/nfp/nfp_main.h
>+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.h
>@@ -164,6 +164,8 @@ nfp_pf_map_rtsym(struct nfp_pf *pf, const char *name, const char *sym_fmt,
> 		 unsigned int min_size, struct nfp_cpp_area **area);
> int nfp_mbox_cmd(struct nfp_pf *pf, u32 cmd, void *in_data, u64 in_length,
> 		 void *out_data, u64 out_length);
>+int nfp_flash_update_common(struct nfp_pf *pf, const char *path,
>+			    struct netlink_ext_ack *extack);
> 
> enum nfp_dump_diag {
> 	NFP_DUMP_NSP_DIAG = 0,
>diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
>index cb9c512abc76..8f189149efc5 100644
>--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
>+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
>@@ -1237,11 +1237,8 @@ static int nfp_net_set_channels(struct net_device *netdev,
> static int
> nfp_net_flash_device(struct net_device *netdev, struct ethtool_flash *flash)
> {
>-	const struct firmware *fw;
> 	struct nfp_app *app;
>-	struct nfp_nsp *nsp;
>-	struct device *dev;
>-	int err;
>+	int ret;
> 
> 	if (flash->region != ETHTOOL_FLASH_ALL_REGIONS)
> 		return -EOPNOTSUPP;
>@@ -1250,39 +1247,13 @@ nfp_net_flash_device(struct net_device *netdev, struct ethtool_flash *flash)
> 	if (!app)
> 		return -EOPNOTSUPP;
> 
>-	dev = &app->pdev->dev;
>-
>-	nsp = nfp_nsp_open(app->cpp);
>-	if (IS_ERR(nsp)) {
>-		err = PTR_ERR(nsp);
>-		dev_err(dev, "Failed to access the NSP: %d\n", err);
>-		return err;
>-	}
>-
>-	err = request_firmware_direct(&fw, flash->data, dev);
>-	if (err)
>-		goto exit_close_nsp;
>-
>-	dev_info(dev, "Please be patient while writing flash image: %s\n",
>-		 flash->data);
> 	dev_hold(netdev);
> 	rtnl_unlock();
>-
>-	err = nfp_nsp_write_flash(nsp, fw);
>-	if (err < 0) {
>-		dev_err(dev, "Flash write failed: %d\n", err);
>-		goto exit_rtnl_lock;
>-	}
>-	dev_info(dev, "Finished writing flash image\n");
>-
>-exit_rtnl_lock:
>+	ret = nfp_flash_update_common(app->pf, flash->data, NULL);
> 	rtnl_lock();
> 	dev_put(netdev);
>-	release_firmware(fw);
> 
>-exit_close_nsp:
>-	nfp_nsp_close(nsp);
>-	return err;
>+	return ret;
> }
> 
> static const struct ethtool_ops nfp_net_ethtool_ops = {

Why don't you use the compat fallback? I think you should.


>-- 
>2.19.2
>

^ permalink raw reply

* Re: [PATCH net-next 3/3] nfp: devlink: allow flashing the device via devlink
From: Michal Kubecek @ 2019-02-15 10:26 UTC (permalink / raw)
  To: netdev; +Cc: Jakub Kicinski, davem, jiri, oss-drivers, andrew
In-Reply-To: <20190214214046.19182-4-jakub.kicinski@netronome.com>

On Thu, Feb 14, 2019 at 01:40:46PM -0800, Jakub Kicinski wrote:
> Devlink now allows updating device flash.  Implement this
> callback.
> 
> Compared to ethtool update we no longer have to release
> the networking locks - devlink doesn't take them.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
...
> diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
> index cb9c512abc76..8f189149efc5 100644
> --- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
> +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
> @@ -1237,11 +1237,8 @@ static int nfp_net_set_channels(struct net_device *netdev,
>  static int
>  nfp_net_flash_device(struct net_device *netdev, struct ethtool_flash *flash)
>  {
> -	const struct firmware *fw;
>  	struct nfp_app *app;
> -	struct nfp_nsp *nsp;
> -	struct device *dev;
> -	int err;
> +	int ret;
>  
>  	if (flash->region != ETHTOOL_FLASH_ALL_REGIONS)
>  		return -EOPNOTSUPP;
> @@ -1250,39 +1247,13 @@ nfp_net_flash_device(struct net_device *netdev, struct ethtool_flash *flash)
>  	if (!app)
>  		return -EOPNOTSUPP;
>  
> -	dev = &app->pdev->dev;
> -
> -	nsp = nfp_nsp_open(app->cpp);
> -	if (IS_ERR(nsp)) {
> -		err = PTR_ERR(nsp);
> -		dev_err(dev, "Failed to access the NSP: %d\n", err);
> -		return err;
> -	}
> -
> -	err = request_firmware_direct(&fw, flash->data, dev);
> -	if (err)
> -		goto exit_close_nsp;
> -
> -	dev_info(dev, "Please be patient while writing flash image: %s\n",
> -		 flash->data);
>  	dev_hold(netdev);
>  	rtnl_unlock();
> -
> -	err = nfp_nsp_write_flash(nsp, fw);
> -	if (err < 0) {
> -		dev_err(dev, "Flash write failed: %d\n", err);
> -		goto exit_rtnl_lock;
> -	}
> -	dev_info(dev, "Finished writing flash image\n");
> -
> -exit_rtnl_lock:
> +	ret = nfp_flash_update_common(app->pf, flash->data, NULL);
>  	rtnl_lock();
>  	dev_put(netdev);
> -	release_firmware(fw);
>  
> -exit_close_nsp:
> -	nfp_nsp_close(nsp);
> -	return err;
> +	return ret;
>  }
>  
>  static const struct ethtool_ops nfp_net_ethtool_ops = {

Out of curiosity: why don't you drop the ethtool_ops callback and let
the fallback introduced in patch 2/3 do the fallback? Is it to preserve
the check of flash->region or to avoid the lookup? My understanding of
the previous patch was that it would allow new drivers providing the
devlink callback to get rid of ethtool_ops one.

Michal Kubecek

^ permalink raw reply

* Re: [PATCH net-next 2/3] ethtool: add compat for flash update
From: Jiri Pirko @ 2019-02-15 10:17 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: netdev, Jakub Kicinski, davem, oss-drivers, andrew
In-Reply-To: <20190215085315.GL25518@unicorn.suse.cz>

Fri, Feb 15, 2019 at 09:53:15AM CET, mkubecek@suse.cz wrote:
>On Thu, Feb 14, 2019 at 01:40:45PM -0800, Jakub Kicinski wrote:
>> If driver does not support ethtool flash update operation
>> call into devlink.
>> 
>> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>> ---
>...
>> diff --git a/net/core/devlink.c b/net/core/devlink.c
>> index bd507e13bb7b..d169b5426d3d 100644
>> --- a/net/core/devlink.c
>> +++ b/net/core/devlink.c
>> @@ -6435,6 +6435,36 @@ void devlink_compat_running_version(struct net_device *dev,
>>  	mutex_unlock(&devlink_mutex);
>>  }
>>  
>> +int devlink_compat_flash_update(struct net_device *dev, const char *file_name)
>> +{
>> +	struct devlink_port *devlink_port;
>> +	struct devlink *devlink;
>> +
>> +	mutex_lock(&devlink_mutex);
>> +	list_for_each_entry(devlink, &devlink_list, list) {
>> +		mutex_lock(&devlink->lock);
>> +		list_for_each_entry(devlink_port, &devlink->port_list, list) {
>> +			int ret = -EOPNOTSUPP;
>> +
>> +			if (devlink_port->type != DEVLINK_PORT_TYPE_ETH ||
>> +			    devlink_port->type_dev != dev)
>> +				continue;
>> +
>> +			mutex_unlock(&devlink_mutex);
>> +			if (devlink->ops->flash_update)
>> +				ret = devlink->ops->flash_update(devlink,
>> +								 file_name,
>> +								 NULL, NULL);
>> +			mutex_unlock(&devlink->lock);
>> +			return ret;
>> +		}
>> +		mutex_unlock(&devlink->lock);
>> +	}
>> +	mutex_unlock(&devlink_mutex);
>> +
>> +	return -EOPNOTSUPP;
>> +}
>> +
>>  static int __init devlink_module_init(void)
>>  {
>>  	return genl_register_family(&devlink_nl_family);
>
>We already have similar lookup in devlink_compat_running_version() (the
>only difference seems to be that we keep holding devlink_mutex until the
>end there) and it's likely the net_device -> devlink lookup will be
>needed in more places in the future. How about having a helper for it?
>
>I also wonder how does the lookup scale. But I don't have clear idea
>how long the lists can become in real life and the ethtool operations
>are not really time critical.

Another thing is, that you don't really have to do the lookup. If you
have struct net_device *dev inside the driver, you can get the devlink
instance according to that. So it is just a matter of another ndo I
guess.


>
>Michal Kubecek

^ permalink raw reply

* Re: [PATCH net-next] net: sched: sch_api: set an error msg when qdisc_alloc_handle() fails
From: Stefano Brivio @ 2019-02-15 10:30 UTC (permalink / raw)
  To: Ivan Vecera; +Cc: netdev, Li Shuang
In-Reply-To: <20190215102325.30832-1-ivecera@redhat.com>

On Fri, 15 Feb 2019 11:23:25 +0100
Ivan Vecera <ivecera@redhat.com> wrote:

> This patch sets an error message in extack when the number of qdisc
> handles exceeds the maximum. Also the error-code ENOSPC is more
> appropriate than ENOMEM in this situation.

Reported-by: Li Shuang <shuali@redhat.com>

> Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>

-- 
Stefano

^ permalink raw reply

* Re: [PATCH net-next 02/12] net: sched: flower: refactor fl_change
From: Vlad Buslov @ 2019-02-15 10:38 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: netdev@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net
In-Reply-To: <20190214213402.67919dea@redhat.com>

On Thu 14 Feb 2019 at 20:34, Stefano Brivio <sbrivio@redhat.com> wrote:
> On Thu, 14 Feb 2019 09:47:02 +0200
> Vlad Buslov <vladbu@mellanox.com> wrote:
>
>> As a preparation for using classifier spinlock instead of relying on
>> external rtnl lock, rearrange code in fl_change. The goal is to group the
>> code which changes classifier state in single block in order to allow
>> following commits in this set to protect it from parallel modification with
>> tp->lock. Data structures that require tp->lock protection are mask
>> hashtable and filters list, and classifier handle_idr.
>>
>> fl_hw_replace_filter() is a sleeping function and cannot be called while
>> holding a spinlock. In order to execute all sequence of changes to shared
>> classifier data structures atomically, call fl_hw_replace_filter() before
>> modifying them.
>>
>> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>> Acked-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  net/sched/cls_flower.c | 85 ++++++++++++++++++++++++++------------------------
>>  1 file changed, 44 insertions(+), 41 deletions(-)
>>
>> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>> index 88d7af78ba7e..91596a6271f8 100644
>> --- a/net/sched/cls_flower.c
>> +++ b/net/sched/cls_flower.c
>> @@ -1354,90 +1354,93 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>>  	if (err < 0)
>>  		goto errout;
>>
>> -	if (!handle) {
>> -		handle = 1;
>> -		err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
>> -				    INT_MAX, GFP_KERNEL);
>> -	} else if (!fold) {
>> -		/* user specifies a handle and it doesn't exist */
>> -		err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
>> -				    handle, GFP_KERNEL);
>> -	}
>> -	if (err)
>> -		goto errout;
>> -	fnew->handle = handle;
>> -
>>
>> [...]
>>
>>  	if (fold) {
>> +		fnew->handle = handle;
>
> I'm probably missing something, but what if fold is passed and the
> handle isn't specified? That can still happen, right? In that case we
> wouldn't be allocating the handle.

Hi Stefano,

Thank you for reviewing my code.

Cls API lookups fold by handle, so this pointer can only be not NULL
when user specified a handle and filter with such handle exists on tp.

>
>> +
>> +		err = rhashtable_insert_fast(&fnew->mask->ht, &fnew->ht_node,
>> +					     fnew->mask->filter_ht_params);
>> +		if (err)
>> +			goto errout_hw;
>> +
>>  		rhashtable_remove_fast(&fold->mask->ht,
>>  				       &fold->ht_node,
>>  				       fold->mask->filter_ht_params);
>> -		if (!tc_skip_hw(fold->flags))
>> -			fl_hw_destroy_filter(tp, fold, NULL);
>> -	}
>> -
>> -	*arg = fnew;
>> -
>> -	if (fold) {
>>  		idr_replace(&head->handle_idr, fnew, fnew->handle);
>>  		list_replace_rcu(&fold->list, &fnew->list);
>> +
>> +		if (!tc_skip_hw(fold->flags))
>> +			fl_hw_destroy_filter(tp, fold, NULL);
>>  		tcf_unbind_filter(tp, &fold->res);
>>  		tcf_exts_get_net(&fold->exts);
>>  		tcf_queue_work(&fold->rwork, fl_destroy_filter_work);
>>  	} else {
>> +		if (__fl_lookup(fnew->mask, &fnew->mkey)) {
>> +			err = -EEXIST;
>> +			goto errout_hw;
>> +		}
>> +
>> +		if (handle) {
>> +			/* user specifies a handle and it doesn't exist */
>> +			err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
>> +					    handle, GFP_ATOMIC);
>> +		} else {
>> +			handle = 1;
>> +			err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
>> +					    INT_MAX, GFP_ATOMIC);
>> +		}
>> +		if (err)
>> +			goto errout_hw;
>
> Just if you respin: a newline here would be nice to have.

Agree.

>
>> +		fnew->handle = handle;
>> +
>> +		err = rhashtable_insert_fast(&fnew->mask->ht, &fnew->ht_node,
>> +					     fnew->mask->filter_ht_params);
>> +		if (err)
>> +			goto errout_idr;
>> +
>>  		list_add_tail_rcu(&fnew->list, &fnew->mask->filters);
>>  	}
>>
>> +	*arg = fnew;
>> +
>>  	kfree(tb);
>>  	kfree(mask);
>>  	return 0;
>>
>> -errout_mask_ht:
>> -	rhashtable_remove_fast(&fnew->mask->ht, &fnew->ht_node,
>> -			       fnew->mask->filter_ht_params);
>> -
>> -errout_mask:
>> -	fl_mask_put(head, fnew->mask, false);
>> -
>>  errout_idr:
>>  	if (!fold)
>
> This check could go away, I guess (not a strong preference though).

Yes, it seems that after this change errout_idr lable is only accessed
from else branch of if(fold) conditional so the check is redundant.

>
>>  		idr_remove(&head->handle_idr, fnew->handle);
>> +errout_hw:
>> +	if (!tc_skip_hw(fnew->flags))
>> +		fl_hw_destroy_filter(tp, fnew, NULL);
>> +errout_mask:
>> +	fl_mask_put(head, fnew->mask, false);
>>  errout:
>>  	tcf_exts_destroy(&fnew->exts);
>>  	kfree(fnew);

^ permalink raw reply

* r8169 driver causing screen flickering since commit a99790bf5c7f3d68d8b01e015d3212a98ee7bd57
From: Elias Rydberg @ 2019-02-15 10:44 UTC (permalink / raw)
  To: netdev

Hi,

Since commit a99790bf5c7f3d68d8b01e015d3212a98ee7bd57, some systems have 
experienced screen flickering, followed by a black screen, when using 
WiFi (for example starting NetworkManager). So far this only seems to be 
a problem with the following spec:

CPU: Intel i7-8550U
WiFi module: Intel AC-9260 1730Mbps

I contacted the author of the commit and got the following reply:

"From platforms I’ve seen, if ASPM on r8169 is disabled, the deepest 
Package C-State on Intel CPU will limit to PC3.

If the ASPM on r8169 is enabled, it can reach to PC8 when the display is 
on, and be able to reach PC10 when display is off.

So the issue seems to be a platform bug - deeper Package C state makes 
your system unstable.

Please contact the system vendor, the deepest Package C state can be 
reach should be set by BIOS."

Now, Kai-Heng suggest contacting the system vendor, but after consulting 
my distributions forum, I got the suggestion of contacting this email 
list in hope of implementing a patch disabling the ASPM in the r8169 in 
systems with this spec. I apologize if this is a completely unreasonable 
request; I'm not experienced with kernel developing.

Kind regards,
Elias Rydberg

Thread in my distributions forum, which includes more information:
https://bbs.archlinux.org/viewtopic.php?id=244115 
<https://bbs.archlinux.org/viewtopic.php?id=244115>

The result of a kernel bisection I did:

|a99790bf5c7f3d68d8b01e015d3212a98ee7bd57 is the first bad commit commit 
a99790bf5c7f3d68d8b01e015d3212a98ee7bd57 Author: Kai-Heng Feng 
<kai.heng.feng@canonical.com <mailto:kai.heng.feng@canonical.com>> Date: 
Thu Jun 21 16:30:39 2018 +0800 r8169: Reinstate ASPM Support On Intel 
platforms (Skylake and newer), ASPM support in r8169 is the last missing 
puzzle to let CPU's Package C-State reaches PC8. Without ASPM support, 
the CPU cannot reach beyond PC3. PC8 can save additional ~3W in 
comparison with PC3 on a Coffee Lake platform, Dell G3 3779. This is 
based on the work from Chunhao Lin <hau@realtek.com 
<mailto:hau@realtek.com>>. Signed-off-by: Kai-Heng Feng 
<kai.heng.feng@canonical.com <mailto:kai.heng.feng@canonical.com>> 
Signed-off-by: David S. Miller <davem@davemloft.net 
<mailto:davem@davemloft.net>> :040000 040000 
87d4515c14f8eb8cf99b6f47f97ca4db700931c1 
e542595c74609136a7094777afb10b83012f6643 M drivers|


^ permalink raw reply

* Re: [PATCH net-next 02/12] net: sched: flower: refactor fl_change
From: Stefano Brivio @ 2019-02-15 10:47 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: netdev@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net
In-Reply-To: <vbfftspjq48.fsf@mellanox.com>

On Fri, 15 Feb 2019 10:38:04 +0000
Vlad Buslov <vladbu@mellanox.com> wrote:

> On Thu 14 Feb 2019 at 20:34, Stefano Brivio <sbrivio@redhat.com> wrote:
> > On Thu, 14 Feb 2019 09:47:02 +0200
> > Vlad Buslov <vladbu@mellanox.com> wrote:
> >  
> >> As a preparation for using classifier spinlock instead of relying on
> >> external rtnl lock, rearrange code in fl_change. The goal is to group the
> >> code which changes classifier state in single block in order to allow
> >> following commits in this set to protect it from parallel modification with
> >> tp->lock. Data structures that require tp->lock protection are mask
> >> hashtable and filters list, and classifier handle_idr.
> >>
> >> fl_hw_replace_filter() is a sleeping function and cannot be called while
> >> holding a spinlock. In order to execute all sequence of changes to shared
> >> classifier data structures atomically, call fl_hw_replace_filter() before
> >> modifying them.
> >>
> >> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> >> Acked-by: Jiri Pirko <jiri@mellanox.com>
> >> ---
> >>  net/sched/cls_flower.c | 85 ++++++++++++++++++++++++++------------------------
> >>  1 file changed, 44 insertions(+), 41 deletions(-)
> >>
> >> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
> >> index 88d7af78ba7e..91596a6271f8 100644
> >> --- a/net/sched/cls_flower.c
> >> +++ b/net/sched/cls_flower.c
> >> @@ -1354,90 +1354,93 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
> >>  	if (err < 0)
> >>  		goto errout;
> >>
> >> -	if (!handle) {
> >> -		handle = 1;
> >> -		err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
> >> -				    INT_MAX, GFP_KERNEL);
> >> -	} else if (!fold) {
> >> -		/* user specifies a handle and it doesn't exist */
> >> -		err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
> >> -				    handle, GFP_KERNEL);
> >> -	}
> >> -	if (err)
> >> -		goto errout;
> >> -	fnew->handle = handle;
> >> -
> >>
> >> [...]
> >>
> >>  	if (fold) {
> >> +		fnew->handle = handle;  
> >
> > I'm probably missing something, but what if fold is passed and the
> > handle isn't specified? That can still happen, right? In that case we
> > wouldn't be allocating the handle.  
> 
> Hi Stefano,
> 
> Thank you for reviewing my code.
> 
> Cls API lookups fold by handle, so this pointer can only be not NULL
> when user specified a handle and filter with such handle exists on tp.

Ah, of course. Thanks for clarifying. By the way, what tricked me here
was this check in fl_change():

	if (fold && handle && fold->handle != handle)
		...

which could be turned into:

	if (fold && fold->handle != handle)
		...

at this point.

-- 
Stefano

^ permalink raw reply

* Re: [PATCH] net: hns: Fix object reference leaks in hns_dsaf_roce_reset()
From: John Garry @ 2019-02-15 10:52 UTC (permalink / raw)
  To: Huang Zijiang, yisen.zhuang
  Cc: salil.mehta, davem, lipeng321, liuyonglong, yuehaibing, keescook,
	wangxi11, netdev, linux-kernel, wang.yi59, Linuxarm
In-Reply-To: <1550126505-28394-1-git-send-email-huang.zijiang@zte.com.cn>

On 14/02/2019 06:41, Huang Zijiang wrote:
> The of_find_device_by_node() takes a reference to the underlying device
> structure, we should release that reference.

of_find_device_by_node() is not called for every path, so is this change 
proper:

	/* find the platform device corresponding to fwnode */
	if (is_of_node(dsaf_fwnode)) {
		pdev = of_find_device_by_node(to_of_node(dsaf_fwnode));
	} else if (is_acpi_device_node(dsaf_fwnode)) {
		pdev = hns_dsaf_find_platform_device(dsaf_fwnode);
	} else {
		pr_err("fwnode is neither OF or ACPI type\n");
		return -EINVAL;
	}

	/* check if we were a success in fetching pdev */
	if (!pdev) {
		pr_err("couldn't find platform device for node\n");
		return -ENODEV;
	}

	/* retrieve the dsaf_device from the driver data */
	dsaf_dev = dev_get_drvdata(&pdev->dev);
	if (!dsaf_dev) {
		dev_err(&pdev->dev, "dsaf_dev is NULL\n");
		return -ENODEV;
	}

John

>
> Signed-off-by: Huang Zijiang <huang.zijiang@zte.com.cn>
> ---
>  drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
> index 14d7ec7..697d929 100644
> --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
> +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
> @@ -3081,6 +3081,7 @@ int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, bool dereset)
>  	dsaf_dev = dev_get_drvdata(&pdev->dev);
>  	if (!dsaf_dev) {
>  		dev_err(&pdev->dev, "dsaf_dev is NULL\n");
> +		put_device(&pdev->dev);
>  		return -ENODEV;
>  	}
>
> @@ -3088,6 +3089,7 @@ int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, bool dereset)
>  	if (AE_IS_VER1(dsaf_dev->dsaf_ver)) {
>  		dev_err(dsaf_dev->dev, "%s v1 chip doesn't support RoCE!\n",
>  			dsaf_dev->ae_dev.name);
> +		put_device(&pdev->dev);
>  		return -ENODEV;
>  	}
>
>



^ permalink raw reply

* Re: [REGRESSION 4.20] mvneta - DMA-API: device driver tries to sync DMA memory it has not allocated
From: Russell King - ARM Linux admin @ 2019-02-15 11:10 UTC (permalink / raw)
  To: Thomas Petazzoni, netdev
In-Reply-To: <20190212135919.xbka3lamjn4ifcki@shell.armlinux.org.uk>

On Tue, Feb 12, 2019 at 01:59:19PM +0000, Russell King - ARM Linux admin wrote:
> Hi,
> 
> Booting 4.20 on SolidRun Clearfog reliably provokes the following
> warning - this is with mvneta built in, but DSA as modules:
> 
> WARNING: CPU: 0 PID: 555 at kernel/dma/debug.c:1230 check_sync+0x514/0x5bc
> mvneta f1070000.ethernet: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000002dd7dc00] [size=240 bytes]
> Modules linked in: ahci mv88e6xxx dsa_core xhci_plat_hcd xhci_hcd devlink armada_thermal marvell_cesa des_generic ehci_orion phy_armada38x_comphy mcp3021 spi_orion evbug sfp mdio_i2c ip_tables x_tables
> CPU: 0 PID: 555 Comm: bridge-network- Not tainted 4.20.0+ #291
> Hardware name: Marvell Armada 380/385 (Device Tree)
> [<c0019638>] (unwind_backtrace) from [<c0014888>] (show_stack+0x10/0x14)
> [<c0014888>] (show_stack) from [<c07f54e0>] (dump_stack+0x9c/0xd4)
> [<c07f54e0>] (dump_stack) from [<c00312bc>] (__warn+0xf8/0x124)
> [<c00312bc>] (__warn) from [<c00313b0>] (warn_slowpath_fmt+0x38/0x48)
> [<c00313b0>] (warn_slowpath_fmt) from [<c00b0370>] (check_sync+0x514/0x5bc)
> [<c00b0370>] (check_sync) from [<c00b04f8>] (debug_dma_sync_single_range_for_cpu+0x6c/0x74)
> [<c00b04f8>] (debug_dma_sync_single_range_for_cpu) from [<c051bd14>] (mvneta_poll+0x298/0xf58)
> [<c051bd14>] (mvneta_poll) from [<c0656194>] (net_rx_action+0x128/0x424)
> [<c0656194>] (net_rx_action) from [<c000a230>] (__do_softirq+0xf0/0x540)
> [<c000a230>] (__do_softirq) from [<c00386e0>] (irq_exit+0x124/0x144)
> [<c00386e0>] (irq_exit) from [<c009b5e0>] (__handle_domain_irq+0x58/0xb0)
> [<c009b5e0>] (__handle_domain_irq) from [<c03a63c4>] (gic_handle_irq+0x48/0x98)
> [<c03a63c4>] (gic_handle_irq) from [<c0009a10>] (__irq_svc+0x70/0x98)
> ...
> 
> This appears to be from:
> 
>                 if (rx_bytes <= rx_copybreak) {
>                         /* better copy a small frame and not unmap the DMA region */
>                         skb = netdev_alloc_skb_ip_align(dev, rx_bytes);
>                         if (unlikely(!skb))
>                                 goto err_drop_frame_ret_pool;
> 
>                         dma_sync_single_range_for_cpu(dev->dev.parent,
>                                                       rx_desc->buf_phys_addr,
>                                                       MVNETA_MH_SIZE + NET_SKB_PAD,
>                                                       rx_bytes,
>                                                       DMA_FROM_DEVICE);
> 
> which suggests that rx_desc->buf_phys_addr is not something that should
> be passed to dma_sync_single_range_for_cpu().  I've not been able to
> track down why that is, nor which interface is provoking that.
> 
> As I don't have the details of how the buffer management hardware works
> on Armada 388, I'm unable to debug this myself.

Doing what debugging I _can_ do, it seems that this has been a long-term
error in mvneta, but one that was merely uncovered by:

  commit 562e2f467e71f45f0400ebee5077eaa426d3e426
  Author: Yelena Krivosheev <yelena@marvell.com>
  Date:   Wed Jul 18 18:10:57 2018 +0200

The buffer that is being complained about is sync'd using a device of
dev->dev.parent 'f1070000.ethernet', but is allocated by
mvneta_bm_construct() against a different device:

mvneta_bm_construct: 0x2dd85c00 +0x140 for ee113294 (f10c8000.bm)

namely 'f10c8000.bm'.

It's long-term, because it will only trigger in older kernels if we
hit the copy-break stuff, which used to do:

	if (rx_bytes <= rx_copybreak) {
		skb = netdev_alloc_skb_ip_align(dev, rx_bytes);
		if (unlikely(!skb)) {
			...
		}
		
		dma_sync_single_range_for_cpu(dev->dev.parent,
					      phys_addr,
					      MVNETA_MH_SIZE + NET_SKB_PAD,
					      rx_bytes,
					      DMA_FROM_DEVICE);

where rx_copybreak is 256 bytes.  Quite why that hasn't been seen
already, I do not know.

Looking at the code after the commit, if mvneta is used on a
non-coherent platform, then we have problems:

	copy_size = min(skb_size, rx_bytes);
	...
	memcpy(rxq->skb->data, data + MVNETA_MH_SIZE,
	       copy_size);
	...
	if (rxq->left_size == 0) {
		int size = copy_size + MVNETA_MH_SIZE;
		dma_sync_single_range_for_cpu(dev->dev.parent,
					      phys_addr, 0,
					      size,
					      DMA_FROM_DEVICE);

Since the sync is done _after_ we've copied data from a non-coherent
buffer.

If this code has been written to assume that we're always coherent,
then is there any point at all to having the incorrect dma_sync_*()
calls at all?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* Re: [PATCH net-next 03/12] net: sched: flower: introduce reference counting for filters
From: Vlad Buslov @ 2019-02-15 11:22 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: netdev@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net
In-Reply-To: <20190214213423.2260f5b9@redhat.com>


On Thu 14 Feb 2019 at 20:34, Stefano Brivio <sbrivio@redhat.com> wrote:
> On Thu, 14 Feb 2019 09:47:03 +0200
> Vlad Buslov <vladbu@mellanox.com> wrote:
>
>> +static struct cls_fl_filter *fl_get_next_filter(struct tcf_proto *tp,
>> +						unsigned long *handle)
>> +{
>> +	struct cls_fl_head *head = fl_head_dereference(tp);
>> +	struct cls_fl_filter *f;
>> +
>> +	rcu_read_lock();
>> +	/* don't return filters that are being deleted */
>> +	while ((f = idr_get_next_ul(&head->handle_idr,
>> +				    handle)) != NULL &&
>> +	       !refcount_inc_not_zero(&f->refcnt))
>> +		++(*handle);
>
> This... hurts :) What about:
>
> 	while ((f = idr_get_next_ul(&head->handle_idr, &handle))) {
> 		if (refcount_inc_not_zero(&f->refcnt))
> 			break;
> 		++(*handle);
> 	}
>
> ?

I prefer to avoid using value of assignment as boolean and
non-structured jumps, when possible. In this case it seems OK either
way, but how about:

	for (f = idr_get_next_ul(&head->handle_idr, handle);
	     f && !refcount_inc_not_zero(&f->refcnt);
	     f = idr_get_next_ul(&head->handle_idr, handle))
		++(*handle);

>
>> +	rcu_read_unlock();
>> +
>> +	return f;
>> +}
>> +
>>  static bool __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
>>  			struct netlink_ext_ack *extack)
>>  {
>> @@ -456,10 +503,7 @@ static bool __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
>>  	if (!tc_skip_hw(f->flags))
>>  		fl_hw_destroy_filter(tp, f, extack);
>>  	tcf_unbind_filter(tp, &f->res);
>> -	if (async)
>> -		tcf_queue_work(&f->rwork, fl_destroy_filter_work);
>> -	else
>> -		__fl_destroy_filter(f);
>> +	__fl_put(f);
>>
>>  	return last;
>>  }
>> @@ -494,11 +538,18 @@ static void fl_destroy(struct tcf_proto *tp, bool rtnl_held,
>>  	tcf_queue_work(&head->rwork, fl_destroy_sleepable);
>>  }
>>
>> +static void fl_put(struct tcf_proto *tp, void *arg)
>> +{
>> +	struct cls_fl_filter *f = arg;
>> +
>> +	__fl_put(f);
>> +}
>> +
>>  static void *fl_get(struct tcf_proto *tp, u32 handle)
>>  {
>>  	struct cls_fl_head *head = fl_head_dereference(tp);
>>
>> -	return idr_find(&head->handle_idr, handle);
>> +	return __fl_get(head, handle);
>>  }
>>
>>  static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
>> @@ -1321,12 +1372,16 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>>  	struct nlattr **tb;
>>  	int err;
>>
>> -	if (!tca[TCA_OPTIONS])
>> -		return -EINVAL;
>> +	if (!tca[TCA_OPTIONS]) {
>> +		err = -EINVAL;
>> +		goto errout_fold;
>> +	}
>>
>>  	mask = kzalloc(sizeof(struct fl_flow_mask), GFP_KERNEL);
>> -	if (!mask)
>> -		return -ENOBUFS;
>> +	if (!mask) {
>> +		err = -ENOBUFS;
>> +		goto errout_fold;
>> +	}
>>
>>  	tb = kcalloc(TCA_FLOWER_MAX + 1, sizeof(struct nlattr *), GFP_KERNEL);
>>  	if (!tb) {
>> @@ -1349,6 +1404,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>>  		err = -ENOBUFS;
>>  		goto errout_tb;
>>  	}
>> +	refcount_set(&fnew->refcnt, 1);
>>
>>  	err = tcf_exts_init(&fnew->exts, TCA_FLOWER_ACT, 0);
>>  	if (err < 0)
>> @@ -1381,6 +1437,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>>  	if (!tc_in_hw(fnew->flags))
>>  		fnew->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
>>
>> +	refcount_inc(&fnew->refcnt);
>
> I guess I'm not getting the semantics but... why is it 2 now?

As soon as fnew is inserted into head->handle_idr (one reference), it
becomes accessible to concurrent users, which means that it can be
deleted at any time. However, tp->change() returns a reference to newly
created filter to cls_api by assigning "arg" parameter to it (second
reference). After tp->change() returns, cls API continues to use fnew
and releases it with tfilter_put() when finished.

^ permalink raw reply

* RE: [PATCH] net: hns: Fix object reference leaks in hns_dsaf_roce_reset()
From: Salil Mehta @ 2019-02-15 11:25 UTC (permalink / raw)
  To: John Garry, Huang Zijiang, Zhuangyuzeng (Yisen)
  Cc: davem@davemloft.net, lipeng (Y), liuyonglong, yuehaibing,
	keescook@chromium.org, wangxi (M), netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, wang.yi59@zte.com.cn, Linuxarm
In-Reply-To: <b20d235c-779c-77bd-e69f-2d63ffd79ec1@huawei.com>

> From: John Garry
> Sent: Friday, February 15, 2019 10:52 AM
> 
> On 14/02/2019 06:41, Huang Zijiang wrote:
> > The of_find_device_by_node() takes a reference to the underlying device
> > structure, we should release that reference.
> 
> of_find_device_by_node() is not called for every path, so is this change proper:


It looks okay to me and with the suggested device reference is being
released from every possible error leg in the function. Could you be
more specific which error path is not being addressed?



> 	/* find the platform device corresponding to fwnode */
> 	if (is_of_node(dsaf_fwnode)) {
> 		pdev = of_find_device_by_node(to_of_node(dsaf_fwnode));


This will get the reference to the device, which needs to be
released later.


> 	} else if (is_acpi_device_node(dsaf_fwnode)) {
> 		pdev = hns_dsaf_find_platform_device(dsaf_fwnode);


This will also get the reference to the device when bus_find_device()
gets called and returns the 'device'. Therefore, this needs to be
released as well using put_device()



> 	} else {
> 		pr_err("fwnode is neither OF or ACPI type\n");
> 		return -EINVAL;
> 	}
> 
> 	/* check if we were a success in fetching pdev */
> 	if (!pdev) {
> 		pr_err("couldn't find platform device for node\n");
> 		return -ENODEV;
> 	}
> 
> 	/* retrieve the dsaf_device from the driver data */
> 	dsaf_dev = dev_get_drvdata(&pdev->dev);
> 	if (!dsaf_dev) {
> 		dev_err(&pdev->dev, "dsaf_dev is NULL\n");
> 		return -ENODEV;
> 	}
> 
> John


[...]

> > --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
> > +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
> > @@ -3081,6 +3081,7 @@ int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, bool dereset)
> >  	dsaf_dev = dev_get_drvdata(&pdev->dev);
> >  	if (!dsaf_dev) {
> >  		dev_err(&pdev->dev, "dsaf_dev is NULL\n");
> > +		put_device(&pdev->dev);


This looks okay.


> >  		return -ENODEV;
> >  	}
> >
> > @@ -3088,6 +3089,7 @@ int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, bool dereset)
> >  	if (AE_IS_VER1(dsaf_dev->dsaf_ver)) {
> >  		dev_err(dsaf_dev->dev, "%s v1 chip doesn't support RoCE!\n",
> >  			dsaf_dev->ae_dev.name);
> > +		put_device(&pdev->dev);


This looks okay.


Salil.

^ permalink raw reply

* Re: [PATCH net-next v4 07/17] net: sched: protect filter_chain list with filter_chain_lock mutex
From: Ido Schimmel @ 2019-02-15 11:30 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: netdev@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
	ast@kernel.org, daniel@iogearbox.net
In-Reply-To: <vbfh8d5jroi.fsf@mellanox.com>

On Fri, Feb 15, 2019 at 10:02:11AM +0000, Vlad Buslov wrote:
> 
> On Thu 14 Feb 2019 at 18:24, Ido Schimmel <idosch@idosch.org> wrote:
> > On Mon, Feb 11, 2019 at 10:55:38AM +0200, Vlad Buslov wrote:
> >> Extend tcf_chain with new filter_chain_lock mutex. Always lock the chain
> >> when accessing filter_chain list, instead of relying on rtnl lock.
> >> Dereference filter_chain with tcf_chain_dereference() lockdep macro to
> >> verify that all users of chain_list have the lock taken.
> >>
> >> Rearrange tp insert/remove code in tc_new_tfilter/tc_del_tfilter to execute
> >> all necessary code while holding chain lock in order to prevent
> >> invalidation of chain_info structure by potential concurrent change. This
> >> also serializes calls to tcf_chain0_head_change(), which allows head change
> >> callbacks to rely on filter_chain_lock for synchronization instead of rtnl
> >> mutex.
> >>
> >> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> >> Acked-by: Jiri Pirko <jiri@mellanox.com>
> >
> > With this sequence [1] I get the following trace [2]. Bisected it to
> > this patch. Note that second filter is rejected by the device driver
> > (that's the intention). I guess it is not properly removed from the
> > filter chain in the error path?
> >
> > Thanks
> >
> > [1]
> > # tc qdisc add dev swp3 clsact
> > # tc filter add dev swp3 ingress pref 1 matchall skip_sw \
> > 	action mirred egress mirror dev swp7
> > # tc filter add dev swp3 ingress pref 2 matchall skip_sw \
> > 	action mirred egress mirror dev swp7
> > RTNETLINK answers: File exists
> > We have an error talking to the kernel, -1
> > # tc qdisc del dev swp3 clsact
> >
> > [2]
> > [   70.545131] kasan: GPF could be caused by NULL-ptr deref or user memory access
> > [   70.553394] general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
> > [   70.560618] CPU: 2 PID: 2268 Comm: tc Not tainted 5.0.0-rc5-custom-02103-g415d39427317 #1304
> > [   70.570065] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
> > [   70.580204] RIP: 0010:mall_reoffload+0x14a/0x760
> > [   70.585382] Code: c1 0f 85 ba 05 00 00 31 c0 4d 8d 6c 24 34 b9 06 00 00 00 4c 89 ff f3 48 ab 4c 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <0f> b6 14 02 4c 89 e8 83
> > e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 bd
> > [   70.606382] RSP: 0018:ffff888231faefc0 EFLAGS: 00010207
> > [   70.612235] RAX: dffffc0000000000 RBX: 1ffff110463f5dfe RCX: 0000000000000000
> > [   70.620220] RDX: 0000000000000006 RSI: 1ffff110463f5e01 RDI: ffff888231faf040
> > [   70.628206] RBP: ffff8881ef151a00 R08: 0000000000000000 R09: ffffed10463f5dfa
> > [   70.636192] R10: ffffed10463f5dfa R11: 0000000000000003 R12: 0000000000000000
> > [   70.644177] R13: 0000000000000034 R14: 0000000000000000 R15: ffff888231faf010
> > [   70.652163] FS:  00007f46b5bf0800(0000) GS:ffff888236c00000(0000) knlGS:0000000000000000
> > [   70.661218] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   70.667649] CR2: 0000000001d590a8 CR3: 0000000231c3c000 CR4: 00000000001006e0
> > [   70.675633] Call Trace:
> > [   70.693046]  tcf_block_playback_offloads+0x94/0x230
> > [   70.710617]  __tcf_block_cb_unregister+0xf7/0x2d0
> > [   70.734143]  mlxsw_sp_setup_tc+0x20f/0x660
> > [   70.738739]  tcf_block_offload_unbind+0x22b/0x350
> > [   70.748898]  __tcf_block_put+0x203/0x630
> > [   70.769700]  tcf_block_put_ext+0x2f/0x40
> > [   70.774098]  clsact_destroy+0x7a/0xb0
> > [   70.782604]  qdisc_destroy+0x11a/0x5c0
> > [   70.786810]  qdisc_put+0x5a/0x70
> > [   70.790435]  notify_and_destroy+0x8e/0xa0
> > [   70.794933]  qdisc_graft+0xbb7/0xef0
> > [   70.809009]  tc_get_qdisc+0x518/0xa30
> > [   70.821530]  rtnetlink_rcv_msg+0x397/0xa20
> > [   70.835510]  netlink_rcv_skb+0x132/0x380
> > [   70.848419]  netlink_unicast+0x4c0/0x690
> > [   70.866019]  netlink_sendmsg+0x929/0xe10
> > [   70.882134]  sock_sendmsg+0xc8/0x110
> > [   70.886144]  ___sys_sendmsg+0x77a/0x8f0
> > [   70.924064]  __sys_sendmsg+0xf7/0x250
> > [   70.955727]  do_syscall_64+0x14d/0x610
> 
> Hi Ido,
> 
> Thanks for reporting this.
> 
> I looked at the code and problem seems to be matchall classifier
> specific. My implementation of unlocked cls API assumes that concurrent
> insertions are possible and checks for it when deleting "empty" tp.
> Since classifiers don't expose number of elements, the only way to test
> this is to do tp->walk() on them and assume that walk callback is called
> once per filter on every classifier. In your example new tp is created
> for second filter, filter insertion fails, number of elements on newly
> created tp is checked with tp->walk() before deleting it. However,
> matchall classifier always calls the tp->walk() callback once, even when
> it doesn't have a valid filter (in this case with NULL filter pointer).
> 
> Possible ways to fix this:
> 
> 1) Check for NULL filter pointer in tp->walk() callback and ignore it
> when counting filters on tp - will work but I don't like it because I
> don't think it is ever correct to call tp->walk() callback with NULL
> filter pointer.
> 
> 2) Fix matchall to not call tp->walk() callback with NULL filter
> pointers - my preferred simple solution.
> 
> 3) Extend tp API to have direct way to count filters by implementing
> tp->count - requires change to every classifier. Or maybe add it as
> optional API that only unlocked classifiers are required to implement
> and just delete rtnl lock dependent tp's without checking for concurrent
> insertions.
> 
> What do you think?

Since the problem is matchall-specific, then it makes sense to fix it in
matchall like you suggested in option #2.

Can you please use this opportunity and audit other classifiers to
confirm problem is indeed specific to matchall?

In any case, feel free to send me a patch and I'll test it.

Thanks

^ permalink raw reply

* [BUG] 4.20: mv88e6xxx: WARNING: possible circular locking dependency detected
From: Russell King - ARM Linux admin @ 2019-02-15 11:46 UTC (permalink / raw)
  To: Andrew Lunn, Vivien Didelot; +Cc: netdev

Hi Andrew, Vivien,

I decided to try adding support for the DSA switch interrupt on
SolidRun's Clearfog platform, but I notice having done so I get:

WARNING: possible circular locking dependency detected
4.20.0+ #297 Not tainted
------------------------------------------------------
systemd-udevd/157 is trying to acquire lock:
ecc4a080 (&chip->reg_lock){+.+.}, at: __setup_irq+0x640/0x704

but task is already holding lock:
edf9c940 (&desc->request_mutex){+.+.}, at: __setup_irq+0xa0/0x704

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&desc->request_mutex){+.+.}:
       mutex_lock_nested+0x1c/0x24
       __setup_irq+0xa0/0x704
       request_threaded_irq+0xd0/0x150
       mv88e6xxx_probe+0x41c/0x694 [mv88e6xxx]
       mdio_probe+0x2c/0x54
       really_probe+0x200/0x2c4
       driver_probe_device+0x5c/0x174
       __driver_attach+0xd8/0xdc
       bus_for_each_dev+0x58/0x7c
       bus_add_driver+0xe4/0x1f0
       driver_register+0x7c/0x110
       mdio_driver_register+0x24/0x58
       do_one_initcall+0x74/0x2e8
       do_init_module+0x60/0x1d0
       load_module+0x1968/0x1ff4
       sys_finit_module+0x8c/0x98
       ret_fast_syscall+0x0/0x28
       0xbee82ae8

-> #0 (&chip->reg_lock){+.+.}:
       __mutex_lock+0x50/0x8b8
       mutex_lock_nested+0x1c/0x24
       __setup_irq+0x640/0x704
       request_threaded_irq+0xd0/0x150
       mv88e6xxx_g2_irq_setup+0xcc/0x1b4 [mv88e6xxx]
       mv88e6xxx_probe+0x44c/0x694 [mv88e6xxx]
       mdio_probe+0x2c/0x54
       really_probe+0x200/0x2c4
       driver_probe_device+0x5c/0x174
       __driver_attach+0xd8/0xdc
       bus_for_each_dev+0x58/0x7c
       bus_add_driver+0xe4/0x1f0
       driver_register+0x7c/0x110
       mdio_driver_register+0x24/0x58
       do_one_initcall+0x74/0x2e8
       do_init_module+0x60/0x1d0
       load_module+0x1968/0x1ff4
       sys_finit_module+0x8c/0x98
       ret_fast_syscall+0x0/0x28
       0xbee82ae8

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&desc->request_mutex);
                               lock(&chip->reg_lock);
                               lock(&desc->request_mutex);
  lock(&chip->reg_lock);

 *** DEADLOCK ***

2 locks held by systemd-udevd/157:
 #0: ee040868 (&dev->mutex){....}, at: __driver_attach+0x70/0xdc
------------[ cut here ]------------
 #1: edf9c940 (&desc->request_mutex){+.+.}, at: __setup_irq+0xa0/0x704
WARNING: CPU: 0 PID: 152 at kernel/locking/lockdep.c:355
stack backtrace:
downgrading a read lock
Modules linked in:
CPU: 1 PID: 157 Comm: systemd-udevd Not tainted 4.20.0+ #297
 marvell_cesa(+) mv88e6xxx(+) dsa_core devlink xhci_plat_hcd(+) xhci_hcd armada_Hardware name: Marvell Armada 380/385 (Device Tree)
[<c0019638>] (unwind_backtrace) from [<c0014888>] (show_stack+0x10/0x14)
[<c0014888>] (show_stack) from [<c07f55c0>] (dump_stack+0x9c/0xd4)
[<c07f55c0>] (dump_stack) from [<c0088afc>] (print_circular_bug+0x284/0x2d8)
[<c0088afc>] (print_circular_bug) from [<c0086b5c>] (__lock_acquire+0x15d4/0x19b[<c0086b5c>] (__lock_acquire) from [<c0087828>] (lock_acquire+0xc4/0x1dc)
[<c0087828>] (lock_acquire) from [<c080fe68>] (__mutex_lock+0x50/0x8b8)
[<c080fe68>] (__mutex_lock) from [<c0810758>] (mutex_lock_nested+0x1c/0x24)
[<c0810758>] (mutex_lock_nested) from [<c009e060>] (__setup_irq+0x640/0x704)
[<c009e060>] (__setup_irq) from [<c009e2e0>] (request_threaded_irq+0xd0/0x150)
[<c009e2e0>] (request_threaded_irq) from [<bf0dc970>] (mv88e6xxx_g2_irq_setup+0x[<bf0dc970>] (mv88e6xxx_g2_irq_setup [mv88e6xxx]) from [<bf0d5a90>] (mv88e6xxx_p[<bf0d5a90>] (mv88e6xxx_probe [mv88e6xxx]) from [<c050d420>] (mdio_probe+0x2c/0x[<c050d420>] (mdio_probe) from [<c0496eac>] (really_probe+0x200/0x2c4)
[<c0496eac>] (really_probe) from [<c0497140>] (driver_probe_device+0x5c/0x174)
[<c0497140>] (driver_probe_device) from [<c0497330>] (__driver_attach+0xd8/0xdc)[<c0497330>] (__driver_attach) from [<c0495494>] (bus_for_each_dev+0x58/0x7c)
[<c0495494>] (bus_for_each_dev) from [<c04963d4>] (bus_add_driver+0xe4/0x1f0)
[<c04963d4>] (bus_add_driver) from [<c0498038>] (driver_register+0x7c/0x110)
[<c0498038>] (driver_register) from [<c050d338>] (mdio_driver_register+0x24/0x58[<c050d338>] (mdio_driver_register) from [<c000afdc>] (do_one_initcall+0x74/0x2e[<c000afdc>] (do_one_initcall) from [<c00d4994>] (do_init_module+0x60/0x1d0)
[<c00d4994>] (do_init_module) from [<c00d39e0>] (load_module+0x1968/0x1ff4)
[<c00d39e0>] (load_module) from [<c00d4248>] (sys_finit_module+0x8c/0x98)
[<c00d4248>] (sys_finit_module) from [<c0009000>] (ret_fast_syscall+0x0/0x28)
Exception stack(0xed42bfa8 to 0xed42bff0)
bfa0:                   00020000 00000000 0000000b b6e814b5 00000000 010b31e0
bfc0: 00020000 00000000 00000000 0000017b 010b1b30 00020000 00000000 010b31e0
bfe0: bee82af8 bee82ae8 b6e7b2ac b6ddad70
CPU: 0 PID: 152 Comm: systemd-udevd Not tainted 4.20.0+ #297
Hardware name: Marvell Armada 380/385 (Device Tree)
[<c0019638>] (unwind_backtrace) from [<c0014888>] (show_stack+0x10/0x14)
[<c0014888>] (show_stack) from [<c07f55c0>] (dump_stack+0x9c/0xd4)
[<c07f55c0>] (dump_stack) from [<c00312bc>] (__warn+0xf8/0x124)
[<c00312bc>] (__warn) from [<c00313b0>] (warn_slowpath_fmt+0x38/0x48)
[<c00313b0>] (warn_slowpath_fmt) from [<c0087518>] (lock_downgrade+0x14c/0x1b8)
[<c0087518>] (lock_downgrade) from [<c0081650>] (downgrade_write+0x14/0xd4)
[<c0081650>] (downgrade_write) from [<c0189d10>] (__do_munmap+0x2b8/0x31c)
[<c0189d10>] (__do_munmap) from [<c0189dd4>] (__vm_munmap+0x60/0xa0)
[<c0189dd4>] (__vm_munmap) from [<c0009000>] (ret_fast_syscall+0x0/0x28)
Exception stack(0xed443fa8 to 0xed443ff0)
3fa0:                   010a8240 00001000 b665f000 00001000 00000000 00000000
3fc0: 010a8240 00001000 00000000 0000005b 00000000 00000007 b6ee5f10 00000000
3fe0: fbad2418 bee7f124 b6d7c7b4 b6ddafac
irq event stamp: 83666
hardirqs last  enabled at (83665): [<c001d6b0>] do_page_fault+0x190/0x360
hardirqs last disabled at (83666): [<c080e474>] __schedule+0xbc/0x9c4
softirqs last  enabled at (82980): [<c000a484>] __do_softirq+0x344/0x540
softirqs last disabled at (82971): [<c00386e0>] irq_exit+0x124/0x144
---[ end trace c91466d44e5e3485 ]---

This is caused by the locking order inversion in mv88e6xxx_probe:

        mutex_lock(&chip->reg_lock);
        if (chip->irq > 0)
                err = mv88e6xxx_g1_irq_setup(chip);
        else
                err = mv88e6xxx_irq_poll_setup(chip);
        mutex_unlock(&chip->reg_lock);

Here, we take chip->reg_lock, and then call into mv88e6xxx_g1_irq_setup()
which then calls request_threaded_irq(), taking the request_mutex.
However, when we request the g2 interrupt, we call request_threaded_irq()
again, which takes the request_mutex, which then goes on to call
chip_bus_lock().  This comes through to mv88e6xxx_g1_irq_bus_lock,
which then tries to grab chip->reg_lock.

It looks to me like the mutex_lock()/unlock() for reg_lock should be
moved inside mv88e6xxx_g1_irq_free_common() and
mv88e6xxx_g1_irq_setup_common(), which will avoid holding it while
calling request_threaded_irq() or setting up the delayed work.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* Re: [PATCH] net: hns: Fix object reference leaks in hns_dsaf_roce_reset()
From: John Garry @ 2019-02-15 12:00 UTC (permalink / raw)
  To: Salil Mehta, Huang Zijiang, Zhuangyuzeng (Yisen)
  Cc: davem@davemloft.net, lipeng (Y), liuyonglong, yuehaibing,
	keescook@chromium.org, wangxi (M), netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, wang.yi59@zte.com.cn, Linuxarm
In-Reply-To: <11b3bcdf50ee435587fbfad6043e49bc@huawei.com>

On 15/02/2019 11:25, Salil Mehta wrote:
>> From: John Garry
>> Sent: Friday, February 15, 2019 10:52 AM
>>
>> On 14/02/2019 06:41, Huang Zijiang wrote:
>>> The of_find_device_by_node() takes a reference to the underlying device
>>> structure, we should release that reference.
>>
>> of_find_device_by_node() is not called for every path, so is this change proper:
>
>
> It looks okay to me and with the suggested device reference is being
> released from every possible error leg in the function. Could you be
> more specific which error path is not being addressed?
>
>
>
>> 	/* find the platform device corresponding to fwnode */
>> 	if (is_of_node(dsaf_fwnode)) {
>> 		pdev = of_find_device_by_node(to_of_node(dsaf_fwnode));
>
>
> This will get the reference to the device, which needs to be
> released later.
>
>
>> 	} else if (is_acpi_device_node(dsaf_fwnode)) {
>> 		pdev = hns_dsaf_find_platform_device(dsaf_fwnode);
>
>
> This will also get the reference to the device when bus_find_device()
> gets called and returns the 'device'. Therefore, this needs to be
> released as well using put_device()

OK, I see. That's non-obvious.

And so the commit message was simply incomplete.

>

So who finally drops this reference? This patch only seems to release on 
error path. I checked the callers and they don't seem to.

John

>
>
>> 	} else {
>> 		pr_err("fwnode is neither OF or ACPI type\n");
>> 		return -EINVAL;
>> 	}
>>
>> 	/* check if we were a success in fetching pdev */
>> 	if (!pdev) {
>> 		pr_err("couldn't find platform device for node\n");
>> 		return -ENODEV;
>> 	}
>>
>> 	/* retrieve the dsaf_device from the driver data */
>> 	dsaf_dev = dev_get_drvdata(&pdev->dev);
>> 	if (!dsaf_dev) {
>> 		dev_err(&pdev->dev, "dsaf_dev is NULL\n");
>> 		return -ENODEV;
>> 	}
>>
>> John
>
>
> [...]
>
>>> --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
>>> +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
>>> @@ -3081,6 +3081,7 @@ int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, bool dereset)
>>>  	dsaf_dev = dev_get_drvdata(&pdev->dev);
>>>  	if (!dsaf_dev) {
>>>  		dev_err(&pdev->dev, "dsaf_dev is NULL\n");
>>> +		put_device(&pdev->dev);
>
>
> This looks okay.
>
>
>>>  		return -ENODEV;
>>>  	}
>>>
>>> @@ -3088,6 +3089,7 @@ int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, bool dereset)
>>>  	if (AE_IS_VER1(dsaf_dev->dsaf_ver)) {
>>>  		dev_err(dsaf_dev->dev, "%s v1 chip doesn't support RoCE!\n",
>>>  			dsaf_dev->ae_dev.name);
>>> +		put_device(&pdev->dev);
>
>
> This looks okay.
>
>
> Salil.
>
> .
>



^ permalink raw reply

* Re: [BUG] 4.20: mv88e6xxx: WARNING: possible circular locking dependency detected
From: Russell King - ARM Linux admin @ 2019-02-15 12:11 UTC (permalink / raw)
  To: Andrew Lunn, Vivien Didelot; +Cc: netdev
In-Reply-To: <20190215114617.lfag4sarqzbtca6c@shell.armlinux.org.uk>

On Fri, Feb 15, 2019 at 11:46:17AM +0000, Russell King - ARM Linux admin wrote:
> Hi Andrew, Vivien,
> 
> I decided to try adding support for the DSA switch interrupt on
> SolidRun's Clearfog platform, but I notice having done so I get:
> 
> WARNING: possible circular locking dependency detected
> 4.20.0+ #297 Not tainted
> ------------------------------------------------------
> systemd-udevd/157 is trying to acquire lock:
> ecc4a080 (&chip->reg_lock){+.+.}, at: __setup_irq+0x640/0x704
> 
> but task is already holding lock:
> edf9c940 (&desc->request_mutex){+.+.}, at: __setup_irq+0xa0/0x704
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (&desc->request_mutex){+.+.}:
>        mutex_lock_nested+0x1c/0x24
>        __setup_irq+0xa0/0x704
>        request_threaded_irq+0xd0/0x150
>        mv88e6xxx_probe+0x41c/0x694 [mv88e6xxx]
>        mdio_probe+0x2c/0x54
>        really_probe+0x200/0x2c4
>        driver_probe_device+0x5c/0x174
>        __driver_attach+0xd8/0xdc
>        bus_for_each_dev+0x58/0x7c
>        bus_add_driver+0xe4/0x1f0
>        driver_register+0x7c/0x110
>        mdio_driver_register+0x24/0x58
>        do_one_initcall+0x74/0x2e8
>        do_init_module+0x60/0x1d0
>        load_module+0x1968/0x1ff4
>        sys_finit_module+0x8c/0x98
>        ret_fast_syscall+0x0/0x28
>        0xbee82ae8
> 
> -> #0 (&chip->reg_lock){+.+.}:
>        __mutex_lock+0x50/0x8b8
>        mutex_lock_nested+0x1c/0x24
>        __setup_irq+0x640/0x704
>        request_threaded_irq+0xd0/0x150
>        mv88e6xxx_g2_irq_setup+0xcc/0x1b4 [mv88e6xxx]
>        mv88e6xxx_probe+0x44c/0x694 [mv88e6xxx]
>        mdio_probe+0x2c/0x54
>        really_probe+0x200/0x2c4
>        driver_probe_device+0x5c/0x174
>        __driver_attach+0xd8/0xdc
>        bus_for_each_dev+0x58/0x7c
>        bus_add_driver+0xe4/0x1f0
>        driver_register+0x7c/0x110
>        mdio_driver_register+0x24/0x58
>        do_one_initcall+0x74/0x2e8
>        do_init_module+0x60/0x1d0
>        load_module+0x1968/0x1ff4
>        sys_finit_module+0x8c/0x98
>        ret_fast_syscall+0x0/0x28
>        0xbee82ae8
> 
> other info that might help us debug this:
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(&desc->request_mutex);
>                                lock(&chip->reg_lock);
>                                lock(&desc->request_mutex);
>   lock(&chip->reg_lock);
> 
>  *** DEADLOCK ***
> 
> 2 locks held by systemd-udevd/157:
>  #0: ee040868 (&dev->mutex){....}, at: __driver_attach+0x70/0xdc
> ------------[ cut here ]------------
>  #1: edf9c940 (&desc->request_mutex){+.+.}, at: __setup_irq+0xa0/0x704
> WARNING: CPU: 0 PID: 152 at kernel/locking/lockdep.c:355
> stack backtrace:
> downgrading a read lock
> Modules linked in:
> CPU: 1 PID: 157 Comm: systemd-udevd Not tainted 4.20.0+ #297
>  marvell_cesa(+) mv88e6xxx(+) dsa_core devlink xhci_plat_hcd(+) xhci_hcd armada_Hardware name: Marvell Armada 380/385 (Device Tree)
> [<c0019638>] (unwind_backtrace) from [<c0014888>] (show_stack+0x10/0x14)
> [<c0014888>] (show_stack) from [<c07f55c0>] (dump_stack+0x9c/0xd4)
> [<c07f55c0>] (dump_stack) from [<c0088afc>] (print_circular_bug+0x284/0x2d8)
> [<c0088afc>] (print_circular_bug) from [<c0086b5c>] (__lock_acquire+0x15d4/0x19b[<c0086b5c>] (__lock_acquire) from [<c0087828>] (lock_acquire+0xc4/0x1dc)
> [<c0087828>] (lock_acquire) from [<c080fe68>] (__mutex_lock+0x50/0x8b8)
> [<c080fe68>] (__mutex_lock) from [<c0810758>] (mutex_lock_nested+0x1c/0x24)
> [<c0810758>] (mutex_lock_nested) from [<c009e060>] (__setup_irq+0x640/0x704)
> [<c009e060>] (__setup_irq) from [<c009e2e0>] (request_threaded_irq+0xd0/0x150)
> [<c009e2e0>] (request_threaded_irq) from [<bf0dc970>] (mv88e6xxx_g2_irq_setup+0x[<bf0dc970>] (mv88e6xxx_g2_irq_setup [mv88e6xxx]) from [<bf0d5a90>] (mv88e6xxx_p[<bf0d5a90>] (mv88e6xxx_probe [mv88e6xxx]) from [<c050d420>] (mdio_probe+0x2c/0x[<c050d420>] (mdio_probe) from [<c0496eac>] (really_probe+0x200/0x2c4)
> [<c0496eac>] (really_probe) from [<c0497140>] (driver_probe_device+0x5c/0x174)
> [<c0497140>] (driver_probe_device) from [<c0497330>] (__driver_attach+0xd8/0xdc)[<c0497330>] (__driver_attach) from [<c0495494>] (bus_for_each_dev+0x58/0x7c)
> [<c0495494>] (bus_for_each_dev) from [<c04963d4>] (bus_add_driver+0xe4/0x1f0)
> [<c04963d4>] (bus_add_driver) from [<c0498038>] (driver_register+0x7c/0x110)
> [<c0498038>] (driver_register) from [<c050d338>] (mdio_driver_register+0x24/0x58[<c050d338>] (mdio_driver_register) from [<c000afdc>] (do_one_initcall+0x74/0x2e[<c000afdc>] (do_one_initcall) from [<c00d4994>] (do_init_module+0x60/0x1d0)
> [<c00d4994>] (do_init_module) from [<c00d39e0>] (load_module+0x1968/0x1ff4)
> [<c00d39e0>] (load_module) from [<c00d4248>] (sys_finit_module+0x8c/0x98)
> [<c00d4248>] (sys_finit_module) from [<c0009000>] (ret_fast_syscall+0x0/0x28)
> Exception stack(0xed42bfa8 to 0xed42bff0)
> bfa0:                   00020000 00000000 0000000b b6e814b5 00000000 010b31e0
> bfc0: 00020000 00000000 00000000 0000017b 010b1b30 00020000 00000000 010b31e0
> bfe0: bee82af8 bee82ae8 b6e7b2ac b6ddad70
> CPU: 0 PID: 152 Comm: systemd-udevd Not tainted 4.20.0+ #297
> Hardware name: Marvell Armada 380/385 (Device Tree)
> [<c0019638>] (unwind_backtrace) from [<c0014888>] (show_stack+0x10/0x14)
> [<c0014888>] (show_stack) from [<c07f55c0>] (dump_stack+0x9c/0xd4)
> [<c07f55c0>] (dump_stack) from [<c00312bc>] (__warn+0xf8/0x124)
> [<c00312bc>] (__warn) from [<c00313b0>] (warn_slowpath_fmt+0x38/0x48)
> [<c00313b0>] (warn_slowpath_fmt) from [<c0087518>] (lock_downgrade+0x14c/0x1b8)
> [<c0087518>] (lock_downgrade) from [<c0081650>] (downgrade_write+0x14/0xd4)
> [<c0081650>] (downgrade_write) from [<c0189d10>] (__do_munmap+0x2b8/0x31c)
> [<c0189d10>] (__do_munmap) from [<c0189dd4>] (__vm_munmap+0x60/0xa0)
> [<c0189dd4>] (__vm_munmap) from [<c0009000>] (ret_fast_syscall+0x0/0x28)
> Exception stack(0xed443fa8 to 0xed443ff0)
> 3fa0:                   010a8240 00001000 b665f000 00001000 00000000 00000000
> 3fc0: 010a8240 00001000 00000000 0000005b 00000000 00000007 b6ee5f10 00000000
> 3fe0: fbad2418 bee7f124 b6d7c7b4 b6ddafac
> irq event stamp: 83666
> hardirqs last  enabled at (83665): [<c001d6b0>] do_page_fault+0x190/0x360
> hardirqs last disabled at (83666): [<c080e474>] __schedule+0xbc/0x9c4
> softirqs last  enabled at (82980): [<c000a484>] __do_softirq+0x344/0x540
> softirqs last disabled at (82971): [<c00386e0>] irq_exit+0x124/0x144
> ---[ end trace c91466d44e5e3485 ]---
> 
> This is caused by the locking order inversion in mv88e6xxx_probe:
> 
>         mutex_lock(&chip->reg_lock);
>         if (chip->irq > 0)
>                 err = mv88e6xxx_g1_irq_setup(chip);
>         else
>                 err = mv88e6xxx_irq_poll_setup(chip);
>         mutex_unlock(&chip->reg_lock);
> 
> Here, we take chip->reg_lock, and then call into mv88e6xxx_g1_irq_setup()
> which then calls request_threaded_irq(), taking the request_mutex.
> However, when we request the g2 interrupt, we call request_threaded_irq()
> again, which takes the request_mutex, which then goes on to call
> chip_bus_lock().  This comes through to mv88e6xxx_g1_irq_bus_lock,
> which then tries to grab chip->reg_lock.
> 
> It looks to me like the mutex_lock()/unlock() for reg_lock should be
> moved inside mv88e6xxx_g1_irq_free_common() and
> mv88e6xxx_g1_irq_setup_common(), which will avoid holding it while
> calling request_threaded_irq() or setting up the delayed work.

Maybe something like this, which seems to solve the problem here:

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index c771a58b975e..c859efd8d329 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -349,9 +349,11 @@ static void mv88e6xxx_g1_irq_free_common(struct mv88e6xxx_chip *chip)
 	int irq, virq;
 	u16 mask;
 
+	mutex_lock(&chip->reg_lock);
 	mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &mask);
 	mask &= ~GENMASK(chip->g1_irq.nirqs, 0);
 	mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, mask);
+	mutex_unlock(&chip->reg_lock);
 
 	for (irq = 0; irq < chip->g1_irq.nirqs; irq++) {
 		virq = irq_find_mapping(chip->g1_irq.domain, irq);
@@ -369,9 +371,7 @@ static void mv88e6xxx_g1_irq_free(struct mv88e6xxx_chip *chip)
 	 */
 	free_irq(chip->irq, chip);
 
-	mutex_lock(&chip->reg_lock);
 	mv88e6xxx_g1_irq_free_common(chip);
-	mutex_unlock(&chip->reg_lock);
 }
 
 static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip)
@@ -392,6 +392,7 @@ static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip)
 	chip->g1_irq.chip = mv88e6xxx_g1_irq_chip;
 	chip->g1_irq.masked = ~0;
 
+	mutex_lock(&chip->reg_lock);
 	err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &mask);
 	if (err)
 		goto out_mapping;
@@ -406,6 +407,7 @@ static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip)
 	err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_STS, &reg);
 	if (err)
 		goto out_disable;
+	mutex_unlock(&chip->reg_lock);
 
 	return 0;
 
@@ -414,6 +416,7 @@ static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip)
 	mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, mask);
 
 out_mapping:
+	mutex_unlock(&chip->reg_lock);
 	for (irq = 0; irq < 16; irq++) {
 		virq = irq_find_mapping(chip->g1_irq.domain, irq);
 		irq_dispose_mapping(virq);
@@ -479,9 +482,7 @@ static void mv88e6xxx_irq_poll_free(struct mv88e6xxx_chip *chip)
 	kthread_cancel_delayed_work_sync(&chip->irq_poll_work);
 	kthread_destroy_worker(chip->kworker);
 
-	mutex_lock(&chip->reg_lock);
 	mv88e6xxx_g1_irq_free_common(chip);
-	mutex_unlock(&chip->reg_lock);
 }
 
 int mv88e6xxx_wait(struct mv88e6xxx_chip *chip, int addr, int reg, u16 mask)
@@ -4808,12 +4809,10 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
 	 * the PHYs will link their interrupts to these interrupt
 	 * controllers
 	 */
-	mutex_lock(&chip->reg_lock);
 	if (chip->irq > 0)
 		err = mv88e6xxx_g1_irq_setup(chip);
 	else
 		err = mv88e6xxx_irq_poll_setup(chip);
-	mutex_unlock(&chip->reg_lock);
 
 	if (err)
 		goto out;

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply related

* [PATCH] net: sched: matchall: verify that filter is not NULL in mall_walk()
From: Vlad Buslov @ 2019-02-15 12:11 UTC (permalink / raw)
  To: idosch; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem, Vlad Buslov
In-Reply-To: <20190215113041.GA10511@splinter>

Check that filter is not NULL before passing it to tcf_walker->fn()
callback. This can happen when mall_change() failed to offload filter to
hardware.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
 net/sched/cls_matchall.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index a37137430e61..1f9d481b0fbb 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -247,6 +247,9 @@ static void mall_walk(struct tcf_proto *tp, struct tcf_walker *arg,
 
 	if (arg->count < arg->skip)
 		goto skip;
+
+	if (!head)
+		return;
 	if (arg->fn(tp, head, arg) < 0)
 		arg->stop = 1;
 skip:
-- 
2.13.6


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox