Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 2/4] ravb: Add optional PHY reset during system resume
From: Sergei Shtylyov @ 2017-10-09 12:50 UTC (permalink / raw)
  To: Florian Fainelli, Geert Uytterhoeven
  Cc: Geert Uytterhoeven, David S . Miller, Simon Horman, Magnus Damm,
	Andrew Lunn, Niklas Söderlund, netdev@vger.kernel.org,
	Linux-Renesas, devicetree@vger.kernel.org
In-Reply-To: <316a7f99-d2ca-cdb2-2eca-deb044085c52@cogentembedded.com>

[-- Attachment #1: Type: text/plain, Size: 2880 bytes --]

On 10/09/2017 12:37 PM, Sergei Shtylyov wrote:

>>>>>>> If the optional "reset-gpios" property is specified in DT, the generic
>>>>>>> MDIO bus code takes care of resetting the PHY during device probe.
>>>>>>> However, the PHY may still have to be reset explicitly after system
>>>>>>> resume.
>>>>>>>
>>>>>>> This allows to restore Ethernet operation after resume from s2ram on
>>>>>>> Salvator-XS, where the enable pin of the regulator providing PHY power
>>>>>>> is connected to PRESETn, and PSCI suspend powers down the SoC.
>>>>>>>
>>>>>>> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
>>>>>>> ---
>>>>>>>    drivers/net/ethernet/renesas/ravb_main.c | 9 +++++++++
>>>>>>>    1 file changed, 9 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c
>>>>>>> b/drivers/net/ethernet/renesas/ravb_main.c
>>>>>>> index fdf30bfa403bf416..96d1d48e302f8c9a 100644
>>>>>>> --- a/drivers/net/ethernet/renesas/ravb_main.c
>>>>>>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
>>> [...]
>>>>>>> @@ -2302,6 +2304,13 @@ static int __maybe_unused ravb_resume(struct
>>>>>>> device *dev)he patches
>>>>>>>          * reopen device if it was running before system suspended.
>>>>>>>          */
>>>>>>>
>>>>>>> +     /* PHY reset */
>>>>>>> +     if (bus->reset_gpiod) {
>>>>>>> +             gpiod_set_value_cansleep(bus->reset_gpiod, 1);
>>>>>>> +             udelay(bus->reset_delay_us);
>>>>>>> +             gpiod_set_value_cansleep(bus->reset_gpiod, 0);
>>>>>>> +     }
>>>>>>
>>>>>> This is a clever hack, but unfortunately this is also misusing the MDIO
>>>>>> bus reset line into a PHY reset line. As commented in patch 3, if this
>>>>>> reset line is tied to the PHY, then this should be a PHY property and
>>>>>
>>>>> OK.
>>>>>
>>>>>> you cannot (ab)use the MDIO bus GPIO reset logic anymore...
>>>>>
>>>>> And then I should add reset-gpios support to drivers/net/phy/micrel.c?
>>>>> Or is there already generic code to handle per-PHY reset? I couldn't
>>>>> find it.
>>>>
>>>> There is not such a thing unfortunately, but it would presumably be
>>>
>>>     It's strange you don't remember about my (abandoned) patches to
>>> handle per=PHY reset GPIOs -- perhaps it's time to unearth them. Here
>>> they are:
>>>
>>> http://patchwork.ozlabs.org/patch/616495/
>>> http://patchwork.ozlabs.org/patch/616501/
>>>
>>>     I had v3 in the works before abandoning this series -- it doesn't
>>> apply now.
>>
>> Should Geert pick-up where you left and address the feedback given in
>> v2, or do you plan to post a rebased v3?
> 
>     I was going to address the rejects in v3 and give the patchset to Geert... 
> unfortunately, this took so-o-o long. I'm going to do it today, at last.

    Here you are! Attaching both the patches against net-next.git...
Perhaps I also will be able to return to this subject in the near future...

MBR, Sergei

[-- Attachment #2: phylib-add-device-reset-GPIO-support-v3.patch --]
[-- Type: text/x-patch, Size: 11599 bytes --]

Subject: phylib: add device reset GPIO support

The PHY devices sometimes do have their reset signal (maybe even power
supply?) tied to some GPIO and sometimes it also does happen that a boot
loader does not leave it deasserted. So far this issue has been attacked
from (as I believe) a wrong angle: by teaching the MAC driver to manipulate
the GPIO in question; that solution, when applied to the device trees, led
to adding the PHY reset GPIO properties to the MAC device node, with one
exception: Cadence MACB driver which could handle the "reset-gpios" prop
in a PHY device subnode. I believe that the correct approach is to teach
the 'phylib' to get the MDIO device reset GPIO from the device tree node
corresponding to this device -- which this patch is doing...

Note that I had to modify the AT803x PHY driver as it would stop working
otherwise -- it made use of the reset GPIO for its own purposes...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

---
Changes in version 3:
- fixed the fwnode_get_named_gpiod() call due to the added parameters (which
  allowed to eliminate gpiod_direction_output() call);
- undeleted one blank line in the AT803x driver;
- resolved rejects, refreshed the patch;
- reworded/reformatted the changelog.

Changes in version 2:
- reformatted the changelog;
- resolved rejects, refreshed the patch.

 Documentation/devicetree/bindings/net/phy.txt |    2 +
 drivers/net/phy/at803x.c                      |   18 ++------------
 drivers/net/phy/mdio_bus.c                    |    4 +++
 drivers/net/phy/mdio_device.c                 |   27 +++++++++++++++++++--
 drivers/net/phy/phy_device.c                  |   33 ++++++++++++++++++++++++--
 drivers/of/of_mdio.c                          |   16 ++++++++++++
 include/linux/mdio.h                          |    3 ++
 include/linux/phy.h                           |    5 +++
 8 files changed, 89 insertions(+), 19 deletions(-)

Index: net-next/Documentation/devicetree/bindings/net/phy.txt
===================================================================
--- net-next.orig/Documentation/devicetree/bindings/net/phy.txt
+++ net-next/Documentation/devicetree/bindings/net/phy.txt
@@ -53,6 +53,8 @@ Optional Properties:
   to ensure the integrated PHY is used. The absence of this property indicates
   the muxers should be configured so that the external PHY is used.
 
+- reset-gpios: The GPIO phandle and specifier for the PHY reset signal.
+
 Example:
 
 ethernet-phy@0 {
Index: net-next/drivers/net/phy/at803x.c
===================================================================
--- net-next.orig/drivers/net/phy/at803x.c
+++ net-next/drivers/net/phy/at803x.c
@@ -71,7 +71,6 @@ MODULE_LICENSE("GPL");
 
 struct at803x_priv {
 	bool phy_reset:1;
-	struct gpio_desc *gpiod_reset;
 };
 
 struct at803x_context {
@@ -254,22 +253,11 @@ static int at803x_probe(struct phy_devic
 {
 	struct device *dev = &phydev->mdio.dev;
 	struct at803x_priv *priv;
-	struct gpio_desc *gpiod_reset;
 
 	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
 	if (!priv)
 		return -ENOMEM;
 
-	if (phydev->drv->phy_id != ATH8030_PHY_ID)
-		goto does_not_require_reset_workaround;
-
-	gpiod_reset = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_LOW);
-	if (IS_ERR(gpiod_reset))
-		return PTR_ERR(gpiod_reset);
-
-	priv->gpiod_reset = gpiod_reset;
-
-does_not_require_reset_workaround:
 	phydev->priv = priv;
 
 	return 0;
@@ -343,14 +331,14 @@ static void at803x_link_change_notify(st
 	 * cannot recover from by software.
 	 */
 	if (phydev->state == PHY_NOLINK) {
-		if (priv->gpiod_reset && !priv->phy_reset) {
+		if (phydev->mdio.reset && !priv->phy_reset) {
 			struct at803x_context context;
 
 			at803x_context_save(phydev, &context);
 
-			gpiod_set_value(priv->gpiod_reset, 1);
+			phy_device_reset(phydev, 1);
 			msleep(1);
-			gpiod_set_value(priv->gpiod_reset, 0);
+			phy_device_reset(phydev, 0);
 			msleep(1);
 
 			at803x_context_restore(phydev, &context);
Index: net-next/drivers/net/phy/mdio_bus.c
===================================================================
--- net-next.orig/drivers/net/phy/mdio_bus.c
+++ net-next/drivers/net/phy/mdio_bus.c
@@ -38,6 +38,7 @@
 #include <linux/phy.h>
 #include <linux/io.h>
 #include <linux/uaccess.h>
+#include <linux/gpio/consumer.h>
 
 #include <asm/irq.h>
 
@@ -420,6 +421,9 @@ void mdiobus_unregister(struct mii_bus *
 		if (!mdiodev)
 			continue;
 
+		if (mdiodev->reset)
+			gpiod_put(mdiodev->reset);
+
 		mdiodev->device_remove(mdiodev);
 		mdiodev->device_free(mdiodev);
 	}
Index: net-next/drivers/net/phy/mdio_device.c
===================================================================
--- net-next.orig/drivers/net/phy/mdio_device.c
+++ net-next/drivers/net/phy/mdio_device.c
@@ -12,6 +12,8 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/errno.h>
+#include <linux/gpio.h>
+#include <linux/gpio/consumer.h>
 #include <linux/init.h>
 #include <linux/interrupt.h>
 #include <linux/kernel.h>
@@ -114,6 +116,13 @@ void mdio_device_remove(struct mdio_devi
 }
 EXPORT_SYMBOL(mdio_device_remove);
 
+void mdio_device_reset(struct mdio_device *mdiodev, int value)
+{
+	if (mdiodev->reset)
+		gpiod_set_value(mdiodev->reset, value);
+}
+EXPORT_SYMBOL(mdio_device_reset);
+
 /**
  * mdio_probe - probe an MDIO device
  * @dev: device to probe
@@ -128,9 +137,16 @@ static int mdio_probe(struct device *dev
 	struct mdio_driver *mdiodrv = to_mdio_driver(drv);
 	int err = 0;
 
-	if (mdiodrv->probe)
+	if (mdiodrv->probe) {
+		/* Deassert the reset signal */
+		mdio_device_reset(mdiodev, 0);
+
 		err = mdiodrv->probe(mdiodev);
 
+		/* Assert the reset signal */
+		mdio_device_reset(mdiodev, 1);
+	}
+
 	return err;
 }
 
@@ -140,9 +156,16 @@ static int mdio_remove(struct device *de
 	struct device_driver *drv = mdiodev->dev.driver;
 	struct mdio_driver *mdiodrv = to_mdio_driver(drv);
 
-	if (mdiodrv->remove)
+	if (mdiodrv->remove) {
+		/* Deassert the reset signal */
+		mdio_device_reset(mdiodev, 0);
+
 		mdiodrv->remove(mdiodev);
 
+		/* Assert the reset signal */
+		mdio_device_reset(mdiodev, 1);
+	}
+
 	return 0;
 }
 
Index: net-next/drivers/net/phy/phy_device.c
===================================================================
--- net-next.orig/drivers/net/phy/phy_device.c
+++ net-next/drivers/net/phy/phy_device.c
@@ -632,6 +632,9 @@ int phy_device_register(struct phy_devic
 	if (err)
 		return err;
 
+	/* Deassert the reset signal */
+	phy_device_reset(phydev, 0);
+
 	/* Run all of the fixups for this PHY */
 	err = phy_scan_fixups(phydev);
 	if (err) {
@@ -647,9 +650,15 @@ int phy_device_register(struct phy_devic
 		goto out;
 	}
 
+	/* Assert the reset signal */
+	phy_device_reset(phydev, 1);
+
 	return 0;
 
  out:
+	/* Assert the reset signal */
+	phy_device_reset(phydev, 1);
+
 	mdiobus_unregister_device(&phydev->mdio);
 	return err;
 }
@@ -849,6 +858,9 @@ int phy_init_hw(struct phy_device *phyde
 {
 	int ret = 0;
 
+	/* Deassert the reset signal */
+	phy_device_reset(phydev, 0);
+
 	if (!phydev->drv || !phydev->drv->config_init)
 		return 0;
 
@@ -1126,6 +1138,9 @@ void phy_detach(struct phy_device *phyde
 	put_device(&phydev->mdio.dev);
 	if (ndev_owner != bus->owner)
 		module_put(bus->owner);
+
+	/* Assert the reset signal */
+	phy_device_reset(phydev, 1);
 }
 EXPORT_SYMBOL(phy_detach);
 
@@ -1811,9 +1826,16 @@ static int phy_probe(struct device *dev)
 	/* Set the state to READY by default */
 	phydev->state = PHY_READY;
 
-	if (phydev->drv->probe)
+	if (phydev->drv->probe) {
+		/* Deassert the reset signal */
+		phy_device_reset(phydev, 0);
+
 		err = phydev->drv->probe(phydev);
 
+		/* Assert the reset signal */
+		phy_device_reset(phydev, 1);
+	}
+
 	mutex_unlock(&phydev->lock);
 
 	return err;
@@ -1829,8 +1851,15 @@ static int phy_remove(struct device *dev
 	phydev->state = PHY_DOWN;
 	mutex_unlock(&phydev->lock);
 
-	if (phydev->drv && phydev->drv->remove)
+	if (phydev->drv && phydev->drv->remove) {
+		/* Deassert the reset signal */
+		phy_device_reset(phydev, 0);
+
 		phydev->drv->remove(phydev);
+
+		/* Assert the reset signal */
+		phy_device_reset(phydev, 1);
+	}
 	phydev->drv = NULL;
 
 	return 0;
Index: net-next/drivers/of/of_mdio.c
===================================================================
--- net-next.orig/drivers/of/of_mdio.c
+++ net-next/drivers/of/of_mdio.c
@@ -47,6 +47,7 @@ static int of_get_phy_id(struct device_n
 static void of_mdiobus_register_phy(struct mii_bus *mdio,
 				    struct device_node *child, u32 addr)
 {
+	struct gpio_desc *gpiod;
 	struct phy_device *phy;
 	bool is_c45;
 	int rc;
@@ -55,10 +56,16 @@ static void of_mdiobus_register_phy(stru
 	is_c45 = of_device_is_compatible(child,
 					 "ethernet-phy-ieee802.3-c45");
 
+	/* Deassert the reset signal */
+	gpiod = fwnode_get_named_gpiod(&child->fwnode, "reset-gpios", 0,
+				       GPIOD_OUT_LOW, "PHY reset");
 	if (!is_c45 && !of_get_phy_id(child, &phy_id))
 		phy = phy_device_create(mdio, addr, phy_id, 0, NULL);
 	else
 		phy = get_phy_device(mdio, addr, is_c45);
+	/* Assert the reset signal again */
+	if (!IS_ERR(gpiod))
+		gpiod_set_value(gpiod, 1);
 	if (IS_ERR(phy))
 		return;
 
@@ -78,6 +85,9 @@ static void of_mdiobus_register_phy(stru
 	of_node_get(child);
 	phy->mdio.dev.of_node = child;
 
+	if (!IS_ERR(gpiod))
+		phy->mdio.reset = gpiod;
+
 	/* All data is now stored in the phy struct;
 	 * register it */
 	rc = phy_device_register(phy);
@@ -95,6 +105,7 @@ static void of_mdiobus_register_device(s
 				       struct device_node *child, u32 addr)
 {
 	struct mdio_device *mdiodev;
+	struct gpio_desc *gpiod;
 	int rc;
 
 	mdiodev = mdio_device_create(mdio, addr);
@@ -107,6 +118,11 @@ static void of_mdiobus_register_device(s
 	of_node_get(child);
 	mdiodev->dev.of_node = child;
 
+	gpiod = fwnode_get_named_gpiod(&child->fwnode, "reset-gpios", 0,
+				       GPIOD_ASIS, "PHY reset");
+	if (!IS_ERR(gpiod))
+		mdiodev->reset = gpiod;
+
 	/* All data is now stored in the mdiodev struct; register it. */
 	rc = mdio_device_register(mdiodev);
 	if (rc) {
Index: net-next/include/linux/mdio.h
===================================================================
--- net-next.orig/include/linux/mdio.h
+++ net-next/include/linux/mdio.h
@@ -12,6 +12,7 @@
 #include <uapi/linux/mdio.h>
 #include <linux/mod_devicetable.h>
 
+struct gpio_desc;
 struct mii_bus;
 
 /* Multiple levels of nesting are possible. However typically this is
@@ -39,6 +40,7 @@ struct mdio_device {
 	/* Bus address of the MDIO device (0-31) */
 	int addr;
 	int flags;
+	struct gpio_desc *reset;
 };
 #define to_mdio_device(d) container_of(d, struct mdio_device, dev)
 
@@ -71,6 +73,7 @@ void mdio_device_free(struct mdio_device
 struct mdio_device *mdio_device_create(struct mii_bus *bus, int addr);
 int mdio_device_register(struct mdio_device *mdiodev);
 void mdio_device_remove(struct mdio_device *mdiodev);
+void mdio_device_reset(struct mdio_device *mdiodev, int value);
 int mdio_driver_register(struct mdio_driver *drv);
 void mdio_driver_unregister(struct mdio_driver *drv);
 int mdio_device_bus_match(struct device *dev, struct device_driver *drv);
Index: net-next/include/linux/phy.h
===================================================================
--- net-next.orig/include/linux/phy.h
+++ net-next/include/linux/phy.h
@@ -847,6 +847,11 @@ static inline int phy_read_status(struct
 	return phydev->drv->read_status(phydev);
 }
 
+static inline void phy_device_reset(struct phy_device *phydev, int value)
+{
+	mdio_device_reset(&phydev->mdio, value);
+}
+
 #define phydev_err(_phydev, format, args...)	\
 	dev_err(&_phydev->mdio.dev, format, ##args)
 

[-- Attachment #3: macb-kill-PHY-reset-code-v3.patch --]
[-- Type: text/x-patch, Size: 2800 bytes --]

Subject: macb: kill PHY reset code

With  the phylib now  being aware of the "reset-gpios"  PHY node property,
there should  be no need to frob the PHY  reset in this driver anymore...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

---
Changes in version 3:
- resolved rejects due to the file being renamed, refreshed the patch;
- added the code to reset PHY on a probe error;
- edited the patch description.

 drivers/net/ethernet/cadence/macb.h      |    1 -
 drivers/net/ethernet/cadence/macb_main.c |   21 ---------------------
 2 files changed, 22 deletions(-)

Index: net-next/drivers/net/ethernet/cadence/macb.h
===================================================================
--- net-next.orig/drivers/net/ethernet/cadence/macb.h
+++ net-next/drivers/net/ethernet/cadence/macb.h
@@ -1032,7 +1032,6 @@ struct macb {
 	unsigned int		dma_burst_length;
 
 	phy_interface_t		phy_interface;
-	struct gpio_desc	*reset_gpio;
 
 	/* AT91RM9200 transmit */
 	struct sk_buff *skb;			/* holds skb until xmit interrupt completes */
Index: net-next/drivers/net/ethernet/cadence/macb_main.c
===================================================================
--- net-next.orig/drivers/net/ethernet/cadence/macb_main.c
+++ net-next/drivers/net/ethernet/cadence/macb_main.c
@@ -3393,7 +3393,6 @@ static int macb_probe(struct platform_de
 					      = macb_config->clk_init;
 	int (*init)(struct platform_device *) = macb_config->init;
 	struct device_node *np = pdev->dev.of_node;
-	struct device_node *phy_node;
 	struct clk *pclk, *hclk = NULL, *tx_clk = NULL, *rx_clk = NULL;
 	unsigned int queue_mask, num_queues;
 	struct macb_platform_data *pdata;
@@ -3499,18 +3498,6 @@ static int macb_probe(struct platform_de
 	else
 		macb_get_hwaddr(bp);
 
-	/* Power up the PHY if there is a GPIO reset */
-	phy_node =  of_get_next_available_child(np, NULL);
-	if (phy_node) {
-		int gpio = of_get_named_gpio(phy_node, "reset-gpios", 0);
-
-		if (gpio_is_valid(gpio)) {
-			bp->reset_gpio = gpio_to_desc(gpio);
-			gpiod_direction_output(bp->reset_gpio, 1);
-		}
-	}
-	of_node_put(phy_node);
-
 	err = of_get_phy_mode(np);
 	if (err < 0) {
 		pdata = dev_get_platdata(&pdev->dev);
@@ -3554,10 +3541,6 @@ err_out_unregister_mdio:
 	mdiobus_unregister(bp->mii_bus);
 	mdiobus_free(bp->mii_bus);
 
-	/* Shutdown the PHY if there is a GPIO reset */
-	if (bp->reset_gpio)
-		gpiod_set_value(bp->reset_gpio, 0);
-
 err_out_free_netdev:
 	free_netdev(dev);
 
@@ -3585,10 +3568,6 @@ static int macb_remove(struct platform_d
 		dev->phydev = NULL;
 		mdiobus_free(bp->mii_bus);
 
-		/* Shutdown the PHY if there is a GPIO reset */
-		if (bp->reset_gpio)
-			gpiod_set_value(bp->reset_gpio, 0);
-
 		unregister_netdev(dev);
 		clk_disable_unprepare(bp->tx_clk);
 		clk_disable_unprepare(bp->hclk);

^ permalink raw reply

* Re: [PATCH v2] netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
From: Daniel Borkmann @ 2017-10-09 12:35 UTC (permalink / raw)
  To: Shmulik Ladkani, Pablo Neira Ayuso, netfilter-devel
  Cc: netdev, Rafael Buchbinder, Shmulik Ladkani, Willem de Bruijn
In-Reply-To: <20171009122715.19010-1-shmulik@nsof.io>

On 10/09/2017 02:27 PM, Shmulik Ladkani wrote:
> From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
>
> Commit 2c16d6033264 ("netfilter: xt_bpf: support ebpf") introduced
> support for attaching an eBPF object by an fd, with the
> 'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each
> IPT_SO_SET_REPLACE call.
>
> However this breaks subsequent iptables calls:
>
>   # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT
>   # iptables -A INPUT -s 5.6.7.8 -j ACCEPT
>   iptables: Invalid argument. Run `dmesg' for more information.
>
> That's because iptables works by loading exising rules using
> IPT_SO_GET_ENTRIES to userspace, then issuing IPT_SO_SET_REPLACE with
> the replacement set.
>
> However, the loaded 'xt_bpf_info_v1' has an arbitrary '.fd' number
> (from the initial "iptables -m bpf" invocation) - so when 2nd invocation
> occurs, userspace passes a bogus fd number, which leads to
> 'bpf_mt_check_v1' to fail.
>
> One suggested solution [1] was to hack iptables userspace, to perform a
> "entries fixup" immediatley after IPT_SO_GET_ENTRIES, by opening a new,
> process-local fd per every 'xt_bpf_info_v1' entry seen.
>
> However, in [2] both Pablo Neira Ayuso and Willem de Bruijn suggested to
> depricate the xt_bpf_info_v1 ABI dealing with pinned ebpf objects.
>
> This fix changes the XT_BPF_MODE_FD_PINNED behavior to ignore the given
> '.fd' and instead perform an in-kernel lookup for the bpf object given
> the provided '.path'.
>
> It also defines an alias for the XT_BPF_MODE_FD_PINNED mode, named
> XT_BPF_MODE_PATH_PINNED, to better reflect the fact that the user is
> expected to provide the path of the pinned object.
>
> Existing XT_BPF_MODE_FD_ELF behavior (non-pinned fd mode) is preserved.
>
> References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2
>              [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2
>
> Cc: Pablo Neira Ayuso <pablo@netfilter.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Reported-by: Rafael Buchbinder <rafi@rbk.ms>
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* Re: [PATCH v2] netlink: do not set cb_running if dump's start() errs
From: Johannes Berg @ 2017-10-09 12:31 UTC (permalink / raw)
  To: Jason A. Donenfeld, davem, Netdev, linux-kernel
In-Reply-To: <20171009121451.26815-1-Jason@zx2c4.com>

On Mon, 2017-10-09 at 14:14 +0200, Jason A. Donenfeld wrote:
> It turns out that multiple places can call netlink_dump(), which
> means
> it's still possible to dereference partially initialized values in
> dump() that were the result of a faulty returned start().
> 
> This fixes the issue by calling start() _before_ setting cb_running
> to
> true, so that there's no chance at all of hitting the dump() function
> through any indirect paths.
> 
> It also moves the call to start() to be when the mutex is held. This
> has
> the nice side effect of serializing invocations to start(), which is
> likely desirable anyway. It also prevents any possible other races
> that
> might come out of this logic.

I'm not necessarily sure it's _nice_, but I do think it doesn't matter,
so that's just splitting hairs. If you do have a genl family with
parallel_ops, you'd better be prepared to handle parallel things, and
then this could also be in parallel :-)

> In testing this with several different pieces of tricky code to
> trigger
> these issues, this commit fixes all avenues that I'm aware of.
> 
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Reviewed-by: Johannes Berg <johannes@sipsolutions.net>

johannes

^ permalink raw reply

* Re: [PATCH] netlink: do not set cb_running if dump's start() errs
From: Johannes Berg @ 2017-10-09 12:27 UTC (permalink / raw)
  To: Jason A. Donenfeld, davem, Netdev, linux-kernel
In-Reply-To: <1507550326.26041.39.camel@sipsolutions.net>

Just decided to take another look:

On Mon, 2017-10-09 at 13:58 +0200, Johannes Berg wrote:
> On Mon, 2017-10-09 at 13:56 +0200, Jason A. Donenfeld wrote:
> 
> > @@ -2266,16 +2266,17 @@ int __netlink_dump_start(struct sock *ssk,
> > struct sk_buff *skb,
> >  	cb->min_dump_alloc = control->min_dump_alloc;
> >  	cb->skb = skb;
> >  
> > +	if (cb->start) {
> > +		ret = cb->start(cb);
> > +		if (ret)
> > +			goto error_unlock;
> > +	}
> > +
> >  	nlk->cb_running = true;
> >  
> >  	mutex_unlock(nlk->cb_mutex);
> 
> Hmm. Now start is invoked with the mutex held, I'm not sure it
> actually _matters_, but that should probably be reviewed and
> mentioned in the commit log?

It sort of seems designed to run ->start outside the lock, otherwise we
wouldn't really have to acquire it again in netlink_dump() but could
just keep it across the call (with some locking changes in
netlink_recvmsg())?

Then again, clearly none of the (few) existing users actually care.

Btw - we should (separately) also remove "start" from struct
netlink_callback, it's only ever used within this function and we can
use control->start instead of cb->start here.

johannes

^ permalink raw reply

* [PATCH v2] netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
From: Shmulik Ladkani @ 2017-10-09 12:27 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netfilter-devel
  Cc: netdev, Daniel Borkmann, shmulik, Rafael Buchbinder,
	Shmulik Ladkani, Willem de Bruijn

From: Shmulik Ladkani <shmulik.ladkani@gmail.com>

Commit 2c16d6033264 ("netfilter: xt_bpf: support ebpf") introduced
support for attaching an eBPF object by an fd, with the
'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each
IPT_SO_SET_REPLACE call.

However this breaks subsequent iptables calls:

 # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT
 # iptables -A INPUT -s 5.6.7.8 -j ACCEPT
 iptables: Invalid argument. Run `dmesg' for more information.

That's because iptables works by loading exising rules using
IPT_SO_GET_ENTRIES to userspace, then issuing IPT_SO_SET_REPLACE with
the replacement set.

However, the loaded 'xt_bpf_info_v1' has an arbitrary '.fd' number
(from the initial "iptables -m bpf" invocation) - so when 2nd invocation
occurs, userspace passes a bogus fd number, which leads to
'bpf_mt_check_v1' to fail.

One suggested solution [1] was to hack iptables userspace, to perform a
"entries fixup" immediatley after IPT_SO_GET_ENTRIES, by opening a new,
process-local fd per every 'xt_bpf_info_v1' entry seen.

However, in [2] both Pablo Neira Ayuso and Willem de Bruijn suggested to
depricate the xt_bpf_info_v1 ABI dealing with pinned ebpf objects.

This fix changes the XT_BPF_MODE_FD_PINNED behavior to ignore the given
'.fd' and instead perform an in-kernel lookup for the bpf object given
the provided '.path'.

It also defines an alias for the XT_BPF_MODE_FD_PINNED mode, named
XT_BPF_MODE_PATH_PINNED, to better reflect the fact that the user is
expected to provide the path of the pinned object.

Existing XT_BPF_MODE_FD_ELF behavior (non-pinned fd mode) is preserved.

References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2
            [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Willem de Bruijn <willemb@google.com>
Reported-by: Rafael Buchbinder <rafi@rbk.ms>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
v2: Fix xt_bpf.c compilation if CONFIG_BPF_SYSCALL=n
---
 include/linux/bpf.h                   |  5 +++++
 include/uapi/linux/netfilter/xt_bpf.h |  1 +
 kernel/bpf/inode.c                    |  1 +
 net/netfilter/xt_bpf.c                | 22 ++++++++++++++++++++--
 4 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a67daea731ab..a49b321a1262 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -407,6 +407,11 @@ static inline void __bpf_prog_uncharge(struct user_struct *user, u32 pages)
 {
 }
 
+static inline int bpf_obj_get_user(const char __user *pathname)
+{
+	return -EOPNOTSUPP;
+}
+
 static inline struct net_device  *__dev_map_lookup_elem(struct bpf_map *map,
 						       u32 key)
 {
diff --git a/include/uapi/linux/netfilter/xt_bpf.h b/include/uapi/linux/netfilter/xt_bpf.h
index b97725af2ac0..da161b56c79e 100644
--- a/include/uapi/linux/netfilter/xt_bpf.h
+++ b/include/uapi/linux/netfilter/xt_bpf.h
@@ -23,6 +23,7 @@ enum xt_bpf_modes {
 	XT_BPF_MODE_FD_PINNED,
 	XT_BPF_MODE_FD_ELF,
 };
+#define XT_BPF_MODE_PATH_PINNED XT_BPF_MODE_FD_PINNED
 
 struct xt_bpf_info_v1 {
 	__u16 mode;
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index e833ed914358..be1dde967208 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -363,6 +363,7 @@ int bpf_obj_get_user(const char __user *pathname)
 	putname(pname);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(bpf_obj_get_user);
 
 static void bpf_evict_inode(struct inode *inode)
 {
diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c
index 38986a95216c..29123934887b 100644
--- a/net/netfilter/xt_bpf.c
+++ b/net/netfilter/xt_bpf.c
@@ -8,6 +8,7 @@
  */
 
 #include <linux/module.h>
+#include <linux/syscalls.h>
 #include <linux/skbuff.h>
 #include <linux/filter.h>
 #include <linux/bpf.h>
@@ -49,6 +50,22 @@ static int __bpf_mt_check_fd(int fd, struct bpf_prog **ret)
 	return 0;
 }
 
+static int __bpf_mt_check_path(const char *path, struct bpf_prog **ret)
+{
+	mm_segment_t oldfs = get_fs();
+	int retval, fd;
+
+	set_fs(KERNEL_DS);
+	fd = bpf_obj_get_user(path);
+	set_fs(oldfs);
+	if (fd < 0)
+		return fd;
+
+	retval = __bpf_mt_check_fd(fd, ret);
+	sys_close(fd);
+	return retval;
+}
+
 static int bpf_mt_check(const struct xt_mtchk_param *par)
 {
 	struct xt_bpf_info *info = par->matchinfo;
@@ -66,9 +83,10 @@ static int bpf_mt_check_v1(const struct xt_mtchk_param *par)
 		return __bpf_mt_check_bytecode(info->bpf_program,
 					       info->bpf_program_num_elem,
 					       &info->filter);
-	else if (info->mode == XT_BPF_MODE_FD_PINNED ||
-		 info->mode == XT_BPF_MODE_FD_ELF)
+	else if (info->mode == XT_BPF_MODE_FD_ELF)
 		return __bpf_mt_check_fd(info->fd, &info->filter);
+	else if (info->mode == XT_BPF_MODE_PATH_PINNED)
+		return __bpf_mt_check_path(info->path, &info->filter);
 	else
 		return -EINVAL;
 }
-- 
2.14.2


^ permalink raw reply related

* [PATCH v2] netlink: do not set cb_running if dump's start() errs
From: Jason A. Donenfeld @ 2017-10-09 12:14 UTC (permalink / raw)
  To: johannes, davem, Netdev, linux-kernel; +Cc: Jason A. Donenfeld
In-Reply-To: <CAHmME9qK84HT6DEC2Z3q-E+J8AtNSp_7GNAcjWSSsKnxX0eAkA@mail.gmail.com>

It turns out that multiple places can call netlink_dump(), which means
it's still possible to dereference partially initialized values in
dump() that were the result of a faulty returned start().

This fixes the issue by calling start() _before_ setting cb_running to
true, so that there's no chance at all of hitting the dump() function
through any indirect paths.

It also moves the call to start() to be when the mutex is held. This has
the nice side effect of serializing invocations to start(), which is
likely desirable anyway. It also prevents any possible other races that
might come out of this logic.

In testing this with several different pieces of tricky code to trigger
these issues, this commit fixes all avenues that I'm aware of.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
---
 net/netlink/af_netlink.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 94c11cf0459d..f34750691c5c 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2266,16 +2266,17 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 	cb->min_dump_alloc = control->min_dump_alloc;
 	cb->skb = skb;
 
+	if (cb->start) {
+		ret = cb->start(cb);
+		if (ret)
+			goto error_unlock;
+	}
+
 	nlk->cb_running = true;
 
 	mutex_unlock(nlk->cb_mutex);
 
-	ret = 0;
-	if (cb->start)
-		ret = cb->start(cb);
-
-	if (!ret)
-		ret = netlink_dump(sk);
+	ret = netlink_dump(sk);
 
 	sock_put(sk);
 
-- 
2.14.2

^ permalink raw reply related

* Re: [PATCH] netlink: do not set cb_running if dump's start() errs
From: Jason A. Donenfeld @ 2017-10-09 12:12 UTC (permalink / raw)
  To: Johannes Berg; +Cc: David Miller, Netdev, LKML
In-Reply-To: <1507550326.26041.39.camel@sipsolutions.net>

Hi Johannes,

Yes, indeed, and I think that's actually a good thing. It means that
the starts aren't ever called in parallel, which could be useful if
drivers have any ordering requirements. The change doesn't negatively
effect any existing drivers. I'll resubmit with a larger commit
message explaining this.

Jason

^ permalink raw reply

* [PATCH] cdc_ether: flag the u-blox TOBY-L2 and SARA-U2 as wwan
From: Aleksander Morgado @ 2017-10-09 12:05 UTC (permalink / raw)
  To: oliver
  Cc: davem, marco.demarco, stefano.godeas, linux-usb, netdev,
	linux-kernel, Aleksander Morgado

The u-blox TOBY-L2 is a LTE Cat 4 module with HSPA+ and 2G fallback.
This module allows switching to different USB profiles with the
'AT+UUSBCONF' command, and provides a ECM network interface when the
'AT+UUSBCONF=2' profile is selected.

The u-blox SARA-U2 is a HSPA module with 2G fallback. The default USB
configuration includes a ECM network interface.

Both these modules are controlled via AT commands through one of the
TTYs exposed. Connecting these modules may be done just by activating
the desired PDP context with 'AT+CGACT=1,<cid>' and then running DHCP
on the ECM interface.

Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
---
 drivers/net/usb/cdc_ether.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
index 29c7e2ec0dcb..52ea80bcd639 100644
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -560,6 +560,7 @@ static const struct driver_info wwan_info = {
 #define NVIDIA_VENDOR_ID	0x0955
 #define HP_VENDOR_ID		0x03f0
 #define MICROSOFT_VENDOR_ID	0x045e
+#define UBLOX_VENDOR_ID		0x1546
 
 static const struct usb_device_id	products[] = {
 /* BLACKLIST !!
@@ -868,6 +869,18 @@ static const struct usb_device_id	products[] = {
 				      USB_CDC_SUBCLASS_ETHERNET,
 				      USB_CDC_PROTO_NONE),
 	.driver_info = (unsigned long)&zte_cdc_info,
+}, {
+	/* U-blox TOBY-L2 */
+	USB_DEVICE_AND_INTERFACE_INFO(UBLOX_VENDOR_ID, 0x1143, USB_CLASS_COMM,
+				      USB_CDC_SUBCLASS_ETHERNET,
+				      USB_CDC_PROTO_NONE),
+	.driver_info = (unsigned long)&wwan_info,
+}, {
+	/* U-blox SARA-U2 */
+	USB_DEVICE_AND_INTERFACE_INFO(UBLOX_VENDOR_ID, 0x1104, USB_CLASS_COMM,
+				      USB_CDC_SUBCLASS_ETHERNET,
+				      USB_CDC_PROTO_NONE),
+	.driver_info = (unsigned long)&wwan_info,
 }, {
 	USB_INTERFACE_INFO(USB_CLASS_COMM, USB_CDC_SUBCLASS_ETHERNET,
 			USB_CDC_PROTO_NONE),
-- 
2.14.1

^ permalink raw reply related

* [PATCH v3 3/3] net: phy: Change error to EINVAL for invalid MAC
From: Dan Murphy @ 2017-10-09 12:03 UTC (permalink / raw)
  To: andrew, f.fainelli; +Cc: netdev, Woojung.Huh, afd, Dan Murphy
In-Reply-To: <20171009120302.23611-1-dmurphy@ti.com>

Change the return error code to EINVAL if the MAC
address is not valid in the set_wol function.

Signed-off-by: Dan Murphy <dmurphy@ti.com>
---

v3 - No changes made
v2 - There was no v1 on this patch this is new.

 drivers/net/phy/at803x.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
index c1e52b9dc58d..5f93e6add563 100644
--- a/drivers/net/phy/at803x.c
+++ b/drivers/net/phy/at803x.c
@@ -167,7 +167,7 @@ static int at803x_set_wol(struct phy_device *phydev,
 		mac = (const u8 *) ndev->dev_addr;
 
 		if (!is_valid_ether_addr(mac))
-			return -EFAULT;
+			return -EINVAL;
 
 		for (i = 0; i < 3; i++) {
 			phy_write(phydev, AT803X_MMD_ACCESS_CONTROL,
-- 
2.14.0

^ permalink raw reply related

* [PATCH v3 2/3] net: phy: DP83822 initial driver submission
From: Dan Murphy @ 2017-10-09 12:03 UTC (permalink / raw)
  To: andrew, f.fainelli; +Cc: netdev, Woojung.Huh, afd, Dan Murphy
In-Reply-To: <20171009120302.23611-1-dmurphy@ti.com>

Add support for the TI  DP83822 10/100Mbit ethernet phy.

The DP83822 provides flexibility to connect to a MAC through a
standard MII, RMII or RGMII interface.

Datasheet:
http://www.ti.com/product/DP83822I/datasheet

Signed-off-by: Dan Murphy <dmurphy@ti.com>
---

v3 - Fixed WoL indication bit and removed mutex for suspend/resume - 
https://www.mail-archive.com/netdev@vger.kernel.org/msg191891.html and
https://www.mail-archive.com/netdev@vger.kernel.org/msg191665.html

v2 - Updated per comments.  Removed unnessary parantheis, called genphy_suspend/
resume routines and then performing WoL changes, reworked sopass storage and reduced
the number of phy reads, and moved WOL_SECURE_ON - 
https://www.mail-archive.com/netdev@vger.kernel.org/msg191392.html

 drivers/net/phy/Kconfig   |   5 +
 drivers/net/phy/Makefile  |   1 +
 drivers/net/phy/dp83822.c | 302 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 308 insertions(+)
 create mode 100644 drivers/net/phy/dp83822.c

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index cd931cf9dcc2..8e78a482e09e 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -277,6 +277,11 @@ config DAVICOM_PHY
 	---help---
 	  Currently supports dm9161e and dm9131
 
+config DP83822_PHY
+	tristate "Texas Instruments DP83822 PHY"
+	---help---
+	  Supports the DP83822 PHY.
+
 config DP83848_PHY
 	tristate "Texas Instruments DP83848 PHY"
 	---help---
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 416df92fbf4f..df3b82ba8550 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -55,6 +55,7 @@ obj-$(CONFIG_CICADA_PHY)	+= cicada.o
 obj-$(CONFIG_CORTINA_PHY)	+= cortina.o
 obj-$(CONFIG_DAVICOM_PHY)	+= davicom.o
 obj-$(CONFIG_DP83640_PHY)	+= dp83640.o
+obj-$(CONFIG_DP83822_PHY)	+= dp83822.o
 obj-$(CONFIG_DP83848_PHY)	+= dp83848.o
 obj-$(CONFIG_DP83867_PHY)	+= dp83867.o
 obj-$(CONFIG_FIXED_PHY)		+= fixed_phy.o
diff --git a/drivers/net/phy/dp83822.c b/drivers/net/phy/dp83822.c
new file mode 100644
index 000000000000..de196dbc46cd
--- /dev/null
+++ b/drivers/net/phy/dp83822.c
@@ -0,0 +1,302 @@
+/*
+ * Driver for the Texas Instruments DP83822 PHY
+ *
+ * Copyright (C) 2017 Texas Instruments Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/ethtool.h>
+#include <linux/etherdevice.h>
+#include <linux/kernel.h>
+#include <linux/mii.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/phy.h>
+#include <linux/netdevice.h>
+
+#define DP83822_PHY_ID	        0x2000a240
+#define DP83822_DEVADDR		0x1f
+
+#define MII_DP83822_MISR1	0x12
+#define MII_DP83822_MISR2	0x13
+#define MII_DP83822_RESET_CTRL	0x1f
+
+#define DP83822_HW_RESET	BIT(15)
+#define DP83822_SW_RESET	BIT(14)
+
+/* MISR1 bits */
+#define DP83822_RX_ERR_HF_INT_EN	BIT(0)
+#define DP83822_FALSE_CARRIER_HF_INT_EN	BIT(1)
+#define DP83822_ANEG_COMPLETE_INT_EN	BIT(2)
+#define DP83822_DUP_MODE_CHANGE_INT_EN	BIT(3)
+#define DP83822_SPEED_CHANGED_INT_EN	BIT(4)
+#define DP83822_LINK_STAT_INT_EN	BIT(5)
+#define DP83822_ENERGY_DET_INT_EN	BIT(6)
+#define DP83822_LINK_QUAL_INT_EN	BIT(7)
+
+/* MISR2 bits */
+#define DP83822_JABBER_DET_INT_EN	BIT(0)
+#define DP83822_WOL_PKT_INT_EN		BIT(1)
+#define DP83822_SLEEP_MODE_INT_EN	BIT(2)
+#define DP83822_MDI_XOVER_INT_EN	BIT(3)
+#define DP83822_LB_FIFO_INT_EN		BIT(4)
+#define DP83822_PAGE_RX_INT_EN		BIT(5)
+#define DP83822_ANEG_ERR_INT_EN		BIT(6)
+#define DP83822_EEE_ERROR_CHANGE_INT_EN	BIT(7)
+
+/* INT_STAT1 bits */
+#define DP83822_WOL_INT_EN	BIT(4)
+#define DP83822_WOL_INT_STAT	BIT(12)
+
+#define MII_DP83822_RXSOP1	0x04a5
+#define	MII_DP83822_RXSOP2	0x04a6
+#define	MII_DP83822_RXSOP3	0x04a7
+
+/* WoL Registers */
+#define	MII_DP83822_WOL_CFG	0x04a0
+#define	MII_DP83822_WOL_STAT	0x04a1
+#define	MII_DP83822_WOL_DA1	0x04a2
+#define	MII_DP83822_WOL_DA2	0x04a3
+#define	MII_DP83822_WOL_DA3	0x04a4
+
+/* WoL bits */
+#define DP83822_WOL_MAGIC_EN	BIT(1)
+#define DP83822_WOL_SECURE_ON	BIT(5)
+#define DP83822_WOL_EN		BIT(7)
+#define DP83822_WOL_INDICATION_SEL BIT(8)
+#define DP83822_WOL_CLR_INDICATION BIT(11)
+
+static int dp83822_ack_interrupt(struct phy_device *phydev)
+{
+	int err = phy_read(phydev, MII_DP83822_MISR1);
+
+	if (err < 0)
+		return err;
+
+	err = phy_read(phydev, MII_DP83822_MISR2);
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+
+static int dp83822_set_wol(struct phy_device *phydev,
+			   struct ethtool_wolinfo *wol)
+{
+	struct net_device *ndev = phydev->attached_dev;
+	u16 value;
+	const u8 *mac;
+
+	if (wol->wolopts & (WAKE_MAGIC | WAKE_MAGICSECURE)) {
+		mac = (const u8 *)ndev->dev_addr;
+
+		if (!is_valid_ether_addr(mac))
+			return -EINVAL;
+
+		/* MAC addresses start with byte 5, but stored in mac[0].
+		 * 822 PHYs store bytes 4|5, 2|3, 0|1
+		 */
+		phy_write_mmd(phydev, DP83822_DEVADDR,
+			      MII_DP83822_WOL_DA1, (mac[1] << 8) | mac[0]);
+		phy_write_mmd(phydev, DP83822_DEVADDR,
+			      MII_DP83822_WOL_DA2, (mac[3] << 8) | mac[2]);
+		phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_DA3,
+			      (mac[5] << 8) | mac[4]);
+
+		value = phy_read_mmd(phydev, DP83822_DEVADDR,
+				     MII_DP83822_WOL_CFG);
+		if (wol->wolopts & WAKE_MAGIC)
+			value |= DP83822_WOL_MAGIC_EN;
+		else
+			value &= ~DP83822_WOL_MAGIC_EN;
+
+		if (wol->wolopts & WAKE_MAGICSECURE) {
+			phy_write_mmd(phydev, DP83822_DEVADDR,
+				      MII_DP83822_RXSOP1,
+				      (wol->sopass[1] << 8) | wol->sopass[0]);
+			phy_write_mmd(phydev, DP83822_DEVADDR,
+				      MII_DP83822_RXSOP2,
+				      (wol->sopass[3] << 8) | wol->sopass[2]);
+			phy_write_mmd(phydev, DP83822_DEVADDR,
+				      MII_DP83822_RXSOP3,
+				      (wol->sopass[5] << 8) | wol->sopass[4]);
+			value |= DP83822_WOL_SECURE_ON;
+		} else {
+			value &= ~DP83822_WOL_SECURE_ON;
+		}
+
+		value |= (DP83822_WOL_EN | DP83822_WOL_INDICATION_SEL |
+			  DP83822_WOL_CLR_INDICATION);
+		phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG,
+			      value);
+	} else {
+		value = phy_read_mmd(phydev, DP83822_DEVADDR,
+				     MII_DP83822_WOL_CFG);
+		value &= ~DP83822_WOL_EN;
+		phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG,
+			      value);
+	}
+
+	return 0;
+}
+
+static void dp83822_get_wol(struct phy_device *phydev,
+			    struct ethtool_wolinfo *wol)
+{
+	int value;
+	u16 sopass_val;
+
+	wol->supported = (WAKE_MAGIC | WAKE_MAGICSECURE);
+	wol->wolopts = 0;
+
+	value = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG);
+	if (value & DP83822_WOL_MAGIC_EN)
+		wol->wolopts |= WAKE_MAGIC;
+
+	if (~value & DP83822_WOL_CLR_INDICATION)
+		wol->wolopts = 0;
+
+	if (value & DP83822_WOL_SECURE_ON) {
+		wol->wolopts |= WAKE_MAGICSECURE;
+	} else {
+		wol->wolopts &= ~WAKE_MAGICSECURE;
+		return;
+	}
+
+	sopass_val = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_RXSOP1);
+	wol->sopass[0] = (sopass_val & 0xff);
+	wol->sopass[1] = (sopass_val >> 8);
+
+	sopass_val = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_RXSOP2);
+	wol->sopass[2] = (sopass_val & 0xff);
+	wol->sopass[3] = (sopass_val >> 8);
+
+	sopass_val = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_RXSOP3);
+	wol->sopass[4] = (sopass_val & 0xff);
+	wol->sopass[5] = (sopass_val >> 8);
+}
+
+static int dp83822_config_intr(struct phy_device *phydev)
+{
+	int misr_status;
+	int err;
+
+	if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
+		misr_status = phy_read(phydev, MII_DP83822_MISR1);
+		if (misr_status < 0)
+			return misr_status;
+
+		misr_status |= (DP83822_RX_ERR_HF_INT_EN |
+				DP83822_FALSE_CARRIER_HF_INT_EN |
+				DP83822_ANEG_COMPLETE_INT_EN |
+				DP83822_DUP_MODE_CHANGE_INT_EN |
+				DP83822_SPEED_CHANGED_INT_EN |
+				DP83822_LINK_STAT_INT_EN |
+				DP83822_ENERGY_DET_INT_EN |
+				DP83822_LINK_QUAL_INT_EN);
+
+		err = phy_write(phydev, MII_DP83822_MISR1, misr_status);
+		if (err < 0)
+			return err;
+
+		misr_status = phy_read(phydev, MII_DP83822_MISR2);
+		if (misr_status < 0)
+			return misr_status;
+
+		misr_status |= (DP83822_JABBER_DET_INT_EN |
+				DP83822_WOL_PKT_INT_EN |
+				DP83822_SLEEP_MODE_INT_EN |
+				DP83822_MDI_XOVER_INT_EN |
+				DP83822_LB_FIFO_INT_EN |
+				DP83822_PAGE_RX_INT_EN |
+				DP83822_ANEG_ERR_INT_EN |
+				DP83822_EEE_ERROR_CHANGE_INT_EN);
+
+		err = phy_write(phydev, MII_DP83822_MISR2, misr_status);
+	} else {
+		err = phy_write(phydev, MII_DP83822_MISR1, 0);
+		if (err < 0)
+			return err;
+
+		err = phy_write(phydev, MII_DP83822_MISR1, 0);
+	}
+
+	return err;
+}
+
+static int dp83822_phy_reset(struct phy_device *phydev)
+{
+	int err;
+
+	err = phy_write(phydev, MII_DP83822_RESET_CTRL, DP83822_HW_RESET);
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+
+static int dp83822_suspend(struct phy_device *phydev)
+{
+	int value;
+
+	value = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG);
+
+	if (!(value & DP83822_WOL_EN))
+		genphy_suspend(phydev);
+
+	return 0;
+}
+
+static int dp83822_resume(struct phy_device *phydev)
+{
+	int value;
+
+	genphy_resume(phydev);
+
+	value = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG);
+
+	phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG, value |
+		      DP83822_WOL_CLR_INDICATION);
+
+
+	return 0;
+}
+
+static struct phy_driver dp83822_driver[] = {
+	{
+	 .phy_id = DP83822_PHY_ID,
+	 .phy_id_mask = 0xfffffff0,
+	 .name = "TI DP83822",
+	 .features = PHY_BASIC_FEATURES,
+	 .flags = PHY_HAS_INTERRUPT,
+	 .config_init = genphy_config_init,
+	 .soft_reset = dp83822_phy_reset,
+	 .get_wol = dp83822_get_wol,
+	 .set_wol = dp83822_set_wol,
+	 .ack_interrupt = dp83822_ack_interrupt,
+	 .config_intr = dp83822_config_intr,
+	 .config_aneg = genphy_config_aneg,
+	 .read_status = genphy_read_status,
+	 .suspend = dp83822_suspend,
+	 .resume = dp83822_resume,
+	 },
+};
+module_phy_driver(dp83822_driver);
+
+static struct mdio_device_id __maybe_unused dp83822_tbl[] = {
+	{ DP83822_PHY_ID, 0xfffffff0 },
+	{ },
+};
+MODULE_DEVICE_TABLE(mdio, dp83822_tbl);
+
+MODULE_DESCRIPTION("Texas Instruments DP83822 PHY driver");
+MODULE_AUTHOR("Dan Murphy <dmurphy@ti.com");
+MODULE_LICENSE("GPL");
-- 
2.14.0

^ permalink raw reply related

* [PATCH v3 1/3] net: phy: Remove TI DP83822 from DP83848 driver
From: Dan Murphy @ 2017-10-09 12:03 UTC (permalink / raw)
  To: andrew, f.fainelli; +Cc: netdev, Woojung.Huh, afd, Dan Murphy

Removing the DP83822 device from the DP83848 to
support the TI DP83822 dedicated driver that will
initially support WoL settings.

Signed-off-by: Dan Murphy <dmurphy@ti.com>
---

v3 - No changes made
v2 - There was no v1 on this patch this is new.

 drivers/net/phy/dp83848.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/phy/dp83848.c b/drivers/net/phy/dp83848.c
index 3de4fe4dda77..3966d43c5146 100644
--- a/drivers/net/phy/dp83848.c
+++ b/drivers/net/phy/dp83848.c
@@ -20,7 +20,6 @@
 #define TI_DP83620_PHY_ID		0x20005ce0
 #define NS_DP83848C_PHY_ID		0x20005c90
 #define TLK10X_PHY_ID			0x2000a210
-#define TI_DP83822_PHY_ID		0x2000a240
 
 /* Registers */
 #define DP83848_MICR			0x11 /* MII Interrupt Control Register */
@@ -80,7 +79,6 @@ static struct mdio_device_id __maybe_unused dp83848_tbl[] = {
 	{ NS_DP83848C_PHY_ID, 0xfffffff0 },
 	{ TI_DP83620_PHY_ID, 0xfffffff0 },
 	{ TLK10X_PHY_ID, 0xfffffff0 },
-	{ TI_DP83822_PHY_ID, 0xfffffff0 },
 	{ }
 };
 MODULE_DEVICE_TABLE(mdio, dp83848_tbl);
@@ -110,7 +108,6 @@ static struct phy_driver dp83848_driver[] = {
 	DP83848_PHY_DRIVER(NS_DP83848C_PHY_ID, "NS DP83848C 10/100 Mbps PHY"),
 	DP83848_PHY_DRIVER(TI_DP83620_PHY_ID, "TI DP83620 10/100 Mbps PHY"),
 	DP83848_PHY_DRIVER(TLK10X_PHY_ID, "TI TLK10X 10/100 Mbps PHY"),
-	DP83848_PHY_DRIVER(TI_DP83822_PHY_ID, "TI DP83822 10/100 Mbps PHY"),
 };
 module_phy_driver(dp83848_driver);
 
-- 
2.14.0

^ permalink raw reply related

* Re: [PATCH 10/12] net/mlx4: replace <linux/radix-tree.h> with <linux/radix-tree-root.h>
From: Masahiro Yamada @ 2017-10-09 12:02 UTC (permalink / raw)
  To: Joe Perches
  Cc: David Miller, Linux Kernel Mailing List, Thomas Gleixner,
	Andrew Morton, Ian Abbott, Ingo Molnar, Linus Torvalds,
	linux-rdma, Yishai Hadas, Tariq Toukan, netdev
In-Reply-To: <1507483942.11434.4.camel@perches.com>

2017-10-09 2:32 GMT+09:00 Joe Perches <joe@perches.com>:
> On Mon, 2017-10-09 at 02:29 +0900, Masahiro Yamada wrote:
>> The idea is simple; include necessary headers explicitly.
>
> Try that for kernel.h
>
> There's a reason aggregation of #includes is useful.
>

BTW, talking about <linux/kernel.h>, it is too much aggregation, isn't it?

It exceed 850 lines, and contains lots of unrelated stuff in one header.
Perhaps, it could be a good time to think about splitting?

For example, I see many trace_... things in it.

I wonder if linux/kernel.h is a good home for them, or a separate file.


-- 
Best Regards
Masahiro Yamada

^ permalink raw reply

* Re: [PATCH] netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
From: Daniel Borkmann @ 2017-10-09 12:01 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Willem de Bruijn
  Cc: Shmulik Ladkani, netfilter-devel, Willem de Bruijn,
	Network Development, Rafael Buchbinder, Shmulik Ladkani
In-Reply-To: <20171009115736.GA31826@salvia>

Hi Shmulik,

On 10/09/2017 01:57 PM, Pablo Neira Ayuso wrote:
> On Mon, Oct 09, 2017 at 01:18:23PM +0200, Pablo Neira Ayuso wrote:
>> On Fri, Oct 06, 2017 at 01:40:13PM -0400, Willem de Bruijn wrote:
>>> On Fri, Oct 6, 2017 at 12:02 PM, Shmulik Ladkani <shmulik@nsof.io> wrote:
>>>> From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
>>>>
>>>> Commit 2c16d6033264 ("netfilter: xt_bpf: support ebpf") introduced
>>>> support for attaching an eBPF object by an fd, with the
>>>> 'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each
>>>> IPT_SO_SET_REPLACE call.
>>>>
>>>> However this breaks subsequent iptables calls:
>>>>
>>>>   # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT
>>>>   # iptables -A INPUT -s 5.6.7.8 -j ACCEPT
>>>>   iptables: Invalid argument. Run `dmesg' for more information.
>> [...]
>>>>
>>>> References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2
>>>>              [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2
>>>>
>>>> Cc: Pablo Neira Ayuso <pablo@netfilter.org>
>>>> Cc: Willem de Bruijn <willemb@google.com>
>>>> Reported-by: Rafael Buchbinder <rafi@rbk.ms>
>>>> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
>>>
>>> Acked-by: Willem de Bruijn <willemb@google.com>
>>
>> Applied, thanks.
>
> Hm, I have to keep this back. Compilation breaks here.
>
> net/netfilter/xt_bpf.c: In function ‘__bpf_mt_check_path’:
> net/netfilter/xt_bpf.c:59:2: error: implicit declaration of function
> ‘bpf_obj_get_user’ [-Werror=implicit-function-declaration]
>    fd = bpf_obj_get_user(path);
>    ^

Yeah, probably best to just add a dummy bpf_obj_get_user()
returning an error when CONFIG_BPF_SYSCALL is disabled.

^ permalink raw reply

* Re: [PATCH] netlink: do not set cb_running if dump's start() errs
From: Johannes Berg @ 2017-10-09 11:58 UTC (permalink / raw)
  To: Jason A. Donenfeld, davem, Netdev, linux-kernel
In-Reply-To: <20171009115648.25989-1-Jason@zx2c4.com>

On Mon, 2017-10-09 at 13:56 +0200, Jason A. Donenfeld wrote:

> @@ -2266,16 +2266,17 @@ int __netlink_dump_start(struct sock *ssk,
> struct sk_buff *skb,
>  	cb->min_dump_alloc = control->min_dump_alloc;
>  	cb->skb = skb;
>  
> +	if (cb->start) {
> +		ret = cb->start(cb);
> +		if (ret)
> +			goto error_unlock;
> +	}
> +
>  	nlk->cb_running = true;
>  
>  	mutex_unlock(nlk->cb_mutex);

Hmm. Now start is invoked with the mutex held, I'm not sure it actually
_matters_, but that should probably be reviewed and mentioned in the
commit log?

johannes

^ permalink raw reply

* Re: [PATCH] netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
From: Pablo Neira Ayuso @ 2017-10-09 11:57 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Shmulik Ladkani, netfilter-devel, Willem de Bruijn,
	Network Development, Daniel Borkmann, Rafael Buchbinder,
	Shmulik Ladkani
In-Reply-To: <20171009111823.GA30637@salvia>

On Mon, Oct 09, 2017 at 01:18:23PM +0200, Pablo Neira Ayuso wrote:
> On Fri, Oct 06, 2017 at 01:40:13PM -0400, Willem de Bruijn wrote:
> > On Fri, Oct 6, 2017 at 12:02 PM, Shmulik Ladkani <shmulik@nsof.io> wrote:
> > > From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> > >
> > > Commit 2c16d6033264 ("netfilter: xt_bpf: support ebpf") introduced
> > > support for attaching an eBPF object by an fd, with the
> > > 'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each
> > > IPT_SO_SET_REPLACE call.
> > >
> > > However this breaks subsequent iptables calls:
> > >
> > >  # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT
> > >  # iptables -A INPUT -s 5.6.7.8 -j ACCEPT
> > >  iptables: Invalid argument. Run `dmesg' for more information.
> [...]
> > >
> > > References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2
> > >             [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2
> > >
> > > Cc: Pablo Neira Ayuso <pablo@netfilter.org>
> > > Cc: Willem de Bruijn <willemb@google.com>
> > > Reported-by: Rafael Buchbinder <rafi@rbk.ms>
> > > Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> > 
> > Acked-by: Willem de Bruijn <willemb@google.com>
> 
> Applied, thanks.

Hm, I have to keep this back. Compilation breaks here.

net/netfilter/xt_bpf.c: In function ‘__bpf_mt_check_path’:
net/netfilter/xt_bpf.c:59:2: error: implicit declaration of function
‘bpf_obj_get_user’ [-Werror=implicit-function-declaration]
  fd = bpf_obj_get_user(path);
  ^

^ permalink raw reply

* Re: [PATCH] netlink: do not set cb_running if dump's start() errs
From: Jason A. Donenfeld @ 2017-10-09 11:57 UTC (permalink / raw)
  To: Johannes Berg, David Miller, Netdev, LKML; +Cc: Jason A. Donenfeld
In-Reply-To: <20171009115648.25989-1-Jason@zx2c4.com>

Dave - this very likely should be queued up for stable.

Jason

^ permalink raw reply

* [PATCH] netlink: do not set cb_running if dump's start() errs
From: Jason A. Donenfeld @ 2017-10-09 11:56 UTC (permalink / raw)
  To: johannes, davem, Netdev, linux-kernel; +Cc: Jason A. Donenfeld

It turns out that multiple places can call netlink_dump(), which means
it's still possible to dereference partially initialized values in
dump() that were the result of a faulty returned start().

This fixes the issue by calling start() _before_ setting cb_running to
true, so that there's no chance at all of hitting the dump() function
through any indirect paths.

In testing this with several different pieces of tricky code to trigger
these issues, this commit fixes all avenues that I'm aware of.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
---
 net/netlink/af_netlink.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 94c11cf0459d..f34750691c5c 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2266,16 +2266,17 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 	cb->min_dump_alloc = control->min_dump_alloc;
 	cb->skb = skb;
 
+	if (cb->start) {
+		ret = cb->start(cb);
+		if (ret)
+			goto error_unlock;
+	}
+
 	nlk->cb_running = true;
 
 	mutex_unlock(nlk->cb_mutex);
 
-	ret = 0;
-	if (cb->start)
-		ret = cb->start(cb);
-
-	if (!ret)
-		ret = netlink_dump(sk);
+	ret = netlink_dump(sk);
 
 	sock_put(sk);
 
-- 
2.14.2

^ permalink raw reply related

* Re: netlink backwards compatibility in userspace tools
From: Jason A. Donenfeld @ 2017-10-09 11:34 UTC (permalink / raw)
  To: David Miller; +Cc: Netdev, LKML, Daniel Kahn Gillmor
In-Reply-To: <20171008.214726.1977368725573365543.davem@davemloft.net>

Hi Dave,

That seems wise. Thanks for the advice. A more sophisticated way of
approaching this would be for the kernel to also send a bitmap of
which attributes are "critical" and only warn (or even error) of
_those_ are not understood. But that seems needlessly complex, and so
I think I'll go with your suggestion, and simply warn once with a
shorter message.

Jason

^ permalink raw reply

* Re: [PATCH] netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
From: Pablo Neira Ayuso @ 2017-10-09 11:18 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Shmulik Ladkani, netfilter-devel, Willem de Bruijn,
	Network Development, Daniel Borkmann, Rafael Buchbinder,
	Shmulik Ladkani
In-Reply-To: <CAF=yD-JbZ_ViWsBWK7Ye=4RYF9BgnMKvD04xE5P26RNRpnmuiQ@mail.gmail.com>

On Fri, Oct 06, 2017 at 01:40:13PM -0400, Willem de Bruijn wrote:
> On Fri, Oct 6, 2017 at 12:02 PM, Shmulik Ladkani <shmulik@nsof.io> wrote:
> > From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> >
> > Commit 2c16d6033264 ("netfilter: xt_bpf: support ebpf") introduced
> > support for attaching an eBPF object by an fd, with the
> > 'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each
> > IPT_SO_SET_REPLACE call.
> >
> > However this breaks subsequent iptables calls:
> >
> >  # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT
> >  # iptables -A INPUT -s 5.6.7.8 -j ACCEPT
> >  iptables: Invalid argument. Run `dmesg' for more information.
[...]
> >
> > References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2
> >             [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2
> >
> > Cc: Pablo Neira Ayuso <pablo@netfilter.org>
> > Cc: Willem de Bruijn <willemb@google.com>
> > Reported-by: Rafael Buchbinder <rafi@rbk.ms>
> > Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> 
> Acked-by: Willem de Bruijn <willemb@google.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH v3] netfilter: SYNPROXY: fix process non tcp packet bug in {ipv4,ipv6}_synproxy_hook
From: Pablo Neira Ayuso @ 2017-10-09 11:09 UTC (permalink / raw)
  To: Lin Zhang
  Cc: kadlec, fw, davem, kuznet, yoshfuji, netfilter-devel, coreteam,
	netdev
In-Reply-To: <1507221843-4915-1-git-send-email-xiaolou4617@gmail.com>

On Fri, Oct 06, 2017 at 12:44:03AM +0800, Lin Zhang wrote:
> In function {ipv4,ipv6}_synproxy_hook we expect a normal tcp packet,
> but the real server maybe reply an icmp error packet related to the
> exist tcp conntrack, so we will access wrong tcp data.
> 
> For fix it, check for the protocol field and only process tcp traffic.

Applied, thanks. I have mangled the title slightly to:

        netfilter: SYNPROXY: skip non-tcp packet in {ipv4, ipv6}_synproxy_hook

For the record.

^ permalink raw reply

* Re: [patch net-next v2 1/5] net: bridge: Notify on bridge device mrouter state changes
From: Nikolay Aleksandrov @ 2017-10-09 10:52 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, yotamg, idosch, nogahf, mlxsw, ivecera, andrew, stephen,
	nbd, roopa
In-Reply-To: <20171009091535.1315-2-jiri@resnulli.us>

On 09/10/17 12:15, Jiri Pirko wrote:
> From: Yotam Gigi <yotamg@mellanox.com>
> 
> Add the SWITCHDEV_ATTR_ID_BRIDGE_MROUTER switchdev notification type, used
> to indicate whether the bridge is or isn't mrouter. Notify when the bridge
> changes its state, similarly to the already existing bridged port mrouter
> notifications.
> 
> The notification uses the switchdev_attr.u.mrouter boolean flag to indicate
> the current bridge mrouter status. Thus, it only indicates whether the
> bridge is currently used as an mrouter or not, and does not indicate the
> exact mrouter state of the bridge (learning, permanent, etc.).
> 
> Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
> v1->v2:
>  - use the timer_pending to distinguish between learning-on and
>    learning-off states
> ---
>  include/net/switchdev.h   |  1 +
>  net/bridge/br_multicast.c | 38 +++++++++++++++++++++++++++++++++++---
>  2 files changed, 36 insertions(+), 3 deletions(-)
> 

Much simpler, I like it. Thanks!

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

^ permalink raw reply

* Re: [patch net-next v2 2/5] net: bridge: Export bridge multicast router state
From: Ivan Vecera @ 2017-10-09 10:19 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, yotamg, idosch, nogahf, mlxsw, nikolay, andrew, stephen,
	nbd, roopa
In-Reply-To: <20171009091535.1315-3-jiri@resnulli.us>

On 9.10.2017 11:15, Jiri Pirko wrote:
> From: Yotam Gigi <yotamg@mellanox.com>
> 
> Add an access function that, given a bridge netdevice, returns whether the
> bridge device is currently an mrouter or not. The function uses the already
> existing br_multicast_is_router function to check that.
> 
> This function is needed in order to allow ports that join an already
> existing bridge to know the current mrouter state of the bridge device.
> Together with the bridge device mrouter ports switchdev notifications, it
> is possible to have full offloading of the semantics of the bridge device
> mcast router state.
> 
> Due to the fact that the bridge multicast router status can change in
> packet RX path, take the multicast_router bridge spinlock to protect the
> read.
> 
> Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
> Reviewed-by: Nogah Frankel <nogahf@mellanox.com>
> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  include/linux/if_bridge.h |  5 +++++
>  net/bridge/br_multicast.c | 12 ++++++++++++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index 316ee11..02639eb 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -64,6 +64,7 @@ int br_multicast_list_adjacent(struct net_device *dev,
>  bool br_multicast_has_querier_anywhere(struct net_device *dev, int proto);
>  bool br_multicast_has_querier_adjacent(struct net_device *dev, int proto);
>  bool br_multicast_enabled(const struct net_device *dev);
> +bool br_multicast_router(const struct net_device *dev);
>  #else
>  static inline int br_multicast_list_adjacent(struct net_device *dev,
>  					     struct list_head *br_ip_list)
> @@ -84,6 +85,10 @@ static inline bool br_multicast_enabled(const struct net_device *dev)
>  {
>  	return false;
>  }
> +static inline bool br_multicast_router(const struct net_device *dev)
> +{
> +	return false;
> +}
>  #endif
>  
>  #if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_VLAN_FILTERING)
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index bd50550..7947e04 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -2216,6 +2216,18 @@ bool br_multicast_enabled(const struct net_device *dev)
>  }
>  EXPORT_SYMBOL_GPL(br_multicast_enabled);
>  
> +bool br_multicast_router(const struct net_device *dev)
> +{
> +	struct net_bridge *br = netdev_priv(dev);
> +	bool is_router;
> +
> +	spin_lock_bh(&br->multicast_lock);
> +	is_router = br_multicast_is_router(br);
> +	spin_unlock_bh(&br->multicast_lock);
> +	return is_router;
> +}
> +EXPORT_SYMBOL_GPL(br_multicast_router);
> +
>  int br_multicast_set_querier(struct net_bridge *br, unsigned long val)
>  {
>  	unsigned long max_delay;
> 

Reviewed by: Ivan Vecera <ivecera@redhat.com>

^ permalink raw reply

* Re: [patch net-next v2 1/5] net: bridge: Notify on bridge device mrouter state changes
From: Ivan Vecera @ 2017-10-09 10:19 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, yotamg, idosch, nogahf, mlxsw, nikolay, andrew, stephen,
	nbd, roopa
In-Reply-To: <20171009091535.1315-2-jiri@resnulli.us>

On 9.10.2017 11:15, Jiri Pirko wrote:
> From: Yotam Gigi <yotamg@mellanox.com>
> 
> Add the SWITCHDEV_ATTR_ID_BRIDGE_MROUTER switchdev notification type, used
> to indicate whether the bridge is or isn't mrouter. Notify when the bridge
> changes its state, similarly to the already existing bridged port mrouter
> notifications.
> 
> The notification uses the switchdev_attr.u.mrouter boolean flag to indicate
> the current bridge mrouter status. Thus, it only indicates whether the
> bridge is currently used as an mrouter or not, and does not indicate the
> exact mrouter state of the bridge (learning, permanent, etc.).
> 
> Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
> v1->v2:
>  - use the timer_pending to distinguish between learning-on and
>    learning-off states
> ---
>  include/net/switchdev.h   |  1 +
>  net/bridge/br_multicast.c | 38 +++++++++++++++++++++++++++++++++++---
>  2 files changed, 36 insertions(+), 3 deletions(-)
> 
> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> index d767b79..d756fbe 100644
> --- a/include/net/switchdev.h
> +++ b/include/net/switchdev.h
> @@ -51,6 +51,7 @@ enum switchdev_attr_id {
>  	SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME,
>  	SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING,
>  	SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED,
> +	SWITCHDEV_ATTR_ID_BRIDGE_MROUTER,
>  };
>  
>  struct switchdev_attr {
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index 8dc5c8d..bd50550 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -859,8 +859,32 @@ static void br_multicast_router_expired(unsigned long data)
>  	spin_unlock(&br->multicast_lock);
>  }
>  
> +static void br_mc_router_state_change(struct net_bridge *p,
> +				      bool is_mc_router)
> +{
> +	struct switchdev_attr attr = {
> +		.orig_dev = p->dev,
> +		.id = SWITCHDEV_ATTR_ID_BRIDGE_MROUTER,
> +		.flags = SWITCHDEV_F_DEFER,
> +		.u.mrouter = is_mc_router,
> +	};
> +
> +	switchdev_port_attr_set(p->dev, &attr);
> +}
> +
>  static void br_multicast_local_router_expired(unsigned long data)
>  {
> +	struct net_bridge *br = (struct net_bridge *)data;
> +
> +	spin_lock(&br->multicast_lock);
> +	if (br->multicast_router == MDB_RTR_TYPE_DISABLED ||
> +	    br->multicast_router == MDB_RTR_TYPE_PERM ||
> +	    timer_pending(&br->multicast_router_timer))
> +		goto out;
> +
> +	br_mc_router_state_change(br, false);
> +out:
> +	spin_unlock(&br->multicast_lock);
>  }
>  
>  static void br_multicast_querier_expired(struct net_bridge *br,
> @@ -1364,9 +1388,12 @@ static void br_multicast_mark_router(struct net_bridge *br,
>  	unsigned long now = jiffies;
>  
>  	if (!port) {
> -		if (br->multicast_router == MDB_RTR_TYPE_TEMP_QUERY)
> +		if (br->multicast_router == MDB_RTR_TYPE_TEMP_QUERY) {
> +			if (!timer_pending(&br->multicast_router_timer))
> +				br_mc_router_state_change(br, true);
>  			mod_timer(&br->multicast_router_timer,
>  				  now + br->multicast_querier_interval);
> +		}
>  		return;
>  	}
>  
> @@ -1952,7 +1979,7 @@ void br_multicast_init(struct net_bridge *br)
>  
>  	spin_lock_init(&br->multicast_lock);
>  	setup_timer(&br->multicast_router_timer,
> -		    br_multicast_local_router_expired, 0);
> +		    br_multicast_local_router_expired, (unsigned long)br);
>  	setup_timer(&br->ip4_other_query.timer,
>  		    br_ip4_multicast_querier_expired, (unsigned long)br);
>  	setup_timer(&br->ip4_own_query.timer, br_ip4_multicast_query_expired,
> @@ -2042,9 +2069,14 @@ int br_multicast_set_router(struct net_bridge *br, unsigned long val)
>  	switch (val) {
>  	case MDB_RTR_TYPE_DISABLED:
>  	case MDB_RTR_TYPE_PERM:
> +		br_mc_router_state_change(br, val == MDB_RTR_TYPE_PERM);
>  		del_timer(&br->multicast_router_timer);
> -		/* fall through */
> +		br->multicast_router = val;
> +		err = 0;
> +		break;
>  	case MDB_RTR_TYPE_TEMP_QUERY:
> +		if (br->multicast_router != MDB_RTR_TYPE_TEMP_QUERY)
> +			br_mc_router_state_change(br, false);
>  		br->multicast_router = val;
>  		err = 0;
>  		break;
> 

Reviewed by: Ivan Vecera <ivecera@redhat.com>

^ permalink raw reply

* Re: [PATCH 41/47] netfilter: convert hook list to an array
From: Tariq Toukan @ 2017-10-09 10:04 UTC (permalink / raw)
  To: Florian Westphal, Tariq Toukan
  Cc: Pablo Neira Ayuso, netfilter-devel, Aaron Conole, davem, netdev
In-Reply-To: <20171009093132.GA7150@breakpoint.cc>



On 09/10/2017 12:31 PM, Florian Westphal wrote:
> Tariq Toukan <tariqt@mellanox.com> wrote:
>> On 04/09/2017 1:42 AM, Pablo Neira Ayuso wrote:
>>> From: Aaron Conole <aconole@bytheb.org>
>>>
>>> This converts the storage and layout of netfilter hook entries from a
>>> linked list to an array.  After this commit, hook entries will be
>>> stored adjacent in memory.  The next pointer is no longer required.
>>>
>>> The ops pointers are stored at the end of the array as they are only
>>> used in the register/unregister path and in the legacy br_netfilter code.
>>>
>>> nf_unregister_net_hooks() is slower than needed as it just calls
>>> nf_unregister_net_hook in a loop (i.e. at least n synchronize_net()
>>> calls), this will be addressed in followup patch.
>>>
>>> Test setup:
>>>   - ixgbe 10gbit
>>>   - netperf UDP_STREAM, 64 byte packets
>>>   - 5 hooks: (raw + mangle prerouting, mangle+filter input, inet filter):
>>> empty mangle and raw prerouting, mangle and filter input hooks:
>>> 353.9
>>> this patch:
>>> 364.2
>>>
>>> Signed-off-by: Aaron Conole <aconole@bytheb.org>
>>> Signed-off-by: Florian Westphal <fw@strlen.de>
>>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>>> ---
>>
>> Hi,
>>
>> We experience a regression in server with iommu enabled.
>> After installing kernel and rebooting the server, it crashes during boot.
>> Please see trace below.
>>
>> Bisecting points to this patch.
> 
> Hmm, strange because
> 
>> [   25.907811] BUG: unable to handle kernel NULL pointer dereference at
>> 000000000000003c
>> [   25.907828] IP: _raw_read_lock_bh+0x15/0x40
> 
> ... this says that ebt_table is NULL (0x3c is the offset of the rwlock).
> 
> If you don't have that fix already, does

No, didn't have it.

> https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git/commit/?id=e6b72ee88a56bcfe63f72e9c30766484c45bec72
> 
> netfilter: ebtables: fix race condition in frame_filter_net_init()
> 
> resolve this bug for you?
> 

Now I applied the fix and bug is resolved.

Many thanks!
Tariq

^ permalink raw reply

* [PATCH v3 net-next 12/12] qed: Add iWARP support for fpdu spanned over more than two tcp packets
From: Michal Kalderon @ 2017-10-09  9:37 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-rdma, dledford, Michal Kalderon, Ariel Elior
In-Reply-To: <1507541874-18344-1-git-send-email-Michal.Kalderon@cavium.com>

We continue to maintain a maximum of three buffers per fpdu, to ensure
that there are enough buffers for additional unaligned mpa packets.
To support this, if a fpdu is split over more than two tcp packets, we
use an intermediate buffer to copy the data to the previous buffer, then
we can release the data. We need an intermediate buffer as the initial
buffer partial packet could be located at the end of the packet, not
leaving room for additional data. This is a corner case, and will usually
not be the case.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
---
 drivers/net/ethernet/qlogic/qed/qed_iwarp.c | 193 ++++++++++++++++++++++++++++
 drivers/net/ethernet/qlogic/qed/qed_iwarp.h |   1 +
 2 files changed, 194 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
index 2994942..b2b1f87 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
@@ -1420,6 +1420,7 @@ void qed_iwarp_resc_free(struct qed_hwfn *p_hwfn)
 	qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->tcp_cid_map, 1);
 	kfree(iwarp_info->mpa_bufs);
 	kfree(iwarp_info->partial_fpdus);
+	kfree(iwarp_info->mpa_intermediate_buf);
 }
 
 int qed_iwarp_accept(void *rdma_cxt, struct qed_iwarp_accept_in *iparams)
@@ -1762,6 +1763,11 @@ enum qed_iwarp_mpa_pkt_type {
 	"QED_IWARP_MPA_PKT_UNALIGNED"
 };
 
+static int
+qed_iwarp_recycle_pkt(struct qed_hwfn *p_hwfn,
+		      struct qed_iwarp_fpdu *fpdu,
+		      struct qed_iwarp_ll2_buff *buf);
+
 static enum qed_iwarp_mpa_pkt_type
 qed_iwarp_mpa_classify(struct qed_hwfn *p_hwfn,
 		       struct qed_iwarp_fpdu *fpdu,
@@ -1822,6 +1828,68 @@ enum qed_iwarp_mpa_pkt_type {
 	fpdu->mpa_frag_len = fpdu->fpdu_length - fpdu->incomplete_bytes;
 }
 
+static int
+qed_iwarp_cp_pkt(struct qed_hwfn *p_hwfn,
+		 struct qed_iwarp_fpdu *fpdu,
+		 struct unaligned_opaque_data *pkt_data,
+		 struct qed_iwarp_ll2_buff *buf, u16 tcp_payload_size)
+{
+	u8 *tmp_buf = p_hwfn->p_rdma_info->iwarp.mpa_intermediate_buf;
+	int rc;
+
+	/* need to copy the data from the partial packet stored in fpdu
+	 * to the new buf, for this we also need to move the data currently
+	 * placed on the buf. The assumption is that the buffer is big enough
+	 * since fpdu_length <= mss, we use an intermediate buffer since
+	 * we may need to copy the new data to an overlapping location
+	 */
+	if ((fpdu->mpa_frag_len + tcp_payload_size) > (u16)buf->buff_size) {
+		DP_ERR(p_hwfn,
+		       "MPA ALIGN: Unexpected: buffer is not large enough for split fpdu buff_size = %d mpa_frag_len = %d, tcp_payload_size = %d, incomplete_bytes = %d\n",
+		       buf->buff_size, fpdu->mpa_frag_len,
+		       tcp_payload_size, fpdu->incomplete_bytes);
+		return -EINVAL;
+	}
+
+	DP_VERBOSE(p_hwfn, QED_MSG_RDMA,
+		   "MPA ALIGN Copying fpdu: [%p, %d] [%p, %d]\n",
+		   fpdu->mpa_frag_virt, fpdu->mpa_frag_len,
+		   (u8 *)(buf->data) + pkt_data->first_mpa_offset,
+		   tcp_payload_size);
+
+	memcpy(tmp_buf, fpdu->mpa_frag_virt, fpdu->mpa_frag_len);
+	memcpy(tmp_buf + fpdu->mpa_frag_len,
+	       (u8 *)(buf->data) + pkt_data->first_mpa_offset,
+	       tcp_payload_size);
+
+	rc = qed_iwarp_recycle_pkt(p_hwfn, fpdu, fpdu->mpa_buf);
+	if (rc)
+		return rc;
+
+	/* If we managed to post the buffer copy the data to the new buffer
+	 * o/w this will occur in the next round...
+	 */
+	memcpy((u8 *)(buf->data), tmp_buf,
+	       fpdu->mpa_frag_len + tcp_payload_size);
+
+	fpdu->mpa_buf = buf;
+	/* fpdu->pkt_hdr remains as is */
+	/* fpdu->mpa_frag is overridden with new buf */
+	fpdu->mpa_frag = buf->data_phys_addr;
+	fpdu->mpa_frag_virt = buf->data;
+	fpdu->mpa_frag_len += tcp_payload_size;
+
+	fpdu->incomplete_bytes -= tcp_payload_size;
+
+	DP_VERBOSE(p_hwfn,
+		   QED_MSG_RDMA,
+		   "MPA ALIGN: split fpdu buff_size = %d mpa_frag_len = %d, tcp_payload_size = %d, incomplete_bytes = %d\n",
+		   buf->buff_size, fpdu->mpa_frag_len, tcp_payload_size,
+		   fpdu->incomplete_bytes);
+
+	return 0;
+}
+
 static void
 qed_iwarp_update_fpdu_length(struct qed_hwfn *p_hwfn,
 			     struct qed_iwarp_fpdu *fpdu, u8 *mpa_data)
@@ -1843,6 +1911,90 @@ enum qed_iwarp_mpa_pkt_type {
 	}
 }
 
+#define QED_IWARP_IS_RIGHT_EDGE(_curr_pkt) \
+	(GET_FIELD((_curr_pkt)->flags,	   \
+		   UNALIGNED_OPAQUE_DATA_PKT_REACHED_WIN_RIGHT_EDGE))
+
+/* This function is used to recycle a buffer using the ll2 drop option. It
+ * uses the mechanism to ensure that all buffers posted to tx before this one
+ * were completed. The buffer sent here will be sent as a cookie in the tx
+ * completion function and can then be reposted to rx chain when done. The flow
+ * that requires this is the flow where a FPDU splits over more than 3 tcp
+ * segments. In this case the driver needs to re-post a rx buffer instead of
+ * the one received, but driver can't simply repost a buffer it copied from
+ * as there is a case where the buffer was originally a packed FPDU, and is
+ * partially posted to FW. Driver needs to ensure FW is done with it.
+ */
+static int
+qed_iwarp_recycle_pkt(struct qed_hwfn *p_hwfn,
+		      struct qed_iwarp_fpdu *fpdu,
+		      struct qed_iwarp_ll2_buff *buf)
+{
+	struct qed_ll2_tx_pkt_info tx_pkt;
+	u8 ll2_handle;
+	int rc;
+
+	memset(&tx_pkt, 0, sizeof(tx_pkt));
+	tx_pkt.num_of_bds = 1;
+	tx_pkt.tx_dest = QED_LL2_TX_DEST_DROP;
+	tx_pkt.l4_hdr_offset_w = fpdu->pkt_hdr_size >> 2;
+	tx_pkt.first_frag = fpdu->pkt_hdr;
+	tx_pkt.first_frag_len = fpdu->pkt_hdr_size;
+	buf->piggy_buf = NULL;
+	tx_pkt.cookie = buf;
+
+	ll2_handle = p_hwfn->p_rdma_info->iwarp.ll2_mpa_handle;
+
+	rc = qed_ll2_prepare_tx_packet(p_hwfn, ll2_handle, &tx_pkt, true);
+	if (rc)
+		DP_VERBOSE(p_hwfn, QED_MSG_RDMA,
+			   "Can't drop packet rc=%d\n", rc);
+
+	DP_VERBOSE(p_hwfn,
+		   QED_MSG_RDMA,
+		   "MPA_ALIGN: send drop tx packet [%lx, 0x%x], buf=%p, rc=%d\n",
+		   (unsigned long int)tx_pkt.first_frag,
+		   tx_pkt.first_frag_len, buf, rc);
+
+	return rc;
+}
+
+static int
+qed_iwarp_win_right_edge(struct qed_hwfn *p_hwfn, struct qed_iwarp_fpdu *fpdu)
+{
+	struct qed_ll2_tx_pkt_info tx_pkt;
+	u8 ll2_handle;
+	int rc;
+
+	memset(&tx_pkt, 0, sizeof(tx_pkt));
+	tx_pkt.num_of_bds = 1;
+	tx_pkt.tx_dest = QED_LL2_TX_DEST_LB;
+	tx_pkt.l4_hdr_offset_w = fpdu->pkt_hdr_size >> 2;
+
+	tx_pkt.first_frag = fpdu->pkt_hdr;
+	tx_pkt.first_frag_len = fpdu->pkt_hdr_size;
+	tx_pkt.enable_ip_cksum = true;
+	tx_pkt.enable_l4_cksum = true;
+	tx_pkt.calc_ip_len = true;
+	/* vlan overload with enum iwarp_ll2_tx_queues */
+	tx_pkt.vlan = IWARP_LL2_ALIGNED_RIGHT_TRIMMED_TX_QUEUE;
+
+	ll2_handle = p_hwfn->p_rdma_info->iwarp.ll2_mpa_handle;
+
+	rc = qed_ll2_prepare_tx_packet(p_hwfn, ll2_handle, &tx_pkt, true);
+	if (rc)
+		DP_VERBOSE(p_hwfn, QED_MSG_RDMA,
+			   "Can't send right edge rc=%d\n", rc);
+	DP_VERBOSE(p_hwfn,
+		   QED_MSG_RDMA,
+		   "MPA_ALIGN: Sent right edge FPDU num_bds=%d [%lx, 0x%x], rc=%d\n",
+		   tx_pkt.num_of_bds,
+		   (unsigned long int)tx_pkt.first_frag,
+		   tx_pkt.first_frag_len, rc);
+
+	return rc;
+}
+
 static int
 qed_iwarp_send_fpdu(struct qed_hwfn *p_hwfn,
 		    struct qed_iwarp_fpdu *fpdu,
@@ -1971,6 +2123,20 @@ enum qed_iwarp_mpa_pkt_type {
 					    mpa_buf->tcp_payload_len,
 					    mpa_buf->placement_offset);
 
+			if (!QED_IWARP_IS_RIGHT_EDGE(curr_pkt)) {
+				mpa_buf->tcp_payload_len = 0;
+				break;
+			}
+
+			rc = qed_iwarp_win_right_edge(p_hwfn, fpdu);
+
+			if (rc) {
+				DP_VERBOSE(p_hwfn, QED_MSG_RDMA,
+					   "Can't send FPDU:reset rc=%d\n", rc);
+				memset(fpdu, 0, sizeof(*fpdu));
+				break;
+			}
+
 			mpa_buf->tcp_payload_len = 0;
 			break;
 		case QED_IWARP_MPA_PKT_PACKED:
@@ -1994,6 +2160,28 @@ enum qed_iwarp_mpa_pkt_type {
 			break;
 		case QED_IWARP_MPA_PKT_UNALIGNED:
 			qed_iwarp_update_fpdu_length(p_hwfn, fpdu, mpa_data);
+			if (mpa_buf->tcp_payload_len < fpdu->incomplete_bytes) {
+				/* special handling of fpdu split over more
+				 * than 2 segments
+				 */
+				if (QED_IWARP_IS_RIGHT_EDGE(curr_pkt)) {
+					rc = qed_iwarp_win_right_edge(p_hwfn,
+								      fpdu);
+					/* packet will be re-processed later */
+					if (rc)
+						return rc;
+				}
+
+				rc = qed_iwarp_cp_pkt(p_hwfn, fpdu, curr_pkt,
+						      buf,
+						      mpa_buf->tcp_payload_len);
+				if (rc) /* packet will be re-processed later */
+					return rc;
+
+				mpa_buf->tcp_payload_len = 0;
+				break;
+			}
+
 			rc = qed_iwarp_send_fpdu(p_hwfn, fpdu, curr_pkt, buf,
 						 mpa_buf->tcp_payload_len,
 						 pkt_type);
@@ -2510,6 +2698,11 @@ static int qed_iwarp_ll2_stop(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
 		goto err;
 
 	iwarp_info->max_num_partial_fpdus = (u16)p_hwfn->p_rdma_info->num_qps;
+
+	iwarp_info->mpa_intermediate_buf = kzalloc(mpa_buff_size, GFP_KERNEL);
+	if (!iwarp_info->mpa_intermediate_buf)
+		goto err;
+
 	/* The mpa_bufs array serves for pending RX packets received on the
 	 * mpa ll2 that don't have place on the tx ring and require later
 	 * processing. We can't fail on allocation of such a struct therefore
diff --git a/drivers/net/ethernet/qlogic/qed/qed_iwarp.h b/drivers/net/ethernet/qlogic/qed/qed_iwarp.h
index c58793a..c1ecd74 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_iwarp.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_iwarp.h
@@ -107,6 +107,7 @@ struct qed_iwarp_info {
 	enum mpa_rtr_type rtr_type;
 	struct qed_iwarp_fpdu *partial_fpdus;
 	struct qed_iwarp_ll2_mpa_buf *mpa_bufs;
+	u8 *mpa_intermediate_buf;
 	u16 max_num_partial_fpdus;
 };
 
-- 
1.8.3.1

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox