* [PATCH RFT v2 2/2] macb: kill PHY reset code
From: Sergei Shtylyov @ 2016-04-28 22:15 UTC (permalink / raw)
To: netdev, nicolas.ferre; +Cc: linux-kernel
In-Reply-To: <19240252.825Vlx6L0H@wasted.cogentembedded.com>
With the 'phylib' now being aware of the "reset-gpios" PHY node property,
there should be no need to frob the PHY reset in this driver anymore...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
---
drivers/net/ethernet/cadence/macb.c | 17 -----------------
drivers/net/ethernet/cadence/macb.h | 1 -
2 files changed, 18 deletions(-)
Index: net-next/drivers/net/ethernet/cadence/macb.c
===================================================================
--- net-next.orig/drivers/net/ethernet/cadence/macb.c
+++ net-next/drivers/net/ethernet/cadence/macb.c
@@ -2884,7 +2884,6 @@ static int macb_probe(struct platform_de
= macb_clk_init;
int (*init)(struct platform_device *) = macb_init;
struct device_node *np = pdev->dev.of_node;
- struct device_node *phy_node;
const struct macb_config *macb_config = NULL;
struct clk *pclk, *hclk = NULL, *tx_clk = NULL;
unsigned int queue_mask, num_queues;
@@ -2977,18 +2976,6 @@ static int macb_probe(struct platform_de
else
macb_get_hwaddr(bp);
- /* Power up the PHY if there is a GPIO reset */
- phy_node = of_get_next_available_child(np, NULL);
- if (phy_node) {
- int gpio = of_get_named_gpio(phy_node, "reset-gpios", 0);
-
- if (gpio_is_valid(gpio)) {
- bp->reset_gpio = gpio_to_desc(gpio);
- gpiod_direction_output(bp->reset_gpio, 1);
- }
- }
- of_node_put(phy_node);
-
err = of_get_phy_mode(np);
if (err < 0) {
pdata = dev_get_platdata(&pdev->dev);
@@ -3054,10 +3041,6 @@ static int macb_remove(struct platform_d
mdiobus_unregister(bp->mii_bus);
mdiobus_free(bp->mii_bus);
- /* Shutdown the PHY if there is a GPIO reset */
- if (bp->reset_gpio)
- gpiod_set_value(bp->reset_gpio, 0);
-
unregister_netdev(dev);
clk_disable_unprepare(bp->tx_clk);
clk_disable_unprepare(bp->hclk);
Index: net-next/drivers/net/ethernet/cadence/macb.h
===================================================================
--- net-next.orig/drivers/net/ethernet/cadence/macb.h
+++ net-next/drivers/net/ethernet/cadence/macb.h
@@ -832,7 +832,6 @@ struct macb {
unsigned int dma_burst_length;
phy_interface_t phy_interface;
- struct gpio_desc *reset_gpio;
/* AT91RM9200 transmit */
struct sk_buff *skb; /* holds skb until xmit interrupt completes */
^ permalink raw reply
* [PATCH RFT 1/2] phylib: add device reset GPIO support
From: Sergei Shtylyov @ 2016-04-28 22:12 UTC (permalink / raw)
To: grant.likely, robh+dt, devicetree, f.fainelli, netdev,
frowand.list, pawel.moll, mark.rutland, ijc+devicetree, galak
Cc: linux-kernel
In-Reply-To: <81129033.NXiOLTg1so@wasted.cogentembedded.com>
The PHY devices sometimes do have their reset signal (maybe even power
supply?) tied to some GPIO and sometimes it also does happen that a boot
loader does not leave it deasserted. So far this issue has been attacked
from (as I believe) a wrong angle: by teaching the MAC driver to manipulate
the GPIO in question; that solution, when applied to the device trees, led
to adding the PHY reset GPIO properties to the MAC device node, with one
exception: Cadence MACB driver which could handle the "reset-gpios" prop
in a PHY device subnode. I believe that the correct approach is to teach
the 'phylib' to get the MDIO device reset GPIO from the device tree node
corresponding to this device -- which this patch is doing...
Note that I had to modify the AT803x PHY driver as it would stop working
otherwise as it made use of the reset GPIO for its own purposes...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
---
Changes in version 2:
- reformatted the changelog;
- resolved rejects, refreshed the patch.
Documentation/devicetree/bindings/net/phy.txt | 2 +
Documentation/devicetree/bindings/net/phy.txt | 2 +
drivers/net/phy/at803x.c | 19 ++------------
drivers/net/phy/mdio_bus.c | 4 +++
drivers/net/phy/mdio_device.c | 27 +++++++++++++++++++--
drivers/net/phy/phy_device.c | 33 ++++++++++++++++++++++++--
drivers/of/of_mdio.c | 16 ++++++++++++
include/linux/mdio.h | 3 ++
include/linux/phy.h | 5 +++
8 files changed, 89 insertions(+), 20 deletions(-)
Index: net-next/Documentation/devicetree/bindings/net/phy.txt
===================================================================
--- net-next.orig/Documentation/devicetree/bindings/net/phy.txt
+++ net-next/Documentation/devicetree/bindings/net/phy.txt
@@ -35,6 +35,8 @@ Optional Properties:
- broken-turn-around: If set, indicates the PHY device does not correctly
release the turn around line low at the end of a MDIO transaction.
+- reset-gpios: The GPIO phandle and specifier for the PHY reset signal.
+
Example:
ethernet-phy@0 {
Index: net-next/drivers/net/phy/at803x.c
===================================================================
--- net-next.orig/drivers/net/phy/at803x.c
+++ net-next/drivers/net/phy/at803x.c
@@ -65,7 +65,6 @@ MODULE_LICENSE("GPL");
struct at803x_priv {
bool phy_reset:1;
- struct gpio_desc *gpiod_reset;
};
struct at803x_context {
@@ -271,22 +270,10 @@ static int at803x_probe(struct phy_devic
{
struct device *dev = &phydev->mdio.dev;
struct at803x_priv *priv;
- struct gpio_desc *gpiod_reset;
priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
if (!priv)
return -ENOMEM;
-
- if (phydev->drv->phy_id != ATH8030_PHY_ID)
- goto does_not_require_reset_workaround;
-
- gpiod_reset = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_LOW);
- if (IS_ERR(gpiod_reset))
- return PTR_ERR(gpiod_reset);
-
- priv->gpiod_reset = gpiod_reset;
-
-does_not_require_reset_workaround:
phydev->priv = priv;
return 0;
@@ -361,14 +348,14 @@ static void at803x_link_change_notify(st
*/
if (phydev->drv->phy_id == ATH8030_PHY_ID) {
if (phydev->state == PHY_NOLINK) {
- if (priv->gpiod_reset && !priv->phy_reset) {
+ if (phydev->mdio.reset && !priv->phy_reset) {
struct at803x_context context;
at803x_context_save(phydev, &context);
- gpiod_set_value(priv->gpiod_reset, 1);
+ phy_device_reset(phydev, 1);
msleep(1);
- gpiod_set_value(priv->gpiod_reset, 0);
+ phy_device_reset(phydev, 0);
msleep(1);
at803x_context_restore(phydev, &context);
Index: net-next/drivers/net/phy/mdio_bus.c
===================================================================
--- net-next.orig/drivers/net/phy/mdio_bus.c
+++ net-next/drivers/net/phy/mdio_bus.c
@@ -35,6 +35,7 @@
#include <linux/phy.h>
#include <linux/io.h>
#include <linux/uaccess.h>
+#include <linux/gpio/consumer.h>
#include <asm/irq.h>
@@ -371,6 +372,9 @@ void mdiobus_unregister(struct mii_bus *
if (!mdiodev)
continue;
+ if (mdiodev->reset)
+ gpiod_put(mdiodev->reset);
+
mdiodev->device_remove(mdiodev);
mdiodev->device_free(mdiodev);
}
Index: net-next/drivers/net/phy/mdio_device.c
===================================================================
--- net-next.orig/drivers/net/phy/mdio_device.c
+++ net-next/drivers/net/phy/mdio_device.c
@@ -12,6 +12,8 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/errno.h>
+#include <linux/gpio.h>
+#include <linux/gpio/consumer.h>
#include <linux/init.h>
#include <linux/interrupt.h>
#include <linux/kernel.h>
@@ -103,6 +105,13 @@ void mdio_device_remove(struct mdio_devi
}
EXPORT_SYMBOL(mdio_device_remove);
+void mdio_device_reset(struct mdio_device *mdiodev, int value)
+{
+ if (mdiodev->reset)
+ gpiod_set_value(mdiodev->reset, value);
+}
+EXPORT_SYMBOL(mdio_device_reset);
+
/**
* mdio_probe - probe an MDIO device
* @dev: device to probe
@@ -117,9 +126,16 @@ static int mdio_probe(struct device *dev
struct mdio_driver *mdiodrv = to_mdio_driver(drv);
int err = 0;
- if (mdiodrv->probe)
+ if (mdiodrv->probe) {
+ /* Deassert the reset signal */
+ mdio_device_reset(mdiodev, 0);
+
err = mdiodrv->probe(mdiodev);
+ /* Assert the reset signal */
+ mdio_device_reset(mdiodev, 1);
+ }
+
return err;
}
@@ -129,9 +145,16 @@ static int mdio_remove(struct device *de
struct device_driver *drv = mdiodev->dev.driver;
struct mdio_driver *mdiodrv = to_mdio_driver(drv);
- if (mdiodrv->remove)
+ if (mdiodrv->remove) {
+ /* Deassert the reset signal */
+ mdio_device_reset(mdiodev, 0);
+
mdiodrv->remove(mdiodev);
+ /* Assert the reset signal */
+ mdio_device_reset(mdiodev, 1);
+ }
+
return 0;
}
Index: net-next/drivers/net/phy/phy_device.c
===================================================================
--- net-next.orig/drivers/net/phy/phy_device.c
+++ net-next/drivers/net/phy/phy_device.c
@@ -589,6 +589,9 @@ int phy_device_register(struct phy_devic
if (err)
return err;
+ /* Deassert the reset signal */
+ phy_device_reset(phydev, 0);
+
/* Run all of the fixups for this PHY */
err = phy_scan_fixups(phydev);
if (err) {
@@ -604,9 +607,15 @@ int phy_device_register(struct phy_devic
goto out;
}
+ /* Assert the reset signal */
+ phy_device_reset(phydev, 1);
+
return 0;
out:
+ /* Assert the reset signal */
+ phy_device_reset(phydev, 1);
+
mdiobus_unregister_device(&phydev->mdio);
return err;
}
@@ -792,6 +801,9 @@ int phy_init_hw(struct phy_device *phyde
{
int ret = 0;
+ /* Deassert the reset signal */
+ phy_device_reset(phydev, 0);
+
if (!phydev->drv || !phydev->drv->config_init)
return 0;
@@ -997,6 +1009,9 @@ void phy_detach(struct phy_device *phyde
put_device(&phydev->mdio.dev);
module_put(bus->owner);
+
+ /* Assert the reset signal */
+ phy_device_reset(phydev, 1);
}
EXPORT_SYMBOL(phy_detach);
@@ -1596,9 +1611,16 @@ static int phy_probe(struct device *dev)
/* Set the state to READY by default */
phydev->state = PHY_READY;
- if (phydev->drv->probe)
+ if (phydev->drv->probe) {
+ /* Deassert the reset signal */
+ phy_device_reset(phydev, 0);
+
err = phydev->drv->probe(phydev);
+ /* Assert the reset signal */
+ phy_device_reset(phydev, 1);
+ }
+
mutex_unlock(&phydev->lock);
return err;
@@ -1612,8 +1634,15 @@ static int phy_remove(struct device *dev
phydev->state = PHY_DOWN;
mutex_unlock(&phydev->lock);
- if (phydev->drv->remove)
+ if (phydev->drv->remove) {
+ /* Deassert the reset signal */
+ phy_device_reset(phydev, 0);
+
phydev->drv->remove(phydev);
+
+ /* Assert the reset signal */
+ phy_device_reset(phydev, 1);
+ }
phydev->drv = NULL;
return 0;
Index: net-next/drivers/of/of_mdio.c
===================================================================
--- net-next.orig/drivers/of/of_mdio.c
+++ net-next/drivers/of/of_mdio.c
@@ -44,6 +44,7 @@ static int of_get_phy_id(struct device_n
static void of_mdiobus_register_phy(struct mii_bus *mdio,
struct device_node *child, u32 addr)
{
+ struct gpio_desc *gpiod;
struct phy_device *phy;
bool is_c45;
int rc;
@@ -52,10 +53,17 @@ static void of_mdiobus_register_phy(stru
is_c45 = of_device_is_compatible(child,
"ethernet-phy-ieee802.3-c45");
+ gpiod = fwnode_get_named_gpiod(&child->fwnode, "reset-gpios");
+ /* Deassert the reset signal */
+ if (!IS_ERR(gpiod))
+ gpiod_direction_output(gpiod, 0);
if (!is_c45 && !of_get_phy_id(child, &phy_id))
phy = phy_device_create(mdio, addr, phy_id, 0, NULL);
else
phy = get_phy_device(mdio, addr, is_c45);
+ /* Assert the reset signal again */
+ if (!IS_ERR(gpiod))
+ gpiod_set_value(gpiod, 1);
if (IS_ERR(phy))
return;
@@ -75,6 +83,9 @@ static void of_mdiobus_register_phy(stru
of_node_get(child);
phy->mdio.dev.of_node = child;
+ if (!IS_ERR(gpiod))
+ phy->mdio.reset = gpiod;
+
/* All data is now stored in the phy struct;
* register it */
rc = phy_device_register(phy);
@@ -92,6 +103,7 @@ static void of_mdiobus_register_device(s
struct device_node *child, u32 addr)
{
struct mdio_device *mdiodev;
+ struct gpio_desc *gpiod;
int rc;
mdiodev = mdio_device_create(mdio, addr);
@@ -104,6 +116,10 @@ static void of_mdiobus_register_device(s
of_node_get(child);
mdiodev->dev.of_node = child;
+ gpiod = fwnode_get_named_gpiod(&child->fwnode, "reset-gpios");
+ if (!IS_ERR(gpiod))
+ mdiodev->reset = gpiod;
+
/* All data is now stored in the mdiodev struct; register it. */
rc = mdio_device_register(mdiodev);
if (rc) {
Index: net-next/include/linux/mdio.h
===================================================================
--- net-next.orig/include/linux/mdio.h
+++ net-next/include/linux/mdio.h
@@ -11,6 +11,7 @@
#include <uapi/linux/mdio.h>
+struct gpio_desc;
struct mii_bus;
/* Multiple levels of nesting are possible. However typically this is
@@ -37,6 +38,7 @@ struct mdio_device {
/* Bus address of the MDIO device (0-31) */
int addr;
int flags;
+ struct gpio_desc *reset;
};
#define to_mdio_device(d) container_of(d, struct mdio_device, dev)
@@ -69,6 +71,7 @@ void mdio_device_free(struct mdio_device
struct mdio_device *mdio_device_create(struct mii_bus *bus, int addr);
int mdio_device_register(struct mdio_device *mdiodev);
void mdio_device_remove(struct mdio_device *mdiodev);
+void mdio_device_reset(struct mdio_device *mdiodev, int value);
int mdio_driver_register(struct mdio_driver *drv);
void mdio_driver_unregister(struct mdio_driver *drv);
Index: net-next/include/linux/phy.h
===================================================================
--- net-next.orig/include/linux/phy.h
+++ net-next/include/linux/phy.h
@@ -769,6 +769,11 @@ static inline int phy_read_status(struct
return phydev->drv->read_status(phydev);
}
+static inline void phy_device_reset(struct phy_device *phydev, int value)
+{
+ mdio_device_reset(&phydev->mdio, value);
+}
+
#define phydev_err(_phydev, format, args...) \
dev_err(&_phydev->mdio.dev, format, ##args)
^ permalink raw reply
* Re: [PATCH net-next] of: of_mdio: Check if MDIO bus controller is available
From: Andrew Lunn @ 2016-04-28 22:12 UTC (permalink / raw)
To: Florian Fainelli
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
nathan.sullivan-acOepvfBmUk, nicolas.ferre-AIFe0yeh4nAAvxtiuMwx3w,
Rob Herring, Frank Rowand, Grant Likely,
open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE, open list
In-Reply-To: <1461880510-27132-1-git-send-email-f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
On Thu, Apr 28, 2016 at 02:55:10PM -0700, Florian Fainelli wrote:
> Add a check whether the 'struct device_node' pointer passed to
> of_mdiobus_register() is an available (aka enabled) node in the Device
> Tree.
>
> Rationale for doing this are cases where an Ethernet MAC provides a MDIO
> bus controller and node, and an additional Ethernet MAC might be
> connecting its PHY/switches to that first MDIO bus controller, while
> still embedding one internally which is therefore marked as "disabled".
>
> Instead of sprinkling checks like these in callers of
> of_mdiobus_register(), do this in a central location.
I think this discussion has shown there is no documented best
practices for MDIO bus drivers and how PHYs nodes are placed within
device tree. Maybe you could document the generic MDIO binding, both
as integrated into a MAC device node, and as a separate device?
Andrew
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH RFT v2 0/2] Teach phylib hard-resetting devices
From: Sergei Shtylyov @ 2016-04-28 22:09 UTC (permalink / raw)
To: grant.likely-QSEj5FYQhm4dnm+yROfE0A,
robh+dt-DgEjT+Ai2ygdnm+yROfE0A, devicetree-u79uwXL29TY76Z2rM5mHXA,
f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, netdev-u79uwXL29TY76Z2rM5mHXA,
frowand.list-Re5JQEeQqe8AvxtiuMwx3w, pawel.moll-5wv7dgnIgG8,
mark.rutland-5wv7dgnIgG8, ijc+devicetree-KcIKpvwj1kUDXYZnReoRVg,
galak-sgV2jX0FEOL9JmXXK+q4OQ
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA
Hello.
Here's the set of 2 patches against DaveM's 'net-next.git' repo. They add to
the 'phylib' support for resetting devices via GPIO and do some clean up after
doing that...
[1/2] phylib: add device reset GPIO support
[2/2] macb: kill PHY reset code
MBR, Sergei
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH net-next] of: of_mdio: Check if MDIO bus controller is available
From: Andrew Lunn @ 2016-04-28 22:07 UTC (permalink / raw)
To: Florian Fainelli
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
nathan.sullivan-acOepvfBmUk, nicolas.ferre-AIFe0yeh4nAAvxtiuMwx3w,
Rob Herring, Frank Rowand, Grant Likely,
open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE, open list
In-Reply-To: <1461880510-27132-1-git-send-email-f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
On Thu, Apr 28, 2016 at 02:55:10PM -0700, Florian Fainelli wrote:
> Add a check whether the 'struct device_node' pointer passed to
> of_mdiobus_register() is an available (aka enabled) node in the Device
> Tree.
>
> Rationale for doing this are cases where an Ethernet MAC provides a MDIO
> bus controller and node, and an additional Ethernet MAC might be
> connecting its PHY/switches to that first MDIO bus controller, while
> still embedding one internally which is therefore marked as "disabled".
>
> Instead of sprinkling checks like these in callers of
> of_mdiobus_register(), do this in a central location.
>
> Signed-off-by: Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>
> ---
> drivers/of/of_mdio.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
> index b622b33dbf93..2f497790be1b 100644
> --- a/drivers/of/of_mdio.c
> +++ b/drivers/of/of_mdio.c
> @@ -209,6 +209,10 @@ int of_mdiobus_register(struct mii_bus *mdio, struct device_node *np)
> bool scanphys = false;
> int addr, rc;
>
> + /* Do not continue if the node is disabled */
> + if (!of_device_is_available(np))
> + return -EINVAL;
Could be bike shedding, but would ENODEV be better?
Some callers are going to have to look at the return value and decide
if it is a fatal error, and fail the whole probe, or a non-fatal error
and they should keep going. ENODEV seems less fatal...
Other than that,
Reviewed-by: Andrew Lunn <andrew-g2DYL2Zd6BY@public.gmane.org>
Andrew
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [RFC PATCH 2/5] mlx5: Add support for UDP tunnel segmentation with outer checksum offload
From: Alexander Duyck @ 2016-04-28 22:04 UTC (permalink / raw)
To: Matthew Finlay
Cc: Saeed Mahameed, Alexander Duyck, Eugenia Emantayev, Bruce W Allan,
Saeed Mahameed, Linux Netdev List, intel-wired-lan, Ariel Elior,
Michael Chan
In-Reply-To: <800B6BE4-251F-4AA5-92FC-3A9393E7981E@mellanox.com>
On Thu, Apr 28, 2016 at 2:43 PM, Matthew Finlay <Matt@mellanox.com> wrote:
>
>
>
>
>
> On 4/20/16, 11:06 AM, "Alexander Duyck" <alexander.duyck@gmail.com> wrote:
>
>>On Wed, Apr 20, 2016 at 10:40 AM, Saeed Mahameed
>><saeedm@dev.mellanox.co.il> wrote:
>>> On Tue, Apr 19, 2016 at 10:06 PM, Alexander Duyck <aduyck@mirantis.com> wrote:
>>>> This patch assumes that the mlx5 hardware will ignore existing IPv4/v6
>>>> header fields for length and checksum as well as the length and checksum
>>>> fields for outer UDP headers.
>>>>
>>>> I have no means of testing this as I do not have any mlx5 hardware but
>>>> thought I would submit it as an RFC to see if anyone out there wants to
>>>> test this and see if this does in fact enable this functionality allowing
>>>> us to to segment UDP tunneled frames that have an outer checksum.
>>>>
>>>> Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
>>>> ---
>>>> drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 ++++++-
>>>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>> index e0adb604f461..57d8da796d50 100644
>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>> @@ -2390,13 +2390,18 @@ static void mlx5e_build_netdev(struct net_device *netdev)
>>>> netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_FILTER;
>>>>
>>>> if (mlx5e_vxlan_allowed(mdev)) {
>>>> - netdev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
>>>> + netdev->hw_features |= NETIF_F_GSO_UDP_TUNNEL |
>>>> + NETIF_F_GSO_UDP_TUNNEL_CSUM |
>>>> + NETIF_F_GSO_PARTIAL;
>>>> netdev->hw_enc_features |= NETIF_F_IP_CSUM;
>>>> netdev->hw_enc_features |= NETIF_F_RXCSUM;
>>>> netdev->hw_enc_features |= NETIF_F_TSO;
>>>> netdev->hw_enc_features |= NETIF_F_TSO6;
>>>> netdev->hw_enc_features |= NETIF_F_RXHASH;
>>>> netdev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL;
>>>> + netdev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL_CSUM |
>>>> + NETIF_F_GSO_PARTIAL;
>>>> + netdev->gso_partial_features = NETIF_F_GSO_UDP_TUNNEL_CSUM;
>>>> }
>>>>
>>>> netdev->features = netdev->hw_features;
>>>>
>>>
>>> Hi Alex,
>>>
>>> Adding Matt, VxLAN feature owner from Mellanox,
>>> Matt please correct me if am wrong, but We already tested GSO VxLAN
>>> and we saw the TCP/IP checksum offloads for both inner and outer
>>> headers handled by the hardware.
>>>
>>> And looking at mlx5e_sq_xmit:
>>>
>>> if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) {
>>> eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM;
>>> if (skb->encapsulation) {
>>> eseg->cs_flags |= MLX5_ETH_WQE_L3_INNER_CSUM |
>>> MLX5_ETH_WQE_L4_INNER_CSUM;
>>> sq->stats.csum_offload_inner++;
>>> } else {
>>> eseg->cs_flags |= MLX5_ETH_WQE_L4_CSUM;
>>> }
>>>
>>> We enable inner/outer hardware checksumming unconditionally without
>>> looking at the features Alex is suggesting in this patch,
>>> Alex, can you elaborate more on the meaning of those features ? and
>>> why would it work for us without declaring them ?
>>
>>Well right now the feature list exposed by the device indicates that
>>TSO is not used if a VxLAN tunnel has a checksum in an outer header.
>>Since that is not exposed currently that is completely offloaded in
>>software via GSO.
>
> The mlx5 hardware requires the outer UDP checksum is not set when offloading encapsulated packets.
The Intel documentation said the same thing. That was due to the fact
that the hardware didn't computer the outer UDP header checksum. I
suspect the Mellanox hardware has the same issue. Also I have tested
on a ConnectX-4 board with the latest firmware and what I am seeing is
that with my patches applied the outer checksum is being correctly
applied for segmentation offloads.
My thought is that that the hardware appears to ignore the UDP
checksum so if it is non-zero you cannot guarantee the checksum would
be correct on the last frame if it is a different size than the rest
of the segments. In the case of these patches that issue has been
resolved as I have precomputed the UDP checksum for the outer UDP
header and all of the segments will be the same length so there should
be no variation in the UDP checksum of the outer header. Unless you
can tell my exactly the reason why we cannot provide the outer UDP
checksum I would assume that the reason is due to the fact that the
hardware doesn't compute it so you cannot handle a fragment on the end
which is resolved already via GSO_PARTIAL.
- Alex
^ permalink raw reply
* Re: [PATCH v2 net] soreuseport: Fix TCP listener hash collision
From: Craig Gallek @ 2016-04-28 22:03 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev
On Thu, Apr 28, 2016 at 5:59 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2016-04-28 at 17:07 -0400, Craig Gallek wrote:
>> From: Craig Gallek <kraig@google.com>
>>
>> I forgot to include a check for listener port equality when deciding
>> if two sockets should belong to the same reuseport group. This was
>> not caught previously because it's only necessary when two listening
>> sockets for the same user happen to hash to the same listener bucket.
>> This change also includes a check for network namespace equality.
>> The same error does not exist in the UDP path.
>>
>> Fixes: c125e80b8868("soreuseport: fast reuseport TCP socket selection")
>> Signed-off-by: Craig Gallek <kraig@google.com>
>> ---
>> v2 Changes
>> - Suggestions from Eric Dumazet to include network namespace equality
>> check and to avoid a dreference by simply checking inet_bind_bucket
>> pointer equality.
>> ---
>> net/ipv4/inet_hashtables.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
>> index bc68eced0105..5c5658268d5e 100644
>> --- a/net/ipv4/inet_hashtables.c
>> +++ b/net/ipv4/inet_hashtables.c
>> @@ -470,15 +470,19 @@ static int inet_reuseport_add_sock(struct sock *sk,
>> const struct sock *sk2,
>> bool match_wildcard))
>> {
>> + struct inet_bind_bucket *tb = inet_csk(sk)->icsk_bind_hash;
>> + struct net *net = sock_net(sk);
>> struct sock *sk2;
>> struct hlist_nulls_node *node;
>> kuid_t uid = sock_i_uid(sk);
>>
>> sk_nulls_for_each_rcu(sk2, node, &ilb->head) {
>> - if (sk2 != sk &&
>> + if (net_eq(sock_net(sk2), net) &&
>> + sk2 != sk &&
>> sk2->sk_family == sk->sk_family &&
>> ipv6_only_sock(sk2) == ipv6_only_sock(sk) &&
>> sk2->sk_bound_dev_if == sk->sk_bound_dev_if &&
>> + inet_csk(sk2)->icsk_bind_hash == tb &&
>> sk2->sk_reuseport && uid_eq(uid, sock_i_uid(sk2)) &&
>> saddr_same(sk, sk2, false))
>> return reuseport_add_sock(sk, sk2);
>
> Note that I suggested to only use "inet_csk(sk2)->icsk_bind_hash == tb"
>
> If test is true, it means that sockets share same name space and same
> port ;)
>
> Therefore the added net_eq(sock_net(sk2), net) test is redundant.
Thanks for the quick review Eric, sorry I miss read :\ I'll send a v3...
> No strong opinion, as this patch works, and this is not fast path
> anyway.
>
> Acked-by: Eric Dumazet <edumazet@google.com>
>
>
^ permalink raw reply
* Re: [PATCH net-next] tcp: give prequeue mode some care
From: Eric Dumazet @ 2016-04-28 22:00 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20160428.171514.1303373912379094235.davem@davemloft.net>
On Thu, 2016-04-28 at 17:15 -0400, David Miller wrote:
> There was a conflict due to the stats macro renaming, but that was trivial
> to resolve so I did it.
>
> Applied, thanks Eric.
Ah great, I was preparing a V2, you were fast David.
Thanks
^ permalink raw reply
* Re: [PATCH v2 net] soreuseport: Fix TCP listener hash collision
From: Eric Dumazet @ 2016-04-28 21:59 UTC (permalink / raw)
To: Craig Gallek; +Cc: davem, netdev
In-Reply-To: <1461877661-21282-1-git-send-email-kraigatgoog@gmail.com>
On Thu, 2016-04-28 at 17:07 -0400, Craig Gallek wrote:
> From: Craig Gallek <kraig@google.com>
>
> I forgot to include a check for listener port equality when deciding
> if two sockets should belong to the same reuseport group. This was
> not caught previously because it's only necessary when two listening
> sockets for the same user happen to hash to the same listener bucket.
> This change also includes a check for network namespace equality.
> The same error does not exist in the UDP path.
>
> Fixes: c125e80b8868("soreuseport: fast reuseport TCP socket selection")
> Signed-off-by: Craig Gallek <kraig@google.com>
> ---
> v2 Changes
> - Suggestions from Eric Dumazet to include network namespace equality
> check and to avoid a dreference by simply checking inet_bind_bucket
> pointer equality.
> ---
> net/ipv4/inet_hashtables.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
> index bc68eced0105..5c5658268d5e 100644
> --- a/net/ipv4/inet_hashtables.c
> +++ b/net/ipv4/inet_hashtables.c
> @@ -470,15 +470,19 @@ static int inet_reuseport_add_sock(struct sock *sk,
> const struct sock *sk2,
> bool match_wildcard))
> {
> + struct inet_bind_bucket *tb = inet_csk(sk)->icsk_bind_hash;
> + struct net *net = sock_net(sk);
> struct sock *sk2;
> struct hlist_nulls_node *node;
> kuid_t uid = sock_i_uid(sk);
>
> sk_nulls_for_each_rcu(sk2, node, &ilb->head) {
> - if (sk2 != sk &&
> + if (net_eq(sock_net(sk2), net) &&
> + sk2 != sk &&
> sk2->sk_family == sk->sk_family &&
> ipv6_only_sock(sk2) == ipv6_only_sock(sk) &&
> sk2->sk_bound_dev_if == sk->sk_bound_dev_if &&
> + inet_csk(sk2)->icsk_bind_hash == tb &&
> sk2->sk_reuseport && uid_eq(uid, sock_i_uid(sk2)) &&
> saddr_same(sk, sk2, false))
> return reuseport_add_sock(sk, sk2);
Note that I suggested to only use "inet_csk(sk2)->icsk_bind_hash == tb"
If test is true, it means that sockets share same name space and same
port ;)
Therefore the added net_eq(sock_net(sk2), net) test is redundant.
No strong opinion, as this patch works, and this is not fast path
anyway.
Acked-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply
* [PATCH net-next] of: of_mdio: Check if MDIO bus controller is available
From: Florian Fainelli @ 2016-04-28 21:55 UTC (permalink / raw)
To: netdev-u79uwXL29TY76Z2rM5mHXA
Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, andrew-g2DYL2Zd6BY,
nathan.sullivan-acOepvfBmUk, nicolas.ferre-AIFe0yeh4nAAvxtiuMwx3w,
Florian Fainelli, Rob Herring, Frank Rowand, Grant Likely,
open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE, open list
Add a check whether the 'struct device_node' pointer passed to
of_mdiobus_register() is an available (aka enabled) node in the Device
Tree.
Rationale for doing this are cases where an Ethernet MAC provides a MDIO
bus controller and node, and an additional Ethernet MAC might be
connecting its PHY/switches to that first MDIO bus controller, while
still embedding one internally which is therefore marked as "disabled".
Instead of sprinkling checks like these in callers of
of_mdiobus_register(), do this in a central location.
Signed-off-by: Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
drivers/of/of_mdio.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
index b622b33dbf93..2f497790be1b 100644
--- a/drivers/of/of_mdio.c
+++ b/drivers/of/of_mdio.c
@@ -209,6 +209,10 @@ int of_mdiobus_register(struct mii_bus *mdio, struct device_node *np)
bool scanphys = false;
int addr, rc;
+ /* Do not continue if the node is disabled */
+ if (!of_device_is_available(np))
+ return -EINVAL;
+
/* Mask out all PHYs from auto probing. Instead the PHYs listed in
* the device tree are populated after the bus has been registered */
mdio->phy_mask = ~0;
--
2.1.0
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* Re: [PATCH] net: davinci_mdio: Set of_node in the mdio bus
From: David Miller @ 2016-04-28 21:48 UTC (permalink / raw)
To: Linux.HWI
Cc: linux-kernel, netdev, Grygorii.Strashko, jay.schroeder,
ben.mccauley
In-Reply-To: <57228318.90600@garmin.com>
From: "J.D. Schroeder" <Linux.HWI@garmin.com>
Date: Thu, 28 Apr 2016 16:39:36 -0500
> On 04/28/2016 02:44 PM, David Miller wrote:
>>> --- a/drivers/net/ethernet/ti/davinci_mdio.c
>>> +++ b/drivers/net/ethernet/ti/davinci_mdio.c
>>> @@ -343,6 +343,7 @@ static int davinci_mdio_probe(struct platform_device *pdev)
>>> if (davinci_mdio_probe_dt(&data->pdata, pdev))
>>> data->pdata = default_pdata;
>>> snprintf(data->bus->id, MII_BUS_ID_SIZE, "%s", pdev->name);
>>> + data->bus->dev.of_node = dev->of_node;
>>> } else {
>>> data->pdata = pdata ? (*pdata) : default_pdata;
>>> snprintf(data->bus->id, MII_BUS_ID_SIZE, "%s-%x",
>>
>> You can't do this.
>>
>> First of all, of_node objects are reference counted. So even if this was a
>> legal thing to do you would have to drop the reference to the existing of_node
>> pointer and gain a reference to dev->of_node.
>>
>> But even more importantly, it is the job of the bus driver to set that
>> bus->dev.of_node correctly, you should never override it in a driver like
>> this.
>
> David, thanks for your review. I understand your point about the reference count.
>
> One thing to note is that it is always null for the davinci mdio bus when
> going through this path. I'm not trying to override it. I'm trying to make
> sure it has a way to find the davinci mdio bus. Do you see the problem I'm
> trying to solve?
>
> Is there another way to be able to make the of_mdio_find_bus() call be able to
> find the davinci mdio bus?
You should reference the device which actually has an OF node attached with it,
rather than pretending that the MDIO bus does.
Don't fake this stuff, reference the proper device to get the OF node.
Thanks.
^ permalink raw reply
* Re: [PATCH] net: davinci_mdio: Set of_node in the mdio bus
From: J.D. Schroeder @ 2016-04-28 21:39 UTC (permalink / raw)
To: David Miller
Cc: linux-kernel, netdev, Grygorii.Strashko, jay.schroeder,
ben.mccauley
In-Reply-To: <20160428.154410.1087934312951322476.davem@davemloft.net>
On 04/28/2016 02:44 PM, David Miller wrote:
>> --- a/drivers/net/ethernet/ti/davinci_mdio.c
>> +++ b/drivers/net/ethernet/ti/davinci_mdio.c
>> @@ -343,6 +343,7 @@ static int davinci_mdio_probe(struct platform_device *pdev)
>> if (davinci_mdio_probe_dt(&data->pdata, pdev))
>> data->pdata = default_pdata;
>> snprintf(data->bus->id, MII_BUS_ID_SIZE, "%s", pdev->name);
>> + data->bus->dev.of_node = dev->of_node;
>> } else {
>> data->pdata = pdata ? (*pdata) : default_pdata;
>> snprintf(data->bus->id, MII_BUS_ID_SIZE, "%s-%x",
>
> You can't do this.
>
> First of all, of_node objects are reference counted. So even if this was a
> legal thing to do you would have to drop the reference to the existing of_node
> pointer and gain a reference to dev->of_node.
>
> But even more importantly, it is the job of the bus driver to set that
> bus->dev.of_node correctly, you should never override it in a driver like
> this.
David, thanks for your review. I understand your point about the reference count.
One thing to note is that it is always null for the davinci mdio bus when
going through this path. I'm not trying to override it. I'm trying to make
sure it has a way to find the davinci mdio bus. Do you see the problem I'm
trying to solve?
Is there another way to be able to make the of_mdio_find_bus() call be able to
find the davinci mdio bus?
Thanks,
JD
^ permalink raw reply
* Re: [PATCH net 0/3] bpf: fix several bugs
From: David Miller @ 2016-04-28 21:30 UTC (permalink / raw)
To: ast; +Cc: daniel, jannh, torvalds, netdev, kernel-team
In-Reply-To: <1461808582-1452466-1-git-send-email-ast@fb.com>
From: Alexei Starovoitov <ast@fb.com>
Date: Wed, 27 Apr 2016 18:56:19 -0700
> First two patches address bugs found by Jann Horn.
> Last patch is a minor samples fix spotted during the testing.
Series applied and queued up for -stable, thanks.
^ permalink raw reply
* Re: [PATCH net v3 0/5] drivers: net: cpsw: phy-handle fixes
From: David Miller @ 2016-04-28 21:27 UTC (permalink / raw)
To: drivshin.allworx
Cc: netdev, linux-omap, linux-arm-kernel, devicetree, linux-kernel,
mugunthanvnm, grygorii.strashko, andrew.goodbody,
systemprogrammierung.brunner, kwizart
In-Reply-To: <1461805808-4102-1-git-send-email-drivshin.allworx@gmail.com>
From: "David Rivshin (Allworx)" <drivshin.allworx@gmail.com>
Date: Wed, 27 Apr 2016 21:10:03 -0400
> This series fixes a number of related issues around using phy-handle
> properties in cpsw emac nodes.
Series applied, thanks.
^ permalink raw reply
* Re: [PATCH next v2] ipvlan: Fix failure path in dev registration during link creation
From: David Miller @ 2016-04-28 21:23 UTC (permalink / raw)
To: mahesh; +Cc: maheshb, edumazet, netdev, ebiederm
In-Reply-To: <1461794367-39825-1-git-send-email-mahesh@bandewar.net>
From: Mahesh Bandewar <mahesh@bandewar.net>
Date: Wed, 27 Apr 2016 14:59:27 -0700
> From: Mahesh Bandewar <maheshb@google.com>
>
> When newlink creation fails at device-registration, the port->count
> is decremented twice. Francesco Ruggeri (fruggeri@arista.com) found
> this issue in Macvlan and the same exists in IPvlan driver too.
>
> While fixing this issue I noticed another issue of missing unregister
> in case of failure, so adding it to the fix which is similar to the
> macvlan fix by Francesco in commit 308379607548 ("macvlan: fix failure
> during registration v3")
>
> Reported-by: Francesco Ruggeri <fruggeri@arista.com>
> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH v2] net: macb: do not scan PHYs manually
From: Andrew Lunn @ 2016-04-28 21:23 UTC (permalink / raw)
To: Josh Cartwright
Cc: Nathan Sullivan, Nicolas Ferre, netdev, linux-kernel,
Florian Fainelli, Alexandre Belloni
In-Reply-To: <20160428210357.GB30217@jcartwri.amer.corp.natinst.com>
On Thu, Apr 28, 2016 at 04:03:57PM -0500, Josh Cartwright wrote:
> On Thu, Apr 28, 2016 at 08:59:32PM +0200, Andrew Lunn wrote:
> > On Thu, Apr 28, 2016 at 01:55:27PM -0500, Nathan Sullivan wrote:
> > > On Thu, Apr 28, 2016 at 08:43:03PM +0200, Andrew Lunn wrote:
> > > > > I agree that is a valid fix for AT91, however it won't solve our problem, since
> > > > > we have no children on the second ethernet MAC in our devices' device trees. I'm
> > > > > starting to feel like our second MAC shouldn't even really register the MDIO bus
> > > > > since it isn't being used - maybe adding a DT property to not have a bus is a
> > > > > better option?
> > > >
> > > > status = "disabled"
> > > >
> > > > would be the unusual way.
> > > >
> > > > Andrew
> > >
> > > Oh, sorry, I meant we use both MACs on Zynq, however the PHYs are on the MDIO
> > > bus of the first MAC. So, the second MAC is used for ethernet but not for MDIO,
> > > and so it does not have any PHYs under its DT node. It would be nice if there
> > > were a way to tell macb not to bother with MDIO for the second MAC, since that's
> > > handled by the first MAC.
> >
> > Yes, exactly, add support for status = "disabled" in the mdio node.
>
> Unfortunately, the 'macb' doesn't have a "mdio node", or alternatively:
> the node representing the mdio bus is the same node which represents the
> macb instance itself. Setting 'status = "disabled"' on this node will
> just prevent the probing of the macb instance.
:-(
It is very common to have an mdio node within the MAC node, for example imx6sx-sdb.dtsi
&fec1 {
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_enet1>;
phy-supply = <®_enet_3v3>;
phy-mode = "rgmii";
phy-handle = <ðphy1>;
status = "okay";
mdio {
#address-cells = <1>;
#size-cells = <0>;
ethphy1: ethernet-phy@1 {
reg = <1>;
};
ethphy2: ethernet-phy@2 {
reg = <2>;
};
};
};
&fec2 {
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_enet2>;
phy-mode = "rgmii";
phy-handle = <ðphy2>;
status = "okay";
};
This even has the two phys on one bus, as you described...
Andrew
^ permalink raw reply
* Re: [PATCH net-next #2 1/1] pch_gbe: replace private tx ring lock with common netif_tx_lock
From: David Miller @ 2016-04-28 21:21 UTC (permalink / raw)
To: romieu; +Cc: netdev, nikolay, dvhart, andy.cress, bryan
In-Reply-To: <20160427212944.GB10269@electric-eye.fr.zoreil.com>
From: Francois Romieu <romieu@fr.zoreil.com>
Date: Wed, 27 Apr 2016 23:29:44 +0200
> pch_gbe_tx_ring.tx_lock is only used in the hard_xmit handler and
> in the transmit completion reaper called from NAPI context.
>
> Compile-tested only. Potential victims Cced.
>
> Someone more knowledgeable may check if pch_gbe_tx_queue could
> have some use for a mmiowb.
>
> Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
> Cc: Darren Hart <dvhart@infradead.org>
> Cc: Andy Cress <andy.cress@us.kontron.com>
> Cc: bryan@fossetcon.org
>
> ---
> Includes Nikolay's fix.
Applied, thank you.
^ permalink raw reply
* Re: [PATCH v3 2/2] net: Add Qualcomm IPC router
From: David Miller @ 2016-04-28 21:19 UTC (permalink / raw)
To: bjorn.andersson
Cc: andy.gross, linux-kernel, netdev, linux-arm-msm, courtney.cavin,
bjorn.andersson
In-Reply-To: <1461784383-2978-2-git-send-email-bjorn.andersson@linaro.org>
From: Bjorn Andersson <bjorn.andersson@linaro.org>
Date: Wed, 27 Apr 2016 12:13:03 -0700
> From: Courtney Cavin <courtney.cavin@sonymobile.com>
>
> Add an implementation of Qualcomm's IPC router protocol, used to
> communicate with service providing remote processors.
>
> Signed-off-by: Courtney Cavin <courtney.cavin@sonymobile.com>
> Signed-off-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
> [bjorn: Cope with 0 being a valid node id and implement RTM_NEWADDR]
> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> ---
>
> Changes since v2:
> - Altered Kconfig dependency for QRTR_SMD to be compile testable
>
> Changes since v1:
> - Made node 0 (normally the Qualcomm modem) a valid node
> - Implemented RTM_NEWADDR for specifying the local node id
Please adjust this so that CONFIG_QRTR can be modular.
^ permalink raw reply
* Re: [PATCH net-next 0/6] net: make TCP preemptible
From: Marcelo Ricardo Leitner @ 2016-04-28 21:18 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David S . Miller, netdev, Eric Dumazet
In-Reply-To: <1461821152-23200-1-git-send-email-edumazet@google.com>
On Wed, Apr 27, 2016 at 10:25:46PM -0700, Eric Dumazet wrote:
> Most of TCP stack assumed it was running from BH handler.
>
> This is great for most things, as TCP behavior is very sensitive
> to scheduling artifacts.
>
> However, the prequeue and backlog processing are problematic,
> as they need to be flushed with BH being blocked.
>
> To cope with modern needs, TCP sockets have big sk_rcvbuf values,
> in the order of 16 MB.
> This means that backlog can hold thousands of packets, and things
> like TCP coalescing or collapsing on this amount of packets can
> lead to insane latency spikes, since BH are blocked for too long.
And due to that, it may potentially lead to packet drops on NIC ring
buffers. Great, thanks Eric.
> It is time to make UDP/TCP stacks preemptible.
>
> Note that fast path still runs from BH handler.
>
> Eric Dumazet (6):
> tcp: do not assume TCP code is non preemptible
> tcp: do not block bh during prequeue processing
> dccp: do not assume DCCP code is non preemptible
> udp: prepare for non BH masking at backlog processing
> sctp: prepare for socket backlog behavior change
> net: do not block BH while processing socket backlog
>
> net/core/sock.c | 22 +++------
> net/dccp/input.c | 2 +-
> net/dccp/ipv4.c | 4 +-
> net/dccp/ipv6.c | 4 +-
> net/dccp/options.c | 2 +-
> net/ipv4/tcp.c | 6 +--
> net/ipv4/tcp_cdg.c | 20 ++++----
> net/ipv4/tcp_cubic.c | 20 ++++----
> net/ipv4/tcp_fastopen.c | 12 ++---
> net/ipv4/tcp_input.c | 126 +++++++++++++++++++----------------------------
> net/ipv4/tcp_ipv4.c | 14 ++++--
> net/ipv4/tcp_minisocks.c | 2 +-
> net/ipv4/tcp_output.c | 7 ++-
> net/ipv4/tcp_recovery.c | 4 +-
> net/ipv4/tcp_timer.c | 10 ++--
> net/ipv4/udp.c | 4 +-
> net/ipv6/tcp_ipv6.c | 12 ++---
> net/ipv6/udp.c | 4 +-
> net/sctp/inqueue.c | 2 +
> 19 files changed, 124 insertions(+), 153 deletions(-)
>
> --
> 2.8.0.rc3.226.g39d4020
>
^ permalink raw reply
* Re: [PATCH net-next] net: dsa: Provide CPU port statistics to master netdev
From: David Miller @ 2016-04-28 21:16 UTC (permalink / raw)
To: f.fainelli; +Cc: netdev, andrew, vivien.didelot
In-Reply-To: <1461782714-13471-1-git-send-email-f.fainelli@gmail.com>
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Wed, 27 Apr 2016 11:45:14 -0700
> This patch overloads the DSA master netdev, aka CPU Ethernet MAC to also
> include switch-side statistics, which is useful for debugging purposes,
> when the switch is not properly connected to the Ethernet MAC (duplex
> mismatch, (RG)MII electrical issues etc.).
>
> We accomplish this by retaining the original copy of the master netdev's
> ethtool_ops, and just overload the 3 operations we care about:
> get_sset_count, get_strings and get_ethtool_stats so as to intercept
> these calls and call into the original master_netdev ethtool_ops, plus
> our own.
>
> We take this approach as opposed to providing a set of DSA helper
> functions that would retrive the CPU port's statistics, because the
> entire purpose of DSA is to allow unmodified Ethernet MAC drivers to be
> used as CPU conduit interfaces, therefore, statistics overlay in such
> drivers would simply not scale.
>
> The new ethtool -S <iface> output would therefore look like this now:
> <iface> statistics
> p<2 digits cpu port number>_<switch MIB counter names>
>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Applied, thanks Florian.
^ permalink raw reply
* Re: [PATCH net-next] tcp: give prequeue mode some care
From: David Miller @ 2016-04-28 21:15 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev
In-Reply-To: <1461777145.5535.77.camel@edumazet-glaptop3.roam.corp.google.com>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 27 Apr 2016 10:12:25 -0700
> From: Eric Dumazet <edumazet@google.com>
>
> TCP prequeue goal is to defer processing of incoming packets
> to user space thread currently blocked in a recvmsg() system call.
>
> Intent is to spend less time processing these packets on behalf
> of softirq handler, as softirq handler is unfair to normal process
> scheduler decisions, as it might interrupt threads that do not
> even use networking.
>
> Current prequeue implementation has following issues :
>
> 1) It only checks size of the prequeue against sk_rcvbuf
>
> It was fine 15 years ago when sk_rcvbuf was in the 64KB vicinity.
> But we now have ~8MB values to cope with modern networking needs.
> We have to add sk_rmem_alloc in the equation, since out of order
> packets can definitely use up to sk_rcvbuf memory themselves.
>
> 2) Even with a fixed memory truesize check, prequeue can be filled
> by thousands of packets. When prequeue needs to be flushed, either
> from sofirq context (in tcp_prequeue() or timer code), or process
> context (in tcp_prequeue_process()), this adds a latency spike
> which is often not desirable.
> I added a fixed limit of 32 packets, as this translated to a max
> flush time of 60 us on my test hosts.
>
> Also note that all packets in prequeue are not accounted for tcp_mem,
> since they are not charged against sk_forward_alloc at this point.
> This is probably not a big deal.
>
> Note that this might increase LINUX_MIB_TCPPREQUEUEDROPPED counts,
> which is misnamed, as packets are not dropped at all, but rather pushed
> to the stack (where they can be either consumed or dropped)
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
There was a conflict due to the stats macro renaming, but that was trivial
to resolve so I did it.
Applied, thanks Eric.
^ permalink raw reply
* Re: [PATCH] MAINTAINERS: net: Change maintainer for GRETH 10/100/1G Ethernet MAC device driver
From: David Miller @ 2016-04-28 21:12 UTC (permalink / raw)
To: andreas; +Cc: netdev, linux-kernel, sam, software
In-Reply-To: <1461768370-3011-1-git-send-email-andreas@gaisler.com>
From: Andreas Larsson <andreas@gaisler.com>
Date: Wed, 27 Apr 2016 16:46:10 +0200
> Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Applied.
^ permalink raw reply
* Re: [PATCH v2 1/2] net: nps_enet: Sync access to packet sent flag
From: David Miller @ 2016-04-28 21:11 UTC (permalink / raw)
To: eladkan; +Cc: noamca, linux-kernel, abrodkin, talz, netdev
In-Reply-To: <1461763110-15263-2-git-send-email-eladkan@mellanox.com>
From: Elad Kanfi <eladkan@mellanox.com>
Date: Wed, 27 Apr 2016 16:18:29 +0300
> From: Elad Kanfi <eladkan@mellanox.com>
>
> Below is a description of a possible problematic
> sequence. CPU-A is sending a frame and CPU-B handles
> the interrupt that indicates the frame was sent. CPU-B
> reads an invalid value of tx_packet_sent.
>
> CPU-A CPU-B
> ----- -----
> nps_enet_send_frame
> .
> .
> tx_packet_sent = true
> order HW to start tx
> .
> .
> HW complete tx
> ------> get tx complete interrupt
> .
> .
> if(tx_packet_sent == true)
>
> end memory transaction
> (tx_packet_sent actually
> written)
>
> Problem solution:
>
> Add a memory barrier after setting tx_packet_sent,
> in order to make sure that it is written before
> the packet is sent.
>
> Signed-off-by: Elad Kanfi <eladkan@mellanox.com>
>
> Acked-by: Noam Camus <noamca@mellanox.com>
Please address the feedback about memory barrier pairing.
Also, for both patches, do not put empty lines between the various
tags at the end of the commit message. They should all be together
in one continuous group.
^ permalink raw reply
* Re: [PATCH net] gre: reject GUE and FOU in collect metadata mode
From: David Miller @ 2016-04-28 21:10 UTC (permalink / raw)
To: jbenc; +Cc: netdev, pshelar, tgraf, tom
In-Reply-To: <7280f5aa7ec83653cdf7f0484fb9e696bc8c89e8.1461758705.git.jbenc@redhat.com>
From: Jiri Benc <jbenc@redhat.com>
Date: Wed, 27 Apr 2016 14:08:01 +0200
> The collect metadata mode does not support GUE nor FOU. This might be
> implemented later; until then, we should reject such config.
>
> I think this is okay to be changed. It's unlikely anyone has such
> configuration (as it doesn't work anyway) and we may need a way to
> distinguish whether it's supported or not by the kernel later.
>
> For backwards compatibility with iproute2, it's not possible to just check
> the attribute presence (iproute2 always includes the attribute), the actual
> value has to be checked, too.
>
> Fixes: 2e15ea390e6f4 ("ip_gre: Add support to collect tunnel metadata.")
> Signed-off-by: Jiri Benc <jbenc@redhat.com>
> ---
> Discovered this only after I already sent v3 of the previous gre set.
> Submitting this patch on its own, it's an indepent fix anyway (though fixing
> the same commit).
Applied, thank you.
^ permalink raw reply
* Re: [PATCH v3 0/2] pegasus: correct buffer & packet sizes
From: David Miller @ 2016-04-28 21:09 UTC (permalink / raw)
To: petkan; +Cc: netdev, a1291762, johannes
In-Reply-To: <1461756290-27421-1-git-send-email-petkan@mip-labs.com>
From: Petko Manolov <petkan@mip-labs.com>
Date: Wed, 27 Apr 2016 14:24:48 +0300
> As noticed by Lincoln Ramsay <a1291762@gmail.com> some old (usb 1.1) Pegasus
> based devices may actually return more bytes than the specified in the datasheet
> amount. That would not be a problem if the allocated space for the SKB was
> equal to the parameter passed to usb_fill_bulk_urb(). Some poor bugger (i
> really hope it was not me, but 'git blame' is useless in this case, so anyway)
> decided to add '+ 8' to the buffer length parameter. Sometimes the usb transfer
> overflows and corrupts the socket structure, leading to kernel panic.
>
> The above doesn't seem to happen for newer (Pegasus2 based) devices which did
> help this bug to hide for so long.
>
> The new default is to not include the CRC at the end of each received package.
> So far CRC has been ignored which makes no sense to do it in a first place.
>
> The patch is against v4.6-rc5 and was tested on ADM8515 device by transferring
> multiple gigabytes of data over a couple of days without any complaints from the
> kernel. Please apply it to whatever net tree you deem fit.
>
> Changes since v1:
>
> - split the patch in two parts;
> - corrected the subject lines;
>
> Changes since v2:
>
> - do not append CRC by default (based on a discussion with Johannes Berg);
Series applied, thanks.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox