* Re: stmmac ethernet in kernel 4.4: coalescing related pauses?
From: David Miller @ 2016-11-28 16:30 UTC (permalink / raw)
To: lsanfil; +Cc: eric.dumazet, pavel, peppe.cavallaro, netdev, linux-kernel
In-Reply-To: <920b1148-a77d-0b69-01b0-b14bcbdf8648@marvell.com>
From: Lino Sanfilippo <lsanfil@marvell.com>
Date: Mon, 28 Nov 2016 16:57:35 +0100
> I wonder if the best fix would be indeed to deactivate irq coalescing
> completely.
> Does it make any sense at all to use it if a driver uses NAPI already?
It absolutely does make sense, when it is implemented and functions
properly.
^ permalink raw reply
* Re: [RFC PATCH 1/2] net: macb: Add MDIO driver for accessing multiple PHY devices
From: Andrew Lunn @ 2016-11-28 16:33 UTC (permalink / raw)
To: Harini Katakam
Cc: nicolas.ferre-AIFe0yeh4nAAvxtiuMwx3w,
davem-fT/PcQaiUtIeIZ0/mPfg9Q, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
pawel.moll-5wv7dgnIgG8, mark.rutland-5wv7dgnIgG8,
ijc+devicetree-KcIKpvwj1kUDXYZnReoRVg,
galak-sgV2jX0FEOL9JmXXK+q4OQ,
boris.brezillon-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8,
alexandre.belloni-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8,
harinikatakamlinux-Re5JQEeQqe8AvxtiuMwx3w,
netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
devicetree-u79uwXL29TY76Z2rM5mHXA, harinik-gjFFaj9aHVfQT0dZR+AlfA,
michals-gjFFaj9aHVfQT0dZR+AlfA, Punnaiah Choudary Kalluri
In-Reply-To: <1480326554-6041-1-git-send-email-harinik-gjFFaj9aHVfQT0dZR+AlfA@public.gmane.org>
On Mon, Nov 28, 2016 at 03:19:14PM +0530, Harini Katakam wrote:
> This patch is to add support for the hardware with multiple ethernet
> MAC controllers and a single MDIO bus connected to multiple PHY devices.
> MDIO lines are connected to any one of the ethernet MAC controllers and
> all the PHY devices will be accessed using the PHY maintenance interface
> in that MAC controller. This handling along with PHY functionality is
> moved to macb_mdio.c
>
> Signed-off-by: Punnaiah Choudary Kalluri <punnaia-gjFFaj9aHVfQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Harini Katakam <harinik-gjFFaj9aHVfQT0dZR+AlfA@public.gmane.org>
> ---
> drivers/net/ethernet/cadence/Makefile | 2 +-
> drivers/net/ethernet/cadence/macb.c | 169 +++-----------------
> drivers/net/ethernet/cadence/macb.h | 2 +
> drivers/net/ethernet/cadence/macb_mdio.c | 266 +++++++++++++++++++++++++++++++
> 4 files changed, 294 insertions(+), 145 deletions(-)
> create mode 100644 drivers/net/ethernet/cadence/macb_mdio.c
>
> diff --git a/drivers/net/ethernet/cadence/Makefile b/drivers/net/ethernet/cadence/Makefile
> index 91f79b1..75c3d84 100644
> --- a/drivers/net/ethernet/cadence/Makefile
> +++ b/drivers/net/ethernet/cadence/Makefile
> @@ -2,4 +2,4 @@
> # Makefile for the Atmel network device drivers.
> #
>
> -obj-$(CONFIG_MACB) += macb.o
> +obj-$(CONFIG_MACB) += macb.o macb_mdio.o
> diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
> index 80ccfc4..ae2a797 100644
> --- a/drivers/net/ethernet/cadence/macb.c
> +++ b/drivers/net/ethernet/cadence/macb.c
> @@ -232,45 +232,6 @@ static void macb_get_hwaddr(struct macb *bp)
> eth_hw_addr_random(bp->dev);
> }
>
> -static int macb_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
> -{
> - struct macb *bp = bus->priv;
> - int value;
> -
> - macb_writel(bp, MAN, (MACB_BF(SOF, MACB_MAN_SOF)
> - | MACB_BF(RW, MACB_MAN_READ)
> - | MACB_BF(PHYA, mii_id)
> - | MACB_BF(REGA, regnum)
> - | MACB_BF(CODE, MACB_MAN_CODE)));
> -
> - /* wait for end of transfer */
> - while (!MACB_BFEXT(IDLE, macb_readl(bp, NSR)))
> - cpu_relax();
> -
> - value = MACB_BFEXT(DATA, macb_readl(bp, MAN));
> -
> - return value;
> -}
> -
> -static int macb_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
> - u16 value)
> -{
> - struct macb *bp = bus->priv;
> -
> - macb_writel(bp, MAN, (MACB_BF(SOF, MACB_MAN_SOF)
> - | MACB_BF(RW, MACB_MAN_WRITE)
> - | MACB_BF(PHYA, mii_id)
> - | MACB_BF(REGA, regnum)
> - | MACB_BF(CODE, MACB_MAN_CODE)
> - | MACB_BF(DATA, value)));
> -
> - /* wait for end of transfer */
> - while (!MACB_BFEXT(IDLE, macb_readl(bp, NSR)))
> - cpu_relax();
> -
> - return 0;
> -}
> -
> /**
> * macb_set_tx_clk() - Set a clock to a new frequency
> * @clk Pointer to the clock to change
> @@ -385,27 +346,19 @@ static void macb_handle_link_change(struct net_device *dev)
> static int macb_mii_probe(struct net_device *dev)
> {
> struct macb *bp = netdev_priv(dev);
> - struct macb_platform_data *pdata;
> struct phy_device *phydev;
> - int phy_irq;
> int ret;
>
> - phydev = phy_find_first(bp->mii_bus);
> + if (dev->phydev)
> + return 0;
> +
> + phydev = of_phy_find_device(bp->phy_node);
> if (!phydev) {
> netdev_err(dev, "no PHY found\n");
> return -ENXIO;
> }
> -
> - pdata = dev_get_platdata(&bp->pdev->dev);
> - if (pdata && gpio_is_valid(pdata->phy_irq_pin)) {
> - ret = devm_gpio_request(&bp->pdev->dev, pdata->phy_irq_pin,
> - "phy int");
> - if (!ret) {
> - phy_irq = gpio_to_irq(pdata->phy_irq_pin);
> - phydev->irq = (phy_irq < 0) ? PHY_POLL : phy_irq;
> - }
> - }
> -
> + if (bp->phy_irq)
> + phydev->irq = bp->phy_irq;
> /* attach the mac to the phy */
> ret = phy_connect_direct(dev, phydev, &macb_handle_link_change,
> bp->phy_interface);
> @@ -429,80 +382,9 @@ static int macb_mii_probe(struct net_device *dev)
> bp->speed = 0;
> bp->duplex = -1;
>
> - return 0;
> -}
> -
> -static int macb_mii_init(struct macb *bp)
> -{
> - struct macb_platform_data *pdata;
> - struct device_node *np;
> - int err = -ENXIO, i;
> -
> - /* Enable management port */
> - macb_writel(bp, NCR, MACB_BIT(MPE));
> -
> - bp->mii_bus = mdiobus_alloc();
> - if (!bp->mii_bus) {
> - err = -ENOMEM;
> - goto err_out;
> - }
> -
> - bp->mii_bus->name = "MACB_mii_bus";
> - bp->mii_bus->read = &macb_mdio_read;
> - bp->mii_bus->write = &macb_mdio_write;
> - snprintf(bp->mii_bus->id, MII_BUS_ID_SIZE, "%s-%x",
> - bp->pdev->name, bp->pdev->id);
> - bp->mii_bus->priv = bp;
> - bp->mii_bus->parent = &bp->pdev->dev;
> - pdata = dev_get_platdata(&bp->pdev->dev);
> -
> - dev_set_drvdata(&bp->dev->dev, bp->mii_bus);
> -
> - np = bp->pdev->dev.of_node;
> - if (np) {
> - /* try dt phy registration */
> - err = of_mdiobus_register(bp->mii_bus, np);
> -
> - /* fallback to standard phy registration if no phy were
> - * found during dt phy registration
> - */
> - if (!err && !phy_find_first(bp->mii_bus)) {
> - for (i = 0; i < PHY_MAX_ADDR; i++) {
> - struct phy_device *phydev;
> -
> - phydev = mdiobus_scan(bp->mii_bus, i);
> - if (IS_ERR(phydev) &&
> - PTR_ERR(phydev) != -ENODEV) {
> - err = PTR_ERR(phydev);
> - break;
> - }
> - }
> -
> - if (err)
> - goto err_out_unregister_bus;
> - }
> - } else {
> - if (pdata)
> - bp->mii_bus->phy_mask = pdata->phy_mask;
> -
> - err = mdiobus_register(bp->mii_bus);
> - }
> -
> - if (err)
> - goto err_out_free_mdiobus;
> -
> - err = macb_mii_probe(bp->dev);
> - if (err)
> - goto err_out_unregister_bus;
> + phy_attached_info(phydev);
>
> return 0;
> -
> -err_out_unregister_bus:
> - mdiobus_unregister(bp->mii_bus);
> -err_out_free_mdiobus:
> - mdiobus_free(bp->mii_bus);
> -err_out:
> - return err;
> }
>
> static void macb_update_stats(struct macb *bp)
> @@ -2060,7 +1942,8 @@ static int macb_open(struct net_device *dev)
> netif_carrier_off(dev);
>
> /* if the phy is not yet register, retry later*/
> - if (!dev->phydev)
> + err = macb_mii_probe(dev);
> + if (err)
> return -EAGAIN;
>
> /* RX buffers initialization */
> @@ -3122,16 +3005,16 @@ static int macb_probe(struct platform_device *pdev)
> unsigned int queue_mask, num_queues;
> struct macb_platform_data *pdata;
> bool native_io;
> - struct phy_device *phydev;
> struct net_device *dev;
> struct resource *regs;
> void __iomem *mem;
> const char *mac;
> struct macb *bp;
> + int phy_irq;
> int err;
>
> regs = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> - mem = devm_ioremap_resource(&pdev->dev, regs);
> + mem = devm_ioremap(&pdev->dev, regs->start, resource_size(regs));
> if (IS_ERR(mem))
> return PTR_ERR(mem);
>
> @@ -3250,21 +3133,26 @@ static int macb_probe(struct platform_device *pdev)
> if (err)
> goto err_out_free_netdev;
>
> - err = macb_mii_init(bp);
> - if (err)
> - goto err_out_free_netdev;
> -
> - phydev = dev->phydev;
> -
> - netif_carrier_off(dev);
> -
> err = register_netdev(dev);
> if (err) {
> dev_err(&pdev->dev, "Cannot register net device, aborting.\n");
> goto err_out_unregister_mdio;
> }
>
> - phy_attached_info(phydev);
> + bp->phy_node = of_parse_phandle(bp->pdev->dev.of_node,
> + "phy-handle", 0);
> +
> + pdata = dev_get_platdata(&bp->pdev->dev);
> + if (pdata && gpio_is_valid(pdata->phy_irq_pin)) {
> + err = devm_gpio_request(&bp->pdev->dev, pdata->phy_irq_pin,
> + "phy int");
> + if (!err) {
> + phy_irq = gpio_to_irq(pdata->phy_irq_pin);
> + bp->phy_irq = (phy_irq < 0) ? PHY_POLL : phy_irq;
> + }
> + }
> +
> + netif_carrier_off(dev);
>
> netdev_info(dev, "Cadence %s rev 0x%08x at 0x%08lx irq %d (%pM)\n",
> macb_is_gem(bp) ? "GEM" : "MACB", macb_readl(bp, MID),
> @@ -3273,10 +3161,6 @@ static int macb_probe(struct platform_device *pdev)
> return 0;
>
> err_out_unregister_mdio:
> - phy_disconnect(dev->phydev);
> - mdiobus_unregister(bp->mii_bus);
> - mdiobus_free(bp->mii_bus);
> -
> /* Shutdown the PHY if there is a GPIO reset */
> if (bp->reset_gpio)
> gpiod_set_value(bp->reset_gpio, 0);
> @@ -3304,9 +3188,6 @@ static int macb_remove(struct platform_device *pdev)
> bp = netdev_priv(dev);
> if (dev->phydev)
> phy_disconnect(dev->phydev);
> - mdiobus_unregister(bp->mii_bus);
> - dev->phydev = NULL;
> - mdiobus_free(bp->mii_bus);
>
> /* Shutdown the PHY if there is a GPIO reset */
> if (bp->reset_gpio)
> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
> index d67adad..15e5c0f 100644
> --- a/drivers/net/ethernet/cadence/macb.h
> +++ b/drivers/net/ethernet/cadence/macb.h
> @@ -874,6 +874,8 @@ struct macb {
> unsigned int jumbo_max_len;
>
> u32 wol;
> + struct device_node *phy_node;
> + int phy_irq;
> };
>
> static inline bool macb_is_gem(struct macb *bp)
> diff --git a/drivers/net/ethernet/cadence/macb_mdio.c b/drivers/net/ethernet/cadence/macb_mdio.c
> new file mode 100644
> index 0000000..1277ca3
> --- /dev/null
> +++ b/drivers/net/ethernet/cadence/macb_mdio.c
> @@ -0,0 +1,266 @@
> +/*
> + * Cadence Macb mdio controller driver
> + *
> + * Copyright (C) 2014 - 2016 Xilinx, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it under
> + * the terms of the GNU General Public License version 2 as published by the
> + * Free Software Foundation; either version 2 of the License, or (at your
> + * option) any later version.
> + */
> +#include <linux/clk.h>
> +#include <linux/delay.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/netdevice.h>
> +#include <linux/of_address.h>
> +#include <linux/of_mdio.h>
> +#include <linux/io.h>
> +#include <linux/phy.h>
> +#include <linux/platform_device.h>
> +#include <linux/ptp_clock_kernel.h>
> +#include "macb.h"
> +
> +struct macb_mdio_data {
> + void __iomem *regs;
> +
> + struct clk *pclk;
> + struct clk *hclk;
> +};
> +
> +#define macb_mdio_reg_writel(bp, offset, value) \
> + writel_relaxed(value, bp->regs + offset)
> +#define macb_mdio_writel(bp, reg, value) \
> + macb_mdio_reg_writel(bp, MACB_##reg, value)
> +
> +#define macb_mdio_reg_readl(bp, offset) readl_relaxed(bp->regs + offset)
> +#define macb_mdio_readl(bp, reg) macb_mdio_reg_readl(bp, MACB_##reg)
> +
> +#define MACB_MDIO_TIMEOUT 1000
> +
> +static int macb_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
> +{
> + struct macb_mdio_data *bp = bus->priv;
> + unsigned int timeout = MACB_MDIO_TIMEOUT;
> + int value;
> +
> + macb_mdio_writel(bp, MAN, (MACB_BF(SOF, MACB_MAN_SOF) |
> + MACB_BF(RW, MACB_MAN_READ) |
> + MACB_BF(PHYA, mii_id) |
> + MACB_BF(REGA, regnum) |
> + MACB_BF(CODE, MACB_MAN_CODE)));
> +
> + /* wait for end of transfer */
> + while (!MACB_BFEXT(IDLE, macb_mdio_readl(bp, NSR)) && timeout) {
> + cpu_relax();
> + timeout--;
> + }
> +
> + if (!timeout)
> + return -ETIMEDOUT;
> +
> + value = MACB_BFEXT(DATA, macb_mdio_readl(bp, MAN));
> +
> + return value;
> +}
> +
> +static int macb_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
> + u16 value)
> +{
> + struct macb_mdio_data *bp = bus->priv;
> + unsigned int timeout = MACB_MDIO_TIMEOUT;
> +
> + macb_mdio_writel(bp, MAN, (MACB_BF(SOF, MACB_MAN_SOF) |
> + MACB_BF(RW, MACB_MAN_WRITE) |
> + MACB_BF(PHYA, mii_id) |
> + MACB_BF(REGA, regnum) |
> + MACB_BF(CODE, MACB_MAN_CODE) |
> + MACB_BF(DATA, value)));
> +
> + /* wait for end of transfer */
> + while (!MACB_BFEXT(IDLE, macb_mdio_readl(bp, NSR)) && timeout) {
> + cpu_relax();
> + timeout--;
> + }
> +
> + if (!timeout)
> + return -ETIMEDOUT;
> +
> + return 0;
> +}
> +
> +static u32 gem_mdc_clk_div(struct macb_mdio_data *bp)
> +{
> + u32 config;
> + unsigned long pclk_hz = clk_get_rate(bp->pclk);
> +
> + if (pclk_hz <= 20000000)
> + config = GEM_BF(CLK, GEM_CLK_DIV8);
> + else if (pclk_hz <= 40000000)
> + config = GEM_BF(CLK, GEM_CLK_DIV16);
> + else if (pclk_hz <= 80000000)
> + config = GEM_BF(CLK, GEM_CLK_DIV32);
> + else if (pclk_hz <= 120000000)
> + config = GEM_BF(CLK, GEM_CLK_DIV48);
> + else if (pclk_hz <= 160000000)
> + config = GEM_BF(CLK, GEM_CLK_DIV64);
> + else
> + config = GEM_BF(CLK, GEM_CLK_DIV96);
> +
> + return config;
> +}
> +
> +static int macb_mdio_probe(struct platform_device *pdev)
> +{
> + struct device_node *np = pdev->dev.of_node;
> + struct mii_bus *bus;
> + struct macb_mdio_data *bp;
> + struct resource *res;
> + int ret;
> + u32 config, i;
> +
> + bus = mdiobus_alloc_size(sizeof(*bp));
> + if (!bus)
> + return -ENOMEM;
> +
> + bus->name = "macb_mii_bus";
> + bus->read = &macb_mdio_read;
> + bus->write = &macb_mdio_write;
> + snprintf(bus->id, MII_BUS_ID_SIZE, "%s-mii", dev_name(&pdev->dev));
> + bus->parent = &pdev->dev;
> + bus->irq = devm_kzalloc(&pdev->dev, sizeof(int) * PHY_MAX_ADDR,
> + GFP_KERNEL);
This looks wrong, or at least old. It used to be a pointer to an array,
but it is now an actual array.
> +static const struct of_device_id macb_mdio_dt_ids[] = {
> + { .compatible = "cdns,macb-mdio" },
> +
> +};
I've not looked hard enough to know, but can you keep backwards
compatibility? Won't old device tree's assume the mdio bus is always
present? Now you need an explicit node otherwise there will not be an
mdio bus?
Andrew
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH] net/iucv: use explicit clean up labels in iucv_init()
From: Thomas Gleixner @ 2016-11-28 16:31 UTC (permalink / raw)
To: David Miller; +Cc: bigeasy, ubraun, linux-kernel, rt, linux-s390, netdev
In-Reply-To: <20161128.112433.295947046706527102.davem@davemloft.net>
On Mon, 28 Nov 2016, David Miller wrote:
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Date: Thu, 24 Nov 2016 17:10:13 +0100
>
> > Ursula suggested to use explicit labels for clean up in the error path
> > instead of one `out_free' label which handles multiple exits.
> > Since the previous patch got already applied, here is a follow up patch.
> >
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> "Previous patch" doesn't tell readers anything specific enough to identify
> the change you are referring to. This will be even more true years down
> the line if someone tries to read this commit and figure out what you
> are referring to.
>
> We have a standard mechanism to refer to commits, via SHA1_ID and commit
> header line text, please use it.
I amended the commit message.
Thanks,
tglx
^ permalink raw reply
* Re: [PATCH 1/1] GSO: Reload iph after pskb_may_pull
From: Alexander Duyck @ 2016-11-28 16:34 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: David Miller, Eric Dumazet, Alexander Duyck, Andrey Konovalov,
Linux Networking Development Mailing List
In-Reply-To: <20161128153658.GB4778@kernel.org>
On Mon, Nov 28, 2016 at 7:36 AM, Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
> As it may get stale and lead to use after free.
>
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Alexander Duyck <aduyck@mirantis.com>
> Cc: Andrey Konovalov <andreyknvl@google.com>
> Fixes: cbc53e08a793 ("GSO: Add GSO type for fixed IPv4 ID")
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> ---
> net/ipv4/af_inet.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 5ddf5cda07f4..215143246e4b 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1233,7 +1233,7 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb,
> fixedid = !!(skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID);
>
> /* fixed ID is invalid if DF bit is not set */
> - if (fixedid && !(iph->frag_off & htons(IP_DF)))
> + if (fixedid && !(ip_hdr(skb)->frag_off & htons(IP_DF)))
> goto out;
> }
>
> --
> 2.9.3
>
Looks good to me.
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
^ permalink raw reply
* Re: [PATCH net-next] virtio-net: enable multiqueue by default
From: John Fastabend @ 2016-11-28 16:38 UTC (permalink / raw)
To: David Miller, mst
Cc: nhorman, jeder, hannes, netdev, linux-kernel, virtualization,
myllynen, maxime.coquelin
In-Reply-To: <20161128.112823.2263657283777032168.davem@davemloft.net>
On 16-11-28 08:28 AM, David Miller wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
> Date: Fri, 25 Nov 2016 06:43:08 +0200
>
>> On Fri, Nov 25, 2016 at 12:37:26PM +0800, Jason Wang wrote:
>>> We use single queue even if multiqueue is enabled and let admin to
>>> enable it through ethtool later. This is used to avoid possible
>>> regression (small packet TCP stream transmission). But looks like an
>>> overkill since:
>>>
>>> - single queue user can disable multiqueue when launching qemu
>>> - brings extra troubles for the management since it needs extra admin
>>> tool in guest to enable multiqueue
>>> - multiqueue performs much better than single queue in most of the
>>> cases
>>>
>>> So this patch enables multiqueue by default: if #queues is less than or
>>> equal to #vcpu, enable as much as queue pairs; if #queues is greater
>>> than #vcpu, enable #vcpu queue pairs.
>>>
>>> Cc: Hannes Frederic Sowa <hannes@redhat.com>
>>> Cc: Michael S. Tsirkin <mst@redhat.com>
>>> Cc: Neil Horman <nhorman@redhat.com>
>>> Cc: Jeremy Eder <jeder@redhat.com>
>>> Cc: Marko Myllynen <myllynen@redhat.com>
>>> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>
>> OK at some level but all uses of num_online_cpus()
>> like this are racy versus hotplug.
>> I know we already have this bug but shouldn't we fix it
>> before we add more?
>
> This is more being used like a heuristic in this scenerio, and in
> fact I would say one would keep the code this way even once proper
> hotplug handlers are installed to adjust the queued dynamically if
> there is a desired (which is also not necessarily the case).
>
> I really don't think this change should be held on up on this issue.
> So can we please make some forward progress here?
>
> Thanks.
>
Also it might be worth noting all the other multiqueue capable
ethernet devices I checked use some variation of this heuristic.
Typically related to what other features are enables RSS, DCB, etc.
So it at least should be familiar to folks who do hotplug cpus on
bare metal boxes.
.John
^ permalink raw reply
* Re: [PATCH] net: stmmac: enable tx queue 0 for gmac4 IPs synthesized with multiple TX queues
From: David Miller @ 2016-11-28 16:29 UTC (permalink / raw)
To: niklas.cassel
Cc: peppe.cavallaro, alexandre.torgue, niklass, netdev, linux-kernel
In-Reply-To: <1479998194-7113-1-git-send-email-niklass@axis.com>
From: Niklas Cassel <niklas.cassel@axis.com>
Date: Thu, 24 Nov 2016 15:36:33 +0100
> From: Niklas Cassel <niklas.cassel@axis.com>
>
> The dwmac4 IP can synthesized with 1-8 number of tx queues.
> On an IP synthesized with DWC_EQOS_NUM_TXQ > 1, all txqueues are disabled
> by default. For these IPs, the bitfield TXQEN is R/W.
>
> Always enable tx queue 0. The write will have no effect on IPs synthesized
> with DWC_EQOS_NUM_TXQ == 1.
>
> The driver does still not utilize more than one tx queue in the IP.
>
> Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Alexandre, we are still waiting for your implicit/explicit ACK on this
change.
Thank you.
^ permalink raw reply
* Re: [PATCH net] sit: Set skb->protocol properly in ipip6_tunnel_xmit()
From: David Miller @ 2016-11-28 16:47 UTC (permalink / raw)
To: sfr; +Cc: elicooper, netdev
In-Reply-To: <20161127130400.4a69ff1b@canb.auug.org.au>
From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Sun, 27 Nov 2016 13:04:00 +1100
> [Just for Dave's information]
>
> On Fri, 25 Nov 2016 13:50:17 +0800 Eli Cooper <elicooper@gmx.com> wrote:
>>
>> Similar to commit ae148b085876
>> ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"),
>> sit tunnels also need to update skb->protocol; otherwise, TSO/GSO packets
>> might not be properly segmented, which causes the packets being dropped.
>>
>> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
>> Tested-by: Eli Cooper <elicooper@gmx.com>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Eli Cooper <elicooper@gmx.com>
>
> I tested this patch and it does *not* solve my problem.
I'm torn on this patch, because it looked exactly like it would solve the
kind of problem Stephen is running into.
Even though it doesn't fix his case, it seems correct to me.
I was wondering if it was also important to set the skb->protocol
before the call to ip_tunnel_encap() but I couldn't find a dependency.
In any event I'd like to see some other people review this change
before I apply it.
My only other guess for Stephen's problem is somehow the SKB headers
aren't set up properly for what the GSO engine expects.
^ permalink raw reply
* Re: [PATCH net-next] virtio-net: enable multiqueue by default
From: Michael S. Tsirkin @ 2016-11-28 16:52 UTC (permalink / raw)
To: Jason Wang
Cc: Neil Horman, Jeremy Eder, Hannes Frederic Sowa, netdev,
linux-kernel, virtualization, Marko Myllynen, Maxime Coquelin
In-Reply-To: <1480048646-17536-1-git-send-email-jasowang@redhat.com>
On Fri, Nov 25, 2016 at 12:37:26PM +0800, Jason Wang wrote:
> We use single queue even if multiqueue is enabled and let admin to
> enable it through ethtool later. This is used to avoid possible
> regression (small packet TCP stream transmission). But looks like an
> overkill since:
>
> - single queue user can disable multiqueue when launching qemu
> - brings extra troubles for the management since it needs extra admin
> tool in guest to enable multiqueue
> - multiqueue performs much better than single queue in most of the
> cases
>
> So this patch enables multiqueue by default: if #queues is less than or
> equal to #vcpu, enable as much as queue pairs; if #queues is greater
> than #vcpu, enable #vcpu queue pairs.
>
> Cc: Hannes Frederic Sowa <hannes@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Neil Horman <nhorman@redhat.com>
> Cc: Jeremy Eder <jeder@redhat.com>
> Cc: Marko Myllynen <myllynen@redhat.com>
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
OK I stil htink we should handle cpu hotplug better
but this can be done separately.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> drivers/net/virtio_net.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index d4ac7a6..a21d93a 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1886,8 +1886,11 @@ static int virtnet_probe(struct virtio_device *vdev)
> if (vi->any_header_sg)
> dev->needed_headroom = vi->hdr_len;
>
> - /* Use single tx/rx queue pair as default */
> - vi->curr_queue_pairs = 1;
> + /* Enable multiqueue by default */
> + if (num_online_cpus() >= max_queue_pairs)
> + vi->curr_queue_pairs = max_queue_pairs;
> + else
> + vi->curr_queue_pairs = num_online_cpus();
> vi->max_queue_pairs = max_queue_pairs;
>
> /* Allocate/initialize the rx/tx queues, and invoke find_vqs */
> @@ -1918,6 +1921,8 @@ static int virtnet_probe(struct virtio_device *vdev)
> goto free_unregister_netdev;
> }
>
> + virtnet_set_affinity(vi);
> +
> /* Assume link up if device can't report link status,
> otherwise get link status from config. */
> if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
> --
> 2.7.4
^ permalink raw reply
* Re: [PATCH net] sit: Set skb->protocol properly in ipip6_tunnel_xmit()
From: Eric Dumazet @ 2016-11-28 16:53 UTC (permalink / raw)
To: David Miller, Alexander Duyck; +Cc: sfr, elicooper, netdev
In-Reply-To: <20161128.114703.154301432767947066.davem@davemloft.net>
On Mon, 2016-11-28 at 11:47 -0500, David Miller wrote:
> From: Stephen Rothwell <sfr@canb.auug.org.au>
> Date: Sun, 27 Nov 2016 13:04:00 +1100
>
> > [Just for Dave's information]
> >
> > On Fri, 25 Nov 2016 13:50:17 +0800 Eli Cooper <elicooper@gmx.com> wrote:
> >>
> >> Similar to commit ae148b085876
> >> ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"),
> >> sit tunnels also need to update skb->protocol; otherwise, TSO/GSO packets
> >> might not be properly segmented, which causes the packets being dropped.
> >>
> >> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> >> Tested-by: Eli Cooper <elicooper@gmx.com>
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: Eli Cooper <elicooper@gmx.com>
> >
> > I tested this patch and it does *not* solve my problem.
>
> I'm torn on this patch, because it looked exactly like it would solve the
> kind of problem Stephen is running into.
>
> Even though it doesn't fix his case, it seems correct to me.
>
> I was wondering if it was also important to set the skb->protocol
> before the call to ip_tunnel_encap() but I couldn't find a dependency.
>
> In any event I'd like to see some other people review this change
> before I apply it.
>
> My only other guess for Stephen's problem is somehow the SKB headers
> aren't set up properly for what the GSO engine expects.
Well, mlx4 just works, and uses GSO engine just fine.
So my guess is this is a bug in Intel IGB driver.
Alexander, can you take a look ?
Features for eth0:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: on
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]
hw-tc-offload: off [fixed]
^ permalink raw reply
* Re: AF_VSOCK network namespace support
From: Jorgen S. Hansen @ 2016-11-28 15:24 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: netdev@vger.kernel.org, imbrenda@linux.vnet.ibm.com
In-Reply-To: <20161123145535.GA16465@stefanha-x1.localdomain>
Hi Stefan,
> On Nov 23, 2016, at 3:55 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> Hi Jorgen,
> There are two use cases where network namespace support in AF_VSOCK
> could be useful:
>
> 1. Claudio Imbrenda pointed out that a machine cannot act as both host
> and guest at the same time. This is necessary for nested
> virtualization. Currently only one transport (the host side or the
> guest side) can be registered at a time.
VMCI based AF_VSOCK relies on the VMCI driver for nested virtualization support. The VMCI driver is a combined host/guest driver with a routing component, that will either direct traffic to VMs managed by the host “personality” of the driver, or to the outer host. So any VMCI driver driver is able to function simultaneously as both a guest and a host driver - exactly to be able to support nested virtualization.
Since, for VMCI based vSocket, the host has a fixed CID (2), any traffic generated by an application inside a VM destined for CID 2 will be routed out of the VM (to the host - either a virtual or physical one). Any traffic for a CID > 2 will be directed towards VMs managed by the host personality of the VMCI driver.
Since VMCI predates nested virtualization, the solution above was partly a result of having to support existing configurations in a transparent way.
> 2. Users may wish to isolate the AF_VSOCK address namespace so that two
> VMs have completely independent CID and ports (they could even use
> the same CID and ports because they're in separate namespaces). This
> ensures that a host service visible to VM1 is not automatically
> visible to VM2.
If the goal is to provide fine grained service access control, won’t this end up requiring a namespace per VM? For ESX, we have a mechanism to tag VMs that allows them to be granted access to a service offered through AF_VSOCK, but this is not part of the Linux hypervisor.
If the intent is to be able to support multi tenancy, then this sounds like a better fit. Also, in the multi tenancy case, isolating the other AFs is probably what you want as well.
> Network namespaces could solve both problems.
>
> A drawback of namespaces is that existing configurations using network
> namespaces for IPv4/6 or other purposes break if AF_VSOCK gains network
> namespace support. This is not a big problem for virtio-vsock if we
> implement namespace support soon since there are no existing users.
>
> I wonder how other address families have solved this transition to
> network namespaces. It's almost like we need fine-grained namespaces
> instead of a blanket network namespace that applies across all address
> families...
>
> I'm playing around with the code now but wanted to get your thoughts in
> case you've already considered these problems.
>
> Stefan
Thanks,
Jørgen
^ permalink raw reply
* Re: [PATCH net-next 0/2] Fix support for the MV88E6097
From: David Miller @ 2016-11-28 16:59 UTC (permalink / raw)
To: eichest; +Cc: andrew, vivien.didelot, netdev, stefan.eichenberger
In-Reply-To: <20161125084130.3210-1-stefan.eichenberger@netmodule.com>
From: Stefan Eichenberger <eichest@gmail.com>
Date: Fri, 25 Nov 2016 09:41:28 +0100
> This patchset fixes the following two issues for the MV88E6097:
> - Add missing definition of g1_irqs
> - Add missing comment
Series applied, thanks Stefan.
^ permalink raw reply
* Re: [PATCH net-next 1/5] net: mvneta: Use cacheable memory to store the rx buffer virtual address
From: Gregory CLEMENT @ 2016-11-28 17:00 UTC (permalink / raw)
To: Jisheng Zhang
Cc: David S. Miller, linux-kernel, netdev, Arnd Bergmann,
Jason Cooper, Andrew Lunn, Sebastian Hesselbarth,
Thomas Petazzoni, linux-arm-kernel, Nadav Haklai, Marcin Wojtas,
Dmitri Epshtein, Yelena Krivosheev
In-Reply-To: <20161128163548.70181560@xhacker>
Hi Jisheng,
On lun., nov. 28 2016, Jisheng Zhang <jszhang@marvell.com> wrote:
> Hi Gregory,
>
> On Fri, 25 Nov 2016 16:30:14 +0100 Gregory CLEMENT wrote:
>
>> Until now the virtual address of the received buffer were stored in the
>> cookie field of the rx descriptor. However, this field is 32-bits only
>> which prevents to use the driver on a 64-bits architecture.
>>
>> With this patch the virtual address is stored in an array not shared with
>> the hardware (no more need to use the DMA API). Thanks to this, it is
>> possible to use cache contrary to the access of the rx descriptor member.
>>
>> The change is done in the swbm path only because the hwbm uses the cookie
>> field, this also means that currently the hwbm is not usable in 64-bits.
>>
>> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
>> ---
>> drivers/net/ethernet/marvell/mvneta.c | 96 ++++++++++++++++++++++++----
>> 1 file changed, 84 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
>> index 87274d4ab102..b6849f88cab7 100644
>> --- a/drivers/net/ethernet/marvell/mvneta.c
>> +++ b/drivers/net/ethernet/marvell/mvneta.c
>> @@ -561,6 +561,9 @@ struct mvneta_rx_queue {
>> u32 pkts_coal;
>> u32 time_coal;
>>
>> + /* Virtual address of the RX buffer */
>> + void **buf_virt_addr;
>
> can we store buf_phys_addr in cacheable memory as well?
Even if we store in in cacheable memory we will still need to store it
in the buffer descriptor as it is used by the hardware.
>
>> +
>> /* Virtual address of the RX DMA descriptors array */
>> struct mvneta_rx_desc *descs;
>>
>> @@ -1573,10 +1576,14 @@ static void mvneta_tx_done_pkts_coal_set(struct mvneta_port *pp,
>>
>> /* Handle rx descriptor fill by setting buf_cookie and buf_phys_addr */
>> static void mvneta_rx_desc_fill(struct mvneta_rx_desc *rx_desc,
>> - u32 phys_addr, u32 cookie)
>> + u32 phys_addr, void *virt_addr,
>> + struct mvneta_rx_queue *rxq)
>> {
>> - rx_desc->buf_cookie = cookie;
>> + int i;
>> +
>> rx_desc->buf_phys_addr = phys_addr;
>> + i = rx_desc - rxq->descs;
>> + rxq->buf_virt_addr[i] = virt_addr;
>> }
>>
>> /* Decrement sent descriptors counter */
>> @@ -1781,7 +1788,8 @@ EXPORT_SYMBOL_GPL(mvneta_frag_free);
>>
>> /* Refill processing for SW buffer management */
>> static int mvneta_rx_refill(struct mvneta_port *pp,
>> - struct mvneta_rx_desc *rx_desc)
>> + struct mvneta_rx_desc *rx_desc,
>> + struct mvneta_rx_queue *rxq)
>>
>> {
>> dma_addr_t phys_addr;
>> @@ -1799,7 +1807,7 @@ static int mvneta_rx_refill(struct mvneta_port *pp,
>> return -ENOMEM;
>> }
>>
>> - mvneta_rx_desc_fill(rx_desc, phys_addr, (u32)data);
>> + mvneta_rx_desc_fill(rx_desc, phys_addr, data, rxq);
>> return 0;
>> }
>>
>> @@ -1861,7 +1869,12 @@ static void mvneta_rxq_drop_pkts(struct mvneta_port *pp,
>>
>> for (i = 0; i < rxq->size; i++) {
>> struct mvneta_rx_desc *rx_desc = rxq->descs + i;
>> - void *data = (void *)rx_desc->buf_cookie;
>> + void *data;
>> +
>> + if (!pp->bm_priv)
>> + data = rxq->buf_virt_addr[i];
>> + else
>> + data = (void *)(uintptr_t)rx_desc->buf_cookie;
>>
>> dma_unmap_single(pp->dev->dev.parent, rx_desc->buf_phys_addr,
>> MVNETA_RX_BUF_SIZE(pp->pkt_size), DMA_FROM_DEVICE);
>> @@ -1894,12 +1907,13 @@ static int mvneta_rx_swbm(struct mvneta_port *pp, int rx_todo,
>> unsigned char *data;
>> dma_addr_t phys_addr;
>> u32 rx_status, frag_size;
>> - int rx_bytes, err;
>> + int rx_bytes, err, index;
>>
>> rx_done++;
>> rx_status = rx_desc->status;
>> rx_bytes = rx_desc->data_size - (ETH_FCS_LEN + MVNETA_MH_SIZE);
>> - data = (unsigned char *)rx_desc->buf_cookie;
>> + index = rx_desc - rxq->descs;
>> + data = (unsigned char *)rxq->buf_virt_addr[index];
>> phys_addr = rx_desc->buf_phys_addr;
>>
>> if (!mvneta_rxq_desc_is_first_last(rx_status) ||
>> @@ -1938,7 +1952,7 @@ static int mvneta_rx_swbm(struct mvneta_port *pp, int rx_todo,
>> }
>>
>> /* Refill processing */
>> - err = mvneta_rx_refill(pp, rx_desc);
>> + err = mvneta_rx_refill(pp, rx_desc, rxq);
>> if (err) {
>> netdev_err(dev, "Linux processing - Can't refill\n");
>> rxq->missed++;
>> @@ -2020,7 +2034,7 @@ static int mvneta_rx_hwbm(struct mvneta_port *pp, int rx_todo,
>> rx_done++;
>> rx_status = rx_desc->status;
>> rx_bytes = rx_desc->data_size - (ETH_FCS_LEN + MVNETA_MH_SIZE);
>> - data = (unsigned char *)rx_desc->buf_cookie;
>> + data = (u8 *)(uintptr_t)rx_desc->buf_cookie;
>> phys_addr = rx_desc->buf_phys_addr;
>> pool_id = MVNETA_RX_GET_BM_POOL_ID(rx_desc);
>> bm_pool = &pp->bm_priv->bm_pools[pool_id];
>> @@ -2708,6 +2722,57 @@ static int mvneta_poll(struct napi_struct *napi, int budget)
>> return rx_done;
>> }
>>
>> +/* Refill processing for HW buffer management */
>> +static int mvneta_rx_hwbm_refill(struct mvneta_port *pp,
>> + struct mvneta_rx_desc *rx_desc)
>> +
>> +{
>> + dma_addr_t phys_addr;
>> + void *data;
>> +
>> + data = mvneta_frag_alloc(pp->frag_size);
>> + if (!data)
>> + return -ENOMEM;
>> +
>> + phys_addr = dma_map_single(pp->dev->dev.parent, data,
>> + MVNETA_RX_BUF_SIZE(pp->pkt_size),
>> + DMA_FROM_DEVICE);
>> + if (unlikely(dma_mapping_error(pp->dev->dev.parent, phys_addr))) {
>> + mvneta_frag_free(pp->frag_size, data);
>> + return -ENOMEM;
>> + }
>> +
>> + phys_addr += pp->rx_offset_correction;
>> + rx_desc->buf_phys_addr = phys_addr;
>> + rx_desc->buf_cookie = (uintptr_t)data;
>> +
>> + return 0;
>> +}
>> +
>> +/* Handle rxq fill: allocates rxq skbs; called when initializing a port */
>> +static int mvneta_rxq_bm_fill(struct mvneta_port *pp,
>> + struct mvneta_rx_queue *rxq,
>> + int num)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < num; i++) {
>> + memset(rxq->descs + i, 0, sizeof(struct mvneta_rx_desc));
>> + if (mvneta_rx_hwbm_refill(pp, rxq->descs + i) != 0) {
>> + netdev_err(pp->dev, "%s:rxq %d, %d of %d buffs filled\n",
>> + __func__, rxq->id, i, num);
>> + break;
>> + }
>> + }
>> +
>> + /* Add this number of RX descriptors as non occupied (ready to
>> + * get packets)
>> + */
>> + mvneta_rxq_non_occup_desc_add(pp, rxq, i);
>> +
>> + return i;
>> +}
>> +
>> /* Handle rxq fill: allocates rxq skbs; called when initializing a port */
>> static int mvneta_rxq_fill(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
>> int num)
>> @@ -2716,7 +2781,7 @@ static int mvneta_rxq_fill(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
>>
>> for (i = 0; i < num; i++) {
>> memset(rxq->descs + i, 0, sizeof(struct mvneta_rx_desc));
>> - if (mvneta_rx_refill(pp, rxq->descs + i) != 0) {
>> + if (mvneta_rx_refill(pp, rxq->descs + i, rxq) != 0) {
>> netdev_err(pp->dev, "%s:rxq %d, %d of %d buffs filled\n",
>> __func__, rxq->id, i, num);
>> break;
>> @@ -2784,14 +2849,21 @@ static int mvneta_rxq_init(struct mvneta_port *pp,
>> mvneta_rxq_buf_size_set(pp, rxq,
>> MVNETA_RX_BUF_SIZE(pp->pkt_size));
>> mvneta_rxq_bm_disable(pp, rxq);
>> +
>> + rxq->buf_virt_addr = devm_kmalloc(pp->dev->dev.parent,
>> + rxq->size * sizeof(void *),
>> + GFP_KERNEL);
>
> I would suggest allocate this buffer during probe. Otherwise, there's
> memory leak if we either change the mtu or close then open the eth in
> a loop, e.g
>
> while true
> do
> ifconfig eth0 up
> ifconfig eth0 down
> done
Indeed, I will move it.
Thanks,
Gregory
>
> Thanks,
> Jisheng
>
>> + if (!rxq->buf_virt_addr)
>> + return -ENOMEM;
>> +
>> + mvneta_rxq_fill(pp, rxq, rxq->size);
>> } else {
>> mvneta_rxq_bm_enable(pp, rxq);
>> mvneta_rxq_long_pool_set(pp, rxq);
>> mvneta_rxq_short_pool_set(pp, rxq);
>> + mvneta_rxq_bm_fill(pp, rxq, rxq->size);
>> }
>>
>> - mvneta_rxq_fill(pp, rxq, rxq->size);
>> -
>> return 0;
>> }
>>
>
--
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com
^ permalink raw reply
* Re: stmmac ethernet in kernel 4.4: coalescing related pauses?
From: Lino Sanfilippo @ 2016-11-28 17:01 UTC (permalink / raw)
To: David Miller; +Cc: eric.dumazet, pavel, peppe.cavallaro, netdev, linux-kernel
In-Reply-To: <20161128.113031.964579744326063048.davem@davemloft.net>
On 28.11.2016 17:30, David Miller wrote:
> From: Lino Sanfilippo <lsanfil@marvell.com>
> Date: Mon, 28 Nov 2016 16:57:35 +0100
>
>> I wonder if the best fix would be indeed to deactivate irq coalescing
>> completely.
>> Does it make any sense at all to use it if a driver uses NAPI already?
>
> It absolutely does make sense, when it is implemented and functions
> properly.
>
Interesting. I always thought both (NAPI and irq coalescing) are essentially doing the same thing only
one time in software and one time with hw support. Did I misunderstand NAPI?
Regards,
Lino
^ permalink raw reply
* [patch net-next 1/4] mlxsw: resources: Add maximum buffer size
From: Jiri Pirko @ 2016-11-28 17:01 UTC (permalink / raw)
To: netdev; +Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz
In-Reply-To: <1480352486-18518-1-git-send-email-jiri@resnulli.us>
From: Ido Schimmel <idosch@mellanox.com>
We need to be able to limit the size of shared buffer pools, so query
the maximum size from the device during init.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
drivers/net/ethernet/mellanox/mlxsw/resources.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/resources.h b/drivers/net/ethernet/mellanox/mlxsw/resources.h
index 1c2119b..3c2171d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/resources.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/resources.h
@@ -47,6 +47,7 @@ enum mlxsw_res_id {
MLXSW_RES_ID_MAX_SYSTEM_PORT,
MLXSW_RES_ID_MAX_LAG,
MLXSW_RES_ID_MAX_LAG_MEMBERS,
+ MLXSW_RES_ID_MAX_BUFFER_SIZE,
MLXSW_RES_ID_MAX_CPU_POLICERS,
MLXSW_RES_ID_MAX_VRS,
MLXSW_RES_ID_MAX_RIFS,
@@ -70,6 +71,7 @@ static u16 mlxsw_res_ids[] = {
[MLXSW_RES_ID_MAX_SYSTEM_PORT] = 0x2502,
[MLXSW_RES_ID_MAX_LAG] = 0x2520,
[MLXSW_RES_ID_MAX_LAG_MEMBERS] = 0x2521,
+ [MLXSW_RES_ID_MAX_BUFFER_SIZE] = 0x2802, /* Bytes */
[MLXSW_RES_ID_MAX_CPU_POLICERS] = 0x2A13,
[MLXSW_RES_ID_MAX_VRS] = 0x2C01,
[MLXSW_RES_ID_MAX_RIFS] = 0x2C02,
--
2.7.4
^ permalink raw reply related
* [patch net-next 2/4] mlxsw: spectrum_buffers: Limit size of pools
From: Jiri Pirko @ 2016-11-28 17:01 UTC (permalink / raw)
To: netdev; +Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz
In-Reply-To: <1480352486-18518-1-git-send-email-jiri@resnulli.us>
From: Ido Schimmel <idosch@mellanox.com>
The shared buffer pools are containers whose size is used to calculate
the maximum usage for packets from / to a specific port / {port, PG/TC},
when dynamic threshold is employed.
While it's perfectly fine for the sum of the pools to exceed the maximum
size of the shared buffer, a single pool cannot.
Add a check when the pool size is set and forbid sizes larger than the
maximum size of the shared buffer.
Without the patch:
$ devlink sb pool set pci/0000:03:00.0 pool 0 size 999999999 thtype
dynamic
// No error is returned
With the patch:
$ devlink sb pool set pci/0000:03:00.0 pool 0 size 999999999 thtype
dynamic
devlink answers: Invalid argument
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
index bcaed8a..a746826 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
@@ -611,6 +611,9 @@ int mlxsw_sp_sb_pool_set(struct mlxsw_core *mlxsw_core,
u32 pool_size = MLXSW_SP_BYTES_TO_CELLS(size);
enum mlxsw_reg_sbpr_mode mode;
+ if (size > MLXSW_CORE_RES_GET(mlxsw_sp->core, MAX_BUFFER_SIZE))
+ return -EINVAL;
+
mode = (enum mlxsw_reg_sbpr_mode) threshold_type;
return mlxsw_sp_sb_pr_write(mlxsw_sp, pool, dir, mode, pool_size);
}
--
2.7.4
^ permalink raw reply related
* [patch net-next 3/4] mlxsw: core: Add missing rollback in error path
From: Jiri Pirko @ 2016-11-28 17:01 UTC (permalink / raw)
To: netdev; +Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz
In-Reply-To: <1480352486-18518-1-git-send-email-jiri@resnulli.us>
From: Ido Schimmel <idosch@mellanox.com>
Without this rollback, the thermal zone is still registered during the
error path, whereas its private data is freed upon the destruction of
the underlying bus device due to the use of devm_kzalloc(). This results
in use after free.
Fix this by calling mlxsw_thermal_fini() from the appropriate place in
the error path.
Fixes: a50c1e35650b ("mlxsw: core: Implement thermal zone")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
drivers/net/ethernet/mellanox/mlxsw/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index b21f88c..7a0ad39 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1157,6 +1157,7 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
if (mlxsw_core->driver->fini)
mlxsw_core->driver->fini(mlxsw_core);
err_driver_init:
+ mlxsw_thermal_fini(mlxsw_core->thermal);
err_thermal_init:
err_hwmon_init:
devlink_unregister(devlink);
--
2.7.4
^ permalink raw reply related
* [patch net-next 4/4] mlxsw: core: Change order of operations in removal path
From: Jiri Pirko @ 2016-11-28 17:01 UTC (permalink / raw)
To: netdev; +Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz
In-Reply-To: <1480352486-18518-1-git-send-email-jiri@resnulli.us>
From: Ido Schimmel <idosch@mellanox.com>
We call bus->init() before allocating 'lag.mapping'. Change the order of
operations in removal path to reflect that.
This makes the error path of mlxsw_core_bus_device_register() symmetric
with mlxsw_core_bus_device_unregister().
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
drivers/net/ethernet/mellanox/mlxsw/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 7a0ad39..4dc028b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1188,8 +1188,8 @@ void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core)
mlxsw_thermal_fini(mlxsw_core->thermal);
devlink_unregister(devlink);
mlxsw_emad_fini(mlxsw_core);
- mlxsw_core->bus->fini(mlxsw_core->bus_priv);
kfree(mlxsw_core->lag.mapping);
+ mlxsw_core->bus->fini(mlxsw_core->bus_priv);
free_percpu(mlxsw_core->pcpu_stats);
devlink_free(devlink);
mlxsw_core_driver_put(device_kind);
--
2.7.4
^ permalink raw reply related
* [patch net-next 0/4] mlxsw: couple of enhancements and fixes
From: Jiri Pirko @ 2016-11-28 17:01 UTC (permalink / raw)
To: netdev; +Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz
From: Jiri Pirko <jiri@mellanox.com>
Couple of enhancements and fixes from Ido.
Ido Schimmel (4):
mlxsw: resources: Add maximum buffer size
mlxsw: spectrum_buffers: Limit size of pools
mlxsw: core: Add missing rollback in error path
mlxsw: core: Change order of operations in removal path
drivers/net/ethernet/mellanox/mlxsw/core.c | 3 ++-
drivers/net/ethernet/mellanox/mlxsw/resources.h | 2 ++
drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c | 3 +++
3 files changed, 7 insertions(+), 1 deletion(-)
--
2.7.4
^ permalink raw reply
* Re: [PATCH 0/2] net: phy: realtek: fix RTL8211F TX-delay handling
From: David Miller @ 2016-11-28 17:07 UTC (permalink / raw)
To: martin.blumenstingl-gM/Ye1E23mwN+BqQ9rBEUg
Cc: f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
mark.rutland-5wv7dgnIgG8, sean.wang-NuS5LvNUpcJWk0Htik3J/w,
netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
jbrunet-rdvid1DuHRBWk0Htik3J/w
In-Reply-To: <20161125131201.19994-1-martin.blumenstingl-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>
From: Martin Blumenstingl <martin.blumenstingl-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>
Date: Fri, 25 Nov 2016 14:11:59 +0100
> The RTL8211F PHY driver currently enables the TX-delay only when the
> phy-mode is PHY_INTERFACE_MODE_RGMII. This is incorrect, because there
> are three RGMII variations of the phy-mode which explicitly request the
> PHY to enable the RX and/or TX delay, while PHY_INTERFACE_MODE_RGMII
> specifies that the PHY should disable the RX and/or TX delays.
>
> Additionally to the RTL8211F PHY driver change this contains a small
> update to the phy-mode documentation to clarify the purpose of the
> RGMII phy-modes.
> While this may not be perfect yet it's at least a start. Please feel
> free to drop this patch from this series and send an improved version
> yourself.
>
> These patches are the results of recent discussions, see [0]
>
> [0] http://lists.infradead.org/pipermail/linux-amlogic/2016-November/001688.html
Series applied, thanks Martin.
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH net-next] net: hns: Fix to conditionally convey RX checksum flag to stack
From: David Miller @ 2016-11-28 17:12 UTC (permalink / raw)
To: salil.mehta; +Cc: yisen.zhuang, mehta.salil.lnk, netdev, linux-kernel, linuxarm
In-Reply-To: <20161125133240.1264224-1-salil.mehta@huawei.com>
From: Salil Mehta <salil.mehta@huawei.com>
Date: Fri, 25 Nov 2016 13:32:40 +0000
> @@ -778,6 +778,35 @@ int hns_ae_get_regs_len(struct hnae_handle *handle)
> return total_num;
> }
>
> +static bool hns_ae_is_l3l4_csum_err(struct hnae_handle *handle)
> +{
> + struct hns_ppe_cb *ppe_cb = hns_get_ppe_cb(handle);
> + u32 regval;
> + bool retval = false;
> +
> + /* read PPE_HIS_PRO_ERR register and check for the checksum errors */
> + regval = dsaf_read_dev(ppe_cb, PPE_HIS_PRO_ERR_REG);
> +
I don't see how a single register can properly provide error status for a ring
of pending received packets.
No matter how this register is implemented, it is either going to result in
packets erroneously being marked as having errors, or error status being
lost when multiple packets in a row have such errors.
For example, if you receive several packets in a row that have errors,
you'll read this register for the first one. If this read clears the error
status, which I am guessing it does, then you won't see the error status
for the next packet that had one of these errors as well.
If you don't have something which is provided on a per-packet basis
then you can't determine the error properly. Therefore you will just
have to always ignore the checksum if there is any error indicated in
the ring descriptor.
^ permalink raw reply
* Re: [PATCH] Set DEFAULT_TCP_CONG to bbr if DEFAULT_BBR is set
From: David Miller @ 2016-11-28 17:15 UTC (permalink / raw)
To: jwollrath; +Cc: netdev
In-Reply-To: <20161125140526.2486-1-jwollrath@web.de>
From: Julian Wollrath <jwollrath@web.de>
Date: Fri, 25 Nov 2016 15:05:26 +0100
> Signed-off-by: Julian Wollrath <jwollrath@web.de>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH] amd-xgbe: Fix unused suspend handlers build warning
From: David Miller @ 2016-11-28 17:19 UTC (permalink / raw)
To: bp; +Cc: linux-kernel, thomas.lendacky, netdev
In-Reply-To: <20161126205352.19577-1-bp@alien8.de>
From: Borislav Petkov <bp@alien8.de>
Date: Sat, 26 Nov 2016 21:53:52 +0100
> From: Borislav Petkov <bp@suse.de>
>
> Fix:
>
> drivers/net/ethernet/amd/xgbe/xgbe-main.c:835:12: warning: ‘xgbe_suspend’ defined
> but not used [-Wunused-function]
> drivers/net/ethernet/amd/xgbe/xgbe-main.c:855:12: warning: ‘xgbe_resume’ defined
> but not used [-Wunused-function]
>
> I see it during randconfig builds here.
>
> Signed-off-by: Borislav Petkov <bp@suse.de>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH net-next v2 1/6] tcp: instrument tcp sender limits chronographs
From: Neal Cardwell @ 2016-11-28 17:23 UTC (permalink / raw)
To: Yuchung Cheng
Cc: David Miller, Soheil Hassas Yeganeh, francisyyan, Netdev,
Eric Dumazet
In-Reply-To: <1480316838-154141-2-git-send-email-ycheng@google.com>
On Mon, Nov 28, 2016 at 2:07 AM, Yuchung Cheng <ycheng@google.com> wrote:
> From: Francis Yan <francisyyan@gmail.com>
>
> This patch implements the skeleton of the TCP chronograph
> instrumentation on sender side limits:
>
> 1) idle (unspec)
> 2) busy sending data other than 3-4 below
> 3) rwnd-limited
> 4) sndbuf-limited
>
> The limits are enumerated 'tcp_chrono'. Since a connection in
> theory can idle forever, we do not track the actual length of this
> uninteresting idle period. For the rest we track how long the sender
> spends in each limit. At any point during the life time of a
> connection, the sender must be in one of the four states.
>
> If there are multiple conditions worthy of tracking in a chronograph
> then the highest priority enum takes precedence over
> the other conditions. So that if something "more interesting"
> starts happening, stop the previous chrono and start a new one.
>
> The time unit is jiffy(u32) in order to save space in tcp_sock.
> This implies application must sample the stats no longer than every
> 49 days of 1ms jiffy.
>
> Signed-off-by: Francis Yan <francisyyan@gmail.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> ---
Acked-by: Neal Cardwell <ncardwell@google.com>
neal
^ permalink raw reply
* Re: [PATCH net-next v2 2/6] tcp: instrument how long TCP is busy sending
From: Neal Cardwell @ 2016-11-28 17:24 UTC (permalink / raw)
To: Yuchung Cheng
Cc: David Miller, Soheil Hassas Yeganeh, francisyyan, Netdev,
Eric Dumazet
In-Reply-To: <1480316838-154141-3-git-send-email-ycheng@google.com>
On Mon, Nov 28, 2016 at 2:07 AM, Yuchung Cheng <ycheng@google.com> wrote:
> From: Francis Yan <francisyyan@gmail.com>
>
> This patch measures TCP busy time, which is defined as the period
> of time when sender has data (or FIN) to send. The time starts when
> data is buffered and stops when the write queue is flushed by ACKs
> or error events.
>
> Note the busy time does not include SYN time, unless data is
> included in SYN (i.e. Fast Open). It does include FIN time even
> if the FIN carries no payload. Excluding pure FIN is possible but
> would incur one additional test in the fast path, which may not
> be worth it.
>
> Signed-off-by: Francis Yan <francisyyan@gmail.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> ---
Acked-by: Neal Cardwell <ncardwell@google.com>
neal
^ permalink raw reply
* Re: [PATCH net-next v2 3/6] tcp: instrument how long TCP is limited by receive window
From: Neal Cardwell @ 2016-11-28 17:25 UTC (permalink / raw)
To: Yuchung Cheng
Cc: David Miller, Soheil Hassas Yeganeh, francisyyan, Netdev,
Eric Dumazet
In-Reply-To: <1480316838-154141-4-git-send-email-ycheng@google.com>
On Mon, Nov 28, 2016 at 2:07 AM, Yuchung Cheng <ycheng@google.com> wrote:
> From: Francis Yan <francisyyan@gmail.com>
>
> This patch measures the total time when the TCP stops sending because
> the receiver's advertised window is not large enough. Note that
> once the limit is lifted we are likely in the busy status if we
> have data pending.
>
> Signed-off-by: Francis Yan <francisyyan@gmail.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> ---
Acked-by: Neal Cardwell <ncardwell@google.com>
neal
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox