* Re: [PATCH net-next v2 0/6] Add comphy support for Armada 38x
From: Kishon Vijay Abraham I @ 2019-02-08 9:22 UTC (permalink / raw)
To: Thomas Petazzoni, David Miller
Cc: linux, andrew, gregory.clement, jason, sebastian.hesselbarth,
devicetree, linux-arm-kernel, mark.rutland, netdev, robh+dt
In-Reply-To: <20190208085821.7556a18d@windsurf>
Hi,
On 08/02/19 1:28 PM, Thomas Petazzoni wrote:
> Hello David,
>
> On Thu, 07 Feb 2019 18:10:49 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
>
>> From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
>> Date: Thu, 7 Feb 2019 16:18:25 +0000
>>
>>> This series adds support for the comphy for Armada 38x, which allows
>>> these SoCs to use 2500BASE-X mode with appropriate SFP modules.
>>>
>>> Tested on SolidRun Clearfog after updating for the 5.0 merge window
>>> changes.
>>
>> Series applied, thanks Russell.
>
> This series contained:
>
> - Device Tree bindings that had not been ACKed by the DT bindings
> maintainers, one of which should have been merged through the
> drivers/phy maintainer tree.
>
> - A brand new drivers/phy driver that had not been ACKed by the
> drivers/phy maintainer.
The PHY driver looks good to me. But I think it might still be better to take
it via PHY tree since there are other Marvell drivers that are getting merged
and will result in conflicts in Kconfig and Makefile.
Thanks
Kishon
^ permalink raw reply
* Re: [ovs-dev] [PATCH net-next V2 1/1] openvswitch: Declare ovs key structures using macros
From: Eli Britstein @ 2019-02-08 9:33 UTC (permalink / raw)
To: Simon Horman, David Miller
Cc: gvrose8192@gmail.com, pshelar@ovn.org, dev@openvswitch.org,
netdev@vger.kernel.org
In-Reply-To: <20190208075903.so63odmtuvxawija@netronome.com>
On 2/8/2019 9:59 AM, Simon Horman wrote:
> On Mon, Feb 04, 2019 at 12:09:00PM -0800, David Miller wrote:
>> From: Gregory Rose <gvrose8192@gmail.com>
>> Date: Mon, 4 Feb 2019 11:41:29 -0800
>>
>>> On 2/3/2019 1:12 AM, Eli Britstein wrote:
>>>> Declare ovs key structures using macros as a pre-step towards
>>>> retrieving field information, as done in the proposed commit in the
>>>> OVS tree https://patchwork.ozlabs.org/patch/1023406/
>>>> ("odp-util: Do not rewrite fields with the same values as matched"),
>>>> with no functional change.
>>>>
>>>> Signed-off-by: Eli Britstein <elibr@mellanox.com>
>>>> Reviewed-by: Roi Dayan <roid@mellanox.com>
>>> Obscuring the structures with these macros is awful. I'm opposed but
>>> I see it has already been
>>> accepted upstream so I guess that's that.
>> I am personally in no way obligated to apply this patch to my tree
>> just because "upstream" did, and I absolutely have no plans to do so
>> at this point.
>>
>> This patch is absolutely awful.
> I hate to jump on a bandwagon, but this patch makes the code much
> less readable.
Please review the alternative I have posted:
https://mail.openvswitch.org/pipermail/ovs-dev/2019-February/356000.html
^ permalink raw reply
* RE: [PATCH net] sctp: make sctp_setsockopt_events() less strict about the option length
From: David Laight @ 2019-02-08 9:53 UTC (permalink / raw)
To: 'Marcelo Ricardo Leitner'
Cc: Julien Gomes, netdev@vger.kernel.org, linux-sctp@vger.kernel.org,
linux-kernel@vger.kernel.org, davem@davemloft.net,
nhorman@tuxdriver.com, vyasevich@gmail.com, lucien.xin@gmail.com
In-Reply-To: <20190207174715.GF13621@localhost.localdomain>
From: 'Marcelo Ricardo Leitner'
> Sent: 07 February 2019 17:47
...
> > > Maybe what we want(ed) here then is explicit versioning, to have the 3
> > > definitions available. Then the application is able to use, say struct
> > > sctp_event_subscribe, and be happy with it, while there is struct
> > > sctp_event_subscribe_v2 and struct sctp_event_subscribe_v3 there too.
> > >
> > > But it's too late for that now because that would break applications
> > > already using the new fields in sctp_event_subscribe.
> >
> > It is probably better to break the recompilation of the few programs
> > that use the new fields (and have them not work on old kernels)
> > than to stop recompilations of old programs stop working on old
> > kernels or have requested new options silently ignored.
>
> I got confused here, not sure what you mean. Seems there is one "stop"
> word too many.
More confusing than I intended...
With the current kernel and headers a 'new program' (one that
needs the new options) will fail to run on an old kernel - which is good.
However a recompilation of an 'old program' (that doesn't use
the new options) will also fail to run on an old kernel - which is bad.
Changing the kernel to ignore extra event flags breaks the 'new'
program: the options it asked for would be silently ignored.
Versioning the structure now (even though it should have been done
earlier) won't change the behaviour of existing binaries.
However a recompilation of an 'old' program would use the 'old'
structure and work on old kernels.
Attempts to recompile a 'new' program will fail - until the structure
name (or some #define to enable the extra fields) is changed.
Breaking compilations is much better than unexpected run-time
behaviour.
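As a rough illustration of the versioning idea (a sketch only: the struct
and field names below are made up for the example and are not the real
sctp_event_subscribe layout, and old_kernel_setsockopt() merely models a
kernel that rejects option lengths it does not know about):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical v1 layout: the fields an 'old' kernel knows about. */
struct evt_subscribe {
	uint8_t data_io_event;
	uint8_t association_event;
	uint8_t shutdown_event;
};

/* Hypothetical v2 layout: v1 plus one option that was added later. */
struct evt_subscribe_v2 {
	uint8_t data_io_event;
	uint8_t association_event;
	uint8_t shutdown_event;
	uint8_t new_event;
};

/* The versioned struct must be strictly larger for the length check to help. */
static_assert(sizeof(struct evt_subscribe_v2) > sizeof(struct evt_subscribe),
	      "v2 must extend v1");

/* Model of a strict 'old' kernel: any length beyond v1 is an error. */
int old_kernel_setsockopt(size_t optlen)
{
	if (optlen > sizeof(struct evt_subscribe))
		return -EINVAL;
	return 0;
}
```

A recompiled 'old' program keeps passing sizeof(struct evt_subscribe) and
is accepted. A 'new' program has to name struct evt_subscribe_v2
explicitly, and on the old kernel it gets -EINVAL rather than having its
request silently ignored.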
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply
* Re: [PATCH net-next] ipvs: Use struct_size() helper
From: Simon Horman @ 2019-02-08 9:56 UTC (permalink / raw)
To: Gustavo A. R. Silva
Cc: Wensong Zhang, Julian Anastasov, Pablo Neira Ayuso,
Jozsef Kadlecsik, Florian Westphal, David S. Miller, netdev,
lvs-devel, netfilter-devel, coreteam, linux-kernel
In-Reply-To: <20190208004456.GA15845@embeddedor>
On Thu, Feb 07, 2019 at 06:44:56PM -0600, Gustavo A. R. Silva wrote:
> One of the more common cases of allocation size calculations is finding
> the size of a structure that has a zero-sized array at the end, along
> with memory for some number of elements for that array. For example:
>
> struct foo {
> int stuff;
> struct boo entry[];
> };
>
> size = sizeof(struct foo) + count * sizeof(struct boo);
> instance = alloc(size, GFP_KERNEL)
>
> Instead of leaving these open-coded and prone to type mistakes, we can
> now use the new struct_size() helper:
>
> size = struct_size(instance, entry, count);
>
> This code was detected with the help of Coccinelle.
>
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Pablo, could you consider applying this?
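For anyone unfamiliar with the helper, the calculation can be sketched in
userspace roughly as below. This is an illustration under assumptions, not
the kernel's implementation: flex_size() is a made-up name, struct foo and
struct boo are the placeholder types from the commit message, and
saturating at SIZE_MAX mirrors what struct_size() does on overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder types taken from the commit message example. */
struct boo {
	int value;
};

struct foo {
	int stuff;
	struct boo entry[];	/* flexible array member */
};

/*
 * Hypothetical userspace stand-in: size of a header followed by
 * 'count' elements of size 'elem', saturating at SIZE_MAX instead
 * of wrapping around on overflow.
 */
size_t flex_size(size_t header, size_t elem, size_t count)
{
	assert(elem > 0);

	/* count * elem + header would exceed SIZE_MAX */
	if (count > (SIZE_MAX - header) / elem)
		return SIZE_MAX;

	return header + count * elem;
}
```

With the open-coded form, a huge count wraps around and yields a small
size that then passes later checks. The saturated result cannot match a
sane length, so the comparison against *len in the patch would fail
instead.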
> ---
> net/netfilter/ipvs/ip_vs_ctl.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> index 7d6318664eb2..bcd9112f47d9 100644
> --- a/net/netfilter/ipvs/ip_vs_ctl.c
> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> @@ -2734,8 +2734,7 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
> int size;
>
> get = (struct ip_vs_get_services *)arg;
> - size = sizeof(*get) +
> - sizeof(struct ip_vs_service_entry) * get->num_services;
> + size = struct_size(get, entrytable, get->num_services);
> if (*len != size) {
> pr_err("length: %u != %u\n", *len, size);
> ret = -EINVAL;
> @@ -2776,8 +2775,7 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
> int size;
>
> get = (struct ip_vs_get_dests *)arg;
> - size = sizeof(*get) +
> - sizeof(struct ip_vs_dest_entry) * get->num_dests;
> + size = struct_size(get, entrytable, get->num_dests);
> if (*len != size) {
> pr_err("length: %u != %u\n", *len, size);
> ret = -EINVAL;
> --
> 2.20.1
>
^ permalink raw reply
* Re: [PATCH net-next v2 0/6] Add comphy support for Armada 38x
From: Russell King - ARM Linux admin @ 2019-02-08 10:04 UTC (permalink / raw)
To: Kishon Vijay Abraham I
Cc: Thomas Petazzoni, David Miller, andrew, gregory.clement, jason,
sebastian.hesselbarth, devicetree, linux-arm-kernel, mark.rutland,
netdev, robh+dt
In-Reply-To: <295924f0-1007-4b39-189b-1b21e4bf1b86@ti.com>
On Fri, Feb 08, 2019 at 02:52:26PM +0530, Kishon Vijay Abraham I wrote:
> Hi,
>
> On 08/02/19 1:28 PM, Thomas Petazzoni wrote:
> > Hello David,
> >
> > On Thu, 07 Feb 2019 18:10:49 -0800 (PST)
> > David Miller <davem@davemloft.net> wrote:
> >
> >> From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
> >> Date: Thu, 7 Feb 2019 16:18:25 +0000
> >>
> >>> This series adds support for the comphy for Armada 38x, which allows
> >>> these SoCs to use 2500BASE-X mode with appropriate SFP modules.
> >>>
> >>> Tested on SolidRun Clearfog after updating for the 5.0 merge window
> >>> changes.
> >>
> >> Series applied, thanks Russell.
> >
> > This series contained:
> >
> > - Device Tree bindings that had not been ACKed by the DT bindings
> > maintainers, one of which should have been merged through the
> > drivers/phy maintainer tree.
> >
> > - A brand new drivers/phy driver that had not been ACKed by the
> > drivers/phy maintainer.
>
> The PHY driver looks good to me. But I think it might still be better to take
> it via PHY tree since there are other Marvell drivers that are getting merged
> and will result in conflicts in Kconfig and Makefile.
... which would have the effect that if the DTS and mvneta changes
are merged ahead of the PHY changes into Linus' tree, mvneta breaks.
Breaking stuff in the merge window is something that needs to be
avoided - that's the exact time that bisects need to work.
Splitting a dependent patch series up for different subsystems to
avoid conflicts is not always a good idea.
Linus has said many times that he's okay dealing with simple
conflicts during the merge window (normally wanting the requester to
at least be aware of the conflict.) Yet, some seem to have fostered
an idea that "conflicts are bad and must be avoided at all costs".
I've sent Linus several pull requests with conflicts during the age
of git, and the conflict resolution diff in the pull request email.
It's *never* been a problem, even if Linus has decided a slightly
different resolution is better.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
^ permalink raw reply
* Re: [PATCH net-next v2 0/6] Add comphy support for Armada 38x
From: Russell King - ARM Linux admin @ 2019-02-08 10:06 UTC (permalink / raw)
To: Thomas Petazzoni
Cc: David Miller, andrew, gregory.clement, jason, kishon,
sebastian.hesselbarth, devicetree, linux-arm-kernel, mark.rutland,
netdev, robh+dt
In-Reply-To: <20190208085821.7556a18d@windsurf>
On Fri, Feb 08, 2019 at 08:58:21AM +0100, Thomas Petazzoni wrote:
> Hello David,
>
> On Thu, 07 Feb 2019 18:10:49 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
>
> > From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
> > Date: Thu, 7 Feb 2019 16:18:25 +0000
> >
> > > This series adds support for the comphy for Armada 38x, which allows
> > > these SoCs to use 2500BASE-X mode with appropriate SFP modules.
> > >
> > > Tested on SolidRun Clearfog after updating for the 5.0 merge window
> > > changes.
> >
> > Series applied, thanks Russell.
>
> This series contained:
>
> - Device Tree bindings that had not been ACKed by the DT bindings
> maintainers, one of which should have been merged through the
> drivers/phy maintainer tree.
Actually, it was reviewed by Rob on 3rd December, but I omitted to
add the attribution he sent. So that point is false.
>
> - A brand new drivers/phy driver that had not been ACKed by the
> drivers/phy maintainer.
>
> - Changes to platform Device Tree that should have been merged through
> the platform tree.
>
> Only patches 4/6 and 5/6 should go through the net-next tree, all the
> other patches should have gone through other trees.
>
> Best regards,
>
> Thomas
> --
> Thomas Petazzoni, CTO, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com
>
^ permalink raw reply
* RE: [PATCH bpf-next] tools/bpf: add missing strings.h include
From: David Laight @ 2019-02-08 10:23 UTC (permalink / raw)
To: 'Andrii Nakryiko', yhs@fb.com, songliubraving@fb.com,
ast@fb.com, kafai@fb.com, netdev@vger.kernel.org,
kernel-team@fb.com
In-Reply-To: <20190207175027.1950358-1-andriin@fb.com>
From: Andrii Nakryiko
> Sent: 07 February 2019 17:50
>
> A few files in libbpf use the bzero() function (declared in the strings.h header) but
> don't include that header. When libbpf is added as a dependency to pahole,
> this nondeterministically causes warnings on some machines:
>
> bpf.c:225:2: warning: implicit declaration of function ‘bzero’ [-Wimplicit-function-declaration]
> bzero(&attr, sizeof(attr));
> ^~~~~
Wouldn't it be better to change these to the more portable memset()?
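A minimal sketch of what such a replacement could look like. The names
struct attr_example and clear_attr() are hypothetical, chosen only for
illustration, and are not libbpf code:

```c
#include <assert.h>
#include <string.h>	/* memset() is declared here, no strings.h needed */

/* Hypothetical stand-in for the attribute structure being cleared. */
struct attr_example {
	int type;
	char name[16];
};

/* Equivalent of bzero(attr, sizeof(*attr)) using standard C only. */
void clear_attr(struct attr_example *attr)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
}
```

memset() is part of ISO C and declared in string.h, so the missing
strings.h include stops mattering.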
David
^ permalink raw reply
* Re: [PATCH net-next v2 2/6] phy: armada38x: add common phy support
From: Kishon Vijay Abraham I @ 2019-02-08 10:24 UTC (permalink / raw)
To: Russell King, Andrew Lunn, Gregory Clement, Jason Cooper,
Sebastian Hesselbarth, Thomas Petazzoni
Cc: devicetree, linux-arm-kernel, netdev
In-Reply-To: <E1grmOM-0000MF-PS@rmk-PC.armlinux.org.uk>
On 07/02/19 9:49 PM, Russell King wrote:
> Add support for the Armada 38x common phy to allow us to change the
> speed of the Ethernet serdes lane. This driver only supports
> manipulation of the speed, it does not support configuration of the
> common phy.
>
> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
This patch will conflict with the PHY pull request (in the marvell Kconfig and
Makefile). But the resolution should be trivial, so it should be okay for this
to go via the -net tree.
FWIW
Acked-by: Kishon Vijay Abraham I <kishon@ti.com>
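For readers following the register handling in the quoted patch below, the
bit arithmetic can be sketched in plain C as follows. The shifts, the
4-bit field width and the per-lane nibble in the selector register come
from the patch; the function names cfg1_with_speed() and selector_field()
are invented for this illustration, and no hardware is touched.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions as defined in the quoted patch. */
#define CFG1_GEN_TX(x)	((uint32_t)(x) << 26)
#define CFG1_GEN_RX(x)	((uint32_t)(x) << 22)
#define CFG1_GEN_MSK	(CFG1_GEN_TX(15) | CFG1_GEN_RX(15))

#define MAX_LANES	6

/*
 * Read-modify-write step: clear both 4-bit generation fields in the
 * old register value and set them to 'gen', leaving other bits alone.
 */
uint32_t cfg1_with_speed(uint32_t old, unsigned int gen)
{
	assert(gen <= 15);
	return (old & ~CFG1_GEN_MSK) | CFG1_GEN_TX(gen) | CFG1_GEN_RX(gen);
}

/* Extract the 4-bit mux setting for one lane from the selector value. */
unsigned int selector_field(uint32_t selector, unsigned int lane)
{
	assert(lane < MAX_LANES);
	return (selector >> (4 * lane)) & 0xf;
}
```

With gen set to 8 (GEN_SGMII_3_125GBPS in the patch) both fields end up as
8 and every bit outside the mask is preserved, which is why the driver can
change the lane speed without disturbing the rest of the configuration.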
> ---
> drivers/phy/marvell/Kconfig | 10 ++
> drivers/phy/marvell/Makefile | 1 +
> drivers/phy/marvell/phy-armada38x-comphy.c | 237 +++++++++++++++++++++++++++++
> 3 files changed, 248 insertions(+)
> create mode 100644 drivers/phy/marvell/phy-armada38x-comphy.c
>
> diff --git a/drivers/phy/marvell/Kconfig b/drivers/phy/marvell/Kconfig
> index 6fb4b56e4c14..224ea4e6a46d 100644
> --- a/drivers/phy/marvell/Kconfig
> +++ b/drivers/phy/marvell/Kconfig
> @@ -21,6 +21,16 @@ config PHY_BERLIN_USB
> help
> Enable this to support the USB PHY on Marvell Berlin SoCs.
>
> +config PHY_MVEBU_A38X_COMPHY
> + tristate "Marvell Armada 38x comphy driver"
> + depends on ARCH_MVEBU || COMPILE_TEST
> + depends on OF
> + select GENERIC_PHY
> + help
> + This driver allows to control the comphy, an hardware block providing
> + shared serdes PHYs on Marvell Armada 38x. Its serdes lanes can be
> + used by various controllers (Ethernet, sata, usb, PCIe...).
> +
> config PHY_MVEBU_CP110_COMPHY
> tristate "Marvell CP110 comphy driver"
> depends on ARCH_MVEBU || COMPILE_TEST
> diff --git a/drivers/phy/marvell/Makefile b/drivers/phy/marvell/Makefile
> index 3975b144f8ec..59b6c03ef756 100644
> --- a/drivers/phy/marvell/Makefile
> +++ b/drivers/phy/marvell/Makefile
> @@ -2,6 +2,7 @@
> obj-$(CONFIG_ARMADA375_USBCLUSTER_PHY) += phy-armada375-usb2.o
> obj-$(CONFIG_PHY_BERLIN_SATA) += phy-berlin-sata.o
> obj-$(CONFIG_PHY_BERLIN_USB) += phy-berlin-usb.o
> +obj-$(CONFIG_PHY_MVEBU_A38X_COMPHY) += phy-armada38x-comphy.o
> obj-$(CONFIG_PHY_MVEBU_CP110_COMPHY) += phy-mvebu-cp110-comphy.o
> obj-$(CONFIG_PHY_MVEBU_SATA) += phy-mvebu-sata.o
> obj-$(CONFIG_PHY_PXA_28NM_HSIC) += phy-pxa-28nm-hsic.o
> diff --git a/drivers/phy/marvell/phy-armada38x-comphy.c b/drivers/phy/marvell/phy-armada38x-comphy.c
> new file mode 100644
> index 000000000000..3e00bc679d4e
> --- /dev/null
> +++ b/drivers/phy/marvell/phy-armada38x-comphy.c
> @@ -0,0 +1,237 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2018 Russell King, Deep Blue Solutions Ltd.
> + *
> + * Partly derived from CP110 comphy driver by Antoine Tenart
> + * <antoine.tenart@bootlin.com>
> + */
> +#include <linux/delay.h>
> +#include <linux/iopoll.h>
> +#include <linux/module.h>
> +#include <linux/phy/phy.h>
> +#include <linux/phy.h>
> +#include <linux/platform_device.h>
> +
> +#define MAX_A38X_COMPHY 6
> +#define MAX_A38X_PORTS 3
> +
> +#define COMPHY_CFG1 0x00
> +#define COMPHY_CFG1_GEN_TX(x) ((x) << 26)
> +#define COMPHY_CFG1_GEN_TX_MSK COMPHY_CFG1_GEN_TX(15)
> +#define COMPHY_CFG1_GEN_RX(x) ((x) << 22)
> +#define COMPHY_CFG1_GEN_RX_MSK COMPHY_CFG1_GEN_RX(15)
> +#define GEN_SGMII_1_25GBPS 6
> +#define GEN_SGMII_3_125GBPS 8
> +
> +#define COMPHY_STAT1 0x18
> +#define COMPHY_STAT1_PLL_RDY_TX BIT(3)
> +#define COMPHY_STAT1_PLL_RDY_RX BIT(2)
> +
> +#define COMPHY_SELECTOR 0xfc
> +
> +struct a38x_comphy;
> +
> +struct a38x_comphy_lane {
> + void __iomem *base;
> + struct a38x_comphy *priv;
> + unsigned int n;
> +
> + int port;
> +};
> +
> +struct a38x_comphy {
> + void __iomem *base;
> + struct device *dev;
> + struct a38x_comphy_lane lane[MAX_A38X_COMPHY];
> +};
> +
> +static const u8 gbe_mux[MAX_A38X_COMPHY][MAX_A38X_PORTS] = {
> + { 0, 0, 0 },
> + { 4, 5, 0 },
> + { 0, 4, 0 },
> + { 0, 0, 4 },
> + { 0, 3, 0 },
> + { 0, 0, 3 },
> +};
> +
> +static void a38x_comphy_set_reg(struct a38x_comphy_lane *lane,
> + unsigned int offset, u32 mask, u32 value)
> +{
> + u32 val;
> +
> + val = readl_relaxed(lane->base + offset) & ~mask;
> + writel(val | value, lane->base + offset);
> +}
> +
> +static void a38x_comphy_set_speed(struct a38x_comphy_lane *lane,
> + unsigned int gen_tx, unsigned int gen_rx)
> +{
> + a38x_comphy_set_reg(lane, COMPHY_CFG1,
> + COMPHY_CFG1_GEN_TX_MSK | COMPHY_CFG1_GEN_RX_MSK,
> + COMPHY_CFG1_GEN_TX(gen_tx) |
> + COMPHY_CFG1_GEN_RX(gen_rx));
> +}
> +
> +static int a38x_comphy_poll(struct a38x_comphy_lane *lane,
> + unsigned int offset, u32 mask, u32 value)
> +{
> + u32 val;
> + int ret;
> +
> + ret = readl_relaxed_poll_timeout_atomic(lane->base + offset, val,
> + (val & mask) == value,
> + 1000, 150000);
> +
> + if (ret)
> + dev_err(lane->priv->dev,
> + "comphy%u: timed out waiting for status\n", lane->n);
> +
> + return ret;
> +}
> +
> +/*
> + * We only support changing the speed for comphys configured for GBE.
> + * Since that is all we do, we only poll for PLL ready status.
> + */
> +static int a38x_comphy_set_mode(struct phy *phy, enum phy_mode mode, int sub)
> +{
> + struct a38x_comphy_lane *lane = phy_get_drvdata(phy);
> + unsigned int gen;
> +
> + if (mode != PHY_MODE_ETHERNET)
> + return -EINVAL;
> +
> + switch (sub) {
> + case PHY_INTERFACE_MODE_SGMII:
> + case PHY_INTERFACE_MODE_1000BASEX:
> + gen = GEN_SGMII_1_25GBPS;
> + break;
> +
> + case PHY_INTERFACE_MODE_2500BASEX:
> + gen = GEN_SGMII_3_125GBPS;
> + break;
> +
> + default:
> + return -EINVAL;
> + }
> +
> + a38x_comphy_set_speed(lane, gen, gen);
> +
> + return a38x_comphy_poll(lane, COMPHY_STAT1,
> + COMPHY_STAT1_PLL_RDY_TX |
> + COMPHY_STAT1_PLL_RDY_RX,
> + COMPHY_STAT1_PLL_RDY_TX |
> + COMPHY_STAT1_PLL_RDY_RX);
> +}
> +
> +static const struct phy_ops a38x_comphy_ops = {
> + .set_mode = a38x_comphy_set_mode,
> + .owner = THIS_MODULE,
> +};
> +
> +static struct phy *a38x_comphy_xlate(struct device *dev,
> + struct of_phandle_args *args)
> +{
> + struct a38x_comphy_lane *lane;
> + struct phy *phy;
> + u32 val;
> +
> + if (WARN_ON(args->args[0] >= MAX_A38X_PORTS))
> + return ERR_PTR(-EINVAL);
> +
> + phy = of_phy_simple_xlate(dev, args);
> + if (IS_ERR(phy))
> + return phy;
> +
> + lane = phy_get_drvdata(phy);
> + if (lane->port >= 0)
> + return ERR_PTR(-EBUSY);
> +
> + lane->port = args->args[0];
> +
> + val = readl_relaxed(lane->priv->base + COMPHY_SELECTOR);
> + val = (val >> (4 * lane->n)) & 0xf;
> +
> + if (!gbe_mux[lane->n][lane->port] ||
> + val != gbe_mux[lane->n][lane->port]) {
> + dev_warn(lane->priv->dev,
> + "comphy%u: not configured for GBE\n", lane->n);
> + phy = ERR_PTR(-EINVAL);
> + }
> +
> + return phy;
> +}
> +
> +static int a38x_comphy_probe(struct platform_device *pdev)
> +{
> + struct phy_provider *provider;
> + struct device_node *child;
> + struct a38x_comphy *priv;
> + struct resource *res;
> + void __iomem *base;
> +
> + priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
> + if (!priv)
> + return -ENOMEM;
> +
> + res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> + base = devm_ioremap_resource(&pdev->dev, res);
> + if (IS_ERR(base))
> + return PTR_ERR(base);
> +
> + priv->dev = &pdev->dev;
> + priv->base = base;
> +
> + for_each_available_child_of_node(pdev->dev.of_node, child) {
> + struct phy *phy;
> + int ret;
> + u32 val;
> +
> + ret = of_property_read_u32(child, "reg", &val);
> + if (ret < 0) {
> + dev_err(&pdev->dev, "missing 'reg' property (%d)\n",
> + ret);
> + continue;
> + }
> +
> + if (val >= MAX_A38X_COMPHY || priv->lane[val].base) {
> + dev_err(&pdev->dev, "invalid 'reg' property\n");
> + continue;
> + }
> +
> + phy = devm_phy_create(&pdev->dev, child, &a38x_comphy_ops);
> + if (IS_ERR(phy))
> + return PTR_ERR(phy);
> +
> + priv->lane[val].base = base + 0x28 * val;
> + priv->lane[val].priv = priv;
> + priv->lane[val].n = val;
> + priv->lane[val].port = -1;
> + phy_set_drvdata(phy, &priv->lane[val]);
> + }
> +
> + dev_set_drvdata(&pdev->dev, priv);
> +
> + provider = devm_of_phy_provider_register(&pdev->dev, a38x_comphy_xlate);
> +
> + return PTR_ERR_OR_ZERO(provider);
> +}
> +
> +static const struct of_device_id a38x_comphy_of_match_table[] = {
> + { .compatible = "marvell,armada-380-comphy" },
> + { },
> +};
> +MODULE_DEVICE_TABLE(of, a38x_comphy_of_match_table);
> +
> +static struct platform_driver a38x_comphy_driver = {
> + .probe = a38x_comphy_probe,
> + .driver = {
> + .name = "armada-38x-comphy",
> + .of_match_table = a38x_comphy_of_match_table,
> + },
> +};
> +module_platform_driver(a38x_comphy_driver);
> +
> +MODULE_AUTHOR("Russell King <rmk+kernel@armlinux.org.uk>");
> +MODULE_DESCRIPTION("Common PHY driver for Armada 38x SoCs");
> +MODULE_LICENSE("GPL v2");
>
^ permalink raw reply
* Re: TC stats / hw offload question
From: Edward Cree @ 2019-02-08 10:26 UTC (permalink / raw)
To: Jamal Hadi Salim, netdev; +Cc: Jiri Pirko, Cong Wang
In-Reply-To: <4cb765dd-453f-3139-bce6-6e0b31167aec@mojatatu.com>
On 06/02/19 02:20, Jamal Hadi Salim wrote:
> The classifiers don't modify the packets. The actions do. And they
> maintain stats on the size at "entry", i.e. pre-edit.
Thank you for clearing that up.
> Each action keeps its own counters. If you did something like:
>
> tc match using flower blah \
> action vlan push tag ... \
> action redirect to egress of eth0
>
> And you submitted a packet of size x bytes,
> then the "match" would record x bytes.
Sorry, where would it record that? I can't find any stats counters on
the "match" either in the software path or the offload API.
> the "vlan action" would record x bytes.
> the "redirect" would record size x+vlaninfo bytes
> the egress of eth0 would record x+vlaninfo bytes
Am I right in thinking that offloaded counters don't do that? As far
as I can tell, the drivers with flower offload all use
tcf_exts_stats_update() which takes a single 'bytes' count and adds
it to all the actions. (Presumably this is pre-edit length.)
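To make the difference concrete, here is a small model of the two
accounting schemes as I understand them from this thread. It is not
kernel code: struct action_stats, sw_account() and hw_account() are names
made up for the example, and the 4 bytes used for the pushed VLAN tag in
the note below are an assumption about "vlaninfo".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-action counters. */
struct action_stats {
	uint64_t bytes;
	uint64_t packets;
};

/*
 * Software-path model: each action records the packet length as it
 * enters that action; growth[i] is how many bytes action i adds.
 */
void sw_account(struct action_stats *acts, const uint32_t *growth,
		size_t n, uint32_t len)
{
	assert(acts != NULL && growth != NULL);

	for (size_t i = 0; i < n; i++) {
		acts[i].bytes += len;
		acts[i].packets++;
		len += growth[i];	/* edit is visible to the next action */
	}
}

/* Offload model: one byte/packet count is added to every action. */
void hw_account(struct action_stats *acts, size_t n,
		uint64_t bytes, uint64_t packets)
{
	assert(acts != NULL);

	for (size_t i = 0; i < n; i++) {
		acts[i].bytes += bytes;
		acts[i].packets += packets;
	}
}
```

For a 100 byte packet going through "vlan push" then "redirect", with
growth = { 4, 0 }, the software model leaves the redirect at 104 bytes,
while the offload model reports 100 for both actions.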
-Ed
^ permalink raw reply
* Re: Is advertising of 2500Mbps support must from phy device to set phy at 2500Mbps link speed
From: abhijit @ 2019-02-08 10:37 UTC (permalink / raw)
To: Andrew Lunn; +Cc: netdev
In-Reply-To: <20190206133858.GF20405@lunn.ch>
Thanks Andrew for your reply. I will have a look at the IEEE documents and c45.
On Wednesday 06 February 2019 07:08 PM, Andrew Lunn wrote:
>> Currently, we don't have any phy drivers. Generic driver doesn't seems to
>> support 2500Mbps.
> Correct. genphy only supports up to 1G. The c45 based genphy_c45 is
> slowly gaining more features and might soon support 2.5G.
>
>> If I have to write the driver, whether it is necessary for
>> phy device to advertise speed of 2500Mbps?
> The user could force it, using the ethtool command you suggested. But
> it is the PHY driver which configures this. If you add the driver code
> to force it, you might as well add the driver code to allow it to be
> negotiated.
>
>> Phy is custom phy and is currently under test. If you know any phy device
>> that supports 2500Mbps and whose data sheet is available freely please let
>> me know.
> There are none that i know of with open data sheets. However the IEEE
> standards should be freely available and they describe the registers
> the PHY is expected to have. There are also patches floating around
> which add 2.5G and 5G support to the marvell10g driver. I expect these
> patches to get merged soon, but maybe in a different form to make
> genphy_c45 more generic.
>
> Andrew
^ permalink raw reply
* Re: [iproute PATCH] ip-link: Fix listing of alias interfaces
From: Phil Sutter @ 2019-02-08 10:40 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, Roopa Prabhu
In-Reply-To: <20190207162436.4452f7ad@hermes.lan>
On Thu, Feb 07, 2019 at 04:24:36PM -0800, Stephen Hemminger wrote:
> On Thu, 7 Feb 2019 14:05:27 +0100
> Phil Sutter <phil@nwl.cc> wrote:
>
> > Commit 50b9950dd9011 ("link dump filter") accidentally broke listing of
> > links in the old alias interface notation:
> >
> > | % ip link show eth0:1
> > | RTNETLINK answers: No such device
> > | Cannot send link get request: No such device
> >
> > Prior to the above commit, link lookup was performed via ifindex
> > returned by if_nametoindex(). The latter uses SIOCGIFINDEX ioctl call
> > which on kernel side causes the colon-suffix to be dropped before doing
> > the interface lookup. Netlink API though doesn't care about that at all.
> > To keep things backward compatible, mimic the ioctl API behaviour and drop
> > the colon-suffix prior to sending the RTM_GETLINK request.
> >
> > Fixes: 50b9950dd9011 ("link dump filter")
> > Signed-off-by: Phil Sutter <phil@nwl.cc>
>
> What about mistaken usage where the text after the colon is not a number,
> or has additional colon?
That's completely ignored in the ioctl case as well. See dev_ioctl() in
the kernel sources:
| colon = strchr(ifr->ifr_name, ':');
| if (colon)
| *colon = 0;
If you pass 'group 0' to the link show command, the ioctl code path is taken. It
allows (and drops) arbitrary input after the colon (as long as the total
name doesn't exceed 15 characters).
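In userspace, the same behaviour could be sketched along these lines. This
is only an illustration of the idea, not the code from my patch:
strip_alias_suffix() and IFNAME_MAX are hypothetical names, and the 16
byte buffer reflects the 15 character limit mentioned above plus the
terminating NUL.

```c
#include <assert.h>
#include <string.h>

#define IFNAME_MAX 16	/* 15 characters plus terminating NUL */

/*
 * Copy 'name' into 'out' and cut it at the first colon, so that
 * "eth0:1" becomes "eth0". Returns 0 on success, -1 if the full
 * name (suffix included) does not fit into the buffer.
 */
int strip_alias_suffix(const char *name, char out[IFNAME_MAX])
{
	size_t len;
	char *colon;

	assert(name != NULL && out != NULL);

	len = strlen(name);
	if (len >= IFNAME_MAX)
		return -1;

	memcpy(out, name, len + 1);

	colon = strchr(out, ':');
	if (colon)
		*colon = '\0';	/* anything after the colon is dropped */

	return 0;
}
```

Like the ioctl path, this does not care whether the suffix is numeric or
contains further colons.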
Cheers, Phil
^ permalink raw reply
* [GIT] Networking
From: David Miller @ 2019-02-08 10:42 UTC (permalink / raw)
To: torvalds; +Cc: akpm, netdev, linux-kernel
[ To the actual mailing lists this time... ]
This pull request is dedicated to the upcoming snowpocalypse parts 2
and 3 in the Pacific Northwest:
1) Drop profiles are broken because some drivers use dev_kfree_skb*
instead of dev_consume_skb*, from Yang Wei.
2) Fix IWLWIFI kconfig deps, from Luca Coelho.
3) Fix percpu maps updating in bpftool, from Paolo Abeni.
4) Missing station release in batman-adv, from Felix Fietkau.
5) Fix some networking compat ioctl bugs, from Johannes Berg.
6) ucc_geth must reset the BQL queue state when stopping the device,
from Mathias Thore.
7) Several XDP bug fixes in virtio_net from Toshiaki Makita.
8) TSO packets must be sent always on queue 0 in stmmac, from
Jose Abreu.
9) Fix socket refcounting bug in RDS, from Eric Dumazet.
10) Handle sparse cpu allocations in bpf selftests, from Martynas
Pumputis.
11) Make sure mgmt frames have enough tailroom in mac80211, from Felix
Fietkau.
12) Use safe list walking in sctp_sendmsg() asoc list traversal, from
Greg Kroah-Hartman.
13) Make DCCP's ccid_hc_[rt]x_parse_options always check for NULL ccid,
from Eric Dumazet.
14) Need to reload WoL password into bcmsysport device after deep sleeps,
from Florian Fainelli.
15) Remove filter from mask before freeing in cls_flower, from Petr
Machata.
16) Missing release and use after free in error paths of s390 qeth
code, from Julian Wiedmann.
17) Fix lockdep false positive in dsa code, from Marc Zyngier.
18) Fix counting of ATU violations in mv88e6xxx, from Andrew Lunn.
19) Fix EQ firmware assert in qed driver, from Manish Chopra.
20) Don't default Cavium PTP to Y in kconfig, from Bjorn Helgaas.
Please pull, thanks a lot!
The following changes since commit 62967898789dc1f09a06e59fa85ae2c5ca4dc2da:
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2019-01-29 17:11:47 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
for you to fetch changes up to 39841cc1cbb69344539c98a1fa9d858ed124c7ba:
net: dsa: b53: Fix for failure when irq is not defined in dt (2019-02-07 18:18:37 -0800)
----------------------------------------------------------------
Alexei Starovoitov (4):
Merge branch 'typedef-func_proto'
bpf: run bpf programs with preemption disabled
bpf: fix lockdep false positive in percpu_freelist
bpf: fix potential deadlock in bpf_prog_register
Andrew Lunn (1):
net: dsa: mv88e6xxx: Fix counting of ATU violations
Arun Parameswaran (1):
net: dsa: b53: Fix for failure when irq is not defined in dt
Bart Van Assche (1):
lib/test_rhashtable: Make test_insert_dup() allocate its hash table dynamically
Bjorn Helgaas (1):
net: Don't default Cavium PTP driver to 'y'
Brian Norris (1):
ath10k: correct bus type for WCN3990
Colin Ian King (1):
ieee802154: mcr20a: fix indentation, remove tabs
Dan Carpenter (2):
skge: potential memory corruption in skge_get_regs()
net: dsa: Fix NULL checking in dsa_slave_set_eee()
Daniel Borkmann (3):
bpf, doc: add reviewers to maintainers entry
ipvlan, l3mdev: fix broken l3s mode wrt local routes
Merge branch 'bpf-lockdep-fixes'
David S. Miller (17):
Merge branch 'net-various-compat-ioctl-fixes'
Merge branch 'erspan-always-reports-output-key-to-userspace'
Merge branch 'virtio_net-Fix-problems-around-XDP-tx-and-napi_tx'
Merge branch 'stmmac-fixes'
Merge branch 'ieee802154-for-davem-2019-01-31' of git://git.kernel.org/.../sschmidt/wpan
Merge tag 'mac80211-for-davem-2019-02-01' of git://git.kernel.org/.../jberg/mac80211
Merge tag 'batadv-net-for-davem-20190201' of git://git.open-mesh.org/linux-merge
Merge branch 'smc-fixes'
Merge git://git.kernel.org/.../bpf/bpf
Merge branch 'vsock-virtio-hot-unplug'
Merge branch 'smc-fixes'
Merge tag 'wireless-drivers-for-davem-2019-02-04' of git://git.kernel.org/.../kvalo/wireless-drivers
Merge branch 's390-qeth-fixes'
Merge git://git.kernel.org/.../pablo/nf
Merge tag 'mlx5-fixes-2019-02-05' of git://git.kernel.org/.../saeed/linux
Merge branch 'qed-Bug-fixes'
Merge branch 'ipv6-fixes'
Eli Cooper (1):
netfilter: ipv6: Don't preserve original oif for loopback address
Eric Dumazet (4):
rds: fix refcount bug in rds_sock_addref
dccp: fool proof ccid_hc_[rt]x_parse_options()
mISDN: fix a race in dev_expire_timer()
rxrpc: bad unlock balance in rxrpc_recvmsg
Felix Fietkau (2):
batman-adv: release station info tidstats
mac80211: ensure that mgmt tx skbs have tailroom for encryption
Florian Fainelli (1):
net: systemport: Fix WoL with password after deep sleep
Florian Westphal (2):
selftests: netfilter: add simple masq/redirect test cases
netfilter: nft_compat: don't use refcount_inc on newly allocated entry
George Amanakis (1):
tun: move the call to tun_set_real_num_queues
Govindarajulu Varadarajan (1):
enic: fix checksum validation for IPv6
Greg Kroah-Hartman (1):
sctp: walk the list of asoc safely
Guy Shattah (1):
net/mlx5e: Use the inner headers to determine tc/pedit offload limitation on decap flows
Hangbin Liu (2):
geneve: should not call rt6_lookup() when ipv6 was disabled
sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
Hans Wippel (1):
net/smc: allow 16 byte pnetids in netlink policy
Jacob Wen (2):
l2tp: fix reading optional fields of L2TPv3
l2tp: copy 4 more bytes to linear part if necessary
Jakub Kicinski (1):
tools: bpftool: fix crash with un-owned prog arrays
Jakub Sitnicki (1):
sk_msg: Always cancel strp work before freeing the psock
Jiri Olsa (1):
bpftool: Fix prog dump by tag
Johannes Berg (5):
Revert "socket: fix struct ifreq size in compat ioctl"
Revert "kill dev_ifsioc()"
net: socket: fix SIOCGIFNAME in compat
net: socket: make bond ioctls go through compat_ifreq_ioctl()
cfg80211: call disconnect_wk when AP stops
Jose Abreu (3):
net: stmmac: Fallback to Platform Data clock in Watchdog conversion
net: stmmac: Send TSO packets always from Queue 0
net: stmmac: Disable EEE mode earlier in XMIT callback
Julian Wiedmann (4):
s390/qeth: release cmd buffer in error paths
s390/qeth: fix use-after-free in error path
s390/qeth: cancel close_dev work before removing a card
s390/qeth: conclude all event processing before offlining a card
Karsten Graul (7):
net/smc: prevent races between smc_lgr_terminate() and smc_conn_free()
net/smc: don't wait for send buffer space when data was already sent
net/smc: recvmsg and splice_read should return 0 after shutdown
net/smc: do not wait under send_lock
net/smc: call smc_cdc_msg_send() under send_lock
net/smc: use device link provided in qp_context
net/smc: fix use of variable in cleared area
Lorenzo Bianconi (3):
net: ip_gre: always reports o_key to userspace
net: ip6_gre: always reports o_key to userspace
mt76x0: eeprom: fix chan_vs_power map in mt76x0_get_power_info
Luca Coelho (1):
iwlwifi: make IWLWIFI depend on CFG80211
Manish Chopra (2):
qed: Fix EQ full firmware assert.
qed*: Advance drivers version to 8.37.0.20
Marc Zyngier (1):
net: dsa: Fix lockdep false positive splat
Martin KaFai Lau (1):
bpf: Fix syscall's stackmap lookup potential deadlock
Martynas Pumputis (2):
bpf, selftests: fix handling of sparse CPU allocations
netfilter: nf_nat: skip nat clash resolution for same-origin entries
Mathias Thore (1):
ucc_geth: Reset BQL queue when stopping device
Michael Chan (1):
bnxt_en: Disable interrupts when allocating CP rings or NQs.
Naresh Kamboju (1):
selftests: netfilter: fix config fragment CONFIG_NF_TABLES_INET
Or Gerlitz (1):
net/mlx5e: Properly set steering match levels for offloaded TC decap rules
Pablo Neira Ayuso (1):
netfilter: nf_tables: unbind set in rule from commit path
Paolo Abeni (1):
bpftool: fix percpu maps updating
Petr Machata (1):
net: cls_flower: Remove filter from mask before freeing it
Raed Salem (1):
net/mlx5e: FPGA, fix Innova IPsec TX offload data path performance
Rahul Verma (1):
qed: Change verbosity for coalescing message.
Rundong Ge (1):
net: dsa: slave: Don't propagate flag changes on down slave interfaces
Russell King (2):
Revert "net: phy: marvell: avoid pause mode on SGMII-to-Copper for 88e151x"
MAINTAINERS: add maintainer for SFF/SFP/SFP+ support
Sebastian Andrzej Siewior (1):
net: dp83640: expire old TX-skb
Siva Rebbagondla (1):
MAINTAINERS: add entry for redpine wireless driver
Stefano Garzarella (2):
vsock/virtio: fix kernel panic after device hot-unplug
vsock/virtio: reset connected sockets on device removal
Sudarsana Reddy Kalluru (3):
qed: Assign UFP TC value to vlan priority in UFP mode.
qed: Consider TX tcs while deriving the max num_queues for PF.
qede: Fix system crash on configuring channels.
Sven Eckelmann (2):
batman-adv: Avoid WARN on net_device without parent in netns
batman-adv: Force mac header to start of data on xmit
Tonghao Zhang (2):
net/mlx5e: Update hw flows when encap source mac changed
net/mlx5e: Don't overwrite pedit action when multiple pedit used
Toshiaki Makita (8):
virtio_net: Don't enable NAPI when interface is down
virtio_net: Don't call free_old_xmit_skbs for xdp_frames
virtio_net: Fix not restoring real_num_rx_queues
virtio_net: Fix out of bounds access of sq
virtio_net: Don't process redirected XDP frames when XDP is disabled
virtio_net: Use xdp_return_frame to free xdp_frames on destroying vqs
virtio_net: Differentiate sk_buff and xdp_frame on freeing
virtio_net: Account for tx bytes and packets on sending xdp_frames
Ulf Hansson (1):
wlcore: sdio: Fixup power on/off sequence
Ursula Braun (5):
net/smc: fix another sizeof to int comparison
net/smc: preallocated memory for rdma work requests
net/smc: fix sender_free computation
net/smc: delete rkey first before switching to unused
net/smc: correct state change for peer closing
Xin Long (1):
sctp: check and update stream->out_curr when allocating stream_out
Yafang Shao (1):
bpf: sock recvbuff must be limited by rmem_max in bpf_setsockopt()
Yang Wei (10):
net: defxx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: tulip: de2104x: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: dscc4: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: smsc: epic100: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: fec_mpc52xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: fsl_ucc_hdlc: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: sun: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: tehuti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: via-velocity: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: broadcom: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
Yohei Kanemaru (1):
ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation
Yonghong Song (2):
bpf: btf: allow typedef func_proto
tools/bpf: fix test_btf for typedef func_proto case
MAINTAINERS | 21 +++
drivers/isdn/mISDN/timerdev.c | 2 +-
drivers/net/dsa/b53/b53_srab.c | 3 -
drivers/net/dsa/mv88e6xxx/global1_atu.c | 21 +--
drivers/net/ethernet/broadcom/bcmsysport.c | 25 ++-
drivers/net/ethernet/broadcom/bcmsysport.h | 2 +
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 +-
drivers/net/ethernet/broadcom/sb1250-mac.c | 2 +-
drivers/net/ethernet/cavium/Kconfig | 1 -
drivers/net/ethernet/cisco/enic/enic_main.c | 3 +-
drivers/net/ethernet/dec/tulip/de2104x.c | 2 +-
drivers/net/ethernet/freescale/fec_mpc52xx.c | 2 +-
drivers/net/ethernet/freescale/ucc_geth.c | 2 +
drivers/net/ethernet/marvell/skge.c | 6 +-
drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c | 6 +-
drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.h | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 4 +
drivers/net/ethernet/mellanox/mlx5/core/en_rep.h | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 52 ++++---
drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 6 +
drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 1 +
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 17 ++-
drivers/net/ethernet/qlogic/qed/qed.h | 2 +-
drivers/net/ethernet/qlogic/qed/qed_l2.c | 8 +-
drivers/net/ethernet/qlogic/qed/qed_sp.h | 1 +
drivers/net/ethernet/qlogic/qed/qed_sp_commands.c | 3 +
drivers/net/ethernet/qlogic/qed/qed_spq.c | 15 +-
drivers/net/ethernet/qlogic/qede/qede.h | 5 +-
drivers/net/ethernet/qlogic/qede/qede_fp.c | 13 ++
drivers/net/ethernet/qlogic/qede/qede_main.c | 3 +
drivers/net/ethernet/smsc/epic100.c | 2 +-
drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 14 +-
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 17 ++-
drivers/net/ethernet/sun/cassini.c | 2 +-
drivers/net/ethernet/sun/sunbmac.c | 2 +-
drivers/net/ethernet/sun/sunhme.c | 2 +-
drivers/net/ethernet/tehuti/tehuti.c | 2 +-
drivers/net/ethernet/via/via-velocity.c | 2 +-
drivers/net/fddi/defxx.c | 2 +-
drivers/net/geneve.c | 10 +-
drivers/net/ieee802154/mcr20a.c | 6 +-
drivers/net/ipvlan/ipvlan_main.c | 6 +-
drivers/net/phy/dp83640.c | 13 +-
drivers/net/phy/marvell.c | 16 --
drivers/net/tun.c | 3 +-
drivers/net/virtio_net.c | 171 +++++++++++++++------
drivers/net/wan/dscc4.c | 2 +-
drivers/net/wan/fsl_ucc_hdlc.c | 2 +-
drivers/net/wireless/ath/ath10k/core.c | 2 +-
drivers/net/wireless/intel/iwlwifi/Kconfig | 3 +-
drivers/net/wireless/mediatek/mt76/mt76x0/eeprom.c | 40 +++--
drivers/net/wireless/mediatek/mt76/mt76x0/eeprom.h | 2 +-
drivers/net/wireless/mediatek/mt76/mt76x0/phy.c | 10 +-
drivers/net/wireless/ti/wlcore/sdio.c | 15 +-
drivers/s390/net/qeth_core.h | 3 +-
drivers/s390/net/qeth_core_main.c | 31 ++--
drivers/s390/net/qeth_l2_main.c | 8 +-
drivers/s390/net/qeth_l3_main.c | 3 +
include/linux/filter.h | 21 ++-
include/linux/netdevice.h | 8 +
include/linux/stmmac.h | 1 +
include/net/l3mdev.h | 3 +-
include/net/netfilter/nf_tables.h | 17 ++-
kernel/bpf/btf.c | 3 +-
kernel/bpf/cgroup.c | 2 +-
kernel/bpf/hashtab.c | 4 +-
kernel/bpf/percpu_freelist.c | 41 +++--
kernel/bpf/percpu_freelist.h | 4 +
kernel/bpf/syscall.c | 12 +-
kernel/trace/bpf_trace.c | 14 +-
lib/test_rhashtable.c | 23 ++-
net/batman-adv/bat_v_elp.c | 3 +
net/batman-adv/hard-interface.c | 5 +-
net/batman-adv/soft-interface.c | 2 +
net/core/filter.c | 2 +
net/core/skmsg.c | 3 +-
net/dccp/ccid.h | 4 +-
net/dsa/master.c | 4 +
net/dsa/slave.c | 17 ++-
net/ipv4/ip_gre.c | 7 +-
net/ipv6/ip6_gre.c | 7 +-
net/ipv6/netfilter.c | 4 +-
net/ipv6/seg6_iptunnel.c | 2 +
net/ipv6/sit.c | 3 +-
net/l2tp/l2tp_core.c | 9 +-
net/l2tp/l2tp_core.h | 20 +++
net/l2tp/l2tp_ip.c | 3 +
net/l2tp/l2tp_ip6.c | 3 +
net/mac80211/tx.c | 12 +-
net/netfilter/nf_conntrack_core.c | 16 ++
net/netfilter/nf_tables_api.c | 85 +++++------
net/netfilter/nft_compat.c | 62 +++-----
net/netfilter/nft_dynset.c | 18 +--
net/netfilter/nft_immediate.c | 6 +-
net/netfilter/nft_lookup.c | 18 +--
net/netfilter/nft_objref.c | 18 +--
net/rds/bind.c | 6 +-
net/rxrpc/recvmsg.c | 3 +-
net/sched/cls_flower.c | 6 +-
net/sctp/socket.c | 4 +-
net/sctp/stream.c | 20 +++
net/smc/af_smc.c | 11 +-
net/smc/smc_cdc.c | 21 ++-
net/smc/smc_cdc.h | 34 ++++-
net/smc/smc_clc.c | 2 +-
net/smc/smc_close.c | 9 +-
net/smc/smc_core.c | 6 +-
net/smc/smc_core.h | 20 +++
net/smc/smc_ib.c | 6 +-
net/smc/smc_llc.c | 3 +-
net/smc/smc_pnet.c | 2 +-
net/smc/smc_tx.c | 64 ++++----
net/smc/smc_wr.c | 46 +++++-
net/smc/smc_wr.h | 1 +
net/socket.c | 82 +++++++---
net/vmw_vsock/virtio_transport.c | 29 +++-
net/wireless/ap.c | 2 +
net/wireless/core.h | 2 +
net/wireless/sme.c | 2 +-
tools/bpf/bpftool/common.c | 6 +-
tools/bpf/bpftool/map.c | 33 ++--
tools/bpf/bpftool/prog.c | 5 +-
tools/testing/selftests/bpf/bpf_util.h | 30 ++--
tools/testing/selftests/bpf/test_btf.c | 9 +-
tools/testing/selftests/netfilter/Makefile | 2 +-
tools/testing/selftests/netfilter/config | 2 +-
tools/testing/selftests/netfilter/nft_nat.sh | 762 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
127 files changed, 1802 insertions(+), 544 deletions(-)
create mode 100755 tools/testing/selftests/netfilter/nft_nat.sh
* Re: [PATCH net-next 1/2] Revert "devlink: Add a generic wake_on_lan port parameter"
From: kbuild test robot @ 2019-02-08 11:00 UTC (permalink / raw)
To: Vasundhara Volam; +Cc: kbuild-all, davem, michael.chan, jiri, netdev
In-Reply-To: <1549617190-387130-2-git-send-email-vasundhara-v.volam@broadcom.com>
Hi Vasundhara,
Thank you for the patch! There is still something to improve:
[auto build test ERROR on net-next/master]
url: https://github.com/0day-ci/linux/commits/Vasundhara-Volam/Revert-wake_on_lan-devlink-parameter/20190208-181949
config: i386-randconfig-x000-201905 (attached as .config)
compiler: gcc-8 (Debian 8.2.0-14) 8.2.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
Note: the linux-review/Vasundhara-Volam/Revert-wake_on_lan-devlink-parameter/20190208-181949 HEAD caa636fa491621c75cb625cb981adfe514368a45 builds fine.
It only hurts bisectability.
All errors/warnings (new ones prefixed by >>):
>> drivers/net//ethernet/broadcom/bnxt/bnxt_devlink.c:41:3: error: 'DEVLINK_PARAM_GENERIC_ID_WOL' undeclared here (not in a function); did you mean 'DEVLINK_PARAM_GENERIC_ID_MAX'?
{DEVLINK_PARAM_GENERIC_ID_WOL, NVM_OFF_WOL, BNXT_NVM_PORT_CFG, 1},
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
DEVLINK_PARAM_GENERIC_ID_MAX
drivers/net//ethernet/broadcom/bnxt/bnxt_devlink.c: In function 'bnxt_hwrm_nvm_req':
>> drivers/net//ethernet/broadcom/bnxt/bnxt_devlink.c:76:20: warning: comparison between pointer and integer
nvm_param.id != DEVLINK_PARAM_GENERIC_ID_WOL)
^~
drivers/net//ethernet/broadcom/bnxt/bnxt_devlink.c: In function 'bnxt_dl_wol_validate':
>> drivers/net//ethernet/broadcom/bnxt/bnxt_devlink.c:174:28: error: 'DEVLINK_PARAM_WAKE_MAGIC' undeclared (first use in this function); did you mean 'DEVLINK_PARAM_CMODE_MAX'?
if (val.vu8 && val.vu8 != DEVLINK_PARAM_WAKE_MAGIC) {
^~~~~~~~~~~~~~~~~~~~~~~~
DEVLINK_PARAM_CMODE_MAX
drivers/net//ethernet/broadcom/bnxt/bnxt_devlink.c:174:28: note: each undeclared identifier is reported only once for each function it appears in
drivers/net//ethernet/broadcom/bnxt/bnxt_devlink.c:174:25: warning: comparison between pointer and integer
if (val.vu8 && val.vu8 != DEVLINK_PARAM_WAKE_MAGIC) {
^~
In file included from drivers/net//ethernet/broadcom/bnxt/bnxt.h:23,
from drivers/net//ethernet/broadcom/bnxt/bnxt_devlink.c:13:
drivers/net//ethernet/broadcom/bnxt/bnxt_devlink.c: At top level:
>> include/net/devlink.h:404:8: warning: initialization of 'unsigned int' from 'const struct bnxt_dl_nvm_param *' makes integer from pointer without a cast [-Wint-conversion]
.id = DEVLINK_PARAM_GENERIC_ID_##_id, \
^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net//ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
include/net/devlink.h:404:8: note: (near initialization for 'bnxt_dl_port_params[0].id')
.id = DEVLINK_PARAM_GENERIC_ID_##_id, \
^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net//ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
>> include/net/devlink.h:404:8: error: initializer element is not constant
.id = DEVLINK_PARAM_GENERIC_ID_##_id, \
^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net//ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
include/net/devlink.h:404:8: note: (near initialization for 'bnxt_dl_port_params[0].id')
.id = DEVLINK_PARAM_GENERIC_ID_##_id, \
^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net//ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
>> include/net/devlink.h:405:10: error: 'DEVLINK_PARAM_GENERIC_WOL_NAME' undeclared here (not in a function); did you mean 'DEVLINK_PARAM_GENERIC_ID_MAX'?
.name = DEVLINK_PARAM_GENERIC_##_id##_NAME, \
^~~~~~~~~~~~~~~~~~~~~~
drivers/net//ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
>> include/net/devlink.h:405:10: error: initialization of 'const char *' from incompatible pointer type 'const struct bnxt_dl_nvm_param *' [-Werror=incompatible-pointer-types]
.name = DEVLINK_PARAM_GENERIC_##_id##_NAME, \
^~~~~~~~~~~~~~~~~~~~~~
drivers/net//ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
include/net/devlink.h:405:10: note: (near initialization for 'bnxt_dl_port_params[0].name')
.name = DEVLINK_PARAM_GENERIC_##_id##_NAME, \
^~~~~~~~~~~~~~~~~~~~~~
drivers/net//ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
include/net/devlink.h:405:10: error: initializer element is not constant
.name = DEVLINK_PARAM_GENERIC_##_id##_NAME, \
^~~~~~~~~~~~~~~~~~~~~~
drivers/net//ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
include/net/devlink.h:405:10: note: (near initialization for 'bnxt_dl_port_params[0].name')
.name = DEVLINK_PARAM_GENERIC_##_id##_NAME, \
^~~~~~~~~~~~~~~~~~~~~~
drivers/net//ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
>> include/net/devlink.h:406:10: error: 'DEVLINK_PARAM_GENERIC_WOL_TYPE' undeclared here (not in a function); did you mean 'DEVLINK_PARAM_GENERIC_ID_MAX'?
.type = DEVLINK_PARAM_GENERIC_##_id##_TYPE, \
^~~~~~~~~~~~~~~~~~~~~~
drivers/net//ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
>> include/net/devlink.h:406:10: error: incompatible types when initializing type 'enum devlink_param_type' using type 'const struct bnxt_dl_nvm_param *'
.type = DEVLINK_PARAM_GENERIC_##_id##_TYPE, \
^~~~~~~~~~~~~~~~~~~~~~
drivers/net//ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
--
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:41:3: error: 'DEVLINK_PARAM_GENERIC_ID_WOL' undeclared here (not in a function); did you mean 'DEVLINK_PARAM_GENERIC_ID_MAX'?
{DEVLINK_PARAM_GENERIC_ID_WOL, NVM_OFF_WOL, BNXT_NVM_PORT_CFG, 1},
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
DEVLINK_PARAM_GENERIC_ID_MAX
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c: In function 'bnxt_hwrm_nvm_req':
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:76:20: warning: comparison between pointer and integer
nvm_param.id != DEVLINK_PARAM_GENERIC_ID_WOL)
^~
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c: In function 'bnxt_dl_wol_validate':
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:174:28: error: 'DEVLINK_PARAM_WAKE_MAGIC' undeclared (first use in this function); did you mean 'DEVLINK_PARAM_CMODE_MAX'?
if (val.vu8 && val.vu8 != DEVLINK_PARAM_WAKE_MAGIC) {
^~~~~~~~~~~~~~~~~~~~~~~~
DEVLINK_PARAM_CMODE_MAX
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:174:28: note: each undeclared identifier is reported only once for each function it appears in
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:174:25: warning: comparison between pointer and integer
if (val.vu8 && val.vu8 != DEVLINK_PARAM_WAKE_MAGIC) {
^~
In file included from drivers/net/ethernet/broadcom/bnxt/bnxt.h:23,
from drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:13:
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c: At top level:
>> include/net/devlink.h:404:8: warning: initialization of 'unsigned int' from 'const struct bnxt_dl_nvm_param *' makes integer from pointer without a cast [-Wint-conversion]
.id = DEVLINK_PARAM_GENERIC_ID_##_id, \
^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
include/net/devlink.h:404:8: note: (near initialization for 'bnxt_dl_port_params[0].id')
.id = DEVLINK_PARAM_GENERIC_ID_##_id, \
^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
>> include/net/devlink.h:404:8: error: initializer element is not constant
.id = DEVLINK_PARAM_GENERIC_ID_##_id, \
^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
include/net/devlink.h:404:8: note: (near initialization for 'bnxt_dl_port_params[0].id')
.id = DEVLINK_PARAM_GENERIC_ID_##_id, \
^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
>> include/net/devlink.h:405:10: error: 'DEVLINK_PARAM_GENERIC_WOL_NAME' undeclared here (not in a function); did you mean 'DEVLINK_PARAM_GENERIC_ID_MAX'?
.name = DEVLINK_PARAM_GENERIC_##_id##_NAME, \
^~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
>> include/net/devlink.h:405:10: error: initialization of 'const char *' from incompatible pointer type 'const struct bnxt_dl_nvm_param *' [-Werror=incompatible-pointer-types]
.name = DEVLINK_PARAM_GENERIC_##_id##_NAME, \
^~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
include/net/devlink.h:405:10: note: (near initialization for 'bnxt_dl_port_params[0].name')
.name = DEVLINK_PARAM_GENERIC_##_id##_NAME, \
^~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
include/net/devlink.h:405:10: error: initializer element is not constant
.name = DEVLINK_PARAM_GENERIC_##_id##_NAME, \
^~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
include/net/devlink.h:405:10: note: (near initialization for 'bnxt_dl_port_params[0].name')
.name = DEVLINK_PARAM_GENERIC_##_id##_NAME, \
^~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
>> include/net/devlink.h:406:10: error: 'DEVLINK_PARAM_GENERIC_WOL_TYPE' undeclared here (not in a function); did you mean 'DEVLINK_PARAM_GENERIC_ID_MAX'?
.type = DEVLINK_PARAM_GENERIC_##_id##_TYPE, \
^~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
>> include/net/devlink.h:406:10: error: incompatible types when initializing type 'enum devlink_param_type' using type 'const struct bnxt_dl_nvm_param *'
.type = DEVLINK_PARAM_GENERIC_##_id##_TYPE, \
^~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:206:2: note: in expansion of macro 'DEVLINK_PARAM_GENERIC'
DEVLINK_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
^~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
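Most of the errors in the log above come from one mechanism: the DEVLINK_PARAM_GENERIC macro builds its .id, .name and .type initializers by token pasting, so a use such as DEVLINK_PARAM_GENERIC(WOL, ...) only compiles while the matching DEVLINK_PARAM_GENERIC_ID_WOL, DEVLINK_PARAM_GENERIC_WOL_NAME and DEVLINK_PARAM_GENERIC_WOL_TYPE definitions exist. The following is a minimal userspace sketch of that pattern, not the kernel code. Every DEMO_* name is hypothetical, and the struct is reduced to two fields for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for the generic parameter ID enum.
 * The WOL entry is deliberately absent, as it is after the revert. */
enum demo_param_generic_id {
	DEMO_PARAM_GENERIC_ID_ENABLE_SRIOV,
	DEMO_PARAM_GENERIC_ID_IGNORE_ARI,
	DEMO_PARAM_GENERIC_ID_MAX,
};

/* One name macro per generic ID; the pasted token must match exactly. */
#define DEMO_PARAM_GENERIC_ENABLE_SRIOV_NAME "enable_sriov"
#define DEMO_PARAM_GENERIC_IGNORE_ARI_NAME   "ignore_ari"

struct demo_param {
	unsigned int id;
	const char *name;
};

/* Same shape as the macro lines quoted in the log: both fields are
 * derived from the single _id argument with the ## operator. */
#define DEMO_PARAM_GENERIC(_id)				\
{							\
	.id = DEMO_PARAM_GENERIC_ID_##_id,		\
	.name = DEMO_PARAM_GENERIC_##_id##_NAME,	\
}

static const struct demo_param demo_params[] = {
	DEMO_PARAM_GENERIC(ENABLE_SRIOV),
	DEMO_PARAM_GENERIC(IGNORE_ARI),
	/* DEMO_PARAM_GENERIC(WOL) would paste DEMO_PARAM_GENERIC_ID_WOL and
	 * DEMO_PARAM_GENERIC_WOL_NAME, neither of which is declared here,
	 * so the compiler would reject the initializer. */
};

/* Look up a parameter name by ID; returns NULL if the ID is not in the table. */
const char *demo_param_name(unsigned int id)
{
	size_t i;

	for (i = 0; i < sizeof(demo_params) / sizeof(demo_params[0]); i++)
		if (demo_params[i].id == id)
			return demo_params[i].name;
	return NULL;
}
```

Uncommenting the WOL entry in the table should reproduce the "undeclared here (not in a function)" class of error seen above. This is why a revert that removes the generic definitions has to land together with, or after, the removal of the driver's users.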
vim +41 drivers/net//ethernet/broadcom/bnxt/bnxt_devlink.c
2dc0865e Vasundhara Volam 2018-10-04 28
6354b95e Vasundhara Volam 2018-07-04 29 static const struct bnxt_dl_nvm_param nvm_params[] = {
6354b95e Vasundhara Volam 2018-07-04 30 {DEVLINK_PARAM_GENERIC_ID_ENABLE_SRIOV, NVM_OFF_ENABLE_SRIOV,
6354b95e Vasundhara Volam 2018-07-04 31 BNXT_NVM_SHARED_CFG, 1},
7d859234 Vasundhara Volam 2018-10-04 32 {DEVLINK_PARAM_GENERIC_ID_IGNORE_ARI, NVM_OFF_IGNORE_ARI,
7d859234 Vasundhara Volam 2018-10-04 33 BNXT_NVM_SHARED_CFG, 1},
f399e849 Vasundhara Volam 2018-10-04 34 {DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MAX,
f399e849 Vasundhara Volam 2018-10-04 35 NVM_OFF_MSIX_VEC_PER_PF_MAX, BNXT_NVM_SHARED_CFG, 10},
f399e849 Vasundhara Volam 2018-10-04 36 {DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MIN,
f399e849 Vasundhara Volam 2018-10-04 37 NVM_OFF_MSIX_VEC_PER_PF_MIN, BNXT_NVM_SHARED_CFG, 7},
2dc0865e Vasundhara Volam 2018-10-04 38 {BNXT_DEVLINK_PARAM_ID_GRE_VER_CHECK, NVM_OFF_DIS_GRE_VER_CHECK,
2dc0865e Vasundhara Volam 2018-10-04 39 BNXT_NVM_SHARED_CFG, 1},
782a624d Vasundhara Volam 2019-01-28 40
782a624d Vasundhara Volam 2019-01-28 @41 {DEVLINK_PARAM_GENERIC_ID_WOL, NVM_OFF_WOL, BNXT_NVM_PORT_CFG, 1},
6354b95e Vasundhara Volam 2018-07-04 42 };
6354b95e Vasundhara Volam 2018-07-04 43
6354b95e Vasundhara Volam 2018-07-04 44 static int bnxt_hwrm_nvm_req(struct bnxt *bp, u32 param_id, void *msg,
6354b95e Vasundhara Volam 2018-07-04 45 int msg_len, union devlink_param_value *val)
6354b95e Vasundhara Volam 2018-07-04 46 {
6fc92c33 Michael Chan 2018-08-05 47 struct hwrm_nvm_get_variable_input *req = msg;
6354b95e Vasundhara Volam 2018-07-04 48 void *data_addr = NULL, *buf = NULL;
6354b95e Vasundhara Volam 2018-07-04 49 struct bnxt_dl_nvm_param nvm_param;
6354b95e Vasundhara Volam 2018-07-04 50 int bytesize, idx = 0, rc, i;
6354b95e Vasundhara Volam 2018-07-04 51 dma_addr_t data_dma_addr;
6354b95e Vasundhara Volam 2018-07-04 52
6354b95e Vasundhara Volam 2018-07-04 53 /* Get/Set NVM CFG parameter is supported only on PFs */
6354b95e Vasundhara Volam 2018-07-04 54 if (BNXT_VF(bp))
6354b95e Vasundhara Volam 2018-07-04 55 return -EPERM;
6354b95e Vasundhara Volam 2018-07-04 56
6354b95e Vasundhara Volam 2018-07-04 57 for (i = 0; i < ARRAY_SIZE(nvm_params); i++) {
6354b95e Vasundhara Volam 2018-07-04 58 if (nvm_params[i].id == param_id) {
6354b95e Vasundhara Volam 2018-07-04 59 nvm_param = nvm_params[i];
6354b95e Vasundhara Volam 2018-07-04 60 break;
6354b95e Vasundhara Volam 2018-07-04 61 }
6354b95e Vasundhara Volam 2018-07-04 62 }
6354b95e Vasundhara Volam 2018-07-04 63
65fac4fe zhong jiang 2018-09-18 64 if (i == ARRAY_SIZE(nvm_params))
65fac4fe zhong jiang 2018-09-18 65 return -EOPNOTSUPP;
65fac4fe zhong jiang 2018-09-18 66
6354b95e Vasundhara Volam 2018-07-04 67 if (nvm_param.dir_type == BNXT_NVM_PORT_CFG)
6354b95e Vasundhara Volam 2018-07-04 68 idx = bp->pf.port_id;
6354b95e Vasundhara Volam 2018-07-04 69 else if (nvm_param.dir_type == BNXT_NVM_FUNC_CFG)
6354b95e Vasundhara Volam 2018-07-04 70 idx = bp->pf.fw_fid - BNXT_FIRST_PF_FID;
6354b95e Vasundhara Volam 2018-07-04 71
6354b95e Vasundhara Volam 2018-07-04 72 bytesize = roundup(nvm_param.num_bits, BITS_PER_BYTE) / BITS_PER_BYTE;
f399e849 Vasundhara Volam 2018-10-04 73 switch (bytesize) {
f399e849 Vasundhara Volam 2018-10-04 74 case 1:
782a624d Vasundhara Volam 2019-01-28 75 if (nvm_param.num_bits == 1 &&
782a624d Vasundhara Volam 2019-01-28 @76 nvm_param.id != DEVLINK_PARAM_GENERIC_ID_WOL)
6354b95e Vasundhara Volam 2018-07-04 77 buf = &val->vbool;
f399e849 Vasundhara Volam 2018-10-04 78 else
f399e849 Vasundhara Volam 2018-10-04 79 buf = &val->vu8;
f399e849 Vasundhara Volam 2018-10-04 80 break;
f399e849 Vasundhara Volam 2018-10-04 81 case 2:
f399e849 Vasundhara Volam 2018-10-04 82 buf = &val->vu16;
f399e849 Vasundhara Volam 2018-10-04 83 break;
f399e849 Vasundhara Volam 2018-10-04 84 case 4:
f399e849 Vasundhara Volam 2018-10-04 85 buf = &val->vu32;
f399e849 Vasundhara Volam 2018-10-04 86 break;
f399e849 Vasundhara Volam 2018-10-04 87 default:
f399e849 Vasundhara Volam 2018-10-04 88 return -EFAULT;
f399e849 Vasundhara Volam 2018-10-04 89 }
6354b95e Vasundhara Volam 2018-07-04 90
750afb08 Luis Chamberlain 2019-01-04 91 data_addr = dma_alloc_coherent(&bp->pdev->dev, bytesize,
6354b95e Vasundhara Volam 2018-07-04 92 &data_dma_addr, GFP_KERNEL);
6354b95e Vasundhara Volam 2018-07-04 93 if (!data_addr)
6354b95e Vasundhara Volam 2018-07-04 94 return -ENOMEM;
6354b95e Vasundhara Volam 2018-07-04 95
6fc92c33 Michael Chan 2018-08-05 96 req->dest_data_addr = cpu_to_le64(data_dma_addr);
6354b95e Vasundhara Volam 2018-07-04 97 req->data_len = cpu_to_le16(nvm_param.num_bits);
6354b95e Vasundhara Volam 2018-07-04 98 req->option_num = cpu_to_le16(nvm_param.offset);
6354b95e Vasundhara Volam 2018-07-04 99 req->index_0 = cpu_to_le16(idx);
6354b95e Vasundhara Volam 2018-07-04 100 if (idx)
6354b95e Vasundhara Volam 2018-07-04 101 req->dimensions = cpu_to_le16(1);
6354b95e Vasundhara Volam 2018-07-04 102
6fc92c33 Michael Chan 2018-08-05 103 if (req->req_type == cpu_to_le16(HWRM_NVM_SET_VARIABLE))
6354b95e Vasundhara Volam 2018-07-04 104 memcpy(data_addr, buf, bytesize);
6354b95e Vasundhara Volam 2018-07-04 105
6354b95e Vasundhara Volam 2018-07-04 106 rc = hwrm_send_message(bp, msg, msg_len, HWRM_CMD_TIMEOUT);
6fc92c33 Michael Chan 2018-08-05 107 if (!rc && req->req_type == cpu_to_le16(HWRM_NVM_GET_VARIABLE))
6354b95e Vasundhara Volam 2018-07-04 108 memcpy(buf, data_addr, bytesize);
6354b95e Vasundhara Volam 2018-07-04 109
6354b95e Vasundhara Volam 2018-07-04 110 dma_free_coherent(&bp->pdev->dev, bytesize, data_addr, data_dma_addr);
3a1d52a5 Vasundhara Volam 2018-10-04 111 if (rc == HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED) {
3a1d52a5 Vasundhara Volam 2018-10-04 112 netdev_err(bp->dev, "PF does not have admin privileges to modify NVM config\n");
3a1d52a5 Vasundhara Volam 2018-10-04 113 return -EACCES;
3a1d52a5 Vasundhara Volam 2018-10-04 114 } else if (rc) {
6354b95e Vasundhara Volam 2018-07-04 115 return -EIO;
3a1d52a5 Vasundhara Volam 2018-10-04 116 }
6354b95e Vasundhara Volam 2018-07-04 117 return 0;
6354b95e Vasundhara Volam 2018-07-04 118 }
6354b95e Vasundhara Volam 2018-07-04 119
6354b95e Vasundhara Volam 2018-07-04 120 static int bnxt_dl_nvm_param_get(struct devlink *dl, u32 id,
6354b95e Vasundhara Volam 2018-07-04 121 struct devlink_param_gset_ctx *ctx)
6354b95e Vasundhara Volam 2018-07-04 122 {
6354b95e Vasundhara Volam 2018-07-04 123 struct hwrm_nvm_get_variable_input req = {0};
6354b95e Vasundhara Volam 2018-07-04 124 struct bnxt *bp = bnxt_get_bp_from_dl(dl);
2dc0865e Vasundhara Volam 2018-10-04 125 int rc;
6354b95e Vasundhara Volam 2018-07-04 126
6354b95e Vasundhara Volam 2018-07-04 127 bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_NVM_GET_VARIABLE, -1, -1);
2dc0865e Vasundhara Volam 2018-10-04 128 rc = bnxt_hwrm_nvm_req(bp, id, &req, sizeof(req), &ctx->val);
2dc0865e Vasundhara Volam 2018-10-04 129 if (!rc)
2dc0865e Vasundhara Volam 2018-10-04 130 if (id == BNXT_DEVLINK_PARAM_ID_GRE_VER_CHECK)
2dc0865e Vasundhara Volam 2018-10-04 131 ctx->val.vbool = !ctx->val.vbool;
2dc0865e Vasundhara Volam 2018-10-04 132
2dc0865e Vasundhara Volam 2018-10-04 133 return rc;
6354b95e Vasundhara Volam 2018-07-04 134 }
6354b95e Vasundhara Volam 2018-07-04 135
6354b95e Vasundhara Volam 2018-07-04 136 static int bnxt_dl_nvm_param_set(struct devlink *dl, u32 id,
6354b95e Vasundhara Volam 2018-07-04 137 struct devlink_param_gset_ctx *ctx)
6354b95e Vasundhara Volam 2018-07-04 138 {
6354b95e Vasundhara Volam 2018-07-04 139 struct hwrm_nvm_set_variable_input req = {0};
6354b95e Vasundhara Volam 2018-07-04 140 struct bnxt *bp = bnxt_get_bp_from_dl(dl);
6354b95e Vasundhara Volam 2018-07-04 141
6354b95e Vasundhara Volam 2018-07-04 142 bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_NVM_SET_VARIABLE, -1, -1);
2dc0865e Vasundhara Volam 2018-10-04 143
2dc0865e Vasundhara Volam 2018-10-04 144 if (id == BNXT_DEVLINK_PARAM_ID_GRE_VER_CHECK)
2dc0865e Vasundhara Volam 2018-10-04 145 ctx->val.vbool = !ctx->val.vbool;
2dc0865e Vasundhara Volam 2018-10-04 146
6354b95e Vasundhara Volam 2018-07-04 147 return bnxt_hwrm_nvm_req(bp, id, &req, sizeof(req), &ctx->val);
6354b95e Vasundhara Volam 2018-07-04 148 }
6354b95e Vasundhara Volam 2018-07-04 149
f399e849 Vasundhara Volam 2018-10-04 150 static int bnxt_dl_msix_validate(struct devlink *dl, u32 id,
f399e849 Vasundhara Volam 2018-10-04 151 union devlink_param_value val,
f399e849 Vasundhara Volam 2018-10-04 152 struct netlink_ext_ack *extack)
f399e849 Vasundhara Volam 2018-10-04 153 {
5fc7c12f Gustavo A. R. Silva 2018-10-05 154 int max_val = -1;
f399e849 Vasundhara Volam 2018-10-04 155
f399e849 Vasundhara Volam 2018-10-04 156 if (id == DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MAX)
f399e849 Vasundhara Volam 2018-10-04 157 max_val = BNXT_MSIX_VEC_MAX;
f399e849 Vasundhara Volam 2018-10-04 158
f399e849 Vasundhara Volam 2018-10-04 159 if (id == DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MIN)
f399e849 Vasundhara Volam 2018-10-04 160 max_val = BNXT_MSIX_VEC_MIN_MAX;
f399e849 Vasundhara Volam 2018-10-04 161
5fc7c12f Gustavo A. R. Silva 2018-10-05 162 if (val.vu32 > max_val) {
f399e849 Vasundhara Volam 2018-10-04 163 NL_SET_ERR_MSG_MOD(extack, "MSIX value is exceeding the range");
f399e849 Vasundhara Volam 2018-10-04 164 return -EINVAL;
f399e849 Vasundhara Volam 2018-10-04 165 }
f399e849 Vasundhara Volam 2018-10-04 166
f399e849 Vasundhara Volam 2018-10-04 167 return 0;
f399e849 Vasundhara Volam 2018-10-04 168 }
f399e849 Vasundhara Volam 2018-10-04 169
782a624d Vasundhara Volam 2019-01-28 170 static int bnxt_dl_wol_validate(struct devlink *dl, u32 id,
782a624d Vasundhara Volam 2019-01-28 171 union devlink_param_value val,
782a624d Vasundhara Volam 2019-01-28 172 struct netlink_ext_ack *extack)
782a624d Vasundhara Volam 2019-01-28 173 {
782a624d Vasundhara Volam 2019-01-28 @174 if (val.vu8 && val.vu8 != DEVLINK_PARAM_WAKE_MAGIC) {
782a624d Vasundhara Volam 2019-01-28 175 NL_SET_ERR_MSG_MOD(extack, "WOL type is not supported");
782a624d Vasundhara Volam 2019-01-28 176 return -EINVAL;
782a624d Vasundhara Volam 2019-01-28 177 }
782a624d Vasundhara Volam 2019-01-28 178 return 0;
782a624d Vasundhara Volam 2019-01-28 179 }
782a624d Vasundhara Volam 2019-01-28 180
:::::: The code at line 41 was first introduced by commit
:::::: 782a624d00fa22e7499f5abc29747501ec671313 bnxt_en: Add bnxt_en initial port params table and register it
:::::: TO: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
:::::: CC: David S. Miller <davem@davemloft.net>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34458 bytes --]
* RE: [EXT] Re: [PATCH net-next 0/2] qed*: SmartAN query support
From: Sudarsana Reddy Kalluru @ 2019-02-08 11:32 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem@davemloft.net, netdev@vger.kernel.org, Ariel Elior,
Michal Kalderon
In-Reply-To: <20190207115405.02563e74@cakuba.netronome.com>
>-----Original Message-----
>From: Jakub Kicinski [mailto:jakub.kicinski@netronome.com]
>Sent: 08 February 2019 01:24
>To: Sudarsana Reddy Kalluru <skalluru@marvell.com>
>Cc: davem@davemloft.net; netdev@vger.kernel.org; Ariel Elior
><aelior@marvell.com>; Michal Kalderon <mkalderon@marvell.com>
>Subject: [EXT] Re: [PATCH net-next 0/2] qed*: SmartAN query support
>
>External Email
>
>----------------------------------------------------------------------
>On Thu, 7 Feb 2019 06:20:10 -0800, Sudarsana Reddy Kalluru wrote:
>> SmartAN feature detects the peer/cable capabilities and establishes
>> the link in the best possible configuration.
>
>It sounds familiar, I need to check with FW team, but I think we may be doing
>a similar thing, and adding a common API rather than ethtool flag would be
>preferable.
>
>Could you please share a little bit more detail? What are the configurations
>this would choose between?
Jakub,
The following document describes this feature in detail. We simply need a flag to display whether the feature is enabled in the hardware, hence adding it to "ethtool --show-priv-flags".
https://www.cavium.com/Dell/Documents/Converged/TB_Establishing_Adaptive_Links_with_SmartAN_Dell.pdf
****
When an administrator first plugs a device (discrete optical module or Active Optical Cable assembly or DAC) into the Cavium FastLinQ adapter SFP/QSFP interface, the Cavium SmartAN technology reads the device type (discrete optics or AOC or DAC), what its speed rating is, special optics data (such as what FEC mode is required or if it is multi-speed capable), DAC CA-x type, and DAC length.
Armed with this information, the FastLinQ adapter attempts each possible mode (supported by that device) until it secures a link with the connected link partner (switch), without input from the end user.
****
Thanks,
Sudarsana
* Re: Kernel 5.0-rc5 regression with NAT, bisected to: netfilter: nat: remove l4proto->manip_pkt
From: Florian Westphal @ 2019-02-08 11:54 UTC (permalink / raw)
To: Florian Westphal
Cc: Sander Eikelenboom, Pablo Neira Ayuso, David S. Miller, netdev,
linux-kernel
In-Reply-To: <20190208070710.rcbj6exqwz6m2o7o@breakpoint.cc>
Florian Westphal <fw@strlen.de> wrote:
> Sander Eikelenboom <linux@eikelenboom.it> wrote:
> > L.S.,
> >
> > While trying out a 5.0-RC5 kernel I seem to have stumbled over a regression with NAT.
> > (using an nftables firewall with NAT and connection tracking).
> >
> > Unfortunately it isn't too obvious since no errors are logged, but on clients it
> > causes symptoms like firefox intermittently not being able to load pages with:
> > Network Protocol Error
> > An error occurred during a connection to www.example.com
> > The page you are trying to view cannot be shown because an error in the network protocol was detected.
> > Please contact the website owners to inform them of this problem.
> >
> > But it's only intermittent, so I can still visit some webpages with clients,
> > could it be that packet size and/or fragments are at play?
> >
> > So I tried testing with git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git with
> > e8c32c32b48c2e889704d8ca0872f92eb027838e as last commit, to be sure to have the latest netdev has to offer,
> > but to no avail.
> >
> > After that I tried to git bisect and ended up with:
> >
> > faec18dbb0405c7d4dda025054511dc3a6696918 is the first bad commit
> > commit faec18dbb0405c7d4dda025054511dc3a6696918
> > Author: Florian Westphal <fw@strlen.de>
> > Date: Thu Dec 13 16:01:33 2018 +0100
> >
> > netfilter: nat: remove l4proto->manip_pkt
>
> Thanks, this is immensely helpful.
>
> I think I see the bug, we can't use target->dst.protonum in
> nf_nat_l4proto_manip_pkt(), it will be TCP in case we're dealing
> with a related icmp packet.
>
> I will send a patch in a few hours when I get back.
Sander, does this patch fix things for you?
Thanks!
diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
--- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
@@ -215,6 +215,7 @@ int nf_nat_icmp_reply_translation(struct sk_buff *skb,
/* Change outer to look like the reply to an incoming packet */
nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
+ target.dst.protonum = IPPROTO_ICMP;
if (!nf_nat_ipv4_manip_pkt(skb, 0, &target, manip))
return 0;
diff --git a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
--- a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
@@ -226,6 +226,7 @@ int nf_nat_icmpv6_reply_translation(struct sk_buff *skb,
}
nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
+ target.dst.protonum = IPPROTO_ICMPV6;
if (!nf_nat_ipv6_manip_pkt(skb, 0, &target, manip))
return 0;
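
For readers following the diagnosis quoted above, here is a toy model of why the
protocol number has to be overridden. It is only a sketch: struct toy_tuple,
toy_l4_header_for() and toy_outer_target() are hypothetical names invented for
illustration and are not the netfilter data structures or functions. The
assumption is that the tuple taken from the conntrack entry describes the
original TCP flow, so a dispatcher keyed on its protonum would treat the outer
ICMP error packet as TCP unless the protocol is set explicitly.

```c
#include <stddef.h>

/* Values match the IANA protocol numbers; the names are made up. */
enum toy_proto { TOY_PROTO_ICMP = 1, TOY_PROTO_TCP = 6 };

/* Hypothetical, heavily reduced stand-in for a conntrack tuple. */
struct toy_tuple {
	int protonum;
};

/* Which layer-4 header layout a protonum-keyed dispatcher would assume. */
static const char *toy_l4_header_for(const struct toy_tuple *t)
{
	switch (t->protonum) {
	case TOY_PROTO_TCP:
		return "tcp";
	case TOY_PROTO_ICMP:
		return "icmp";
	default:
		return "unknown";
	}
}

/*
 * Build the tuple used to rewrite the OUTER header of an ICMP error that
 * relates to 'flow'. With fix_protonum == 0 the flow's protocol leaks
 * through; with fix_protonum != 0 it is overridden, as the patch does.
 */
static struct toy_tuple toy_outer_target(const struct toy_tuple *flow,
					 int fix_protonum)
{
	struct toy_tuple target = *flow;

	if (fix_protonum)
		target.protonum = TOY_PROTO_ICMP;
	return target;
}
```

In this model, a tracked TCP flow yields "tcp" for the outer packet when the
override is off and "icmp" when it is on, which is the mismatch the one-line
assignment in each hunk is meant to avoid.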
* Re: general protection fault in prepare_to_wait
From: syzbot @ 2019-02-08 12:06 UTC (permalink / raw)
To: davem, linux-hams, linux-kernel, netdev, ralf, syzkaller-bugs
In-Reply-To: <000000000000fa6a2c057e8b7064@google.com>
syzbot has found a reproducer for the following crash on:
HEAD commit: ec7fd009e87c Merge branch 'ipv6-fixes'
git tree: net
console output: https://syzkaller.appspot.com/x/log.txt?x=17fc3d97400000
kernel config: https://syzkaller.appspot.com/x/.config?x=2e0064f906afee10
dashboard link: https://syzkaller.appspot.com/bug?extid=55f9d3e51d49e20b2ce5
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16970150c00000
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+55f9d3e51d49e20b2ce5@syzkaller.appspotmail.com
8021q: adding VLAN 0 to HW filter on device batadv0
8021q: adding VLAN 0 to HW filter on device batadv0
8021q: adding VLAN 0 to HW filter on device batadv0
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 7733 Comm: syz-executor3 Not tainted 5.0.0-rc4+ #67
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:__lock_acquire+0x8df/0x4700 kernel/locking/lockdep.c:3215
Code: 28 00 00 00 0f 85 35 27 00 00 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f
5d c3 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f
85 dc 27 00 00 49 81 3c 24 20 25 9a 89 0f 84 03 f8
RSP: 0018:ffff888088fa7970 EFLAGS: 00010006
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000018
RBP: ffff888088fa7b40 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000018
R13: 0000000000000001 R14: 0000000000000000 R15: ffff888098fc60c0
FS: 00007f2176cc4700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f763933ddb8 CR3: 0000000088560000 CR4: 00000000001406f0
Call Trace:
lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3841
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:152
prepare_to_wait+0x7c/0x300 kernel/sched/wait.c:230
nr_accept+0x239/0x790 net/netrom/af_netrom.c:796
__sys_accept4+0x350/0x6a0 net/socket.c:1588
__do_sys_accept net/socket.c:1629 [inline]
__se_sys_accept net/socket.c:1626 [inline]
__x64_sys_accept+0x75/0xb0 net/socket.c:1626
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457e39
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f2176cc3c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002b
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457e39
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 000000000073c0e0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f2176cc46d4
R13: 00000000004bdc10 R14: 00000000004cdea0 R15: 00000000ffffffff
Modules linked in:
---[ end trace 82c8ff081ad12861 ]---
RIP: 0010:__lock_acquire+0x8df/0x4700 kernel/locking/lockdep.c:3215
Code: 28 00 00 00 0f 85 35 27 00 00 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f
5d c3 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f
85 dc 27 00 00 49 81 3c 24 20 25 9a 89 0f 84 03 f8
RSP: 0018:ffff888088fa7970 EFLAGS: 00010006
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000018
RBP: ffff888088fa7b40 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000018
R13: 0000000000000001 R14: 0000000000000000 R15: ffff888098fc60c0
FS: 00007f2176cc4700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f763933ddb8 CR3: 0000000088560000 CR4: 00000000001406f0
* Re: [iproute PATCH] ip-link: Fix listing of alias interfaces
From: Michal Kubecek @ 2019-02-08 12:09 UTC (permalink / raw)
To: netdev; +Cc: Phil Sutter, Stephen Hemminger, Roopa Prabhu
In-Reply-To: <20190208104057.GE26388@orbyte.nwl.cc>
On Fri, Feb 08, 2019 at 11:40:57AM +0100, Phil Sutter wrote:
> On Thu, Feb 07, 2019 at 04:24:36PM -0800, Stephen Hemminger wrote:
> > On Thu, 7 Feb 2019 14:05:27 +0100
> > Phil Sutter <phil@nwl.cc> wrote:
> >
> > > Commit 50b9950dd9011 ("link dump filter") accidentally broke listing of
> > > links in the old alias interface notation:
> > >
> > > | % ip link show eth0:1
> > > | RTNETLINK answers: No such device
> > > | Cannot send link get request: No such device
> > >
> > > Prior to the above commit, link lookup was performed via ifindex
> > > returned by if_nametoindex(). The latter uses SIOCGIFINDEX ioctl call
> > > which on kernel side causes the colon-suffix to be dropped before doing
> > > the interface lookup. Netlink API though doesn't care about that at all.
> > > To keep things backward compatible, mimic ioctl API behaviour and drop
> > > the colon-suffix prior to sending the RTM_GETLINK request.
> > >
> > > Fixes: 50b9950dd9011 ("link dump filter")
> > > Signed-off-by: Phil Sutter <phil@nwl.cc>
> >
> > What about mistaken usage where the text after the colon is not a number,
> > or has additional colon?
>
> That's completely ignored in ioctl-case as well. See dev_ioctl() in
> kernel sources:
>
> | colon = strchr(ifr->ifr_name, ':');
> | if (colon)
> | *colon = 0;
>
> If you pass 'group 0' to link show command, ioctl code path is taken. It
> allows (and drops) arbitrary input after the colon (as long as the total
> name doesn't exceed 15 characters).
Not only that, other ip link subcommands also use ioctl for interface
lookup so that e.g. "ip link del dummy1:x" deletes dummy1 without any
complaint.
But as I mentioned earlier in http://patchwork.ozlabs.org/patch/1037934/
I'm not sure this behaviour is really desirable.
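
As an illustration of the behaviour under discussion, the sketch below
truncates an interface name at the first colon, in the same way as the
strchr() snippet quoted from dev_ioctl() above. strip_alias_suffix() is a
hypothetical helper written for this example; it is not a function from
iproute2 or the kernel, and it makes no attempt to validate what follows the
colon.

```c
#include <string.h>

/*
 * Hypothetical helper: cut 'ifname' at the first ':' in place.
 * Returns 1 if a suffix was dropped, 0 if the name had no colon.
 * Anything after the first colon is discarded without inspection,
 * including non-numeric text and further colons.
 */
static int strip_alias_suffix(char *ifname)
{
	char *colon = strchr(ifname, ':');

	if (!colon)
		return 0;
	*colon = '\0';
	return 1;
}
```

With this helper, "eth0:1" and "dummy1:x:y" become "eth0" and "dummy1", while
"eth0" is left untouched, which is why a request for "dummy1:x" ends up acting
on dummy1.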
Michal Kubecek
* Re: [PATCH net] sctp: make sctp_setsockopt_events() less strict about the option length
From: Neil Horman @ 2019-02-08 12:36 UTC (permalink / raw)
To: David Laight
Cc: 'Marcelo Ricardo Leitner', Julien Gomes,
netdev@vger.kernel.org, linux-sctp@vger.kernel.org,
linux-kernel@vger.kernel.org, davem@davemloft.net,
vyasevich@gmail.com, lucien.xin@gmail.com
In-Reply-To: <0415888f34e7494e9879a77599c618e0@AcuMS.aculab.com>
On Fri, Feb 08, 2019 at 09:53:03AM +0000, David Laight wrote:
> From: 'Marcelo Ricardo Leitner'
> > Sent: 07 February 2019 17:47
> ...
> > > > Maybe what we want(ed) here then is explicit versioning, to have the 3
> > > > definitions available. Then the application is able to use, say struct
> > > > sctp_event_subscribe, and be happy with it, while there is struct
> > > > sctp_event_subscribe_v2 and struct sctp_event_subscribe_v3 there too.
> > > >
> > > > But it's too late for that now because that would break applications
> > > > already using the new fields in sctp_event_subscribe.
> > >
> > > It is probably better to break the recompilation of the few programs
> > > that use the new fields (and have them not work on old kernels)
> > > than to stop recompilations of old programs stop working on old
> > > kernels or have requested new options silently ignored.
> >
> > I got confused here, not sure what you mean. Seems there is one "stop"
> > word too many.
>
> More confusing than I intended...
>
> With the current kernel and headers a 'new program' (one that
> needs the new options) will fail to run on an old kernel - which is good.
> However a recompilation of an 'old program' (that doesn't use
> the new options) will also fail to run on an old kernel - which is bad.
>
I disagree with this, at least as a unilateral statement. I would assert that
an old program, within the constraints of the issue being discussed here, will
run perfectly well, when built and run against a new kernel.
At issue is the size of the structure sctp_event_subscribe, and the fact that in
several instances over the last few years, it's been extended to be larger and
encompass more events to subscribe to.
Nominally an application will use this structure (roughly) as follows:
...
struct sctp_event_subscribe events;
socklen_t evsize = sizeof(events);
memset(&events, 0, sizeof(events));
events.sctp_send_failure_event = 1; /* example event subscription */
if (setsockopt(sctp_fd, SOL_SCTP, SCTP_EVENTS, &events, evsize) < 0) {
	/* do error recovery */
}
....
Assume this code will be built and run against kernel versions A and B, in
which:
A) has a struct sctp_event_subscribe with a size of 9 bytes
B) has a struct sctp_event_subscribe with a size of 10 bytes (due to the added
field sctp_sender_dry_event)
That gives us 4 cases to handle:
1) Application is built against kernel A and run on kernel A. This works fine, the
sizes of the struct in question will always match
2) Application is built against kernel A and run on kernel B. In this case,
everything will work because the application passes a buffer of size 9, and the
kernel accepts it, because it allows for buffers to be shorter than the current
struct sctp_event_subscribe size. The kernel simply operates on the options
available in the buffer. The application is none the wiser, because it has no
knowledge of the new option, nor should it because it was built against kernel
A, which never offered that option.
3) Application is built against kernel B and run on kernel B. This works fine
for the same reason as (1).
4) Application is built against kernel B and run on kernel A. This will break
because the application is passing a buffer that is larger than what the kernel
expects, and rightly so. The application is passing in a buffer that is
incompatible with what the running kernel expects.
We could look into ways in which to detect the cases in which this might be
'ok', but I don't see why we should bother, because at some point it's still an
error to pass in an incompatible buffer. In my mind this is no different than
trying to run a program that allocates hugepages on a kernel that doesn't
support hugepages (just to make up an example). Applications built against
newer kernel can't expect all the features/semantics/etc to be identical to
older kernels.
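
To make the four cases concrete, here is a toy model of the length check as
described above. It is a sketch under stated assumptions: toy_check_events_len()
and the two size macros are hypothetical names, the sizes 9 and 10 are the
example values for kernels A and B from this mail, and the function only mirrors
the "shorter or equal is accepted, longer is rejected" rule, not the actual
sctp_setsockopt_events() code.

```c
#include <errno.h>
#include <stddef.h>

/* Example sizes from the discussion above, not read from real headers. */
#define TOY_SUBSCRIBE_SIZE_KERNEL_A 9	/* older struct layout */
#define TOY_SUBSCRIBE_SIZE_KERNEL_B 10	/* adds sctp_sender_dry_event */

/*
 * Toy model of the option-length check: 'kernel_size' is the size of
 * struct sctp_event_subscribe known to the running kernel, 'optlen' is
 * the size the application was compiled with. Returns 0 if the buffer
 * would be accepted, -EINVAL if it is larger than the kernel expects.
 */
static int toy_check_events_len(size_t kernel_size, size_t optlen)
{
	if (optlen > kernel_size)
		return -EINVAL;
	return 0;
}
```

Feeding the four build/run combinations through this model gives success for
cases 1 to 3 and -EINVAL only for case 4 (built against B, run on A).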
> Changing the kernel to ignore extra events flags breaks the 'new'
> program.
>
It shouldn't. Assuming you have a program built against headers from kernel B
(above), if you set a field in the structure that only exists in kernel B, and
try to run it on kernel A, you will get an EINVAL return, which is correct
behavior because you are attempting to deliver information to the kernel that
kernel A (the running kernel) doesn't know about.
> Versioning the structure now (even though it should have been done
> earlier) won't change the behaviour of existing binaries.
>
I won't disagree about the niceness of versioning, but that ship has sailed.
> However a recompilation of an 'old' program would use the 'old'
> structure and work on old kernels.
To be clear, this is situation (1) above, and yeah, running on the kernel you
built your application against should always work from a compatibility
standpoint.
> Attempts to recompile a 'new' program will fail - until the structure
> name (or some #define to enable the extra fields) is changed.
>
Yes, but this is always the case for structures that change. An application
that is written against kernel (B) and uses structure fields that only exist
in that version of the kernel (and not earlier) will fail to compile when built
against kernel (A) headers, and that's expected. This happens with any kernel
API that exists in a newer kernel but not an older kernel.
> Breaking compilations is much better than unexpected run-time
> behaviour.
>
Any time you make a system call to the kernel, you have to be prepared to handle
the resulting error condition; that's not unexpected. To assume that a system
call will always work is bad programming practice.
Neil
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>
>
* [PATCH net-next] ipvlan: decouple l3s mode dependencies from other modes
From: Daniel Borkmann @ 2019-02-08 12:55 UTC (permalink / raw)
To: davem
Cc: netdev, Daniel Borkmann, Mahesh Bandewar, Florian Westphal,
Martynas Pumputis
Right now ipvlan has a hard dependency on CONFIG_NETFILTER and
otherwise it cannot be built. However, the only ipvlan operation
mode that actually depends on netfilter is l3s, everything else
is independent of it. Break this hard dependency such that users
are able to use ipvlan l3 mode on systems where netfilter is not
compiled in.
Therefore, this adds a hidden CONFIG_IPVLAN_L3S bool which is
defaulting to y when CONFIG_NETFILTER is set in order to retain
existing behavior for l3s. All l3s related code is refactored
into ipvlan_l3s.c that is compiled in when enabled.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Martynas Pumputis <m@lambda.lt>
---
drivers/net/Kconfig | 6 +-
drivers/net/ipvlan/Makefile | 3 +-
drivers/net/ipvlan/ipvlan.h | 37 ++++++-
drivers/net/ipvlan/ipvlan_core.c | 105 ++----------------
drivers/net/ipvlan/ipvlan_l3s.c | 227 +++++++++++++++++++++++++++++++++++++++
drivers/net/ipvlan/ipvlan_main.c | 117 +++-----------------
6 files changed, 287 insertions(+), 208 deletions(-)
create mode 100644 drivers/net/ipvlan/ipvlan_l3s.c
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index edb1c02..7f9727f6 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -145,13 +145,15 @@ config MACVTAP
To compile this driver as a module, choose M here: the module
will be called macvtap.
+config IPVLAN_L3S
+ depends on NETFILTER
+ def_bool y
+ select NET_L3_MASTER_DEV
config IPVLAN
tristate "IP-VLAN support"
depends on INET
depends on IPV6 || !IPV6
- depends on NETFILTER
- select NET_L3_MASTER_DEV
---help---
This allows one to create virtual devices off of a main interface
and packets will be delivered based on the dest L3 (IPv6/IPv4 addr)
diff --git a/drivers/net/ipvlan/Makefile b/drivers/net/ipvlan/Makefile
index 8a2c64d..3ee9536 100644
--- a/drivers/net/ipvlan/Makefile
+++ b/drivers/net/ipvlan/Makefile
@@ -5,4 +5,5 @@
obj-$(CONFIG_IPVLAN) += ipvlan.o
obj-$(CONFIG_IPVTAP) += ipvtap.o
-ipvlan-objs := ipvlan_core.o ipvlan_main.o
+ipvlan-objs-$(CONFIG_IPVLAN_L3S) += ipvlan_l3s.o
+ipvlan-objs := ipvlan_core.o ipvlan_main.o $(ipvlan-objs-y)
diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
index adb826f..b906d2f 100644
--- a/drivers/net/ipvlan/ipvlan.h
+++ b/drivers/net/ipvlan/ipvlan.h
@@ -165,10 +165,9 @@ struct ipvl_addr *ipvlan_find_addr(const struct ipvl_dev *ipvlan,
const void *iaddr, bool is_v6);
bool ipvlan_addr_busy(struct ipvl_port *port, void *iaddr, bool is_v6);
void ipvlan_ht_addr_del(struct ipvl_addr *addr);
-struct sk_buff *ipvlan_l3_rcv(struct net_device *dev, struct sk_buff *skb,
- u16 proto);
-unsigned int ipvlan_nf_input(void *priv, struct sk_buff *skb,
- const struct nf_hook_state *state);
+struct ipvl_addr *ipvlan_addr_lookup(struct ipvl_port *port, void *lyr3h,
+ int addr_type, bool use_dest);
+void *ipvlan_get_L3_hdr(struct ipvl_port *port, struct sk_buff *skb, int *type);
void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
unsigned int len, bool success, bool mcast);
int ipvlan_link_new(struct net *src_net, struct net_device *dev,
@@ -177,6 +176,36 @@ int ipvlan_link_new(struct net *src_net, struct net_device *dev,
void ipvlan_link_delete(struct net_device *dev, struct list_head *head);
void ipvlan_link_setup(struct net_device *dev);
int ipvlan_link_register(struct rtnl_link_ops *ops);
+#ifdef CONFIG_IPVLAN_L3S
+int ipvlan_l3s_register(struct ipvl_port *port);
+void ipvlan_l3s_unregister(struct ipvl_port *port);
+void ipvlan_migrate_l3s_hook(struct net *oldnet, struct net *newnet);
+int ipvlan_l3s_init(void);
+void ipvlan_l3s_cleanup(void);
+#else
+static inline int ipvlan_l3s_register(struct ipvl_port *port)
+{
+ return -ENOTSUPP;
+}
+
+static inline void ipvlan_l3s_unregister(struct ipvl_port *port)
+{
+}
+
+static inline void ipvlan_migrate_l3s_hook(struct net *oldnet,
+ struct net *newnet)
+{
+}
+
+static inline int ipvlan_l3s_init(void)
+{
+ return 0;
+}
+
+static inline void ipvlan_l3s_cleanup(void)
+{
+}
+#endif /* CONFIG_IPVLAN_L3S */
static inline bool netif_is_ipvlan_port(const struct net_device *dev)
{
diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index 1a8132e..e0f5bc8 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -138,7 +138,7 @@ bool ipvlan_addr_busy(struct ipvl_port *port, void *iaddr, bool is_v6)
return ret;
}
-static void *ipvlan_get_L3_hdr(struct ipvl_port *port, struct sk_buff *skb, int *type)
+void *ipvlan_get_L3_hdr(struct ipvl_port *port, struct sk_buff *skb, int *type)
{
void *lyr3h = NULL;
@@ -355,9 +355,8 @@ static int ipvlan_rcv_frame(struct ipvl_addr *addr, struct sk_buff **pskb,
return ret;
}
-static struct ipvl_addr *ipvlan_addr_lookup(struct ipvl_port *port,
- void *lyr3h, int addr_type,
- bool use_dest)
+struct ipvl_addr *ipvlan_addr_lookup(struct ipvl_port *port, void *lyr3h,
+ int addr_type, bool use_dest)
{
struct ipvl_addr *addr = NULL;
@@ -647,7 +646,9 @@ int ipvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
case IPVLAN_MODE_L2:
return ipvlan_xmit_mode_l2(skb, dev);
case IPVLAN_MODE_L3:
+#ifdef CONFIG_IPVLAN_L3S
case IPVLAN_MODE_L3S:
+#endif
return ipvlan_xmit_mode_l3(skb, dev);
}
@@ -743,8 +744,10 @@ rx_handler_result_t ipvlan_handle_frame(struct sk_buff **pskb)
return ipvlan_handle_mode_l2(pskb, port);
case IPVLAN_MODE_L3:
return ipvlan_handle_mode_l3(pskb, port);
+#ifdef CONFIG_IPVLAN_L3S
case IPVLAN_MODE_L3S:
return RX_HANDLER_PASS;
+#endif
}
/* Should not reach here */
@@ -753,97 +756,3 @@ rx_handler_result_t ipvlan_handle_frame(struct sk_buff **pskb)
kfree_skb(skb);
return RX_HANDLER_CONSUMED;
}
-
-static struct ipvl_addr *ipvlan_skb_to_addr(struct sk_buff *skb,
- struct net_device *dev)
-{
- struct ipvl_addr *addr = NULL;
- struct ipvl_port *port;
- void *lyr3h;
- int addr_type;
-
- if (!dev || !netif_is_ipvlan_port(dev))
- goto out;
-
- port = ipvlan_port_get_rcu(dev);
- if (!port || port->mode != IPVLAN_MODE_L3S)
- goto out;
-
- lyr3h = ipvlan_get_L3_hdr(port, skb, &addr_type);
- if (!lyr3h)
- goto out;
-
- addr = ipvlan_addr_lookup(port, lyr3h, addr_type, true);
-out:
- return addr;
-}
-
-struct sk_buff *ipvlan_l3_rcv(struct net_device *dev, struct sk_buff *skb,
- u16 proto)
-{
- struct ipvl_addr *addr;
- struct net_device *sdev;
-
- addr = ipvlan_skb_to_addr(skb, dev);
- if (!addr)
- goto out;
-
- sdev = addr->master->dev;
- switch (proto) {
- case AF_INET:
- {
- int err;
- struct iphdr *ip4h = ip_hdr(skb);
-
- err = ip_route_input_noref(skb, ip4h->daddr, ip4h->saddr,
- ip4h->tos, sdev);
- if (unlikely(err))
- goto out;
- break;
- }
-#if IS_ENABLED(CONFIG_IPV6)
- case AF_INET6:
- {
- struct dst_entry *dst;
- struct ipv6hdr *ip6h = ipv6_hdr(skb);
- int flags = RT6_LOOKUP_F_HAS_SADDR;
- struct flowi6 fl6 = {
- .flowi6_iif = sdev->ifindex,
- .daddr = ip6h->daddr,
- .saddr = ip6h->saddr,
- .flowlabel = ip6_flowinfo(ip6h),
- .flowi6_mark = skb->mark,
- .flowi6_proto = ip6h->nexthdr,
- };
-
- skb_dst_drop(skb);
- dst = ip6_route_input_lookup(dev_net(sdev), sdev, &fl6,
- skb, flags);
- skb_dst_set(skb, dst);
- break;
- }
-#endif
- default:
- break;
- }
-
-out:
- return skb;
-}
-
-unsigned int ipvlan_nf_input(void *priv, struct sk_buff *skb,
- const struct nf_hook_state *state)
-{
- struct ipvl_addr *addr;
- unsigned int len;
-
- addr = ipvlan_skb_to_addr(skb, skb->dev);
- if (!addr)
- goto out;
-
- skb->dev = addr->master->dev;
- len = skb->len + ETH_HLEN;
- ipvlan_count_rx(addr->master, len, true, false);
-out:
- return NF_ACCEPT;
-}
diff --git a/drivers/net/ipvlan/ipvlan_l3s.c b/drivers/net/ipvlan/ipvlan_l3s.c
new file mode 100644
index 0000000..9a2f240
--- /dev/null
+++ b/drivers/net/ipvlan/ipvlan_l3s.c
@@ -0,0 +1,227 @@
+/* Copyright (c) 2014 Mahesh Bandewar <maheshb@google.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ */
+
+#include "ipvlan.h"
+
+static unsigned int ipvlan_netid __read_mostly;
+
+struct ipvlan_netns {
+ unsigned int ipvl_nf_hook_refcnt;
+};
+
+static struct ipvl_addr *ipvlan_skb_to_addr(struct sk_buff *skb,
+ struct net_device *dev)
+{
+ struct ipvl_addr *addr = NULL;
+ struct ipvl_port *port;
+ int addr_type;
+ void *lyr3h;
+
+ if (!dev || !netif_is_ipvlan_port(dev))
+ goto out;
+
+ port = ipvlan_port_get_rcu(dev);
+ if (!port || port->mode != IPVLAN_MODE_L3S)
+ goto out;
+
+ lyr3h = ipvlan_get_L3_hdr(port, skb, &addr_type);
+ if (!lyr3h)
+ goto out;
+
+ addr = ipvlan_addr_lookup(port, lyr3h, addr_type, true);
+out:
+ return addr;
+}
+
+static struct sk_buff *ipvlan_l3_rcv(struct net_device *dev,
+ struct sk_buff *skb, u16 proto)
+{
+ struct ipvl_addr *addr;
+ struct net_device *sdev;
+
+ addr = ipvlan_skb_to_addr(skb, dev);
+ if (!addr)
+ goto out;
+
+ sdev = addr->master->dev;
+ switch (proto) {
+ case AF_INET:
+ {
+ struct iphdr *ip4h = ip_hdr(skb);
+ int err;
+
+ err = ip_route_input_noref(skb, ip4h->daddr, ip4h->saddr,
+ ip4h->tos, sdev);
+ if (unlikely(err))
+ goto out;
+ break;
+ }
+#if IS_ENABLED(CONFIG_IPV6)
+ case AF_INET6:
+ {
+ struct dst_entry *dst;
+ struct ipv6hdr *ip6h = ipv6_hdr(skb);
+ int flags = RT6_LOOKUP_F_HAS_SADDR;
+ struct flowi6 fl6 = {
+ .flowi6_iif = sdev->ifindex,
+ .daddr = ip6h->daddr,
+ .saddr = ip6h->saddr,
+ .flowlabel = ip6_flowinfo(ip6h),
+ .flowi6_mark = skb->mark,
+ .flowi6_proto = ip6h->nexthdr,
+ };
+
+ skb_dst_drop(skb);
+ dst = ip6_route_input_lookup(dev_net(sdev), sdev, &fl6,
+ skb, flags);
+ skb_dst_set(skb, dst);
+ break;
+ }
+#endif
+ default:
+ break;
+ }
+out:
+ return skb;
+}
+
+static const struct l3mdev_ops ipvl_l3mdev_ops = {
+ .l3mdev_l3_rcv = ipvlan_l3_rcv,
+};
+
+static unsigned int ipvlan_nf_input(void *priv, struct sk_buff *skb,
+ const struct nf_hook_state *state)
+{
+ struct ipvl_addr *addr;
+ unsigned int len;
+
+ addr = ipvlan_skb_to_addr(skb, skb->dev);
+ if (!addr)
+ goto out;
+
+ skb->dev = addr->master->dev;
+ len = skb->len + ETH_HLEN;
+ ipvlan_count_rx(addr->master, len, true, false);
+out:
+ return NF_ACCEPT;
+}
+
+static const struct nf_hook_ops ipvl_nfops[] = {
+ {
+ .hook = ipvlan_nf_input,
+ .pf = NFPROTO_IPV4,
+ .hooknum = NF_INET_LOCAL_IN,
+ .priority = INT_MAX,
+ },
+#if IS_ENABLED(CONFIG_IPV6)
+ {
+ .hook = ipvlan_nf_input,
+ .pf = NFPROTO_IPV6,
+ .hooknum = NF_INET_LOCAL_IN,
+ .priority = INT_MAX,
+ },
+#endif
+};
+
+static int ipvlan_register_nf_hook(struct net *net)
+{
+ struct ipvlan_netns *vnet = net_generic(net, ipvlan_netid);
+ int err = 0;
+
+ if (!vnet->ipvl_nf_hook_refcnt) {
+ err = nf_register_net_hooks(net, ipvl_nfops,
+ ARRAY_SIZE(ipvl_nfops));
+ if (!err)
+ vnet->ipvl_nf_hook_refcnt = 1;
+ } else {
+ vnet->ipvl_nf_hook_refcnt++;
+ }
+
+ return err;
+}
+
+static void ipvlan_unregister_nf_hook(struct net *net)
+{
+ struct ipvlan_netns *vnet = net_generic(net, ipvlan_netid);
+
+ if (WARN_ON(!vnet->ipvl_nf_hook_refcnt))
+ return;
+
+ vnet->ipvl_nf_hook_refcnt--;
+ if (!vnet->ipvl_nf_hook_refcnt)
+ nf_unregister_net_hooks(net, ipvl_nfops,
+ ARRAY_SIZE(ipvl_nfops));
+}
+
+void ipvlan_migrate_l3s_hook(struct net *oldnet, struct net *newnet)
+{
+ struct ipvlan_netns *old_vnet;
+
+ ASSERT_RTNL();
+
+ old_vnet = net_generic(oldnet, ipvlan_netid);
+ if (!old_vnet->ipvl_nf_hook_refcnt)
+ return;
+
+ ipvlan_register_nf_hook(newnet);
+ ipvlan_unregister_nf_hook(oldnet);
+}
+
+static void ipvlan_ns_exit(struct net *net)
+{
+ struct ipvlan_netns *vnet = net_generic(net, ipvlan_netid);
+
+ if (WARN_ON_ONCE(vnet->ipvl_nf_hook_refcnt)) {
+ vnet->ipvl_nf_hook_refcnt = 0;
+ nf_unregister_net_hooks(net, ipvl_nfops,
+ ARRAY_SIZE(ipvl_nfops));
+ }
+}
+
+static struct pernet_operations ipvlan_net_ops = {
+ .id = &ipvlan_netid,
+ .size = sizeof(struct ipvlan_netns),
+ .exit = ipvlan_ns_exit,
+};
+
+int ipvlan_l3s_init(void)
+{
+ return register_pernet_subsys(&ipvlan_net_ops);
+}
+
+void ipvlan_l3s_cleanup(void)
+{
+ unregister_pernet_subsys(&ipvlan_net_ops);
+}
+
+int ipvlan_l3s_register(struct ipvl_port *port)
+{
+ struct net_device *dev = port->dev;
+ int ret;
+
+ ASSERT_RTNL();
+
+ ret = ipvlan_register_nf_hook(read_pnet(&port->pnet));
+ if (!ret) {
+ dev->l3mdev_ops = &ipvl_l3mdev_ops;
+ dev->priv_flags |= IFF_L3MDEV_MASTER;
+ }
+
+ return ret;
+}
+
+void ipvlan_l3s_unregister(struct ipvl_port *port)
+{
+ struct net_device *dev = port->dev;
+
+ ASSERT_RTNL();
+
+ dev->priv_flags &= ~IFF_L3MDEV_MASTER;
+ ipvlan_unregister_nf_hook(read_pnet(&port->pnet));
+ dev->l3mdev_ops = NULL;
+}
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 19bdde6..8ec73d9 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -9,73 +9,10 @@
#include "ipvlan.h"
-static unsigned int ipvlan_netid __read_mostly;
-
-struct ipvlan_netns {
- unsigned int ipvl_nf_hook_refcnt;
-};
-
-static const struct nf_hook_ops ipvl_nfops[] = {
- {
- .hook = ipvlan_nf_input,
- .pf = NFPROTO_IPV4,
- .hooknum = NF_INET_LOCAL_IN,
- .priority = INT_MAX,
- },
-#if IS_ENABLED(CONFIG_IPV6)
- {
- .hook = ipvlan_nf_input,
- .pf = NFPROTO_IPV6,
- .hooknum = NF_INET_LOCAL_IN,
- .priority = INT_MAX,
- },
-#endif
-};
-
-static const struct l3mdev_ops ipvl_l3mdev_ops = {
- .l3mdev_l3_rcv = ipvlan_l3_rcv,
-};
-
-static void ipvlan_adjust_mtu(struct ipvl_dev *ipvlan, struct net_device *dev)
-{
- ipvlan->dev->mtu = dev->mtu;
-}
-
-static int ipvlan_register_nf_hook(struct net *net)
-{
- struct ipvlan_netns *vnet = net_generic(net, ipvlan_netid);
- int err = 0;
-
- if (!vnet->ipvl_nf_hook_refcnt) {
- err = nf_register_net_hooks(net, ipvl_nfops,
- ARRAY_SIZE(ipvl_nfops));
- if (!err)
- vnet->ipvl_nf_hook_refcnt = 1;
- } else {
- vnet->ipvl_nf_hook_refcnt++;
- }
-
- return err;
-}
-
-static void ipvlan_unregister_nf_hook(struct net *net)
-{
- struct ipvlan_netns *vnet = net_generic(net, ipvlan_netid);
-
- if (WARN_ON(!vnet->ipvl_nf_hook_refcnt))
- return;
-
- vnet->ipvl_nf_hook_refcnt--;
- if (!vnet->ipvl_nf_hook_refcnt)
- nf_unregister_net_hooks(net, ipvl_nfops,
- ARRAY_SIZE(ipvl_nfops));
-}
-
static int ipvlan_set_port_mode(struct ipvl_port *port, u16 nval,
struct netlink_ext_ack *extack)
{
struct ipvl_dev *ipvlan;
- struct net_device *mdev = port->dev;
unsigned int flags;
int err;
@@ -97,17 +34,12 @@ static int ipvlan_set_port_mode(struct ipvl_port *port, u16 nval,
}
if (nval == IPVLAN_MODE_L3S) {
/* New mode is L3S */
- err = ipvlan_register_nf_hook(read_pnet(&port->pnet));
- if (!err) {
- mdev->l3mdev_ops = &ipvl_l3mdev_ops;
- mdev->priv_flags |= IFF_L3MDEV_MASTER;
- } else
+ err = ipvlan_l3s_register(port);
+ if (err)
goto fail;
} else if (port->mode == IPVLAN_MODE_L3S) {
/* Old mode was L3S */
- mdev->priv_flags &= ~IFF_L3MDEV_MASTER;
- ipvlan_unregister_nf_hook(read_pnet(&port->pnet));
- mdev->l3mdev_ops = NULL;
+ ipvlan_l3s_unregister(port);
}
port->mode = nval;
}
@@ -166,11 +98,8 @@ static void ipvlan_port_destroy(struct net_device *dev)
struct ipvl_port *port = ipvlan_port_get_rtnl(dev);
struct sk_buff *skb;
- if (port->mode == IPVLAN_MODE_L3S) {
- dev->priv_flags &= ~IFF_L3MDEV_MASTER;
- ipvlan_unregister_nf_hook(dev_net(dev));
- dev->l3mdev_ops = NULL;
- }
+ if (port->mode == IPVLAN_MODE_L3S)
+ ipvlan_l3s_unregister(port);
netdev_rx_handler_unregister(dev);
cancel_work_sync(&port->wq);
while ((skb = __skb_dequeue(&port->backlog)) != NULL) {
@@ -446,6 +375,11 @@ static const struct header_ops ipvlan_header_ops = {
.cache_update = eth_header_cache_update,
};
+static void ipvlan_adjust_mtu(struct ipvl_dev *ipvlan, struct net_device *dev)
+{
+ ipvlan->dev->mtu = dev->mtu;
+}
+
static bool netif_is_ipvlan(const struct net_device *dev)
{
/* both ipvlan and ipvtap devices use the same netdev_ops */
@@ -781,7 +715,6 @@ static int ipvlan_device_event(struct notifier_block *unused,
case NETDEV_REGISTER: {
struct net *oldnet, *newnet = dev_net(dev);
- struct ipvlan_netns *old_vnet;
oldnet = read_pnet(&port->pnet);
if (net_eq(newnet, oldnet))
@@ -789,12 +722,7 @@ static int ipvlan_device_event(struct notifier_block *unused,
write_pnet(&port->pnet, newnet);
- old_vnet = net_generic(oldnet, ipvlan_netid);
- if (!old_vnet->ipvl_nf_hook_refcnt)
- break;
-
- ipvlan_register_nf_hook(newnet);
- ipvlan_unregister_nf_hook(oldnet);
+ ipvlan_migrate_l3s_hook(oldnet, newnet);
break;
}
case NETDEV_UNREGISTER:
@@ -1068,23 +996,6 @@ static struct notifier_block ipvlan_addr6_vtor_notifier_block __read_mostly = {
};
#endif
-static void ipvlan_ns_exit(struct net *net)
-{
- struct ipvlan_netns *vnet = net_generic(net, ipvlan_netid);
-
- if (WARN_ON_ONCE(vnet->ipvl_nf_hook_refcnt)) {
- vnet->ipvl_nf_hook_refcnt = 0;
- nf_unregister_net_hooks(net, ipvl_nfops,
- ARRAY_SIZE(ipvl_nfops));
- }
-}
-
-static struct pernet_operations ipvlan_net_ops = {
- .id = &ipvlan_netid,
- .size = sizeof(struct ipvlan_netns),
- .exit = ipvlan_ns_exit,
-};
-
static int __init ipvlan_init_module(void)
{
int err;
@@ -1099,13 +1010,13 @@ static int __init ipvlan_init_module(void)
register_inetaddr_notifier(&ipvlan_addr4_notifier_block);
register_inetaddr_validator_notifier(&ipvlan_addr4_vtor_notifier_block);
- err = register_pernet_subsys(&ipvlan_net_ops);
+ err = ipvlan_l3s_init();
if (err < 0)
goto error;
err = ipvlan_link_register(&ipvlan_link_ops);
if (err < 0) {
- unregister_pernet_subsys(&ipvlan_net_ops);
+ ipvlan_l3s_cleanup();
goto error;
}
@@ -1126,7 +1037,7 @@ static int __init ipvlan_init_module(void)
static void __exit ipvlan_cleanup_module(void)
{
rtnl_link_unregister(&ipvlan_link_ops);
- unregister_pernet_subsys(&ipvlan_net_ops);
+ ipvlan_l3s_cleanup();
unregister_netdevice_notifier(&ipvlan_notifier_block);
unregister_inetaddr_notifier(&ipvlan_addr4_notifier_block);
unregister_inetaddr_validator_notifier(
--
2.9.5
* [PATCH bpf-next v4 0/2] libbpf: adding AF_XDP support
From: Magnus Karlsson @ 2019-02-08 13:05 UTC (permalink / raw)
To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jakub.kicinski,
bjorn.topel, qi.z.zhang
Cc: brouer, xiaolong.ye
This patch proposes to add AF_XDP support to libbpf. The main reason
for this is to facilitate writing applications that use AF_XDP by
offering higher-level APIs that hide many of the details of the AF_XDP
uapi. This is in the same vein as libbpf facilitating XDP adoption by
offering easy-to-use, higher-level interfaces to XDP
functionality. Hopefully this will facilitate adoption of AF_XDP, make
applications using it simpler and smaller, and finally also make it
possible for applications to benefit from optimizations in the AF_XDP
user space access code. Previously, people just copied and pasted the
code from the sample application into their application, which is not
desirable.
The proposed interface is composed of two parts:
* Low-level access interface to the four rings and the packet
* High-level control plane interface for creating and setting up umems
and AF_XDP sockets. This interface also loads a simple XDP program
that routes all traffic on a queue up to the AF_XDP socket.
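The per-packet decision made by that simple XDP program can be modelled
in plain user-space C. This is only a sketch under stated assumptions:
the demo_* names are hypothetical and not part of libbpf, an ordinary
array stands in for qidconf_map, and an enum stands in for the kernel's
XDP return codes. It mirrors the C version of the program quoted in the
comment in xsk_load_xdp_prog() in patch 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdicts for this model. The real program returns
 * XDP_ABORTED, XDP_PASS or the result of bpf_redirect_map().
 */
enum demo_verdict {
	DEMO_ABORTED,
	DEMO_PASS,
	DEMO_REDIRECT,
};

/* Model of the per-packet decision. qidconf[] plays the role of
 * qidconf_map: a non-zero entry means that an AF_XDP socket is bound
 * to that queue id.
 */
static enum demo_verdict demo_xdp_verdict(const int *qidconf,
					  size_t nr_queues,
					  unsigned int rx_queue_index)
{
	assert(qidconf != NULL);

	/* A failed map lookup (index out of range) aborts the packet. */
	if (rx_queue_index >= nr_queues)
		return DEMO_ABORTED;

	/* A socket is bound to this queue: redirect the packet to it. */
	if (qidconf[rx_queue_index])
		return DEMO_REDIRECT;

	/* No socket on this queue: pass the packet to the normal stack. */
	return DEMO_PASS;
}
```

With qidconf = {0, 1, 0, 0}, queue 1 is redirected, queue 0 is passed
on to the stack, and an out-of-range queue index is aborted.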
The sample program has been updated to use this new interface and in
that process it lost roughly 300 lines of code. I cannot detect any
performance degradations due to the use of this library instead of the
previous functions that were inlined in the sample application. But I
did measure this on a slower machine and not the Broadwell that we
normally use.
The rings are now called xsk_ring. When a producer operates on a ring,
its type is xsk_ring_prod; for a consumer it is xsk_ring_cons. This
way we get some compile-time checking that the rings are used
correctly.
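To illustrate the idea of separate producer and consumer types, here is
a minimal user-space sketch. The demo_* structs and functions are
hypothetical and much simpler than the real xsk_ring_prod and
xsk_ring_cons: there are no shared producer/consumer pointers and no
memory barriers. It only shows how two distinct struct types let the
compiler reject a producer operation on a consumer ring, and how the
mask (size - 1, with size a power of two) wraps the index.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two ring types: similar layout, but
 * distinct types so that they cannot be mixed up silently.
 */
struct demo_ring_prod {
	uint32_t cached_prod;
	uint32_t mask;		/* size - 1, size is a power of two */
	uint64_t *ring;
};

struct demo_ring_cons {
	uint32_t cached_cons;
	uint32_t mask;		/* size - 1, size is a power of two */
	const uint64_t *ring;
};

/* Producer-only operation: store one address in the next slot. */
static void demo_prod_put(struct demo_ring_prod *r, uint64_t addr)
{
	r->ring[r->cached_prod++ & r->mask] = addr;
}

/* Consumer-only operation: fetch the address from the next slot. */
static uint64_t demo_cons_get(struct demo_ring_cons *r)
{
	return r->ring[r->cached_cons++ & r->mask];
}

/* Push six addresses through a four-entry ring, consuming each one
 * right away, and return the last address consumed.
 */
static uint64_t demo_round_trip(void)
{
	uint64_t slots[4] = { 0 };
	struct demo_ring_prod prod = { .mask = 3, .ring = slots };
	struct demo_ring_cons cons = { .mask = 3, .ring = slots };
	uint64_t last = 0;
	uint32_t i;

	/* The index masking only works for power-of-two sizes. */
	assert((prod.mask & (prod.mask + 1)) == 0);

	for (i = 0; i < 6; i++) {
		demo_prod_put(&prod, 0x1000 * (uint64_t)(i + 1));
		last = demo_cons_get(&cons);
	}

	/* demo_prod_put(&cons, 0) would be diagnosed by the compiler as
	 * an incompatible pointer type.
	 */
	return last;
}
```

The fifth and sixth entries wrap around to slots 0 and 1, so the index
is never out of bounds even though six addresses pass through four
slots.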
Comments and contemplations:
* The current behaviour is that the library loads an XDP program (if
requested to do so) but the clean up of this program is left to the
application. It would be possible to implement this cleanup in the
library, but it would require state to be kept on netdev level,
which there is none at the moment, and the synchronization of this
between processes. All of this adds complexity. But when we get an
XDP program per queue id, then it becomes trivial to also remove the
XDP program when the application exits. This proposal from Jesper,
Björn and others will also improve the performance of libbpf, since
most of the XDP program code can be removed when that feature is
supported.
* In a future release, I am planning on adding a higher level data
plane interface too. This will be based around recvmsg and sendmsg
with the use of struct iovec for batching, without the user having
to know anything about the underlying four rings of an AF_XDP
socket. There will be one semantic difference though from the
standard recvmsg and that is that the kernel will fill in the iovecs
instead of the application. But the rest should be the same as the
libc versions so that application writers feel at home.
Patch 1: adds AF_XDP support in libbpf
Patch 2: updates the xdpsock sample application to use the libbpf functions.
Changes v3 to v4:
* Dropped the pr_*() patch in favor of Yonghong Song's patch set
* Addressed the review comments of Daniel Borkmann, mainly leaking
of file descriptors at clean up and making the data plane APIs
all static inline (with the exception of xsk_umem__get_data that
uses an internal structure I do not want to expose).
* Fixed the netlink callback as suggested by Maciej Fijalkowski.
* Removed an unnecessary include in the sample program as spotted by
Ilia Fillipov.
Changes v2 to v3:
* Added automatic loading of a simple XDP program that routes all
traffic on a queue up to the AF_XDP socket. This program loading
can be disabled.
* Updated function names to be consistent with the libbpf naming
convention
* Moved all code to xsk.[ch]
* Removed all the XDP program loading code from the sample since
this is now done by libbpf
* The initialization functions now return a handle as suggested by
Alexei
* const statements added in the API where applicable.
Changes v1 to v2:
* Fixed cleanup of library state on error.
* Moved API to initial version
* Prefixed all public functions by xsk__ instead of xsk_
* Added comment about changed default ring sizes, batch size and umem
size in the sample application commit message
* The library now only creates an Rx or Tx ring if the respective
parameter is != NULL
Note that for zero-copy to work on FVL you need the following patch:
https://lore.kernel.org/netdev/1548770597-16141-1-git-send-email-magnus.karlsson@intel.com/.
For ixgbe, you need a similar patch found here:
https://lore.kernel.org/netdev/CAJ8uoz1GJBmC0GFbURvEzY4kDZZ6C7O9+1F+gV0y=GOMGLobUQ@mail.gmail.com/.
I based this patch set on bpf-next commit a4021a3579c5 ("tools/bpf: add log_level to bpf_load_program_attr")
Thanks: Magnus
Magnus Karlsson (2):
libbpf: add support for using AF_XDP sockets
samples/bpf: convert xdpsock to use libbpf for AF_XDP access
samples/bpf/Makefile | 1 -
samples/bpf/xdpsock.h | 11 -
samples/bpf/xdpsock_kern.c | 56 ---
samples/bpf/xdpsock_user.c | 839 ++++++++++++--------------------------
tools/include/uapi/linux/if_xdp.h | 78 ++++
tools/lib/bpf/Build | 2 +-
tools/lib/bpf/Makefile | 5 +-
tools/lib/bpf/README.rst | 11 +-
tools/lib/bpf/libbpf.map | 7 +
tools/lib/bpf/xsk.c | 742 +++++++++++++++++++++++++++++++++
tools/lib/bpf/xsk.h | 205 ++++++++++
11 files changed, 1306 insertions(+), 651 deletions(-)
delete mode 100644 samples/bpf/xdpsock.h
delete mode 100644 samples/bpf/xdpsock_kern.c
create mode 100644 tools/include/uapi/linux/if_xdp.h
create mode 100644 tools/lib/bpf/xsk.c
create mode 100644 tools/lib/bpf/xsk.h
--
2.7.4
* [PATCH bpf-next v4 1/2] libbpf: add support for using AF_XDP sockets
From: Magnus Karlsson @ 2019-02-08 13:05 UTC (permalink / raw)
To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jakub.kicinski,
bjorn.topel, qi.z.zhang
Cc: brouer, xiaolong.ye
In-Reply-To: <1549631126-29067-1-git-send-email-magnus.karlsson@intel.com>
This commit adds AF_XDP support to libbpf. The main reason for this is
to facilitate writing applications that use AF_XDP by offering
higher-level APIs that hide many of the details of the AF_XDP
uapi. This is in the same vein as libbpf facilitating XDP adoption by
offering easy-to-use, higher-level interfaces to XDP
functionality. Hopefully this will facilitate adoption of AF_XDP, make
applications using it simpler and smaller, and finally also make it
possible for applications to benefit from optimizations in the AF_XDP
user space access code. Previously, people just copied and pasted the
code from the sample application into their application, which is not
desirable.
The interface is composed of two parts:
* Low-level access interface to the four rings and the packet
* High-level control plane interface for creating and setting
up umems and af_xdp sockets as well as a simple XDP program.
Tested-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
tools/include/uapi/linux/if_xdp.h | 78 ++++
tools/lib/bpf/Build | 2 +-
tools/lib/bpf/Makefile | 5 +-
tools/lib/bpf/README.rst | 11 +-
tools/lib/bpf/libbpf.map | 7 +
tools/lib/bpf/xsk.c | 742 ++++++++++++++++++++++++++++++++++++++
tools/lib/bpf/xsk.h | 205 +++++++++++
7 files changed, 1047 insertions(+), 3 deletions(-)
create mode 100644 tools/include/uapi/linux/if_xdp.h
create mode 100644 tools/lib/bpf/xsk.c
create mode 100644 tools/lib/bpf/xsk.h
diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
new file mode 100644
index 0000000..caed8b1
--- /dev/null
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * if_xdp: XDP socket user-space interface
+ * Copyright(c) 2018 Intel Corporation.
+ *
+ * Author(s): Björn Töpel <bjorn.topel@intel.com>
+ * Magnus Karlsson <magnus.karlsson@intel.com>
+ */
+
+#ifndef _LINUX_IF_XDP_H
+#define _LINUX_IF_XDP_H
+
+#include <linux/types.h>
+
+/* Options for the sxdp_flags field */
+#define XDP_SHARED_UMEM (1 << 0)
+#define XDP_COPY (1 << 1) /* Force copy-mode */
+#define XDP_ZEROCOPY (1 << 2) /* Force zero-copy mode */
+
+struct sockaddr_xdp {
+ __u16 sxdp_family;
+ __u16 sxdp_flags;
+ __u32 sxdp_ifindex;
+ __u32 sxdp_queue_id;
+ __u32 sxdp_shared_umem_fd;
+};
+
+struct xdp_ring_offset {
+ __u64 producer;
+ __u64 consumer;
+ __u64 desc;
+};
+
+struct xdp_mmap_offsets {
+ struct xdp_ring_offset rx;
+ struct xdp_ring_offset tx;
+ struct xdp_ring_offset fr; /* Fill */
+ struct xdp_ring_offset cr; /* Completion */
+};
+
+/* XDP socket options */
+#define XDP_MMAP_OFFSETS 1
+#define XDP_RX_RING 2
+#define XDP_TX_RING 3
+#define XDP_UMEM_REG 4
+#define XDP_UMEM_FILL_RING 5
+#define XDP_UMEM_COMPLETION_RING 6
+#define XDP_STATISTICS 7
+
+struct xdp_umem_reg {
+ __u64 addr; /* Start of packet data area */
+ __u64 len; /* Length of packet data area */
+ __u32 chunk_size;
+ __u32 headroom;
+};
+
+struct xdp_statistics {
+ __u64 rx_dropped; /* Dropped for reasons other than invalid desc */
+ __u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
+ __u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
+};
+
+/* Pgoff for mmaping the rings */
+#define XDP_PGOFF_RX_RING 0
+#define XDP_PGOFF_TX_RING 0x80000000
+#define XDP_UMEM_PGOFF_FILL_RING 0x100000000ULL
+#define XDP_UMEM_PGOFF_COMPLETION_RING 0x180000000ULL
+
+/* Rx/Tx descriptor */
+struct xdp_desc {
+ __u64 addr;
+ __u32 len;
+ __u32 options;
+};
+
+/* UMEM descriptor is __u64 */
+
+#endif /* _LINUX_IF_XDP_H */
diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
index bfd9bfc..ee9d536 100644
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
@@ -1 +1 @@
-libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o netlink.o bpf_prog_linfo.o libbpf_probes.o
+libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o netlink.o bpf_prog_linfo.o libbpf_probes.o xsk.o
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 8479162..761691b 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -164,6 +164,9 @@ $(BPF_IN): force elfdep bpfdep
@(test -f ../../include/uapi/linux/if_link.h -a -f ../../../include/uapi/linux/if_link.h && ( \
(diff -B ../../include/uapi/linux/if_link.h ../../../include/uapi/linux/if_link.h >/dev/null) || \
echo "Warning: Kernel ABI header at 'tools/include/uapi/linux/if_link.h' differs from latest version at 'include/uapi/linux/if_link.h'" >&2 )) || true
+ @(test -f ../../include/uapi/linux/if_xdp.h -a -f ../../../include/uapi/linux/if_xdp.h && ( \
+ (diff -B ../../include/uapi/linux/if_xdp.h ../../../include/uapi/linux/if_xdp.h >/dev/null) || \
+ echo "Warning: Kernel ABI header at 'tools/include/uapi/linux/if_xdp.h' differs from latest version at 'include/uapi/linux/if_xdp.h'" >&2 )) || true
$(Q)$(MAKE) $(build)=libbpf
$(OUTPUT)libbpf.so: $(BPF_IN)
@@ -174,7 +177,7 @@ $(OUTPUT)libbpf.a: $(BPF_IN)
$(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^
$(OUTPUT)test_libbpf: test_libbpf.cpp $(OUTPUT)libbpf.a
- $(QUIET_LINK)$(CXX) $^ -lelf -o $@
+ $(QUIET_LINK)$(CXX) $(INCLUDES) $^ -lelf -o $@
check: check_abi
diff --git a/tools/lib/bpf/README.rst b/tools/lib/bpf/README.rst
index 607aae4..45e3788 100644
--- a/tools/lib/bpf/README.rst
+++ b/tools/lib/bpf/README.rst
@@ -9,7 +9,7 @@ described here. It's recommended to follow these conventions whenever a
new function or type is added to keep libbpf API clean and consistent.
All types and functions provided by libbpf API should have one of the
-following prefixes: ``bpf_``, ``btf_``, ``libbpf_``.
+following prefixes: ``bpf_``, ``btf_``, ``libbpf_``, ``xsk_``.
System call wrappers
--------------------
@@ -62,6 +62,15 @@ Auxiliary functions and types that don't fit well in any of categories
described above should have ``libbpf_`` prefix, e.g.
``libbpf_get_error`` or ``libbpf_prog_type_by_name``.
+AF_XDP functions
+-------------------
+
+AF_XDP functions should have an ``xsk_`` prefix, e.g.
+``xsk_umem__get_data`` or ``xsk_umem__create``. The interface consists
+of both low-level ring access functions and high-level configuration
+functions. These can be mixed and matched. Note that these functions
+are not reentrant for performance reasons.
+
libbpf ABI
==========
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 89c1149..1cc1bb8 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -143,4 +143,11 @@ LIBBPF_0.0.2 {
btf_ext__new;
btf_ext__reloc_func_info;
btf_ext__reloc_line_info;
+ xsk_umem__create;
+ xsk_socket__create;
+ xsk_umem__delete;
+ xsk_socket__delete;
+ xsk_umem__get_data;
+ xsk_umem__fd;
+ xsk_socket__fd;
} LIBBPF_0.0.1;
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
new file mode 100644
index 0000000..a982a76
--- /dev/null
+++ b/tools/lib/bpf/xsk.c
@@ -0,0 +1,742 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+
+/*
+ * AF_XDP user-space access library.
+ *
+ * Copyright(c) 2018 - 2019 Intel Corporation.
+ *
+ * Author(s): Magnus Karlsson <magnus.karlsson@intel.com>
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <arpa/inet.h>
+#include <asm/barrier.h>
+#include <linux/compiler.h>
+#include <linux/filter.h>
+#include <linux/if_ether.h>
+#include <linux/if_link.h>
+#include <linux/if_packet.h>
+#include <linux/if_xdp.h>
+#include <linux/rtnetlink.h>
+#include <net/if.h>
+#include <sys/mman.h>
+#include <sys/socket.h>
+#include <sys/types.h>
+
+#include "bpf.h"
+#include "libbpf.h"
+#include "libbpf_util.h"
+#include "nlattr.h"
+#include "xsk.h"
+
+#ifndef SOL_XDP
+ #define SOL_XDP 283
+#endif
+
+#ifndef AF_XDP
+ #define AF_XDP 44
+#endif
+
+#ifndef PF_XDP
+ #define PF_XDP AF_XDP
+#endif
+
+struct xsk_umem {
+ struct xsk_ring_prod *fill;
+ struct xsk_ring_cons *comp;
+ char *umem_area;
+ struct xsk_umem_config config;
+ int fd;
+ int refcount;
+};
+
+struct xsk_socket {
+ struct xsk_ring_cons *rx;
+ struct xsk_ring_prod *tx;
+ __u64 outstanding_tx;
+ struct xsk_umem *umem;
+ struct xsk_socket_config config;
+ int fd;
+ int xsks_map;
+ int ifindex;
+ int prog_fd;
+ int qidconf_map_fd;
+ int xsks_map_fd;
+ __u32 queue_id;
+};
+
+struct xsk_nl_info {
+ bool xdp_prog_attached;
+ int ifindex;
+ int fd;
+};
+
+#define MAX_QUEUES 128
+
+/* For 32-bit systems, we need to use mmap2 as the offsets are 64-bit.
+ * Unfortunately, it is not part of glibc.
+ */
+static inline void *xsk_mmap(void *addr, size_t length, int prot, int flags,
+ int fd, __u64 offset)
+{
+#ifdef __NR_mmap2
+ unsigned int page_shift = __builtin_ffs(getpagesize()) - 1;
+ long ret = syscall(__NR_mmap2, addr, length, prot, flags, fd,
+ (off_t)(offset >> page_shift));
+
+ return (void *)ret;
+#else
+ return mmap(addr, length, prot, flags, fd, offset);
+#endif
+}
+
+void *xsk_umem__get_data(struct xsk_umem *umem, __u64 addr)
+{
+ return &((char *)(umem->umem_area))[addr];
+}
+
+int xsk_umem__fd(const struct xsk_umem *umem)
+{
+ return umem ? umem->fd : -EINVAL;
+}
+
+int xsk_socket__fd(const struct xsk_socket *xsk)
+{
+ return xsk ? xsk->fd : -EINVAL;
+}
+
+static bool xsk_page_aligned(void *buffer)
+{
+ unsigned long addr = (unsigned long)buffer;
+
+ return !(addr & (getpagesize() - 1));
+}
+
+static void xsk_set_umem_config(struct xsk_umem_config *cfg,
+ const struct xsk_umem_config *usr_cfg)
+{
+ if (!usr_cfg) {
+ cfg->fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS;
+ cfg->comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
+ cfg->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
+ cfg->frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
+ return;
+ }
+
+ cfg->fill_size = usr_cfg->fill_size;
+ cfg->comp_size = usr_cfg->comp_size;
+ cfg->frame_size = usr_cfg->frame_size;
+ cfg->frame_headroom = usr_cfg->frame_headroom;
+}
+
+static void xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
+ const struct xsk_socket_config *usr_cfg)
+{
+ if (!usr_cfg) {
+ cfg->rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
+ cfg->tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS;
+ cfg->libbpf_flags = 0;
+ cfg->xdp_flags = 0;
+ cfg->bind_flags = 0;
+ return;
+ }
+
+ cfg->rx_size = usr_cfg->rx_size;
+ cfg->tx_size = usr_cfg->tx_size;
+ cfg->libbpf_flags = usr_cfg->libbpf_flags;
+ cfg->xdp_flags = usr_cfg->xdp_flags;
+ cfg->bind_flags = usr_cfg->bind_flags;
+}
+
+int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
+ struct xsk_ring_prod *fill, struct xsk_ring_cons *comp,
+ const struct xsk_umem_config *usr_config)
+{
+ struct xdp_mmap_offsets off;
+ struct xdp_umem_reg mr;
+ struct xsk_umem *umem;
+ socklen_t optlen;
+ void *map;
+ int err;
+
+ if (!umem_area || !umem_ptr || !fill || !comp)
+ return -EFAULT;
+ if (!size && !xsk_page_aligned(umem_area))
+ return -EINVAL;
+
+ umem = calloc(1, sizeof(*umem));
+ if (!umem)
+ return -ENOMEM;
+
+ umem->fd = socket(AF_XDP, SOCK_RAW, 0);
+ if (umem->fd < 0) {
+ err = -errno;
+ goto out_umem_alloc;
+ }
+
+ umem->umem_area = umem_area;
+ xsk_set_umem_config(&umem->config, usr_config);
+
+ mr.addr = (uintptr_t)umem_area;
+ mr.len = size;
+ mr.chunk_size = umem->config.frame_size;
+ mr.headroom = umem->config.frame_headroom;
+
+ err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
+ if (err) {
+ err = -errno;
+ goto out_socket;
+ }
+ err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_FILL_RING,
+ &umem->config.fill_size,
+ sizeof(umem->config.fill_size));
+ if (err) {
+ err = -errno;
+ goto out_socket;
+ }
+ err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_COMPLETION_RING,
+ &umem->config.comp_size,
+ sizeof(umem->config.comp_size));
+ if (err) {
+ err = -errno;
+ goto out_socket;
+ }
+
+ optlen = sizeof(off);
+ err = getsockopt(umem->fd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen);
+ if (err) {
+ err = -errno;
+ goto out_socket;
+ }
+
+ map = xsk_mmap(NULL, off.fr.desc +
+ umem->config.fill_size * sizeof(__u64),
+ PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
+ umem->fd, XDP_UMEM_PGOFF_FILL_RING);
+ if (map == MAP_FAILED) {
+ err = -errno;
+ goto out_socket;
+ }
+
+ umem->fill = fill;
+ fill->mask = umem->config.fill_size - 1;
+ fill->size = umem->config.fill_size;
+ fill->producer = map + off.fr.producer;
+ fill->consumer = map + off.fr.consumer;
+ fill->ring = map + off.fr.desc;
+ fill->cached_cons = umem->config.fill_size;
+
+ map = xsk_mmap(NULL,
+ off.cr.desc + umem->config.comp_size * sizeof(__u64),
+ PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
+ umem->fd, XDP_UMEM_PGOFF_COMPLETION_RING);
+ if (map == MAP_FAILED) {
+ err = -errno;
+ goto out_mmap;
+ }
+
+ umem->comp = comp;
+ comp->mask = umem->config.comp_size - 1;
+ comp->size = umem->config.comp_size;
+ comp->producer = map + off.cr.producer;
+ comp->consumer = map + off.cr.consumer;
+ comp->ring = map + off.cr.desc;
+
+ *umem_ptr = umem;
+ return 0;
+
+out_mmap:
+ munmap(umem->fill,
+ off.fr.desc + umem->config.fill_size * sizeof(__u64));
+out_socket:
+ close(umem->fd);
+out_umem_alloc:
+ free(umem);
+ return err;
+}
+
+static int xsk_parse_nl(void *cookie, void *msg, struct nlattr **tb)
+{
+ struct nlattr *tb_parsed[IFLA_XDP_MAX + 1];
+ struct xsk_nl_info *nl_info = cookie;
+ struct ifinfomsg *ifinfo = msg;
+ unsigned char mode;
+ int err;
+
+ if (nl_info->ifindex && nl_info->ifindex != ifinfo->ifi_index)
+ return 0;
+
+ if (!tb[IFLA_XDP])
+ return 0;
+
+ err = libbpf_nla_parse_nested(tb_parsed, IFLA_XDP_MAX, tb[IFLA_XDP],
+ NULL);
+ if (err)
+ return err;
+
+ if (!tb_parsed[IFLA_XDP_ATTACHED] || !tb_parsed[IFLA_XDP_FD])
+ return 0;
+
+ mode = libbpf_nla_getattr_u8(tb_parsed[IFLA_XDP_ATTACHED]);
+ if (mode == XDP_ATTACHED_NONE)
+ return 0;
+
+ nl_info->xdp_prog_attached = true;
+ nl_info->fd = libbpf_nla_getattr_u32(tb_parsed[IFLA_XDP_FD]);
+ return 0;
+}
+
+static bool xsk_xdp_prog_attached(struct xsk_socket *xsk)
+{
+ struct xsk_nl_info nl_info;
+ unsigned int nl_pid;
+ char err_buf[256];
+ int sock, err;
+
+ sock = libbpf_netlink_open(&nl_pid);
+ if (sock < 0)
+ return false;
+
+ nl_info.xdp_prog_attached = false;
+ nl_info.ifindex = xsk->ifindex;
+ nl_info.fd = -1;
+
+ err = libbpf_nl_get_link(sock, nl_pid, xsk_parse_nl, &nl_info);
+ if (err) {
+ libbpf_strerror(err, err_buf, sizeof(err_buf));
+ pr_warning("Error:\n%s\n", err_buf);
+ close(sock);
+ return false;
+ }
+
+ close(sock);
+ xsk->prog_fd = nl_info.fd;
+ return nl_info.xdp_prog_attached;
+}
+
+static int xsk_load_xdp_prog(struct xsk_socket *xsk)
+{
+ char bpf_log_buf[BPF_LOG_BUF_SIZE];
+ int err, prog_fd;
+
+ /* This is the C-program:
+ * SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
+ * {
+ * int *qidconf, index = ctx->rx_queue_index;
+ *
+ * // A set entry here means that the corresponding queue_id
+ * // has an active AF_XDP socket bound to it.
+ * qidconf = bpf_map_lookup_elem(&qidconf_map, &index);
+ * if (!qidconf)
+ * return XDP_ABORTED;
+ *
+ * if (*qidconf)
+ * return bpf_redirect_map(&xsks_map, index, 0);
+ *
+ * return XDP_PASS;
+ * }
+ */
+ struct bpf_insn prog[] = {
+ /* r1 = *(u32 *)(r1 + 16) */
+ BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 16),
+ /* *(u32 *)(r10 - 4) = r1 */
+ BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_1, -4),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+ BPF_LD_MAP_FD(BPF_REG_1, xsk->qidconf_map_fd),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+ BPF_MOV32_IMM(BPF_REG_0, 0),
+ /* if r1 == 0 goto +8 */
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 8),
+ BPF_MOV32_IMM(BPF_REG_0, 2),
+ /* r1 = *(u32 *)(r1 + 0) */
+ BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 0),
+ /* if r1 == 0 goto +5 */
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 5),
+ BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd),
+ /* r2 = *(u32 *)(r10 - 4) */
+ BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_10, -4),
+ BPF_MOV32_IMM(BPF_REG_3, 0),
+ BPF_EMIT_CALL(BPF_FUNC_redirect_map),
+ /* The jumps are to this instruction */
+ BPF_EXIT_INSN(),
+ };
+ size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
+
+ prog_fd = bpf_load_program(BPF_PROG_TYPE_XDP, prog, insns_cnt,
+ "LGPL-2.1 or BSD-2-Clause", 0, bpf_log_buf,
+ BPF_LOG_BUF_SIZE);
+ if (prog_fd < 0) {
+ pr_warning("BPF log buffer:\n%s", bpf_log_buf);
+ return prog_fd;
+ }
+
+ err = bpf_set_link_xdp_fd(xsk->ifindex, prog_fd, xsk->config.xdp_flags);
+ if (err) {
+ close(prog_fd);
+ return err;
+ }
+
+ xsk->prog_fd = prog_fd;
+ return 0;
+}
+
+static int xsk_create_bpf_maps(struct xsk_socket *xsk)
+{
+ int fd;
+
+ fd = bpf_create_map_name(BPF_MAP_TYPE_ARRAY, "qidconf_map",
+ sizeof(int), sizeof(int), MAX_QUEUES, 0);
+ if (fd < 0)
+ return fd;
+ xsk->qidconf_map_fd = fd;
+
+ fd = bpf_create_map_name(BPF_MAP_TYPE_XSKMAP, "xsks_map",
+ sizeof(int), sizeof(int), MAX_QUEUES, 0);
+ if (fd < 0) {
+ close(xsk->qidconf_map_fd);
+ return fd;
+ }
+ xsk->xsks_map_fd = fd;
+
+ return 0;
+}
+
+static void xsk_delete_bpf_maps(struct xsk_socket *xsk)
+{
+ close(xsk->qidconf_map_fd);
+ close(xsk->xsks_map_fd);
+}
+
+static int xsk_update_bpf_maps(struct xsk_socket *xsk, int qidconf_value,
+ int xsks_value)
+{
+ bool qidconf_map_updated = false, xsks_map_updated = false;
+ struct bpf_prog_info prog_info = {};
+ __u32 prog_len = sizeof(prog_info);
+ struct bpf_map_info map_info;
+ __u32 map_len = sizeof(map_info);
+ __u32 *map_ids;
+ int reset_value = 0;
+ __u32 num_maps;
+ unsigned int i;
+ int err;
+
+ err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len);
+ if (err)
+ return err;
+
+ num_maps = prog_info.nr_map_ids;
+
+ map_ids = calloc(prog_info.nr_map_ids, sizeof(*map_ids));
+ if (!map_ids)
+ return -ENOMEM;
+
+ memset(&prog_info, 0, prog_len);
+ prog_info.nr_map_ids = num_maps;
+ prog_info.map_ids = (__u64)(unsigned long)map_ids;
+
+ err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len);
+ if (err)
+ goto out_map_ids;
+
+ for (i = 0; i < prog_info.nr_map_ids; i++) {
+ int fd;
+
+ fd = bpf_map_get_fd_by_id(map_ids[i]);
+ if (fd < 0) {
+ err = -errno;
+ goto out_maps;
+ }
+
+ err = bpf_obj_get_info_by_fd(fd, &map_info, &map_len);
+ if (err)
+ goto out_maps;
+
+ if (!strcmp(map_info.name, "qidconf_map")) {
+ err = bpf_map_update_elem(fd, &xsk->queue_id,
+ &qidconf_value, 0);
+ if (err)
+ goto out_maps;
+ qidconf_map_updated = true;
+ xsk->qidconf_map_fd = fd;
+ } else if (!strcmp(map_info.name, "xsks_map")) {
+ err = bpf_map_update_elem(fd, &xsk->queue_id,
+ &xsks_value, 0);
+ if (err)
+ goto out_maps;
+ xsks_map_updated = true;
+ xsk->xsks_map_fd = fd;
+ }
+
+ if (qidconf_map_updated && xsks_map_updated)
+ break;
+ }
+
+ if (!(qidconf_map_updated && xsks_map_updated)) {
+ err = -ENOENT;
+ goto out_maps;
+ }
+
+ err = 0;
+ goto out_success;
+
+out_maps:
+ if (qidconf_map_updated)
+ (void)bpf_map_update_elem(xsk->qidconf_map_fd, &xsk->queue_id,
+ &reset_value, 0);
+ if (xsks_map_updated)
+ (void)bpf_map_update_elem(xsk->xsks_map_fd, &xsk->queue_id,
+ &reset_value, 0);
+out_success:
+ if (qidconf_map_updated)
+ close(xsk->qidconf_map_fd);
+ if (xsks_map_updated)
+ close(xsk->xsks_map_fd);
+out_map_ids:
+ free(map_ids);
+ return err;
+}
+
+static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
+{
+ bool prog_attached = false;
+ int err;
+
+ if (!xsk_xdp_prog_attached(xsk)) {
+ prog_attached = true;
+ err = xsk_create_bpf_maps(xsk);
+ if (err)
+ return err;
+
+ err = xsk_load_xdp_prog(xsk);
+ if (err)
+ goto out_maps;
+ }
+
+ err = xsk_update_bpf_maps(xsk, true, xsk->fd);
+ if (err)
+ goto out_load;
+
+ return 0;
+
+out_load:
+ if (prog_attached)
+ close(xsk->prog_fd);
+out_maps:
+ if (prog_attached)
+ xsk_delete_bpf_maps(xsk);
+ return err;
+}
+
+int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
+ __u32 queue_id, struct xsk_umem *umem,
+ struct xsk_ring_cons *rx, struct xsk_ring_prod *tx,
+ const struct xsk_socket_config *usr_config)
+{
+ struct sockaddr_xdp sxdp = {};
+ struct xdp_mmap_offsets off;
+ struct xsk_socket *xsk;
+ socklen_t optlen;
+ void *map;
+ int err;
+
+ if (!umem || !xsk_ptr || !rx || !tx)
+ return -EFAULT;
+
+ if (umem->refcount) {
+ pr_warning("Error: shared umems not supported by libbpf.\n");
+ return -EBUSY;
+ }
+
+ xsk = calloc(1, sizeof(*xsk));
+ if (!xsk)
+ return -ENOMEM;
+
+ if (umem->refcount++ > 0) {
+ xsk->fd = socket(AF_XDP, SOCK_RAW, 0);
+ if (xsk->fd < 0) {
+ err = -errno;
+ goto out_xsk_alloc;
+ }
+ } else {
+ xsk->fd = umem->fd;
+ }
+
+ xsk->outstanding_tx = 0;
+ xsk->queue_id = queue_id;
+ xsk->umem = umem;
+ xsk->ifindex = if_nametoindex(ifname);
+ if (!xsk->ifindex) {
+ err = -errno;
+ goto out_socket;
+ }
+
+ xsk_set_xdp_socket_config(&xsk->config, usr_config);
+
+ if (rx) {
+ err = setsockopt(xsk->fd, SOL_XDP, XDP_RX_RING,
+ &xsk->config.rx_size,
+ sizeof(xsk->config.rx_size));
+ if (err) {
+ err = -errno;
+ goto out_socket;
+ }
+ }
+ if (tx) {
+ err = setsockopt(xsk->fd, SOL_XDP, XDP_TX_RING,
+ &xsk->config.tx_size,
+ sizeof(xsk->config.tx_size));
+ if (err) {
+ err = -errno;
+ goto out_socket;
+ }
+ }
+
+ optlen = sizeof(off);
+ err = getsockopt(xsk->fd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen);
+ if (err) {
+ err = -errno;
+ goto out_socket;
+ }
+
+ if (rx) {
+ map = xsk_mmap(NULL, off.rx.desc +
+ xsk->config.rx_size * sizeof(struct xdp_desc),
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE,
+ xsk->fd, XDP_PGOFF_RX_RING);
+ if (map == MAP_FAILED) {
+ err = -errno;
+ goto out_socket;
+ }
+
+ rx->mask = xsk->config.rx_size - 1;
+ rx->size = xsk->config.rx_size;
+ rx->producer = map + off.rx.producer;
+ rx->consumer = map + off.rx.consumer;
+ rx->ring = map + off.rx.desc;
+ }
+ xsk->rx = rx;
+
+ if (tx) {
+ map = xsk_mmap(NULL, off.tx.desc +
+ xsk->config.tx_size * sizeof(struct xdp_desc),
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE,
+ xsk->fd, XDP_PGOFF_TX_RING);
+ if (map == MAP_FAILED) {
+ err = -errno;
+ goto out_mmap_rx;
+ }
+
+ tx->mask = xsk->config.tx_size - 1;
+ tx->size = xsk->config.tx_size;
+ tx->producer = map + off.tx.producer;
+ tx->consumer = map + off.tx.consumer;
+ tx->ring = map + off.tx.desc;
+ tx->cached_cons = xsk->config.tx_size;
+ }
+ xsk->tx = tx;
+
+ sxdp.sxdp_family = PF_XDP;
+ sxdp.sxdp_ifindex = xsk->ifindex;
+ sxdp.sxdp_queue_id = xsk->queue_id;
+ sxdp.sxdp_flags = xsk->config.bind_flags;
+
+ err = bind(xsk->fd, (struct sockaddr *)&sxdp, sizeof(sxdp));
+ if (err) {
+ err = -errno;
+ goto out_mmap_tx;
+ }
+
+ if (!(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) {
+ err = xsk_setup_xdp_prog(xsk);
+ if (err)
+ goto out_mmap_tx;
+ }
+
+ *xsk_ptr = xsk;
+ return 0;
+
+out_mmap_tx:
+ if (tx)
+ munmap(tx->ring - off.tx.desc,
+ off.tx.desc +
+ xsk->config.tx_size * sizeof(struct xdp_desc));
+out_mmap_rx:
+ if (rx)
+ munmap(rx->ring - off.rx.desc,
+ off.rx.desc +
+ xsk->config.rx_size * sizeof(struct xdp_desc));
+out_socket:
+ if (--umem->refcount)
+ close(xsk->fd);
+out_xsk_alloc:
+ free(xsk);
+ return err;
+}
+
+int xsk_umem__delete(struct xsk_umem *umem)
+{
+ struct xdp_mmap_offsets off;
+ socklen_t optlen;
+ int err;
+
+ if (!umem)
+ return 0;
+
+ if (umem->refcount)
+ return -EBUSY;
+
+ optlen = sizeof(off);
+ err = getsockopt(umem->fd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen);
+ if (!err) {
+ munmap(umem->fill->ring - off.fr.desc,
+ off.fr.desc + umem->config.fill_size * sizeof(__u64));
+ munmap(umem->comp->ring - off.cr.desc,
+ off.cr.desc + umem->config.comp_size * sizeof(__u64));
+ }
+
+ close(umem->fd);
+ free(umem);
+
+ return 0;
+}
+
+void xsk_socket__delete(struct xsk_socket *xsk)
+{
+ struct xdp_mmap_offsets off;
+ socklen_t optlen;
+ int err;
+
+ if (!xsk)
+ return;
+
+ (void)xsk_update_bpf_maps(xsk, 0, 0);
+
+ optlen = sizeof(off);
+ err = getsockopt(xsk->fd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen);
+ if (!err) {
+ if (xsk->rx)
+ munmap(xsk->rx->ring - off.rx.desc,
+ off.rx.desc +
+ xsk->config.rx_size * sizeof(struct xdp_desc));
+ if (xsk->tx)
+ munmap(xsk->tx->ring - off.tx.desc,
+ off.tx.desc +
+ xsk->config.tx_size * sizeof(struct xdp_desc));
+ }
+
+ xsk->umem->refcount--;
+ /* Do not close an fd that also has an associated umem connected
+ * to it.
+ */
+ if (xsk->fd != xsk->umem->fd)
+ close(xsk->fd);
+ free(xsk);
+}
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
new file mode 100644
index 0000000..a1ab8c9
--- /dev/null
+++ b/tools/lib/bpf/xsk.h
@@ -0,0 +1,205 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+
+/*
+ * AF_XDP user-space access library.
+ *
+ * Copyright(c) 2018 - 2019 Intel Corporation.
+ *
+ * Author(s): Magnus Karlsson <magnus.karlsson@intel.com>
+ */
+
+#ifndef __LIBBPF_XSK_H
+#define __LIBBPF_XSK_H
+
+#include <stdio.h>
+#include <stdint.h>
+#include <linux/if_xdp.h>
+
+#include "libbpf.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Do not access these members directly. Use the functions below. */
+#define DEFINE_XSK_RING(name) \
+struct name { \
+ __u32 cached_prod; \
+ __u32 cached_cons; \
+ __u32 mask; \
+ __u32 size; \
+ __u32 *producer; \
+ __u32 *consumer; \
+ void *ring; \
+}
+
+DEFINE_XSK_RING(xsk_ring_prod);
+DEFINE_XSK_RING(xsk_ring_cons);
+
+struct xsk_umem;
+struct xsk_socket;
+
+static inline __u64 *xsk_ring_prod__fill_addr(struct xsk_ring_prod *fill,
+ __u32 idx)
+{
+ __u64 *addrs = (__u64 *)fill->ring;
+
+ return &addrs[idx & fill->mask];
+}
+
+static inline const __u64 *
+xsk_ring_cons__comp_addr(const struct xsk_ring_cons *comp, __u32 idx)
+{
+ const __u64 *addrs = (const __u64 *)comp->ring;
+
+ return &addrs[idx & comp->mask];
+}
+
+static inline struct xdp_desc *xsk_ring_prod__tx_desc(struct xsk_ring_prod *tx,
+ __u32 idx)
+{
+ struct xdp_desc *descs = (struct xdp_desc *)tx->ring;
+
+ return &descs[idx & tx->mask];
+}
+
+static inline const struct xdp_desc *
+xsk_ring_cons__rx_desc(const struct xsk_ring_cons *rx, __u32 idx)
+{
+ const struct xdp_desc *descs = (const struct xdp_desc *)rx->ring;
+
+ return &descs[idx & rx->mask];
+}
+
+static inline __u32 xsk_prod_nb_free(struct xsk_ring_prod *r, __u32 nb)
+{
+ __u32 free_entries = r->cached_cons - r->cached_prod;
+
+ if (free_entries >= nb)
+ return free_entries;
+
+ /* Refresh the local tail pointer.
+ * cached_cons is r->size bigger than the real consumer pointer so
+ * that this addition can be avoided in the more frequently
+ * executed code that computes free_entries at the beginning of
+ * this function. Without this optimization it would have been
+ * free_entries = r->cached_cons - r->cached_prod + r->size.
+ */
+ r->cached_cons = *r->consumer + r->size;
+
+ return r->cached_cons - r->cached_prod;
+}
+
+static inline __u32 xsk_cons_nb_avail(struct xsk_ring_cons *r, __u32 nb)
+{
+ __u32 entries = r->cached_prod - r->cached_cons;
+
+ if (entries == 0) {
+ r->cached_prod = *r->producer;
+ entries = r->cached_prod - r->cached_cons;
+ }
+
+ return (entries > nb) ? nb : entries;
+}
+
+static inline size_t xsk_ring_prod__reserve(struct xsk_ring_prod *prod,
+ size_t nb, __u32 *idx)
+{
+ if (unlikely(xsk_prod_nb_free(prod, nb) < nb))
+ return 0;
+
+ *idx = prod->cached_prod;
+ prod->cached_prod += nb;
+
+ return nb;
+}
+
+static inline void xsk_ring_prod__submit(struct xsk_ring_prod *prod, size_t nb)
+{
+ /* Make sure everything has been written to the ring before signalling
+ * this to the kernel.
+ */
+ smp_wmb();
+
+ *prod->producer += nb;
+}
+
+static inline size_t xsk_ring_cons__peek(struct xsk_ring_cons *cons,
+ size_t nb, __u32 *idx)
+{
+ size_t entries = xsk_cons_nb_avail(cons, nb);
+
+ if (likely(entries > 0)) {
+ /* Make sure we do not speculatively read the data before
+ * we have received the packet buffers from the ring.
+ */
+ smp_rmb();
+
+ *idx = cons->cached_cons;
+ cons->cached_cons += entries;
+ }
+
+ return entries;
+}
+
+static inline void xsk_ring_cons__release(struct xsk_ring_cons *cons, size_t nb)
+{
+ *cons->consumer += nb;
+}
+
+static inline void *xsk_umem__get_data_raw(void *umem_area, __u64 addr)
+{
+ return &((char *)umem_area)[addr];
+}
+
+LIBBPF_API void *xsk_umem__get_data(struct xsk_umem *umem, __u64 addr);
+
+LIBBPF_API int xsk_umem__fd(const struct xsk_umem *umem);
+LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
+
+#define XSK_RING_CONS__DEFAULT_NUM_DESCS 2048
+#define XSK_RING_PROD__DEFAULT_NUM_DESCS 2048
+#define XSK_UMEM__DEFAULT_FRAME_SHIFT 11 /* 2048 bytes */
+#define XSK_UMEM__DEFAULT_FRAME_SIZE (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
+#define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
+
+struct xsk_umem_config {
+ __u32 fill_size;
+ __u32 comp_size;
+ __u32 frame_size;
+ __u32 frame_headroom;
+};
+
+/* Flags for the libbpf_flags field. */
+#define XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD (1 << 0)
+
+struct xsk_socket_config {
+ __u32 rx_size;
+ __u32 tx_size;
+ __u32 libbpf_flags;
+ __u32 xdp_flags;
+ __u16 bind_flags;
+};
+
+/* Set config to NULL to get the default configuration. */
+LIBBPF_API int xsk_umem__create(struct xsk_umem **umem,
+ void *umem_area, __u64 size,
+ struct xsk_ring_prod *fill,
+ struct xsk_ring_cons *comp,
+ const struct xsk_umem_config *config);
+LIBBPF_API int xsk_socket__create(struct xsk_socket **xsk,
+ const char *ifname, __u32 queue_id,
+ struct xsk_umem *umem,
+ struct xsk_ring_cons *rx,
+ struct xsk_ring_prod *tx,
+ const struct xsk_socket_config *config);
+
+/* Returns 0 for success and -EBUSY if the umem is still in use. */
+LIBBPF_API int xsk_umem__delete(struct xsk_umem *umem);
+LIBBPF_API void xsk_socket__delete(struct xsk_socket *xsk);
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+#endif /* __LIBBPF_XSK_H */
--
2.7.4
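For readers following the comment in xsk_prod_nb_free(), the cached
consumer trick can be modelled outside the kernel tree. The sketch
below is only an illustration under stated assumptions: struct
model_ring and the model_*() helpers are hypothetical stand-ins, not
libbpf types or functions, and the consumer index is a plain field
rather than a pointer into memory shared with the kernel. It shows
that keeping cached_cons at "real consumer + size" lets the fast path
compute the free space with a single subtraction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a producer ring; not the libbpf struct. */
struct model_ring {
	uint32_t cached_prod;	/* local copy of the producer index */
	uint32_t cached_cons;	/* kept at (real consumer + size) */
	uint32_t size;		/* number of entries, power of two */
	uint32_t mask;		/* size - 1, used to wrap indices */
	uint32_t consumer;	/* advanced by the kernel in the real ring */
};

static void model_ring_init(struct model_ring *r, uint32_t size)
{
	assert(size && (size & (size - 1)) == 0);
	r->cached_prod = 0;
	r->cached_cons = size;	/* consumer (0) + size */
	r->size = size;
	r->mask = size - 1;
	r->consumer = 0;
}

/* Same arithmetic as xsk_prod_nb_free() in the patch above. */
static uint32_t model_nb_free(struct model_ring *r, uint32_t nb)
{
	uint32_t free_entries = r->cached_cons - r->cached_prod;

	if (free_entries >= nb)
		return free_entries;

	/* Slow path: refresh the cached value from the real consumer. */
	r->cached_cons = r->consumer + r->size;

	return r->cached_cons - r->cached_prod;
}

/* Same all-or-nothing contract as xsk_ring_prod__reserve(). */
static uint32_t model_reserve(struct model_ring *r, uint32_t nb,
			      uint32_t *idx)
{
	if (model_nb_free(r, nb) < nb)
		return 0;

	*idx = r->cached_prod;
	r->cached_prod += nb;

	return nb;
}
```

With an 8-entry ring, reserving 8 entries succeeds and a further
reserve fails until the consumer index advances. The index handed
back keeps growing and is wrapped with the mask only when the ring
slot is addressed, as the xsk_ring_prod__*_addr() accessors do.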
^ permalink raw reply related
* [PATCH bpf-next v4 2/2] samples/bpf: convert xdpsock to use libbpf for AF_XDP access
From: Magnus Karlsson @ 2019-02-08 13:05 UTC (permalink / raw)
To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jakub.kicinski,
bjorn.topel, qi.z.zhang
Cc: brouer, xiaolong.ye
In-Reply-To: <1549631126-29067-1-git-send-email-magnus.karlsson@intel.com>
This commit converts the xdpsock sample application to use the AF_XDP
functions present in libbpf. This cuts the size of the sample by nearly
300 lines of code.
The default ring sizes and the batch size have been increased and the
size of the umem area has been decreased, so that the sample application
provides higher throughput. Note also that the shared umem code
has been removed from the sample, as libbpf does not support it
at this point in time.
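The size change can be checked with a little arithmetic on the
constants that this patch removes and adds. The sketch below is a
rough illustration, not part of the sample: it assumes the umem area
is NUM_FRAMES frames of XSK_UMEM__DEFAULT_FRAME_SIZE bytes each, and
the helpers umem_bytes() and fill_fits() are hypothetical names
introduced only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the removed and added #defines in this patch. */
#define OLD_NUM_FRAMES	131072
#define OLD_FRAME_SIZE	2048
#define NEW_NUM_FRAMES	(4 * 1024)
#define NEW_FRAME_SIZE	(1 << 11)	/* XSK_UMEM__DEFAULT_FRAME_SIZE */
#define NEW_FILL_DESCS	2048		/* XSK_RING_PROD__DEFAULT_NUM_DESCS */

/* Hypothetical helper: bytes covered by num_frames frames. */
static uint64_t umem_bytes(uint64_t num_frames, uint64_t frame_size)
{
	assert(frame_size != 0);
	return num_frames * frame_size;
}

/* Hypothetical helper: true if the addresses 0, frame_size, ... used
 * for num_descs fill ring entries all lie inside umem_size bytes.
 */
static bool fill_fits(uint64_t num_descs, uint64_t frame_size,
		      uint64_t umem_size)
{
	return umem_bytes(num_descs, frame_size) <= umem_size;
}
```

Under these assumptions the umem area shrinks from 256 MiB
(131072 * 2048 bytes) to 8 MiB (4096 * 2048 bytes), and the 2048
addresses placed on the fill ring at start-up cover the first 4 MiB
of it.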
Tested-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
samples/bpf/Makefile | 1 -
samples/bpf/xdpsock.h | 11 -
samples/bpf/xdpsock_kern.c | 56 ---
samples/bpf/xdpsock_user.c | 839 ++++++++++++++-------------------------------
4 files changed, 259 insertions(+), 648 deletions(-)
delete mode 100644 samples/bpf/xdpsock.h
delete mode 100644 samples/bpf/xdpsock_kern.c
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index a0ef7ed..a333e25 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -163,7 +163,6 @@ always += xdp2skb_meta_kern.o
always += syscall_tp_kern.o
always += cpustat_kern.o
always += xdp_adjust_tail_kern.o
-always += xdpsock_kern.o
always += xdp_fwd_kern.o
always += task_fd_query_kern.o
always += xdp_sample_pkts_kern.o
diff --git a/samples/bpf/xdpsock.h b/samples/bpf/xdpsock.h
deleted file mode 100644
index 533ab81..0000000
--- a/samples/bpf/xdpsock.h
+++ /dev/null
@@ -1,11 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef XDPSOCK_H_
-#define XDPSOCK_H_
-
-/* Power-of-2 number of sockets */
-#define MAX_SOCKS 4
-
-/* Round-robin receive */
-#define RR_LB 0
-
-#endif /* XDPSOCK_H_ */
diff --git a/samples/bpf/xdpsock_kern.c b/samples/bpf/xdpsock_kern.c
deleted file mode 100644
index b8ccd08..0000000
--- a/samples/bpf/xdpsock_kern.c
+++ /dev/null
@@ -1,56 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#define KBUILD_MODNAME "foo"
-#include <uapi/linux/bpf.h>
-#include "bpf_helpers.h"
-
-#include "xdpsock.h"
-
-struct bpf_map_def SEC("maps") qidconf_map = {
- .type = BPF_MAP_TYPE_ARRAY,
- .key_size = sizeof(int),
- .value_size = sizeof(int),
- .max_entries = 1,
-};
-
-struct bpf_map_def SEC("maps") xsks_map = {
- .type = BPF_MAP_TYPE_XSKMAP,
- .key_size = sizeof(int),
- .value_size = sizeof(int),
- .max_entries = MAX_SOCKS,
-};
-
-struct bpf_map_def SEC("maps") rr_map = {
- .type = BPF_MAP_TYPE_PERCPU_ARRAY,
- .key_size = sizeof(int),
- .value_size = sizeof(unsigned int),
- .max_entries = 1,
-};
-
-SEC("xdp_sock")
-int xdp_sock_prog(struct xdp_md *ctx)
-{
- int *qidconf, key = 0, idx;
- unsigned int *rr;
-
- qidconf = bpf_map_lookup_elem(&qidconf_map, &key);
- if (!qidconf)
- return XDP_ABORTED;
-
- if (*qidconf != ctx->rx_queue_index)
- return XDP_PASS;
-
-#if RR_LB /* NB! RR_LB is configured in xdpsock.h */
- rr = bpf_map_lookup_elem(&rr_map, &key);
- if (!rr)
- return XDP_ABORTED;
-
- *rr = (*rr + 1) & (MAX_SOCKS - 1);
- idx = *rr;
-#else
- idx = 0;
-#endif
-
- return bpf_redirect_map(&xsks_map, idx, 0);
-}
-
-char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index f73055e..159f64b 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -1,37 +1,36 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright(c) 2017 - 2018 Intel Corporation. */
-#include <assert.h>
+#include <asm/barrier.h>
#include <errno.h>
#include <getopt.h>
#include <libgen.h>
#include <linux/bpf.h>
+#include <linux/compiler.h>
#include <linux/if_link.h>
#include <linux/if_xdp.h>
#include <linux/if_ether.h>
+#include <locale.h>
+#include <net/ethernet.h>
#include <net/if.h>
+#include <poll.h>
+#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
-#include <net/ethernet.h>
+#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/socket.h>
-#include <sys/mman.h>
+#include <sys/types.h>
#include <time.h>
#include <unistd.h>
-#include <pthread.h>
-#include <locale.h>
-#include <sys/types.h>
-#include <poll.h>
#include "bpf/libbpf.h"
-#include "bpf_util.h"
+#include "bpf/xsk.h"
#include <bpf/bpf.h>
-#include "xdpsock.h"
-
#ifndef SOL_XDP
#define SOL_XDP 283
#endif
@@ -44,17 +43,11 @@
#define PF_XDP AF_XDP
#endif
-#define NUM_FRAMES 131072
-#define FRAME_HEADROOM 0
-#define FRAME_SHIFT 11
-#define FRAME_SIZE 2048
-#define NUM_DESCS 1024
-#define BATCH_SIZE 16
-
-#define FQ_NUM_DESCS 1024
-#define CQ_NUM_DESCS 1024
+#define NUM_FRAMES (4 * 1024)
+#define BATCH_SIZE 64
#define DEBUG_HEXDUMP 0
+#define MAX_SOCKS 8
typedef __u64 u64;
typedef __u32 u32;
@@ -73,54 +66,30 @@ static const char *opt_if = "";
static int opt_ifindex;
static int opt_queue;
static int opt_poll;
-static int opt_shared_packet_buffer;
static int opt_interval = 1;
static u32 opt_xdp_bind_flags;
static __u32 prog_id;
-struct xdp_umem_uqueue {
- u32 cached_prod;
- u32 cached_cons;
- u32 mask;
- u32 size;
- u32 *producer;
- u32 *consumer;
- u64 *ring;
- void *map;
+struct xsk_umem_info {
+ struct xsk_ring_prod fq;
+ struct xsk_ring_cons cq;
+ struct xsk_umem *umem;
};
-struct xdp_umem {
- char *frames;
- struct xdp_umem_uqueue fq;
- struct xdp_umem_uqueue cq;
- int fd;
-};
-
-struct xdp_uqueue {
- u32 cached_prod;
- u32 cached_cons;
- u32 mask;
- u32 size;
- u32 *producer;
- u32 *consumer;
- struct xdp_desc *ring;
- void *map;
-};
-
-struct xdpsock {
- struct xdp_uqueue rx;
- struct xdp_uqueue tx;
- int sfd;
- struct xdp_umem *umem;
- u32 outstanding_tx;
+struct xsk_socket_info {
+ struct xsk_ring_cons rx;
+ struct xsk_ring_prod tx;
+ struct xsk_umem_info *umem;
+ struct xsk_socket *xsk;
unsigned long rx_npkts;
unsigned long tx_npkts;
unsigned long prev_rx_npkts;
unsigned long prev_tx_npkts;
+ u32 outstanding_tx;
};
static int num_socks;
-struct xdpsock *xsks[MAX_SOCKS];
+struct xsk_socket_info *xsks[MAX_SOCKS];
static unsigned long get_nsecs(void)
{
@@ -130,225 +99,124 @@ static unsigned long get_nsecs(void)
return ts.tv_sec * 1000000000UL + ts.tv_nsec;
}
-static void dump_stats(void);
-
-#define lassert(expr) \
- do { \
- if (!(expr)) { \
- fprintf(stderr, "%s:%s:%i: Assertion failed: " \
- #expr ": errno: %d/\"%s\"\n", \
- __FILE__, __func__, __LINE__, \
- errno, strerror(errno)); \
- dump_stats(); \
- exit(EXIT_FAILURE); \
- } \
- } while (0)
-
-#define barrier() __asm__ __volatile__("": : :"memory")
-#ifdef __aarch64__
-#define u_smp_rmb() __asm__ __volatile__("dmb ishld": : :"memory")
-#define u_smp_wmb() __asm__ __volatile__("dmb ishst": : :"memory")
-#else
-#define u_smp_rmb() barrier()
-#define u_smp_wmb() barrier()
-#endif
-#define likely(x) __builtin_expect(!!(x), 1)
-#define unlikely(x) __builtin_expect(!!(x), 0)
-
-static const char pkt_data[] =
- "\x3c\xfd\xfe\x9e\x7f\x71\xec\xb1\xd7\x98\x3a\xc0\x08\x00\x45\x00"
- "\x00\x2e\x00\x00\x00\x00\x40\x11\x88\x97\x05\x08\x07\x08\xc8\x14"
- "\x1e\x04\x10\x92\x10\x92\x00\x1a\x6d\xa3\x34\x33\x1f\x69\x40\x6b"
- "\x54\x59\xb6\x14\x2d\x11\x44\xbf\xaf\xd9\xbe\xaa";
-
-static inline u32 umem_nb_free(struct xdp_umem_uqueue *q, u32 nb)
-{
- u32 free_entries = q->cached_cons - q->cached_prod;
-
- if (free_entries >= nb)
- return free_entries;
-
- /* Refresh the local tail pointer */
- q->cached_cons = *q->consumer + q->size;
-
- return q->cached_cons - q->cached_prod;
-}
-
-static inline u32 xq_nb_free(struct xdp_uqueue *q, u32 ndescs)
+static void print_benchmark(bool running)
{
- u32 free_entries = q->cached_cons - q->cached_prod;
+ const char *bench_str = "INVALID";
- if (free_entries >= ndescs)
- return free_entries;
+ if (opt_bench == BENCH_RXDROP)
+ bench_str = "rxdrop";
+ else if (opt_bench == BENCH_TXONLY)
+ bench_str = "txonly";
+ else if (opt_bench == BENCH_L2FWD)
+ bench_str = "l2fwd";
- /* Refresh the local tail pointer */
- q->cached_cons = *q->consumer + q->size;
- return q->cached_cons - q->cached_prod;
-}
+ printf("%s:%d %s ", opt_if, opt_queue, bench_str);
+ if (opt_xdp_flags & XDP_FLAGS_SKB_MODE)
+ printf("xdp-skb ");
+ else if (opt_xdp_flags & XDP_FLAGS_DRV_MODE)
+ printf("xdp-drv ");
+ else
+ printf(" ");
-static inline u32 umem_nb_avail(struct xdp_umem_uqueue *q, u32 nb)
-{
- u32 entries = q->cached_prod - q->cached_cons;
+ if (opt_poll)
+ printf("poll() ");
- if (entries == 0) {
- q->cached_prod = *q->producer;
- entries = q->cached_prod - q->cached_cons;
+ if (running) {
+ printf("running...");
+ fflush(stdout);
}
-
- return (entries > nb) ? nb : entries;
}
-static inline u32 xq_nb_avail(struct xdp_uqueue *q, u32 ndescs)
+static void dump_stats(void)
{
- u32 entries = q->cached_prod - q->cached_cons;
+ unsigned long now = get_nsecs();
+ long dt = now - prev_time;
+ int i;
- if (entries == 0) {
- q->cached_prod = *q->producer;
- entries = q->cached_prod - q->cached_cons;
- }
+ prev_time = now;
- return (entries > ndescs) ? ndescs : entries;
-}
+ for (i = 0; i < num_socks && xsks[i]; i++) {
+ char *fmt = "%-15s %'-11.0f %'-11lu\n";
+ double rx_pps, tx_pps;
-static inline int umem_fill_to_kernel_ex(struct xdp_umem_uqueue *fq,
- struct xdp_desc *d,
- size_t nb)
-{
- u32 i;
+ rx_pps = (xsks[i]->rx_npkts - xsks[i]->prev_rx_npkts) *
+ 1000000000. / dt;
+ tx_pps = (xsks[i]->tx_npkts - xsks[i]->prev_tx_npkts) *
+ 1000000000. / dt;
- if (umem_nb_free(fq, nb) < nb)
- return -ENOSPC;
+ printf("\n sock%d@", i);
+ print_benchmark(false);
+ printf("\n");
- for (i = 0; i < nb; i++) {
- u32 idx = fq->cached_prod++ & fq->mask;
+ printf("%-15s %-11s %-11s %-11.2f\n", "", "pps", "pkts",
+ dt / 1000000000.);
+ printf(fmt, "rx", rx_pps, xsks[i]->rx_npkts);
+ printf(fmt, "tx", tx_pps, xsks[i]->tx_npkts);
- fq->ring[idx] = d[i].addr;
+ xsks[i]->prev_rx_npkts = xsks[i]->rx_npkts;
+ xsks[i]->prev_tx_npkts = xsks[i]->tx_npkts;
}
-
- u_smp_wmb();
-
- *fq->producer = fq->cached_prod;
-
- return 0;
}
-static inline int umem_fill_to_kernel(struct xdp_umem_uqueue *fq, u64 *d,
- size_t nb)
+static void *poller(void *arg)
{
- u32 i;
-
- if (umem_nb_free(fq, nb) < nb)
- return -ENOSPC;
-
- for (i = 0; i < nb; i++) {
- u32 idx = fq->cached_prod++ & fq->mask;
-
- fq->ring[idx] = d[i];
+ (void)arg;
+ for (;;) {
+ sleep(opt_interval);
+ dump_stats();
}
- u_smp_wmb();
-
- *fq->producer = fq->cached_prod;
-
- return 0;
+ return NULL;
}
-static inline size_t umem_complete_from_kernel(struct xdp_umem_uqueue *cq,
- u64 *d, size_t nb)
+static void remove_xdp_program(void)
{
- u32 idx, i, entries = umem_nb_avail(cq, nb);
-
- u_smp_rmb();
-
- for (i = 0; i < entries; i++) {
- idx = cq->cached_cons++ & cq->mask;
- d[i] = cq->ring[idx];
- }
-
- if (entries > 0) {
- u_smp_wmb();
+ __u32 curr_prog_id = 0;
- *cq->consumer = cq->cached_cons;
+ if (bpf_get_link_xdp_id(opt_ifindex, &curr_prog_id, opt_xdp_flags)) {
+ printf("bpf_get_link_xdp_id failed\n");
+ exit(EXIT_FAILURE);
}
-
- return entries;
-}
-
-static inline void *xq_get_data(struct xdpsock *xsk, u64 addr)
-{
- return &xsk->umem->frames[addr];
+ if (prog_id == curr_prog_id)
+ bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);
+ else if (!curr_prog_id)
+ printf("couldn't find a prog id on a given interface\n");
+ else
+ printf("program on interface changed, not removing\n");
}
-static inline int xq_enq(struct xdp_uqueue *uq,
- const struct xdp_desc *descs,
- unsigned int ndescs)
+static void int_exit(int sig)
{
- struct xdp_desc *r = uq->ring;
- unsigned int i;
+ struct xsk_umem *umem = xsks[0]->umem->umem;
- if (xq_nb_free(uq, ndescs) < ndescs)
- return -ENOSPC;
-
- for (i = 0; i < ndescs; i++) {
- u32 idx = uq->cached_prod++ & uq->mask;
-
- r[idx].addr = descs[i].addr;
- r[idx].len = descs[i].len;
- }
+ (void)sig;
- u_smp_wmb();
+ dump_stats();
+ xsk_socket__delete(xsks[0]->xsk);
+ (void)xsk_umem__delete(umem);
+ remove_xdp_program();
- *uq->producer = uq->cached_prod;
- return 0;
+ exit(EXIT_SUCCESS);
}
-static inline int xq_enq_tx_only(struct xdp_uqueue *uq,
- unsigned int id, unsigned int ndescs)
+static void __exit_with_error(int error, const char *file, const char *func,
+ int line)
{
- struct xdp_desc *r = uq->ring;
- unsigned int i;
-
- if (xq_nb_free(uq, ndescs) < ndescs)
- return -ENOSPC;
-
- for (i = 0; i < ndescs; i++) {
- u32 idx = uq->cached_prod++ & uq->mask;
-
- r[idx].addr = (id + i) << FRAME_SHIFT;
- r[idx].len = sizeof(pkt_data) - 1;
- }
-
- u_smp_wmb();
-
- *uq->producer = uq->cached_prod;
- return 0;
+ fprintf(stderr, "%s:%s:%i: errno: %d/\"%s\"\n", file, func,
+ line, error, strerror(error));
+ dump_stats();
+ remove_xdp_program();
+ exit(EXIT_FAILURE);
}
-static inline int xq_deq(struct xdp_uqueue *uq,
- struct xdp_desc *descs,
- int ndescs)
-{
- struct xdp_desc *r = uq->ring;
- unsigned int idx;
- int i, entries;
-
- entries = xq_nb_avail(uq, ndescs);
-
- u_smp_rmb();
-
- for (i = 0; i < entries; i++) {
- idx = uq->cached_cons++ & uq->mask;
- descs[i] = r[idx];
- }
-
- if (entries > 0) {
- u_smp_wmb();
+#define exit_with_error(error) __exit_with_error(error, __FILE__, __func__, \
+ __LINE__)
- *uq->consumer = uq->cached_cons;
- }
-
- return entries;
-}
+static const char pkt_data[] =
+ "\x3c\xfd\xfe\x9e\x7f\x71\xec\xb1\xd7\x98\x3a\xc0\x08\x00\x45\x00"
+ "\x00\x2e\x00\x00\x00\x00\x40\x11\x88\x97\x05\x08\x07\x08\xc8\x14"
+ "\x1e\x04\x10\x92\x10\x92\x00\x1a\x6d\xa3\x34\x33\x1f\x69\x40\x6b"
+ "\x54\x59\xb6\x14\x2d\x11\x44\xbf\xaf\xd9\xbe\xaa";
static void swap_mac_addresses(void *data)
{
@@ -397,258 +265,73 @@ static void hex_dump(void *pkt, size_t length, u64 addr)
printf("\n");
}
-static size_t gen_eth_frame(char *frame)
+static size_t gen_eth_frame(struct xsk_umem_info *umem, u64 addr)
{
- memcpy(frame, pkt_data, sizeof(pkt_data) - 1);
+ memcpy(xsk_umem__get_data(umem->umem, addr), pkt_data,
+ sizeof(pkt_data) - 1);
return sizeof(pkt_data) - 1;
}
-static struct xdp_umem *xdp_umem_configure(int sfd)
+static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
{
- int fq_size = FQ_NUM_DESCS, cq_size = CQ_NUM_DESCS;
- struct xdp_mmap_offsets off;
- struct xdp_umem_reg mr;
- struct xdp_umem *umem;
- socklen_t optlen;
- void *bufs;
+ struct xsk_umem_info *umem;
+ int ret;
umem = calloc(1, sizeof(*umem));
- lassert(umem);
-
- lassert(posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
- NUM_FRAMES * FRAME_SIZE) == 0);
-
- mr.addr = (__u64)bufs;
- mr.len = NUM_FRAMES * FRAME_SIZE;
- mr.chunk_size = FRAME_SIZE;
- mr.headroom = FRAME_HEADROOM;
-
- lassert(setsockopt(sfd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr)) == 0);
- lassert(setsockopt(sfd, SOL_XDP, XDP_UMEM_FILL_RING, &fq_size,
- sizeof(int)) == 0);
- lassert(setsockopt(sfd, SOL_XDP, XDP_UMEM_COMPLETION_RING, &cq_size,
- sizeof(int)) == 0);
-
- optlen = sizeof(off);
- lassert(getsockopt(sfd, SOL_XDP, XDP_MMAP_OFFSETS, &off,
- &optlen) == 0);
-
- umem->fq.map = mmap(0, off.fr.desc +
- FQ_NUM_DESCS * sizeof(u64),
- PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_POPULATE, sfd,
- XDP_UMEM_PGOFF_FILL_RING);
- lassert(umem->fq.map != MAP_FAILED);
-
- umem->fq.mask = FQ_NUM_DESCS - 1;
- umem->fq.size = FQ_NUM_DESCS;
- umem->fq.producer = umem->fq.map + off.fr.producer;
- umem->fq.consumer = umem->fq.map + off.fr.consumer;
- umem->fq.ring = umem->fq.map + off.fr.desc;
- umem->fq.cached_cons = FQ_NUM_DESCS;
-
- umem->cq.map = mmap(0, off.cr.desc +
- CQ_NUM_DESCS * sizeof(u64),
- PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_POPULATE, sfd,
- XDP_UMEM_PGOFF_COMPLETION_RING);
- lassert(umem->cq.map != MAP_FAILED);
-
- umem->cq.mask = CQ_NUM_DESCS - 1;
- umem->cq.size = CQ_NUM_DESCS;
- umem->cq.producer = umem->cq.map + off.cr.producer;
- umem->cq.consumer = umem->cq.map + off.cr.consumer;
- umem->cq.ring = umem->cq.map + off.cr.desc;
-
- umem->frames = bufs;
- umem->fd = sfd;
+ if (!umem)
+ exit_with_error(errno);
- if (opt_bench == BENCH_TXONLY) {
- int i;
-
- for (i = 0; i < NUM_FRAMES * FRAME_SIZE; i += FRAME_SIZE)
- (void)gen_eth_frame(&umem->frames[i]);
- }
+ ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
+ NULL);
+ if (ret)
+ exit_with_error(-ret);
return umem;
}
-static struct xdpsock *xsk_configure(struct xdp_umem *umem)
+static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
{
- struct sockaddr_xdp sxdp = {};
- struct xdp_mmap_offsets off;
- int sfd, ndescs = NUM_DESCS;
- struct xdpsock *xsk;
- bool shared = true;
- socklen_t optlen;
- u64 i;
-
- sfd = socket(PF_XDP, SOCK_RAW, 0);
- lassert(sfd >= 0);
+ struct xsk_socket_config cfg;
+ struct xsk_socket_info *xsk;
+ int ret;
+ u32 idx;
+ int i;
xsk = calloc(1, sizeof(*xsk));
- lassert(xsk);
-
- xsk->sfd = sfd;
- xsk->outstanding_tx = 0;
-
- if (!umem) {
- shared = false;
- xsk->umem = xdp_umem_configure(sfd);
- } else {
- xsk->umem = umem;
- }
-
- lassert(setsockopt(sfd, SOL_XDP, XDP_RX_RING,
- &ndescs, sizeof(int)) == 0);
- lassert(setsockopt(sfd, SOL_XDP, XDP_TX_RING,
- &ndescs, sizeof(int)) == 0);
- optlen = sizeof(off);
- lassert(getsockopt(sfd, SOL_XDP, XDP_MMAP_OFFSETS, &off,
- &optlen) == 0);
-
- /* Rx */
- xsk->rx.map = mmap(NULL,
- off.rx.desc +
- NUM_DESCS * sizeof(struct xdp_desc),
- PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_POPULATE, sfd,
- XDP_PGOFF_RX_RING);
- lassert(xsk->rx.map != MAP_FAILED);
-
- if (!shared) {
- for (i = 0; i < NUM_DESCS * FRAME_SIZE; i += FRAME_SIZE)
- lassert(umem_fill_to_kernel(&xsk->umem->fq, &i, 1)
- == 0);
- }
-
- /* Tx */
- xsk->tx.map = mmap(NULL,
- off.tx.desc +
- NUM_DESCS * sizeof(struct xdp_desc),
- PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_POPULATE, sfd,
- XDP_PGOFF_TX_RING);
- lassert(xsk->tx.map != MAP_FAILED);
-
- xsk->rx.mask = NUM_DESCS - 1;
- xsk->rx.size = NUM_DESCS;
- xsk->rx.producer = xsk->rx.map + off.rx.producer;
- xsk->rx.consumer = xsk->rx.map + off.rx.consumer;
- xsk->rx.ring = xsk->rx.map + off.rx.desc;
-
- xsk->tx.mask = NUM_DESCS - 1;
- xsk->tx.size = NUM_DESCS;
- xsk->tx.producer = xsk->tx.map + off.tx.producer;
- xsk->tx.consumer = xsk->tx.map + off.tx.consumer;
- xsk->tx.ring = xsk->tx.map + off.tx.desc;
- xsk->tx.cached_cons = NUM_DESCS;
-
- sxdp.sxdp_family = PF_XDP;
- sxdp.sxdp_ifindex = opt_ifindex;
- sxdp.sxdp_queue_id = opt_queue;
-
- if (shared) {
- sxdp.sxdp_flags = XDP_SHARED_UMEM;
- sxdp.sxdp_shared_umem_fd = umem->fd;
- } else {
- sxdp.sxdp_flags = opt_xdp_bind_flags;
- }
-
- lassert(bind(sfd, (struct sockaddr *)&sxdp, sizeof(sxdp)) == 0);
+ if (!xsk)
+ exit_with_error(errno);
+
+ xsk->umem = umem;
+ cfg.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
+ cfg.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS;
+ cfg.libbpf_flags = 0;
+ cfg.xdp_flags = opt_xdp_flags;
+ cfg.bind_flags = opt_xdp_bind_flags;
+ ret = xsk_socket__create(&xsk->xsk, opt_if, opt_queue, umem->umem,
+ &xsk->rx, &xsk->tx, &cfg);
+ if (ret)
+ exit_with_error(-ret);
+
+ ret = bpf_get_link_xdp_id(opt_ifindex, &prog_id, opt_xdp_flags);
+ if (ret)
+ exit_with_error(-ret);
+
+ ret = xsk_ring_prod__reserve(&xsk->umem->fq,
+ XSK_RING_PROD__DEFAULT_NUM_DESCS,
+ &idx);
+ if (ret != XSK_RING_PROD__DEFAULT_NUM_DESCS)
+ exit_with_error(-ret);
+ for (i = 0;
+ i < XSK_RING_PROD__DEFAULT_NUM_DESCS *
+ XSK_UMEM__DEFAULT_FRAME_SIZE;
+ i += XSK_UMEM__DEFAULT_FRAME_SIZE)
+ *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = i;
+ xsk_ring_prod__submit(&xsk->umem->fq,
+ XSK_RING_PROD__DEFAULT_NUM_DESCS);
return xsk;
}
-static void print_benchmark(bool running)
-{
- const char *bench_str = "INVALID";
-
- if (opt_bench == BENCH_RXDROP)
- bench_str = "rxdrop";
- else if (opt_bench == BENCH_TXONLY)
- bench_str = "txonly";
- else if (opt_bench == BENCH_L2FWD)
- bench_str = "l2fwd";
-
- printf("%s:%d %s ", opt_if, opt_queue, bench_str);
- if (opt_xdp_flags & XDP_FLAGS_SKB_MODE)
- printf("xdp-skb ");
- else if (opt_xdp_flags & XDP_FLAGS_DRV_MODE)
- printf("xdp-drv ");
- else
- printf(" ");
-
- if (opt_poll)
- printf("poll() ");
-
- if (running) {
- printf("running...");
- fflush(stdout);
- }
-}
-
-static void dump_stats(void)
-{
- unsigned long now = get_nsecs();
- long dt = now - prev_time;
- int i;
-
- prev_time = now;
-
- for (i = 0; i < num_socks && xsks[i]; i++) {
- char *fmt = "%-15s %'-11.0f %'-11lu\n";
- double rx_pps, tx_pps;
-
- rx_pps = (xsks[i]->rx_npkts - xsks[i]->prev_rx_npkts) *
- 1000000000. / dt;
- tx_pps = (xsks[i]->tx_npkts - xsks[i]->prev_tx_npkts) *
- 1000000000. / dt;
-
- printf("\n sock%d@", i);
- print_benchmark(false);
- printf("\n");
-
- printf("%-15s %-11s %-11s %-11.2f\n", "", "pps", "pkts",
- dt / 1000000000.);
- printf(fmt, "rx", rx_pps, xsks[i]->rx_npkts);
- printf(fmt, "tx", tx_pps, xsks[i]->tx_npkts);
-
- xsks[i]->prev_rx_npkts = xsks[i]->rx_npkts;
- xsks[i]->prev_tx_npkts = xsks[i]->tx_npkts;
- }
-}
-
-static void *poller(void *arg)
-{
- (void)arg;
- for (;;) {
- sleep(opt_interval);
- dump_stats();
- }
-
- return NULL;
-}
-
-static void int_exit(int sig)
-{
- __u32 curr_prog_id = 0;
-
- (void)sig;
- dump_stats();
- if (bpf_get_link_xdp_id(opt_ifindex, &curr_prog_id, opt_xdp_flags)) {
- printf("bpf_get_link_xdp_id failed\n");
- exit(EXIT_FAILURE);
- }
- if (prog_id == curr_prog_id)
- bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);
- else if (!curr_prog_id)
- printf("couldn't find a prog id on a given interface\n");
- else
- printf("program on interface changed, not removing\n");
- exit(EXIT_SUCCESS);
-}
-
static struct option long_options[] = {
{"rxdrop", no_argument, 0, 'r'},
{"txonly", no_argument, 0, 't'},
@@ -656,7 +339,6 @@ static struct option long_options[] = {
{"interface", required_argument, 0, 'i'},
{"queue", required_argument, 0, 'q'},
{"poll", no_argument, 0, 'p'},
- {"shared-buffer", no_argument, 0, 's'},
{"xdp-skb", no_argument, 0, 'S'},
{"xdp-native", no_argument, 0, 'N'},
{"interval", required_argument, 0, 'n'},
@@ -676,7 +358,6 @@ static void usage(const char *prog)
" -i, --interface=n Run on interface n\n"
" -q, --queue=n Use queue n (default 0)\n"
" -p, --poll Use poll syscall\n"
- " -s, --shared-buffer Use shared packet buffer\n"
" -S, --xdp-skb=n Use XDP skb-mod\n"
" -N, --xdp-native=n Enfore XDP native mode\n"
" -n, --interval=n Specify statistics update interval (default 1 sec).\n"
@@ -715,9 +396,6 @@ static void parse_command_line(int argc, char **argv)
case 'q':
opt_queue = atoi(optarg);
break;
- case 's':
- opt_shared_packet_buffer = 1;
- break;
case 'p':
opt_poll = 1;
break;
@@ -751,75 +429,104 @@ static void parse_command_line(int argc, char **argv)
opt_if);
usage(basename(argv[0]));
}
+
}
-static void kick_tx(int fd)
+static void kick_tx(struct xsk_socket_info *xsk)
{
int ret;
- ret = sendto(fd, NULL, 0, MSG_DONTWAIT, NULL, 0);
+ ret = sendto(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);
if (ret >= 0 || errno == ENOBUFS || errno == EAGAIN || errno == EBUSY)
return;
- lassert(0);
+ exit_with_error(errno);
}
-static inline void complete_tx_l2fwd(struct xdpsock *xsk)
+static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk)
{
- u64 descs[BATCH_SIZE];
+ u32 idx_cq, idx_fq;
unsigned int rcvd;
size_t ndescs;
if (!xsk->outstanding_tx)
return;
- kick_tx(xsk->sfd);
+ kick_tx(xsk);
ndescs = (xsk->outstanding_tx > BATCH_SIZE) ? BATCH_SIZE :
- xsk->outstanding_tx;
+ xsk->outstanding_tx;
/* re-add completed Tx buffers */
- rcvd = umem_complete_from_kernel(&xsk->umem->cq, descs, ndescs);
+ rcvd = xsk_ring_cons__peek(&xsk->umem->cq, ndescs, &idx_cq);
if (rcvd > 0) {
- umem_fill_to_kernel(&xsk->umem->fq, descs, rcvd);
+ unsigned int i;
+ int ret;
+
+ ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
+ while (ret != rcvd) {
+ if (ret < 0)
+ exit_with_error(-ret);
+ ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd,
+ &idx_fq);
+ }
+ for (i = 0; i < rcvd; i++)
+ *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
+ *xsk_ring_cons__comp_addr(&xsk->umem->cq,
+ idx_cq++);
+
+ xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
+ xsk_ring_cons__release(&xsk->umem->cq, rcvd);
xsk->outstanding_tx -= rcvd;
xsk->tx_npkts += rcvd;
}
}
-static inline void complete_tx_only(struct xdpsock *xsk)
+static inline void complete_tx_only(struct xsk_socket_info *xsk)
{
- u64 descs[BATCH_SIZE];
unsigned int rcvd;
+ u32 idx;
if (!xsk->outstanding_tx)
return;
- kick_tx(xsk->sfd);
+ kick_tx(xsk);
- rcvd = umem_complete_from_kernel(&xsk->umem->cq, descs, BATCH_SIZE);
+ rcvd = xsk_ring_cons__peek(&xsk->umem->cq, BATCH_SIZE, &idx);
if (rcvd > 0) {
+ xsk_ring_cons__release(&xsk->umem->cq, rcvd);
xsk->outstanding_tx -= rcvd;
xsk->tx_npkts += rcvd;
}
}
-static void rx_drop(struct xdpsock *xsk)
+static void rx_drop(struct xsk_socket_info *xsk)
{
- struct xdp_desc descs[BATCH_SIZE];
unsigned int rcvd, i;
+ u32 idx_rx, idx_fq = 0;
+ int ret;
- rcvd = xq_deq(&xsk->rx, descs, BATCH_SIZE);
+ rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
if (!rcvd)
return;
+ ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
+ while (ret != rcvd) {
+ if (ret < 0)
+ exit_with_error(-ret);
+ ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
+ }
+
for (i = 0; i < rcvd; i++) {
- char *pkt = xq_get_data(xsk, descs[i].addr);
+ u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
+ u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
+ char *pkt = xsk_umem__get_data(xsk->umem->umem, addr);
- hex_dump(pkt, descs[i].len, descs[i].addr);
+ hex_dump(pkt, len, addr);
+ *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
}
+ xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
+ xsk_ring_cons__release(&xsk->rx, rcvd);
xsk->rx_npkts += rcvd;
-
- umem_fill_to_kernel_ex(&xsk->umem->fq, descs, rcvd);
}
static void rx_drop_all(void)
@@ -830,7 +537,7 @@ static void rx_drop_all(void)
memset(fds, 0, sizeof(fds));
for (i = 0; i < num_socks; i++) {
- fds[i].fd = xsks[i]->sfd;
+ fds[i].fd = xsk_socket__fd(xsks[i]->xsk);
fds[i].events = POLLIN;
timeout = 1000; /* 1sn */
}
@@ -847,14 +554,14 @@ static void rx_drop_all(void)
}
}
-static void tx_only(struct xdpsock *xsk)
+static void tx_only(struct xsk_socket_info *xsk)
{
int timeout, ret, nfds = 1;
struct pollfd fds[nfds + 1];
- unsigned int idx = 0;
+ u32 idx, frame_nb = 0;
memset(fds, 0, sizeof(fds));
- fds[0].fd = xsk->sfd;
+ fds[0].fd = xsk_socket__fd(xsk->xsk);
fds[0].events = POLLOUT;
timeout = 1000; /* 1sn */
@@ -864,50 +571,73 @@ static void tx_only(struct xdpsock *xsk)
if (ret <= 0)
continue;
- if (fds[0].fd != xsk->sfd ||
- !(fds[0].revents & POLLOUT))
+ if (!(fds[0].revents & POLLOUT))
continue;
}
- if (xq_nb_free(&xsk->tx, BATCH_SIZE) >= BATCH_SIZE) {
- lassert(xq_enq_tx_only(&xsk->tx, idx, BATCH_SIZE) == 0);
+ if (xsk_ring_prod__reserve(&xsk->tx, BATCH_SIZE, &idx) ==
+ BATCH_SIZE) {
+ unsigned int i;
+ for (i = 0; i < BATCH_SIZE; i++) {
+ xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->addr
+ = (frame_nb + i) <<
+ XSK_UMEM__DEFAULT_FRAME_SHIFT;
+ xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->len =
+ sizeof(pkt_data) - 1;
+ }
+
+ xsk_ring_prod__submit(&xsk->tx, BATCH_SIZE);
xsk->outstanding_tx += BATCH_SIZE;
- idx += BATCH_SIZE;
- idx %= NUM_FRAMES;
+ frame_nb += BATCH_SIZE;
+ frame_nb %= NUM_FRAMES;
}
complete_tx_only(xsk);
}
}
-static void l2fwd(struct xdpsock *xsk)
+static void l2fwd(struct xsk_socket_info *xsk)
{
for (;;) {
- struct xdp_desc descs[BATCH_SIZE];
unsigned int rcvd, i;
+ u32 idx_rx, idx_tx = 0;
int ret;
for (;;) {
complete_tx_l2fwd(xsk);
- rcvd = xq_deq(&xsk->rx, descs, BATCH_SIZE);
+ rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE,
+ &idx_rx);
if (rcvd > 0)
break;
}
+ ret = xsk_ring_prod__reserve(&xsk->tx, rcvd, &idx_tx);
+ while (ret != rcvd) {
+ if (ret < 0)
+ exit_with_error(-ret);
+ ret = xsk_ring_prod__reserve(&xsk->tx, rcvd, &idx_tx);
+ }
+
for (i = 0; i < rcvd; i++) {
- char *pkt = xq_get_data(xsk, descs[i].addr);
+ u64 addr = xsk_ring_cons__rx_desc(&xsk->rx,
+ idx_rx)->addr;
+ u32 len = xsk_ring_cons__rx_desc(&xsk->rx,
+ idx_rx++)->len;
+ char *pkt = xsk_umem__get_data(xsk->umem->umem, addr);
swap_mac_addresses(pkt);
- hex_dump(pkt, descs[i].len, descs[i].addr);
+ hex_dump(pkt, len, addr);
+ xsk_ring_prod__tx_desc(&xsk->tx, idx_tx)->addr = addr;
+ xsk_ring_prod__tx_desc(&xsk->tx, idx_tx++)->len = len;
}
- xsk->rx_npkts += rcvd;
+ xsk_ring_prod__submit(&xsk->tx, rcvd);
+ xsk_ring_cons__release(&xsk->rx, rcvd);
- ret = xq_enq(&xsk->tx, descs, rcvd);
- lassert(ret == 0);
+ xsk->rx_npkts += rcvd;
xsk->outstanding_tx += rcvd;
}
}
@@ -915,17 +645,10 @@ static void l2fwd(struct xdpsock *xsk)
int main(int argc, char **argv)
{
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
- struct bpf_prog_load_attr prog_load_attr = {
- .prog_type = BPF_PROG_TYPE_XDP,
- };
- int prog_fd, qidconf_map, xsks_map;
- struct bpf_prog_info info = {};
- __u32 info_len = sizeof(info);
- struct bpf_object *obj;
- char xdp_filename[256];
- struct bpf_map *map;
- int i, ret, key = 0;
+ struct xsk_umem_info *umem;
pthread_t pt;
+ void *bufs;
+ int ret;
parse_command_line(argc, argv);
@@ -935,67 +658,22 @@ int main(int argc, char **argv)
exit(EXIT_FAILURE);
}
- snprintf(xdp_filename, sizeof(xdp_filename), "%s_kern.o", argv[0]);
- prog_load_attr.file = xdp_filename;
-
- if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
- exit(EXIT_FAILURE);
- if (prog_fd < 0) {
- fprintf(stderr, "ERROR: no program found: %s\n",
- strerror(prog_fd));
- exit(EXIT_FAILURE);
- }
-
- map = bpf_object__find_map_by_name(obj, "qidconf_map");
- qidconf_map = bpf_map__fd(map);
- if (qidconf_map < 0) {
- fprintf(stderr, "ERROR: no qidconf map found: %s\n",
- strerror(qidconf_map));
- exit(EXIT_FAILURE);
- }
-
- map = bpf_object__find_map_by_name(obj, "xsks_map");
- xsks_map = bpf_map__fd(map);
- if (xsks_map < 0) {
- fprintf(stderr, "ERROR: no xsks map found: %s\n",
- strerror(xsks_map));
- exit(EXIT_FAILURE);
- }
-
- if (bpf_set_link_xdp_fd(opt_ifindex, prog_fd, opt_xdp_flags) < 0) {
- fprintf(stderr, "ERROR: link set xdp fd failed\n");
- exit(EXIT_FAILURE);
- }
-
- ret = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len);
- if (ret) {
- printf("can't get prog info - %s\n", strerror(errno));
- return 1;
- }
- prog_id = info.id;
+ ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
+ NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE);
+ if (ret)
+ exit_with_error(ret);
- ret = bpf_map_update_elem(qidconf_map, &key, &opt_queue, 0);
- if (ret) {
- fprintf(stderr, "ERROR: bpf_map_update_elem qidconf\n");
- exit(EXIT_FAILURE);
- }
+ /* Create sockets... */
+ umem = xsk_configure_umem(bufs,
+ NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE);
+ xsks[num_socks++] = xsk_configure_socket(umem);
- /* Create sockets... */
- xsks[num_socks++] = xsk_configure(NULL);
-
-#if RR_LB
- for (i = 0; i < MAX_SOCKS - 1; i++)
- xsks[num_socks++] = xsk_configure(xsks[0]->umem);
-#endif
+ if (opt_bench == BENCH_TXONLY) {
+ int i;
- /* ...and insert them into the map. */
- for (i = 0; i < num_socks; i++) {
- key = i;
- ret = bpf_map_update_elem(xsks_map, &key, &xsks[i]->sfd, 0);
- if (ret) {
- fprintf(stderr, "ERROR: bpf_map_update_elem %d\n", i);
- exit(EXIT_FAILURE);
- }
+ for (i = 0; i < NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE;
+ i += XSK_UMEM__DEFAULT_FRAME_SIZE)
+ (void)gen_eth_frame(umem, i);
}
signal(SIGINT, int_exit);
@@ -1005,7 +683,8 @@ int main(int argc, char **argv)
setlocale(LC_ALL, "");
ret = pthread_create(&pt, NULL, poller, NULL);
- lassert(ret == 0);
+ if (ret)
+ exit_with_error(ret);
prev_time = get_nsecs();
--
2.7.4
* [PATCH bpf] xsk: add missing smp_rmb() in xsk_mmap
From: Magnus Karlsson @ 2019-02-08 13:13 UTC (permalink / raw)
To: magnus.karlsson, bjorn.topel, ast, daniel, netdev
All the setup code in AF_XDP is protected by a mutex, with the
exception of the mmap code, which cannot use it. To handle a process
calling mmap at the same time as another process is setting up the
socket, smp_wmb() calls were added in the umem registration code and
the queue creation code, so that the published structures that
xsk_mmap needs are consistent. However, the corresponding smp_rmb()
calls were not added to the xsk_mmap code. This patch adds them.
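
The write/read barrier pairing described above can be sketched in user-space C11. This is only an analogy, not the kernel code: atomic_thread_fence() with release/acquire ordering stands in for smp_wmb()/smp_rmb(), and every demo_* name is hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a queue that setup code publishes. */
struct demo_queue {
	unsigned int nentries;
	void *ring;
};

/* Pointer that the reader side (the mmap analogue) inspects. */
static _Atomic(struct demo_queue *) demo_published_q;

/* Writer: initialise every field first, then publish the pointer. */
static void demo_publish(struct demo_queue *q, void *ring,
			 unsigned int nentries)
{
	q->ring = ring;
	q->nentries = nentries;
	/* Plays the role of smp_wmb(): fields are visible before the pointer. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&demo_published_q, q, memory_order_relaxed);
}

/* Reader: load the pointer, then order the field reads after that load. */
static int demo_consume(unsigned int *nentries_out)
{
	struct demo_queue *q;

	q = atomic_load_explicit(&demo_published_q, memory_order_relaxed);
	if (!q)
		return -1;
	/* Plays the role of the smp_rmb() this patch adds. */
	atomic_thread_fence(memory_order_acquire);
	*nentries_out = q->nentries;
	return 0;
}
```

Without the reader-side fence, the field reads could in principle be satisfied before the pointer load, so the reader might see a non-NULL pointer together with stale fields.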
Fixes: 37b076933a8e3 ("xsk: add missing write- and data-dependency barrier")
Fixes: c0c77d8fb787c ("xsk: add user memory registration support sockopt")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
net/xdp/xsk.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index a032684..45f3b52 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -669,6 +669,8 @@ static int xsk_mmap(struct file *file, struct socket *sock,
if (!umem)
return -EINVAL;
+ /* Matches the smp_wmb() in XDP_UMEM_REG */
+ smp_rmb();
if (offset == XDP_UMEM_PGOFF_FILL_RING)
q = READ_ONCE(umem->fq);
else if (offset == XDP_UMEM_PGOFF_COMPLETION_RING)
@@ -678,6 +680,8 @@ static int xsk_mmap(struct file *file, struct socket *sock,
if (!q)
return -EINVAL;
+ /* Matches the smp_wmb() in xsk_init_queue */
+ smp_rmb();
qpg = virt_to_head_page(q->ring);
if (size > (PAGE_SIZE << compound_order(qpg)))
return -EINVAL;
--
2.7.4
* [PATCH 1/2] can: c_can: support 64 message objects for D_CAN
From: Andrejs Cainikovs @ 2019-02-08 13:17 UTC (permalink / raw)
To: Wolfgang Grandegger, Marc Kleine-Budde, David S. Miller,
linux-can@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
Cc: Patrick Zysset
In-Reply-To: <20190208131738.27668-1-andrejs.cainikovs@netmodule.com>
D_CAN supports up to 128 message objects, compared to 32 on C_CAN.
However, some CPUs with a D_CAN controller have their own limits:
the TI AM335x Sitara CPU, for example, supports a maximum of 64
message objects. This patch raises the maximum number of D_CAN
message objects to 64.
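
As a rough illustration of how the even RX/TX split in this patch scales with the object count, the macro arithmetic from c_can.h can be mirrored in a small standalone function. The demo_* names are hypothetical, and the assumption that TX objects start directly after the last RX object is not shown in the hunks of this patch.

```c
#include <assert.h>

/* Hypothetical helper type; the driver uses preprocessor macros instead. */
struct demo_obj_layout {
	int rx_first, rx_last;
	int tx_first, tx_last;
};

/* Mirror of the TX_NUM/RX_NUM arithmetic for a given object count. */
static struct demo_obj_layout demo_split_objects(int total)
{
	struct demo_obj_layout l;
	int tx_num = total >> 1;	/* half of the objects go to TX */
	int rx_num = total - tx_num;	/* the remainder goes to RX */

	assert(total >= 2);
	l.rx_first = 1;			/* message objects are 1-based */
	l.rx_last = l.rx_first + rx_num - 1;
	l.tx_first = l.rx_last + 1;	/* assumption: TX follows RX directly */
	l.tx_last = l.tx_first + tx_num - 1;
	return l;
}
```

With 32 objects this gives RX 1..16 and TX 17..32; with 64 objects it gives RX 1..32 and TX 33..64, which is why the RX pending bitmap no longer fits in a 16-bit register read.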
Signed-off-by: Andrejs Cainikovs <andrejs.cainikovs@netmodule.com>
---
drivers/net/can/c_can/Kconfig | 12 ++++++++++++
drivers/net/can/c_can/c_can.c | 42 ++++++++++++++++++++++--------------------
drivers/net/can/c_can/c_can.h | 20 ++++++++++++++++----
3 files changed, 50 insertions(+), 24 deletions(-)
diff --git a/drivers/net/can/c_can/Kconfig b/drivers/net/can/c_can/Kconfig
index 61ffc12d8fd8..6c1ada7291df 100644
--- a/drivers/net/can/c_can/Kconfig
+++ b/drivers/net/can/c_can/Kconfig
@@ -20,4 +20,16 @@ config CAN_C_CAN_PCI
---help---
This driver adds support for the C_CAN/D_CAN chips connected
to the PCI bus.
+
+config CAN_C_CAN_DCAN_64_MSG_OBJECTS
+ bool "Use 64 message objects for D_CAN"
+ default n
+ ---help---
+ D_CAN supports up to 128 message objects, compared to 32 on
+ C_CAN. However, some CPUs with a D_CAN controller have their
+ own limits: the TI AM335x Sitara CPU, for example, supports a
+ maximum of 64 message objects.
+ Enabling this option extends max D_CAN message objects up to
+ 64.
+
endif
diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
index 606b7d8ffe13..5d695b89b459 100644
--- a/drivers/net/can/c_can/c_can.c
+++ b/drivers/net/can/c_can/c_can.c
@@ -352,15 +352,6 @@ static void c_can_setup_tx_object(struct net_device *dev, int iface,
}
}
-static inline void c_can_activate_all_lower_rx_msg_obj(struct net_device *dev,
- int iface)
-{
- int i;
-
- for (i = C_CAN_MSG_OBJ_RX_FIRST; i <= C_CAN_MSG_RX_LOW_LAST; i++)
- c_can_object_get(dev, iface, i, IF_COMM_CLR_NEWDAT);
-}
-
static int c_can_handle_lost_msg_obj(struct net_device *dev,
int iface, int objno, u32 ctrl)
{
@@ -706,7 +697,16 @@ static void c_can_do_tx(struct net_device *dev)
struct net_device_stats *stats = &dev->stats;
u32 idx, obj, pkts = 0, bytes = 0, pend, clr;
- clr = pend = priv->read_reg(priv, C_CAN_INTPND2_REG);
+#ifdef CONFIG_CAN_C_CAN_DCAN_64_MSG_OBJECTS
+ if (priv->type == BOSCH_D_CAN) {
+ pend = priv->read_reg32(priv, C_CAN_INTPND3_REG);
+ } else {
+#endif
+ pend = priv->read_reg(priv, C_CAN_INTPND2_REG);
+#ifdef CONFIG_CAN_C_CAN_DCAN_64_MSG_OBJECTS
+ }
+#endif
+ clr = pend;
while ((idx = ffs(pend))) {
idx--;
@@ -817,7 +817,17 @@ static int c_can_read_objects(struct net_device *dev, struct c_can_priv *priv,
static inline u32 c_can_get_pending(struct c_can_priv *priv)
{
- u32 pend = priv->read_reg(priv, C_CAN_NEWDAT1_REG);
+ u32 pend;
+
+#ifdef CONFIG_CAN_C_CAN_DCAN_64_MSG_OBJECTS
+ if (priv->type == BOSCH_D_CAN) {
+ pend = priv->read_reg32(priv, C_CAN_NEWDAT1_REG);
+ } else {
+#endif
+ pend = priv->read_reg(priv, C_CAN_NEWDAT1_REG);
+#ifdef CONFIG_CAN_C_CAN_DCAN_64_MSG_OBJECTS
+ }
+#endif
return pend;
}
@@ -828,8 +838,7 @@ static inline u32 c_can_get_pending(struct c_can_priv *priv)
* c_can core saves a received CAN message into the first free message
* object it finds free (starting with the lowest). Bits NEWDAT and
* INTPND are set for this message object indicating that a new message
- * has arrived. To work-around this issue, we keep two groups of message
- * objects whose partitioning is defined by C_CAN_MSG_OBJ_RX_SPLIT.
+ * has arrived.
*
* We clear the newdat bit right away.
*
@@ -840,13 +849,6 @@ static int c_can_do_rx_poll(struct net_device *dev, int quota)
struct c_can_priv *priv = netdev_priv(dev);
u32 pkts = 0, pend = 0, toread, n;
- /*
- * It is faster to read only one 16bit register. This is only possible
- * for a maximum number of 16 objects.
- */
- BUILD_BUG_ON_MSG(C_CAN_MSG_OBJ_RX_LAST > 16,
- "Implementation does not support more message objects than 16");
-
while (quota > 0) {
if (!pend) {
pend = c_can_get_pending(priv);
diff --git a/drivers/net/can/c_can/c_can.h b/drivers/net/can/c_can/c_can.h
index 8acdc7fa4792..e44b686a70a2 100644
--- a/drivers/net/can/c_can/c_can.h
+++ b/drivers/net/can/c_can/c_can.h
@@ -23,9 +23,15 @@
#define C_CAN_H
/* message object split */
+
+#ifdef CONFIG_CAN_C_CAN_DCAN_64_MSG_OBJECTS
+#define C_CAN_NO_OF_OBJECTS 64
+#else
#define C_CAN_NO_OF_OBJECTS 32
-#define C_CAN_MSG_OBJ_RX_NUM 16
-#define C_CAN_MSG_OBJ_TX_NUM 16
+#endif
+
+#define C_CAN_MSG_OBJ_TX_NUM (C_CAN_NO_OF_OBJECTS >> 1)
+#define C_CAN_MSG_OBJ_RX_NUM (C_CAN_NO_OF_OBJECTS - C_CAN_MSG_OBJ_TX_NUM)
#define C_CAN_MSG_OBJ_RX_FIRST 1
#define C_CAN_MSG_OBJ_RX_LAST (C_CAN_MSG_OBJ_RX_FIRST + \
@@ -35,9 +41,11 @@
#define C_CAN_MSG_OBJ_TX_LAST (C_CAN_MSG_OBJ_TX_FIRST + \
C_CAN_MSG_OBJ_TX_NUM - 1)
-#define C_CAN_MSG_OBJ_RX_SPLIT 9
-#define C_CAN_MSG_RX_LOW_LAST (C_CAN_MSG_OBJ_RX_SPLIT - 1)
+#ifdef CONFIG_CAN_C_CAN_DCAN_64_MSG_OBJECTS
+#define RECEIVE_OBJECT_BITS 0xffffffff
+#else
#define RECEIVE_OBJECT_BITS 0x0000ffff
+#endif
enum reg {
C_CAN_CTRL_REG = 0,
@@ -76,6 +84,8 @@ enum reg {
C_CAN_NEWDAT2_REG,
C_CAN_INTPND1_REG,
C_CAN_INTPND2_REG,
+ C_CAN_INTPND3_REG,
+ C_CAN_INTPND4_REG,
C_CAN_MSGVAL1_REG,
C_CAN_MSGVAL2_REG,
C_CAN_FUNCTION_REG,
@@ -137,6 +147,8 @@ static const u16 reg_map_d_can[] = {
[C_CAN_NEWDAT2_REG] = 0x9E,
[C_CAN_INTPND1_REG] = 0xB0,
[C_CAN_INTPND2_REG] = 0xB2,
+ [C_CAN_INTPND3_REG] = 0xB4,
+ [C_CAN_INTPND4_REG] = 0xB6,
[C_CAN_MSGVAL1_REG] = 0xC4,
[C_CAN_MSGVAL2_REG] = 0xC6,
[C_CAN_IF1_COMREQ_REG] = 0x100,
--
2.11.0
* [PATCH 2/2] can: c_can: configurable amount of D_CAN RX objects
From: Andrejs Cainikovs @ 2019-02-08 13:17 UTC (permalink / raw)
To: Wolfgang Grandegger, Marc Kleine-Budde, David S. Miller,
linux-can@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
Cc: Patrick Zysset
In-Reply-To: <20190208131738.27668-1-andrejs.cainikovs@netmodule.com>
Make the number of D_CAN RX message objects configurable. This allows
a bigger (or smaller) RX buffer instead of a 50/50 split between RX and TX.
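
Once the RX count is configurable, the pending bitmaps have to be handled as 64-bit words. The sketch below shows that bookkeeping in portable C under stated assumptions: the demo_* names are hypothetical, a plain loop replaces the kernel's ffs(), and the mask helper special-cases a width of 64, where a plain shift would be undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the lowest 'bits' bits set; a shift by 64 would be undefined. */
static uint64_t demo_mask(unsigned int bits)
{
	assert(bits <= 64);
	return bits == 64 ? UINT64_MAX : (((uint64_t)1 << bits) - 1);
}

/* 1-based position of the lowest set bit, or 0 if no bit is set. */
static int demo_ffs64(uint64_t x)
{
	int pos;

	for (pos = 1; x; pos++, x >>= 1)
		if (x & 1)
			return pos;
	return 0;
}

/* Combine two 32-bit register reads into one 64-bit pending word. */
static uint64_t demo_join(uint32_t lo, uint32_t hi)
{
	return (uint64_t)lo | ((uint64_t)hi << 32);
}

/* Clear the bit of a 1-based object number using a 64-bit wide shift. */
static uint64_t demo_clear_obj(uint64_t pend, int obj)
{
	return pend & ~((uint64_t)1 << (obj - 1));
}
```

Clearing a bit with a 64-bit constant matters on 32-bit targets, where an unsigned long based mask cannot address objects above 32.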
Signed-off-by: Andrejs Cainikovs <andrejs.cainikovs@netmodule.com>
---
drivers/net/can/c_can/Kconfig | 8 ++++++
drivers/net/can/c_can/c_can.c | 64 +++++++++++++++++++++++++++++--------------
drivers/net/can/c_can/c_can.h | 16 +++++------
3 files changed, 58 insertions(+), 30 deletions(-)
diff --git a/drivers/net/can/c_can/Kconfig b/drivers/net/can/c_can/Kconfig
index 6c1ada7291df..949d2d12d71e 100644
--- a/drivers/net/can/c_can/Kconfig
+++ b/drivers/net/can/c_can/Kconfig
@@ -32,4 +32,12 @@ config CAN_C_CAN_DCAN_64_MSG_OBJECTS
Enabling this option extends max D_CAN message objects up to
64.
+config CAN_C_CAN_DCAN_RX_MSG_OBJECTS
+ int "Specify amount of D_CAN RX message objects"
+ depends on CAN_C_CAN_DCAN_64_MSG_OBJECTS
+ default 32
+ ---help---
+ Use a specific number of message objects for RX, instead of
+ a 50/50 split between RX and TX.
+
endif
diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
index 5d695b89b459..675bc223e222 100644
--- a/drivers/net/can/c_can/c_can.c
+++ b/drivers/net/can/c_can/c_can.c
@@ -208,6 +208,26 @@ static const struct can_bittiming_const c_can_bittiming_const = {
.brp_inc = 1,
};
+static inline u64 c_can_get_mask(int bits)
+{
+ return ((u64)1 << bits) - 1;
+}
+
+static inline int c_can_ffs64(u64 x)
+{
+ int b;
+
+ b = ffs(x);
+
+ if (!b) {
+ b = ffs(x >> 32);
+ if (b)
+ b += 32;
+ }
+
+ return b;
+}
+
static inline void c_can_pm_runtime_enable(const struct c_can_priv *priv)
{
if (priv->device)
@@ -695,24 +715,23 @@ static void c_can_do_tx(struct net_device *dev)
{
struct c_can_priv *priv = netdev_priv(dev);
struct net_device_stats *stats = &dev->stats;
- u32 idx, obj, pkts = 0, bytes = 0, pend, clr;
+ u32 idx, obj, pkts = 0, bytes = 0;
+ u64 pend, clr;
+ /* Mask interrupt pending bits */
+ pend = priv->read_reg32(priv, C_CAN_INTPND1_REG);
#ifdef CONFIG_CAN_C_CAN_DCAN_64_MSG_OBJECTS
if (priv->type == BOSCH_D_CAN) {
- pend = priv->read_reg32(priv, C_CAN_INTPND3_REG);
- } else {
-#endif
- pend = priv->read_reg(priv, C_CAN_INTPND2_REG);
-#ifdef CONFIG_CAN_C_CAN_DCAN_64_MSG_OBJECTS
+ pend |= (u64)priv->read_reg32(priv, C_CAN_INTPND3_REG) << 32;
}
#endif
- clr = pend;
+ pend &= ~c_can_get_mask(C_CAN_MSG_OBJ_RX_NUM);
+ clr = pend >> C_CAN_MSG_OBJ_RX_NUM;
- while ((idx = ffs(pend))) {
- idx--;
- pend &= ~(1 << idx);
- obj = idx + C_CAN_MSG_OBJ_TX_FIRST;
+ while ((obj = c_can_ffs64(pend))) {
+ pend &= ~((u64)1 << (obj - 1));
c_can_inval_tx_object(dev, IF_RX, obj);
+ idx = obj - C_CAN_MSG_OBJ_TX_FIRST;
can_get_echo_skb(dev, idx);
bytes += priv->dlc[idx];
pkts++;
@@ -736,19 +755,19 @@ static void c_can_do_tx(struct net_device *dev)
* raced with the hardware or failed to readout all upper
* objects in the last run due to quota limit.
*/
-static u32 c_can_adjust_pending(u32 pend)
+static u64 c_can_adjust_pending(u64 pend)
{
- u32 weight, lasts;
+ u64 weight, lasts;
- if (pend == RECEIVE_OBJECT_BITS)
+ if (pend == c_can_get_mask(C_CAN_MSG_OBJ_RX_NUM))
return pend;
/*
* If the last set bit is larger than the number of pending
* bits we have a gap.
*/
- weight = hweight32(pend);
- lasts = fls(pend);
+ weight = hweight64(pend);
+ lasts = fls64(pend);
/* If the bits are linear, nothing to do */
if (lasts == weight)
@@ -777,11 +796,11 @@ static inline void c_can_rx_finalize(struct net_device *dev,
}
static int c_can_read_objects(struct net_device *dev, struct c_can_priv *priv,
- u32 pend, int quota)
+ u64 pend, int quota)
{
u32 pkts = 0, ctrl, obj;
- while ((obj = ffs(pend)) && quota > 0) {
+ while ((obj = c_can_ffs64(pend)) && quota > 0) {
pend &= ~BIT(obj - 1);
c_can_rx_object_get(dev, priv, obj);
@@ -815,13 +834,15 @@ static int c_can_read_objects(struct net_device *dev, struct c_can_priv *priv,
return pkts;
}
-static inline u32 c_can_get_pending(struct c_can_priv *priv)
+static inline u64 c_can_get_pending(struct c_can_priv *priv)
{
- u32 pend;
+ u64 pend;
#ifdef CONFIG_CAN_C_CAN_DCAN_64_MSG_OBJECTS
if (priv->type == BOSCH_D_CAN) {
pend = priv->read_reg32(priv, C_CAN_NEWDAT1_REG);
+ pend |= (u64)priv->read_reg32(priv, C_CAN_NEWDAT3_REG) << 32;
+ pend &= c_can_get_mask(C_CAN_MSG_OBJ_RX_NUM);
} else {
#endif
pend = priv->read_reg(priv, C_CAN_NEWDAT1_REG);
@@ -847,7 +868,8 @@ static inline u32 c_can_get_pending(struct c_can_priv *priv)
static int c_can_do_rx_poll(struct net_device *dev, int quota)
{
struct c_can_priv *priv = netdev_priv(dev);
- u32 pkts = 0, pend = 0, toread, n;
+ u32 pkts = 0, n;
+ u64 pend = 0, toread;
while (quota > 0) {
if (!pend) {
diff --git a/drivers/net/can/c_can/c_can.h b/drivers/net/can/c_can/c_can.h
index e44b686a70a2..4a0759ee249d 100644
--- a/drivers/net/can/c_can/c_can.h
+++ b/drivers/net/can/c_can/c_can.h
@@ -26,12 +26,12 @@
#ifdef CONFIG_CAN_C_CAN_DCAN_64_MSG_OBJECTS
#define C_CAN_NO_OF_OBJECTS 64
+#define C_CAN_MSG_OBJ_RX_NUM CONFIG_CAN_C_CAN_DCAN_RX_MSG_OBJECTS
#else
#define C_CAN_NO_OF_OBJECTS 32
+#define C_CAN_MSG_OBJ_RX_NUM 16
#endif
-
-#define C_CAN_MSG_OBJ_TX_NUM (C_CAN_NO_OF_OBJECTS >> 1)
-#define C_CAN_MSG_OBJ_RX_NUM (C_CAN_NO_OF_OBJECTS - C_CAN_MSG_OBJ_TX_NUM)
+#define C_CAN_MSG_OBJ_TX_NUM (C_CAN_NO_OF_OBJECTS - C_CAN_MSG_OBJ_RX_NUM)
#define C_CAN_MSG_OBJ_RX_FIRST 1
#define C_CAN_MSG_OBJ_RX_LAST (C_CAN_MSG_OBJ_RX_FIRST + \
@@ -41,12 +41,6 @@
#define C_CAN_MSG_OBJ_TX_LAST (C_CAN_MSG_OBJ_TX_FIRST + \
C_CAN_MSG_OBJ_TX_NUM - 1)
-#ifdef CONFIG_CAN_C_CAN_DCAN_64_MSG_OBJECTS
-#define RECEIVE_OBJECT_BITS 0xffffffff
-#else
-#define RECEIVE_OBJECT_BITS 0x0000ffff
-#endif
-
enum reg {
C_CAN_CTRL_REG = 0,
C_CAN_CTRL_EX_REG,
@@ -82,6 +76,8 @@ enum reg {
C_CAN_TXRQST2_REG,
C_CAN_NEWDAT1_REG,
C_CAN_NEWDAT2_REG,
+ C_CAN_NEWDAT3_REG,
+ C_CAN_NEWDAT4_REG,
C_CAN_INTPND1_REG,
C_CAN_INTPND2_REG,
C_CAN_INTPND3_REG,
@@ -145,6 +141,8 @@ static const u16 reg_map_d_can[] = {
[C_CAN_TXRQST2_REG] = 0x8A,
[C_CAN_NEWDAT1_REG] = 0x9C,
[C_CAN_NEWDAT2_REG] = 0x9E,
+ [C_CAN_NEWDAT3_REG] = 0xA0,
+ [C_CAN_NEWDAT4_REG] = 0xA2,
[C_CAN_INTPND1_REG] = 0xB0,
[C_CAN_INTPND2_REG] = 0xB2,
[C_CAN_INTPND3_REG] = 0xB4,
--
2.11.0