Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 2/2] sh_eth: Do not print an error message for probe deferral
From: David Miller @ 2017-05-18 15:21 UTC (permalink / raw)
  To: geert+renesas
  Cc: sergei.shtylyov, laurent.pinchart+renesas, netdev,
	linux-renesas-soc
In-Reply-To: <1495112495-25086-2-git-send-email-geert+renesas@glider.be>

From: Geert Uytterhoeven <geert+renesas@glider.be>
Date: Thu, 18 May 2017 15:01:35 +0200

> EPROBE_DEFER is not an error, hence printing an error message like
> 
>     sh-eth ee700000.ethernet: failed to initialise MDIO
> 
> may confuse the user.
> 
> To fix this, suppress the error message in case of probe deferral.
> While at it, shorten the message, and add the actual error code.
> 
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>

Applied.

^ permalink raw reply

* Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral
From: David Miller @ 2017-05-18 15:21 UTC (permalink / raw)
  To: geert+renesas-gXvu3+zWzMSzQB+pC5nmwQ
  Cc: andrew-g2DYL2Zd6BY, f.fainelli-Re5JQEeQqe8AvxtiuMwx3w,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	frowand.list-Re5JQEeQqe8AvxtiuMwx3w,
	thomas.petazzoni-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8,
	sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-renesas-soc-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1495112345-24795-1-git-send-email-geert+renesas-gXvu3+zWzMSzQB+pC5nmwQ@public.gmane.org>

From: Geert Uytterhoeven <geert+renesas-gXvu3+zWzMSzQB+pC5nmwQ@public.gmane.org>
Date: Thu, 18 May 2017 14:59:05 +0200

> If an Ethernet PHY is initialized before the interrupt controller it is
> connected to, a message like the following is printed:
> 
>     irq: no irq domain found for /interrupt-controller@e61c0000 !
> 
> However, the actual error is ignored, leading to a non-functional (-1)
> PHY interrupt later:
> 
>     Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=-1)
> 
> Depending on whether the PHY driver will fall back to polling, Ethernet
> may or may not work.
> 
> To fix this:
>   1. Switch of_mdiobus_register_phy() from irq_of_parse_and_map() to
>      of_irq_get().
>      Unlike the former, the latter returns -EPROBE_DEFER if the
>      interrupt controller is not yet available, so this condition can be
>      detected.
>      Other errors are handled the same as before, i.e. use the passed
>      mdio->irq[addr] as interrupt.
>   2. Propagate and handle errors from of_mdiobus_register_phy() and
>      of_mdiobus_register_device().
> 
> Signed-off-by: Geert Uytterhoeven <geert+renesas-gXvu3+zWzMSzQB+pC5nmwQ@public.gmane.org>

Florian or someone similarly knowledgable, please review.
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [Patch RFC net-next] net: usb: r8152: Fix rx_bytes/tx_bytes to include FCS
From: David Miller @ 2017-05-18 15:22 UTC (permalink / raw)
  To: andrew; +Cc: f.fainelli, netdev, hayeswang, mario_limonciello
In-Reply-To: <20170518150925.GB707@lunn.ch>

From: Andrew Lunn <andrew@lunn.ch>
Date: Thu, 18 May 2017 17:09:25 +0200

> Since these are software counters, they can be consistent. From a
> practical point of view, i doubt they ever will all be consistent,
> there are simply too many drivers to test and change if
> needed. However, for the ones somebody cares about, they can be made
> consistent.
> 
> I care about r8152, and would like to make it consistent with asix,
> dsa, e1000e.

No objection from me for making software counters consistent.

^ permalink raw reply

* [PATCH net-next] xen/9pfs: p9_trans_xen_init and p9_trans_xen_exit can be static
From: Wei Yongjun @ 2017-05-18 15:22 UTC (permalink / raw)
  To: Eric Van Hensbergen, Ron Minnich, Latchesar Ionkov,
	Stefano Stabellini, Juergen Gross
  Cc: Wei Yongjun, v9fs-developer, netdev

From: Wei Yongjun <weiyongjun1@huawei.com>

Fixes the following sparse warnings:

net/9p/trans_xen.c:528:5: warning:
 symbol 'p9_trans_xen_init' was not declared. Should it be static?
net/9p/trans_xen.c:540:6: warning:
 symbol 'p9_trans_xen_exit' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 net/9p/trans_xen.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
index 71e8564..3deb17f 100644
--- a/net/9p/trans_xen.c
+++ b/net/9p/trans_xen.c
@@ -525,7 +525,7 @@ static struct xenbus_driver xen_9pfs_front_driver = {
 	.otherend_changed = xen_9pfs_front_changed,
 };
 
-int p9_trans_xen_init(void)
+static int p9_trans_xen_init(void)
 {
 	if (!xen_domain())
 		return -ENODEV;
@@ -537,7 +537,7 @@ int p9_trans_xen_init(void)
 }
 module_init(p9_trans_xen_init);
 
-void p9_trans_xen_exit(void)
+static void p9_trans_xen_exit(void)
 {
 	v9fs_unregister_trans(&p9_xen_trans);
 	return xenbus_unregister_driver(&xen_9pfs_front_driver);

^ permalink raw reply related

* Re: [PATCH] liquidio: make the spinlock octeon_devices_lock static
From: David Miller @ 2017-05-18 15:24 UTC (permalink / raw)
  To: colin.king
  Cc: derek.chickles, satananda.burla, felix.manlunas, raghu.vatsavayi,
	netdev, kernel-janitors, linux-kernel
In-Reply-To: <20170518091401.13013-1-colin.king@canonical.com>

From: Colin King <colin.king@canonical.com>
Date: Thu, 18 May 2017 10:14:01 +0100

> From: Colin Ian King <colin.king@canonical.com>
> 
> octeon_devices_lock can be made static as it does not need to be
> in global scope.
> 
> Cleans up sparse warning: "warning: symbol 'octeon_devices_lock'
> was not declared. Should it be static?"
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Applied.

^ permalink raw reply

* [PATCH net-next] ibmvnic: fix missing unlock on error in __ibmvnic_reset()
From: Wei Yongjun @ 2017-05-18 15:24 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Thomas Falcon, John Allen, Nathan Fontenot
  Cc: Wei Yongjun, linuxppc-dev, netdev

From: Wei Yongjun <weiyongjun1@huawei.com>

Add the missing unlock before return from function __ibmvnic_reset()
in the error handling case.

Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 4f2d329..27f7933 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1313,6 +1313,7 @@ static void __ibmvnic_reset(struct work_struct *work)
 
 	if (rc) {
 		free_all_rwi(adapter);
+		mutex_unlock(&adapter->reset_lock);
 		return;
 	}

^ permalink raw reply related

* Re: [PATCH net-next 3/6] net: bridge: break if __br_mdb_del fails
From: Nikolay Aleksandrov @ 2017-05-18 15:25 UTC (permalink / raw)
  To: Vivien Didelot, netdev
  Cc: linux-kernel, kernel, David S. Miller, Stephen Hemminger,
	moderated list:ETHERNET BRIDGE
In-Reply-To: <877f1edwxp.fsf@weeman.i-did-not-set--mail-host-address--so-tickle-me>

On 5/18/17 6:08 PM, Vivien Didelot wrote:
> Hi Nikolay,
> 
> Nikolay Aleksandrov <nikolay@cumulusnetworks.com> writes:
> 
>>>    			err = __br_mdb_del(br, entry);
>>> -			if (!err)
>>> -				__br_mdb_notify(dev, p, entry, RTM_DELMDB);
>>> +			if (err)
>>> +				break;
>>> +			__br_mdb_notify(dev, p, entry, RTM_DELMDB);
>>>    		}
>>>    	} else {
>>>    		err = __br_mdb_del(br, entry);
>>>
>>
>> This can potentially break user-space scripts that rely on the best-effort
>> behaviour, this is the normal "delete without vid & enabled vlan filtering".
>> You can check the fdb delete code which does the same, this was intentional.
>>
>> You can add an mdb entry without a vid to all vlans, add a vlan and then try
>> to remove it from all vlans where it is present - with this patch obviously
>> that will fail at the new vlan.
> 
> OK good to know. That intention wasn't obvious. I can make __br_mdb_del
> return void instead? What about the rest of the patchset if I do so?
> 
> Thanks,
> 
>          Vivien
> 

If you make it return void we will not be able to return proper error value
when doing a single operation (the else case). About the rest I see only some
minor style issues, I'll comment on the respective patches. Another minor nit is 
using switch() instead of if/else for the message types but that is really up to 
you, I don't mind either way. :-)

Cheers,
  Nik

^ permalink raw reply

* [PATCH net-next] qed: Remove unused including <linux/version.h>
From: Wei Yongjun @ 2017-05-18 15:26 UTC (permalink / raw)
  To: Yuval Mintz, Ariel Elior; +Cc: Wei Yongjun, everest-linux-l2, netdev

From: Wei Yongjun <weiyongjun1@huawei.com>

Remove including <linux/version.h> that is not needed.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/ethernet/qlogic/qed/qed_fcoe.c  | 1 -
 drivers/net/ethernet/qlogic/qed/qed_iscsi.c | 1 -
 drivers/net/ethernet/qlogic/qed/qed_l2.c    | 1 -
 drivers/net/ethernet/qlogic/qed/qed_ll2.c   | 1 -
 drivers/net/ethernet/qlogic/qed/qed_main.c  | 1 -
 5 files changed, 5 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_fcoe.c b/drivers/net/ethernet/qlogic/qed/qed_fcoe.c
index 21a58ff..690dd2b 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_fcoe.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_fcoe.c
@@ -43,7 +43,6 @@
 #include <linux/slab.h>
 #include <linux/stddef.h>
 #include <linux/string.h>
-#include <linux/version.h>
 #include <linux/workqueue.h>
 #include <linux/errno.h>
 #include <linux/list.h>
diff --git a/drivers/net/ethernet/qlogic/qed/qed_iscsi.c b/drivers/net/ethernet/qlogic/qed/qed_iscsi.c
index 339c91d..f2fd09c 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_iscsi.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_iscsi.c
@@ -44,7 +44,6 @@
 #include <linux/slab.h>
 #include <linux/stddef.h>
 #include <linux/string.h>
-#include <linux/version.h>
 #include <linux/workqueue.h>
 #include <linux/errno.h>
 #include <linux/list.h>
diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c b/drivers/net/ethernet/qlogic/qed/qed_l2.c
index 746fed4..fab6e69 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_l2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c
@@ -43,7 +43,6 @@
 #include <linux/slab.h>
 #include <linux/stddef.h>
 #include <linux/string.h>
-#include <linux/version.h>
 #include <linux/workqueue.h>
 #include <linux/bitops.h>
 #include <linux/bug.h>
diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
index 09c8641..b04dfc4 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
@@ -38,7 +38,6 @@
 #include <linux/pci.h>
 #include <linux/slab.h>
 #include <linux/stddef.h>
-#include <linux/version.h>
 #include <linux/workqueue.h>
 #include <net/ipv6.h>
 #include <linux/bitops.h>
diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c
index 537d123..f286daa 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_main.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c
@@ -34,7 +34,6 @@
 #include <linux/pci.h>
 #include <linux/kernel.h>
 #include <linux/slab.h>
-#include <linux/version.h>
 #include <linux/delay.h>
 #include <asm/byteorder.h>
 #include <linux/dma-mapping.h>

^ permalink raw reply related

* Re: [patch net] mlxsw: spectrum: Avoid possible NULL pointer dereference
From: David Miller @ 2017-05-18 15:27 UTC (permalink / raw)
  To: jiri; +Cc: netdev, idosch, mlxsw
In-Reply-To: <20170518110352.14807-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Thu, 18 May 2017 13:03:52 +0200

> From: Ido Schimmel <idosch@mellanox.com>
> 
> In case we got an FDB notification for a port that doesn't exist we
> execute an FDB entry delete to prevent it from re-appearing the next
> time we poll for notifications.
> 
> If the operation failed we would trigger a NULL pointer dereference as
> 'mlxsw_sp_port' is NULL.
> 
> Fix it by reporting the error using the underlying bus device instead.
> 
> Fixes: 12f1501e7511 ("mlxsw: spectrum: remove FDB entry in case we get unknown object notification")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>

Applied, thank you.

^ permalink raw reply

* RE: [PATCH net-next] qed: Utilize FW 8.20.0.0
From: Mintz, Yuval @ 2017-05-18 15:28 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, linux-scsi@vger.kernel.org, Dupuis, Chad,
	Amrani, Ram, Tayar, Tomer, Rangankar, Manish
In-Reply-To: <20170518.104300.795177460216894201.davem@davemloft.net>

> >> This pushes qed [and as result, all qed* drivers] into using 8.20.0.0
> >> firmware. The changes are mostly contained in qed with minor changes
> >> to qedi due to some HSI changes.
> >>
> >> Content-wise, the firmware contains fixes to various issues exposed
> >> since the release of the previous firmware, including:
> >>  - Corrects iSCSI fast retransmit when data digest is enabled.
> >>  - Stop draining packets when receiving several consecutive PFCs.
> >>  - Prevent possible assertion when consecutively opening/closing
> >>    many connections.
> >>  - Prevent possible assertion due to too long BDQ fetch time.
> >>
> >> In addition, the new firmware would allow us to later add iWARP
> >> support in qed and qedr.
> >>
> >> Signed-off-by: Chad Dupuis <Chad.Dupuis@cavium.com>
> >> Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
> >> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
> >> Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com>
> >> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
> >
> > Applied.
> 
> Actually I had to revert.  Please look at the compiler output before
> submitting changes:
> 
> drivers/net/ethernet/qlogic/qed/qed_debug.c: In function ‘qed_grc_dump’:
> drivers/net/ethernet/qlogic/qed/qed_debug.c:2425:6: warning: ‘addr’ may
> be used uninitialized in this function [-Wmaybe-uninitialized]
>   u32 byte_addr = DWORDS_TO_BYTES(addr), offset = 0, i;
>       ^
> drivers/net/ethernet/qlogic/qed/qed_debug.c:3534:7: note: ‘addr’ was
> declared here
>    u32 addr, size = RSS_REG_RSS_RAM_DATA_SIZE;
>        ^
> 
> 'addr' is never, ever, assigned a value, yet it is passed into a function as an
> argument.

Sorry about that. Will send v2 [hopefully] later today.

^ permalink raw reply

* Re: [PATCH net-next 5/6] net: bridge: get msgtype from nlmsghdr in mdb ops
From: Nikolay Aleksandrov @ 2017-05-18 15:28 UTC (permalink / raw)
  To: Vivien Didelot, netdev
  Cc: moderated list:ETHERNET BRIDGE, kernel, linux-kernel,
	David S. Miller
In-Reply-To: <20170517212709.6473-6-vivien.didelot@savoirfairelinux.com>

On 5/18/17 12:27 AM, Vivien Didelot wrote:
> Retrieve the message type from the nlmsghdr structure instead of
> hardcoding it in both br_mdb_add and br_mdb_del.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
> ---
>   net/bridge/br_mdb.c | 10 ++++++----
>   1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
> index a72d5e6f339f..d280b20587cb 100644
> --- a/net/bridge/br_mdb.c
> +++ b/net/bridge/br_mdb.c
> @@ -569,6 +569,7 @@ static int br_mdb_add(struct sk_buff *skb, struct nlmsghdr *nlh,
>   	struct net_bridge_port *p;
>   	struct net_bridge_vlan *v;
>   	struct net_bridge *br;
> +	int msgtype = nlh->nlmsg_type;

minor nits:
nlmsg_type is a u16, also please keep the order and arrange these from longest 
to shortest

>   	int err;
>   
>   	err = br_mdb_parse(skb, nlh, &dev, &entry);
> @@ -595,12 +596,12 @@ static int br_mdb_add(struct sk_buff *skb, struct nlmsghdr *nlh,
>   	if (br_vlan_enabled(br) && vg && entry->vid == 0) {
>   		list_for_each_entry(v, &vg->vlan_list, vlist) {
>   			entry->vid = v->vid;
> -			err = __br_mdb_do(p, entry, RTM_NEWMDB);
> +			err = __br_mdb_do(p, entry, msgtype);
>   			if (err)
>   				break;
>   		}
>   	} else {
> -		err = __br_mdb_do(p, entry, RTM_NEWMDB);
> +		err = __br_mdb_do(p, entry, msgtype);
>   	}
>   
>   	return err;
> @@ -677,6 +678,7 @@ static int br_mdb_del(struct sk_buff *skb, struct nlmsghdr *nlh,
>   	struct net_bridge_port *p;
>   	struct net_bridge_vlan *v;
>   	struct net_bridge *br;
> +	int msgtype = nlh->nlmsg_type;

same here

>   	int err;
>   
>   	err = br_mdb_parse(skb, nlh, &dev, &entry);
> @@ -703,12 +705,12 @@ static int br_mdb_del(struct sk_buff *skb, struct nlmsghdr *nlh,
>   	if (br_vlan_enabled(br) && vg && entry->vid == 0) {
>   		list_for_each_entry(v, &vg->vlan_list, vlist) {
>   			entry->vid = v->vid;
> -			err = __br_mdb_do(p, entry, RTM_DELMDB);
> +			err = __br_mdb_do(p, entry, msgtype);
>   			if (err)
>   				break;
>   		}
>   	} else {
> -		err = __br_mdb_do(p, entry, RTM_DELMDB);
> +		err = __br_mdb_do(p, entry, msgtype);
>   	}
>   
>   	return err;
> 

^ permalink raw reply

* Re: [PATCH] net: sched: fix a use-after-free error on chain on the error exit path
From: David Miller @ 2017-05-18 15:31 UTC (permalink / raw)
  To: colin.king
  Cc: jhs, xiyou.wangcong, jiri, netdev, kernel-janitors, linux-kernel
In-Reply-To: <20170518140702.6072-1-colin.king@canonical.com>

From: Colin King <colin.king@canonical.com>
Date: Thu, 18 May 2017 15:07:02 +0100

> From: Colin Ian King <colin.king@canonical.com>
> 
> Set chain to null after the call to tcf_chain_destroy so that we don't
> call tcf_chain_put on the error exit path, thus avoiding a use-after-free
> error.
> 
> Detected by CoverityScan, CID#1436357 ("Use after free")
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Colin, you really need to make some adjustments to how you are submitting
these kinds of patches.

First of all, you must indicate the target tree in your Subject line
as "[PATCH net-next] " in this case.

Also, you need to add an appropriate Fixes: tag right before your
signoff.

Thank you.

^ permalink raw reply

* [PATCH net-next] net/mlx5e: Fix possible memory leak
From: Wei Yongjun @ 2017-05-18 15:34 UTC (permalink / raw)
  To: Saeed Mahameed, Matan Barak, Leon Romanovsky, Hadar Hen Zion
  Cc: Wei Yongjun, netdev, linux-rdma

From: Wei Yongjun <weiyongjun1@huawei.com>

'encap_header' is malloced and should be freed before leaving from
the error handling cases, otherwise it will cause memory leak.

Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 11c27e4..a72ecbc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -1404,8 +1404,8 @@ static int mlx5e_create_encap_header_ipv4(struct mlx5e_priv *priv,
 
 	if (!(nud_state & NUD_VALID)) {
 		neigh_event_send(n, NULL);
-		neigh_release(n);
-		return -EAGAIN;
+		err = -EAGAIN;
+		goto out;
 	}
 
 	err = mlx5_encap_alloc(priv->mdev, e->tunnel_type,
@@ -1510,8 +1510,8 @@ static int mlx5e_create_encap_header_ipv6(struct mlx5e_priv *priv,
 
 	if (!(nud_state & NUD_VALID)) {
 		neigh_event_send(n, NULL);
-		neigh_release(n);
-		return -EAGAIN;
+		err = -EAGAIN;
+		goto out;
 	}
 
 	err = mlx5_encap_alloc(priv->mdev, e->tunnel_type,

^ permalink raw reply related

* [RFC net-next PATCH 0/5] XDP driver feature API and handling change to xdp_buff
From: Jesper Dangaard Brouer @ 2017-05-18 15:41 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov
  Cc: John Fastabend, netdev, Jesper Dangaard Brouer

I would like some comments on introducing a feature API between XDP
drives and XDP/BPF core.  The primary issue is when extending struct
xdp_buff, today, drivers not implementing this feature can access
uninitilized memory, using bpf-helper associated with the feature.

---

Jesper Dangaard Brouer (5):
      samples/bpf: xdp_tx_iptunnel make use of map_data[]
      mlx5: fix bug reading rss_hash_type from CQE
      net: introduce XDP driver features interface
      net: new XDP feature for reading HW rxhash from drivers
      mlx5: add XDP rxhash feature for driver mlx5


 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    3 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   |   98 ++++++++++++++-------
 include/linux/filter.h                            |   31 ++++++-
 include/linux/mlx5/device.h                       |   10 ++
 include/linux/netdev_features.h                   |   34 +++++++
 include/linux/netdevice.h                         |    1 
 include/uapi/linux/bpf.h                          |   56 ++++++++++++
 kernel/bpf/verifier.c                             |    3 +
 net/core/dev.c                                    |   48 ++++++++++
 net/core/ethtool.c                                |    2 
 net/core/filter.c                                 |   73 ++++++++++++++++
 samples/bpf/bpf_helpers.h                         |    2 
 samples/bpf/xdp_tx_iptunnel_common.h              |    2 
 samples/bpf/xdp_tx_iptunnel_kern.c                |    2 
 samples/bpf/xdp_tx_iptunnel_user.c                |   14 ++-
 tools/include/uapi/linux/bpf.h                    |   10 ++
 16 files changed, 345 insertions(+), 44 deletions(-)

^ permalink raw reply

* [RFC net-next PATCH 1/5] samples/bpf: xdp_tx_iptunnel make use of map_data[]
From: Jesper Dangaard Brouer @ 2017-05-18 15:41 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov
  Cc: John Fastabend, netdev, Jesper Dangaard Brouer
In-Reply-To: <149512205297.14733.15729847433404265933.stgit@firesoul>

There is no reason to use a compile time constant MAX_IPTNL_ENTRIES
shared between the _user.c and _kern.c, when map_data[].def.max_entries
can tell us dynamically what the max_entries were of the ELF map that
the bpf loaded created.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 samples/bpf/xdp_tx_iptunnel_common.h |    2 --
 samples/bpf/xdp_tx_iptunnel_kern.c   |    2 +-
 samples/bpf/xdp_tx_iptunnel_user.c   |   14 +++++++++-----
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/samples/bpf/xdp_tx_iptunnel_common.h b/samples/bpf/xdp_tx_iptunnel_common.h
index dd12cc35110f..b065699cacb5 100644
--- a/samples/bpf/xdp_tx_iptunnel_common.h
+++ b/samples/bpf/xdp_tx_iptunnel_common.h
@@ -9,8 +9,6 @@
 
 #include <linux/types.h>
 
-#define MAX_IPTNL_ENTRIES 256U
-
 struct vip {
 	union {
 		__u32 v6[4];
diff --git a/samples/bpf/xdp_tx_iptunnel_kern.c b/samples/bpf/xdp_tx_iptunnel_kern.c
index 0f4f6e8c8611..b19489eb3c22 100644
--- a/samples/bpf/xdp_tx_iptunnel_kern.c
+++ b/samples/bpf/xdp_tx_iptunnel_kern.c
@@ -30,7 +30,7 @@ struct bpf_map_def SEC("maps") vip2tnl = {
 	.type = BPF_MAP_TYPE_HASH,
 	.key_size = sizeof(struct vip),
 	.value_size = sizeof(struct iptnl_info),
-	.max_entries = MAX_IPTNL_ENTRIES,
+	.max_entries = 256,
 };
 
 static __always_inline void count_tx(u32 protocol)
diff --git a/samples/bpf/xdp_tx_iptunnel_user.c b/samples/bpf/xdp_tx_iptunnel_user.c
index 92b8bde9337c..0500a5cc75c4 100644
--- a/samples/bpf/xdp_tx_iptunnel_user.c
+++ b/samples/bpf/xdp_tx_iptunnel_user.c
@@ -123,11 +123,6 @@ static int parse_ports(const char *port_str, int *min_port, int *max_port)
 		return 1;
 	}
 
-	if (tmp_max_port - tmp_min_port + 1 > MAX_IPTNL_ENTRIES) {
-		fprintf(stderr, "Port range (%s) is larger than %u\n",
-			port_str, MAX_IPTNL_ENTRIES);
-		return 1;
-	}
 	*min_port = tmp_min_port;
 	*max_port = tmp_max_port;
 
@@ -142,6 +137,7 @@ int main(int argc, char **argv)
 	int min_port = 0, max_port = 0;
 	struct iptnl_info tnl = {};
 	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+	unsigned int entries, max_entries;
 	struct vip vip = {};
 	char filename[256];
 	int opt;
@@ -238,6 +234,14 @@ int main(int argc, char **argv)
 		return 1;
 	}
 
+	entries = max_port - min_port + 1;
+	max_entries = map_data[1].def.max_entries;
+	if (entries > max_entries) {
+		fprintf(stderr, "Req port entries (%u) is larger than max %u\n",
+			entries, max_entries);
+		return 1;
+	}
+
 	signal(SIGINT, int_exit);
 
 	while (min_port <= max_port) {

^ permalink raw reply related

* [RFC net-next PATCH 2/5] mlx5: fix bug reading rss_hash_type from CQE
From: Jesper Dangaard Brouer @ 2017-05-18 15:41 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov
  Cc: John Fastabend, netdev, Jesper Dangaard Brouer
In-Reply-To: <149512205297.14733.15729847433404265933.stgit@firesoul>

Masks for extracting part of the Completion Queue Entry (CQE)
field rss_hash_type was swapped, namely CQE_RSS_HTYPE_IP and
CQE_RSS_HTYPE_L4.

The bug resulted in setting skb->l4_hash, even-though the
rss_hash_type indicated that hash was NOT computed over the
L4 (UDP or TCP) part of the packet.

Added comments from the datasheet, to make it more clear what
these masks are selecting.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/linux/mlx5/device.h |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index dd9a263ed368..a940ec6a046c 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -787,8 +787,14 @@ enum {
 };
 
 enum {
-	CQE_RSS_HTYPE_IP	= 0x3 << 6,
-	CQE_RSS_HTYPE_L4	= 0x3 << 2,
+	CQE_RSS_HTYPE_IP	= 0x3 << 2,
+	/* cqe->rss_hash_type[3:2] - IP destination selected for hash
+	 * (00 = none,  01 = IPv4, 10 = IPv6, 11 = Reserved)
+	 */
+	CQE_RSS_HTYPE_L4	= 0x3 << 6,
+	/* cqe->rss_hash_type[7:6] - L4 destination selected for hash
+	 * (00 = none, 01 = TCP. 10 = UDP, 11 = IPSEC.SPI
+	 */
 };
 
 enum {

^ permalink raw reply related

* [RFC net-next PATCH 3/5] net: introduce XDP driver features interface
From: Jesper Dangaard Brouer @ 2017-05-18 15:41 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov
  Cc: John Fastabend, netdev, Jesper Dangaard Brouer
In-Reply-To: <149512205297.14733.15729847433404265933.stgit@firesoul>

There is a fundamental difference between normal eBPF programs
and (XDP) eBPF programs getting attached in a driver. For normal
eBPF programs it is easy to add a new bpf feature, like a bpf
helper, because is it strongly tied to the feature being
available in the current core kernel code.  When drivers invoke a
bpf_prog, then it is not sufficient to simply relying on whether
a bpf_helper exists or not.  When a driver haven't implemented a
given feature yet, then it is possible to expose uninitialized
parts of xdp_buff.  The driver pass in a pointer to xdp_buff,
usually "allocated" on the stack, which must not be exposed.

Only two user visible NETIF_F_XDP_* net_device feature flags are
exposed via ethtool (-k) seen as "xdp" and "xdp-partial".
The "xdp-partial" is detected when there is not feature equality
between kernel and driver, and a netdev_warn is given.

The idea is that XDP_DRV_* feature bits define a contract between
the driver and the kernel, giving a reliable way to know that XDP
features a driver promised to implement. Thus, knowing what bpf
side features are safe to allow.

There are 3 levels of features: "required", "devel" and "optional".

The motivation is pushing driver vendors forward to support all
the new XDP features.  Once a given feature bit is moved into
the "required" features, the kernel will reject loading XDP
program if feature isn't implemented by driver.  Features under
developement, require help from the bpf infrastrucure to detect
when a given helper or direct-access is used, using a bpf_prog
bit to mark a need for the feature, and pulling in this bit in
the xdp_features_check().  When all drivers have implemented
a "devel" feature, it can be moved to the "required" feature and
the bpf_prog bit can be refurbished. The "optional" features are
for things that are handled safely runtime, but drivers will
still get flagged as "xdp-partial" if not implementing those.
---
 include/linux/netdev_features.h |   32 ++++++++++++++++++++++++++++++++
 include/linux/netdevice.h       |    1 +
 net/core/dev.c                  |   34 ++++++++++++++++++++++++++++++++++
 net/core/ethtool.c              |    2 ++
 4 files changed, 69 insertions(+)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 1d4737cffc71..ff81ee231410 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -77,6 +77,8 @@ enum {
 	NETIF_F_HW_ESP_BIT,		/* Hardware ESP transformation offload */
 	NETIF_F_HW_ESP_TX_CSUM_BIT,	/* ESP with TX checksum offload */
 
+	NETIF_F_XDP_BASELINE_BIT,	/* Driver supports XDP */
+	NETIF_F_XDP_PARTIAL_BIT,	/* not supporting all XDP features */
 	/*
 	 * Add your fresh new feature above and remember to update
 	 * netdev_features_strings[] in net/core/ethtool.c and maybe
@@ -140,6 +142,8 @@ enum {
 #define NETIF_F_HW_TC		__NETIF_F(HW_TC)
 #define NETIF_F_HW_ESP		__NETIF_F(HW_ESP)
 #define NETIF_F_HW_ESP_TX_CSUM	__NETIF_F(HW_ESP_TX_CSUM)
+#define NETIF_F_XDP_BASELINE	__NETIF_F(XDP_BASELINE)
+#define NETIF_F_XDP_PARTIAL	__NETIF_F(XDP_PARTIAL)
 
 #define for_each_netdev_feature(mask_addr, bit)	\
 	for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
@@ -212,4 +216,32 @@ enum {
 				 NETIF_F_GSO_UDP_TUNNEL |		\
 				 NETIF_F_GSO_UDP_TUNNEL_CSUM)
 
+/* XDP driver flags */
+enum {
+	XDP_DRV_F_ENABLED_BIT,
+};
+
+#define __XDP_DRV_F_BIT(bit)	((netdev_features_t)1 << (bit))
+#define __XDP_DRV_F(name)	__XDP_DRV_F_BIT(XDP_DRV_F_##name##_BIT)
+#define XDP_DRV_F_ENABLED	__XDP_DRV_F(ENABLED)
+
+/* XDP driver MUST support these features, else kernel MUST reject
+ * bpf_prog to guarantee safe access to data structures
+ */
+#define XDP_DRV_FEATURES_REQUIRED	XDP_DRV_F_ENABLED
+
+/* Some XDP features are under development. Based on bpf_prog loading
+ * detect if kernel feature can be activated.
+ */
+#define XDP_DRV_FEATURES_DEVEL		0
+
+/* Some XDP features are optional, like action return code, as they
+ * are handled safely runtime.
+ */
+#define XDP_DRV_FEATURES_OPTIONAL	0
+
+#define XDP_DRV_FEATURES_MASK		(XDP_DRV_FEATURES_REQUIRED |	\
+					 XDP_DRV_FEATURES_DEVEL |	\
+					 XDP_DRV_FEATURES_OPTIONAL)
+
 #endif	/* _LINUX_NETDEV_FEATURES_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9c23bd2efb56..329ae156ff65 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1685,6 +1685,7 @@ struct net_device {
 	netdev_features_t	hw_enc_features;
 	netdev_features_t	mpls_features;
 	netdev_features_t	gso_partial_features;
+	netdev_features_t	xdp_features;
 
 	int			ifindex;
 	int			group;
diff --git a/net/core/dev.c b/net/core/dev.c
index 35a06cebb282..b4af5fbbd9da 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6851,6 +6851,25 @@ int dev_change_proto_down(struct net_device *dev, bool proto_down)
 }
 EXPORT_SYMBOL(dev_change_proto_down);
 
+bool xdp_features_check(struct net_device *dev, struct bpf_prog *xdp_prog,
+			struct netlink_ext_ack *extack, u32 flags)
+{
+	netdev_features_t req_features = XDP_DRV_FEATURES_REQUIRED;
+	netdev_features_t dev_xdp_features;
+
+	/* Generic XDP naturally support all features */
+	if (flags & XDP_FLAGS_SKB_MODE)
+		return true;
+
+	dev_xdp_features = dev->xdp_features & XDP_DRV_FEATURES_MASK;
+	if (req_features & ~dev_xdp_features) {
+		NL_SET_ERR_MSG(extack,
+			       "Required XDP feature not supported by device");
+		return false;
+	}
+	return true;
+}
+
 /**
  *	dev_change_xdp_fd - set or clear a bpf program for a device rx path
  *	@dev: device
@@ -6890,6 +6909,11 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 		prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_XDP);
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);
+
+		if (!xdp_features_check(dev, prog, extack, flags)) {
+			bpf_prog_put(prog);
+			return -EOPNOTSUPP;
+		}
 	}
 
 	memset(&xdp, 0, sizeof(xdp));
@@ -7402,6 +7426,16 @@ int register_netdevice(struct net_device *dev)
 	dev->features |= NETIF_F_SOFT_FEATURES;
 	dev->wanted_features = dev->features & dev->hw_features;
 
+	/* Transfer XDP features and detect mismatch */
+	if (dev->netdev_ops->ndo_xdp) {
+		dev->xdp_features |= XDP_DRV_F_ENABLED;
+		dev->features     |= NETIF_F_XDP_BASELINE;
+		if (dev->xdp_features ^ XDP_DRV_FEATURES_MASK) {
+			netdev_warn(dev, "Partial XDP support in driver\n");
+			dev->features |= NETIF_F_XDP_PARTIAL;
+		}
+	}
+
 	if (!(dev->flags & IFF_LOOPBACK))
 		dev->hw_features |= NETIF_F_NOCACHE_COPY;
 
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 03111a2d6653..d283cdc9ee25 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -106,6 +106,8 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
 	[NETIF_F_HW_TC_BIT] =		 "hw-tc-offload",
 	[NETIF_F_HW_ESP_BIT] =		 "esp-hw-offload",
 	[NETIF_F_HW_ESP_TX_CSUM_BIT] =	 "esp-tx-csum-hw-offload",
+	[NETIF_F_XDP_BASELINE_BIT] =	 "xdp",
+	[NETIF_F_XDP_PARTIAL_BIT] =	 "xdp-partial", /* "xdp-challenged"? */
 };
 
 static const char

^ permalink raw reply related

* [RFC net-next PATCH 4/5] net: new XDP feature for reading HW rxhash from drivers
From: Jesper Dangaard Brouer @ 2017-05-18 15:41 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov
  Cc: John Fastabend, netdev, Jesper Dangaard Brouer
In-Reply-To: <149512205297.14733.15729847433404265933.stgit@firesoul>

Introducing a new XDP feature and associated bpf helper bpf_xdp_rxhash.

The rxhash and type allow filtering on packets without touching
packet memory.  The performance difference on my system with a
100 Gbit/s mlx5 NIC is 12Mpps to 19Mpps.

TODO: desc RXHASH and associated type, and how XDP choose to map
and export these to bpf_prog's.

TODO: desc how this interacts with XDP driver features system.
---
 include/linux/filter.h          |   31 ++++++++++++++++-
 include/linux/netdev_features.h |    4 ++
 include/uapi/linux/bpf.h        |   56 +++++++++++++++++++++++++++++-
 kernel/bpf/verifier.c           |    3 ++
 net/core/dev.c                  |   16 ++++++++-
 net/core/filter.c               |   73 +++++++++++++++++++++++++++++++++++++++
 samples/bpf/bpf_helpers.h       |    2 +
 tools/include/uapi/linux/bpf.h  |   10 +++++
 8 files changed, 190 insertions(+), 5 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 9a7786db14fa..33a254ccd47d 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -413,7 +413,8 @@ struct bpf_prog {
 				locked:1,	/* Program image locked? */
 				gpl_compatible:1, /* Is filter GPL compatible? */
 				cb_access:1,	/* Is control block accessed? */
-				dst_needed:1;	/* Do we need dst entry? */
+				dst_needed:1,	/* Do we need dst entry? */
+				xdp_rxhash_needed:1;/* Req XDP RXHASH support */
 	kmemcheck_bitfield_end(meta);
 	enum bpf_prog_type	type;		/* Type of BPF program */
 	u32			len;		/* Number of filter blocks */
@@ -444,12 +445,40 @@ struct bpf_skb_data_end {
 	void *data_end;
 };
 
+/* Kernel internal xdp_buff->flags */
+#define XDP_CTX_F_RXHASH_TYPE_MASK	XDP_HASH_TYPE_MASK
+#define XDP_CTX_F_RXHASH_TYPE_BITS	XDP_HASH_TYPE_BITS
+#define XDP_CTX_F_RXHASH_SW		(1ULL <<  XDP_CTX_F_RXHASH_TYPE_BITS)
+#define XDP_CTX_F_RXHASH_HW		(1ULL << (XDP_CTX_F_RXHASH_TYPE_BITS+1))
+
 struct xdp_buff {
 	void *data;
 	void *data_end;
 	void *data_hard_start;
+	u64 flags;
+	u32 rxhash;
 };
 
+/* helper functions for driver setting rxhash */
+static inline void
+xdp_record_hash(struct xdp_buff *xdp, u32 hash, u32 type)
+{
+	xdp->flags |= XDP_CTX_F_RXHASH_HW;
+	xdp->flags |= type & XDP_CTX_F_RXHASH_TYPE_MASK;
+	xdp->rxhash = hash;
+}
+
+static inline void
+xdp_set_skb_hash(struct xdp_buff *xdp, struct sk_buff *skb)
+{
+	if (likely(xdp->flags & (XDP_CTX_F_RXHASH_HW|XDP_CTX_F_RXHASH_SW))) {
+		bool is_sw = !!(xdp->flags | XDP_CTX_F_RXHASH_SW);
+		bool is_l4 = !!(xdp->flags & XDP_HASH_TYPE_L4_MASK);
+
+		__skb_set_hash(skb, xdp->rxhash, is_sw, is_l4);
+	}
+}
+
 /* compute the linear packet data range [data, data_end) which
  * will be accessed by cls_bpf, act_bpf and lwt programs
  */
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index ff81ee231410..4b50e8c606c5 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -219,11 +219,13 @@ enum {
 /* XDP driver flags */
 enum {
 	XDP_DRV_F_ENABLED_BIT,
+	XDP_DRV_F_RXHASH_BIT,
 };
 
 #define __XDP_DRV_F_BIT(bit)	((netdev_features_t)1 << (bit))
 #define __XDP_DRV_F(name)	__XDP_DRV_F_BIT(XDP_DRV_F_##name##_BIT)
 #define XDP_DRV_F_ENABLED	__XDP_DRV_F(ENABLED)
+#define XDP_DRV_F_RXHASH	__XDP_DRV_F(RXHASH)
 
 /* XDP driver MUST support these features, else kernel MUST reject
  * bpf_prog to guarantee safe access to data structures
@@ -233,7 +235,7 @@ enum {
 /* Some XDP features are under development. Based on bpf_prog loading
  * detect if kernel feature can be activated.
  */
-#define XDP_DRV_FEATURES_DEVEL		0
+#define XDP_DRV_FEATURES_DEVEL		XDP_DRV_F_RXHASH
 
 /* Some XDP features are optional, like action return code, as they
  * are handled safely runtime.
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 945a1f5f63c5..1d9d3a46217d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -482,6 +482,9 @@ union bpf_attr {
  *     Get the owner uid of the socket stored inside sk_buff.
  *     @skb: pointer to skb
  *     Return: uid of the socket owner on success or overflowuid if failed.
+ *
+ * u64 bpf_xdp_rxhash(xdp_md, new_hash, type, flags)
+ *	TODO: MISSING DESC
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -531,7 +534,8 @@ union bpf_attr {
 	FN(xdp_adjust_head),		\
 	FN(probe_read_str),		\
 	FN(get_socket_cookie),		\
-	FN(get_socket_uid),
+	FN(get_socket_uid),		\
+	FN(xdp_rxhash),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -581,6 +585,10 @@ enum bpf_func_id {
 /* BPF_FUNC_perf_event_output for sk_buff input context. */
 #define BPF_F_CTXLEN_MASK		(0xfffffULL << 32)
 
+/* BPF_FUNC_xdp_rxhash flags */
+#define BPF_F_RXHASH_SET		0ULL
+#define BPF_F_RXHASH_GET		(1ULL << 0)
+
 /* user accessible mirror of in-kernel sk_buff.
  * new fields can only be added to the end of this structure
  */
@@ -660,6 +668,52 @@ enum xdp_action {
 struct xdp_md {
 	__u32 data;
 	__u32 data_end;
+	__u32 rxhash;
+	/* (FIXME delete comment)
+	 * Discussion: If choosing to support direct read, then I
+	 * (believe) having a separate 'rxhash_type' is easier and
+	 * faster to implement. (Else I have to do BPF instruction
+	 * hacks to move the type into upper bits of 'rxhash', which I
+	 * couldn't figureout how to do ;-))
+	*/
+	__u32 rxhash_type;
 };
 
+/* XDP rxhash have an associated type, which is related to the RSS
+ * (Receive Side Scaling) standard, but NIC HW have different mapping
+ * and support. Thus, create mapping that is interesting for XDP.  XDP
+ * would primarly want insight into L3 and L4 protocol info.
+ *
+ * TODO: Likely need to get extended with "L3_IPV6_EX" due RSS standard
+ *
+ * The HASH_TYPE will be returned from bpf helper as the top 32-bit of
+ * the 64-bit rxhash (internally type stored in xdp_buff->flags).
+ */
+#define XDP_HASH(x)		((x) & ((1ULL << 32)-1))
+#define XDP_HASH_TYPE(x)	((x) >> 32)
+
+#define XDP_HASH_TYPE_L3_SHIFT	0
+#define XDP_HASH_TYPE_L3_BITS	3
+#define XDP_HASH_TYPE_L3_MASK	((1ULL << XDP_HASH_TYPE_L3_BITS)-1)
+#define XDP_HASH_TYPE_L3(x)	((x) & XDP_HASH_TYPE_L3_MASK)
+enum {
+	XDP_HASH_TYPE_L3_IPV4 = 1,
+	XDP_HASH_TYPE_L3_IPV6,
+};
+
+#define XDP_HASH_TYPE_L4_SHIFT	XDP_HASH_TYPE_L3_BITS
+#define XDP_HASH_TYPE_L4_BITS	5
+#define XDP_HASH_TYPE_L4_MASK						\
+	(((1ULL << XDP_HASH_TYPE_L4_BITS)-1) << XDP_HASH_TYPE_L4_SHIFT)
+#define XDP_HASH_TYPE_L4(x)	((x) & XDP_HASH_TYPE_L4_MASK)
+enum {
+	_XDP_HASH_TYPE_L4_TCP = 1,
+	_XDP_HASH_TYPE_L4_UDP,
+};
+#define XDP_HASH_TYPE_L4_TCP (_XDP_HASH_TYPE_L4_TCP << XDP_HASH_TYPE_L4_SHIFT)
+#define XDP_HASH_TYPE_L4_UDP (_XDP_HASH_TYPE_L4_UDP << XDP_HASH_TYPE_L4_SHIFT)
+
+#define XDP_HASH_TYPE_BITS   (XDP_HASH_TYPE_L3_BITS + XDP_HASH_TYPE_L4_BITS)
+#define XDP_HASH_TYPE_MASK   (XDP_HASH_TYPE_L3_MASK | XDP_HASH_TYPE_L4_MASK)
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6f8b6ed690be..248bc113ad18 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3346,6 +3346,8 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
 			prog->dst_needed = 1;
 		if (insn->imm == BPF_FUNC_get_prandom_u32)
 			bpf_user_rnd_init_once();
+		if (insn->imm == BPF_FUNC_xdp_rxhash)
+			prog->xdp_rxhash_needed = 1;
 		if (insn->imm == BPF_FUNC_tail_call) {
 			/* If we tail call into other programs, we
 			 * cannot make any assumptions since they can
@@ -3353,6 +3355,7 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
 			 * the program array.
 			 */
 			prog->cb_access = 1;
+			prog->xdp_rxhash_needed = 1;
 
 			/* mark bpf_tail_call as different opcode to avoid
 			 * conditional branch in the interpeter for every normal
diff --git a/net/core/dev.c b/net/core/dev.c
index b4af5fbbd9da..28082067ac00 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4318,9 +4318,13 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 	xdp.data_end = xdp.data + hlen;
 	xdp.data_hard_start = skb->data - skb_headroom(skb);
 	orig_data = xdp.data;
+	xdp.flags  = 0;
+	xdp.rxhash = skb->hash;
 
 	act = bpf_prog_run_xdp(xdp_prog, &xdp);
 
+	xdp_set_skb_hash(&xdp, skb);
+
 	off = xdp.data - orig_data;
 	if (off > 0)
 		__skb_pull(skb, off);
@@ -6851,10 +6855,20 @@ int dev_change_proto_down(struct net_device *dev, bool proto_down)
 }
 EXPORT_SYMBOL(dev_change_proto_down);
 
+netdev_features_t bpf_get_xdp_features(struct bpf_prog *prog)
+{
+	netdev_features_t features = XDP_DRV_FEATURES_REQUIRED;
+
+	if (prog->xdp_rxhash_needed)
+		features |= XDP_DRV_F_RXHASH;
+
+	return features;
+}
+
 bool xdp_features_check(struct net_device *dev, struct bpf_prog *xdp_prog,
 			struct netlink_ext_ack *extack, u32 flags)
 {
-	netdev_features_t req_features = XDP_DRV_FEATURES_REQUIRED;
+	netdev_features_t req_features = bpf_get_xdp_features(xdp_prog);
 	netdev_features_t dev_xdp_features;
 
 	/* Generic XDP naturally support all features */
diff --git a/net/core/filter.c b/net/core/filter.c
index a253a6197e6b..df04ac73f581 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2272,6 +2272,54 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_4(bpf_xdp_rxhash, struct xdp_buff *, xdp, u32, new_hash, u32, type,
+	   unsigned int, flags)
+{
+	/* Read+write access to xdp_buff->rxhash is safe, because
+	 * fixup_bpf_calls() detect when helper is used, and drivers
+	 * not implemeting rxhash will not be allowed to load bpf_prog.
+	 */
+
+	/* Set hash and type */
+	if (flags == BPF_F_RXHASH_SET) {
+		xdp->rxhash = new_hash;
+		xdp->flags |= XDP_CTX_F_RXHASH_SW; /* Need for skb "is_sw" */
+		xdp->flags |= type & XDP_CTX_F_RXHASH_TYPE_MASK;
+	}
+
+	/* Get can specify "type" interested in */
+	if ((flags == BPF_F_RXHASH_GET) &&
+	    (type & XDP_CTX_F_RXHASH_TYPE_MASK)) {
+		u32 f_type = (xdp->flags & XDP_CTX_F_RXHASH_TYPE_MASK);
+		bool match = false;
+
+		/* Match on either L3 or L4 type rxhash */
+		if (!((type ^ f_type) & XDP_HASH_TYPE_L3_MASK))
+			match = true;
+		if (!((type ^ f_type) & XDP_HASH_TYPE_L4_MASK))
+			match = true;
+		if (match == false)
+			return 0;
+	}
+
+	/* Drivers only xdp_record_hash if NETIF_F_RXHASH enabled */
+	if (likely(xdp->flags & (XDP_CTX_F_RXHASH_HW|XDP_CTX_F_RXHASH_SW))) {
+		u64 rxhash_type = xdp->flags & XDP_CTX_F_RXHASH_TYPE_MASK;
+
+		return (u64)(xdp->rxhash | (rxhash_type << 32));
+	}
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_xdp_rxhash_proto = {
+	.func           = bpf_xdp_rxhash,
+	.gpl_only       = false,
+	.ret_type       = RET_INTEGER, // Q: how do I say "u64" ?
+	.arg1_type      = ARG_PTR_TO_CTX,
+	.arg2_type      = ARG_ANYTHING,
+	.arg3_type      = ARG_ANYTHING,
+};
+
 bool bpf_helper_changes_pkt_data(void *func)
 {
 	if (func == bpf_skb_vlan_push ||
@@ -2760,6 +2808,8 @@ xdp_func_proto(enum bpf_func_id func_id)
 		return &bpf_get_smp_processor_id_proto;
 	case BPF_FUNC_xdp_adjust_head:
 		return &bpf_xdp_adjust_head_proto;
+	case BPF_FUNC_xdp_rxhash:
+		return &bpf_xdp_rxhash_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}
@@ -3308,6 +3358,29 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
 				      si->dst_reg, si->src_reg,
 				      offsetof(struct xdp_buff, data_end));
 		break;
+	case offsetof(struct xdp_md, rxhash):
+		/* Direct read-access to rxhash is safe, as drivers
+		 * not implementing will not be allowed to load bpf_prog.
+		 *
+		 * Driver gotchas: Even if NETIF_F_RXHASH is disabled
+		 * drivers must init xdp_buff->rxhash, due to this
+		 * direct read.
+		 */
+		prog->xdp_rxhash_needed = 1;
+
+		BUILD_BUG_ON(FIELD_SIZEOF(struct xdp_buff, rxhash) != 4);
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
+				      offsetof(struct xdp_buff, rxhash));
+		break;
+	case offsetof(struct xdp_md, rxhash_type):
+		/* rxhash_type stored in lower 8-bits of xdp_buff->flags */
+		prog->xdp_rxhash_needed = 1;
+
+		BUILD_BUG_ON(XDP_HASH_TYPE_BITS != 8);
+		/* Load first 8 bits (BPF_B) of flags */
+		*insn++ = BPF_LDX_MEM(BPF_B, si->dst_reg, si->src_reg,
+				      offsetof(struct xdp_buff, flags));
+		break;
 	}
 
 	return insn - insn_buf;
diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h
index 9a9c95f2c9fb..634a976a02c6 100644
--- a/samples/bpf/bpf_helpers.h
+++ b/samples/bpf/bpf_helpers.h
@@ -59,6 +59,8 @@ static unsigned long long (*bpf_get_prandom_u32)(void) =
 	(void *) BPF_FUNC_get_prandom_u32;
 static int (*bpf_xdp_adjust_head)(void *ctx, int offset) =
 	(void *) BPF_FUNC_xdp_adjust_head;
+static unsigned long long (*bpf_xdp_rxhash)(void *ctx, __u32 new_hash, __u32 type, unsigned int flags) =
+	(void *) BPF_FUNC_xdp_rxhash;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index e553529929f6..a38c544bf6f0 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -483,6 +483,9 @@ union bpf_attr {
  *     @skb: pointer to skb
  *     Return: uid of the socket owner on success or 0 if the socket pointer
  *     inside sk_buff is NULL
+ *
+ * u64 bpf_xdp_rxhash(xdp_md, new_hash, type, flags)
+ *	FIXME: Copy desc from include/uapi/linux/bpf.h
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -532,7 +535,8 @@ union bpf_attr {
 	FN(xdp_adjust_head),		\
 	FN(probe_read_str),		\
 	FN(get_socket_cookie),		\
-	FN(get_socket_uid),
+	FN(get_socket_uid),		\
+	FN(xdp_rxhash),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -661,6 +665,10 @@ enum xdp_action {
 struct xdp_md {
 	__u32 data;
 	__u32 data_end;
+	__u32 rxhash;
+	__u32 rxhash_type;
 };
 
+// FIXME: Sync with include/uapi/linux/bpf.h
+
 #endif /* _UAPI__LINUX_BPF_H__ */

^ permalink raw reply related

* [RFC net-next PATCH 5/5] mlx5: add XDP rxhash feature for driver mlx5
From: Jesper Dangaard Brouer @ 2017-05-18 15:41 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov
  Cc: John Fastabend, netdev, Jesper Dangaard Brouer
In-Reply-To: <149512205297.14733.15729847433404265933.stgit@firesoul>


---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    3 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   |   98 ++++++++++++++-------
 2 files changed, 70 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index e43411d232ee..3ae90dbdd3de 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3956,6 +3956,9 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 	netdev->hw_features      |= NETIF_F_HW_VLAN_CTAG_RX;
 	netdev->hw_features      |= NETIF_F_HW_VLAN_CTAG_FILTER;
 
+	/* XDP_DRV_F_ENABLED is added in register_netdevice() */
+	netdev->xdp_features = XDP_DRV_F_RXHASH;
+
 	if (mlx5e_vxlan_allowed(mdev)) {
 		netdev->hw_features     |= NETIF_F_GSO_UDP_TUNNEL |
 					   NETIF_F_GSO_UDP_TUNNEL_CSUM |
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index ae66fad98244..eb9d859bf09d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -514,14 +514,28 @@ static void mlx5e_lro_update_hdr(struct sk_buff *skb, struct mlx5_cqe64 *cqe,
 	}
 }
 
-static inline void mlx5e_skb_set_hash(struct mlx5_cqe64 *cqe,
-				      struct sk_buff *skb)
+u8 mlx5_htype_l3_to_xdp[4] = {
+	0,			/* 00 - none */
+	XDP_HASH_TYPE_L3_IPV4,	/* 01 - IPv4 */
+	XDP_HASH_TYPE_L3_IPV6,	/* 10 - IPv6 */
+	0,			/* 11 - Reserved */
+};
+
+u8 mlx5_htype_l4_to_xdp[4] = {
+	0,			/* 00 - none */
+	XDP_HASH_TYPE_L4_TCP,	/* 01 - TCP  */
+	XDP_HASH_TYPE_L4_UDP,	/* 10 - UDP  */
+	0,			/* 11 - IPSEC.SPI */
+};
+
+static inline void mlx5e_xdp_set_hash(struct mlx5_cqe64 *cqe,
+				      struct xdp_buff *xdp)
 {
 	u8 cht = cqe->rss_hash_type;
-	int ht = (cht & CQE_RSS_HTYPE_L4) ? PKT_HASH_TYPE_L4 :
-		 (cht & CQE_RSS_HTYPE_IP) ? PKT_HASH_TYPE_L3 :
-					    PKT_HASH_TYPE_NONE;
-	skb_set_hash(skb, be32_to_cpu(cqe->rss_hash_result), ht);
+	u32 ht = (mlx5_htype_l4_to_xdp[((cht & CQE_RSS_HTYPE_L4) >> 6)] | \
+		  mlx5_htype_l3_to_xdp[((cht & CQE_RSS_HTYPE_IP) >> 2)]);
+
+	xdp_record_hash(xdp, be32_to_cpu(cqe->rss_hash_result), ht);
 }
 
 static inline bool is_first_ethertype_ip(struct sk_buff *skb)
@@ -570,7 +584,8 @@ static inline void mlx5e_handle_csum(struct net_device *netdev,
 static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
 				      u32 cqe_bcnt,
 				      struct mlx5e_rq *rq,
-				      struct sk_buff *skb)
+				      struct sk_buff *skb,
+				      struct xdp_buff *xdp)
 {
 	struct net_device *netdev = rq->netdev;
 	struct mlx5e_tstamp *tstamp = rq->tstamp;
@@ -593,8 +608,7 @@ static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
 
 	skb_record_rx_queue(skb, rq->ix);
 
-	if (likely(netdev->features & NETIF_F_RXHASH))
-		mlx5e_skb_set_hash(cqe, skb);
+	xdp_set_skb_hash(xdp, skb);
 
 	if (cqe_has_vlan(cqe))
 		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q),
@@ -609,11 +623,12 @@ static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
 static inline void mlx5e_complete_rx_cqe(struct mlx5e_rq *rq,
 					 struct mlx5_cqe64 *cqe,
 					 u32 cqe_bcnt,
-					 struct sk_buff *skb)
+					 struct sk_buff *skb,
+					 struct xdp_buff *xdp)
 {
 	rq->stats.packets++;
 	rq->stats.bytes += cqe_bcnt;
-	mlx5e_build_rx_skb(cqe, cqe_bcnt, rq, skb);
+	mlx5e_build_rx_skb(cqe, cqe_bcnt, rq, skb, xdp);
 }
 
 static inline void mlx5e_xmit_xdp_doorbell(struct mlx5e_xdpsq *sq)
@@ -696,27 +711,27 @@ static inline bool mlx5e_xmit_xdp_frame(struct mlx5e_rq *rq,
 /* returns true if packet was consumed by xdp */
 static inline int mlx5e_xdp_handle(struct mlx5e_rq *rq,
 				   struct mlx5e_dma_info *di,
-				   void *va, u16 *rx_headroom, u32 *len)
+				   struct xdp_buff *xdp, void *va,
+				   u16 *rx_headroom, u32 *len)
 {
 	const struct bpf_prog *prog = READ_ONCE(rq->xdp_prog);
-	struct xdp_buff xdp;
 	u32 act;
 
 	if (!prog)
 		return false;
 
-	xdp.data = va + *rx_headroom;
-	xdp.data_end = xdp.data + *len;
-	xdp.data_hard_start = va;
+	xdp->data = va + *rx_headroom;
+	xdp->data_end = xdp->data + *len;
+	xdp->data_hard_start = va;
 
-	act = bpf_prog_run_xdp(prog, &xdp);
+	act = bpf_prog_run_xdp(prog, xdp);
 	switch (act) {
 	case XDP_PASS:
-		*rx_headroom = xdp.data - xdp.data_hard_start;
-		*len = xdp.data_end - xdp.data;
+		*rx_headroom = xdp->data - xdp->data_hard_start;
+		*len = xdp->data_end - xdp->data;
 		return false;
 	case XDP_TX:
-		if (unlikely(!mlx5e_xmit_xdp_frame(rq, di, &xdp)))
+		if (unlikely(!mlx5e_xmit_xdp_frame(rq, di, xdp)))
 			trace_xdp_exception(rq->netdev, prog, act);
 		return true;
 	default:
@@ -731,7 +746,22 @@ static inline int mlx5e_xdp_handle(struct mlx5e_rq *rq,
 }
 
 static inline
+void mlx5_fill_xdp_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe,
+			    struct xdp_buff *xdp)
+{
+	struct net_device *netdev = rq->netdev;
+
+	xdp->flags = 0;
+
+	if (likely(netdev->features & NETIF_F_RXHASH))
+		mlx5e_xdp_set_hash(cqe, xdp);
+	else
+		xdp->rxhash=0; /* Due to bpf direct read */
+}
+
+static inline
 struct sk_buff *skb_from_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe,
+			     struct xdp_buff *xdp,
 			     u16 wqe_counter, u32 cqe_bcnt)
 {
 	struct mlx5e_dma_info *di;
@@ -756,9 +786,10 @@ struct sk_buff *skb_from_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe,
 		mlx5e_page_release(rq, di, true);
 		return NULL;
 	}
+	mlx5_fill_xdp_rx_cqe(rq, cqe, xdp);
 
 	rcu_read_lock();
-	consumed = mlx5e_xdp_handle(rq, di, va, &rx_headroom, &cqe_bcnt);
+	consumed = mlx5e_xdp_handle(rq, di, xdp, va, &rx_headroom, &cqe_bcnt);
 	rcu_read_unlock();
 	if (consumed)
 		return NULL; /* page/packet was consumed by XDP */
@@ -784,6 +815,7 @@ void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 {
 	struct mlx5e_rx_wqe *wqe;
 	__be16 wqe_counter_be;
+	struct xdp_buff xdp;
 	struct sk_buff *skb;
 	u16 wqe_counter;
 	u32 cqe_bcnt;
@@ -793,11 +825,11 @@ void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	wqe            = mlx5_wq_ll_get_wqe(&rq->wq, wqe_counter);
 	cqe_bcnt       = be32_to_cpu(cqe->byte_cnt);
 
-	skb = skb_from_cqe(rq, cqe, wqe_counter, cqe_bcnt);
+	skb = skb_from_cqe(rq, cqe, &xdp, wqe_counter, cqe_bcnt);
 	if (!skb)
 		goto wq_ll_pop;
 
-	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
+	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb, &xdp);
 	napi_gro_receive(rq->cq.napi, skb);
 
 wq_ll_pop:
@@ -811,6 +843,7 @@ void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	struct mlx5e_priv *priv = netdev_priv(netdev);
 	struct mlx5_eswitch_rep *rep = priv->ppriv;
 	struct mlx5e_rx_wqe *wqe;
+	struct xdp_buff xdp;
 	struct sk_buff *skb;
 	__be16 wqe_counter_be;
 	u16 wqe_counter;
@@ -821,11 +854,11 @@ void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	wqe            = mlx5_wq_ll_get_wqe(&rq->wq, wqe_counter);
 	cqe_bcnt       = be32_to_cpu(cqe->byte_cnt);
 
-	skb = skb_from_cqe(rq, cqe, wqe_counter, cqe_bcnt);
+	skb = skb_from_cqe(rq, cqe, &xdp, wqe_counter, cqe_bcnt);
 	if (!skb)
 		goto wq_ll_pop;
 
-	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
+	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb, &xdp);
 
 	if (rep->vlan && skb_vlan_tag_present(skb))
 		skb_vlan_pop(skb);
@@ -882,6 +915,7 @@ void mlx5e_handle_rx_cqe_mpwrq(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	struct mlx5e_mpw_info *wi = &rq->mpwqe.info[wqe_id];
 	struct mlx5e_rx_wqe  *wqe = mlx5_wq_ll_get_wqe(&rq->wq, wqe_id);
 	struct sk_buff *skb;
+	struct xdp_buff xdp;
 	u16 cqe_bcnt;
 
 	wi->consumed_strides += cstrides;
@@ -906,9 +940,10 @@ void mlx5e_handle_rx_cqe_mpwrq(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 
 	prefetch(skb->data);
 	cqe_bcnt = mpwrq_get_cqe_byte_cnt(cqe);
+	mlx5_fill_xdp_rx_cqe(rq, cqe, &xdp);
 
 	mlx5e_mpwqe_fill_rx_skb(rq, cqe, wi, cqe_bcnt, skb);
-	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
+	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb, &xdp);
 	napi_gro_receive(rq->cq.napi, skb);
 
 mpwrq_cqe_out:
@@ -1043,7 +1078,8 @@ void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq)
 static inline void mlx5i_complete_rx_cqe(struct mlx5e_rq *rq,
 					 struct mlx5_cqe64 *cqe,
 					 u32 cqe_bcnt,
-					 struct sk_buff *skb)
+					 struct sk_buff *skb,
+					 struct xdp_buff *xdp)
 {
 	struct net_device *netdev = rq->netdev;
 	u8 *dgid;
@@ -1071,8 +1107,7 @@ static inline void mlx5i_complete_rx_cqe(struct mlx5e_rq *rq,
 
 	skb_record_rx_queue(skb, rq->ix);
 
-	if (likely(netdev->features & NETIF_F_RXHASH))
-		mlx5e_skb_set_hash(cqe, skb);
+	xdp_set_skb_hash(xdp, skb);
 
 	skb_reset_mac_header(skb);
 	skb_pull(skb, MLX5_IPOIB_ENCAP_LEN);
@@ -1088,6 +1123,7 @@ void mlx5i_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 {
 	struct mlx5e_rx_wqe *wqe;
 	__be16 wqe_counter_be;
+	struct xdp_buff xdp;
 	struct sk_buff *skb;
 	u16 wqe_counter;
 	u32 cqe_bcnt;
@@ -1097,11 +1133,11 @@ void mlx5i_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	wqe            = mlx5_wq_ll_get_wqe(&rq->wq, wqe_counter);
 	cqe_bcnt       = be32_to_cpu(cqe->byte_cnt);
 
-	skb = skb_from_cqe(rq, cqe, wqe_counter, cqe_bcnt);
+	skb = skb_from_cqe(rq, cqe, &xdp, wqe_counter, cqe_bcnt);
 	if (!skb)
 		goto wq_ll_pop;
 
-	mlx5i_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
+	mlx5i_complete_rx_cqe(rq, cqe, cqe_bcnt, skb, &xdp);
 	napi_gro_receive(rq->cq.napi, skb);
 
 wq_ll_pop:

^ permalink raw reply related

* Re: [PATCH net-next 3/6] net: bridge: break if __br_mdb_del fails
From: Vivien Didelot @ 2017-05-18 15:45 UTC (permalink / raw)
  To: Nikolay Aleksandrov, netdev
  Cc: moderated list:ETHERNET BRIDGE, kernel, linux-kernel,
	David S. Miller
In-Reply-To: <1211f3cb-c5f5-a9bd-638b-bfcb99fd1bed@cumulusnetworks.com>

Hi Nikolay,

Nikolay Aleksandrov <nikolay@cumulusnetworks.com> writes:

>> OK good to know. That intention wasn't obvious. I can make __br_mdb_del
>> return void instead? What about the rest of the patchset if I do so?
>
> If you make it return void we will not be able to return proper error value
> when doing a single operation (the else case). About the rest I see only some
> minor style issues, I'll comment on the respective patches. Another minor nit is 
> using switch() instead of if/else for the message types but that is really up to 
> you, I don't mind either way. :-)

Ho OK I understand better the batch vs single delete operation now.
__br_mdb_do hardly makes sense now, because we don't know which case we
are handling... But factorizing br_mdb_do still makes sense. I'll come
up with something.

Thanks,

        Vivien

^ permalink raw reply

* [PATCH net-next] xfrm: Make function xfrm_dev_register static
From: Wei Yongjun @ 2017-05-18 15:51 UTC (permalink / raw)
  To: Steffen Klassert, Herbert Xu, David S. Miller; +Cc: Wei Yongjun, netdev

From: Wei Yongjun <weiyongjun1@huawei.com>

Fixes the following sparse warning:

net/xfrm/xfrm_device.c:141:5: warning:
 symbol 'xfrm_dev_register' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 net/xfrm/xfrm_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 8ec8a3f..50ec733 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -138,7 +138,7 @@ bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
 }
 EXPORT_SYMBOL_GPL(xfrm_dev_offload_ok);
 
-int xfrm_dev_register(struct net_device *dev)
+static int xfrm_dev_register(struct net_device *dev)
 {
 	if ((dev->features & NETIF_F_HW_ESP) && !dev->xfrmdev_ops)
 		return NOTIFY_BAD;

^ permalink raw reply related

* Re: [SPAM]Re: [PATCH net-next 5/6] net: bridge: get msgtype from nlmsghdr in mdb ops
From: Vivien Didelot @ 2017-05-18 15:53 UTC (permalink / raw)
  To: Nikolay Aleksandrov, netdev
  Cc: moderated list:ETHERNET BRIDGE, kernel, linux-kernel,
	David S. Miller
In-Reply-To: <ee5d9736-72ec-f4fd-8e9b-2c4179687e16@cumulusnetworks.com>

Hi Nikolay,

Nikolay Aleksandrov <nikolay@cumulusnetworks.com> writes:

>> +	int msgtype = nlh->nlmsg_type;
>
> minor nits:
> nlmsg_type is a u16, also please keep the order and arrange these from longest 
> to shortest

The reverse christmas tree \o/

Hum, __br_mdb_notify takes an int type, and struct nlmsghdr defines it
as a __u16. Does u16 still make sense here instead of int?


Thanks,

        Vivien

^ permalink raw reply

* Re: [PATCH net-next] net/mlx5e: Fix possible memory leak
From: Yuval Shaia @ 2017-05-18 15:58 UTC (permalink / raw)
  To: Wei Yongjun
  Cc: Saeed Mahameed, Matan Barak, Leon Romanovsky, Hadar Hen Zion,
	Wei Yongjun, netdev, linux-rdma
In-Reply-To: <20170518153441.24398-1-weiyj.lk@gmail.com>

On Thu, May 18, 2017 at 03:34:41PM +0000, Wei Yongjun wrote:
> From: Wei Yongjun <weiyongjun1@huawei.com>
> 
> 'encap_header' is malloced and should be freed before leaving from
> the error handling cases, otherwise it will cause memory leak.
> 
> Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> index 11c27e4..a72ecbc 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> @@ -1404,8 +1404,8 @@ static int mlx5e_create_encap_header_ipv4(struct mlx5e_priv *priv,
>  
>  	if (!(nud_state & NUD_VALID)) {
>  		neigh_event_send(n, NULL);
> -		neigh_release(n);
> -		return -EAGAIN;
> +		err = -EAGAIN;
> +		goto out;
>  	}
>  
>  	err = mlx5_encap_alloc(priv->mdev, e->tunnel_type,
> @@ -1510,8 +1510,8 @@ static int mlx5e_create_encap_header_ipv6(struct mlx5e_priv *priv,
>  
>  	if (!(nud_state & NUD_VALID)) {
>  		neigh_event_send(n, NULL);
> -		neigh_release(n);
> -		return -EAGAIN;
> +		err = -EAGAIN;
> +		goto out;
>  	}

Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

>  
>  	err = mlx5_encap_alloc(priv->mdev, e->tunnel_type,
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] xfrm: fix state migration replay sequence numbers
From: Richard Guy Briggs @ 2017-05-18 15:55 UTC (permalink / raw)
  To: Antony Antony; +Cc: netdev, Herbert Xu, Steffen Klassert
In-Reply-To: <20170518143953.GA64905@AntonyAntony.local>

On 2017-05-18 16:39, Antony Antony wrote:
> During xfrm migration replay and preplay sequence numbers are not 
> copied from the previous state. 
> 
> Here is tcpdump output showing the problem.
> 10.0.10.46 is running vanilla kernel, IKE/IPsec responder.
> After the migration it sent wrong sequence number, reset to 1.
> The migration is from 10.0.0.52 to 10.0.0.53.
> 
> IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7cf), length 136
> IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x7cf), length 136
> IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d0), length 136
> IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x7d0), length 136
> 
> IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa  inf2[I]
> IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa  inf2[R]
> IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa  inf2[I]
> IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa  inf2[R]
> 
> IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d1), length 136
> 
> NOTE: next sequence is wrong 0x1
> 
> IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x1), length 136
> IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d2), length 136
> IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x2), length 136
> 
> The attached patch fix it by copying replay and preplay.
> 
> regards,
> -antony
> 
> Antony Antony (1):
>   xfrm: fix state migration replay sequence numbers
> 
>  net/xfrm/xfrm_state.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> -- 
> 2.9.3
> 

> >From 1241e8b4c38ad2bf7399599165f763af38aba8d9 Mon Sep 17 00:00:00 2001
> From: Antony Antony <antony@phenome.org>
> Date: Thu, 18 May 2017 12:19:32 +0200
> Subject: [PATCH] xfrm: fix state migration copy replay sequence numbers
> To: netdev@vger.kernel.org, Herbert Xu <herbert@gondor.apana.org.au>, Steffen Klassert <steffen.klassert@secunet.com>
> Cc: Richard Guy Briggs <rgb@tricolour.ca>
> 
> During xfrm migration copy replay and preplay sequence numbers
> from the previous state.
> 
> Signed-off-by: Antony Antony <antony@phenome.org>
> ---
>  net/xfrm/xfrm_state.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
> index fc3c5aa..2e291bc 100644
> --- a/net/xfrm/xfrm_state.c
> +++ b/net/xfrm/xfrm_state.c
> @@ -1383,6 +1383,8 @@ static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig)
>  	x->curlft.add_time = orig->curlft.add_time;
>  	x->km.state = orig->km.state;
>  	x->km.seq = orig->km.seq;
> +	x->replay = orig->replay;
> +	x->preplay = orig->preplay;
>  
>  	return x;
>  
> -- 
> 2.9.3

This looks reasonable to me.  With a bit more out-of-band information from
Antony and Paul Wouters we have:

https://tools.ietf.org/html/rfc4555#section-3.5

so while it is not explicit about what is to be copied, it only indicates that
the IPsec SA is to be updated with the new address whereas this implementation
creates a new IPsec SA and copies over the values, missing some.

(Note: using "git format-patch --cover-letter --cc ... -o <dir>" and "git
send-email --to ... <dir>" work really well together.)

Reviewed-by: Richard Guy Briggs <rgb@tricolour.ca>




	slainte mhath, RGB

^ permalink raw reply

* Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral
From: Andrew Lunn @ 2017-05-18 16:09 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Florian Fainelli, Rob Herring, Frank Rowand, Thomas Petazzoni,
	Sergei Shtylyov, netdev, devicetree, linux-renesas-soc,
	linux-kernel
In-Reply-To: <1495112345-24795-1-git-send-email-geert+renesas@glider.be>

On Thu, May 18, 2017 at 02:59:05PM +0200, Geert Uytterhoeven wrote:
> If an Ethernet PHY is initialized before the interrupt controller it is
> connected to, a message like the following is printed:
> 
>     irq: no irq domain found for /interrupt-controller@e61c0000 !
> 
> However, the actual error is ignored, leading to a non-functional (-1)
> PHY interrupt later:
> 
>     Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=-1)
> 
> Depending on whether the PHY driver will fall back to polling, Ethernet
> may or may not work.
> 
> To fix this:
>   1. Switch of_mdiobus_register_phy() from irq_of_parse_and_map() to
>      of_irq_get().
>      Unlike the former, the latter returns -EPROBE_DEFER if the
>      interrupt controller is not yet available, so this condition can be
>      detected.
>      Other errors are handled the same as before, i.e. use the passed
>      mdio->irq[addr] as interrupt.
>   2. Propagate and handle errors from of_mdiobus_register_phy() and
>      of_mdiobus_register_device().
> 
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> ---
> Seen on r8a7791/koelsch when using the new CPG/MSSR clock driver.
> I assume it always happened on RZ/G1 in mainline.
> ---
>  drivers/of/of_mdio.c | 39 +++++++++++++++++++++++++++------------
>  1 file changed, 27 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
> index 7e4c80f9b6cda0d3..f9ac2893f56184be 100644
> --- a/drivers/of/of_mdio.c
> +++ b/drivers/of/of_mdio.c
> @@ -44,7 +44,7 @@ static int of_get_phy_id(struct device_node *device, u32 *phy_id)
>  	return -EINVAL;
>  }
>  
> -static void of_mdiobus_register_phy(struct mii_bus *mdio,
> +static int of_mdiobus_register_phy(struct mii_bus *mdio,
>  				    struct device_node *child, u32 addr)
>  {
>  	struct phy_device *phy;
> @@ -60,9 +60,13 @@ static void of_mdiobus_register_phy(struct mii_bus *mdio,
>  	else
>  		phy = get_phy_device(mdio, addr, is_c45);
>  	if (IS_ERR(phy))
> -		return;
> +		return PTR_ERR(phy);
>  
> -	rc = irq_of_parse_and_map(child, 0);
> +	rc = of_irq_get(child, 0);
> +	if (rc == -EPROBE_DEFER) {
> +		phy_device_free(phy);
> +		return rc;
> +	}

Maybe this should be consistent. All other places there is an error,
you return it. Here however, you only return the error if it is
EPROBE_DEFER.

	Andrew


>  	if (rc > 0) {
>  		phy->irq = rc;
>  		mdio->irq[addr] = rc;
> @@ -84,22 +88,23 @@ static void of_mdiobus_register_phy(struct mii_bus *mdio,
>  	if (rc) {
>  		phy_device_free(phy);
>  		of_node_put(child);
> -		return;
> +		return rc;
>  	}
>  
>  	dev_dbg(&mdio->dev, "registered phy %s at address %i\n",
>  		child->name, addr);
> +	return 0;
>  }
>  

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox