Netdev List
* [PATCH bpf-next 10/16] nfp: bpf: add map data structure
From: Jakub Kicinski @ 2018-01-05  6:09 UTC (permalink / raw)
  To: netdev, alexei.starovoitov, daniel, davem
  Cc: oss-drivers, tehnerd, Jakub Kicinski
In-Reply-To: <20180105060931.30815-1-jakub.kicinski@netronome.com>

To keep the following patches reasonably sized, add the map data
structures up front.  Later patches will add the code that uses
them piece by piece.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/main.c |  7 ++++++-
 drivers/net/ethernet/netronome/nfp/bpf/main.h | 18 ++++++++++++++++++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index 684b16cb6a20..2506d1139fa9 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -288,6 +288,8 @@ static int nfp_bpf_init(struct nfp_app *app)
 	bpf->app = app;
 	app->priv = bpf;
 
+	INIT_LIST_HEAD(&bpf->map_list);
+
 	err = nfp_bpf_parse_capabilities(app);
 	if (err)
 		goto err_free_bpf;
@@ -301,7 +303,10 @@ static int nfp_bpf_init(struct nfp_app *app)
 
 static void nfp_bpf_clean(struct nfp_app *app)
 {
-	kfree(app->priv);
+	struct nfp_app_bpf *bpf = app->priv;
+
+	WARN_ON(!list_empty(&bpf->map_list));
+	kfree(bpf);
 }
 
 const struct nfp_app_type app_bpf = {
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index c7d815455cb0..71c58d25858b 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -82,6 +82,8 @@ enum pkt_vec {
  * struct nfp_app_bpf - bpf app priv structure
  * @app:		backpointer to the app
  *
+ * @map_list:		list of offloaded maps
+ *
  * @adjust_head:	adjust head capability
  * @flags:		extra flags for adjust head
  * @off_min:		minimal packet offset within buffer required
@@ -92,6 +94,8 @@ enum pkt_vec {
 struct nfp_app_bpf {
 	struct nfp_app *app;
 
+	struct list_head map_list;
+
 	struct nfp_bpf_cap_adjust_head {
 		u32 flags;
 		int off_min;
@@ -101,6 +105,20 @@ struct nfp_app_bpf {
 	} adjust_head;
 };
 
+/**
+ * struct nfp_bpf_map - private per-map data attached to BPF maps for offload
+ * @offmap:	pointer to the offloaded BPF map
+ * @bpf:	back pointer to bpf app private structure
+ * @tid:	table id identifying map on datapath
+ * @l:		link on the nfp_app_bpf->map_list list
+ */
+struct nfp_bpf_map {
+	struct bpf_offloaded_map *offmap;
+	struct nfp_app_bpf *bpf;
+	u32 tid;
+	struct list_head l;
+};
+
 struct nfp_prog;
 struct nfp_insn_meta;
 typedef int (*instr_cb_t)(struct nfp_prog *, struct nfp_insn_meta *);
-- 
2.15.1
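
Illustration (not part of the patch): a minimal userspace sketch of how
per-map entries could be chained on map_list through the embedded link
member and walked to find a table id.  The list helpers are simplified
stand-ins for the kernel's <linux/list.h>, and every *_sketch name as well
as find_map() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct list_head (assumption). */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail_node(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_node(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
}

/* Recover the enclosing structure from a pointer to its link member. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down versions of nfp_app_bpf and nfp_bpf_map. */
struct app_bpf_sketch {
	struct list_head map_list;
};

struct bpf_map_sketch {
	struct app_bpf_sketch *bpf;
	uint32_t tid;
	struct list_head l;
};

/* Return the map with the given table id, or NULL if it is not listed. */
static struct bpf_map_sketch *
find_map(struct app_bpf_sketch *bpf, uint32_t tid)
{
	struct list_head *pos;

	for (pos = bpf->map_list.next; pos != &bpf->map_list; pos = pos->next) {
		struct bpf_map_sketch *m;

		m = container_of_sketch(pos, struct bpf_map_sketch, l);
		if (m->tid == tid)
			return m;
	}
	return NULL;
}
```

The WARN_ON(!list_empty(...)) added to nfp_bpf_clean() corresponds to
checking list_is_empty() here once every map has been unlinked.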


* [PATCH bpf-next 11/16] nfp: bpf: add basic control channel communication
From: Jakub Kicinski @ 2018-01-05  6:09 UTC (permalink / raw)
  To: netdev, alexei.starovoitov, daniel, davem
  Cc: oss-drivers, tehnerd, Jakub Kicinski
In-Reply-To: <20180105060931.30815-1-jakub.kicinski@netronome.com>

For map support we will need to send and receive control messages.
Add basic support for sending a message to FW, and waiting for a
reply.

Control messages are tagged with a 16 bit ID.  Add a simple ID
allocator and make sure we don't allow too many messages in flight,
to avoid request <> reply mismatches.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/Makefile        |   1 +
 drivers/net/ethernet/netronome/nfp/bpf/cmsg.c      | 238 +++++++++++++++++++++
 drivers/net/ethernet/netronome/nfp/bpf/fw.h        |  22 ++
 drivers/net/ethernet/netronome/nfp/bpf/main.c      |   5 +
 drivers/net/ethernet/netronome/nfp/bpf/main.h      |  23 ++
 drivers/net/ethernet/netronome/nfp/nfp_app.h       |   9 +
 drivers/net/ethernet/netronome/nfp/nfp_net.h       |  12 ++
 .../net/ethernet/netronome/nfp/nfp_net_common.c    |   7 +
 8 files changed, 317 insertions(+)
 create mode 100644 drivers/net/ethernet/netronome/nfp/bpf/cmsg.c

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile b/drivers/net/ethernet/netronome/nfp/Makefile
index 6e5ef984398b..064f00e23a19 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -44,6 +44,7 @@ endif
 
 ifeq ($(CONFIG_BPF_SYSCALL),y)
 nfp-objs += \
+	    bpf/cmsg.o \
 	    bpf/main.o \
 	    bpf/offload.o \
 	    bpf/verifier.o \
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
new file mode 100644
index 000000000000..46753ee9f7c5
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
@@ -0,0 +1,238 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      1. Redistributions of source code must retain the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer.
+ *
+ *      2. Redistributions in binary form must reproduce the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer in the documentation and/or other materials
+ *         provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/bitops.h>
+#include <linux/bug.h>
+#include <linux/jiffies.h>
+#include <linux/skbuff.h>
+#include <linux/wait.h>
+
+#include "../nfp_app.h"
+#include "../nfp_net.h"
+#include "fw.h"
+#include "main.h"
+
+#define cmsg_warn(bpf, msg...)	nn_dp_warn(&(bpf)->app->ctrl->dp, msg)
+
+#define NFP_BPF_TAG_ALLOC_SPAN	(U16_MAX / 4)
+
+static bool nfp_bpf_all_tags_busy(struct nfp_app_bpf *bpf)
+{
+	u16 used_tags;
+
+	used_tags = bpf->tag_alloc_next - bpf->tag_alloc_last;
+
+	return used_tags > NFP_BPF_TAG_ALLOC_SPAN;
+}
+
+static int nfp_bpf_alloc_tag(struct nfp_app_bpf *bpf)
+{
+	/* All FW communication for BPF is request-reply.  To make sure we
+	 * don't reuse the message ID too early after timeout - limit the
+	 * number of requests in flight.
+	 */
+	if (nfp_bpf_all_tags_busy(bpf)) {
+		cmsg_warn(bpf, "all FW request contexts busy!\n");
+		return -EAGAIN;
+	}
+
+	WARN_ON(__test_and_set_bit(bpf->tag_alloc_next, bpf->tag_allocator));
+	return bpf->tag_alloc_next++;
+}
+
+static void nfp_bpf_free_tag(struct nfp_app_bpf *bpf, u16 tag)
+{
+	WARN_ON(!__test_and_clear_bit(tag, bpf->tag_allocator));
+
+	while (!test_bit(bpf->tag_alloc_last, bpf->tag_allocator) &&
+	       bpf->tag_alloc_last != bpf->tag_alloc_next)
+		bpf->tag_alloc_last++;
+}
+
+static unsigned int nfp_bpf_cmsg_get_tag(struct sk_buff *skb)
+{
+	struct cmsg_hdr *hdr;
+
+	hdr = (struct cmsg_hdr *)skb->data;
+
+	return be16_to_cpu(hdr->tag);
+}
+
+static struct sk_buff *__nfp_bpf_reply(struct nfp_app_bpf *bpf, u16 tag)
+{
+	unsigned int msg_tag;
+	struct sk_buff *skb;
+
+	skb_queue_walk(&bpf->cmsg_replies, skb) {
+		msg_tag = nfp_bpf_cmsg_get_tag(skb);
+		if (msg_tag == tag) {
+			nfp_bpf_free_tag(bpf, tag);
+			__skb_unlink(skb, &bpf->cmsg_replies);
+			return skb;
+		}
+	}
+
+	return NULL;
+}
+
+static struct sk_buff *nfp_bpf_reply(struct nfp_app_bpf *bpf, u16 tag)
+{
+	struct sk_buff *skb;
+
+	nfp_ctrl_lock(bpf->app->ctrl);
+	skb = __nfp_bpf_reply(bpf, tag);
+	nfp_ctrl_unlock(bpf->app->ctrl);
+
+	return skb;
+}
+
+static struct sk_buff *nfp_bpf_reply_drop_tag(struct nfp_app_bpf *bpf, u16 tag)
+{
+	struct sk_buff *skb;
+
+	nfp_ctrl_lock(bpf->app->ctrl);
+	skb = __nfp_bpf_reply(bpf, tag);
+	if (!skb)
+		nfp_bpf_free_tag(bpf, tag);
+	nfp_ctrl_unlock(bpf->app->ctrl);
+
+	return skb;
+}
+
+static struct sk_buff *
+nfp_bpf_cmsg_wait_reply(struct nfp_app_bpf *bpf, enum nfp_bpf_cmsg_type type,
+			int tag)
+{
+	struct sk_buff *skb;
+	int err;
+
+	err = wait_event_interruptible_timeout(bpf->cmsg_wq,
+					       skb = nfp_bpf_reply(bpf, tag),
+					       msecs_to_jiffies(5000));
+	/* We didn't get a response - try one last time and atomically drop
+	 * the tag even if no response is matched.
+	 */
+	if (!skb)
+		skb = nfp_bpf_reply_drop_tag(bpf, tag);
+	if (err < 0) {
+		cmsg_warn(bpf, "%s waiting for response to 0x%02x: %d\n",
+			  err == -ERESTARTSYS ? "interrupted" : "error",
+			  type, err);
+		return ERR_PTR(err);
+	}
+	if (!skb) {
+		cmsg_warn(bpf, "timeout waiting for response to 0x%02x\n",
+			  type);
+		return ERR_PTR(-ETIMEDOUT);
+	}
+
+	return skb;
+}
+
+struct sk_buff *
+nfp_bpf_cmsg_communicate(struct nfp_app_bpf *bpf, struct sk_buff *skb,
+			 enum nfp_bpf_cmsg_type type, unsigned int reply_size)
+{
+	struct cmsg_hdr *hdr;
+	int tag;
+
+	nfp_ctrl_lock(bpf->app->ctrl);
+	tag = nfp_bpf_alloc_tag(bpf);
+	if (tag < 0) {
+		nfp_ctrl_unlock(bpf->app->ctrl);
+		dev_kfree_skb_any(skb);
+		return ERR_PTR(tag);
+	}
+
+	hdr = (void *)skb->data;
+	hdr->ver = CMSG_MAP_ABI_VERSION;
+	hdr->type = type;
+	hdr->tag = cpu_to_be16(tag);
+
+	__nfp_app_ctrl_tx(bpf->app, skb);
+
+	nfp_ctrl_unlock(bpf->app->ctrl);
+
+	skb = nfp_bpf_cmsg_wait_reply(bpf, type, tag);
+	if (IS_ERR(skb))
+		return skb;
+
+	hdr = (struct cmsg_hdr *)skb->data;
+	/* 0 reply_size means caller will do the validation */
+	if (reply_size && skb->len != reply_size) {
+		cmsg_warn(bpf, "cmsg drop - wrong size %u != %u!\n",
+			  skb->len, reply_size);
+		goto err_free;
+	}
+	if (hdr->type != __CMSG_REPLY(type)) {
+		cmsg_warn(bpf, "cmsg drop - wrong type 0x%02x != 0x%02lx!\n",
+			  hdr->type, __CMSG_REPLY(type));
+		goto err_free;
+	}
+
+	return skb;
+err_free:
+	dev_kfree_skb_any(skb);
+	return ERR_PTR(-EIO);
+}
+
+void nfp_bpf_ctrl_msg_rx(struct nfp_app *app, struct sk_buff *skb)
+{
+	struct nfp_app_bpf *bpf = app->priv;
+	unsigned int tag;
+
+	if (unlikely(skb->len < sizeof(struct cmsg_reply_map_simple))) {
+		cmsg_warn(bpf, "cmsg drop - too short %u!\n", skb->len);
+		goto err_free;
+	}
+
+	nfp_ctrl_lock(bpf->app->ctrl);
+
+	tag = nfp_bpf_cmsg_get_tag(skb);
+	if (unlikely(!test_bit(tag, bpf->tag_allocator))) {
+		cmsg_warn(bpf, "cmsg drop - no one is waiting for tag %u!\n",
+			  tag);
+		goto err_unlock;
+	}
+
+	__skb_queue_tail(&bpf->cmsg_replies, skb);
+	wake_up_interruptible_all(&bpf->cmsg_wq);
+
+	nfp_ctrl_unlock(bpf->app->ctrl);
+
+	return;
+err_unlock:
+	nfp_ctrl_unlock(bpf->app->ctrl);
+err_free:
+	dev_kfree_skb_any(skb);
+}
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/fw.h b/drivers/net/ethernet/netronome/nfp/bpf/fw.h
index 7206aa1522db..107676b34760 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/fw.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/fw.h
@@ -51,4 +51,26 @@ struct nfp_bpf_cap_tlv_adjust_head {
 
 #define NFP_BPF_ADJUST_HEAD_NO_META	BIT(0)
 
+/*
+ * Types defined for map related control messages
+ */
+#define CMSG_MAP_ABI_VERSION		1
+
+enum nfp_bpf_cmsg_type {
+	__CMSG_TYPE_MAP_MAX,
+};
+
+#define CMSG_TYPE_MAP_REPLY_BIT		7
+#define __CMSG_REPLY(req)		(BIT(CMSG_TYPE_MAP_REPLY_BIT) | (req))
+
+struct cmsg_hdr {
+	u8 type;
+	u8 ver;
+	__be16 tag;
+};
+
+struct cmsg_reply_map_simple {
+	struct cmsg_hdr hdr;
+	__be32 rc;
+};
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index 2506d1139fa9..74d42bdcb905 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -288,6 +288,8 @@ static int nfp_bpf_init(struct nfp_app *app)
 	bpf->app = app;
 	app->priv = bpf;
 
+	skb_queue_head_init(&bpf->cmsg_replies);
+	init_waitqueue_head(&bpf->cmsg_wq);
 	INIT_LIST_HEAD(&bpf->map_list);
 
 	err = nfp_bpf_parse_capabilities(app);
@@ -305,6 +307,7 @@ static void nfp_bpf_clean(struct nfp_app *app)
 {
 	struct nfp_app_bpf *bpf = app->priv;
 
+	WARN_ON(!skb_queue_empty(&bpf->cmsg_replies));
 	WARN_ON(!list_empty(&bpf->map_list));
 	kfree(bpf);
 }
@@ -321,6 +324,8 @@ const struct nfp_app_type app_bpf = {
 	.vnic_alloc	= nfp_bpf_vnic_alloc,
 	.vnic_free	= nfp_bpf_vnic_free,
 
+	.ctrl_msg_rx	= nfp_bpf_ctrl_msg_rx,
+
 	.setup_tc	= nfp_bpf_setup_tc,
 	.tc_busy	= nfp_bpf_tc_busy,
 	.bpf		= nfp_ndo_bpf,
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 71c58d25858b..fd9987d6a374 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -37,10 +37,14 @@
 #include <linux/bitfield.h>
 #include <linux/bpf.h>
 #include <linux/bpf_verifier.h>
+#include <linux/kernel.h>
 #include <linux/list.h>
+#include <linux/skbuff.h>
 #include <linux/types.h>
+#include <linux/wait.h>
 
 #include "../nfp_asm.h"
+#include "fw.h"
 
 /* For branch fixup logic use up-most byte of branch instruction as scratch
  * area.  Remember to clear this before sending instructions to HW!
@@ -82,6 +86,13 @@ enum pkt_vec {
  * struct nfp_app_bpf - bpf app priv structure
  * @app:		backpointer to the app
  *
+ * @tag_allocator:	bitmap of control message tags in use
+ * @tag_alloc_next:	next tag bit to allocate
+ * @tag_alloc_last:	next tag bit to be freed
+ *
+ * @cmsg_replies:	received cmsg replies waiting to be consumed
+ * @cmsg_wq:		wait queue for waiting for cmsg replies
+ *
  * @map_list:		list of offloaded maps
  *
  * @adjust_head:	adjust head capability
@@ -94,6 +105,13 @@ enum pkt_vec {
 struct nfp_app_bpf {
 	struct nfp_app *app;
 
+	DECLARE_BITMAP(tag_allocator, U16_MAX + 1);
+	u16 tag_alloc_next;
+	u16 tag_alloc_last;
+
+	struct sk_buff_head cmsg_replies;
+	struct wait_queue_head cmsg_wq;
+
 	struct list_head map_list;
 
 	struct nfp_bpf_cap_adjust_head {
@@ -270,4 +288,9 @@ int nfp_net_bpf_offload(struct nfp_net *nn, struct bpf_prog *prog,
 struct nfp_insn_meta *
 nfp_bpf_goto_meta(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 		  unsigned int insn_idx, unsigned int n_insns);
+
+struct sk_buff *
+nfp_bpf_cmsg_communicate(struct nfp_app_bpf *bpf, struct sk_buff *skb,
+			 enum nfp_bpf_cmsg_type type, unsigned int reply_size);
+void nfp_bpf_ctrl_msg_rx(struct nfp_app *app, struct sk_buff *skb);
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.h b/drivers/net/ethernet/netronome/nfp/nfp_app.h
index ee67a7202819..9dd4158bf89e 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app.h
@@ -160,6 +160,7 @@ struct nfp_app {
 	void *priv;
 };
 
+bool __nfp_ctrl_tx(struct nfp_net *nn, struct sk_buff *skb);
 bool nfp_ctrl_tx(struct nfp_net *nn, struct sk_buff *skb);
 
 static inline int nfp_app_init(struct nfp_app *app)
@@ -313,6 +314,14 @@ static inline int nfp_app_xdp_offload(struct nfp_app *app, struct nfp_net *nn,
 	return app->type->xdp_offload(app, nn, prog);
 }
 
+static inline bool __nfp_app_ctrl_tx(struct nfp_app *app, struct sk_buff *skb)
+{
+	trace_devlink_hwmsg(priv_to_devlink(app->pf), false, 0,
+			    skb->data, skb->len);
+
+	return __nfp_ctrl_tx(app->ctrl, skb);
+}
+
 static inline bool nfp_app_ctrl_tx(struct nfp_app *app, struct sk_buff *skb)
 {
 	trace_devlink_hwmsg(priv_to_devlink(app->pf), false, 0,
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 3801c52098d5..3d61bab0ff2f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -836,6 +836,18 @@ static inline const char *nfp_net_name(struct nfp_net *nn)
 	return nn->dp.netdev ? nn->dp.netdev->name : "ctrl";
 }
 
+static inline void nfp_ctrl_lock(struct nfp_net *nn)
+	__acquires(&nn->r_vecs[0].lock)
+{
+	spin_lock_bh(&nn->r_vecs[0].lock);
+}
+
+static inline void nfp_ctrl_unlock(struct nfp_net *nn)
+	__releases(&nn->r_vecs[0].lock)
+{
+	spin_unlock_bh(&nn->r_vecs[0].lock);
+}
+
 /* Globals */
 extern const char nfp_driver_version[];
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 773089442b64..7972e6dd9547 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1917,6 +1917,13 @@ nfp_ctrl_tx_one(struct nfp_net *nn, struct nfp_net_r_vector *r_vec,
 	return false;
 }
 
+bool __nfp_ctrl_tx(struct nfp_net *nn, struct sk_buff *skb)
+{
+	struct nfp_net_r_vector *r_vec = &nn->r_vecs[0];
+
+	return nfp_ctrl_tx_one(nn, r_vec, skb, false);
+}
+
 bool nfp_ctrl_tx(struct nfp_net *nn, struct sk_buff *skb)
 {
 	struct nfp_net_r_vector *r_vec = &nn->r_vecs[0];
-- 
2.15.1
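
Illustration (not part of the patch): a userspace sketch of the tag window
described in the commit message, assuming one byte per tag instead of the
driver's bitmap and no locking.  The names tag_alloc, tag_get() and
tag_put() are hypothetical; the span limit mirrors NFP_BPF_TAG_ALLOC_SPAN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TAG_SPACE	(UINT16_MAX + 1)
#define TAG_ALLOC_SPAN	(UINT16_MAX / 4)

struct tag_alloc {
	uint8_t in_use[TAG_SPACE];	/* the driver uses a bitmap instead */
	uint16_t next;			/* next tag to hand out */
	uint16_t last;			/* oldest tag that may still be busy */
};

static void tag_alloc_init(struct tag_alloc *ta)
{
	memset(ta, 0, sizeof(*ta));
}

/* The u16 subtraction wraps, so the window stays correct across 0xffff. */
static bool all_tags_busy(const struct tag_alloc *ta)
{
	uint16_t used = (uint16_t)(ta->next - ta->last);

	return used > TAG_ALLOC_SPAN;
}

/* Returns the allocated tag, or -1 when the window is full. */
static int tag_get(struct tag_alloc *ta)
{
	if (all_tags_busy(ta))
		return -1;

	ta->in_use[ta->next] = 1;
	return ta->next++;
}

/* Release a tag and advance 'last' past every tag that is already free. */
static void tag_put(struct tag_alloc *ta, uint16_t tag)
{
	ta->in_use[tag] = 0;

	while (!ta->in_use[ta->last] && ta->last != ta->next)
		ta->last++;
}
```

Because 'last' only advances past freed tags, a single slow request keeps
the window pinned, and a tag value is not handed out again until the
allocator has moved a long way past it.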


* [PATCH bpf-next 12/16] nfp: bpf: implement helpers for FW map ops
From: Jakub Kicinski @ 2018-01-05  6:09 UTC (permalink / raw)
  To: netdev, alexei.starovoitov, daniel, davem
  Cc: oss-drivers, tehnerd, Jakub Kicinski
In-Reply-To: <20180105060931.30815-1-jakub.kicinski@netronome.com>

Implement calls for FW map communication.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/cmsg.c | 210 +++++++++++++++++++++++++-
 drivers/net/ethernet/netronome/nfp/bpf/fw.h   |  65 ++++++++
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  17 ++-
 3 files changed, 288 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
index 46753ee9f7c5..71e6586acc36 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
@@ -31,6 +31,7 @@
  * SOFTWARE.
  */
 
+#include <linux/bpf.h>
 #include <linux/bitops.h>
 #include <linux/bug.h>
 #include <linux/jiffies.h>
@@ -79,6 +80,28 @@ static void nfp_bpf_free_tag(struct nfp_app_bpf *bpf, u16 tag)
 		bpf->tag_alloc_last++;
 }
 
+static struct sk_buff *
+nfp_bpf_cmsg_alloc(struct nfp_app_bpf *bpf, unsigned int size)
+{
+	struct sk_buff *skb;
+
+	skb = nfp_app_ctrl_msg_alloc(bpf->app, size, GFP_KERNEL);
+	if (skb)
+		skb_put(skb, size);
+	return skb;
+}
+
+static struct sk_buff *
+nfp_bpf_cmsg_map_req_alloc(struct nfp_app_bpf *bpf, unsigned int n)
+{
+	unsigned int size;
+
+	size = sizeof(struct cmsg_req_map_op);
+	size += sizeof(struct cmsg_key_value_pair) * n;
+
+	return nfp_bpf_cmsg_alloc(bpf, size);
+}
+
 static unsigned int nfp_bpf_cmsg_get_tag(struct sk_buff *skb)
 {
 	struct cmsg_hdr *hdr;
@@ -159,7 +182,7 @@ nfp_bpf_cmsg_wait_reply(struct nfp_app_bpf *bpf, enum nfp_bpf_cmsg_type type,
 	return skb;
 }
 
-struct sk_buff *
+static struct sk_buff *
 nfp_bpf_cmsg_communicate(struct nfp_app_bpf *bpf, struct sk_buff *skb,
 			 enum nfp_bpf_cmsg_type type, unsigned int reply_size)
 {
@@ -206,6 +229,191 @@ nfp_bpf_cmsg_communicate(struct nfp_app_bpf *bpf, struct sk_buff *skb,
 	return ERR_PTR(-EIO);
 }
 
+static int
+nfp_bpf_ctrl_rc_to_errno(struct nfp_app_bpf *bpf,
+			 struct cmsg_reply_map_simple *reply)
+{
+	static const int res_table[] = {
+		[CMSG_RC_SUCCESS]	= 0,
+		[CMSG_RC_ERR_MAP_FD]	= -EBADFD,
+		[CMSG_RC_ERR_MAP_NOENT]	= -ENOENT,
+		[CMSG_RC_ERR_MAP_ERR]	= -EINVAL,
+		[CMSG_RC_ERR_MAP_PARSE]	= -EIO,
+		[CMSG_RC_ERR_MAP_EXIST]	= -EEXIST,
+		[CMSG_RC_ERR_MAP_NOMEM]	= -ENOMEM,
+		[CMSG_RC_ERR_MAP_E2BIG]	= -E2BIG,
+	};
+	u32 rc;
+
+	rc = be32_to_cpu(reply->rc);
+	if (rc >= ARRAY_SIZE(res_table)) {
+		cmsg_warn(bpf, "FW responded with invalid status: %u\n", rc);
+		return -EIO;
+	}
+
+	return res_table[rc];
+}
+
+long long int
+nfp_bpf_ctrl_alloc_map(struct nfp_app_bpf *bpf, struct bpf_map *map)
+{
+	struct cmsg_reply_map_alloc_tbl *reply;
+	struct cmsg_req_map_alloc_tbl *req;
+	struct sk_buff *skb;
+	u32 tid;
+	int err;
+
+	skb = nfp_bpf_cmsg_alloc(bpf, sizeof(*req));
+	if (!skb)
+		return -ENOMEM;
+
+	req = (void *)skb->data;
+	req->key_size = cpu_to_be32(map->key_size);
+	req->value_size = cpu_to_be32(map->value_size);
+	req->max_entries = cpu_to_be32(map->max_entries);
+	req->map_type = cpu_to_be32(map->map_type);
+	req->map_flags = 0;
+
+	skb = nfp_bpf_cmsg_communicate(bpf, skb, CMSG_TYPE_MAP_ALLOC,
+				       sizeof(*reply));
+	if (IS_ERR(skb))
+		return PTR_ERR(skb);
+
+	reply = (void *)skb->data;
+	err = nfp_bpf_ctrl_rc_to_errno(bpf, &reply->reply_hdr);
+	if (err)
+		goto err_free;
+
+	tid = be32_to_cpu(reply->tid);
+	dev_consume_skb_any(skb);
+
+	return tid;
+err_free:
+	dev_kfree_skb_any(skb);
+	return err;
+}
+
+void nfp_bpf_ctrl_free_map(struct nfp_app_bpf *bpf, struct nfp_bpf_map *nfp_map)
+{
+	struct cmsg_reply_map_free_tbl *reply;
+	struct cmsg_req_map_free_tbl *req;
+	struct sk_buff *skb;
+	int err;
+
+	skb = nfp_bpf_cmsg_alloc(bpf, sizeof(*req));
+	if (!skb) {
+		cmsg_warn(bpf, "leaking map - failed to allocate msg\n");
+		return;
+	}
+
+	req = (void *)skb->data;
+	req->tid = cpu_to_be32(nfp_map->tid);
+
+	skb = nfp_bpf_cmsg_communicate(bpf, skb, CMSG_TYPE_MAP_FREE,
+				       sizeof(*reply));
+	if (IS_ERR(skb)) {
+		cmsg_warn(bpf, "leaking map - I/O error\n");
+		return;
+	}
+
+	reply = (void *)skb->data;
+	err = nfp_bpf_ctrl_rc_to_errno(bpf, &reply->reply_hdr);
+	if (err)
+		cmsg_warn(bpf, "leaking map - FW responded with: %d\n", err);
+
+	dev_consume_skb_any(skb);
+}
+
+static int
+nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap,
+		      enum nfp_bpf_cmsg_type op,
+		      u8 *key, u8 *value, u64 flags, u8 *out_key, u8 *out_value)
+{
+	struct nfp_bpf_map *nfp_map = offmap->dev_priv;
+	struct nfp_app_bpf *bpf = nfp_map->bpf;
+	struct bpf_map *map = &offmap->map;
+	struct cmsg_reply_map_op *reply;
+	struct cmsg_req_map_op *req;
+	struct sk_buff *skb;
+	int err;
+
+	/* FW messages have no space for more than 32 bits of flags */
+	if (flags >> 32)
+		return -EOPNOTSUPP;
+
+	skb = nfp_bpf_cmsg_map_req_alloc(bpf, 1);
+	if (!skb)
+		return -ENOMEM;
+
+	req = (void *)skb->data;
+	req->tid = cpu_to_be32(nfp_map->tid);
+	req->count = cpu_to_be32(1);
+	req->flags = cpu_to_be32(flags);
+
+	/* Copy inputs */
+	if (key)
+		memcpy(&req->elem[0].key, key, map->key_size);
+	if (value)
+		memcpy(&req->elem[0].value, value, map->value_size);
+
+	skb = nfp_bpf_cmsg_communicate(bpf, skb, op,
+				       sizeof(*reply) + sizeof(*reply->elem));
+	if (IS_ERR(skb))
+		return PTR_ERR(skb);
+
+	reply = (void *)skb->data;
+	err = nfp_bpf_ctrl_rc_to_errno(bpf, &reply->reply_hdr);
+	if (err)
+		goto err_free;
+
+	/* Copy outputs */
+	if (out_key)
+		memcpy(out_key, &reply->elem[0].key, map->key_size);
+	if (out_value)
+		memcpy(out_value, &reply->elem[0].value, map->value_size);
+
+	dev_consume_skb_any(skb);
+
+	return 0;
+err_free:
+	dev_kfree_skb_any(skb);
+	return err;
+}
+
+int nfp_bpf_ctrl_update_entry(struct bpf_offloaded_map *offmap,
+			      void *key, void *value, u64 flags)
+{
+	return nfp_bpf_ctrl_entry_op(offmap, CMSG_TYPE_MAP_UPDATE,
+				     key, value, flags, NULL, NULL);
+}
+
+int nfp_bpf_ctrl_del_entry(struct bpf_offloaded_map *offmap, void *key)
+{
+	return nfp_bpf_ctrl_entry_op(offmap, CMSG_TYPE_MAP_DELETE,
+				     key, NULL, 0, NULL, NULL);
+}
+
+int nfp_bpf_ctrl_lookup_entry(struct bpf_offloaded_map *offmap,
+			      void *key, void *value)
+{
+	return nfp_bpf_ctrl_entry_op(offmap, CMSG_TYPE_MAP_LOOKUP,
+				     key, NULL, 0, NULL, value);
+}
+
+int nfp_bpf_ctrl_getfirst_entry(struct bpf_offloaded_map *offmap,
+				void *next_key)
+{
+	return nfp_bpf_ctrl_entry_op(offmap, CMSG_TYPE_MAP_GETFIRST,
+				     NULL, NULL, 0, next_key, NULL);
+}
+
+int nfp_bpf_ctrl_getnext_entry(struct bpf_offloaded_map *offmap,
+			       void *key, void *next_key)
+{
+	return nfp_bpf_ctrl_entry_op(offmap, CMSG_TYPE_MAP_GETNEXT,
+				     key, NULL, 0, next_key, NULL);
+}
+
 void nfp_bpf_ctrl_msg_rx(struct nfp_app *app, struct sk_buff *skb)
 {
 	struct nfp_app_bpf *bpf = app->priv;
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/fw.h b/drivers/net/ethernet/netronome/nfp/bpf/fw.h
index 107676b34760..e0ff68fc9562 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/fw.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/fw.h
@@ -57,12 +57,33 @@ struct nfp_bpf_cap_tlv_adjust_head {
 #define CMSG_MAP_ABI_VERSION		1
 
 enum nfp_bpf_cmsg_type {
+	CMSG_TYPE_MAP_ALLOC	= 1,
+	CMSG_TYPE_MAP_FREE	= 2,
+	CMSG_TYPE_MAP_LOOKUP	= 3,
+	CMSG_TYPE_MAP_UPDATE	= 4,
+	CMSG_TYPE_MAP_DELETE	= 5,
+	CMSG_TYPE_MAP_GETNEXT	= 6,
+	CMSG_TYPE_MAP_GETFIRST	= 7,
 	__CMSG_TYPE_MAP_MAX,
 };
 
 #define CMSG_TYPE_MAP_REPLY_BIT		7
 #define __CMSG_REPLY(req)		(BIT(CMSG_TYPE_MAP_REPLY_BIT) | (req))
 
+#define CMSG_MAP_KEY_LW			16
+#define CMSG_MAP_VALUE_LW		16
+
+enum nfp_bpf_cmsg_status {
+	CMSG_RC_SUCCESS			= 0,
+	CMSG_RC_ERR_MAP_FD		= 1,
+	CMSG_RC_ERR_MAP_NOENT		= 2,
+	CMSG_RC_ERR_MAP_ERR		= 3,
+	CMSG_RC_ERR_MAP_PARSE		= 4,
+	CMSG_RC_ERR_MAP_EXIST		= 5,
+	CMSG_RC_ERR_MAP_NOMEM		= 6,
+	CMSG_RC_ERR_MAP_E2BIG		= 7,
+};
+
 struct cmsg_hdr {
 	u8 type;
 	u8 ver;
@@ -73,4 +94,48 @@ struct cmsg_reply_map_simple {
 	struct cmsg_hdr hdr;
 	__be32 rc;
 };
+
+struct cmsg_req_map_alloc_tbl {
+	struct cmsg_hdr hdr;
+	__be32 key_size;		/* in bytes */
+	__be32 value_size;		/* in bytes */
+	__be32 max_entries;
+	__be32 map_type;
+	__be32 map_flags;		/* reserved */
+};
+
+struct cmsg_reply_map_alloc_tbl {
+	struct cmsg_reply_map_simple reply_hdr;
+	__be32 tid;
+};
+
+struct cmsg_req_map_free_tbl {
+	struct cmsg_hdr hdr;
+	__be32 tid;
+};
+
+struct cmsg_reply_map_free_tbl {
+	struct cmsg_reply_map_simple reply_hdr;
+	__be32 count;
+};
+
+struct cmsg_key_value_pair {
+	__be32 key[CMSG_MAP_KEY_LW];
+	__be32 value[CMSG_MAP_VALUE_LW];
+};
+
+struct cmsg_req_map_op {
+	struct cmsg_hdr hdr;
+	__be32 tid;
+	__be32 count;
+	__be32 flags;
+	struct cmsg_key_value_pair elem[0];
+};
+
+struct cmsg_reply_map_op {
+	struct cmsg_reply_map_simple reply_hdr;
+	__be32 count;
+	__be32 resv;
+	struct cmsg_key_value_pair elem[0];
+};
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index fd9987d6a374..51b8c30ab5db 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -289,8 +289,19 @@ struct nfp_insn_meta *
 nfp_bpf_goto_meta(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 		  unsigned int insn_idx, unsigned int n_insns);
 
-struct sk_buff *
-nfp_bpf_cmsg_communicate(struct nfp_app_bpf *bpf, struct sk_buff *skb,
-			 enum nfp_bpf_cmsg_type type, unsigned int reply_size);
+long long int
+nfp_bpf_ctrl_alloc_map(struct nfp_app_bpf *bpf, struct bpf_map *map);
+void
+nfp_bpf_ctrl_free_map(struct nfp_app_bpf *bpf, struct nfp_bpf_map *nfp_map);
+int nfp_bpf_ctrl_getfirst_entry(struct bpf_offloaded_map *offmap,
+				void *next_key);
+int nfp_bpf_ctrl_update_entry(struct bpf_offloaded_map *offmap,
+			      void *key, void *value, u64 flags);
+int nfp_bpf_ctrl_del_entry(struct bpf_offloaded_map *offmap, void *key);
+int nfp_bpf_ctrl_lookup_entry(struct bpf_offloaded_map *offmap,
+			      void *key, void *value);
+int nfp_bpf_ctrl_getnext_entry(struct bpf_offloaded_map *offmap,
+			       void *key, void *next_key);
+
 void nfp_bpf_ctrl_msg_rx(struct nfp_app *app, struct sk_buff *skb);
 #endif
-- 
2.15.1
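
Illustration (not part of the patch): a userspace sketch of the message
sizes implied by the structures added to fw.h, assuming host-endian
uint32_t in place of __be32 and a compiler that adds no padding beyond
natural alignment.  The struct and helper names are hypothetical
shorthand for the cmsg_* definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_KEY_LW	16	/* key space in 32-bit words */
#define MAP_VALUE_LW	16	/* value space in 32-bit words */
#define REPLY_BIT	7

struct hdr {
	uint8_t type;
	uint8_t ver;
	uint16_t tag;
};

struct reply_simple {
	struct hdr hdr;
	uint32_t rc;
};

struct kv_pair {
	uint32_t key[MAP_KEY_LW];
	uint32_t value[MAP_VALUE_LW];
};

struct req_map_op {
	struct hdr hdr;
	uint32_t tid;
	uint32_t count;
	uint32_t flags;
	struct kv_pair elem[];
};

struct reply_map_op {
	struct reply_simple reply_hdr;
	uint32_t count;
	uint32_t resv;
	struct kv_pair elem[];
};

/* Size of a map op request carrying n key/value pairs. */
static size_t map_req_size(unsigned int n)
{
	return sizeof(struct req_map_op) + n * sizeof(struct kv_pair);
}

/* Size of the matching reply carrying n key/value pairs. */
static size_t map_reply_size(unsigned int n)
{
	return sizeof(struct reply_map_op) + n * sizeof(struct kv_pair);
}

/* A reply carries the request type with the reply bit set. */
static uint8_t reply_type(uint8_t req)
{
	return (uint8_t)((1u << REPLY_BIT) | req);
}
```

Each pair reserves 64 bytes for the key and 64 bytes for the value
regardless of the map's actual key_size and value_size, which is why
nfp_bpf_ctrl_entry_op() can pass a fixed expected reply size.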


* [PATCH bpf-next 13/16] nfp: bpf: parse function call and map capabilities
From: Jakub Kicinski @ 2018-01-05  6:09 UTC (permalink / raw)
  To: netdev, alexei.starovoitov, daniel, davem
  Cc: oss-drivers, tehnerd, Jakub Kicinski
In-Reply-To: <20180105060931.30815-1-jakub.kicinski@netronome.com>

Parse helper function and supported map FW TLV capabilities.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/fw.h   | 16 +++++++++
 drivers/net/ethernet/netronome/nfp/bpf/main.c | 47 +++++++++++++++++++++++++++
 drivers/net/ethernet/netronome/nfp/bpf/main.h | 24 ++++++++++++++
 3 files changed, 87 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/fw.h b/drivers/net/ethernet/netronome/nfp/bpf/fw.h
index e0ff68fc9562..cfcc7bcb2c67 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/fw.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/fw.h
@@ -38,7 +38,14 @@
 #include <linux/types.h>
 
 enum bpf_cap_tlv_type {
+	NFP_BPF_CAP_TYPE_FUNC		= 1,
 	NFP_BPF_CAP_TYPE_ADJUST_HEAD	= 2,
+	NFP_BPF_CAP_TYPE_MAPS		= 3,
+};
+
+struct nfp_bpf_cap_tlv_func {
+	__le32 func_id;
+	__le32 func_addr;
 };
 
 struct nfp_bpf_cap_tlv_adjust_head {
@@ -51,6 +58,15 @@ struct nfp_bpf_cap_tlv_adjust_head {
 
 #define NFP_BPF_ADJUST_HEAD_NO_META	BIT(0)
 
+struct nfp_bpf_cap_tlv_maps {
+	__le32 types;
+	__le32 max_maps;
+	__le32 max_elems;
+	__le32 max_key_sz;
+	__le32 max_val_sz;
+	__le32 max_elem_sz;
+};
+
 /*
  * Types defined for map related control messages
  */
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index 74d42bdcb905..251cd198e710 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -226,6 +226,45 @@ nfp_bpf_parse_cap_adjust_head(struct nfp_app_bpf *bpf, void __iomem *value,
 	return 0;
 }
 
+static int
+nfp_bpf_parse_cap_func(struct nfp_app_bpf *bpf, void __iomem *value, u32 length)
+{
+	struct nfp_bpf_cap_tlv_func __iomem *cap = value;
+
+	if (length < sizeof(*cap)) {
+		nfp_err(bpf->app->cpp, "truncated function TLV: %d\n", length);
+		return -EINVAL;
+	}
+
+	switch (readl(&cap->func_id)) {
+	case BPF_FUNC_map_lookup_elem:
+		bpf->helpers.map_lookup = readl(&cap->func_addr);
+		break;
+	}
+
+	return 0;
+}
+
+static int
+nfp_bpf_parse_cap_maps(struct nfp_app_bpf *bpf, void __iomem *value, u32 length)
+{
+	struct nfp_bpf_cap_tlv_maps __iomem *cap = value;
+
+	if (length < sizeof(*cap)) {
+		nfp_err(bpf->app->cpp, "truncated maps TLV: %d\n", length);
+		return -EINVAL;
+	}
+
+	bpf->maps.types = readl(&cap->types);
+	bpf->maps.max_maps = readl(&cap->max_maps);
+	bpf->maps.max_elems = readl(&cap->max_elems);
+	bpf->maps.max_key_sz = readl(&cap->max_key_sz);
+	bpf->maps.max_val_sz = readl(&cap->max_val_sz);
+	bpf->maps.max_elem_sz = readl(&cap->max_elem_sz);
+
+	return 0;
+}
+
 static int nfp_bpf_parse_capabilities(struct nfp_app *app)
 {
 	struct nfp_cpp *cpp = app->pf->cpp;
@@ -251,11 +290,19 @@ static int nfp_bpf_parse_capabilities(struct nfp_app *app)
 			goto err_release_free;
 
 		switch (type) {
+		case NFP_BPF_CAP_TYPE_FUNC:
+			if (nfp_bpf_parse_cap_func(app->priv, value, length))
+				goto err_release_free;
+			break;
 		case NFP_BPF_CAP_TYPE_ADJUST_HEAD:
 			if (nfp_bpf_parse_cap_adjust_head(app->priv, value,
 							  length))
 				goto err_release_free;
 			break;
+		case NFP_BPF_CAP_TYPE_MAPS:
+			if (nfp_bpf_parse_cap_maps(app->priv, value, length))
+				goto err_release_free;
+			break;
 		default:
 			nfp_dbg(cpp, "unknown BPF capability: %d\n", type);
 			break;
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 51b8c30ab5db..9dd76bf248b1 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -101,6 +101,17 @@ enum pkt_vec {
  * @off_max:		maximum packet offset within buffer required
  * @guaranteed_sub:	amount of negative adjustment guaranteed possible
  * @guaranteed_add:	amount of positive adjustment guaranteed possible
+ *
+ * @maps:		map capability
+ * @types:		supported map types
+ * @max_maps:		max number of maps supported
+ * @max_elems:		max number of entries in each map
+ * @max_key_sz:		max size of map key
+ * @max_val_sz:		max size of map value
+ * @max_elem_sz:	max size of map entry (key + value)
+ *
+ * @helpers:		helper addresses for various calls
+ * @map_lookup:		map lookup helper address
  */
 struct nfp_app_bpf {
 	struct nfp_app *app;
@@ -121,6 +132,19 @@ struct nfp_app_bpf {
 		int guaranteed_sub;
 		int guaranteed_add;
 	} adjust_head;
+
+	struct {
+		u32 types;
+		u32 max_maps;
+		u32 max_elems;
+		u32 max_key_sz;
+		u32 max_val_sz;
+		u32 max_elem_sz;
+	} maps;
+
+	struct {
+		u32 map_lookup;
+	} helpers;
 };
 
 /**
-- 
2.15.1

^ permalink raw reply related
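
Editorial sketch, not part of the patch: the length check in
nfp_bpf_parse_cap_maps() can be modeled in plain userspace C. The snippet
assumes the capability value is six consecutive little-endian 32-bit words
in the order the patch reads them. The names maps_cap, parse_maps_cap and
le32_at are hypothetical, and a byte buffer stands in for the __iomem region.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the "maps" member added to struct nfp_app_bpf. */
struct maps_cap {
	uint32_t types;
	uint32_t max_maps;
	uint32_t max_elems;
	uint32_t max_key_sz;
	uint32_t max_val_sz;
	uint32_t max_elem_sz;
};

/* Read one little-endian 32-bit word; stands in for readl() in this sketch. */
static uint32_t le32_at(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 0 on success, -EINVAL if the value is shorter than six words. */
static int parse_maps_cap(struct maps_cap *cap, const uint8_t *value,
			  uint32_t length)
{
	if (length < 6 * sizeof(uint32_t))
		return -EINVAL;	/* truncated TLV: never read past the end */

	cap->types = le32_at(value);
	cap->max_maps = le32_at(value + 4);
	cap->max_elems = le32_at(value + 8);
	cap->max_key_sz = le32_at(value + 12);
	cap->max_val_sz = le32_at(value + 16);
	cap->max_elem_sz = le32_at(value + 20);

	return 0;
}
```

As in the patch, a value longer than expected is accepted and the extra
bytes are ignored; only a short value is rejected.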

* [PATCH bpf-next 14/16] nfp: bpf: add verification and codegen for map lookups
From: Jakub Kicinski @ 2018-01-05  6:09 UTC (permalink / raw)
  To: netdev, alexei.starovoitov, daniel, davem
  Cc: oss-drivers, tehnerd, Jakub Kicinski
In-Reply-To: <20180105060931.30815-1-jakub.kicinski@netronome.com>

Verify our current constraints on the location of the key
are met and generate the code for calling map lookup on
the datapath.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c      | 51 +++++++++++++++++++++++
 drivers/net/ethernet/netronome/nfp/bpf/main.h     | 12 +++++-
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 39 +++++++++++++++++
 3 files changed, 100 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index 0de59f04da84..a037c0bb4185 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -1289,6 +1289,55 @@ static int adjust_head(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	return 0;
 }
 
+static int
+map_lookup_stack(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	struct bpf_offloaded_map *offmap;
+	struct nfp_bpf_map *nfp_map;
+	bool load_lm_ptr;
+	u32 ret_tgt;
+	s64 lm_off;
+	swreg tid;
+
+	offmap = (struct bpf_offloaded_map *)meta->arg1.map_ptr;
+	nfp_map = offmap->dev_priv;
+
+	/* We only have to reload LM0 if the key is not at start of stack */
+	lm_off = nfp_prog->stack_depth;
+	lm_off += meta->arg2.var_off.value + meta->arg2.off;
+	load_lm_ptr = meta->arg2_var_off || lm_off;
+
+	/* Set LM0 to start of key */
+	if (load_lm_ptr)
+		emit_csr_wr(nfp_prog, reg_b(2 * 2), NFP_CSR_ACT_LM_ADDR0);
+
+	/* Load map ID into a register, it should actually fit as an immediate
+	 * but in case it doesn't deal with it here, not in the delay slots.
+	 */
+	tid = ur_load_imm_any(nfp_prog, nfp_map->tid, imm_a(nfp_prog));
+
+	emit_br(nfp_prog, BR_UNC, nfp_prog->bpf->helpers.map_lookup, 2);
+	ret_tgt = nfp_prog_current_offset(nfp_prog) + 2;
+
+	/* Load map ID into A0 */
+	wrp_mov(nfp_prog, reg_a(0), tid);
+
+	/* Load the return address into B0 */
+	wrp_immed(nfp_prog, reg_b(0), nfp_prog_current_offset(nfp_prog) + 1);
+
+	if (!nfp_prog_confirm_current_offset(nfp_prog, ret_tgt))
+		return -EINVAL;
+
+	/* Reset the LM0 pointer */
+	if (!load_lm_ptr)
+		return 0;
+
+	emit_csr_wr(nfp_prog, stack_reg(nfp_prog),  NFP_CSR_ACT_LM_ADDR0);
+	wrp_nops(nfp_prog, 3);
+
+	return 0;
+}
+
 /* --- Callbacks --- */
 static int mov_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
@@ -2028,6 +2077,8 @@ static int call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	switch (meta->insn.imm) {
 	case BPF_FUNC_xdp_adjust_head:
 		return adjust_head(nfp_prog, meta);
+	case BPF_FUNC_map_lookup_elem:
+		return map_lookup_stack(nfp_prog, meta);
 	default:
 		WARN_ONCE(1, "verifier allowed unsupported function\n");
 		return -EOPNOTSUPP;
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 9dd76bf248b1..a3b428e1181d 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -180,9 +180,12 @@ typedef int (*instr_cb_t)(struct nfp_prog *, struct nfp_insn_meta *);
  * @ptr: pointer type for memory operations
  * @ldst_gather_len: memcpy length gathered from load/store sequence
  * @paired_st: the paired store insn at the head of the sequence
- * @arg2: arg2 for call instructions
  * @ptr_not_const: pointer is not always constant
  * @jmp_dst: destination info for jump instructions
+ * @func_id: function id for call instructions
+ * @arg1: arg1 for call instructions
+ * @arg2: arg2 for call instructions
+ * @arg2_var_off: arg2 changes stack offset on different paths
  * @off: index of first generated machine instruction (in nfp_prog.prog)
  * @n: eBPF instruction number
  * @flags: eBPF instruction extra optimization flags
@@ -200,7 +203,12 @@ struct nfp_insn_meta {
 			bool ptr_not_const;
 		};
 		struct nfp_insn_meta *jmp_dst;
-		struct bpf_reg_state arg2;
+		struct {
+			u32 func_id;
+			struct bpf_reg_state arg1;
+			struct bpf_reg_state arg2;
+			bool arg2_var_off;
+		};
 	};
 	unsigned int off;
 	unsigned short n;
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index d8870c2f11f3..589f56f9ea96 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -109,9 +109,11 @@ static int
 nfp_bpf_check_call(struct nfp_prog *nfp_prog, struct bpf_verifier_env *env,
 		   struct nfp_insn_meta *meta)
 {
+	const struct bpf_reg_state *reg1 = cur_regs(env) + BPF_REG_1;
 	const struct bpf_reg_state *reg2 = cur_regs(env) + BPF_REG_2;
 	struct nfp_app_bpf *bpf = nfp_prog->bpf;
 	u32 func_id = meta->insn.imm;
+	s64 off, old_off;
 
 	switch (func_id) {
 	case BPF_FUNC_xdp_adjust_head:
@@ -126,11 +128,48 @@ nfp_bpf_check_call(struct nfp_prog *nfp_prog, struct bpf_verifier_env *env,
 
 		nfp_record_adjust_head(bpf, nfp_prog, meta, reg2);
 		break;
+
+	case BPF_FUNC_map_lookup_elem:
+		if (!bpf->helpers.map_lookup) {
+			pr_info("map_lookup: not supported by FW\n");
+			return -EOPNOTSUPP;
+		}
+		if (reg2->type != PTR_TO_STACK) {
+			pr_info("map_lookup: unsupported key ptr type %d\n",
+				reg2->type);
+			return -EOPNOTSUPP;
+		}
+		if (!tnum_is_const(reg2->var_off)) {
+			pr_info("map_lookup: variable key pointer\n");
+			return -EOPNOTSUPP;
+		}
+
+		off = reg2->var_off.value + reg2->off;
+		if (-off % 4) {
+			pr_info("map_lookup: unaligned stack pointer %lld\n",
+				-off);
+			return -EOPNOTSUPP;
+		}
+
+		/* Remaining checks only apply if we re-parse the same insn */
+		if (!meta->func_id)
+			break;
+
+		old_off = meta->arg2.var_off.value + meta->arg2.off;
+		meta->arg2_var_off |= off != old_off;
+
+		if (meta->arg1.map_ptr != reg1->map_ptr) {
+			pr_info("map_lookup: called for different map\n");
+			return -EOPNOTSUPP;
+		}
+		break;
 	default:
 		pr_warn("unsupported function id: %d\n", func_id);
 		return -EOPNOTSUPP;
 	}
 
+	meta->func_id = func_id;
+	meta->arg1 = *reg1;
 	meta->arg2 = *reg2;
 
 	return 0;
-- 
2.15.1

^ permalink raw reply related
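
Editorial sketch, not part of the patch: the key-offset handling in
nfp_bpf_check_call() boils down to an alignment test plus a comparison with
the offset recorded on an earlier visit of the same call instruction. The
struct call_site and the function check_key_offset below are hypothetical
stand-ins for struct nfp_insn_meta and the verifier callback; only the
arithmetic is taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-call-site record, loosely modeled on struct nfp_insn_meta. */
struct call_site {
	bool seen;		/* call site was already visited once */
	int64_t key_off;	/* key offset recorded on the last visit */
	bool key_var_off;	/* offset differed between two visits */
};

/*
 * off is the key's offset from the frame pointer (negative for stack slots).
 * Rejects offsets that are not a multiple of 4, otherwise records the offset
 * and notes whether it changed since a previous visit.
 */
static int check_key_offset(struct call_site *site, int64_t off)
{
	if (-off % 4)
		return -EOPNOTSUPP;

	if (site->seen)
		site->key_var_off |= off != site->key_off;

	site->seen = true;
	site->key_off = off;

	return 0;
}
```

A call site reached on two paths with keys at -8 and -16 ends up with
key_var_off set, which corresponds to the case where the JIT has to reload
the LM0 pointer instead of relying on a fixed offset.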

* [PATCH bpf-next 15/16] nfp: bpf: add support for reading map memory
From: Jakub Kicinski @ 2018-01-05  6:09 UTC (permalink / raw)
  To: netdev, alexei.starovoitov, daniel, davem
  Cc: oss-drivers, tehnerd, Jakub Kicinski
In-Reply-To: <20180105060931.30815-1-jakub.kicinski@netronome.com>

Map memory needs to use 40 bit addressing.  Add handling of such
accesses.  Since 40 bit addresses are formed by using both 32 bit
operands we need to pre-calculate the actual address instead of
adding in the offset inside the instruction, like we did in 32 bit
mode.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c      | 77 ++++++++++++++++++++---
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c |  8 +++
 2 files changed, 76 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index a037c0bb4185..f1632c80eb3e 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -548,27 +548,51 @@ wrp_reg_subpart(struct nfp_prog *nfp_prog, swreg dst, swreg src, u8 field_len,
 	emit_ld_field_any(nfp_prog, dst, mask, src, sc, offset * 8, true);
 }
 
+static void
+addr40_offset(struct nfp_prog *nfp_prog, u8 src_gpr, swreg offset,
+	      swreg *rega, swreg *regb)
+{
+	if (offset == reg_imm(0)) {
+		*rega = reg_a(src_gpr);
+		*regb = reg_b(src_gpr + 1);
+		return;
+	}
+
+	emit_alu(nfp_prog, imm_a(nfp_prog), reg_a(src_gpr), ALU_OP_ADD, offset);
+	emit_alu(nfp_prog, imm_b(nfp_prog), reg_b(src_gpr + 1), ALU_OP_ADD_C,
+		 reg_imm(0));
+	*rega = imm_a(nfp_prog);
+	*regb = imm_b(nfp_prog);
+}
+
 /* NFP has Command Push Pull bus which supports bulk memory operations. */
 static int nfp_cpp_memcpy(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
 	bool descending_seq = meta->ldst_gather_len < 0;
 	s16 len = abs(meta->ldst_gather_len);
 	swreg src_base, off;
+	bool src_40bit_addr;
 	unsigned int i;
 	u8 xfer_num;
 
 	off = re_load_imm_any(nfp_prog, meta->insn.off, imm_b(nfp_prog));
+	src_40bit_addr = meta->ptr.type == PTR_TO_MAP_VALUE;
 	src_base = reg_a(meta->insn.src_reg * 2);
 	xfer_num = round_up(len, 4) / 4;
 
+	if (src_40bit_addr)
+		addr40_offset(nfp_prog, meta->insn.src_reg, off, &src_base,
+			      &off);
+
 	/* Setup PREV_ALU fields to override memory read length. */
 	if (len > 32)
 		wrp_immed(nfp_prog, reg_none(),
 			  CMD_OVE_LEN | FIELD_PREP(CMD_OV_LEN, xfer_num - 1));
 
 	/* Memory read from source addr into transfer-in registers. */
-	emit_cmd_any(nfp_prog, CMD_TGT_READ32_SWAP, CMD_MODE_32b, 0, src_base,
-		     off, xfer_num - 1, true, len > 32);
+	emit_cmd_any(nfp_prog, CMD_TGT_READ32_SWAP,
+		     src_40bit_addr ? CMD_MODE_40b_BA : CMD_MODE_32b, 0,
+		     src_base, off, xfer_num - 1, true, len > 32);
 
 	/* Move from transfer-in to transfer-out. */
 	for (i = 0; i < xfer_num; i++)
@@ -706,20 +730,20 @@ data_ld(struct nfp_prog *nfp_prog, swreg offset, u8 dst_gpr, int size)
 }
 
 static int
-data_ld_host_order(struct nfp_prog *nfp_prog, u8 src_gpr, swreg offset,
-		   u8 dst_gpr, int size)
+data_ld_host_order(struct nfp_prog *nfp_prog, u8 dst_gpr,
+		   swreg lreg, swreg rreg, int size, enum cmd_mode mode)
 {
 	unsigned int i;
 	u8 mask, sz;
 
-	/* We load the value from the address indicated in @offset and then
+	/* We load the value from the address indicated in rreg + lreg and then
 	 * mask out the data we don't need.  Note: this is little endian!
 	 */
 	sz = max(size, 4);
 	mask = size < 4 ? GENMASK(size - 1, 0) : 0;
 
-	emit_cmd(nfp_prog, CMD_TGT_READ32_SWAP, CMD_MODE_32b, 0,
-		 reg_a(src_gpr), offset, sz / 4 - 1, true);
+	emit_cmd(nfp_prog, CMD_TGT_READ32_SWAP, mode, 0,
+		 lreg, rreg, sz / 4 - 1, true);
 
 	i = 0;
 	if (mask)
@@ -735,6 +759,26 @@ data_ld_host_order(struct nfp_prog *nfp_prog, u8 src_gpr, swreg offset,
 	return 0;
 }
 
+static int
+data_ld_host_order_addr32(struct nfp_prog *nfp_prog, u8 src_gpr, swreg offset,
+			  u8 dst_gpr, u8 size)
+{
+	return data_ld_host_order(nfp_prog, dst_gpr, reg_a(src_gpr), offset,
+				  size, CMD_MODE_32b);
+}
+
+static int
+data_ld_host_order_addr40(struct nfp_prog *nfp_prog, u8 src_gpr, swreg offset,
+			  u8 dst_gpr, u8 size)
+{
+	swreg rega, regb;
+
+	addr40_offset(nfp_prog, src_gpr, offset, &rega, &regb);
+
+	return data_ld_host_order(nfp_prog, dst_gpr, rega, regb,
+				  size, CMD_MODE_40b_BA);
+}
+
 static int
 construct_data_ind_ld(struct nfp_prog *nfp_prog, u16 offset, u16 src, u8 size)
 {
@@ -1772,8 +1816,20 @@ mem_ldx_data(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 
 	tmp_reg = re_load_imm_any(nfp_prog, meta->insn.off, imm_b(nfp_prog));
 
-	return data_ld_host_order(nfp_prog, meta->insn.src_reg * 2, tmp_reg,
-				  meta->insn.dst_reg * 2, size);
+	return data_ld_host_order_addr32(nfp_prog, meta->insn.src_reg * 2,
+					 tmp_reg, meta->insn.dst_reg * 2, size);
+}
+
+static int
+mem_ldx_emem(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
+	     unsigned int size)
+{
+	swreg tmp_reg;
+
+	tmp_reg = re_load_imm_any(nfp_prog, meta->insn.off, imm_b(nfp_prog));
+
+	return data_ld_host_order_addr40(nfp_prog, meta->insn.src_reg * 2,
+					 tmp_reg, meta->insn.dst_reg * 2, size);
 }
 
 static int
@@ -1797,6 +1853,9 @@ mem_ldx(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 		return mem_ldx_stack(nfp_prog, meta, size,
 				     meta->ptr.off + meta->ptr.var_off.value);
 
+	if (meta->ptr.type == PTR_TO_MAP_VALUE)
+		return mem_ldx_emem(nfp_prog, meta, size);
+
 	return -EOPNOTSUPP;
 }
 
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index 589f56f9ea96..3ae0add8e4d2 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -247,6 +247,7 @@ nfp_bpf_check_ptr(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 
 	if (reg->type != PTR_TO_CTX &&
 	    reg->type != PTR_TO_STACK &&
+	    reg->type != PTR_TO_MAP_VALUE &&
 	    reg->type != PTR_TO_PACKET) {
 		pr_info("unsupported ptr type: %d\n", reg->type);
 		return -EINVAL;
@@ -258,6 +259,13 @@ nfp_bpf_check_ptr(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 			return err;
 	}
 
+	if (reg->type == PTR_TO_MAP_VALUE) {
+		if (is_mbpf_store(meta)) {
+			pr_info("map writes not supported\n");
+			return -EOPNOTSUPP;
+		}
+	}
+
 	if (meta->ptr.type != NOT_INIT && meta->ptr.type != reg->type) {
 		pr_info("ptr type changed for instruction %d -> %d\n",
 			meta->ptr.type, reg->type);
-- 
2.15.1

^ permalink raw reply related
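
Editorial sketch, not part of the patch: addr40_offset() pre-computes the
address with an add followed by an add-with-carry. The model below assumes
the low 32 bits live in the first register of the pair and the high bits in
the second, and that the offset is non-negative. The names struct addr40 and
addr40_add_offset are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of a 40-bit address held in two 32-bit registers. */
struct addr40 {
	uint32_t lo;	/* low 32 bits of the address */
	uint32_t hi;	/* high bits of the address */
};

/* Add a non-negative offset, propagating the carry into the high half. */
static struct addr40 addr40_add_offset(struct addr40 base, uint32_t offset)
{
	struct addr40 res;

	res.lo = base.lo + offset;		/* ADD: may wrap around */
	res.hi = base.hi + (res.lo < base.lo);	/* ADD_C with 0: adds the carry */

	return res;
}
```

For example, a base of hi=0x12, lo=0xfffffff0 plus an offset of 0x20 wraps
the low word to 0x10 and bumps the high word to 0x13, which is why the
offset cannot simply be left in the instruction as in 32 bit mode.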

* [PATCH bpf-next 16/16] nfp: bpf: implement bpf map offload
From: Jakub Kicinski @ 2018-01-05  6:09 UTC (permalink / raw)
  To: netdev, alexei.starovoitov, daniel, davem
  Cc: oss-drivers, tehnerd, Jakub Kicinski
In-Reply-To: <20180105060931.30815-1-jakub.kicinski@netronome.com>

Plug into the stack's map offload callbacks for BPF map offload.
The get next call needs special handling on the FW side: since we
can't send a NULL pointer to the FW, there is a separate get first
entry FW command.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/main.c    |   1 +
 drivers/net/ethernet/netronome/nfp/bpf/main.h    |   4 +
 drivers/net/ethernet/netronome/nfp/bpf/offload.c | 106 +++++++++++++++++++++++
 3 files changed, 111 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index 251cd198e710..b23075021bba 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -356,6 +356,7 @@ static void nfp_bpf_clean(struct nfp_app *app)
 
 	WARN_ON(!skb_queue_empty(&bpf->cmsg_replies));
 	WARN_ON(!list_empty(&bpf->map_list));
+	WARN_ON(bpf->maps_in_use || bpf->map_elems_in_use);
 	kfree(bpf);
 }
 
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index a3b428e1181d..7d74b44209a9 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -94,6 +94,8 @@ enum pkt_vec {
  * @cmsg_wq:		work queue for waiting for cmsg replies
  *
  * @map_list:		list of offloaded maps
+ * @maps_in_use:	number of currently offloaded maps
+ * @map_elems_in_use:	number of elements allocated to offloaded maps
  *
  * @adjust_head:	adjust head capability
  * @flags:		extra flags for adjust head
@@ -124,6 +126,8 @@ struct nfp_app_bpf {
 	struct wait_queue_head cmsg_wq;
 
 	struct list_head map_list;
+	unsigned int maps_in_use;
+	unsigned int map_elems_in_use;
 
 	struct nfp_bpf_cap_adjust_head {
 		u32 flags;
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index 12a48b7583a3..e94ef02c3b0b 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -36,6 +36,9 @@
  * Netronome network device driver: TC offload functions for PF and VF
  */
 
+#define pr_fmt(fmt)	"NFP net bpf: " fmt
+
+#include <linux/bpf.h>
 #include <linux/kernel.h>
 #include <linux/netdevice.h>
 #include <linux/pci.h>
@@ -171,6 +174,105 @@ static int nfp_bpf_destroy(struct nfp_net *nn, struct bpf_prog *prog)
 	return 0;
 }
 
+static int
+nfp_bpf_map_get_next_key(struct bpf_offloaded_map *offmap,
+			 void *key, void *next_key)
+{
+	if (!key)
+		return nfp_bpf_ctrl_getfirst_entry(offmap, next_key);
+	return nfp_bpf_ctrl_getnext_entry(offmap, key, next_key);
+}
+
+static int
+nfp_bpf_map_delete_elem(struct bpf_offloaded_map *offmap, void *key)
+{
+	if (offmap->map.map_type == BPF_MAP_TYPE_ARRAY)
+		return -EINVAL;
+	return nfp_bpf_ctrl_del_entry(offmap, key);
+}
+
+static const struct bpf_map_dev_ops nfp_bpf_map_ops = {
+	.map_get_next_key	= nfp_bpf_map_get_next_key,
+	.map_lookup_elem	= nfp_bpf_ctrl_lookup_entry,
+	.map_update_elem	= nfp_bpf_ctrl_update_entry,
+	.map_delete_elem	= nfp_bpf_map_delete_elem,
+};
+
+static int
+nfp_bpf_map_alloc(struct nfp_app_bpf *bpf, struct bpf_offloaded_map *offmap)
+{
+	struct nfp_bpf_map *nfp_map;
+	long long int res;
+
+	if (!bpf->maps.types)
+		return -EOPNOTSUPP;
+
+	if (offmap->map.map_flags ||
+	    offmap->map.numa_node != NUMA_NO_NODE) {
+		pr_info("map flags are not supported\n");
+		return -EINVAL;
+	}
+
+	if (!(bpf->maps.types & 1 << offmap->map.map_type)) {
+		pr_info("map type not supported\n");
+		return -EOPNOTSUPP;
+	}
+	if (bpf->maps.max_maps == bpf->maps_in_use) {
+		pr_info("too many maps for a device\n");
+		return -ENOMEM;
+	}
+	if (bpf->maps.max_elems - bpf->map_elems_in_use <
+	    offmap->map.max_entries) {
+		pr_info("map with too many elements: %u, left: %u\n",
+			offmap->map.max_entries,
+			bpf->maps.max_elems - bpf->map_elems_in_use);
+		return -ENOMEM;
+	}
+	if (offmap->map.key_size > bpf->maps.max_key_sz ||
+	    offmap->map.value_size > bpf->maps.max_val_sz ||
+	    round_up(offmap->map.key_size, 8) +
+	    round_up(offmap->map.value_size, 8) > bpf->maps.max_elem_sz) {
+		pr_info("elements don't fit in device constraints\n");
+		return -ENOMEM;
+	}
+
+	nfp_map = kzalloc(sizeof(*nfp_map), GFP_USER);
+	if (!nfp_map)
+		return -ENOMEM;
+
+	offmap->dev_priv = nfp_map;
+	nfp_map->offmap = offmap;
+	nfp_map->bpf = bpf;
+
+	res = nfp_bpf_ctrl_alloc_map(bpf, &offmap->map);
+	if (res < 0) {
+		kfree(nfp_map);
+		return res;
+	}
+
+	nfp_map->tid = res;
+	offmap->dev_ops = &nfp_bpf_map_ops;
+	bpf->maps_in_use++;
+	bpf->map_elems_in_use += offmap->map.max_entries;
+	list_add_tail(&nfp_map->l, &bpf->map_list);
+
+	return 0;
+}
+
+static int
+nfp_bpf_map_free(struct nfp_app_bpf *bpf, struct bpf_offloaded_map *offmap)
+{
+	struct nfp_bpf_map *nfp_map = offmap->dev_priv;
+
+	nfp_bpf_ctrl_free_map(bpf, nfp_map);
+	list_del_init(&nfp_map->l);
+	bpf->map_elems_in_use -= offmap->map.max_entries;
+	bpf->maps_in_use--;
+	kfree(nfp_map);
+
+	return 0;
+}
+
 int nfp_ndo_bpf(struct nfp_app *app, struct nfp_net *nn, struct netdev_bpf *bpf)
 {
 	switch (bpf->command) {
@@ -180,6 +282,10 @@ int nfp_ndo_bpf(struct nfp_app *app, struct nfp_net *nn, struct netdev_bpf *bpf)
 		return nfp_bpf_translate(nn, bpf->offload.prog);
 	case BPF_OFFLOAD_DESTROY:
 		return nfp_bpf_destroy(nn, bpf->offload.prog);
+	case BPF_OFFLOAD_MAP_ALLOC:
+		return nfp_bpf_map_alloc(app->priv, bpf->offmap);
+	case BPF_OFFLOAD_MAP_FREE:
+		return nfp_bpf_map_free(app->priv, bpf->offmap);
 	default:
 		return -EINVAL;
 	}
-- 
2.15.1

^ permalink raw reply related
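
Editorial sketch, not part of the patch: the admission checks in
nfp_bpf_map_alloc() can be read as a pure function of the advertised
capabilities, the current usage and the requested map. The struct and
function names below are hypothetical, and the map_type >= 32 guard is an
addition of this sketch (the patch shifts by the map type directly).

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical containers for the values the patch keeps in nfp_app_bpf. */
struct map_caps {
	uint32_t types, max_maps, max_elems;
	uint32_t max_key_sz, max_val_sz, max_elem_sz;
};

struct map_usage {
	unsigned int maps_in_use, map_elems_in_use;
};

struct map_req {
	uint32_t map_type, key_size, value_size, max_entries;
};

static uint32_t round_up8(uint32_t v)
{
	return (v + 7) & ~7u;
}

/* Returns 0 if the requested map fits, a negative errno otherwise. */
static int map_fits(const struct map_caps *caps, const struct map_usage *use,
		    const struct map_req *req)
{
	if (!caps->types)
		return -EOPNOTSUPP;	/* FW advertises no map support */
	if (req->map_type >= 32 || !(caps->types & (1u << req->map_type)))
		return -EOPNOTSUPP;	/* map type not in the bitmap */
	if (caps->max_maps == use->maps_in_use)
		return -ENOMEM;		/* all map slots taken */
	if (caps->max_elems - use->map_elems_in_use < req->max_entries)
		return -ENOMEM;		/* not enough elements left */
	if (req->key_size > caps->max_key_sz ||
	    req->value_size > caps->max_val_sz ||
	    round_up8(req->key_size) + round_up8(req->value_size) >
	    caps->max_elem_sz)
		return -ENOMEM;		/* element does not fit */

	return 0;
}
```

Note that key and value sizes are each rounded up to 8 bytes before being
compared against max_elem_sz, so a 4-byte key with a 16-byte value needs an
element size of at least 24.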

* IPv6 source based policy routing query
From: sundeep subbaraya @ 2018-01-05  6:37 UTC (permalink / raw)
  To: linux-net, netdev

Hi,

I am quite new to IPv6.
I ran into a problem and hope you can help me.
I have a box with multiple Ethernet ports that runs on Altera's Cyclone 5.
I am using source-based policy routing for IPv6.
The problem is that when there is a "default via <gateway>" entry in the
per-interface routing table, pinging the link-local address of a directly
connected PC from that interface fails.

# ip -6 rule
0:      from all lookup local
16383:  from all oif eth2 lookup eth2
16383:  from 2001::200 lookup eth2
32766:  from all lookup main

# ip -6 route show table eth2
2001::/64 dev eth2  src 2001::200  metric 1024  pref medium
default via 2001::5 dev eth2  metric 1024  pref medium

# ping6 fe80::21b:cdff:fe03:1357 -I eth2
PING fe80::21b:cdff:fe03:1357(fe80::21b:cdff:fe03:1357) from
fe80::2b0:aeff:fe03:7c1c eth2: 56 data bytes
From fe80::2b0:aeff:fe03:7c1c icmp_seq=1 Destination unreachable:
Address unreachable
From fe80::2b0:aeff:fe03:7c1c icmp_seq=2 Destination unreachable:
Address unreachable
From fe80::2b0:aeff:fe03:7c1c icmp_seq=3 Destination unreachable:
Address unreachable
^C
--- fe80::21b:cdff:fe03:1357 ping statistics ---
5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 4007ms

# ip -6 route del default dev eth2 table eth2
# ping6 fe80::21b:cdff:fe03:1357 -I eth2
PING fe80::21b:cdff:fe03:1357(fe80::21b:cdff:fe03:1357) from
fe80::2b0:aeff:fe03:7c1c eth2: 56 data bytes
64 bytes from fe80::21b:cdff:fe03:1357: icmp_seq=1 ttl=64 time=0.285 ms
64 bytes from fe80::21b:cdff:fe03:1357: icmp_seq=2 ttl=64 time=1.73 ms
64 bytes from fe80::21b:cdff:fe03:1357: icmp_seq=3 ttl=64 time=1.75 ms
^C
--- fe80::21b:cdff:fe03:1357 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.285/1.258/1.755/0.688 ms

From Wireshark captures: although I ping a link-local address, a Neighbor
Solicitation for the default gateway IP 2001::5 goes on the wire first.
Since that gateway is not present, the ping does not work.
If the gateway is present and replies with a Neighbor Advertisement, then
the echo requests and replies with the link-local address work properly
after that.
Why is a Neighbor Solicitation sent for the default gateway even though we
ping a link-local address?
I assume a link-local ping should work even if the default gateway is
unavailable.
Please correct me if I am wrong. Also let me know how to change this behavior.

I am using the 4.1.22-ltsi kernel.

Thanks,
Sundeep

^ permalink raw reply

* [PATCH net-next] net: tracepoint: adding new tracepoint arguments in inet_sock_set_state
From: Yafang Shao @ 2018-01-05  6:42 UTC (permalink / raw)
  To: davem, brendan.d.gregg
  Cc: songliubraving, marcelo.leitner, netdev, linux-kernel,
	Yafang Shao

Expose sk->sk_protocol and sk->sk_family as tracepoint arguments, so
that these two arguments can conveniently be used for filtering.

Suggested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/trace/events/sock.h | 24 ++++++++++++++++++------
 net/ipv4/af_inet.c          |  6 ++++--
 2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/include/trace/events/sock.h b/include/trace/events/sock.h
index 3537c5f..c7df70f 100644
--- a/include/trace/events/sock.h
+++ b/include/trace/events/sock.h
@@ -11,7 +11,11 @@
 #include <linux/ipv6.h>
 #include <linux/tcp.h>

-/* The protocol traced by sock_set_state */
+#define family_names			\
+		EM(AF_INET)				\
+		EMe(AF_INET6)
+
+/* The protocol traced by inet_sock_set_state */
 #define inet_protocol_names		\
 		EM(IPPROTO_TCP)			\
 		EM(IPPROTO_DCCP)		\
@@ -37,6 +41,7 @@
 #define EM(a)       TRACE_DEFINE_ENUM(a);
 #define EMe(a)      TRACE_DEFINE_ENUM(a);

+family_names
 inet_protocol_names
 tcp_state_names

@@ -45,6 +50,9 @@
 #define EM(a)       { a, #a },
 #define EMe(a)      { a, #a }

+#define show_family_name(val)	\
+	__print_symbolic(val, family_names)
+
 #define show_inet_protocol_name(val)    \
 	__print_symbolic(val, inet_protocol_names)

@@ -108,9 +116,10 @@

 TRACE_EVENT(inet_sock_set_state,

-	TP_PROTO(const struct sock *sk, const int oldstate, const int newstate),
+	TP_PROTO(const struct sock *sk, const int family, const int protocol,
+				const int oldstate, const int newstate),

-	TP_ARGS(sk, oldstate, newstate),
+	TP_ARGS(sk, family, protocol, oldstate, newstate),

 	TP_STRUCT__entry(
 		__field(const void *, skaddr)
@@ -118,6 +127,7 @@
 		__field(int, newstate)
 		__field(__u16, sport)
 		__field(__u16, dport)
+		__field(__u16, family)
 		__field(__u8, protocol)
 		__array(__u8, saddr, 4)
 		__array(__u8, daddr, 4)
@@ -133,8 +143,9 @@
 		__entry->skaddr = sk;
 		__entry->oldstate = oldstate;
 		__entry->newstate = newstate;
+		__entry->family = family;
+		__entry->protocol = protocol;

-		__entry->protocol = sk->sk_protocol;
 		__entry->sport = ntohs(inet->inet_sport);
 		__entry->dport = ntohs(inet->inet_dport);

@@ -145,7 +156,7 @@
 		*p32 =  inet->inet_daddr;

 #if IS_ENABLED(CONFIG_IPV6)
-		if (sk->sk_family == AF_INET6) {
+		if (family == AF_INET6) {
 			pin6 = (struct in6_addr *)__entry->saddr_v6;
 			*pin6 = sk->sk_v6_rcv_saddr;
 			pin6 = (struct in6_addr *)__entry->daddr_v6;
@@ -160,7 +171,8 @@
 		}
 	),

-	TP_printk("protocol=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c oldstate=%s newstate=%s",
+	TP_printk("family=%s protocol=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c oldstate=%s newstate=%s",
+			show_family_name(__entry->family),
 			show_inet_protocol_name(__entry->protocol),
 			__entry->sport, __entry->dport,
 			__entry->saddr, __entry->daddr,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index bab98a4..1d52796 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1223,14 +1223,16 @@ int inet_sk_rebuild_header(struct sock *sk)

 void inet_sk_set_state(struct sock *sk, int state)
 {
-	trace_inet_sock_set_state(sk, sk->sk_state, state);
+	trace_inet_sock_set_state(sk, sk->sk_family, sk->sk_protocol,
+							sk->sk_state, state);
 	sk->sk_state = state;
 }
 EXPORT_SYMBOL(inet_sk_set_state);

 void inet_sk_state_store(struct sock *sk, int newstate)
 {
-	trace_inet_sock_set_state(sk, sk->sk_state, newstate);
+	trace_inet_sock_set_state(sk, sk->sk_family, sk->sk_protocol,
+							sk->sk_state, newstate);
 	smp_store_release(&sk->sk_state, newstate);
 }

--
1.8.3.1

^ permalink raw reply related
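
Editorial sketch, not part of the patch: the EM()/EMe() lists in sock.h are
an X-macro, one list expanded several times under different definitions of
EM and EMe. The userspace model below expands family_names into a lookup
table. The names struct sym_name and family_table, the local
show_family_name() function and its "UNKNOWN" fallback are assumptions of
this sketch, not the behaviour of the kernel's __print_symbolic().

```c
#include <stddef.h>
#include <sys/socket.h>

/* Same list shape as in the patch: EMe marks the last entry. */
#define family_names	\
	EM(AF_INET)	\
	EMe(AF_INET6)

/* Hypothetical table entry: a value and its stringified name. */
struct sym_name {
	int value;
	const char *name;
};

/* Expand the list into { value, "NAME" } initializers. */
#define EM(a)	{ a, #a },
#define EMe(a)	{ a, #a }
static const struct sym_name family_table[] = { family_names };
#undef EM
#undef EMe

/* Look up the symbolic name for a family value. */
static const char *show_family_name(int val)
{
	size_t i;

	for (i = 0; i < sizeof(family_table) / sizeof(family_table[0]); i++)
		if (family_table[i].value == val)
			return family_table[i].name;

	return "UNKNOWN";	/* fallback chosen for this sketch only */
}
```

Because the # operator stringifies the argument before macro expansion, the
table keeps the spelling "AF_INET" while the value field holds the numeric
constant.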

* Re: [PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair
From: Steffen Klassert @ 2018-01-05  7:16 UTC (permalink / raw)
  To: Lorenzo Colitti
  Cc: netdev, Subash Abhinov Kasiviswanathan, Nathan Harold,
	David Miller
In-Reply-To: <CAKD1Yr3jumaDOxwKzDA9PavuG7q=FdwHG-x278265MO=ur_EtQ@mail.gmail.com>

On Fri, Jan 05, 2018 at 01:41:46AM +0900, Lorenzo Colitti wrote:
> On Wed, Jan 3, 2018 at 9:10 PM, Steffen Klassert
> <steffen.klassert@secunet.com> wrote:
> > The fact that you need new keyed VTIs looks a bit like a workaround
> > of the design limitations the VTI interfaces have. Unfortunately
> > this is not the only limitation of VTI and I think we don't get what
> > we really want by changing VTI without breaking existing userspace.
> 
> Actually, I added the flag mostly to ensure that there would be no
> changes in behaviour at all - so, for example, to return EEXIST if
> someone tried to create two VTIs on the same IP address pair without
> the flag. But perhaps that's not important. It's unlikely that anyone
> would be trying to do this, since it has always returned an error.
> 
> If it's indeed not important, then I think it may be possible to fix
> the limitations that stop there from being two VTIs with the same IP
> address pair without introducing a new flag or userspace-visible
> changes. (I don't think it's too far off what this patch series does
> today.) If existing setups that only have one VTI per IP address pair
> continue to work as before, but setups where there is more than one
> VTI per IP address pair now work in some way, would that be
> acceptable?

Well, it would be acceptable to support this. The reason why I don't
want it is that we already had problems with such extensions in the
past. The VTI interfaces were developed for one special usecase,
then extended to try to support more. The result was that some user
configurations did not work anymore. And still it has design
limitations that we can't work around.

Also we changed generic tunnel lookup code to flag that we use GRE
keys for something that is not a GRE key but a VTI interface marker.
Your patches need to extend the generic tunnel lookup code again
for this very special usecase. I just don't want to mess around
too much with this.

> 
> > The problem is that VTI interfaces are IP tunnels, and this is
> > not the thing we need. The tunnel is already implemented in the
> > generic xfrm code. All we need is some interface we can route
> > through. In particular we need something that can work with
> > transport mode too.
> 
> Well, I'm not sure. Personally I think VTIs are a pretty natural fit
> for tunnel mode IPsec. For example, they provide an easy way to assign
> an IP address to an IPsec tunnel which is then used for packets
> originated on that tunnel.

The IPsec tunnel endpoint addresses are already defined by the SA,
so there is no need to define them again at the interface. All we
need is some marker (maybe a new lookup key) to assign the SAs
to a certain VTI interface.

> That doesn't really make much sense in
> transport mode, because in transport mode the IP addresses used are
> the ones of the physical interfaces that send the packets.

Right, and this is one of the limitations we can't overcome with
a VTI. So we need to find a new solution for transport mode anyway.

> 
> > I showed already some ideas on creating xfrm interfaces at the
> > IPsec workshop in Seoul and my plan is to discuss this at the
> > upcoming IPsec workshop, so that we get something everybody is
> > happy with. In particular I want to have feedback from the
> > userspace IPsec IKE developers before we change/create something.
> 
> I did look at the code in the ipsec-next-xfrmi tree for a while -
> wrote some tests for it, etc.

To give people on the list a chance to follow what we are talking about,
here is the link to the code:

https://git.kernel.org/pub/scm/linux/kernel/git/klassert/linux-stk.git/log/?h=ipsec-next-xfrmi

> The main reason I didn't pursue it is
> that, as written, it couldn't support our use cases. The main reasons
> were:
> 
> 1. It needs to be bound to a specific underlying interface. It looks
> like that interface must have a 6-byte hardware address (and thus
> can't be a cellular interface), but I'm not 100% sure. By contrast,
> the VTI supports an optional underlying link index, and doesn't pose
> any requirements on hardware addresses. If it's possible to make the
> underlying interface optional, by storing the underlying ifindex
> instead of the dev (like tunnels do) then that might work.

This should be possible.

> 2. It cannot use the output mark to influence routing of the
> transformed packets, because it uses the output mark/mask for its own
> purposes. Unfortunately, influencing routing of the transformed
> packets was the reason we proposed XFRMA_OUTPUT_MARK in the first
> place, so this is a showstopper :-(. Do you recall why you used the
> output mark for this, as opposed to the SA mark? If it's possible to
> use the SA mark instead, that might work.

Well, using the output mark was an easy way to get a lookup working
for the first version. I already noticed that it was not such a good
idea. Maybe we need some new lookup key for IPsec lookups...

> 
> If you're willing to evolve the xfrmi design in response to our
> feedback, I can try to make the xfrmi code fit our use cases and send
> patches over the next couple of weeks.

Yes, sure :-)

The current code is just a basis for discussion; it is likely
to change.

> But I don't think we can wait
> until the discussion at the ipsec workshop to discover whether xfrmi
> might be a feasible solution for us. By then we'll either have had to
> do something out of tree (likely the keyed VTI patches, or something
> like them) or postponed this work to a future release.

I don't want to rush this. I want feedback from as many potential
users as possible, to be sure we end up with the right thing.
I really want to avoid having yet another interface that is almost what
we need, like VTI is.

^ permalink raw reply

* Re: [PATCH net-next] net: tracepoint: adding new tracepoint arguments in inet_sock_set_state
From: Song Liu @ 2018-01-05  7:21 UTC (permalink / raw)
  To: Yafang Shao
  Cc: David Miller, brendan.d.gregg@gmail.com,
	marcelo.leitner@gmail.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <1515134569-23812-1-git-send-email-laoar.shao@gmail.com>


> On Jan 4, 2018, at 10:42 PM, Yafang Shao <laoar.shao@gmail.com> wrote:
> 
> sk->sk_protocol and sk->sk_family are exposed as tracepoint arguments.
> Then we can conveniently use these two arguments to do the filter.
> 
> Suggested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
> include/trace/events/sock.h | 24 ++++++++++++++++++------
> net/ipv4/af_inet.c          |  6 ++++--
> 2 files changed, 22 insertions(+), 8 deletions(-)
> 
> diff --git a/include/trace/events/sock.h b/include/trace/events/sock.h
> index 3537c5f..c7df70f 100644
> --- a/include/trace/events/sock.h
> +++ b/include/trace/events/sock.h
> @@ -11,7 +11,11 @@
> #include <linux/ipv6.h>
> #include <linux/tcp.h>
> 
> -/* The protocol traced by sock_set_state */
> +#define family_names			\
> +		EM(AF_INET)				\
> +		EMe(AF_INET6)
> +
> +/* The protocol traced by inet_sock_set_state */
> #define inet_protocol_names		\
> 		EM(IPPROTO_TCP)			\
> 		EM(IPPROTO_DCCP)		\
> @@ -37,6 +41,7 @@
> #define EM(a)       TRACE_DEFINE_ENUM(a);
> #define EMe(a)      TRACE_DEFINE_ENUM(a);
> 
> +family_names
> inet_protocol_names
> tcp_state_names
> 
> @@ -45,6 +50,9 @@
> #define EM(a)       { a, #a },
> #define EMe(a)      { a, #a }
> 
> +#define show_family_name(val)	\
> +	__print_symbolic(val, family_names)
> +
> #define show_inet_protocol_name(val)    \
> 	__print_symbolic(val, inet_protocol_names)
> 
> @@ -108,9 +116,10 @@
> 
> TRACE_EVENT(inet_sock_set_state,
> 
> -	TP_PROTO(const struct sock *sk, const int oldstate, const int newstate),
> +	TP_PROTO(const struct sock *sk, const int family, const int protocol,
> +				const int oldstate, const int newstate),

Are there cases where we need a protocol and/or family that differs from
sk->sk_protocol/sk_family? If not, I think we don't need to change the
TP_PROTO.
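
To illustrate the point, here is a minimal userspace sketch (not the
kernel macros). The toy_* names are made up for this example; the idea is
only that the assignment step can read family and protocol from the sock
it already receives, so the three-argument prototype can stay as it is.

```c
#include <stdint.h>

/* Hypothetical stand-in for the few struct sock fields used here. */
struct toy_sock {
	uint16_t sk_family;
	uint8_t  sk_protocol;
	int      sk_state;
};

/* Hypothetical stand-in for the trace entry filled by TP_fast_assign. */
struct toy_entry {
	int      oldstate;
	int      newstate;
	uint16_t family;
	uint8_t  protocol;
};

/* Same three arguments as the existing TP_PROTO: sk, oldstate, newstate. */
static void toy_fast_assign(struct toy_entry *entry,
			    const struct toy_sock *sk,
			    int oldstate, int newstate)
{
	entry->oldstate = oldstate;
	entry->newstate = newstate;
	/* Both values are reachable through sk, no extra arguments needed. */
	entry->family   = sk->sk_family;
	entry->protocol = sk->sk_protocol;
}
```

Whether filters need the values as explicit arguments is a separate
question that this sketch does not answer.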

Thanks,
Song

> 
> -	TP_ARGS(sk, oldstate, newstate),
> +	TP_ARGS(sk, family, protocol, oldstate, newstate),
> 
> 	TP_STRUCT__entry(
> 		__field(const void *, skaddr)
> @@ -118,6 +127,7 @@
> 		__field(int, newstate)
> 		__field(__u16, sport)
> 		__field(__u16, dport)
> +		__field(__u16, family)
> 		__field(__u8, protocol)
> 		__array(__u8, saddr, 4)
> 		__array(__u8, daddr, 4)
> @@ -133,8 +143,9 @@
> 		__entry->skaddr = sk;
> 		__entry->oldstate = oldstate;
> 		__entry->newstate = newstate;
> +		__entry->family = family;
> +		__entry->protocol = protocol;
> 
> -		__entry->protocol = sk->sk_protocol;
> 		__entry->sport = ntohs(inet->inet_sport);
> 		__entry->dport = ntohs(inet->inet_dport);
> 
> @@ -145,7 +156,7 @@
> 		*p32 =  inet->inet_daddr;
> 
> #if IS_ENABLED(CONFIG_IPV6)
> -		if (sk->sk_family == AF_INET6) {
> +		if (family == AF_INET6) {
> 			pin6 = (struct in6_addr *)__entry->saddr_v6;
> 			*pin6 = sk->sk_v6_rcv_saddr;
> 			pin6 = (struct in6_addr *)__entry->daddr_v6;
> @@ -160,7 +171,8 @@
> 		}
> 	),
> 
> -	TP_printk("protocol=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c oldstate=%s newstate=%s",
> +	TP_printk("family=%s protocol=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c oldstate=%s newstate=%s",
> +			show_family_name(__entry->family),
> 			show_inet_protocol_name(__entry->protocol),
> 			__entry->sport, __entry->dport,
> 			__entry->saddr, __entry->daddr,
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index bab98a4..1d52796 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1223,14 +1223,16 @@ int inet_sk_rebuild_header(struct sock *sk)
> 
> void inet_sk_set_state(struct sock *sk, int state)
> {
> -	trace_inet_sock_set_state(sk, sk->sk_state, state);
> +	trace_inet_sock_set_state(sk, sk->sk_family, sk->sk_protocol,
> +							sk->sk_state, state);
> 	sk->sk_state = state;
> }
> EXPORT_SYMBOL(inet_sk_set_state);
> 
> void inet_sk_state_store(struct sock *sk, int newstate)
> {
> -	trace_inet_sock_set_state(sk, sk->sk_state, newstate);
> +	trace_inet_sock_set_state(sk, sk->sk_family, sk->sk_protocol,
> +							sk->sk_state, newstate);
> 	smp_store_release(&sk->sk_state, newstate);
> }
> 
> --
> 1.8.3.1
> 

^ permalink raw reply

* question about __netif_receive_skb_core() work on macvlan device
From: Yuan, Linyu (NSB - CN/Shanghai) @ 2018-01-05  7:19 UTC (permalink / raw)
  To: netdev@vger.kernel.org

Hi,

I have a question about __netif_receive_skb_core():
1. Create a macvlan device on a real Ethernet device and configure a MAC address on this macvlan device.
2. Create an AF_PACKET socket and bind it to the real Ethernet device from step 1.
3. The user application receives packets whose destination MAC address equals the one configured in step 1, through the socket created in step 2.

Is this the correct behaviour for a macvlan device configured as in step 1?
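
For reference, step 2 can be sketched roughly as follows. This is a
minimal example, assuming Linux and CAP_NET_RAW; open_packet_socket() is
a hypothetical helper name, and ETH_P_ALL is an assumption about which
protocols the application listens for.

```c
#include <arpa/inet.h>        /* htons() */
#include <linux/if_ether.h>   /* ETH_P_ALL */
#include <linux/if_packet.h>  /* struct sockaddr_ll */
#include <net/if.h>           /* if_nametoindex() */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open an AF_PACKET socket bound to ifname; returns the fd or -1 on error. */
static int open_packet_socket(const char *ifname)
{
	struct sockaddr_ll sll;
	unsigned int ifindex = if_nametoindex(ifname);
	int fd;

	if (!ifindex)
		return -1;	/* no such interface */

	fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
	if (fd < 0)
		return -1;	/* typically EPERM without CAP_NET_RAW */

	memset(&sll, 0, sizeof(sll));
	sll.sll_family   = AF_PACKET;
	sll.sll_protocol = htons(ETH_P_ALL);
	sll.sll_ifindex  = (int)ifindex;

	/* Restrict the socket to frames seen on this one device. */
	if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The application would call this with the name of the real device (the
lower device of the macvlan) and then read frames with recv().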

thanks

^ permalink raw reply

* Re: kernel BUG at ./include/linux/skbuff.h:LINE!
From: Steffen Klassert @ 2018-01-05  8:01 UTC (permalink / raw)
  To: syzbot
  Cc: davem, herbert, kuznet, linux-kernel, netdev, syzkaller-bugs,
	yoshfuji
In-Reply-To: <001a1140bece7bf2210561f565e9@google.com>

On Thu, Jan 04, 2018 at 07:58:01AM -0800, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on
> ead68f216110170ec729e2c4dec0aad6d38259d7
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> C reproducer is attached
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers
> 
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+82bbd65569c49c6c0c4d@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
> 
> ------------[ cut here ]------------
> kernel BUG at ./include/linux/skbuff.h:2068!

The reproducer does not trigger here, but I'm pretty sure
it is this:

Subject: [PATCH RFC ipsec] esp: Fix GRO when the headers are not fully in the
 linear part of the skb.

The GRO layer does not necessarily pull the complete headers
into the linear part of the skb, a part may remain on the
first page fragment. This can lead to a crash if we try to
pull the headers, so make sure we have them on the linear
part before pulling.

Fixes: 7785bba299a8 ("esp: Add a software GRO codepath")
Reported-by: syzbot+82bbd65569c49c6c0c4d@syzkaller.appspotmail.com
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/esp4_offload.c | 3 ++-
 net/ipv6/esp6_offload.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/esp4_offload.c b/net/ipv4/esp4_offload.c
index f8b918c766b0..dfc1e049aad5 100644
--- a/net/ipv4/esp4_offload.c
+++ b/net/ipv4/esp4_offload.c
@@ -38,7 +38,8 @@ static struct sk_buff **esp4_gro_receive(struct sk_buff **head,
 	__be32 spi;
 	int err;
 
-	skb_pull(skb, offset);
+	if (!pskb_pull(skb, offset))
+		return NULL;
 
 	if ((err = xfrm_parse_spi(skb, IPPROTO_ESP, &spi, &seq)) != 0)
 		goto out;
diff --git a/net/ipv6/esp6_offload.c b/net/ipv6/esp6_offload.c
index 333a478aa161..dd9627490c7c 100644
--- a/net/ipv6/esp6_offload.c
+++ b/net/ipv6/esp6_offload.c
@@ -60,7 +60,8 @@ static struct sk_buff **esp6_gro_receive(struct sk_buff **head,
 	int nhoff;
 	int err;
 
-	skb_pull(skb, offset);
+	if (!pskb_pull(skb, offset))
+		return NULL;
 
 	if ((err = xfrm_parse_spi(skb, IPPROTO_ESP, &spi, &seq)) != 0)
 		goto out;
-- 
2.14.1
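
For readers less familiar with the two helpers: skb_pull() only adjusts
the data pointer and length, and hits a BUG_ON if the remaining length
would drop below the amount held in page fragments. pskb_pull() first
moves the missing bytes into the linear area and returns NULL if that
fails. The toy model below tracks lengths only; the toy_* names are
invented for this sketch and it is not the kernel implementation.

```c
#include <stdbool.h>

/* Toy stand-in for struct sk_buff: lengths only, no payload. */
struct toy_skb {
	unsigned int len;	/* total bytes: linear part + fragments */
	unsigned int data_len;	/* bytes held in page fragments */
};

/* Bytes available in the linear part, like skb_headlen(). */
static unsigned int toy_headlen(const struct toy_skb *skb)
{
	return skb->len - skb->data_len;
}

/* True if a plain pull of n bytes would reach into the fragments. */
static bool toy_skb_pull_would_bug(const struct toy_skb *skb, unsigned int n)
{
	return n <= skb->len && n > toy_headlen(skb);
}

/* Pull n bytes, linearising first when needed; false if n is too large. */
static bool toy_pskb_pull(struct toy_skb *skb, unsigned int n)
{
	if (n > skb->len)
		return false;

	/* Model moving the missing bytes from fragments to the linear part.
	 * The real helper may have to allocate and can fail at this point.
	 */
	if (n > toy_headlen(skb))
		skb->data_len -= n - toy_headlen(skb);

	skb->len -= n;
	return true;
}
```

With len = 100 and data_len = 80, only 20 bytes are linear, so pulling 40
bytes is the case the patch guards against.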

^ permalink raw reply related

* Re: [net-next 06/10] net/mlx5e: change Mellanox references in DIM code
From: Tal Gilboa @ 2018-01-05  8:04 UTC (permalink / raw)
  To: Andy Gospodarek, netdev; +Cc: mchan, ogerlitz, Andy Gospodarek
In-Reply-To: <1515097290-17470-7-git-send-email-andy@greyhouse.net>

On 1/4/2018 10:21 PM, Andy Gospodarek wrote:
> From: Andy Gospodarek <gospo@broadcom.com>
> 
> Change all mlx5_am* and MLX_AM* references to net_dim and NET_DIM,
Nit: MLX_AM* should be MLX5_AM*.

>   	cq_period_mode = enable ?
> -		MLX5_CQ_PERIOD_MODE_START_FROM_CQE :
> -		MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
> +		NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE :
> +		NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
I'm not sure about this part. CQE/EQE-based moderation is a feature of
Mellanox's chips, which isn't necessarily coupled with adaptive
moderation. The net_dim lib should know which values to choose according
to the selected mode, but I don't think the mlx5 driver should use an enum
from net_dim for enabling/disabling HW features. Another issue is that we
use the enum value as an argument for the command to HW (0=EQE, 1=CQE). If
someone changed the values, it would break the HW feature. I think it
would be safer to use the NET_DIM_XXX enum only when calling functions
from the net_dim lib.
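
One possible way to keep the two apart, as a rough sketch only: the
driver keeps its own HW-facing enum and translates at the boundary with
the library. All toy_* names below are hypothetical and not part of mlx5
or net_dim.

```c
/* HW-facing values: these are what goes into the command (0=EQE, 1=CQE). */
enum toy_hw_cq_period_mode {
	TOY_HW_CQ_PERIOD_FROM_EQE = 0x0,
	TOY_HW_CQ_PERIOD_FROM_CQE = 0x1,
};

/* Library-facing values: free to be renumbered by the library. */
enum toy_dim_cq_period_mode {
	TOY_DIM_FROM_EQE,
	TOY_DIM_FROM_CQE,
	TOY_DIM_NUM_MODES,
};

/* Translate a library mode to the HW value; -1 for an unknown mode. */
static int toy_dim_mode_to_hw(enum toy_dim_cq_period_mode mode)
{
	switch (mode) {
	case TOY_DIM_FROM_EQE:
		return TOY_HW_CQ_PERIOD_FROM_EQE;
	case TOY_DIM_FROM_CQE:
		return TOY_HW_CQ_PERIOD_FROM_CQE;
	default:
		return -1;
	}
}
```

With an explicit mapping like this, renumbering the library enum would
not silently change what is sent to HW.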

>   	current_cq_period_mode = is_rx_cq ?
>   		priv->channels.params.rx_cq_moderation.cq_period_mode :
>   		priv->channels.params.tx_cq_moderation.cq_period_mode;
>   	mode_changed = cq_period_mode != current_cq_period_mode;
>   
> -	if (cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE &&
> +	if (cq_period_mode == NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE &&
>   	    !MLX5_CAP_GEN(mdev, cq_period_start_from_cqe))
>   		return -EOPNOTSUPP;
>   
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 3aa1c90..edd4077 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -674,8 +674,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
>   		wqe->data.lkey = rq->mkey_be;
>   	}
>   
> -	INIT_WORK(&rq->am.work, mlx5e_rx_am_work);
> -	rq->am.mode = params->rx_cq_moderation.cq_period_mode;
> +	INIT_WORK(&rq->dim.work, mlx5e_rx_dim_work);
> +	rq->dim.mode = params->rx_cq_moderation.cq_period_mode;
>   	rq->page_cache.head = 0;
>   	rq->page_cache.tail = 0;
>   
> @@ -919,7 +919,7 @@ static int mlx5e_open_rq(struct mlx5e_channel *c,
>   	if (err)
>   		goto err_destroy_rq;
>   
> -	if (params->rx_am_enabled)
> +	if (params->rx_dim_enabled)
>   		c->rq.state |= BIT(MLX5E_RQ_STATE_AM);
>   
>   	return 0;
> @@ -952,7 +952,7 @@ static void mlx5e_deactivate_rq(struct mlx5e_rq *rq)
>   
>   static void mlx5e_close_rq(struct mlx5e_rq *rq)
>   {
> -	cancel_work_sync(&rq->am.work);
> +	cancel_work_sync(&rq->dim.work);
>   	mlx5e_destroy_rq(rq);
>   	mlx5e_free_rx_descs(rq);
>   	mlx5e_free_rq(rq);
> @@ -1565,7 +1565,7 @@ static void mlx5e_destroy_cq(struct mlx5e_cq *cq)
>   }
>   
>   static int mlx5e_open_cq(struct mlx5e_channel *c,
> -			 struct mlx5e_cq_moder moder,
> +			 struct net_dim_cq_moder moder,
>   			 struct mlx5e_cq_param *param,
>   			 struct mlx5e_cq *cq)
>   {
> @@ -1747,7 +1747,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
>   			      struct mlx5e_channel_param *cparam,
>   			      struct mlx5e_channel **cp)
>   {
> -	struct mlx5e_cq_moder icocq_moder = {0, 0};
> +	struct net_dim_cq_moder icocq_moder = {0, 0};
>   	struct net_device *netdev = priv->netdev;
>   	int cpu = mlx5e_get_cpu(priv, ix);
>   	struct mlx5e_channel *c;
> @@ -1999,7 +1999,7 @@ static void mlx5e_build_ico_cq_param(struct mlx5e_priv *priv,
>   
>   	mlx5e_build_common_cq_param(priv, param);
>   
> -	param->cq_period_mode = MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
> +	param->cq_period_mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
>   }
>   
>   static void mlx5e_build_icosq_param(struct mlx5e_priv *priv,
> @@ -4016,13 +4016,13 @@ void mlx5e_set_tx_cq_mode_params(struct mlx5e_params *params, u8 cq_period_mode)
>   	params->tx_cq_moderation.usec =
>   		MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_USEC;
>   
> -	if (cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE)
> +	if (cq_period_mode == NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE)
>   		params->tx_cq_moderation.usec =
>   			MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_USEC_FROM_CQE;
>   
>   	MLX5E_SET_PFLAG(params, MLX5E_PFLAG_TX_CQE_BASED_MODER,
>   			params->tx_cq_moderation.cq_period_mode ==
> -				MLX5_CQ_PERIOD_MODE_START_FROM_CQE);
> +				NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE);
>   }
>   
>   void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params, u8 cq_period_mode)
> @@ -4034,17 +4034,17 @@ void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params, u8 cq_period_mode)
>   	params->rx_cq_moderation.usec =
>   		MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC;
>   
> -	if (cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE)
> +	if (cq_period_mode == NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE)
>   		params->rx_cq_moderation.usec =
>   			MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC_FROM_CQE;
>   
> -	if (params->rx_am_enabled)
> +	if (params->rx_dim_enabled)
>   		params->rx_cq_moderation =
> -			mlx5e_am_get_def_profile(cq_period_mode);
> +			net_dim_get_def_profile(cq_period_mode);
>   
>   	MLX5E_SET_PFLAG(params, MLX5E_PFLAG_RX_CQE_BASED_MODER,
>   			params->rx_cq_moderation.cq_period_mode ==
> -				MLX5_CQ_PERIOD_MODE_START_FROM_CQE);
> +				NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE);
>   }
>   
>   u32 mlx5e_choose_lro_timeout(struct mlx5_core_dev *mdev, u32 wanted_timeout)
> @@ -4100,9 +4100,9 @@ void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
>   
>   	/* CQ moderation params */
>   	cq_period_mode = MLX5_CAP_GEN(mdev, cq_period_start_from_cqe) ?
> -			MLX5_CQ_PERIOD_MODE_START_FROM_CQE :
> -			MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
> -	params->rx_am_enabled = MLX5_CAP_GEN(mdev, cq_moderation);
> +			NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE :
> +			NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
> +	params->rx_dim_enabled = MLX5_CAP_GEN(mdev, cq_moderation);
>   	mlx5e_set_rx_cq_mode_params(params, cq_period_mode);
>   	mlx5e_set_tx_cq_mode_params(params, cq_period_mode);
>   
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> index c6a77f8..ccb038f 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> @@ -877,8 +877,8 @@ static void mlx5e_build_rep_params(struct mlx5_core_dev *mdev,
>   				   struct mlx5e_params *params)
>   {
>   	u8 cq_period_mode = MLX5_CAP_GEN(mdev, cq_period_start_from_cqe) ?
> -					 MLX5_CQ_PERIOD_MODE_START_FROM_CQE :
> -					 MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
> +					 NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE :
> +					 NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
>   
>   	params->log_sq_size = MLX5E_PARAMS_MINIMUM_LOG_SQ_SIZE;
>   	params->rq_wq_type  = MLX5_WQ_TYPE_LINKED_LIST;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
> index 1849169..dae77a9 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
> @@ -79,7 +79,7 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
>   		mlx5e_cq_arm(&c->sq[i].cq);
>   
>   	if (MLX5E_TEST_BIT(c->rq.state, MLX5E_RQ_STATE_AM))
> -		mlx5e_rx_am(&c->rq.am,
> +		net_dim(&c->rq.dim,
>   			    c->rq.cq.event_ctr,
>   			    c->rq.stats.packets,
>   			    c->rq.stats.bytes);
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
> index ca05f4e..00b9ae3 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
> @@ -33,22 +33,22 @@
>   
>   #include "en.h"
>   
> -#define MLX5E_PARAMS_AM_NUM_PROFILES 5
> +#define NET_DIM_PARAMS_NUM_PROFILES 5
>   /* Adaptive moderation profiles */
> -#define MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
> -#define MLX5E_RX_AM_DEF_PROFILE_CQE 1
> -#define MLX5E_RX_AM_DEF_PROFILE_EQE 1
> -
> -/* All profiles sizes must be MLX5E_PARAMS_AM_NUM_PROFILES */
> -#define MLX5_AM_EQE_PROFILES { \
> -	{1,   MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
> -	{8,   MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
> -	{64,  MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
> -	{128, MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
> -	{256, MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
> +#define NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
> +#define NET_DIM_DEF_PROFILE_CQE 1
> +#define NET_DIM_DEF_PROFILE_EQE 1
> +
> +/* All profiles sizes must be NET_PARAMS_DIM_NUM_PROFILES */
> +#define NET_DIM_EQE_PROFILES { \
> +	{1,   NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
> +	{8,   NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
> +	{64,  NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
> +	{128, NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
> +	{256, NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
>   }
>   
> -#define MLX5_AM_CQE_PROFILES { \
> +#define NET_DIM_CQE_PROFILES { \
>   	{2,  256},             \
>   	{8,  128},             \
>   	{16, 64},              \
> @@ -56,193 +56,193 @@
>   	{64, 64}               \
>   }
>   
> -static const struct mlx5e_cq_moder
> -profile[MLX5_CQ_PERIOD_NUM_MODES][MLX5E_PARAMS_AM_NUM_PROFILES] = {
> -	MLX5_AM_EQE_PROFILES,
> -	MLX5_AM_CQE_PROFILES,
> +static const struct net_dim_cq_moder
> +profile[NET_DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
> +	NET_DIM_EQE_PROFILES,
> +	NET_DIM_CQE_PROFILES,
>   };
>   
> -struct mlx5e_cq_moder mlx5e_am_get_profile(u8 cq_period_mode, int ix)
> +struct net_dim_cq_moder net_dim_get_profile(u8 cq_period_mode, int ix)
>   {
> -	struct mlx5e_cq_moder cq_moder;
> +	struct net_dim_cq_moder cq_moder;
>   
>   	cq_moder = profile[cq_period_mode][ix];
>   	cq_moder.cq_period_mode = cq_period_mode;
>   	return cq_moder;
>   }
>   
> -struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode)
> +struct net_dim_cq_moder net_dim_get_def_profile(u8 rx_cq_period_mode)
>   {
>   	int default_profile_ix;
>   
> -	if (rx_cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE)
> -		default_profile_ix = MLX5E_RX_AM_DEF_PROFILE_CQE;
> -	else /* MLX5_CQ_PERIOD_MODE_START_FROM_EQE */
> -		default_profile_ix = MLX5E_RX_AM_DEF_PROFILE_EQE;
> +	if (rx_cq_period_mode == NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE)
> +		default_profile_ix = NET_DIM_DEF_PROFILE_CQE;
> +	else /* NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE */
> +		default_profile_ix = NET_DIM_DEF_PROFILE_EQE;
>   
> -	return mlx5e_am_get_profile(rx_cq_period_mode, default_profile_ix);
> +	return net_dim_get_profile(rx_cq_period_mode, default_profile_ix);
>   }
>   
> -static bool mlx5e_am_on_top(struct mlx5e_rx_am *am)
> +static bool net_dim_on_top(struct net_dim *dim)
>   {
> -	switch (am->tune_state) {
> -	case MLX5E_AM_PARKING_ON_TOP:
> -	case MLX5E_AM_PARKING_TIRED:
> +	switch (dim->tune_state) {
> +	case NET_DIM_PARKING_ON_TOP:
> +	case NET_DIM_PARKING_TIRED:
>   		return true;
> -	case MLX5E_AM_GOING_RIGHT:
> -		return (am->steps_left > 1) && (am->steps_right == 1);
> -	default: /* MLX5E_AM_GOING_LEFT */
> -		return (am->steps_right > 1) && (am->steps_left == 1);
> +	case NET_DIM_GOING_RIGHT:
> +		return (dim->steps_left > 1) && (dim->steps_right == 1);
> +	default: /* NET_DIM_GOING_LEFT */
> +		return (dim->steps_right > 1) && (dim->steps_left == 1);
>   	}
>   }
>   
> -static void mlx5e_am_turn(struct mlx5e_rx_am *am)
> +static void net_dim_turn(struct net_dim *dim)
>   {
> -	switch (am->tune_state) {
> -	case MLX5E_AM_PARKING_ON_TOP:
> -	case MLX5E_AM_PARKING_TIRED:
> +	switch (dim->tune_state) {
> +	case NET_DIM_PARKING_ON_TOP:
> +	case NET_DIM_PARKING_TIRED:
>   		break;
> -	case MLX5E_AM_GOING_RIGHT:
> -		am->tune_state = MLX5E_AM_GOING_LEFT;
> -		am->steps_left = 0;
> +	case NET_DIM_GOING_RIGHT:
> +		dim->tune_state = NET_DIM_GOING_LEFT;
> +		dim->steps_left = 0;
>   		break;
> -	case MLX5E_AM_GOING_LEFT:
> -		am->tune_state = MLX5E_AM_GOING_RIGHT;
> -		am->steps_right = 0;
> +	case NET_DIM_GOING_LEFT:
> +		dim->tune_state = NET_DIM_GOING_RIGHT;
> +		dim->steps_right = 0;
>   		break;
>   	}
>   }
>   
> -static int mlx5e_am_step(struct mlx5e_rx_am *am)
> +static int net_dim_step(struct net_dim *dim)
>   {
> -	if (am->tired == (MLX5E_PARAMS_AM_NUM_PROFILES * 2))
> -		return MLX5E_AM_TOO_TIRED;
> +	if (dim->tired == (NET_DIM_PARAMS_NUM_PROFILES * 2))
> +		return NET_DIM_TOO_TIRED;
>   
> -	switch (am->tune_state) {
> -	case MLX5E_AM_PARKING_ON_TOP:
> -	case MLX5E_AM_PARKING_TIRED:
> +	switch (dim->tune_state) {
> +	case NET_DIM_PARKING_ON_TOP:
> +	case NET_DIM_PARKING_TIRED:
>   		break;
> -	case MLX5E_AM_GOING_RIGHT:
> -		if (am->profile_ix == (MLX5E_PARAMS_AM_NUM_PROFILES - 1))
> -			return MLX5E_AM_ON_EDGE;
> -		am->profile_ix++;
> -		am->steps_right++;
> +	case NET_DIM_GOING_RIGHT:
> +		if (dim->profile_ix == (NET_DIM_PARAMS_NUM_PROFILES - 1))
> +			return NET_DIM_ON_EDGE;
> +		dim->profile_ix++;
> +		dim->steps_right++;
>   		break;
> -	case MLX5E_AM_GOING_LEFT:
> -		if (am->profile_ix == 0)
> -			return MLX5E_AM_ON_EDGE;
> -		am->profile_ix--;
> -		am->steps_left++;
> +	case NET_DIM_GOING_LEFT:
> +		if (dim->profile_ix == 0)
> +			return NET_DIM_ON_EDGE;
> +		dim->profile_ix--;
> +		dim->steps_left++;
>   		break;
>   	}
>   
> -	am->tired++;
> -	return MLX5E_AM_STEPPED;
> +	dim->tired++;
> +	return NET_DIM_STEPPED;
>   }
>   
> -static void mlx5e_am_park_on_top(struct mlx5e_rx_am *am)
> +static void net_dim_park_on_top(struct net_dim *dim)
>   {
> -	am->steps_right  = 0;
> -	am->steps_left   = 0;
> -	am->tired        = 0;
> -	am->tune_state   = MLX5E_AM_PARKING_ON_TOP;
> +	dim->steps_right  = 0;
> +	dim->steps_left   = 0;
> +	dim->tired        = 0;
> +	dim->tune_state   = NET_DIM_PARKING_ON_TOP;
>   }
>   
> -static void mlx5e_am_park_tired(struct mlx5e_rx_am *am)
> +static void net_dim_park_tired(struct net_dim *dim)
>   {
> -	am->steps_right  = 0;
> -	am->steps_left   = 0;
> -	am->tune_state   = MLX5E_AM_PARKING_TIRED;
> +	dim->steps_right  = 0;
> +	dim->steps_left   = 0;
> +	dim->tune_state   = NET_DIM_PARKING_TIRED;
>   }
>   
> -static void mlx5e_am_exit_parking(struct mlx5e_rx_am *am)
> +static void net_dim_exit_parking(struct net_dim *dim)
>   {
> -	am->tune_state = am->profile_ix ? MLX5E_AM_GOING_LEFT :
> -					  MLX5E_AM_GOING_RIGHT;
> -	mlx5e_am_step(am);
> +	dim->tune_state = dim->profile_ix ? NET_DIM_GOING_LEFT :
> +					  NET_DIM_GOING_RIGHT;
> +	net_dim_step(dim);
>   }
>   
>   #define IS_SIGNIFICANT_DIFF(val, ref) \
>   	(((100 * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */
>   
> -static int mlx5e_am_stats_compare(struct mlx5e_rx_am_stats *curr,
> -				  struct mlx5e_rx_am_stats *prev)
> +static int net_dim_stats_compare(struct net_dim_stats *curr,
> +				   struct net_dim_stats *prev)
>   {
>   	if (!prev->bpms)
> -		return curr->bpms ? MLX5E_AM_STATS_BETTER :
> -				    MLX5E_AM_STATS_SAME;
> +		return curr->bpms ? NET_DIM_STATS_BETTER :
> +				    NET_DIM_STATS_SAME;
>   
>   	if (IS_SIGNIFICANT_DIFF(curr->bpms, prev->bpms))
> -		return (curr->bpms > prev->bpms) ? MLX5E_AM_STATS_BETTER :
> -						   MLX5E_AM_STATS_WORSE;
> +		return (curr->bpms > prev->bpms) ? NET_DIM_STATS_BETTER :
> +						   NET_DIM_STATS_WORSE;
>   
>   	if (IS_SIGNIFICANT_DIFF(curr->ppms, prev->ppms))
> -		return (curr->ppms > prev->ppms) ? MLX5E_AM_STATS_BETTER :
> -						   MLX5E_AM_STATS_WORSE;
> +		return (curr->ppms > prev->ppms) ? NET_DIM_STATS_BETTER :
> +						   NET_DIM_STATS_WORSE;
>   
>   	if (IS_SIGNIFICANT_DIFF(curr->epms, prev->epms))
> -		return (curr->epms < prev->epms) ? MLX5E_AM_STATS_BETTER :
> -						   MLX5E_AM_STATS_WORSE;
> +		return (curr->epms < prev->epms) ? NET_DIM_STATS_BETTER :
> +						   NET_DIM_STATS_WORSE;
>   
> -	return MLX5E_AM_STATS_SAME;
> +	return NET_DIM_STATS_SAME;
>   }
>   
> -static bool mlx5e_am_decision(struct mlx5e_rx_am_stats *curr_stats,
> -			      struct mlx5e_rx_am *am)
> +static bool net_dim_decision(struct net_dim_stats *curr_stats,
> +			       struct net_dim *dim)
>   {
> -	int prev_state = am->tune_state;
> -	int prev_ix = am->profile_ix;
> +	int prev_state = dim->tune_state;
> +	int prev_ix = dim->profile_ix;
>   	int stats_res;
>   	int step_res;
>   
> -	switch (am->tune_state) {
> -	case MLX5E_AM_PARKING_ON_TOP:
> -		stats_res = mlx5e_am_stats_compare(curr_stats, &am->prev_stats);
> -		if (stats_res != MLX5E_AM_STATS_SAME)
> -			mlx5e_am_exit_parking(am);
> +	switch (dim->tune_state) {
> +	case NET_DIM_PARKING_ON_TOP:
> +		stats_res = net_dim_stats_compare(curr_stats, &dim->prev_stats);
> +		if (stats_res != NET_DIM_STATS_SAME)
> +			net_dim_exit_parking(dim);
>   		break;
>   
> -	case MLX5E_AM_PARKING_TIRED:
> -		am->tired--;
> -		if (!am->tired)
> -			mlx5e_am_exit_parking(am);
> +	case NET_DIM_PARKING_TIRED:
> +		dim->tired--;
> +		if (!dim->tired)
> +			net_dim_exit_parking(dim);
>   		break;
>   
> -	case MLX5E_AM_GOING_RIGHT:
> -	case MLX5E_AM_GOING_LEFT:
> -		stats_res = mlx5e_am_stats_compare(curr_stats, &am->prev_stats);
> -		if (stats_res != MLX5E_AM_STATS_BETTER)
> -			mlx5e_am_turn(am);
> +	case NET_DIM_GOING_RIGHT:
> +	case NET_DIM_GOING_LEFT:
> +		stats_res = net_dim_stats_compare(curr_stats, &dim->prev_stats);
> +		if (stats_res != NET_DIM_STATS_BETTER)
> +			net_dim_turn(dim);
>   
> -		if (mlx5e_am_on_top(am)) {
> -			mlx5e_am_park_on_top(am);
> +		if (net_dim_on_top(dim)) {
> +			net_dim_park_on_top(dim);
>   			break;
>   		}
>   
> -		step_res = mlx5e_am_step(am);
> +		step_res = net_dim_step(dim);
>   		switch (step_res) {
> -		case MLX5E_AM_ON_EDGE:
> -			mlx5e_am_park_on_top(am);
> +		case NET_DIM_ON_EDGE:
> +			net_dim_park_on_top(dim);
>   			break;
> -		case MLX5E_AM_TOO_TIRED:
> -			mlx5e_am_park_tired(am);
> +		case NET_DIM_TOO_TIRED:
> +			net_dim_park_tired(dim);
>   			break;
>   		}
>   
>   		break;
>   	}
>   
> -	if ((prev_state     != MLX5E_AM_PARKING_ON_TOP) ||
> -	    (am->tune_state != MLX5E_AM_PARKING_ON_TOP))
> -		am->prev_stats = *curr_stats;
> +	if ((prev_state     != NET_DIM_PARKING_ON_TOP) ||
> +	    (dim->tune_state != NET_DIM_PARKING_ON_TOP))
> +		dim->prev_stats = *curr_stats;
>   
> -	return am->profile_ix != prev_ix;
> +	return dim->profile_ix != prev_ix;
>   }
>   
> -static void mlx5e_am_sample(u16 event_ctr,
> -			    u64 packets,
> -			    u64 bytes,
> -			    struct mlx5e_rx_am_sample *s)
> +static void net_dim_sample(u16 event_ctr,
> +			   u64 packets,
> +			   u64 bytes,
> +			   struct net_dim_sample *s)
>   {
>   	s->time	     = ktime_get();
>   	s->pkt_ctr   = packets;
> @@ -250,13 +250,13 @@ static void mlx5e_am_sample(u16 event_ctr,
>   	s->event_ctr = event_ctr;
>   }
>   
> -#define MLX5E_AM_NEVENTS 64
> +#define NET_DIM_NEVENTS 64
>   #define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
>   #define BIT_GAP(bits, end, start) ((((end) - (start)) + BIT_ULL(bits)) & (BIT_ULL(bits) - 1))
>   
> -static void mlx5e_am_calc_stats(struct mlx5e_rx_am_sample *start,
> -				struct mlx5e_rx_am_sample *end,
> -				struct mlx5e_rx_am_stats *curr_stats)
> +static void net_dim_calc_stats(struct net_dim_sample *start,
> +			       struct net_dim_sample *end,
> +			       struct net_dim_stats *curr_stats)
>   {
>   	/* u32 holds up to 71 minutes, should be enough */
>   	u32 delta_us = ktime_us_delta(end->time, start->time);
> @@ -269,39 +269,39 @@ static void mlx5e_am_calc_stats(struct mlx5e_rx_am_sample *start,
>   
>   	curr_stats->ppms = DIV_ROUND_UP(npkts * USEC_PER_MSEC, delta_us);
>   	curr_stats->bpms = DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
> -	curr_stats->epms = DIV_ROUND_UP(MLX5E_AM_NEVENTS * USEC_PER_MSEC,
> +	curr_stats->epms = DIV_ROUND_UP(NET_DIM_NEVENTS * USEC_PER_MSEC,
>   					delta_us);
>   }
>   
> -void mlx5e_rx_am(struct mlx5e_rx_am *am,
> -		 u16 event_ctr,
> -		 u64 packets,
> -		 u64 bytes)
> +void net_dim(struct net_dim *dim,
> +	     u16 event_ctr,
> +	     u64 packets,
> +	     u64 bytes)
>   {
> -	struct mlx5e_rx_am_sample end_sample;
> -	struct mlx5e_rx_am_stats curr_stats;
> +	struct net_dim_sample end_sample;
> +	struct net_dim_stats curr_stats;
>   	u16 nevents;
>   
> -	switch (am->state) {
> -	case MLX5E_AM_MEASURE_IN_PROGRESS:
> +	switch (dim->state) {
> +	case NET_DIM_MEASURE_IN_PROGRESS:
>   		nevents = BIT_GAP(BITS_PER_TYPE(u16), event_ctr,
> -				  am->start_sample.event_ctr);
> -		if (nevents < MLX5E_AM_NEVENTS)
> +				  dim->start_sample.event_ctr);
> +		if (nevents < NET_DIM_NEVENTS)
>   			break;
> -		mlx5e_am_sample(event_ctr, packets, bytes, &end_sample);
> -		mlx5e_am_calc_stats(&am->start_sample, &end_sample,
> +		net_dim_sample(event_ctr, packets, bytes, &end_sample);
> +		net_dim_calc_stats(&dim->start_sample, &end_sample,
>   				    &curr_stats);
> -		if (mlx5e_am_decision(&curr_stats, am)) {
> -			am->state = MLX5E_AM_APPLY_NEW_PROFILE;
> -			schedule_work(&am->work);
> +		if (net_dim_decision(&curr_stats, dim)) {
> +			dim->state = NET_DIM_APPLY_NEW_PROFILE;
> +			schedule_work(&dim->work);
>   			break;
>   		}
>   		/* fall through */
> -	case MLX5E_AM_START_MEASURE:
> -		mlx5e_am_sample(event_ctr, packets, bytes, &am->start_sample);
> -		am->state = MLX5E_AM_MEASURE_IN_PROGRESS;
> +	case NET_DIM_START_MEASURE:
> +		net_dim_sample(event_ctr, packets, bytes, &dim->start_sample);
> +		dim->state = NET_DIM_MEASURE_IN_PROGRESS;
>   		break;
> -	case MLX5E_AM_APPLY_NEW_PROFILE:
> +	case NET_DIM_APPLY_NEW_PROFILE:
>   		break;
>   	}
>   }
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/net_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/net_dim.h
> index 5ce8e54..a775c12 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/net_dim.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/net_dim.h
> @@ -31,32 +31,32 @@
>    * SOFTWARE.
>   */
>   
> -#ifndef MLX5_AM_H
> -#define MLX5_AM_H
> +#ifndef NET_DIM_H
> +#define NET_DIM_H
>   
> -struct mlx5e_cq_moder {
> +struct net_dim_cq_moder {
>   	u16 usec;
>   	u16 pkts;
>   	u8 cq_period_mode;
>   };
>   
> -struct mlx5e_rx_am_sample {
> +struct net_dim_sample {
>   	ktime_t time;
>   	u32     pkt_ctr;
>   	u32     byte_ctr;
>   	u16     event_ctr;
>   };
>   
> -struct mlx5e_rx_am_stats {
> +struct net_dim_stats {
>   	int ppms; /* packets per msec */
>   	int bpms; /* bytes per msec */
>   	int epms; /* events per msec */
>   };
>   
> -struct mlx5e_rx_am { /* Adaptive Moderation */
> +struct net_dim { /* Adaptive Moderation */
>   	u8                                      state;
> -	struct mlx5e_rx_am_stats                prev_stats;
> -	struct mlx5e_rx_am_sample               start_sample;
> +	struct net_dim_stats                    prev_stats;
> +	struct net_dim_sample                   start_sample;
>   	struct work_struct                      work;
>   	u8                                      profile_ix;
>   	u8                                      mode;
> @@ -67,43 +67,42 @@ struct mlx5e_rx_am { /* Adaptive Moderation */
>   };
>   
>   enum {
> -	MLX5_CQ_PERIOD_MODE_START_FROM_EQE = 0x0,
> -	MLX5_CQ_PERIOD_MODE_START_FROM_CQE = 0x1,
> -	MLX5_CQ_PERIOD_NUM_MODES
> +	NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE = 0x0,
> +	NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE = 0x1,
> +	NET_DIM_CQ_PERIOD_NUM_MODES
>   };
>   
>   /* Adaptive moderation logic */
>   enum {
> -	MLX5E_AM_START_MEASURE,
> -	MLX5E_AM_MEASURE_IN_PROGRESS,
> -	MLX5E_AM_APPLY_NEW_PROFILE,
> +	NET_DIM_START_MEASURE,
> +	NET_DIM_MEASURE_IN_PROGRESS,
> +	NET_DIM_APPLY_NEW_PROFILE,
>   };
>   
>   enum {
> -	MLX5E_AM_PARKING_ON_TOP,
> -	MLX5E_AM_PARKING_TIRED,
> -	MLX5E_AM_GOING_RIGHT,
> -	MLX5E_AM_GOING_LEFT,
> +	NET_DIM_PARKING_ON_TOP,
> +	NET_DIM_PARKING_TIRED,
> +	NET_DIM_GOING_RIGHT,
> +	NET_DIM_GOING_LEFT,
>   };
>   
>   enum {
> -	MLX5E_AM_STATS_WORSE,
> -	MLX5E_AM_STATS_SAME,
> -	MLX5E_AM_STATS_BETTER,
> +	NET_DIM_STATS_WORSE,
> +	NET_DIM_STATS_SAME,
> +	NET_DIM_STATS_BETTER,
>   };
>   
>   enum {
> -	MLX5E_AM_STEPPED,
> -	MLX5E_AM_TOO_TIRED,
> -	MLX5E_AM_ON_EDGE,
> +	NET_DIM_STEPPED,
> +	NET_DIM_TOO_TIRED,
> +	NET_DIM_ON_EDGE,
>   };
>   
> -void mlx5e_rx_am(struct mlx5e_rx_am *am,
> -		 u16 event_ctr,
> -		 u64 packets,
> -		 u64 bytes);
> -void mlx5e_rx_am_work(struct work_struct *work);
> -struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
> -struct mlx5e_cq_moder mlx5e_am_get_profile(u8 cq_period_mode, int ix);
> +void net_dim(struct net_dim *dim,
> +	     u16 event_ctr,
> +	     u64 packets,
> +	     u64 bytes);
> +struct net_dim_cq_moder net_dim_get_def_profile(u8 rx_cq_period_mode);
> +struct net_dim_cq_moder net_dim_get_profile(u8 cq_period_mode, int ix);
>   
> -#endif /* MLX5_AM_H */
> +#endif /* NET_DIM_H */
> 

^ permalink raw reply

* Re: [PATCH net-next] net: tracepoint: adding new tracepoint arguments in inet_sock_set_state
From: Yafang Shao @ 2018-01-05  8:09 UTC (permalink / raw)
  To: Song Liu
  Cc: David Miller, brendan.d.gregg@gmail.com,
	marcelo.leitner@gmail.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <9A0D3ED9-7FFD-4E96-A42C-90871C2CC0C6@fb.com>

On Fri, Jan 5, 2018 at 3:21 PM, Song Liu <songliubraving@fb.com> wrote:
>
>> On Jan 4, 2018, at 10:42 PM, Yafang Shao <laoar.shao@gmail.com> wrote:
>>
>> sk->sk_protocol and sk->sk_family are exposed as tracepoint arguments.
>> Then we can conveniently use these two arguments to do the filter.
>>
>> Suggested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>> ---
>> include/trace/events/sock.h | 24 ++++++++++++++++++------
>> net/ipv4/af_inet.c          |  6 ++++--
>> 2 files changed, 22 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/trace/events/sock.h b/include/trace/events/sock.h
>> index 3537c5f..c7df70f 100644
>> --- a/include/trace/events/sock.h
>> +++ b/include/trace/events/sock.h
>> @@ -11,7 +11,11 @@
>> #include <linux/ipv6.h>
>> #include <linux/tcp.h>
>>
>> -/* The protocol traced by sock_set_state */
>> +#define family_names                 \
>> +             EM(AF_INET)                             \
>> +             EMe(AF_INET6)
>> +
>> +/* The protocol traced by inet_sock_set_state */
>> #define inet_protocol_names           \
>>               EM(IPPROTO_TCP)                 \
>>               EM(IPPROTO_DCCP)                \
>> @@ -37,6 +41,7 @@
>> #define EM(a)       TRACE_DEFINE_ENUM(a);
>> #define EMe(a)      TRACE_DEFINE_ENUM(a);
>>
>> +family_names
>> inet_protocol_names
>> tcp_state_names
>>
>> @@ -45,6 +50,9 @@
>> #define EM(a)       { a, #a },
>> #define EMe(a)      { a, #a }
>>
>> +#define show_family_name(val)        \
>> +     __print_symbolic(val, family_names)
>> +
>> #define show_inet_protocol_name(val)    \
>>       __print_symbolic(val, inet_protocol_names)
>>
>> @@ -108,9 +116,10 @@
>>
>> TRACE_EVENT(inet_sock_set_state,
>>
>> -     TP_PROTO(const struct sock *sk, const int oldstate, const int newstate),
>> +     TP_PROTO(const struct sock *sk, const int family, const int protocol,
>> +                             const int oldstate, const int newstate),
>
> Are there cases we need protocol and/or family that is different to
> sk->sk_protocol/sk_family? If not, I think we don't need to change the
> TP_PROTO.
>
> Thanks,
> Song
>

As of now, there are two sk_family values, AF_INET and AF_INET6, and
three sk_protocol values: IPPROTO_TCP, IPPROTO_SCTP and
IPPROTO_DCCP.

Thanks
Yafang

>>
>> -     TP_ARGS(sk, oldstate, newstate),
>> +     TP_ARGS(sk, family, protocol, oldstate, newstate),
>>
>>       TP_STRUCT__entry(
>>               __field(const void *, skaddr)
>> @@ -118,6 +127,7 @@
>>               __field(int, newstate)
>>               __field(__u16, sport)
>>               __field(__u16, dport)
>> +             __field(__u16, family)
>>               __field(__u8, protocol)
>>               __array(__u8, saddr, 4)
>>               __array(__u8, daddr, 4)
>> @@ -133,8 +143,9 @@
>>               __entry->skaddr = sk;
>>               __entry->oldstate = oldstate;
>>               __entry->newstate = newstate;
>> +             __entry->family = family;
>> +             __entry->protocol = protocol;
>>
>> -             __entry->protocol = sk->sk_protocol;
>>               __entry->sport = ntohs(inet->inet_sport);
>>               __entry->dport = ntohs(inet->inet_dport);
>>
>> @@ -145,7 +156,7 @@
>>               *p32 =  inet->inet_daddr;
>>
>> #if IS_ENABLED(CONFIG_IPV6)
>> -             if (sk->sk_family == AF_INET6) {
>> +             if (family == AF_INET6) {
>>                       pin6 = (struct in6_addr *)__entry->saddr_v6;
>>                       *pin6 = sk->sk_v6_rcv_saddr;
>>                       pin6 = (struct in6_addr *)__entry->daddr_v6;
>> @@ -160,7 +171,8 @@
>>               }
>>       ),
>>
>> -     TP_printk("protocol=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c oldstate=%s newstate=%s",
>> +     TP_printk("family=%s protocol=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c oldstate=%s newstate=%s",
>> +                     show_family_name(__entry->family),
>>                       show_inet_protocol_name(__entry->protocol),
>>                       __entry->sport, __entry->dport,
>>                       __entry->saddr, __entry->daddr,
>> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
>> index bab98a4..1d52796 100644
>> --- a/net/ipv4/af_inet.c
>> +++ b/net/ipv4/af_inet.c
>> @@ -1223,14 +1223,16 @@ int inet_sk_rebuild_header(struct sock *sk)
>>
>> void inet_sk_set_state(struct sock *sk, int state)
>> {
>> -     trace_inet_sock_set_state(sk, sk->sk_state, state);
>> +     trace_inet_sock_set_state(sk, sk->sk_family, sk->sk_protocol,
>> +                                                     sk->sk_state, state);
>>       sk->sk_state = state;
>> }
>> EXPORT_SYMBOL(inet_sk_set_state);
>>
>> void inet_sk_state_store(struct sock *sk, int newstate)
>> {
>> -     trace_inet_sock_set_state(sk, sk->sk_state, newstate);
>> +     trace_inet_sock_set_state(sk, sk->sk_family, sk->sk_protocol,
>> +                                                     sk->sk_state, newstate);
>>       smp_store_release(&sk->sk_state, newstate);
>> }
>>
>> --
>> 1.8.3.1
>>
>

^ permalink raw reply
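
For readers following the EM()/EMe() lists in the patch quoted above, here is a minimal userspace sketch of the same pattern: one list of names is expanded into a { value, "name" } table and then searched, roughly as __print_symbolic() does. The demo_* names and the DEMO_AF_* constants are hypothetical stand-ins, not kernel symbols, and the "UNKNOWN" fallback is a simplification made for this sketch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for AF_INET/AF_INET6 so the sketch builds anywhere. */
enum { DEMO_AF_INET = 2, DEMO_AF_INET6 = 10 };

/* A single list of names; its meaning depends on how EM()/EMe() are defined. */
#define demo_family_names	\
	EM(DEMO_AF_INET)	\
	EMe(DEMO_AF_INET6)

struct demo_sym {
	int val;
	const char *name;
};

/* Expand the list into a { value, "name" } table. */
#define EM(a)	{ a, #a },
#define EMe(a)	{ a, #a }
static const struct demo_sym demo_family_syms[] = { demo_family_names };
#undef EM
#undef EMe

/* Simplified lookup: returns "UNKNOWN" for values missing from the table. */
static const char *demo_show_family_name(int val)
{
	size_t i;

	for (i = 0; i < sizeof(demo_family_syms) / sizeof(demo_family_syms[0]); i++)
		if (demo_family_syms[i].val == val)
			return demo_family_syms[i].name;
	return "UNKNOWN";
}
```

Keeping the names in one list means a new family only has to be added in one place for both the enum export and the printable table to pick it up.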

* Re: [net-next 08/10] net/dim: use struct net_dim_sample as arg to net_dim
From: Tal Gilboa @ 2018-01-05  8:13 UTC (permalink / raw)
  To: Andy Gospodarek, netdev; +Cc: mchan, ogerlitz, Andy Gospodarek
In-Reply-To: <1515097290-17470-9-git-send-email-andy@greyhouse.net>

Thanks for doing this; it will make future changes easier.

On 1/4/2018 10:21 PM, Andy Gospodarek wrote:
> From: Andy Gospodarek <gospo@broadcom.com>
> 
> Simplify the arguments net_dim() by formatting them into a struct
> net_dim_sample before calling the function.
> 

^ permalink raw reply
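
As a rough illustration of the idea in this patch (packing the counters into a sample struct before handing them to the algorithm), here is a userspace sketch. The field layout follows the struct net_dim_sample and struct net_dim_stats definitions quoted earlier in the thread; the demo_* names, the microsecond timestamp used instead of ktime_t, and the round-up division are assumptions of this sketch, not the library's actual code.

```c
#include <stdint.h>

/* Userspace stand-ins that mirror the field layout quoted in this thread. */
struct demo_dim_sample {
	uint64_t time_us;	/* stand-in for ktime_t */
	uint32_t pkt_ctr;
	uint32_t byte_ctr;
	uint16_t event_ctr;
};

struct demo_dim_stats {
	int ppms;	/* packets per msec */
	int bpms;	/* bytes per msec */
	int epms;	/* events per msec */
};

/* Pack the loose counters into one sample; wide counters are truncated. */
static void demo_dim_fill_sample(uint16_t event_ctr, uint64_t packets,
				 uint64_t bytes, uint64_t now_us,
				 struct demo_dim_sample *s)
{
	s->time_us = now_us;
	s->pkt_ctr = (uint32_t)packets;
	s->byte_ctr = (uint32_t)bytes;
	s->event_ctr = event_ctr;
}

/* Derive per-msec rates from two samples; returns -1 if no time elapsed. */
static int demo_dim_calc_stats(const struct demo_dim_sample *start,
			       const struct demo_dim_sample *end,
			       struct demo_dim_stats *out)
{
	uint64_t delta_us = end->time_us - start->time_us;
	/* Unsigned subtraction keeps the deltas correct across wraparound. */
	uint32_t npkts = end->pkt_ctr - start->pkt_ctr;
	uint32_t nbytes = end->byte_ctr - start->byte_ctr;
	uint16_t nevents = (uint16_t)(end->event_ctr - start->event_ctr);

	if (delta_us == 0)
		return -1;

	out->ppms = (int)(((uint64_t)npkts * 1000 + delta_us - 1) / delta_us);
	out->bpms = (int)(((uint64_t)nbytes * 1000 + delta_us - 1) / delta_us);
	out->epms = (int)(((uint64_t)nevents * 1000 + delta_us - 1) / delta_us);
	return 0;
}
```

With a single sample argument, a driver only has to fill the struct once per poll, and the rate calculation no longer depends on how each driver stores its counters.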

* Re: [net-next 00/10] net: create dynamic software irq moderation library
From: Tal Gilboa @ 2018-01-05  8:14 UTC (permalink / raw)
  To: Andy Gospodarek, netdev; +Cc: mchan, ogerlitz, Andy Gospodarek
In-Reply-To: <1515097290-17470-1-git-send-email-andy@greyhouse.net>

Thanks Andy for your hard work. Looks great overall!

On 1/4/2018 10:21 PM, Andy Gospodarek wrote:
> From: Andy Gospodarek <gospo@broadcom.com>
> 
> This converts the dynamic interrupt moderation library from the mlx5_en driver
> into a library so it can be used by any driver.  The penultimatepatch in this
Had to look up "penultimatepatch " :), but aren't these two words?

> set adds support for interrupt moderation in the bnxt_en driver and the last
> patch creates an entry in the MAINTAINERS file.
> 
> The main purpose of this code in the mlx5_en driver is to allow an
> administrator to make sure that default coalesce settings are optimized
> for low latency, but quickly adapt to handle high throughput traffic and
> optimize how many packets are received during each napi poll.
> 
> For any new driver the following changes would be needed to use this
> library:
> 
> - add elements in ring struct to track items needed by this library
> - create function that can be called to actually set coalesce settings
>    for the driver
> 
> Credit to Rob Rice and Lee Reed for doing some of the initial proof of
> concept and testing for this patch and Tal Gilboa and Or Gerlitz for their
> comments, etc on this set.
> 
> Andy Gospodarek (10):
>    net/mlx5e: move interrupt moderation structs to new file
>    net/mlx5e: move interrupt moderation forward declarations
>    net/mlx5e: remove rq references in mlx5e_rx_am
>    net/mlx5e: move AM logic enums
>    net/mlx5e: move generic functions to new file
>    net/mlx5e: change Mellanox references in DIM code
>    net: move dynamic interrpt coalescing code to include/linux
interrpt -> interrupt. The subject of the actual patch was fixed; the typo
remains only in the cover letter.

>    net/dim: use struct net_dim_sample as arg to net_dim
>    bnxt_en: add support for software dynamic interrupt moderation
>    MAINTAINERS: add entry for Dynamic Interrupt Moderation
> 
>   MAINTAINERS                                        |   5 +
>   drivers/net/ethernet/broadcom/bnxt/Makefile        |   2 +-
>   drivers/net/ethernet/broadcom/bnxt/bnxt.c          |  52 +++
>   drivers/net/ethernet/broadcom/bnxt/bnxt.h          |  34 +-
>   drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c      |  32 ++
>   drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c  |  12 +
>   drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en.h       |  46 +--
>   drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  49 +++
>   .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  12 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  32 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   4 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 341 -------------------
>   drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |  10 +-
>   drivers/net/ethernet/mellanox/mlx5/core/net_dim.h  | 108 ++++++
>   include/linux/mlx5/mlx5_ifc.h                      |   6 -
>   include/linux/net_dim.h                            | 372 +++++++++++++++++++++
>   17 files changed, 693 insertions(+), 426 deletions(-)
>   create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
>   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
>   delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
>   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h
mlx5/core/net_dim.h was removed from code. Please fix the cover.
>   create mode 100644 include/linux/net_dim.h
> 

^ permalink raw reply

* Re: xfrm: Return error on unknown switch in init_state
From: Steffen Klassert @ 2018-01-05  8:32 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev
In-Reply-To: <20180104112104.GA6437@gondor.apana.org.au>

On Thu, Jan 04, 2018 at 10:21:04PM +1100, Herbert Xu wrote:
> Currently esp will happily create an xfrm state with an unknown
> encap type for IPv4 or an unknown mode for IPv6, without setting
> the necessary state parameters.  This patch fixes it by returning
> -EINVAL.

Looks like we catch the unknown mode in __xfrm_init_state().
But in any case, if we want to return -EINVAL on unknown mode,
we should do it for IPv6 and for IPv4.

^ permalink raw reply
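
The behaviour discussed here (reject an unknown mode instead of silently leaving the state half-initialized) can be sketched in userspace as below. The demo_* names, the mode constants and the header lengths are hypothetical and only illustrate the "default: return -EINVAL" shape; they are not taken from the esp code.

```c
#include <errno.h>

/* Hypothetical modes for this sketch only. */
enum demo_mode {
	DEMO_MODE_TRANSPORT,
	DEMO_MODE_TUNNEL,
};

/* Set up per-mode parameters; leave *header_len untouched on unknown modes. */
static int demo_init_state(int mode, int *header_len)
{
	switch (mode) {
	case DEMO_MODE_TRANSPORT:
		*header_len = 0;	/* illustrative value */
		break;
	case DEMO_MODE_TUNNEL:
		*header_len = 20;	/* illustrative value */
		break;
	default:
		/* Unknown mode: fail instead of creating a broken state. */
		return -EINVAL;
	}
	return 0;
}
```

Sharing one such check between the IPv4 and IPv6 paths would keep both families consistent, which is the point raised in this reply.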

* [PATCH] xen-netfront: enable device after manual module load
From: Eduardo Otubo @ 2018-01-05  8:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: netdev, xen-devel, jgross, boris.ostrovsky, vkuznets, mgamal,
	cavery

When loading the module after unloading it, the network interface would
not be enabled; it would therefore have no backend counterpart and could
not be used by the guest.

The guest would face errors like:

  [root@guest ~]# ethtool -i eth0
  Cannot get driver information: No such device

  [root@guest ~]# ifconfig eth0
  eth0: error fetching interface information: Device not found

This patch initializes the state of the netfront device whenever it is
loaded manually. This state signals netback to create its device and
establish the connection between them.

Signed-off-by: Eduardo Otubo <otubo@redhat.com>
---
 drivers/net/xen-netfront.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index c5a34671abda..9bd7ddeeb6a5 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1326,6 +1326,7 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
 
 	netif_carrier_off(netdev);
 
+	xenbus_switch_state(dev, XenbusStateInitialising);
 	return netdev;
 
  exit:
-- 
2.14.3

^ permalink raw reply related

* Re: [PATCH v5 1/3] clocksource/drivers/atcpit100: Add andestech atcpit100 timer
From: Greentime Hu @ 2018-01-05  8:45 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Greentime, Rick Chen, Rick Chen, Linux Kernel Mailing List,
	Arnd Bergmann, Linus Walleij, linux-arch, Thomas Gleixner,
	Jason Cooper, Marc Zyngier, Rob Herring, netdev, Vincent Chen,
	DTML, Al Viro, David Howells, Will Deacon, linux-serial
In-Reply-To: <efdb249a-8dc0-e076-a1db-20cecdbf4c13@linaro.org>

Hi, Daniel:

2018-01-05 3:48 GMT+08:00 Daniel Lezcano <daniel.lezcano@linaro.org>:
> On 04/01/2018 15:06, Greentime Hu wrote:
>> Hi, Daniel:
>>
>> 2018-01-04 21:50 GMT+08:00 Daniel Lezcano <daniel.lezcano@linaro.org>:
>>>
>>> Hi,
>>>
>>> sorry I missed your answer. Comments below.
>>>
>>> On 13/12/2017 07:06, Greentime Hu wrote:
>>>> Hi, Daniel:
>>>>
>>>> 2017-12-12 18:05 GMT+08:00 Daniel Lezcano <daniel.lezcano@linaro.org>:
>>>>> On 12/12/2017 06:46, Rick Chen wrote:
>>>>>> ATCPIT100 is often used on the Andes architecture,
>>>>>> This timer provide 4 PIT channels. Each PIT channel is a
>>>>>> multi-function timer, can be configured as 32,16,8 bit timers
>>>>>> or PWM as well.
>>>>>>
>>>>>> For system timer it will set channel 1 32-bit timer0 as clock
>>>>>> source and count downwards until underflow and restart again.
>>>>>
>>>>> [ ... ]
>>>>>
>>>>>> +config CLKSRC_ATCPIT100
>>>>>> +     bool "Clocksource for AE3XX platform"
>>>>>> +     depends on NDS32 || COMPILE_TEST
>>>>>> +     depends on HAS_IOMEM
>>>>>> +     help
>>>>>> +       This option enables support for the Andestech AE3XX platform timers.
>>>>>
>>>>> Hi Rick,
>>>>>
>>>>> the general rule for the Kconfig is:
>>>>>
>>>>> bool "Clocksource for AE3XX platform" if COMPILE_TEST
>
> BTW, select TIMER_OF is missing.

We don't select it here because we already select TIMER_OF in
arch/nds32/Kconfig. Do I still need to select TIMER_OF here?

>>>>> and no deps on the platform.
>>>>>
>>>>> It is up to the platform Kconfig to select the option.
>>>>>
>>>>> We want here a silent option but make it selectable in case of
>>>>> compilation test coverage.
>>>>
>>>>
>>>> The way we like to use it is because
>>>> 1. This timer is a basic component to boot an nds32 CPU and it should
>>>> be able to select without COMPILE_TEST for nds32 architecture.
>>>
>>> Yes, so you don't need it to be selectable, you must select it from the
>>> platform's Kconfig.
>>
>> I am not sure that I get your point or not.
>> We don't have a CONFIG_PLAT_AE3XX.
>> Do you mean we should create one and select CLKSRC_ATCPIT100 under
>> CONFIG_PLAT_AE3XX?
>
> No. Can't you add in arch/ndis32/Kconfig ?
>
> +select TIMER_ATCPIT100
>
> Like:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/Kconfig#n50

IMHO, it might be a little bit weird to select TIMER_ATCPIT100 in
arch/nds32/Kconfig because the timer is part of the SoC rather than the CPU.
If we change to another SoC with another timer, we need to select
another TIMER in arch/nds32/Kconfig and delete TIMER_ATCPIT100.
It seems more flexible to select it in the driver layer.

It seems that a timer is selected in the arch's Kconfig when it is part of the arch:
arch/arc/Kconfig:       select ARC_TIMERS
arch/arc/Kconfig:       select ARC_TIMERS_64BIT
arch/arm/Kconfig:       select ARM_ARCH_TIMER
arch/arm64/Kconfig:     select ARM_ARCH_TIMER
arch/blackfin/Kconfig:  select BFIN_GPTIMERS

>>>> 2. It seems conflict with debug info. I am not sure if there is
>>>> another way to debug kernel(with debug info) with COMPILE_TEST and
>>>> DEBUG_INFO because we need this driver for nds32 architecture.
>>>>
>>>> Symbol: DEBUG_INFO [=n]
>>>> Type  : boolean
>>>> Prompt: Compile the kernel with debug info
>>>>   Location:
>>>>     -> Kernel hacking
>>>>       -> Compile-time checks and compiler options
>>>>   Defined at lib/Kconfig.debug:140
>>>>   Depends on: DEBUG_KERNEL [=y] && !COMPILE_TEST [=n]
>>>
>>> The COMPILE_TEST option is only there to allow cross-compilation test
>>> coverage, it does not select or unselect the driver in usual way.
>>>
>>> If the COMPILE_TEST is enabled, then the option will appear in the
>>> menuconfig, so that gives the opportunity to select/unselect it.
>>>
>>> Otherwise, the Kconfig's platform selects automatically the driver and
>>> the user *can't* unselect it from the menuconfig as it is a silent
>>> option and that is certainly what you want.
>>>
>>>>> Also, this driver is not a CLKSRC but a TIMER. Rename CLKSRC_ATCPIT100
>>>>> to TIMER_ATCPIT100.
>>>>
>>>> Thanks. We will rename it in the next version patch.
>>>
>>> You just resend an entire series V5 for the architecture. I'm confused,
>>> what is the merging path ?
>>
>> Sorry. I didn't get your point.
>> We sent the timer patch and the architecture patch together because it
>> would be easier for reviewer to check the vdso implementations.
>> What do you mean about the merging path?
>
> I received a [Vx y/3] series and I received a [Vx y/39].
>
> The former from Rick Chen means to me "please pick them through your tree".
>
> The latter from you means to me "can you ack the patches so I can merge
> them through my tree". Note you will have to resend the entire arch
> series for every single review/comment (that could end up upset the
> Cc'ed people).
>
> Which one should I review ? I can not track different patchset
> implementing the same thing. Which one should I comment, review ? Are
> the comments I did on [Vx y/3] taken into account in the arch series ?
> etc ...
>
> Please clarify, it is confusing and impossible to review in this situation.
>
> I suggest we stick to the x/3 series, so I can comment it and you can
> resend a new version without resending the entire arch series. So I can
> merge it through my tree, and you get it via eg. a shared immutable
> branch. The arch series will be reduced by 3 patches.

Thanks for your explanation. :)
I will send these two patch sets separately next time.

^ permalink raw reply

* Re: [PATCH v5 1/3] clocksource/drivers/atcpit100: Add andestech atcpit100 timer
From: Daniel Lezcano @ 2018-01-05  9:31 UTC (permalink / raw)
  To: Greentime Hu
  Cc: Greentime, Rick Chen, Rick Chen, Linux Kernel Mailing List,
	Arnd Bergmann, Linus Walleij, linux-arch, Thomas Gleixner,
	Jason Cooper, Marc Zyngier, Rob Herring, netdev, Vincent Chen,
	DTML, Al Viro, David Howells, Will Deacon, linux-serial
In-Reply-To: <CAEbi=3frQYZHb0yiMvNk_JUiUCy6ca0tbrsM_pt49LtkMy47mw@mail.gmail.com>

On 05/01/2018 09:45, Greentime Hu wrote:
> Hi, Daniel:

[ ... ]

>>>>>> [ ... ]
>>>>>>
>>>>>>> +config CLKSRC_ATCPIT100
>>>>>>> +     bool "Clocksource for AE3XX platform"
>>>>>>> +     depends on NDS32 || COMPILE_TEST
>>>>>>> +     depends on HAS_IOMEM
>>>>>>> +     help
>>>>>>> +       This option enables support for the Andestech AE3XX platform timers.
>>>>>>
>>>>>> Hi Rick,
>>>>>>
>>>>>> the general rule for the Kconfig is:
>>>>>>
>>>>>> bool "Clocksource for AE3XX platform" if COMPILE_TEST
>>
>> BTW, select TIMER_OF is missing.
> 
> We don't select here because we select TIMER_OF in arch/nds32/Kconfig
> I am not sure if I still need to select TIMER_OF here?

Actually, I want the drivers/clocksource/Kconfig to be consistent across
all entries. As TIMER_OF is needed by the driver and nothing else, it
must be selected in the TIMER entry.

As there are a lot of timers and we do the changes little by little,
there are still entries with a different format.

It should be something like this:

config ASM9260_TIMER
        bool "ASM9260 timer driver" if COMPILE_TEST
        select CLKSRC_MMIO
        select TIMER_OF
        help
          Enables support for the ASM9260 timer.

Move the select TIMER_OF to the timer option entry.

>>>>>> and no deps on the platform.
>>>>>>
>>>>>> It is up to the platform Kconfig to select the option.
>>>>>>
>>>>>> We want here a silent option but make it selectable in case of
>>>>>> compilation test coverage.
>>>>>
>>>>>
>>>>> The way we like to use it is because
>>>>> 1. This timer is a basic component to boot an nds32 CPU and it should
>>>>> be able to select without COMPILE_TEST for nds32 architecture.
>>>>
>>>> Yes, so you don't need it to be selectable, you must select it from the
>>>> platform's Kconfig.
>>>
>>> I am not sure that I get your point or not.
>>> We don't have a CONFIG_PLAT_AE3XX.
>>> Do you mean we should create one and select CLKSRC_ATCPIT100 under
>>> CONFIG_PLAT_AE3XX?
>>
>> No. Can't you add in arch/ndis32/Kconfig ?
>>
>> +select TIMER_ATCPIT100
>>
>> Like:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/Kconfig#n50
> 
> IMHO, it might be a little bit wierd if we select TIMER_ATCPIT100 in
> arch/nds32/Kconfig because it is part of SoC instead of CPU.
> If we change to another SoC with another timer, we need to select
> another TIMER in arch/nds32/Kconfig and delete TIMER_ATCPIT100.
> It seems more flexible to be selected in driver layer.
> 
> It seems to be the timer is part of the arch to be selected in arch's Kconfig.
> arch/arc/Kconfig:       select ARC_TIMERS
> arch/arc/Kconfig:       select ARC_TIMERS_64BIT
> arch/arm/Kconfig:       select ARM_ARCH_TIMER
> arch/arm64/Kconfig:     select ARM_ARCH_TIMER
> arch/blackfin/Kconfig:  select BFIN_GPTIMERS

No, the timer must be selected from the arch's or SoC's (or whatever) Kconfig,
not in the clocksource's Kconfig.

eg.

on ARM:

arch/arm/mach-vt8500/Kconfig:   select VT8500_TIMER
arch/arm/mach-bcm/Kconfig:      select BCM_KONA_TIMER
arch/arm/mach-actions/Kconfig:  select OWL_TIMER
arch/arm/mach-digicolor/Kconfig:        select DIGICOLOR_TIMER

etc ...

on ARM64:

arch/arm64/Kconfig.platforms:   select OWL_TIMER
arch/arm64/Kconfig.platforms:       select ARM_TIMER_SP804
arch/arm64/Kconfig.platforms:       select MTK_TIMER

etc ...

Thanks.

  -- Daniel


^ permalink raw reply

* [PATCH net] of_mdio: avoid MDIO bus removal when a PHY is missing
From: Madalin Bucur @ 2018-01-05  9:36 UTC (permalink / raw)
  To: andrew-g2DYL2Zd6BY, f.fainelli-Re5JQEeQqe8AvxtiuMwx3w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: geert+renesas-gXvu3+zWzMSzQB+pC5nmwQ,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	frowand.list-Re5JQEeQqe8AvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Madalin Bucur

If one of the child devices is missing, the of_mdiobus_register_phy()
call will return -ENODEV. When a missing device is encountered, the
registration of the remaining PHYs is stopped and the MDIO bus will
fail to register. Propagate all errors except -ENODEV to avoid this.

Signed-off-by: Madalin Bucur <madalin.bucur-3arQi8VN3Tc@public.gmane.org>
---
 drivers/of/of_mdio.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
index 3481e69..93d41275 100644
--- a/drivers/of/of_mdio.c
+++ b/drivers/of/of_mdio.c
@@ -231,7 +231,7 @@ int of_mdiobus_register(struct mii_bus *mdio, struct device_node *np)
 			rc = of_mdiobus_register_phy(mdio, child, addr);
 		else
 			rc = of_mdiobus_register_device(mdio, child, addr);
-		if (rc)
+		if (rc && rc != -ENODEV)
 			goto unregister;
 	}
 
@@ -255,7 +255,7 @@ int of_mdiobus_register(struct mii_bus *mdio, struct device_node *np)
 
 			if (of_mdiobus_child_is_phy(child)) {
 				rc = of_mdiobus_register_phy(mdio, child, addr);
-				if (rc)
+				if (rc && rc != -ENODEV)
 					goto unregister;
 			}
 		}
-- 
2.1.0

^ permalink raw reply related
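
The error handling in this patch can be pictured with a small userspace sketch: walk the children, skip the ones that report -ENODEV, and stop only on other errors. The demo_register_children() helper and the array of per-child return codes are hypothetical; the array simply stands in for the results of of_mdiobus_register_phy() on each child.

```c
#include <errno.h>
#include <stddef.h>

/*
 * child_rc[i] simulates the return code of registering child i.
 * Returns 0 if every child either registered or was missing (-ENODEV),
 * otherwise the first other error. *registered counts the successes.
 */
static int demo_register_children(const int *child_rc, size_t n,
				  size_t *registered)
{
	size_t i;

	*registered = 0;
	for (i = 0; i < n; i++) {
		int rc = child_rc[i];

		/* A missing device is tolerated; anything else aborts. */
		if (rc && rc != -ENODEV)
			return rc;
		if (!rc)
			(*registered)++;
	}
	return 0;
}
```

For example, the results { 0, -ENODEV, 0 } leave the bus registered with two PHYs, while { 0, -EIO, 0 } still fails on the second child.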

* Re: BUG: 4.14.6 unable to handle kernel NULL pointer dereference at xfrm_output_resume
From: Steffen Klassert @ 2018-01-05  9:38 UTC (permalink / raw)
  To: Darius Ski; +Cc: netdev
In-Reply-To: <CAKt3ReJYWKRhnYdt3TpZa0+CqTgp+gL2Xkf-zoBLVmj1vk6s7w@mail.gmail.com>

On Tue, Dec 19, 2017 at 10:50:42AM +0200, Darius Ski wrote:
> Hi,
> 
> thanks a lot for the patch. I have applied it to 4.14.7 and crossed
> fingers, hopefully no more problems.
> 
> I will let community know if there are any more crashes.

Any news on this, did the patch help?

^ permalink raw reply

* [PATCH net-next 19/20] net: hns3: remove redundant semicolon
From: Peng Li @ 2018-01-05 10:18 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-kernel, linuxarm, salil.mehta, lipeng321
In-Reply-To: <1515147504-86802-1-git-send-email-lipeng321@huawei.com>

There is a redundant semicolon; this patch removes it.

Signed-off-by: Peng Li <lipeng321@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index f74b66a..655f522 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -1288,7 +1288,7 @@ static int hclgevf_pci_init(struct hclgevf_dev *hdev)
 	pci_set_master(pdev);
 	hw = &hdev->hw;
 	hw->hdev = hdev;
-	hw->io_base = pci_iomap(pdev, 2, 0);;
+	hw->io_base = pci_iomap(pdev, 2, 0);
 	if (!hw->io_base) {
 		dev_err(&pdev->dev, "can't map configuration register space\n");
 		ret = -ENOMEM;
-- 
1.9.1

^ permalink raw reply related

