Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH bpf] tools: bpf: fix NULL return handling in bpf__prepare_load
From: Arnaldo Carvalho de Melo @ 2018-05-15 14:11 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: YueHaibing, alexander.shishkin, mingo, peterz, netdev, namhyung
In-Reply-To: <605c7b96-83dc-9a9e-7037-32b2c2dfcb2e@iogearbox.net>

Em Sun, May 13, 2018 at 01:20:22AM +0200, Daniel Borkmann escreveu:
> [ +Arnaldo ]
> 
> On 05/11/2018 01:21 PM, YueHaibing wrote:
> > bpf_object__open()/bpf_object__open_buffer can return error pointer or NULL,
> > check the return values with IS_ERR_OR_NULL() in bpf__prepare_load and
> > bpf__prepare_load_buffer
> > 
> > Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> > ---
> >  tools/perf/util/bpf-loader.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> This should probably be routed via Arnaldo due to the fix in perf itself. If
> there's no particular preference on which tree, we could potentially route it
> as well via bpf with Acked-by from Arnaldo, but that is up to him. Arnaldo,
> any preference?

I'm preparing a pull req right now, and working a bit on perf's BPF
support, so why not, I'll merge it, thanks,

- Arnaldo
 
> > diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
> > index af7ad81..cee6587 100644
> > --- a/tools/perf/util/bpf-loader.c
> > +++ b/tools/perf/util/bpf-loader.c
> > @@ -66,7 +66,7 @@ bpf__prepare_load_buffer(void *obj_buf, size_t obj_buf_sz, const char *name)
> >  	}
> >  
> >  	obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, name);
> > -	if (IS_ERR(obj)) {
> > +	if (IS_ERR_OR_NULL(obj)) {
> >  		pr_debug("bpf: failed to load buffer\n");
> >  		return ERR_PTR(-EINVAL);
> >  	}
> > @@ -102,14 +102,14 @@ struct bpf_object *bpf__prepare_load(const char *filename, bool source)
> >  			pr_debug("bpf: successfull builtin compilation\n");
> >  		obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, filename);
> >  
> > -		if (!IS_ERR(obj) && llvm_param.dump_obj)
> > +		if (!IS_ERR_OR_NULL(obj) && llvm_param.dump_obj)
> >  			llvm__dump_obj(filename, obj_buf, obj_buf_sz);
> >  
> >  		free(obj_buf);
> >  	} else
> >  		obj = bpf_object__open(filename);
> >  
> > -	if (IS_ERR(obj)) {
> > +	if (IS_ERR_OR_NULL(obj)) {
> >  		pr_debug("bpf: failed to load %s\n", filename);
> >  		return obj;
> >  	}
> > 

^ permalink raw reply

* Re: simplify procfs code for seq_file instances V2
From: Christoph Hellwig @ 2018-05-15 14:03 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: linux-rtc-u79uwXL29TY76Z2rM5mHXA, Alessandro Zummo,
	Alexandre Belloni, devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	linux-ide-u79uwXL29TY76Z2rM5mHXA, Greg Kroah-Hartman,
	jfs-discussion-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-afs-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	netfilter-devel-u79uwXL29TY76Z2rM5mHXA, Alexander Viro,
	Jiri Slaby, Andrew Morton, linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	Christoph Hellwig, megaraidlinux.pdl-dY08KVG/lbpWk0Htik3J/w,
	drbd-dev-cunTk1MwBs8qoQakbn7OcQ
In-Reply-To: <20180506171948.GA769@avx2>

On Sun, May 06, 2018 at 08:19:49PM +0300, Alexey Dobriyan wrote:
> On Wed, Apr 25, 2018 at 05:47:47PM +0200, Christoph Hellwig wrote:
> > Changes since V1:
> >  - open code proc_create_data to avoid setting not fully initialized
> >    entries live
> >  - use unsigned int for state_size
> 
> Need this to maintain sizeof(struct proc_dir_entry):

I'll fold your changes into the relevant patches.

> Otherwise ACK fs/proc/ part.

I'll take this as a formal ACK-ed by for all patches touching
procfs.  If I was wrong please scream.

^ permalink raw reply

* Re: [PATCH 06/40] proc: introduce proc_create_single{, _data}
From: Christoph Hellwig @ 2018-05-15 13:58 UTC (permalink / raw)
  To: Finn Thain
  Cc: linux-rtc-u79uwXL29TY76Z2rM5mHXA, Alessandro Zummo,
	Alexandre Belloni, devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b,
	linux-afs-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	linux-ide-u79uwXL29TY76Z2rM5mHXA, Greg Kroah-Hartman,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	jfs-discussion-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	Christoph Hellwig, linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	netfilter-devel-u79uwXL29TY76Z2rM5mHXA, Alexander Viro,
	Jiri Slaby, Andrew Morton, linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	Alexey Dobriyan, megaraidlinux.pdl-dY08KVG/lbpWk0Htik3J/w,
	drbd-dev-cunTk1MwBs8qoQakbn7OcQ
In-Reply-To: <alpine.LNX.2.21.1804261118050.8-i19888lE8tflDoPlx7XIcw@public.gmane.org>

On Thu, Apr 26, 2018 at 11:45:50AM +1000, Finn Thain wrote:
> >  
> > -/*
> > - * /proc/nubus stuff
> > - */
> > -
> 
> I don't think that the introduction of proc_create_single{,_data} alters 
> the value of that comment. That comment and similar comments in the same 
> file do have a purpose, which is to keep separate the /proc/nubus 
> implementation is kept separate from the /proc/bus/nubus/devices 
> implementation and so on.

Added back.

^ permalink raw reply

* [iproute2-next v2 1/1] tipc: fixed node and name table listings
From: Jon Maloy @ 2018-05-15 13:54 UTC (permalink / raw)
  To: stephen, netdev
  Cc: mohan.krishna.ghanta.krishnamurthy, tung.q.nguyen, hoang.h.le,
	jon.maloy, canh.d.luu, ying.xue, tipc-discussion

We make it easier for users to correlate between 128-bit node
identities and 32-bit node hash number by extending the 'node list'
command to also show the hash number.

We also improve the 'nametable show' command to show the node identity
instead of the node hash number. Since the former potentially is much
longer than the latter, we make room for it by eliminating the (to the
user) irrelevant publication key. We also reorder some of the columns so
that the node id comes last, since this looks nicer and is more logical.

---
v2: Fixed compiler warning as per comment from David Ahern

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 tipc/misc.c      | 18 ++++++++++++++++++
 tipc/misc.h      |  1 +
 tipc/nametable.c | 18 ++++++++++--------
 tipc/node.c      | 19 ++++++++-----------
 tipc/peer.c      |  4 ++++
 5 files changed, 41 insertions(+), 19 deletions(-)

diff --git a/tipc/misc.c b/tipc/misc.c
index 16849f1..e8b726f 100644
--- a/tipc/misc.c
+++ b/tipc/misc.c
@@ -13,6 +13,9 @@
 #include <stdint.h>
 #include <linux/tipc.h>
 #include <string.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <errno.h>
 #include "misc.h"
 
 #define IN_RANGE(val, low, high) ((val) <= (high) && (val) >= (low))
@@ -109,3 +112,18 @@ void nodeid2str(uint8_t *id, char *str)
 	for (i = 31; str[i] == '0'; i--)
 		str[i] = 0;
 }
+
+void hash2nodestr(uint32_t hash, char *str)
+{
+	struct tipc_sioc_nodeid_req nr = {};
+	int sd;
+
+	sd = socket(AF_TIPC, SOCK_RDM, 0);
+	if (sd < 0) {
+		fprintf(stderr, "opening TIPC socket: %s\n", strerror(errno));
+		return;
+	}
+	nr.peer = hash;
+	if (!ioctl(sd, SIOCGETNODEID, &nr))
+		nodeid2str((uint8_t *)nr.node_id, str);
+}
diff --git a/tipc/misc.h b/tipc/misc.h
index 6e8afdd..ff2f31f 100644
--- a/tipc/misc.h
+++ b/tipc/misc.h
@@ -17,5 +17,6 @@
 uint32_t str2addr(char *str);
 int str2nodeid(char *str, uint8_t *id);
 void nodeid2str(uint8_t *id, char *str);
+void hash2nodestr(uint32_t hash, char *str);
 
 #endif
diff --git a/tipc/nametable.c b/tipc/nametable.c
index 2578940..ae73dfa 100644
--- a/tipc/nametable.c
+++ b/tipc/nametable.c
@@ -20,6 +20,7 @@
 #include "cmdl.h"
 #include "msg.h"
 #include "nametable.h"
+#include "misc.h"
 
 #define PORTID_STR_LEN 45 /* Four u32 and five delimiter chars */
 
@@ -31,6 +32,7 @@ static int nametable_show_cb(const struct nlmsghdr *nlh, void *data)
 	struct nlattr *attrs[TIPC_NLA_NAME_TABLE_MAX + 1] = {};
 	struct nlattr *publ[TIPC_NLA_PUBL_MAX + 1] = {};
 	const char *scope[] = { "", "zone", "cluster", "node" };
+	char str[33] = {0,};
 
 	mnl_attr_parse(nlh, sizeof(*genl), parse_attrs, info);
 	if (!info[TIPC_NLA_NAME_TABLE])
@@ -45,20 +47,20 @@ static int nametable_show_cb(const struct nlmsghdr *nlh, void *data)
 		return MNL_CB_ERROR;
 
 	if (!*iteration)
-		printf("%-10s %-10s %-10s %-10s %-10s %-10s\n",
-		       "Type", "Lower", "Upper", "Node", "Port",
-		       "Publication Scope");
+		printf("%-10s %-10s %-10s %-8s %-10s %-33s\n",
+		       "Type", "Lower", "Upper", "Scope", "Port",
+		       "Node");
 	(*iteration)++;
 
-	printf("%-10u %-10u %-10u %-10x %-10u %-12u",
+	hash2nodestr(mnl_attr_get_u32(publ[TIPC_NLA_PUBL_NODE]), str);
+
+	printf("%-10u %-10u %-10u %-8s %-10u %s\n",
 	       mnl_attr_get_u32(publ[TIPC_NLA_PUBL_TYPE]),
 	       mnl_attr_get_u32(publ[TIPC_NLA_PUBL_LOWER]),
 	       mnl_attr_get_u32(publ[TIPC_NLA_PUBL_UPPER]),
-	       mnl_attr_get_u32(publ[TIPC_NLA_PUBL_NODE]),
+	       scope[mnl_attr_get_u32(publ[TIPC_NLA_PUBL_SCOPE])],
 	       mnl_attr_get_u32(publ[TIPC_NLA_PUBL_REF]),
-	       mnl_attr_get_u32(publ[TIPC_NLA_PUBL_KEY]));
-
-	printf("%s\n", scope[mnl_attr_get_u32(publ[TIPC_NLA_PUBL_SCOPE])]);
+	       str);
 
 	return MNL_CB_OK;
 }
diff --git a/tipc/node.c b/tipc/node.c
index b73b644..0fa1064 100644
--- a/tipc/node.c
+++ b/tipc/node.c
@@ -26,10 +26,11 @@
 
 static int node_list_cb(const struct nlmsghdr *nlh, void *data)
 {
-	uint32_t addr;
 	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
 	struct nlattr *info[TIPC_NLA_MAX + 1] = {};
 	struct nlattr *attrs[TIPC_NLA_NODE_MAX + 1] = {};
+	char str[33] = {};
+	uint32_t addr;
 
 	mnl_attr_parse(nlh, sizeof(*genl), parse_attrs, info);
 	if (!info[TIPC_NLA_NODE])
@@ -40,13 +41,12 @@ static int node_list_cb(const struct nlmsghdr *nlh, void *data)
 		return MNL_CB_ERROR;
 
 	addr = mnl_attr_get_u32(attrs[TIPC_NLA_NODE_ADDR]);
-	printf("%x: ", addr);
-
+	hash2nodestr(addr, str);
+	printf("%-32s %08x ", str, addr);
 	if (attrs[TIPC_NLA_NODE_UP])
 		printf("up\n");
 	else
 		printf("down\n");
-
 	return MNL_CB_OK;
 }
 
@@ -64,7 +64,7 @@ static int cmd_node_list(struct nlmsghdr *nlh, const struct cmd *cmd,
 		fprintf(stderr, "error, message initialisation failed\n");
 		return -1;
 	}
-
+	printf("Node Identity                    Hash     State\n");
 	return msg_dumpit(nlh, node_list_cb, NULL);
 }
 
@@ -120,7 +120,7 @@ static int cmd_node_get_addr(struct nlmsghdr *nlh, const struct cmd *cmd,
 	}
 	close(sk);
 
-	printf("%x\n", addr.addr.id.node);
+	printf("%08x\n", addr.addr.id.node);
 	return 0;
 }
 
@@ -167,7 +167,6 @@ static int nodeid_get_cb(const struct nlmsghdr *nlh, void *data)
 	uint8_t id[16] = {0,};
 	uint64_t *w0 = (uint64_t *) &id[0];
 	uint64_t *w1 = (uint64_t *) &id[8];
-	int i;
 
 	mnl_attr_parse(nlh, sizeof(*genl), parse_attrs, info);
 	if (!info[TIPC_NLA_NET])
@@ -180,10 +179,8 @@ static int nodeid_get_cb(const struct nlmsghdr *nlh, void *data)
 	*w0 = mnl_attr_get_u64(attrs[TIPC_NLA_NET_NODEID]);
 	*w1 = mnl_attr_get_u64(attrs[TIPC_NLA_NET_NODEID_W1]);
 	nodeid2str(id, str);
-	printf("Node Identity                     Hash\n");
-	printf("%s", str);
-	for (i = strlen(str); i <= 33; i++)
-		printf(" ");
+	printf("Node Identity                    Hash\n");
+	printf("%-33s", str);
 	cmd_node_get_addr(NULL, NULL, NULL, NULL);
 	return MNL_CB_OK;
 }
diff --git a/tipc/peer.c b/tipc/peer.c
index de0c73c..f638077 100644
--- a/tipc/peer.c
+++ b/tipc/peer.c
@@ -39,8 +39,12 @@ static int cmd_peer_rm_addr(struct nlmsghdr *nlh, const struct cmd *cmd,
 	}
 
 	str = shift_cmdl(cmdl);
+
+	/* First try legacy Z.C.N format, then integer format */
 	addr = str2addr(str);
 	if (!addr)
+		addr = atoi(str);
+	if (!addr)
 		return -1;
 
 	if (!(nlh = msg_init(buf, TIPC_NL_PEER_REMOVE))) {
-- 
2.1.4

^ permalink raw reply related

* Re: [PATCH v1 iproute2-next 2/3] rdma: print driver resource attributes
From: Jason Gunthorpe @ 2018-05-15 13:53 UTC (permalink / raw)
  To: Steve Wise
  Cc: 'Leon Romanovsky', dsahern, stephen, netdev, linux-rdma
In-Reply-To: <00fe01d3ec4f$44a6f400$cdf4dc00$@opengridcomputing.com>

On Tue, May 15, 2018 at 08:18:51AM -0500, Steve Wise wrote:
>  
> > > On Mon, May 14, 2018 at 05:04:26PM -0500, Steve Wise wrote:
> > > >
> > > >
> > > > On 5/14/2018 3:41 PM, Jason Gunthorpe wrote:
> > > > > On Mon, May 07, 2018 at 08:53:16AM -0700, Steve Wise wrote:
> > > > >> This enhancement allows printing rdma device-specific state, if
> > provided
> > > > >> by the kernel.  This is done in a generic manner, so rdma tool
> doesn't
> > > > >> need to know about the details of every type of rdma device.
> > > > >>
> > > > >> Driver attributes for a rdma resource are in the form of <key,
> > > > >> [print_type], value> tuples, where the key is a string and the
> value can
> > > > >> be any supported driver attribute.  The print_type attribute, if
> present,
> > > > >> provides a print format to use vs the standard print format for the
> > type.
> > > > >> For example, the default print type for a PROVIDER_S32 value is "%d
> ",
> > > > >> but "0x%x " if the print_type of PRINT_TYPE_HEX is included inthe
> > tuple.
> > > > >>
> > > > >> Driver resources are only printed when the -dd flag is present.
> > > > >> If -p is present, then the output is formatted to not exceed 80
> > columns,
> > > > >> otherwise it is printed as a single row to be grep/awk friendly.
> > > > >>
> > > > >> Example output:
> > > > >>
> > > > >> # rdma resource show qp lqpn 1028 -dd -p
> > > > >> link cxgb4_0/- lqpn 1028 rqpn 0 type RC state RTS rq-psn 0 sq-psn 0
> > > path-mig-state MIGRATED pid 0 comm [nvme_rdma]
> > > > >>     sqid 1028 flushed 0 memsize 123968 cidx 85 pidx 85 wq_pidx 106
> > > flush_cidx 85 in_use 0
> > > > >>     size 386 flags 0x0 rqid 1029 memsize 16768 cidx 43 pidx 41
> wq_pidx
> > > 171 msn 44 rqt_hwaddr 0x2a8a5d00
> > > > >>     rqt_size 256 in_use 128 size 130 idx 43 wr_id
> 0xffff881057c03408 idx
> > > 40 wr_id 0xffff881057c033f0
> > > > > Hey some of these look like kernel pointers.. That is a no-no.. What
> > > > > is up there?
> > > >
> > > > Nothing is defined as a kernel pointer.  But wr_id is often a pointer
> to
> > > > the kernel rdma application's context...
> > > >
> > > > > The wr_id often contains a pointer, right? So we cannot just pass it
> > > > > to user space..
> > > >
> > > > Hmm.  It is useful for debugging kernel rdma applications.  Perhaps
> > > > these attrs can be only be sent up by the kernel if the capabilities
> > > > allow.  But previous review comments of the kernel series, which is
> now
> > > > merged, forced me to remove passing the capabilities information to
> the
> > > > driver resource fill functions.
> > > >
> > > > So what's the right way to do this?
> > >
> > > The reviewer asked do not pass to drivers whole CAP_.. bits, because
> > > they anyway don't need such granularity.
> > >
> > 
> > Ok thanks.
> 
> How's this?
> 
> diff --git a/include/rdma/restrack.h b/include/rdma/restrack.h
> index 6379685..2cf9c5c 100644
> +++ b/include/rdma/restrack.h
> @@ -66,7 +66,8 @@ struct rdma_restrack_root {
>          * Allows rdma drivers to add their own restrack attributes.
>          */
>         int (*fill_res_entry)(struct sk_buff *msg,
> -                             struct rdma_restrack_entry *entry);
> +                             struct rdma_restrack_entry *entry,
> +                             bool net_admin_capable);
>  };

cap net admin is not high enough privledge to see unhashed kernel
pointers. CAP_RAW_IO? Or follow what printk does?

Honestly, I don't really know, this hashed pointer stuff is new..

Jason

^ permalink raw reply

* RE: [PATCH v1 iproute2-next 2/3] rdma: print driver resource attributes
From: Steve Wise @ 2018-05-15 13:04 UTC (permalink / raw)
  To: 'Leon Romanovsky'
  Cc: 'Jason Gunthorpe', dsahern, stephen, netdev, linux-rdma
In-Reply-To: <20180515085411.GT10381@mtr-leonro.mtl.com>

> On Mon, May 14, 2018 at 05:04:26PM -0500, Steve Wise wrote:
> >
> >
> > On 5/14/2018 3:41 PM, Jason Gunthorpe wrote:
> > > On Mon, May 07, 2018 at 08:53:16AM -0700, Steve Wise wrote:
> > >> This enhancement allows printing rdma device-specific state, if
provided
> > >> by the kernel.  This is done in a generic manner, so rdma tool
doesn't
> > >> need to know about the details of every type of rdma device.
> > >>
> > >> Driver attributes for a rdma resource are in the form of <key,
> > >> [print_type], value> tuples, where the key is a string and the value
can
> > >> be any supported driver attribute.  The print_type attribute, if
present,
> > >> provides a print format to use vs the standard print format for the
type.
> > >> For example, the default print type for a PROVIDER_S32 value is "%d
",
> > >> but "0x%x " if the print_type of PRINT_TYPE_HEX is included inthe
tuple.
> > >>
> > >> Driver resources are only printed when the -dd flag is present.
> > >> If -p is present, then the output is formatted to not exceed 80
columns,
> > >> otherwise it is printed as a single row to be grep/awk friendly.
> > >>
> > >> Example output:
> > >>
> > >> # rdma resource show qp lqpn 1028 -dd -p
> > >> link cxgb4_0/- lqpn 1028 rqpn 0 type RC state RTS rq-psn 0 sq-psn 0
> path-mig-state MIGRATED pid 0 comm [nvme_rdma]
> > >>     sqid 1028 flushed 0 memsize 123968 cidx 85 pidx 85 wq_pidx 106
> flush_cidx 85 in_use 0
> > >>     size 386 flags 0x0 rqid 1029 memsize 16768 cidx 43 pidx 41
wq_pidx
> 171 msn 44 rqt_hwaddr 0x2a8a5d00
> > >>     rqt_size 256 in_use 128 size 130 idx 43 wr_id 0xffff881057c03408
idx
> 40 wr_id 0xffff881057c033f0
> > > Hey some of these look like kernel pointers.. That is a no-no.. What
> > > is up there?
> >
> > Nothing is defined as a kernel pointer.  But wr_id is often a pointer to
> > the kernel rdma application's context...
> >
> > > The wr_id often contains a pointer, right? So we cannot just pass it
> > > to user space..
> >
> > Hmm.  It is useful for debugging kernel rdma applications.  Perhaps
> > these attrs can be only be sent up by the kernel if the capabilities
> > allow.  But previous review comments of the kernel series, which is now
> > merged, forced me to remove passing the capabilities information to the
> > driver resource fill functions.
> >
> > So what's the right way to do this?
> 
> The reviewer asked do not pass to drivers whole CAP_.. bits, because
> they anyway don't need such granularity.
> 

Ok thanks.

^ permalink raw reply

* RE: [PATCH v1 iproute2-next 2/3] rdma: print driver resource attributes
From: Steve Wise @ 2018-05-15 13:18 UTC (permalink / raw)
  To: 'Leon Romanovsky'
  Cc: 'Jason Gunthorpe', dsahern, stephen, netdev, linux-rdma
In-Reply-To: <20180515085411.GT10381@mtr-leonro.mtl.com>

 
> > On Mon, May 14, 2018 at 05:04:26PM -0500, Steve Wise wrote:
> > >
> > >
> > > On 5/14/2018 3:41 PM, Jason Gunthorpe wrote:
> > > > On Mon, May 07, 2018 at 08:53:16AM -0700, Steve Wise wrote:
> > > >> This enhancement allows printing rdma device-specific state, if
> provided
> > > >> by the kernel.  This is done in a generic manner, so rdma tool
doesn't
> > > >> need to know about the details of every type of rdma device.
> > > >>
> > > >> Driver attributes for a rdma resource are in the form of <key,
> > > >> [print_type], value> tuples, where the key is a string and the
value can
> > > >> be any supported driver attribute.  The print_type attribute, if
present,
> > > >> provides a print format to use vs the standard print format for the
> type.
> > > >> For example, the default print type for a PROVIDER_S32 value is "%d
",
> > > >> but "0x%x " if the print_type of PRINT_TYPE_HEX is included inthe
> tuple.
> > > >>
> > > >> Driver resources are only printed when the -dd flag is present.
> > > >> If -p is present, then the output is formatted to not exceed 80
> columns,
> > > >> otherwise it is printed as a single row to be grep/awk friendly.
> > > >>
> > > >> Example output:
> > > >>
> > > >> # rdma resource show qp lqpn 1028 -dd -p
> > > >> link cxgb4_0/- lqpn 1028 rqpn 0 type RC state RTS rq-psn 0 sq-psn 0
> > path-mig-state MIGRATED pid 0 comm [nvme_rdma]
> > > >>     sqid 1028 flushed 0 memsize 123968 cidx 85 pidx 85 wq_pidx 106
> > flush_cidx 85 in_use 0
> > > >>     size 386 flags 0x0 rqid 1029 memsize 16768 cidx 43 pidx 41
wq_pidx
> > 171 msn 44 rqt_hwaddr 0x2a8a5d00
> > > >>     rqt_size 256 in_use 128 size 130 idx 43 wr_id
0xffff881057c03408 idx
> > 40 wr_id 0xffff881057c033f0
> > > > Hey some of these look like kernel pointers.. That is a no-no.. What
> > > > is up there?
> > >
> > > Nothing is defined as a kernel pointer.  But wr_id is often a pointer
to
> > > the kernel rdma application's context...
> > >
> > > > The wr_id often contains a pointer, right? So we cannot just pass it
> > > > to user space..
> > >
> > > Hmm.  It is useful for debugging kernel rdma applications.  Perhaps
> > > these attrs can be only be sent up by the kernel if the capabilities
> > > allow.  But previous review comments of the kernel series, which is
now
> > > merged, forced me to remove passing the capabilities information to
the
> > > driver resource fill functions.
> > >
> > > So what's the right way to do this?
> >
> > The reviewer asked do not pass to drivers whole CAP_.. bits, because
> > they anyway don't need such granularity.
> >
> 
> Ok thanks.

How's this?

diff --git a/include/rdma/restrack.h b/include/rdma/restrack.h
index 6379685..2cf9c5c 100644
--- a/include/rdma/restrack.h
+++ b/include/rdma/restrack.h
@@ -66,7 +66,8 @@ struct rdma_restrack_root {
         * Allows rdma drivers to add their own restrack attributes.
         */
        int (*fill_res_entry)(struct sk_buff *msg,
-                             struct rdma_restrack_entry *entry);
+                             struct rdma_restrack_entry *entry,
+                             bool net_admin_capable);
 };

^ permalink raw reply related

* Re: KASAN: out-of-bounds Read in ip6_xmit
From: Guillaume Nault @ 2018-05-15 13:06 UTC (permalink / raw)
  To: Eric Biggers
  Cc: syzbot, davem, kuznet, linux-kernel, netdev, syzkaller-bugs,
	yoshfuji, Paolo Abeni
In-Reply-To: <20180509044525.GD711@sol.localdomain>

On Tue, May 08, 2018 at 09:45:25PM -0700, Eric Biggers wrote:
> On Sun, Jan 28, 2018 at 11:24:01AM -0800, syzbot wrote:
> > Hello,
> > 
> > syzbot hit the following crash on net-next commit
> > 6bb46bc57c8e9ce947cc605e555b7204b44d2b10 (Fri Jan 26 16:00:23 2018 +0000)
> > Merge branch 'cxgb4-fix-dump-collection-when-firmware-crashed'
> > 
> > Unfortunately, I don't have any reproducer for this crash yet.
> > Raw console output is attached.
> > compiler: gcc (GCC) 7.1.1 20170620
> > .config is attached.
> > 
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+c8e66da874feb11aaca6@syzkaller.appspotmail.com
> > It will help syzbot understand when the bug is fixed. See footer for
> > details.
> > If you forward the report, please keep this part and the footer.
> > 
> > ==================================================================
> > BUG: KASAN: out-of-bounds in ip6_dst_idev include/net/ip6_fib.h:192 [inline]
> > BUG: KASAN: out-of-bounds in ip6_xmit+0x1f76/0x2260
> > net/ipv6/ip6_output.c:264
> > Read of size 8 at addr ffff8801bcc8cc18 by task syz-executor2/11459
> > 
> > CPU: 0 PID: 11459 Comm: syz-executor2 Not tainted 4.15.0-rc9+ #212
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:17 [inline]
> >  dump_stack+0x194/0x257 lib/dump_stack.c:53
> >  print_address_description+0x73/0x250 mm/kasan/report.c:252
> >  kasan_report_error mm/kasan/report.c:351 [inline]
> >  kasan_report+0x25b/0x340 mm/kasan/report.c:409
> >  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
> >  ip6_dst_idev include/net/ip6_fib.h:192 [inline]
> >  ip6_xmit+0x1f76/0x2260 net/ipv6/ip6_output.c:264
> >  inet6_csk_xmit+0x2fc/0x580 net/ipv6/inet6_connection_sock.c:139
> >  l2tp_xmit_core net/l2tp/l2tp_core.c:1096 [inline]
> >  l2tp_xmit_skb+0x105f/0x1410 net/l2tp/l2tp_core.c:1191
> >  pppol2tp_sendmsg+0x470/0x670 net/l2tp/l2tp_ppp.c:341
> >  sock_sendmsg_nosec net/socket.c:630 [inline]
> >  sock_sendmsg+0xca/0x110 net/socket.c:640
> >  ___sys_sendmsg+0x767/0x8b0 net/socket.c:2046
> >  __sys_sendmsg+0xe5/0x210 net/socket.c:2080
> >  SYSC_sendmsg net/socket.c:2091 [inline]
> >  SyS_sendmsg+0x2d/0x50 net/socket.c:2087
> >  entry_SYSCALL_64_fastpath+0x29/0xa0
> 
> No reproducer and last occurred 58 days ago (on commit f44b1886a5f876c8).
> Probably was fixed by commit b954f94023dcc61:
> 
Indeed, this commit certainly applies to this report too. Thanks a lot
for looking into this.

^ permalink raw reply

* Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
From: Oliver Neukum @ 2018-05-15 12:43 UTC (permalink / raw)
  To: hayeswang; +Cc: netdev

Hi,

I got reports about hangs with this trace:

May 13 01:36:55 neroon kernel: INFO: task kworker/0:0:4 blocked for more than 60 seconds.
May 13 01:36:55 neroon kernel:       Tainted: G     U            4.17.0-rc4-1.g8257a00-vanilla #1
May 13 01:36:55 neroon kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 13 01:36:55 neroon kernel: kworker/0:0     D    0     4      2 0x80000000
May 13 01:36:55 neroon kernel: Workqueue: events rtl_work_func_t [r8152]
May 13 01:36:55 neroon kernel: Call Trace:
May 13 01:36:55 neroon kernel:  ? __schedule+0x289/0x880
May 13 01:36:55 neroon kernel:  schedule+0x2f/0x90
May 13 01:36:55 neroon kernel:  rpm_resume+0xf9/0x7a0
May 13 01:36:55 neroon kernel:  ? wait_woken+0x80/0x80
May 13 01:36:55 neroon kernel:  rpm_resume+0x547/0x7a0
May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x34/0x70
May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x34/0x70
May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
May 13 01:36:55 neroon kernel:  __pm_runtime_resume+0x3a/0x50
May 13 01:36:55 neroon kernel:  usb_autopm_get_interface+0x1d/0x50 [usbcore]
May 13 01:36:55 neroon kernel:  rtl_work_func_t+0x3c/0x27c [r8152]
May 13 01:36:55 neroon kernel:  process_one_work+0x1d4/0x3f0
May 13 01:36:55 neroon kernel:  worker_thread+0x2b/0x3d0
May 13 01:36:55 neroon kernel:  ? process_one_work+0x3f0/0x3f0
May 13 01:36:55 neroon kernel:  kthread+0x113/0x130
May 13 01:36:55 neroon kernel:  ? kthread_create_worker_on_cpu+0x50/0x50
May 13 01:36:55 neroon kernel:  ret_from_fork+0x3a/0x50

Any idea?

	Regards
		Oliver

^ permalink raw reply

* Re: [PATCH v4 1/1] drivers core: multi-threading device shutdown
From: Pavel Tatashin @ 2018-05-15 12:21 UTC (permalink / raw)
  To: gregkh
  Cc: Steven Sistare, Daniel Jordan, LKML, jeffrey.t.kirsher,
	intel-wired-lan, netdev, Alexander Duyck, tobin
In-Reply-To: <20180515073744.GA28338@kroah.com>

Hi Greg,

> Can you refactor this to be at least 2 patches?  One that moves code
> around to comon functions to make the second patch, that adds the new
> functionality, easier to review and understand?

Yes, I will split the patch into a two-three patches.


> And I echo the "don't use kerneldoc for static functions" review
> comment, that's not needed at all.

It was my mistake, I did not realize they were kerneldoc, I simply tried to
follow the code style of this file :) I will modify comments not to be
kerneldoc.

Thank you,
Pavel

^ permalink raw reply

* Re: [PATCH v1 1/4] media: rc: introduce BPF_PROG_IR_DECODER
From: Sean Young @ 2018-05-15 12:19 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-media, linux-kernel, Alexei Starovoitov,
	Mauro Carvalho Chehab, Daniel Borkmann, netdev, Matthias Reichl,
	Devin Heitmueller
In-Reply-To: <089862eb-8d37-6479-3c2a-ba6199ae7d3c@infradead.org>

On Mon, May 14, 2018 at 04:27:19PM -0700, Randy Dunlap wrote:
> On 05/14/2018 02:10 PM, Sean Young wrote:
> > Add support for BPF_PROG_IR_DECODER. This type of BPF program can call
> 
> Kconfig file below uses IR_BPF_DECODER instead of the symbol name above.
> 
> and then patch 3 says a third choice:
> The context provided to a BPF_PROG_RAWIR_DECODER is a struct ir_raw_event;


Yes, you're right. I guess the source/trigger is raw IR events; decoding
is something you're likely to do, but not necessarily. So:

bpf type: BPF_PROG_TYPE_RAWIR_EVENT, has context struct bpf_rawir_event. 

Then we can call the Kconfig option CONFIG_BPF_RAW_IR_EVENT


Sean
> 
> > rc_keydown() to reported decoded IR scancodes, or rc_repeat() to report
> > that the last key should be repeated.
> > 
> > Signed-off-by: Sean Young <sean@mess.org>
> > ---
> >  drivers/media/rc/Kconfig          |  8 +++
> >  drivers/media/rc/Makefile         |  1 +
> >  drivers/media/rc/ir-bpf-decoder.c | 93 +++++++++++++++++++++++++++++++
> >  include/linux/bpf_types.h         |  3 +
> >  include/uapi/linux/bpf.h          | 16 +++++-
> >  5 files changed, 120 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/media/rc/ir-bpf-decoder.c
> > 
> > diff --git a/drivers/media/rc/Kconfig b/drivers/media/rc/Kconfig
> > index eb2c3b6eca7f..10ad6167d87c 100644
> > --- a/drivers/media/rc/Kconfig
> > +++ b/drivers/media/rc/Kconfig
> > @@ -120,6 +120,14 @@ config IR_IMON_DECODER
> >  	   remote control and you would like to use it with a raw IR
> >  	   receiver, or if you wish to use an encoder to transmit this IR.
> >  
> > +config IR_BPF_DECODER
> > +	bool "Enable IR raw decoder using BPF"
> > +	depends on BPF_SYSCALL
> > +	depends on RC_CORE=y
> > +	help
> > +	   Enable this option to make it possible to load custom IR
> > +	   decoders written in BPF.
> > +
> >  endif #RC_DECODERS
> >  
> >  menuconfig RC_DEVICES
> > diff --git a/drivers/media/rc/Makefile b/drivers/media/rc/Makefile
> > index 2e1c87066f6c..12e1118430d0 100644
> > --- a/drivers/media/rc/Makefile
> > +++ b/drivers/media/rc/Makefile
> > @@ -5,6 +5,7 @@ obj-y += keymaps/
> >  obj-$(CONFIG_RC_CORE) += rc-core.o
> >  rc-core-y := rc-main.o rc-ir-raw.o
> >  rc-core-$(CONFIG_LIRC) += lirc_dev.o
> > +rc-core-$(CONFIG_IR_BPF_DECODER) += ir-bpf-decoder.o
> 
> 
> -- 
> ~Randy

^ permalink raw reply

* [bpf-next V3 PATCH 4/4] xdp: change ndo_xdp_xmit API to support bulking
From: Jesper Dangaard Brouer @ 2018-05-15 12:13 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov,
	Jesper Dangaard Brouer
  Cc: Christoph Hellwig, BjörnTöpel, Magnus Karlsson,
	makita.toshiaki
In-Reply-To: <152638638695.9477.13781600009169577949.stgit@firesoul>

This patch change the API for ndo_xdp_xmit to support bulking
xdp_frames.

When kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown.
Most of the slowdown is caused by DMA API indirect function calls, but
also the net_device->ndo_xdp_xmit() call.

Benchmarked patch with CONFIG_RETPOLINE, using xdp_redirect_map with
single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed
performance improved:
 for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps
 for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps

With frames avail as a bulk inside the driver ndo_xdp_xmit call,
further optimizations are possible, like bulk DMA-mapping for TX.

Testing without CONFIG_RETPOLINE show the same performance for
physical NIC drivers.

The virtual NIC driver tun sees a huge performance boost, as it can
avoid doing per frame producer locking, but instead amortize the
locking cost over the bulk.

V2: Fix compile errors reported by kbuild test robot <lkp@intel.com>

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   26 +++++++---
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |    2 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   21 ++++++--
 drivers/net/tun.c                             |   37 +++++++++-----
 drivers/net/virtio_net.c                      |   66 +++++++++++++++++++------
 include/linux/netdevice.h                     |   14 +++--
 include/net/page_pool.h                       |    5 +-
 include/net/xdp.h                             |    1 
 include/trace/events/xdp.h                    |   10 ++--
 kernel/bpf/devmap.c                           |   33 ++++++++-----
 net/core/filter.c                             |    4 +-
 net/core/xdp.c                                |   20 ++++++--
 samples/bpf/xdp_monitor_kern.c                |   10 ++++
 samples/bpf/xdp_monitor_user.c                |   35 +++++++++++--
 14 files changed, 206 insertions(+), 78 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 5efa68de935b..9b698c5acd05 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -3664,14 +3664,19 @@ netdev_tx_t i40e_lan_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
  * @dev: netdev
  * @xdp: XDP buffer
  *
- * Returns Zero if sent, else an error code
+ * Returns number of frames successfully sent. Frames that fail are
+ * free'ed via XDP return API.
+ *
+ * For error cases, a negative errno code is returned and no-frames
+ * are transmitted (caller must handle freeing frames).
  **/
-int i40e_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf)
+int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames)
 {
 	struct i40e_netdev_priv *np = netdev_priv(dev);
 	unsigned int queue_index = smp_processor_id();
 	struct i40e_vsi *vsi = np->vsi;
-	int err;
+	int drops = 0;
+	int i;
 
 	if (test_bit(__I40E_VSI_DOWN, vsi->state))
 		return -ENETDOWN;
@@ -3679,11 +3684,18 @@ int i40e_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf)
 	if (!i40e_enabled_xdp_vsi(vsi) || queue_index >= vsi->num_queue_pairs)
 		return -ENXIO;
 
-	err = i40e_xmit_xdp_ring(xdpf, vsi->xdp_rings[queue_index]);
-	if (err != I40E_XDP_TX)
-		return -ENOSPC;
+	for (i = 0; i < n; i++) {
+		struct xdp_frame *xdpf = frames[i];
+		int err;
 
-	return 0;
+		err = i40e_xmit_xdp_ring(xdpf, vsi->xdp_rings[queue_index]);
+		if (err != I40E_XDP_TX) {
+			xdp_return_frame_rx_napi(xdpf);
+			drops++;
+		}
+	}
+
+	return n - drops;
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index fdd2c55f03a6..eb8804b3d7b6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -487,7 +487,7 @@ u32 i40e_get_tx_pending(struct i40e_ring *ring, bool in_sw);
 void i40e_detect_recover_hung(struct i40e_vsi *vsi);
 int __i40e_maybe_stop_tx(struct i40e_ring *tx_ring, int size);
 bool __i40e_chk_linearize(struct sk_buff *skb);
-int i40e_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf);
+int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames);
 void i40e_xdp_flush(struct net_device *dev);
 
 /**
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 6652b201df5b..9645619f7729 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -10017,11 +10017,13 @@ static int ixgbe_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 	}
 }
 
-static int ixgbe_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf)
+static int ixgbe_xdp_xmit(struct net_device *dev, int n,
+			  struct xdp_frame **frames)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(dev);
 	struct ixgbe_ring *ring;
-	int err;
+	int drops = 0;
+	int i;
 
 	if (unlikely(test_bit(__IXGBE_DOWN, &adapter->state)))
 		return -ENETDOWN;
@@ -10033,11 +10035,18 @@ static int ixgbe_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf)
 	if (unlikely(!ring))
 		return -ENXIO;
 
-	err = ixgbe_xmit_xdp_ring(adapter, xdpf);
-	if (err != IXGBE_XDP_TX)
-		return -ENOSPC;
+	for (i = 0; i < n; i++) {
+		struct xdp_frame *xdpf = frames[i];
+		int err;
 
-	return 0;
+		err = ixgbe_xmit_xdp_ring(adapter, xdpf);
+		if (err != IXGBE_XDP_TX) {
+			xdp_return_frame_rx_napi(xdpf);
+			drops++;
+		}
+	}
+
+	return n - drops;
 }
 
 static void ixgbe_xdp_flush(struct net_device *dev)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index d3c04ab9752a..4fe0c75c5e0b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -70,6 +70,7 @@
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
 #include <net/sock.h>
+#include <net/xdp.h>
 #include <linux/seq_file.h>
 #include <linux/uio.h>
 #include <linux/skb_array.h>
@@ -1290,34 +1291,44 @@ static const struct net_device_ops tun_netdev_ops = {
 	.ndo_get_stats64	= tun_net_get_stats64,
 };
 
-static int tun_xdp_xmit(struct net_device *dev, struct xdp_frame *frame)
+static int tun_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames)
 {
 	struct tun_struct *tun = netdev_priv(dev);
 	struct tun_file *tfile;
 	u32 numqueues;
-	int ret = 0;
+	int drops = 0;
+	int cnt = n;
+	int i;
 
 	rcu_read_lock();
 
 	numqueues = READ_ONCE(tun->numqueues);
 	if (!numqueues) {
-		ret = -ENOSPC;
-		goto out;
+		rcu_read_unlock();
+		return -ENXIO; /* Caller will free/return all frames */
 	}
 
 	tfile = rcu_dereference(tun->tfiles[smp_processor_id() %
 					    numqueues]);
-	/* Encode the XDP flag into lowest bit for consumer to differ
-	 * XDP buffer from sk_buff.
-	 */
-	if (ptr_ring_produce(&tfile->tx_ring, tun_xdp_to_ptr(frame))) {
-		this_cpu_inc(tun->pcpu_stats->tx_dropped);
-		ret = -ENOSPC;
+
+	spin_lock(&tfile->tx_ring.producer_lock);
+	for (i = 0; i < n; i++) {
+		struct xdp_frame *xdp = frames[i];
+		/* Encode the XDP flag into lowest bit for consumer to differ
+		 * XDP buffer from sk_buff.
+		 */
+		void *frame = tun_xdp_to_ptr(xdp);
+
+		if (__ptr_ring_produce(&tfile->tx_ring, frame)) {
+			this_cpu_inc(tun->pcpu_stats->tx_dropped);
+			xdp_return_frame_rx_napi(xdp);
+			drops++;
+		}
 	}
+	spin_unlock(&tfile->tx_ring.producer_lock);
 
-out:
 	rcu_read_unlock();
-	return ret;
+	return cnt - drops;
 }
 
 static int tun_xdp_tx(struct net_device *dev, struct xdp_buff *xdp)
@@ -1327,7 +1338,7 @@ static int tun_xdp_tx(struct net_device *dev, struct xdp_buff *xdp)
 	if (unlikely(!frame))
 		return -EOVERFLOW;
 
-	return tun_xdp_xmit(dev, frame);
+	return tun_xdp_xmit(dev, 1, &frame);
 }
 
 static void tun_xdp_flush(struct net_device *dev)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f34794a76c4d..39a0783d1cde 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -419,23 +419,13 @@ static void virtnet_xdp_flush(struct net_device *dev)
 	virtqueue_kick(sq->vq);
 }
 
-static int __virtnet_xdp_xmit(struct virtnet_info *vi,
-			       struct xdp_frame *xdpf)
+static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
+				   struct send_queue *sq,
+				   struct xdp_frame *xdpf)
 {
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
-	struct xdp_frame *xdpf_sent;
-	struct send_queue *sq;
-	unsigned int len;
-	unsigned int qp;
 	int err;
 
-	qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
-	sq = &vi->sq[qp];
-
-	/* Free up any pending old buffers before queueing new ones. */
-	while ((xdpf_sent = virtqueue_get_buf(sq->vq, &len)) != NULL)
-		xdp_return_frame(xdpf_sent);
-
 	/* virtqueue want to use data area in-front of packet */
 	if (unlikely(xdpf->metasize > 0))
 		return -EOPNOTSUPP;
@@ -459,11 +449,40 @@ static int __virtnet_xdp_xmit(struct virtnet_info *vi,
 	return 0;
 }
 
-static int virtnet_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf)
+static int __virtnet_xdp_tx_xmit(struct virtnet_info *vi,
+				   struct xdp_frame *xdpf)
+{
+	struct xdp_frame *xdpf_sent;
+	struct send_queue *sq;
+	unsigned int len;
+	unsigned int qp;
+
+	qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
+	sq = &vi->sq[qp];
+
+	/* Free up any pending old buffers before queueing new ones. */
+	while ((xdpf_sent = virtqueue_get_buf(sq->vq, &len)) != NULL)
+		xdp_return_frame(xdpf_sent);
+
+	return __virtnet_xdp_xmit_one(vi, sq, xdpf);
+}
+
+static int virtnet_xdp_xmit(struct net_device *dev,
+			    int n, struct xdp_frame **frames)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct receive_queue *rq = vi->rq;
+	struct xdp_frame *xdpf_sent;
 	struct bpf_prog *xdp_prog;
+	struct send_queue *sq;
+	unsigned int len;
+	unsigned int qp;
+	int drops = 0;
+	int err;
+	int i;
+
+	qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
+	sq = &vi->sq[qp];
 
 	/* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this
 	 * indicate XDP resources have been successfully allocated.
@@ -472,7 +491,20 @@ static int virtnet_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf)
 	if (!xdp_prog)
 		return -ENXIO;
 
-	return __virtnet_xdp_xmit(vi, xdpf);
+	/* Free up any pending old buffers before queueing new ones. */
+	while ((xdpf_sent = virtqueue_get_buf(sq->vq, &len)) != NULL)
+		xdp_return_frame(xdpf_sent);
+
+	for (i = 0; i < n; i++) {
+		struct xdp_frame *xdpf = frames[i];
+
+		err = __virtnet_xdp_xmit_one(vi, sq, xdpf);
+		if (err) {
+			xdp_return_frame_rx_napi(xdpf);
+			drops++;
+		}
+	}
+	return n - drops;
 }
 
 static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
@@ -616,7 +648,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 			xdpf = convert_to_xdp_frame(&xdp);
 			if (unlikely(!xdpf))
 				goto err_xdp;
-			err = __virtnet_xdp_xmit(vi, xdpf);
+			err = __virtnet_xdp_tx_xmit(vi, xdpf);
 			if (unlikely(err)) {
 				trace_xdp_exception(vi->dev, xdp_prog, act);
 				goto err_xdp;
@@ -779,7 +811,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 			xdpf = convert_to_xdp_frame(&xdp);
 			if (unlikely(!xdpf))
 				goto err_xdp;
-			err = __virtnet_xdp_xmit(vi, xdpf);
+			err = __virtnet_xdp_tx_xmit(vi, xdpf);
 			if (unlikely(err)) {
 				trace_xdp_exception(vi->dev, xdp_prog, act);
 				if (unlikely(xdp_page != page))
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 03ed492c4e14..debdb6286170 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1185,9 +1185,13 @@ struct dev_ifalias {
  *	This function is used to set or query state related to XDP on the
  *	netdevice and manage BPF offload. See definition of
  *	enum bpf_netdev_command for details.
- * int (*ndo_xdp_xmit)(struct net_device *dev, struct xdp_frame *xdp);
- *	This function is used to submit a XDP packet for transmit on a
- *	netdevice.
+ * int (*ndo_xdp_xmit)(struct net_device *dev, int n, struct xdp_frame **xdp);
+ *	This function is used to submit @n XDP packets for transmit on a
+ *	netdevice. Returns number of frames successfully transmitted, frames
+ *	that got dropped are freed/returned via xdp_return_frame().
+ *	Returns negative number, means general error invoking ndo, meaning
+ *	no frames were xmit'ed and core-caller will free all frames.
+ *	TODO: Consider add flag to allow sending flush operation.
  * void (*ndo_xdp_flush)(struct net_device *dev);
  *	This function is used to inform the driver to flush a particular
  *	xdp tx queue. Must be called on same CPU as xdp_xmit.
@@ -1375,8 +1379,8 @@ struct net_device_ops {
 						       int needed_headroom);
 	int			(*ndo_bpf)(struct net_device *dev,
 					   struct netdev_bpf *bpf);
-	int			(*ndo_xdp_xmit)(struct net_device *dev,
-						struct xdp_frame *xdp);
+	int			(*ndo_xdp_xmit)(struct net_device *dev, int n,
+						struct xdp_frame **xdp);
 	void			(*ndo_xdp_flush)(struct net_device *dev);
 };
 
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index c79087153148..694d055e01ef 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -115,13 +115,14 @@ void page_pool_destroy(struct page_pool *pool);
 void __page_pool_put_page(struct page_pool *pool,
 			  struct page *page, bool allow_direct);
 
-static inline void page_pool_put_page(struct page_pool *pool, struct page *page)
+static inline void page_pool_put_page(struct page_pool *pool,
+				      struct page *page, bool allow_direct)
 {
 	/* When page_pool isn't compiled-in, net/core/xdp.c doesn't
 	 * allow registering MEM_TYPE_PAGE_POOL, but shield linker.
 	 */
 #ifdef CONFIG_PAGE_POOL
-	__page_pool_put_page(pool, page, false);
+	__page_pool_put_page(pool, page, allow_direct);
 #endif
 }
 /* Very limited use-cases allow recycle direct */
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 0b689cf561c7..7ad779237ae8 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -104,6 +104,7 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
 }
 
 void xdp_return_frame(struct xdp_frame *xdpf);
+void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
 void xdp_return_buff(struct xdp_buff *xdp);
 
 int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index 2e9ef0650144..1ecf4c67fcf7 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -234,9 +234,9 @@ TRACE_EVENT(xdp_devmap_xmit,
 	TP_PROTO(const struct bpf_map *map, u32 map_index,
 		 int sent, int drops,
 		 const struct net_device *from_dev,
-		 const struct net_device *to_dev),
+		 const struct net_device *to_dev, int err),
 
-	TP_ARGS(map, map_index, sent, drops, from_dev, to_dev),
+	TP_ARGS(map, map_index, sent, drops, from_dev, to_dev, err),
 
 	TP_STRUCT__entry(
 		__field(int, map_id)
@@ -246,6 +246,7 @@ TRACE_EVENT(xdp_devmap_xmit,
 		__field(int, sent)
 		__field(int, from_ifindex)
 		__field(int, to_ifindex)
+		__field(int, err)
 	),
 
 	TP_fast_assign(
@@ -256,16 +257,17 @@ TRACE_EVENT(xdp_devmap_xmit,
 		__entry->sent		= sent;
 		__entry->from_ifindex	= from_dev->ifindex;
 		__entry->to_ifindex	= to_dev->ifindex;
+		__entry->err		= err;
 	),
 
 	TP_printk("ndo_xdp_xmit"
 		  " map_id=%d map_index=%d action=%s"
 		  " sent=%d drops=%d"
-		  " from_ifindex=%d to_ifindex=%d",
+		  " from_ifindex=%d to_ifindex=%d err=%d",
 		  __entry->map_id, __entry->map_index,
 		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
 		  __entry->sent, __entry->drops,
-		  __entry->from_ifindex, __entry->to_ifindex)
+		  __entry->from_ifindex, __entry->to_ifindex, __entry->err)
 );
 
 #endif /* _TRACE_XDP_H */
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 6f84100723b0..4dd8f0e3a8d9 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -222,7 +222,7 @@ static int bq_xmit_all(struct bpf_dtab_netdev *obj,
 			 struct xdp_bulk_queue *bq)
 {
 	struct net_device *dev = obj->dev;
-	int sent = 0, drops = 0;
+	int sent = 0, drops = 0, err = 0;
 	int i;
 
 	if (unlikely(!bq->count))
@@ -234,23 +234,32 @@ static int bq_xmit_all(struct bpf_dtab_netdev *obj,
 		prefetch(xdpf);
 	}
 
-	for (i = 0; i < bq->count; i++) {
-		struct xdp_frame *xdpf = bq->q[i];
-		int err;
-
-		err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
-		if (err) {
-			drops++;
-			xdp_return_frame(xdpf);
-		}
-		sent++;
+	sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q);
+	if (sent < 0) {
+		err = sent;
+		sent = 0;
+		goto error;
 	}
+	drops = bq->count - sent;
+out:
 	bq->count = 0;
 
 	trace_xdp_devmap_xmit(&obj->dtab->map, obj->bit,
-			      sent, drops, bq->dev_rx, dev);
+			      sent, drops, bq->dev_rx, dev, err);
 	bq->dev_rx = NULL;
 	return 0;
+error:
+	/* If ndo_xdp_xmit fails with an errno, no frames have been
+	 * xmit'ed and it's our responsibility to them free all.
+	 */
+	for (i = 0; i < bq->count; i++) {
+		struct xdp_frame *xdpf = bq->q[i];
+
+		/* RX path under NAPI protection, can return frames faster */
+		xdp_return_frame_rx_napi(xdpf);
+		drops++;
+	}
+	goto out;
 }
 
 /* __dev_map_flush is called from xdp_do_flush_map() which _must_ be signaled
diff --git a/net/core/filter.c b/net/core/filter.c
index 68277358cca9..1f663404e36a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3008,8 +3008,8 @@ static int __bpf_tx_xdp(struct net_device *dev,
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
 
-	err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
-	if (err)
+	err = dev->netdev_ops->ndo_xdp_xmit(dev, 1, &xdpf);
+	if (err <= 0)
 		return err;
 	dev->netdev_ops->ndo_xdp_flush(dev);
 	return 0;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index bf6758f74339..cb8c4e061a5a 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -308,7 +308,13 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
 }
 EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
 
-static void xdp_return(void *data, struct xdp_mem_info *mem)
+/* XDP RX runs under NAPI protection, and in different delivery error
+ * scenarios (e.g. queue full), it is possible to return the xdp_frame
+ * while still leveraging this protection.  The @napi_direct boolian
+ * is used for those calls sites.  Thus, allowing for faster recycling
+ * of xdp_frames/pages in those cases.
+ */
+static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct)
 {
 	struct xdp_mem_allocator *xa;
 	struct page *page;
@@ -320,7 +326,7 @@ static void xdp_return(void *data, struct xdp_mem_info *mem)
 		xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
 		page = virt_to_head_page(data);
 		if (xa)
-			page_pool_put_page(xa->page_pool, page);
+			page_pool_put_page(xa->page_pool, page, napi_direct);
 		else
 			put_page(page);
 		rcu_read_unlock();
@@ -340,12 +346,18 @@ static void xdp_return(void *data, struct xdp_mem_info *mem)
 
 void xdp_return_frame(struct xdp_frame *xdpf)
 {
-	xdp_return(xdpf->data, &xdpf->mem);
+	__xdp_return(xdpf->data, &xdpf->mem, false);
 }
 EXPORT_SYMBOL_GPL(xdp_return_frame);
 
+void xdp_return_frame_rx_napi(struct xdp_frame *xdpf)
+{
+	__xdp_return(xdpf->data, &xdpf->mem, true);
+}
+EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
+
 void xdp_return_buff(struct xdp_buff *xdp)
 {
-	xdp_return(xdp->data, &xdp->rxq->mem);
+	__xdp_return(xdp->data, &xdp->rxq->mem, true);
 }
 EXPORT_SYMBOL_GPL(xdp_return_buff);
diff --git a/samples/bpf/xdp_monitor_kern.c b/samples/bpf/xdp_monitor_kern.c
index 2854aa0665ea..ad10fe700d7d 100644
--- a/samples/bpf/xdp_monitor_kern.c
+++ b/samples/bpf/xdp_monitor_kern.c
@@ -125,6 +125,7 @@ struct datarec {
 	u64 processed;
 	u64 dropped;
 	u64 info;
+	u64 err;
 };
 #define MAX_CPUS 64
 
@@ -228,6 +229,7 @@ struct devmap_xmit_ctx {
 	int sent;		//	offset:24; size:4; signed:1;
 	int from_ifindex;	//	offset:28; size:4; signed:1;
 	int to_ifindex;		//	offset:32; size:4; signed:1;
+	int err;		//	offset:36; size:4; signed:1;
 };
 
 SEC("tracepoint/xdp/xdp_devmap_xmit")
@@ -245,5 +247,13 @@ int trace_xdp_devmap_xmit(struct devmap_xmit_ctx *ctx)
 	/* Record bulk events, then userspace can calc average bulk size */
 	rec->info += 1;
 
+	/* Record error cases, where no frame were sent */
+	if (ctx->err)
+		rec->err++;
+
+	/* Catch API error of drv ndo_xdp_xmit sent more than count */
+	if (ctx->drops < 0)
+		rec->err++;
+
 	return 1;
 }
diff --git a/samples/bpf/xdp_monitor_user.c b/samples/bpf/xdp_monitor_user.c
index 7e18a454924c..dd558cbb2309 100644
--- a/samples/bpf/xdp_monitor_user.c
+++ b/samples/bpf/xdp_monitor_user.c
@@ -117,6 +117,7 @@ struct datarec {
 	__u64 processed;
 	__u64 dropped;
 	__u64 info;
+	__u64 err;
 };
 #define MAX_CPUS 64
 
@@ -152,6 +153,7 @@ static bool map_collect_record(int fd, __u32 key, struct record *rec)
 	__u64 sum_processed = 0;
 	__u64 sum_dropped = 0;
 	__u64 sum_info = 0;
+	__u64 sum_err = 0;
 	int i;
 
 	if ((bpf_map_lookup_elem(fd, &key, values)) != 0) {
@@ -170,10 +172,13 @@ static bool map_collect_record(int fd, __u32 key, struct record *rec)
 		sum_dropped        += values[i].dropped;
 		rec->cpu[i].info = values[i].info;
 		sum_info        += values[i].info;
+		rec->cpu[i].err = values[i].err;
+		sum_err        += values[i].err;
 	}
 	rec->total.processed = sum_processed;
 	rec->total.dropped   = sum_dropped;
 	rec->total.info      = sum_info;
+	rec->total.err       = sum_err;
 	return true;
 }
 
@@ -274,6 +279,18 @@ static double calc_info(struct datarec *r, struct datarec *p, double period)
 	return pps;
 }
 
+static double calc_err(struct datarec *r, struct datarec *p, double period)
+{
+	__u64 packets = 0;
+	double pps = 0;
+
+	if (period > 0) {
+		packets = r->err - p->err;
+		pps = packets / period;
+	}
+	return pps;
+}
+
 static void stats_print(struct stats_record *stats_rec,
 			struct stats_record *stats_prev,
 			bool err_only)
@@ -412,11 +429,12 @@ static void stats_print(struct stats_record *stats_rec,
 
 	/* devmap ndo_xdp_xmit stats */
 	{
-		char *fmt1 = "%-15s %-7d %'-12.0f %'-12.0f %'-10.2f %s\n";
-		char *fmt2 = "%-15s %-7s %'-12.0f %'-12.0f %'-10.2f %s\n";
+		char *fmt1 = "%-15s %-7d %'-12.0f %'-12.0f %'-10.2f %s %s\n";
+		char *fmt2 = "%-15s %-7s %'-12.0f %'-12.0f %'-10.2f %s %s\n";
 		struct record *rec, *prev;
-		double drop, info;
+		double drop, info, err;
 		char *i_str = "";
+		char *err_str = "";
 
 		rec  =  &stats_rec->xdp_devmap_xmit;
 		prev = &stats_prev->xdp_devmap_xmit;
@@ -428,22 +446,29 @@ static void stats_print(struct stats_record *stats_rec,
 			pps  = calc_pps(r, p, t);
 			drop = calc_drop(r, p, t);
 			info = calc_info(r, p, t);
+			err  = calc_err(r, p, t);
 			if (info > 0) {
 				i_str = "bulk-average";
 				info = (pps+drop) / info; /* calc avg bulk */
 			}
+			if (err > 0)
+				err_str = "drv-err";
 			if (pps > 0 || drop > 0)
 				printf(fmt1, "devmap-xmit",
-				       i, pps, drop, info, i_str);
+				       i, pps, drop, info, i_str, err_str);
 		}
 		pps = calc_pps(&rec->total, &prev->total, t);
 		drop = calc_drop(&rec->total, &prev->total, t);
 		info = calc_info(&rec->total, &prev->total, t);
+		err  = calc_err(&rec->total, &prev->total, t);
 		if (info > 0) {
 			i_str = "bulk-average";
 			info = (pps+drop) / info; /* calc avg bulk */
 		}
-		printf(fmt2, "devmap-xmit", "total", pps, drop, info, i_str);
+		if (err > 0)
+			err_str = "drv-err";
+		printf(fmt2, "devmap-xmit", "total", pps, drop,
+		       info, i_str, err_str);
 	}
 
 	printf("\n");

^ permalink raw reply related

* [bpf-next V3 PATCH 3/4] xdp: add tracepoint for devmap like cpumap have
From: Jesper Dangaard Brouer @ 2018-05-15 12:13 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov,
	Jesper Dangaard Brouer
  Cc: Christoph Hellwig, BjörnTöpel, Magnus Karlsson,
	makita.toshiaki
In-Reply-To: <152638638695.9477.13781600009169577949.stgit@firesoul>

Notice how this allow us get XDP statistic without affecting the XDP
performance, as tracepoint is no-longer activated on a per packet basis.

The xdp_monitor sample/tool is updated to use this new tracepoint.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/linux/bpf.h            |    6 ++++-
 include/trace/events/xdp.h     |   39 +++++++++++++++++++++++++++++++++++
 kernel/bpf/devmap.c            |   25 ++++++++++++++++++-----
 net/core/filter.c              |    2 +-
 samples/bpf/xdp_monitor_kern.c |   39 +++++++++++++++++++++++++++++++++++
 samples/bpf/xdp_monitor_user.c |   44 +++++++++++++++++++++++++++++++++++++++-
 6 files changed, 146 insertions(+), 9 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8527964da402..3dda20a29cdb 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -489,7 +489,8 @@ struct xdp_buff;
 struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
 void __dev_map_insert_ctx(struct bpf_map *map, u32 index);
 void __dev_map_flush(struct bpf_map *map);
-int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp);
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+		    struct net_device *dev_rx);
 
 struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key);
 void __cpu_map_insert_ctx(struct bpf_map *map, u32 index);
@@ -575,7 +576,8 @@ static inline void __dev_map_flush(struct bpf_map *map)
 struct xdp_buff;
 struct bpf_dtab_netdev;
 static inline
-int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+		    struct net_device *dev_rx)
 {
 	return 0;
 }
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index 96104610d40e..2e9ef0650144 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -229,6 +229,45 @@ TRACE_EVENT(xdp_cpumap_enqueue,
 		  __entry->to_cpu)
 );
 
+TRACE_EVENT(xdp_devmap_xmit,
+
+	TP_PROTO(const struct bpf_map *map, u32 map_index,
+		 int sent, int drops,
+		 const struct net_device *from_dev,
+		 const struct net_device *to_dev),
+
+	TP_ARGS(map, map_index, sent, drops, from_dev, to_dev),
+
+	TP_STRUCT__entry(
+		__field(int, map_id)
+		__field(u32, act)
+		__field(u32, map_index)
+		__field(int, drops)
+		__field(int, sent)
+		__field(int, from_ifindex)
+		__field(int, to_ifindex)
+	),
+
+	TP_fast_assign(
+		__entry->map_id		= map->id;
+		__entry->act		= XDP_REDIRECT;
+		__entry->map_index	= map_index;
+		__entry->drops		= drops;
+		__entry->sent		= sent;
+		__entry->from_ifindex	= from_dev->ifindex;
+		__entry->to_ifindex	= to_dev->ifindex;
+	),
+
+	TP_printk("ndo_xdp_xmit"
+		  " map_id=%d map_index=%d action=%s"
+		  " sent=%d drops=%d"
+		  " from_ifindex=%d to_ifindex=%d",
+		  __entry->map_id, __entry->map_index,
+		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
+		  __entry->sent, __entry->drops,
+		  __entry->from_ifindex, __entry->to_ifindex)
+);
+
 #endif /* _TRACE_XDP_H */
 
 #include <trace/define_trace.h>
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index cab72c100bb5..6f84100723b0 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -50,6 +50,7 @@
 #include <linux/bpf.h>
 #include <net/xdp.h>
 #include <linux/filter.h>
+#include <trace/events/xdp.h>
 
 #define DEV_CREATE_FLAG_MASK \
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
@@ -57,6 +58,7 @@
 #define DEV_MAP_BULK_SIZE 16
 struct xdp_bulk_queue {
 	struct xdp_frame *q[DEV_MAP_BULK_SIZE];
+	struct net_device *dev_rx;
 	unsigned int count;
 };
 
@@ -219,8 +221,8 @@ void __dev_map_insert_ctx(struct bpf_map *map, u32 bit)
 static int bq_xmit_all(struct bpf_dtab_netdev *obj,
 			 struct xdp_bulk_queue *bq)
 {
-	unsigned int processed = 0, drops = 0;
 	struct net_device *dev = obj->dev;
+	int sent = 0, drops = 0;
 	int i;
 
 	if (unlikely(!bq->count))
@@ -241,10 +243,13 @@ static int bq_xmit_all(struct bpf_dtab_netdev *obj,
 			drops++;
 			xdp_return_frame(xdpf);
 		}
-		processed++;
+		sent++;
 	}
 	bq->count = 0;
 
+	trace_xdp_devmap_xmit(&obj->dtab->map, obj->bit,
+			      sent, drops, bq->dev_rx, dev);
+	bq->dev_rx = NULL;
 	return 0;
 }
 
@@ -301,18 +306,28 @@ struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
 /* Runs under RCU-read-side, plus in softirq under NAPI protection.
  * Thus, safe percpu variable access.
  */
-static int bq_enqueue(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf)
+static int bq_enqueue(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf,
+		      struct net_device *dev_rx)
+
 {
 	struct xdp_bulk_queue *bq = this_cpu_ptr(obj->bulkq);
 
 	if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
 		bq_xmit_all(obj, bq);
 
+	/* Ingress dev_rx will be the same for all xdp_frame's in
+	 * bulk_queue, because bq stored per-CPU and must be flushed
+	 * from net_device drivers NAPI func end.
+	 */
+	if (!bq->dev_rx)
+		bq->dev_rx = dev_rx;
+
 	bq->q[bq->count++] = xdpf;
 	return 0;
 }
 
-int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+		    struct net_device *dev_rx)
 {
 	struct net_device *dev = dst->dev;
 	struct xdp_frame *xdpf;
@@ -325,7 +340,7 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
 
-	err = bq_enqueue(dst, xdpf);
+	err = bq_enqueue(dst, xdpf, dev_rx);
 	if (err)
 		return err;
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 284b9ec3d50f..68277358cca9 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3026,7 +3026,7 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
 	case BPF_MAP_TYPE_DEVMAP: {
 		struct bpf_dtab_netdev *dst = fwd;
 
-		err = dev_map_enqueue(dst, xdp);
+		err = dev_map_enqueue(dst, xdp, dev_rx);
 		if (err)
 			return err;
 		__dev_map_insert_ctx(map, index);
diff --git a/samples/bpf/xdp_monitor_kern.c b/samples/bpf/xdp_monitor_kern.c
index 211db8ded0de..2854aa0665ea 100644
--- a/samples/bpf/xdp_monitor_kern.c
+++ b/samples/bpf/xdp_monitor_kern.c
@@ -208,3 +208,42 @@ int trace_xdp_cpumap_kthread(struct cpumap_kthread_ctx *ctx)
 
 	return 0;
 }
+
+struct bpf_map_def SEC("maps") devmap_xmit_cnt = {
+	.type		= BPF_MAP_TYPE_PERCPU_ARRAY,
+	.key_size	= sizeof(u32),
+	.value_size	= sizeof(struct datarec),
+	.max_entries	= 1,
+};
+
+/* Tracepoint: /sys/kernel/debug/tracing/events/xdp/xdp_devmap_xmit/format
+ * Code in:         kernel/include/trace/events/xdp.h
+ */
+struct devmap_xmit_ctx {
+	u64 __pad;		// First 8 bytes are not accessible by bpf code
+	int map_id;		//	offset:8;  size:4; signed:1;
+	u32 act;		//	offset:12; size:4; signed:0;
+	u32 map_index;		//	offset:16; size:4; signed:0;
+	int drops;		//	offset:20; size:4; signed:1;
+	int sent;		//	offset:24; size:4; signed:1;
+	int from_ifindex;	//	offset:28; size:4; signed:1;
+	int to_ifindex;		//	offset:32; size:4; signed:1;
+};
+
+SEC("tracepoint/xdp/xdp_devmap_xmit")
+int trace_xdp_devmap_xmit(struct devmap_xmit_ctx *ctx)
+{
+	struct datarec *rec;
+	u32 key = 0;
+
+	rec = bpf_map_lookup_elem(&devmap_xmit_cnt, &key);
+	if (!rec)
+		return 0;
+	rec->processed += ctx->sent;
+	rec->dropped   += ctx->drops;
+
+	/* Record bulk events, then userspace can calc average bulk size */
+	rec->info += 1;
+
+	return 1;
+}
diff --git a/samples/bpf/xdp_monitor_user.c b/samples/bpf/xdp_monitor_user.c
index bf09b5188acd..7e18a454924c 100644
--- a/samples/bpf/xdp_monitor_user.c
+++ b/samples/bpf/xdp_monitor_user.c
@@ -141,6 +141,7 @@ struct stats_record {
 	struct record_u64 xdp_exception[XDP_ACTION_MAX];
 	struct record xdp_cpumap_kthread;
 	struct record xdp_cpumap_enqueue[MAX_CPUS];
+	struct record xdp_devmap_xmit;
 };
 
 static bool map_collect_record(int fd, __u32 key, struct record *rec)
@@ -397,7 +398,7 @@ static void stats_print(struct stats_record *stats_rec,
 			info = calc_info(r, p, t);
 			if (info > 0)
 				i_str = "sched";
-			if (pps > 0)
+			if (pps > 0 || drop > 0)
 				printf(fmt1, "cpumap-kthread",
 				       i, pps, drop, info, i_str);
 		}
@@ -409,6 +410,42 @@ static void stats_print(struct stats_record *stats_rec,
 		printf(fmt2, "cpumap-kthread", "total", pps, drop, info, i_str);
 	}
 
+	/* devmap ndo_xdp_xmit stats */
+	{
+		char *fmt1 = "%-15s %-7d %'-12.0f %'-12.0f %'-10.2f %s\n";
+		char *fmt2 = "%-15s %-7s %'-12.0f %'-12.0f %'-10.2f %s\n";
+		struct record *rec, *prev;
+		double drop, info;
+		char *i_str = "";
+
+		rec  =  &stats_rec->xdp_devmap_xmit;
+		prev = &stats_prev->xdp_devmap_xmit;
+		t = calc_period(rec, prev);
+		for (i = 0; i < nr_cpus; i++) {
+			struct datarec *r = &rec->cpu[i];
+			struct datarec *p = &prev->cpu[i];
+
+			pps  = calc_pps(r, p, t);
+			drop = calc_drop(r, p, t);
+			info = calc_info(r, p, t);
+			if (info > 0) {
+				i_str = "bulk-average";
+				info = (pps+drop) / info; /* calc avg bulk */
+			}
+			if (pps > 0 || drop > 0)
+				printf(fmt1, "devmap-xmit",
+				       i, pps, drop, info, i_str);
+		}
+		pps = calc_pps(&rec->total, &prev->total, t);
+		drop = calc_drop(&rec->total, &prev->total, t);
+		info = calc_info(&rec->total, &prev->total, t);
+		if (info > 0) {
+			i_str = "bulk-average";
+			info = (pps+drop) / info; /* calc avg bulk */
+		}
+		printf(fmt2, "devmap-xmit", "total", pps, drop, info, i_str);
+	}
+
 	printf("\n");
 }
 
@@ -437,6 +474,9 @@ static bool stats_collect(struct stats_record *rec)
 	fd = map_data[3].fd; /* map3: cpumap_kthread_cnt */
 	map_collect_record(fd, 0, &rec->xdp_cpumap_kthread);
 
+	fd = map_data[4].fd; /* map4: devmap_xmit_cnt */
+	map_collect_record(fd, 0, &rec->xdp_devmap_xmit);
+
 	return true;
 }
 
@@ -480,6 +520,7 @@ static struct stats_record *alloc_stats_record(void)
 
 	rec_sz = sizeof(struct datarec);
 	rec->xdp_cpumap_kthread.cpu = alloc_rec_per_cpu(rec_sz);
+	rec->xdp_devmap_xmit.cpu    = alloc_rec_per_cpu(rec_sz);
 
 	for (i = 0; i < MAX_CPUS; i++)
 		rec->xdp_cpumap_enqueue[i].cpu = alloc_rec_per_cpu(rec_sz);
@@ -498,6 +539,7 @@ static void free_stats_record(struct stats_record *r)
 		free(r->xdp_exception[i].cpu);
 
 	free(r->xdp_cpumap_kthread.cpu);
+	free(r->xdp_devmap_xmit.cpu);
 
 	for (i = 0; i < MAX_CPUS; i++)
 		free(r->xdp_cpumap_enqueue[i].cpu);

^ permalink raw reply related

* [bpf-next V3 PATCH 2/4] bpf: devmap prepare xdp frames for bulking
From: Jesper Dangaard Brouer @ 2018-05-15 12:13 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov,
	Jesper Dangaard Brouer
  Cc: Christoph Hellwig, BjörnTöpel, Magnus Karlsson,
	makita.toshiaki
In-Reply-To: <152638638695.9477.13781600009169577949.stgit@firesoul>

Like cpumap create queue for xdp frames that will be bulked.  For now,
this patch simply invoke ndo_xdp_xmit foreach frame.  This happens,
either when the map flush operation is envoked, or when the limit
DEV_MAP_BULK_SIZE is reached.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 kernel/bpf/devmap.c |   77 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 73 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 808808bf2bf2..cab72c100bb5 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -54,11 +54,18 @@
 #define DEV_CREATE_FLAG_MASK \
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
 
+#define DEV_MAP_BULK_SIZE 16
+struct xdp_bulk_queue {
+	struct xdp_frame *q[DEV_MAP_BULK_SIZE];
+	unsigned int count;
+};
+
 /* objects in the map */
 struct bpf_dtab_netdev {
 	struct net_device *dev; /* must be first member, due to tracepoint */
 	struct bpf_dtab *dtab;
 	unsigned int bit;
+	struct xdp_bulk_queue __percpu *bulkq;
 	struct rcu_head rcu;
 };
 
@@ -209,6 +216,38 @@ void __dev_map_insert_ctx(struct bpf_map *map, u32 bit)
 	__set_bit(bit, bitmap);
 }
 
+static int bq_xmit_all(struct bpf_dtab_netdev *obj,
+			 struct xdp_bulk_queue *bq)
+{
+	unsigned int processed = 0, drops = 0;
+	struct net_device *dev = obj->dev;
+	int i;
+
+	if (unlikely(!bq->count))
+		return 0;
+
+	for (i = 0; i < bq->count; i++) {
+		struct xdp_frame *xdpf = bq->q[i];
+
+		prefetch(xdpf);
+	}
+
+	for (i = 0; i < bq->count; i++) {
+		struct xdp_frame *xdpf = bq->q[i];
+		int err;
+
+		err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
+		if (err) {
+			drops++;
+			xdp_return_frame(xdpf);
+		}
+		processed++;
+	}
+	bq->count = 0;
+
+	return 0;
+}
+
 /* __dev_map_flush is called from xdp_do_flush_map() which _must_ be signaled
  * from the driver before returning from its napi->poll() routine. The poll()
  * routine is called either from busy_poll context or net_rx_action signaled
@@ -224,6 +263,7 @@ void __dev_map_flush(struct bpf_map *map)
 
 	for_each_set_bit(bit, bitmap, map->max_entries) {
 		struct bpf_dtab_netdev *dev = READ_ONCE(dtab->netdev_map[bit]);
+		struct xdp_bulk_queue *bq;
 		struct net_device *netdev;
 
 		/* This is possible if the dev entry is removed by user space
@@ -233,6 +273,9 @@ void __dev_map_flush(struct bpf_map *map)
 			continue;
 
 		__clear_bit(bit, bitmap);
+
+		bq = this_cpu_ptr(dev->bulkq);
+		bq_xmit_all(dev, bq);
 		netdev = dev->dev;
 		if (likely(netdev->netdev_ops->ndo_xdp_flush))
 			netdev->netdev_ops->ndo_xdp_flush(netdev);
@@ -255,6 +298,20 @@ struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
 	return obj;
 }
 
+/* Runs under RCU-read-side, plus in softirq under NAPI protection.
+ * Thus, safe percpu variable access.
+ */
+static int bq_enqueue(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf)
+{
+	struct xdp_bulk_queue *bq = this_cpu_ptr(obj->bulkq);
+
+	if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
+		bq_xmit_all(obj, bq);
+
+	bq->q[bq->count++] = xdpf;
+	return 0;
+}
+
 int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
 {
 	struct net_device *dev = dst->dev;
@@ -268,8 +325,7 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
 
-	/* TODO: implement a bulking/enqueue step later */
-	err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
+	err = bq_enqueue(dst, xdpf);
 	if (err)
 		return err;
 
@@ -288,13 +344,18 @@ static void dev_map_flush_old(struct bpf_dtab_netdev *dev)
 {
 	if (dev->dev->netdev_ops->ndo_xdp_flush) {
 		struct net_device *fl = dev->dev;
+		struct xdp_bulk_queue *bq;
 		unsigned long *bitmap;
+
 		int cpu;
 
 		for_each_online_cpu(cpu) {
 			bitmap = per_cpu_ptr(dev->dtab->flush_needed, cpu);
 			__clear_bit(dev->bit, bitmap);
 
+			bq = per_cpu_ptr(dev->bulkq, cpu);
+			bq_xmit_all(dev, bq);
+
 			fl->netdev_ops->ndo_xdp_flush(dev->dev);
 		}
 	}
@@ -306,6 +367,7 @@ static void __dev_map_entry_free(struct rcu_head *rcu)
 
 	dev = container_of(rcu, struct bpf_dtab_netdev, rcu);
 	dev_map_flush_old(dev);
+	free_percpu(dev->bulkq);
 	dev_put(dev->dev);
 	kfree(dev);
 }
@@ -338,6 +400,7 @@ static int dev_map_update_elem(struct bpf_map *map, void *key, void *value,
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
 	struct net *net = current->nsproxy->net_ns;
+	gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN;
 	struct bpf_dtab_netdev *dev, *old_dev;
 	u32 i = *(u32 *)key;
 	u32 ifindex = *(u32 *)value;
@@ -352,11 +415,17 @@ static int dev_map_update_elem(struct bpf_map *map, void *key, void *value,
 	if (!ifindex) {
 		dev = NULL;
 	} else {
-		dev = kmalloc_node(sizeof(*dev), GFP_ATOMIC | __GFP_NOWARN,
-				   map->numa_node);
+		dev = kmalloc_node(sizeof(*dev), gfp, map->numa_node);
 		if (!dev)
 			return -ENOMEM;
 
+		dev->bulkq = __alloc_percpu_gfp(sizeof(*dev->bulkq),
+						sizeof(void *), gfp);
+		if (!dev->bulkq) {
+			kfree(dev);
+			return -ENOMEM;
+		}
+
 		dev->dev = dev_get_by_index(net, ifindex);
 		if (!dev->dev) {
 			kfree(dev);

^ permalink raw reply related

* [bpf-next V3 PATCH 1/4] bpf: devmap introduce dev_map_enqueue
From: Jesper Dangaard Brouer @ 2018-05-15 12:13 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov,
	Jesper Dangaard Brouer
  Cc: Christoph Hellwig, BjörnTöpel, Magnus Karlsson,
	makita.toshiaki
In-Reply-To: <152638638695.9477.13781600009169577949.stgit@firesoul>

Functionality is the same, but the ndo_xdp_xmit call is now
simply invoked from inside the devmap.c code.

V2: Fix compile issue reported by kbuild test robot <lkp@intel.com>

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/linux/bpf.h        |   14 +++++++++++---
 include/trace/events/xdp.h |    9 ++++++++-
 kernel/bpf/devmap.c        |   37 +++++++++++++++++++++++++++++++------
 net/core/filter.c          |   15 ++-------------
 4 files changed, 52 insertions(+), 23 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a38e474bf7ee..8527964da402 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -485,14 +485,15 @@ int bpf_check(struct bpf_prog **fp, union bpf_attr *attr);
 void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth);
 
 /* Map specifics */
-struct net_device  *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
+struct xdp_buff;
+struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
 void __dev_map_insert_ctx(struct bpf_map *map, u32 index);
 void __dev_map_flush(struct bpf_map *map);
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp);
 
 struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key);
 void __cpu_map_insert_ctx(struct bpf_map *map, u32 index);
 void __cpu_map_flush(struct bpf_map *map);
-struct xdp_buff;
 int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
 		    struct net_device *dev_rx);
 
@@ -571,6 +572,14 @@ static inline void __dev_map_flush(struct bpf_map *map)
 {
 }
 
+struct xdp_buff;
+struct bpf_dtab_netdev;
+static inline
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
+{
+	return 0;
+}
+
 static inline
 struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key)
 {
@@ -585,7 +594,6 @@ static inline void __cpu_map_flush(struct bpf_map *map)
 {
 }
 
-struct xdp_buff;
 static inline int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu,
 				  struct xdp_buff *xdp,
 				  struct net_device *dev_rx)
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index 8989a92c571a..96104610d40e 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -138,11 +138,18 @@ DEFINE_EVENT_PRINT(xdp_redirect_template, xdp_redirect_map_err,
 		  __entry->map_id, __entry->map_index)
 );
 
+#ifndef __DEVMAP_OBJ_TYPE
+#define __DEVMAP_OBJ_TYPE
+struct _bpf_dtab_netdev {
+	struct net_device *dev;
+};
+#endif /* __DEVMAP_OBJ_TYPE */
+
 #define devmap_ifindex(fwd, map)				\
 	(!fwd ? 0 :						\
 	 (!map ? 0 :						\
 	  ((map->map_type == BPF_MAP_TYPE_DEVMAP) ?		\
-	   ((struct net_device *)fwd)->ifindex : 0)))
+	   ((struct _bpf_dtab_netdev *)fwd)->dev->ifindex : 0)))
 
 #define _trace_xdp_redirect_map(dev, xdp, fwd, map, idx)		\
 	 trace_xdp_redirect_map(dev, xdp, devmap_ifindex(fwd, map),	\
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 565f9ece9115..808808bf2bf2 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -48,18 +48,21 @@
  * calls will fail at this point.
  */
 #include <linux/bpf.h>
+#include <net/xdp.h>
 #include <linux/filter.h>
 
 #define DEV_CREATE_FLAG_MASK \
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
 
+/* objects in the map */
 struct bpf_dtab_netdev {
-	struct net_device *dev;
+	struct net_device *dev; /* must be first member, due to tracepoint */
 	struct bpf_dtab *dtab;
 	unsigned int bit;
 	struct rcu_head rcu;
 };
 
+/* bpf map container */
 struct bpf_dtab {
 	struct bpf_map map;
 	struct bpf_dtab_netdev **netdev_map;
@@ -240,21 +243,43 @@ void __dev_map_flush(struct bpf_map *map)
  * update happens in parallel here a dev_put wont happen until after reading the
  * ifindex.
  */
-struct net_device  *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
+struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
-	struct bpf_dtab_netdev *dev;
+	struct bpf_dtab_netdev *obj;
 
 	if (key >= map->max_entries)
 		return NULL;
 
-	dev = READ_ONCE(dtab->netdev_map[key]);
-	return dev ? dev->dev : NULL;
+	obj = READ_ONCE(dtab->netdev_map[key]);
+	return obj;
+}
+
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
+{
+	struct net_device *dev = dst->dev;
+	struct xdp_frame *xdpf;
+	int err;
+
+	if (!dev->netdev_ops->ndo_xdp_xmit)
+		return -EOPNOTSUPP;
+
+	xdpf = convert_to_xdp_frame(xdp);
+	if (unlikely(!xdpf))
+		return -EOVERFLOW;
+
+	/* TODO: implement a bulking/enqueue step later */
+	err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
+	if (err)
+		return err;
+
+	return 0;
 }
 
 static void *dev_map_lookup_elem(struct bpf_map *map, void *key)
 {
-	struct net_device *dev = __dev_map_lookup_elem(map, *(u32 *)key);
+	struct bpf_dtab_netdev *obj = __dev_map_lookup_elem(map, *(u32 *)key);
+	struct net_device *dev = dev = obj ? obj->dev : NULL;
 
 	return dev ? &dev->ifindex : NULL;
 }
diff --git a/net/core/filter.c b/net/core/filter.c
index ca60d2872da4..284b9ec3d50f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3024,20 +3024,9 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
 
 	switch (map->map_type) {
 	case BPF_MAP_TYPE_DEVMAP: {
-		struct net_device *dev = fwd;
-		struct xdp_frame *xdpf;
+		struct bpf_dtab_netdev *dst = fwd;
 
-		if (!dev->netdev_ops->ndo_xdp_xmit)
-			return -EOPNOTSUPP;
-
-		xdpf = convert_to_xdp_frame(xdp);
-		if (unlikely(!xdpf))
-			return -EOVERFLOW;
-
-		/* TODO: move to inside map code instead, for bulk support
-		 * err = dev_map_enqueue(dev, xdp);
-		 */
-		err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
+		err = dev_map_enqueue(dst, xdp);
 		if (err)
 			return err;
 		__dev_map_insert_ctx(map, index);

^ permalink raw reply related

* [bpf-next V3 PATCH 0/4] xdp: introduce bulking for ndo_xdp_xmit API
From: Jesper Dangaard Brouer @ 2018-05-15 12:13 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov,
	Jesper Dangaard Brouer
  Cc: Christoph Hellwig, BjörnTöpel, Magnus Karlsson,
	makita.toshiaki

This patchset change ndo_xdp_xmit API to take a bulk of xdp frames.

When kernel is compiled with CONFIG_RETPOLINE, every indirect function
pointer (branch) call hurts performance. For XDP this have a huge
negative performance impact.

This patchset reduce the needed (indirect) calls to ndo_xdp_xmit, but
also prepares for further optimizations.  The DMA APIs use of indirect
function pointer calls is the primary source the regression.  It is
left for a followup patchset, to use bulking calls towards the DMA API
(via the scatter-gatter calls).

The other advantage of this API change is that drivers can easier
amortize the cost of any sync/locking scheme, over the bulk of
packets.  The assumption of the current API is that the driver
implemementing the NDO will also allocate a dedicated XDP TX queue for
every CPU in the system.  Which is not always possible or practical to
configure. E.g. ixgbe cannot load an XDP program on a machine with
more than 96 CPUs, due to limited hardware TX queues.  E.g. virtio_net
is hard to configure as it requires manually increasing the
queues. E.g. tun driver chooses to use a per XDP frame producer lock
modulo smp_processor_id over avail queues.

I'm considered adding 'flags' to ndo_xdp_xmit, but it's not part of
this patchset.  This will be a followup patchset, once we know if this
will be needed (e.g. for non-map xdp_redirect flush-flag, and if
AF_XDP chooses to use ndo_xdp_xmit for TX).

---

Jesper Dangaard Brouer (4):
      bpf: devmap introduce dev_map_enqueue
      bpf: devmap prepare xdp frames for bulking
      xdp: add tracepoint for devmap like cpumap have
      xdp: change ndo_xdp_xmit API to support bulking


 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   26 ++++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |    2 
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   21 +++-
 drivers/net/tun.c                             |   37 ++++---
 drivers/net/virtio_net.c                      |   66 +++++++++---
 include/linux/bpf.h                           |   16 ++-
 include/linux/netdevice.h                     |   14 ++-
 include/net/page_pool.h                       |    5 +
 include/net/xdp.h                             |    1 
 include/trace/events/xdp.h                    |   50 +++++++++
 kernel/bpf/devmap.c                           |  134 ++++++++++++++++++++++++-
 net/core/filter.c                             |   19 +---
 net/core/xdp.c                                |   20 +++-
 samples/bpf/xdp_monitor_kern.c                |   49 +++++++++
 samples/bpf/xdp_monitor_user.c                |   69 +++++++++++++
 15 files changed, 446 insertions(+), 83 deletions(-)

^ permalink raw reply

* Re: [PATCH 07/14] net: sched: use reference counting action init
From: Vlad Buslov @ 2018-05-15 12:00 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, jhs, xiyou.wangcong, pablo, kadlec, fw, ast,
	daniel, edumazet, keescook, linux-kernel, netfilter-devel,
	coreteam, kliteyn
In-Reply-To: <20180515115739.GN2134@nanopsycho.orion>


On Tue 15 May 2018 at 11:57, Jiri Pirko <jiri@resnulli.us> wrote:
> Tue, May 15, 2018 at 01:41:45PM CEST, vladbu@mellanox.com wrote:
>>
>>On Tue 15 May 2018 at 11:39, Jiri Pirko <jiri@resnulli.us> wrote:
>>> Tue, May 15, 2018 at 01:32:51PM CEST, vladbu@mellanox.com wrote:
>>>>
>>>>On Tue 15 May 2018 at 11:24, Jiri Pirko <jiri@resnulli.us> wrote:
>>>>> Mon, May 14, 2018 at 04:27:08PM CEST, vladbu@mellanox.com wrote:
>>>>>>Change action API to assume that action init function always takes
>>>>>>reference to action, even when overwriting existing action. This is
>>>>>>necessary because action API continues to use action pointer after init
>>>>>>function is done. At this point action becomes accessible for concurrent
>>>>>>modifications so user must always hold reference to it.
>>>>>>
>>>>>>Implement helper put list function to atomically release list of actions
>>>>>>after action API init code is done using them.
>>>>>>
>>>>>>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>>>>>>---
>>>>>> net/sched/act_api.c | 38 +++++++++++++++++---------------------
>>>>>> 1 file changed, 17 insertions(+), 21 deletions(-)
>>>>>>
>>>>>
>>>>> [...]
>>>>>
>>>>>
>>>>>>@@ -1196,8 +1190,7 @@ tca_action_gd(struct net *net, struct nlattr *nla, struct nlmsghdr *n,
>>>>>> 		return ret;
>>>>>> 	}
>>>>>> err:
>>>>>>-	if (event != RTM_GETACTION)
>>>>>
>>>>> Howcome you do this for RTM_GETACTION now too? Where is the related
>>>>> "get"?
>>>>
>>>>In patch 5. There is always a possibility of concurrent delete without
>>>>rtnl lock so all usages of action pointers were converted to hold
>>>>reference to action.
>>>
>>> So that means that if you run kernel in between, with patch 5 but
>>> without patch 7 and you do RTM_GETACTION, you leak a reference, right?
>>
>>Right.
>
> That is an issue. You need to make sure that the code is without bugs
> like this after every applied patch. You need to make sure the code is
> bisectable.

Got it. Will fix in V2.

>
> Thanks.

^ permalink raw reply

* Re: [PATCH 07/14] net: sched: use reference counting action init
From: Jiri Pirko @ 2018-05-15 11:57 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: netdev, davem, jhs, xiyou.wangcong, pablo, kadlec, fw, ast,
	daniel, edumazet, keescook, linux-kernel, netfilter-devel,
	coreteam, kliteyn
In-Reply-To: <vbfk1s5m9hi.fsf@reg-r-vrt-018-180.mtr.labs.mlnx>

Tue, May 15, 2018 at 01:41:45PM CEST, vladbu@mellanox.com wrote:
>
>On Tue 15 May 2018 at 11:39, Jiri Pirko <jiri@resnulli.us> wrote:
>> Tue, May 15, 2018 at 01:32:51PM CEST, vladbu@mellanox.com wrote:
>>>
>>>On Tue 15 May 2018 at 11:24, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> Mon, May 14, 2018 at 04:27:08PM CEST, vladbu@mellanox.com wrote:
>>>>>Change action API to assume that action init function always takes
>>>>>reference to action, even when overwriting existing action. This is
>>>>>necessary because action API continues to use action pointer after init
>>>>>function is done. At this point action becomes accessible for concurrent
>>>>>modifications so user must always hold reference to it.
>>>>>
>>>>>Implement helper put list function to atomically release list of actions
>>>>>after action API init code is done using them.
>>>>>
>>>>>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>>>>>---
>>>>> net/sched/act_api.c | 38 +++++++++++++++++---------------------
>>>>> 1 file changed, 17 insertions(+), 21 deletions(-)
>>>>>
>>>>
>>>> [...]
>>>>
>>>>
>>>>>@@ -1196,8 +1190,7 @@ tca_action_gd(struct net *net, struct nlattr *nla, struct nlmsghdr *n,
>>>>> 		return ret;
>>>>> 	}
>>>>> err:
>>>>>-	if (event != RTM_GETACTION)
>>>>
>>>> Howcome you do this for RTM_GETACTION now too? Where is the related
>>>> "get"?
>>>
>>>In patch 5. There is always a possibility of concurrent delete without
>>>rtnl lock so all usages of action pointers were converted to hold
>>>reference to action.
>>
>> So that means that if you run kernel in between, with patch 5 but
>> without patch 7 and you do RTM_GETACTION, you leak a reference, right?
>
>Right.

That is an issue. You need to make sure that the code is without bugs
like this after every applied patch. You need to make sure the code is
bisectable.

Thanks.

^ permalink raw reply

* Re: [PATCH 05/14] net: sched: always take reference to action
From: Vlad Buslov @ 2018-05-15 11:52 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, jhs, xiyou.wangcong, pablo, kadlec, fw, ast,
	daniel, edumazet, keescook, linux-kernel, netfilter-devel,
	coreteam, kliteyn
In-Reply-To: <20180515085827.GJ2134@nanopsycho.orion>


On Tue 15 May 2018 at 08:58, Jiri Pirko <jiri@resnulli.us> wrote:
> Mon, May 14, 2018 at 08:49:07PM CEST, vladbu@mellanox.com wrote:
>>
>>On Mon 14 May 2018 at 16:23, Jiri Pirko <jiri@resnulli.us> wrote:
>>> Mon, May 14, 2018 at 04:27:06PM CEST, vladbu@mellanox.com wrote:
>>>>Without rtnl lock protection it is no longer safe to use pointer to tc
>>>>action without holding reference to it. (it can be destroyed concurrently)
>>>>
>>>>Remove unsafe action idr lookup function. Instead of it, implement safe tcf
>>>>idr check function that atomically looks up action in idr and increments
>>>>its reference and bind counters.
>>>>
>>>>Implement both action search and check using new safe function.
>>>>
>>>>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>>>>---
>>>> net/sched/act_api.c | 38 ++++++++++++++++----------------------
>>>> 1 file changed, 16 insertions(+), 22 deletions(-)
>>>>
>>>>diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>>>>index 1331beb..9459cce 100644
>>>>--- a/net/sched/act_api.c
>>>>+++ b/net/sched/act_api.c
>>>>@@ -284,44 +284,38 @@ int tcf_generic_walker(struct tc_action_net *tn, struct sk_buff *skb,
>>>> }
>>>> EXPORT_SYMBOL(tcf_generic_walker);
>>>> 
>>>>-static struct tc_action *tcf_idr_lookup(u32 index, struct tcf_idrinfo *idrinfo)
>>>>+bool __tcf_idr_check(struct tc_action_net *tn, u32 index, struct tc_action **a,
>>>>+		     int bind)
>>>> {
>>>>-	struct tc_action *p = NULL;
>>>>+	struct tcf_idrinfo *idrinfo = tn->idrinfo;
>>>>+	struct tc_action *p;
>>>> 
>>>> 	spin_lock_bh(&idrinfo->lock);
>>>
>>> Why "_bh" variant is necessary here?
>>
>>It is not my code.
>
> Yeah, yet still I wonder :)

I've been looking into it and I don't understand why it is used either.
Looking at the commit history I see that long time ago rw_lock was used
instead of spinlock, and some actions implemented they own lookup
methods. So it seems that it might have been used to protect lookups
running in bh context. However, in current architecture classifiers hold
direct reference to action so action lookup is not used when processing
packets.

^ permalink raw reply

* Re: [PATCH 07/14] net: sched: use reference counting action init
From: Vlad Buslov @ 2018-05-15 11:41 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, jhs, xiyou.wangcong, pablo, kadlec, fw, ast,
	daniel, edumazet, keescook, linux-kernel, netfilter-devel,
	coreteam, kliteyn
In-Reply-To: <20180515113954.GM2134@nanopsycho.orion>


On Tue 15 May 2018 at 11:39, Jiri Pirko <jiri@resnulli.us> wrote:
> Tue, May 15, 2018 at 01:32:51PM CEST, vladbu@mellanox.com wrote:
>>
>>On Tue 15 May 2018 at 11:24, Jiri Pirko <jiri@resnulli.us> wrote:
>>> Mon, May 14, 2018 at 04:27:08PM CEST, vladbu@mellanox.com wrote:
>>>>Change action API to assume that action init function always takes
>>>>reference to action, even when overwriting existing action. This is
>>>>necessary because action API continues to use action pointer after init
>>>>function is done. At this point action becomes accessible for concurrent
>>>>modifications so user must always hold reference to it.
>>>>
>>>>Implement helper put list function to atomically release list of actions
>>>>after action API init code is done using them.
>>>>
>>>>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>>>>---
>>>> net/sched/act_api.c | 38 +++++++++++++++++---------------------
>>>> 1 file changed, 17 insertions(+), 21 deletions(-)
>>>>
>>>
>>> [...]
>>>
>>>
>>>>@@ -1196,8 +1190,7 @@ tca_action_gd(struct net *net, struct nlattr *nla, struct nlmsghdr *n,
>>>> 		return ret;
>>>> 	}
>>>> err:
>>>>-	if (event != RTM_GETACTION)
>>>
>>> Howcome you do this for RTM_GETACTION now too? Where is the related
>>> "get"?
>>
>>In patch 5. There is always a possibility of concurrent delete without
>>rtnl lock so all usages of action pointers were converted to hold
>>reference to action.
>
> So that means that if you run kernel in between, with patch 5 but
> without patch 7 and you do RTM_GETACTION, you leak a reference, right?

Right.

>
>
>>
>>>
>>>
>>>>-		tcf_action_destroy(&actions, 0);
>>>>+	tcf_action_put_lst(&actions);
>>>> 	return ret;
>>>> }
>>>> 
>>>
>>> [...]
>>

^ permalink raw reply

* Re: [PATCH 07/14] net: sched: use reference counting action init
From: Jiri Pirko @ 2018-05-15 11:39 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: netdev, davem, jhs, xiyou.wangcong, pablo, kadlec, fw, ast,
	daniel, edumazet, keescook, linux-kernel, netfilter-devel,
	coreteam, kliteyn
In-Reply-To: <vbflgclm9wc.fsf@reg-r-vrt-018-180.mtr.labs.mlnx>

Tue, May 15, 2018 at 01:32:51PM CEST, vladbu@mellanox.com wrote:
>
>On Tue 15 May 2018 at 11:24, Jiri Pirko <jiri@resnulli.us> wrote:
>> Mon, May 14, 2018 at 04:27:08PM CEST, vladbu@mellanox.com wrote:
>>>Change action API to assume that action init function always takes
>>>reference to action, even when overwriting existing action. This is
>>>necessary because action API continues to use action pointer after init
>>>function is done. At this point action becomes accessible for concurrent
>>>modifications so user must always hold reference to it.
>>>
>>>Implement helper put list function to atomically release list of actions
>>>after action API init code is done using them.
>>>
>>>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>>>---
>>> net/sched/act_api.c | 38 +++++++++++++++++---------------------
>>> 1 file changed, 17 insertions(+), 21 deletions(-)
>>>
>>
>> [...]
>>
>>
>>>@@ -1196,8 +1190,7 @@ tca_action_gd(struct net *net, struct nlattr *nla, struct nlmsghdr *n,
>>> 		return ret;
>>> 	}
>>> err:
>>>-	if (event != RTM_GETACTION)
>>
>> Howcome you do this for RTM_GETACTION now too? Where is the related
>> "get"?
>
>In patch 5. There is always a possibility of concurrent delete without
>rtnl lock so all usages of action pointers were converted to hold
>reference to action.

So that means that if you run kernel in between, with patch 5 but
without patch 7 and you do RTM_GETACTION, you leak a reference, right?


>
>>
>>
>>>-		tcf_action_destroy(&actions, 0);
>>>+	tcf_action_put_lst(&actions);
>>> 	return ret;
>>> }
>>> 
>>
>> [...]
>

^ permalink raw reply

* Re: [PATCH 07/14] net: sched: use reference counting action init
From: Vlad Buslov @ 2018-05-15 11:32 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, jhs, xiyou.wangcong, pablo, kadlec, fw, ast,
	daniel, edumazet, keescook, linux-kernel, netfilter-devel,
	coreteam, kliteyn
In-Reply-To: <20180515112404.GL2134@nanopsycho.orion>


On Tue 15 May 2018 at 11:24, Jiri Pirko <jiri@resnulli.us> wrote:
> Mon, May 14, 2018 at 04:27:08PM CEST, vladbu@mellanox.com wrote:
>>Change action API to assume that action init function always takes
>>reference to action, even when overwriting existing action. This is
>>necessary because action API continues to use action pointer after init
>>function is done. At this point action becomes accessible for concurrent
>>modifications so user must always hold reference to it.
>>
>>Implement helper put list function to atomically release list of actions
>>after action API init code is done using them.
>>
>>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>>---
>> net/sched/act_api.c | 38 +++++++++++++++++---------------------
>> 1 file changed, 17 insertions(+), 21 deletions(-)
>>
>
> [...]
>
>
>>@@ -1196,8 +1190,7 @@ tca_action_gd(struct net *net, struct nlattr *nla, struct nlmsghdr *n,
>> 		return ret;
>> 	}
>> err:
>>-	if (event != RTM_GETACTION)
>
> Howcome you do this for RTM_GETACTION now too? Where is the related
> "get"?

In patch 5. There is always a possibility of concurrent delete without
rtnl lock so all usages of action pointers were converted to hold
reference to action.

>
>
>>-		tcf_action_destroy(&actions, 0);
>>+	tcf_action_put_lst(&actions);
>> 	return ret;
>> }
>> 
>
> [...]

^ permalink raw reply

* Re: [PATCH 07/14] net: sched: use reference counting action init
From: Jiri Pirko @ 2018-05-15 11:24 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: netdev, davem, jhs, xiyou.wangcong, pablo, kadlec, fw, ast,
	daniel, edumazet, keescook, linux-kernel, netfilter-devel,
	coreteam, kliteyn
In-Reply-To: <1526308035-12484-8-git-send-email-vladbu@mellanox.com>

Mon, May 14, 2018 at 04:27:08PM CEST, vladbu@mellanox.com wrote:
>Change action API to assume that action init function always takes
>reference to action, even when overwriting existing action. This is
>necessary because action API continues to use action pointer after init
>function is done. At this point action becomes accessible for concurrent
>modifications so user must always hold reference to it.
>
>Implement helper put list function to atomically release list of actions
>after action API init code is done using them.
>
>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>---
> net/sched/act_api.c | 38 +++++++++++++++++---------------------
> 1 file changed, 17 insertions(+), 21 deletions(-)
>

[...]


>@@ -1196,8 +1190,7 @@ tca_action_gd(struct net *net, struct nlattr *nla, struct nlmsghdr *n,
> 		return ret;
> 	}
> err:
>-	if (event != RTM_GETACTION)

Howcome you do this for RTM_GETACTION now too? Where is the related "get"?


>-		tcf_action_destroy(&actions, 0);
>+	tcf_action_put_lst(&actions);
> 	return ret;
> }
> 

[...]

^ permalink raw reply

* Re: [PATCH v1 4/4] samples/bpf: an example of a raw IR decoder
From: Sean Young @ 2018-05-15 10:40 UTC (permalink / raw)
  To: Y Song
  Cc: linux-media, linux-kernel, Alexei Starovoitov,
	Mauro Carvalho Chehab, Daniel Borkmann, netdev, Matthias Reichl,
	Devin Heitmueller
In-Reply-To: <CAH3MdRXGnRFcPKhkjgSXNwGUBL-KbCTqOq3tVx6kLNY7d=FOig@mail.gmail.com>

On Mon, May 14, 2018 at 10:34:57PM -0700, Y Song wrote:
> On Mon, May 14, 2018 at 2:11 PM, Sean Young <sean@mess.org> wrote:
> > This implements the grundig-16 IR protocol.
> >
> > Signed-off-by: Sean Young <sean@mess.org>
> > ---
> >  samples/bpf/Makefile                      |   4 +
> >  samples/bpf/bpf_load.c                    |   9 +-
> >  samples/bpf/grundig_decoder_kern.c        | 112 ++++++++++++++++++++++
> >  samples/bpf/grundig_decoder_user.c        |  54 +++++++++++
> >  tools/bpf/bpftool/prog.c                  |   1 +
> >  tools/include/uapi/linux/bpf.h            |  17 +++-
> >  tools/testing/selftests/bpf/bpf_helpers.h |   6 ++
> >  7 files changed, 200 insertions(+), 3 deletions(-)
> >  create mode 100644 samples/bpf/grundig_decoder_kern.c
> >  create mode 100644 samples/bpf/grundig_decoder_user.c
> >
> > diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> > index 4d6a6edd4bf6..c6fa111f103a 100644
> > --- a/samples/bpf/Makefile
> > +++ b/samples/bpf/Makefile
> > @@ -44,6 +44,7 @@ hostprogs-y += xdp_monitor
> >  hostprogs-y += xdp_rxq_info
> >  hostprogs-y += syscall_tp
> >  hostprogs-y += cpustat
> > +hostprogs-y += grundig_decoder
> >
> >  # Libbpf dependencies
> >  LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
> > @@ -95,6 +96,7 @@ xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
> >  xdp_rxq_info-objs := bpf_load.o $(LIBBPF) xdp_rxq_info_user.o
> >  syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
> >  cpustat-objs := bpf_load.o $(LIBBPF) cpustat_user.o
> > +grundig_decoder-objs := bpf_load.o $(LIBBPF) grundig_decoder_user.o
> >
> >  # Tell kbuild to always build the programs
> >  always := $(hostprogs-y)
> > @@ -148,6 +150,7 @@ always += xdp_rxq_info_kern.o
> >  always += xdp2skb_meta_kern.o
> >  always += syscall_tp_kern.o
> >  always += cpustat_kern.o
> > +always += grundig_decoder_kern.o
> >
> >  HOSTCFLAGS += -I$(objtree)/usr/include
> >  HOSTCFLAGS += -I$(srctree)/tools/lib/
> > @@ -193,6 +196,7 @@ HOSTLOADLIBES_xdp_monitor += -lelf
> >  HOSTLOADLIBES_xdp_rxq_info += -lelf
> >  HOSTLOADLIBES_syscall_tp += -lelf
> >  HOSTLOADLIBES_cpustat += -lelf
> > +HOSTLOADLIBES_grundig_decoder += -lelf
> >
> >  # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
> >  #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
> > diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
> > index bebe4188b4b3..0fd389e95bb9 100644
> > --- a/samples/bpf/bpf_load.c
> > +++ b/samples/bpf/bpf_load.c
> > @@ -69,6 +69,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
> >         bool is_sockops = strncmp(event, "sockops", 7) == 0;
> >         bool is_sk_skb = strncmp(event, "sk_skb", 6) == 0;
> >         bool is_sk_msg = strncmp(event, "sk_msg", 6) == 0;
> > +       bool is_ir_decoder = strncmp(event, "ir_decoder", 10) == 0;
> >         size_t insns_cnt = size / sizeof(struct bpf_insn);
> >         enum bpf_prog_type prog_type;
> >         char buf[256];
> > @@ -102,6 +103,8 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
> >                 prog_type = BPF_PROG_TYPE_SK_SKB;
> >         } else if (is_sk_msg) {
> >                 prog_type = BPF_PROG_TYPE_SK_MSG;
> > +       } else if (is_ir_decoder) {
> > +               prog_type = BPF_PROG_TYPE_RAWIR_DECODER;
> >         } else {
> >                 printf("Unknown event '%s'\n", event);
> >                 return -1;
> > @@ -116,7 +119,8 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
> >
> >         prog_fd[prog_cnt++] = fd;
> >
> > -       if (is_xdp || is_perf_event || is_cgroup_skb || is_cgroup_sk)
> > +       if (is_xdp || is_perf_event || is_cgroup_skb || is_cgroup_sk ||
> > +           is_ir_decoder)
> >                 return 0;
> >
> >         if (is_socket || is_sockops || is_sk_skb || is_sk_msg) {
> > @@ -607,7 +611,8 @@ static int do_load_bpf_file(const char *path, fixup_map_cb fixup_map)
> >                     memcmp(shname, "cgroup/", 7) == 0 ||
> >                     memcmp(shname, "sockops", 7) == 0 ||
> >                     memcmp(shname, "sk_skb", 6) == 0 ||
> > -                   memcmp(shname, "sk_msg", 6) == 0) {
> > +                   memcmp(shname, "sk_msg", 6) == 0 ||
> > +                   memcmp(shname, "ir_decoder", 10) == 0) {
> >                         ret = load_and_attach(shname, data->d_buf,
> >                                               data->d_size);
> >                         if (ret != 0)
> > diff --git a/samples/bpf/grundig_decoder_kern.c b/samples/bpf/grundig_decoder_kern.c
> > new file mode 100644
> > index 000000000000..c80f2c9cc69a
> > --- /dev/null
> > +++ b/samples/bpf/grundig_decoder_kern.c
> > @@ -0,0 +1,112 @@
> > +
> > +#include <uapi/linux/bpf.h>
> > +#include <uapi/linux/bpf_rcdev.h>
> > +#include "bpf_helpers.h"
> > +#include <linux/version.h>
> > +
> > +enum grundig_state {
> > +       STATE_INACTIVE,
> > +       STATE_HEADER_SPACE,
> > +       STATE_LEADING_PULSE,
> > +       STATE_BITS_SPACE,
> > +       STATE_BITS_PULSE,
> > +};
> > +
> > +struct decoder_state {
> > +       u32 bits;
> > +       enum grundig_state state;
> > +       u32 count;
> > +       u32 last_space;
> > +};
> > +
> > +struct bpf_map_def SEC("maps") decoder_state_map = {
> > +       .type = BPF_MAP_TYPE_ARRAY,
> > +       .key_size = sizeof(u32),
> > +       .value_size = sizeof(struct decoder_state),
> > +       .max_entries = 1,
> > +};
> > +
> > +#define US_TO_NS(t) 1000*(t)
> > +static inline bool eq_margin(unsigned d1, unsigned d2, unsigned margin)
> > +{
> > +       return ((d1 > (d2 - margin)) && (d1 < (d2 + margin)));
> > +}
> > +
> > +SEC("ir_decoder/grundig_decoder")
> > +int bpf_decoder(struct ir_raw_event *raw)
> > +{
> > +       u32 key = 0;
> > +       struct decoder_state init = {};
> > +
> > +       struct decoder_state *s = bpf_map_lookup_elem(&decoder_state_map, &key);
> > +
> > +       if (!s)
> > +               s = &init;
> > +
> > +       if (raw->carrier_report) {
> > +               // ignore
> > +       } else if (raw->reset) {
> > +               s->state = STATE_INACTIVE;
> > +       } else if (s->state == STATE_INACTIVE) {
> > +               if (raw->pulse && eq_margin(US_TO_NS(900), raw->duration, US_TO_NS(100))) {
> > +                       s->bits = 0;
> > +                       s->state = STATE_HEADER_SPACE;
> > +                       s->count = 0;
> > +               }
> > +       } else if (s->state == STATE_HEADER_SPACE) {
> > +               if (!raw->pulse && eq_margin(US_TO_NS(2900), raw->duration, US_TO_NS(200)))
> > +                       s->state = STATE_LEADING_PULSE;
> > +               else
> > +                       s->state = STATE_INACTIVE;
> > +       } else if (s->state == STATE_LEADING_PULSE) {
> > +               if (raw->pulse && eq_margin(US_TO_NS(1300), raw->duration, US_TO_NS(100)))
> > +                       s->state = STATE_BITS_SPACE;
> > +               else
> > +                       s->state = STATE_INACTIVE;
> > +       } else if (s->state == STATE_BITS_SPACE) {
> > +               s->last_space = raw->duration;
> > +               s->state = STATE_BITS_PULSE;
> > +       } else if (s->state == STATE_BITS_PULSE) {
> > +               int t = -1;
> > +               if (eq_margin(s->last_space, US_TO_NS(472), US_TO_NS(150)) &&
> > +                   eq_margin(raw->duration, US_TO_NS(583), US_TO_NS(150)))
> > +                       t = 0;
> > +               if (eq_margin(s->last_space, US_TO_NS(1139), US_TO_NS(150)) &&
> > +                   eq_margin(raw->duration, US_TO_NS(583), US_TO_NS(150)))
> > +                       t = 1;
> > +               if (eq_margin(s->last_space, US_TO_NS(1806), US_TO_NS(150)) &&
> > +                   eq_margin(raw->duration, US_TO_NS(583), US_TO_NS(150)))
> > +                       t = 2;
> > +               if (eq_margin(s->last_space, US_TO_NS(2200), US_TO_NS(150)) &&
> > +                   eq_margin(raw->duration, US_TO_NS(1139), US_TO_NS(150)))
> > +                       t = 3;
> > +               if (t < 0) {
> > +                       s->state = STATE_INACTIVE;
> > +               } else {
> > +                       s->bits <<= 2;
> > +                       switch (t) {
> > +                       case 3: s->bits |= 0; break;
> > +                       case 2: s->bits |= 3; break;
> > +                       case 1: s->bits |= 2; break;
> > +                       case 0: s->bits |= 1; break;
> > +                       }
> > +                       s->count += 2;
> > +                       if (s->count == 16) {
> > +                               bpf_rc_keydown(raw, 0x40, s->bits, 0);
> > +                               s->state = STATE_INACTIVE;
> > +                       } else {
> > +                               s->state = STATE_BITS_SPACE;
> > +                       }
> > +               }
> > +       }
> > +
> > +       if (s == &init)
> > +               bpf_map_update_elem(&decoder_state_map, &key, s, BPF_NOEXIST);
> > +
> > +       return 0;
> > +}
> > +
> > +char _license[] SEC("license") = "GPL";
> > +
> > +u32 _version SEC("version") = LINUX_VERSION_CODE;
> > +
> > diff --git a/samples/bpf/grundig_decoder_user.c b/samples/bpf/grundig_decoder_user.c
> > new file mode 100644
> > index 000000000000..61e8ee5f73ee
> > --- /dev/null
> > +++ b/samples/bpf/grundig_decoder_user.c
> > @@ -0,0 +1,54 @@
> > +
> > +#include <linux/bpf.h>
> > +#include <assert.h>
> > +#include <errno.h>
> > +#include <signal.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <stdbool.h>
> > +#include <string.h>
> > +#include <unistd.h>
> > +#include <libgen.h>
> > +#include <sys/resource.h>
> > +#include <sys/types.h>
> > +#include <sys/stat.h>
> > +#include <fcntl.h>
> > +
> > +#include "bpf_load.h"
> > +#include "bpf_util.h"
> > +#include "libbpf.h"
> > +
> > +int main(int argc, char **argv)
> > +{
> > +       char filename[256];
> > +       int ret, lircfd;
> > +
> > +       if (argc != 2) {
> > +               printf("Usage: %s /dev/lircN\n", argv[0]);
> 
> Looks like the test requires /dev/lircN device. Is there any easy way
> to test it?

It can be tested using rc-loopback.

> Also, looks like the program does not depend on any kernel headers,
> maybe it can be
> moved to tools/testing/selftests/bpf/? There are testbot to run those
> tests regularly.

That's a good idea. Let me see what I can do.

Thanks for all review comments, I agree with all them (so I won't have
to reply to every message).

I'll re-roll a v2 of this series in the next couple of days.

Thanks

Sean

> 
> > +               return 2;
> > +       }
> > +
> > +       snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
> > +
> > +       if (load_bpf_file(filename)) {
> > +               printf("%s", bpf_log_buf);
> > +               return 1;
> > +       }
> > +
> > +       lircfd = open(argv[1], O_RDWR);
> > +       if (lircfd == -1) {
> > +               printf("failed to open lirc device %s: %m\n", argv[1]);
> > +               return 1;
> > +       }
> > +
> > +       ret = bpf_prog_attach(prog_fd[0], lircfd, BPF_RAWIR_DECODER, 0);
> > +       if (ret) {
> > +               printf("Failed to attach bpf to lirc device: %m\n");
> > +               return 1;
> > +       }
> > +
> > +       printf("Grundig IR decoder loaded and attached. Hit any key to stop\n");
> > +       getchar();
> > +
> > +       return 0;
> > +}
> > diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> > index f7a810897eac..ae1c26df212d 100644
> > --- a/tools/bpf/bpftool/prog.c
> > +++ b/tools/bpf/bpftool/prog.c
> > @@ -68,6 +68,7 @@ static const char * const prog_type_name[] = {
> >         [BPF_PROG_TYPE_SOCK_OPS]        = "sock_ops",
> >         [BPF_PROG_TYPE_SK_SKB]          = "sk_skb",
> >         [BPF_PROG_TYPE_CGROUP_DEVICE]   = "cgroup_device",
> > +       [BPF_PROG_TYPE_RAWIR_DECODER]   = "ir_decoder",
> >  };
> >
> >  static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
> > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > index c5ec89732a8d..d9740599adf6 100644
> > --- a/tools/include/uapi/linux/bpf.h
> > +++ b/tools/include/uapi/linux/bpf.h
> > @@ -137,6 +137,7 @@ enum bpf_prog_type {
> >         BPF_PROG_TYPE_SK_MSG,
> >         BPF_PROG_TYPE_RAW_TRACEPOINT,
> >         BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
> > +       BPF_PROG_TYPE_RAWIR_DECODER,
> >  };
> >
> >  enum bpf_attach_type {
> > @@ -154,6 +155,7 @@ enum bpf_attach_type {
> >         BPF_CGROUP_INET6_CONNECT,
> >         BPF_CGROUP_INET4_POST_BIND,
> >         BPF_CGROUP_INET6_POST_BIND,
> > +       BPF_RAWIR_DECODER,
> >         __MAX_BPF_ATTACH_TYPE
> >  };
> >
> > @@ -755,6 +757,17 @@ union bpf_attr {
> >   *     @addr: pointer to struct sockaddr to bind socket to
> >   *     @addr_len: length of sockaddr structure
> >   *     Return: 0 on success or negative error code
> > + *
> > + * int bpf_rc_keydown(ctx, protocol, scancode, toggle)
> > + *     Report decoded scancode with toggle value
> > + *     @ctx: pointer to ctx
> > + *     @protocol: decoded protocol
> > + *     @scancode: decoded scancode
> > + *     @toggle: set to 1 if button was toggled, else 0
> > + *
> > + * int bpf_rc_repeat(ctx)
> > + *     Repeat the last decoded scancode
> > + *     @ctx: pointer to ctx
> >   */
> >  #define __BPF_FUNC_MAPPER(FN)          \
> >         FN(unspec),                     \
> > @@ -821,7 +834,9 @@ union bpf_attr {
> >         FN(msg_apply_bytes),            \
> >         FN(msg_cork_bytes),             \
> >         FN(msg_pull_data),              \
> > -       FN(bind),
> > +       FN(bind),                       \
> > +       FN(rc_repeat),                  \
> > +       FN(rc_keydown),
> >
> >  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> >   * function eBPF program intends to call
> > diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
> > index d8223d99f96d..4bf23d3dfc33 100644
> > --- a/tools/testing/selftests/bpf/bpf_helpers.h
> > +++ b/tools/testing/selftests/bpf/bpf_helpers.h
> > @@ -96,6 +96,12 @@ static int (*bpf_msg_pull_data)(void *ctx, int start, int end, int flags) =
> >         (void *) BPF_FUNC_msg_pull_data;
> >  static int (*bpf_bind)(void *ctx, void *addr, int addr_len) =
> >         (void *) BPF_FUNC_bind;
> > +static int (*bpf_rc_repeat)(void *ctx) =
> > +       (void *) BPF_FUNC_rc_repeat;
> > +static int (*bpf_rc_keydown)(void *ctx, unsigned protocol, unsigned scancode,
> > +                            unsigned toggle) =
> > +       (void *) BPF_FUNC_rc_keydown;
> > +
> >
> >  /* llvm builtin functions that eBPF C program may use to
> >   * emit BPF_LD_ABS and BPF_LD_IND instructions
> > --
> > 2.17.0
> >

^ permalink raw reply

* Re: [PATCH v1 1/4] media: rc: introduce BPF_PROG_IR_DECODER
From: Sean Young @ 2018-05-15 10:30 UTC (permalink / raw)
  To: Y Song
  Cc: linux-media, linux-kernel, Alexei Starovoitov,
	Mauro Carvalho Chehab, Daniel Borkmann, netdev, Matthias Reichl,
	Devin Heitmueller
In-Reply-To: <CAH3MdRXkzpJZ=VLwJWpuwiNhcjJwjNqpaZXwk+1-HfL_7hjJew@mail.gmail.com>

On Mon, May 14, 2018 at 09:48:05PM -0700, Y Song wrote:
> On Mon, May 14, 2018 at 2:10 PM, Sean Young <sean@mess.org> wrote:
> > Add support for BPF_PROG_IR_DECODER. This type of BPF program can call
> > rc_keydown() to reported decoded IR scancodes, or rc_repeat() to report
> > that the last key should be repeated.
> >
> > Signed-off-by: Sean Young <sean@mess.org>
> > ---
> >  drivers/media/rc/Kconfig          |  8 +++
> >  drivers/media/rc/Makefile         |  1 +
> >  drivers/media/rc/ir-bpf-decoder.c | 93 +++++++++++++++++++++++++++++++
> >  include/linux/bpf_types.h         |  3 +
> >  include/uapi/linux/bpf.h          | 16 +++++-
> >  5 files changed, 120 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/media/rc/ir-bpf-decoder.c
> >
> > diff --git a/drivers/media/rc/Kconfig b/drivers/media/rc/Kconfig
> > index eb2c3b6eca7f..10ad6167d87c 100644
> > --- a/drivers/media/rc/Kconfig
> > +++ b/drivers/media/rc/Kconfig
> > @@ -120,6 +120,14 @@ config IR_IMON_DECODER
> >            remote control and you would like to use it with a raw IR
> >            receiver, or if you wish to use an encoder to transmit this IR.
> >
> > +config IR_BPF_DECODER
> > +       bool "Enable IR raw decoder using BPF"
> > +       depends on BPF_SYSCALL
> > +       depends on RC_CORE=y
> > +       help
> > +          Enable this option to make it possible to load custom IR
> > +          decoders written in BPF.
> > +
> >  endif #RC_DECODERS
> >
> >  menuconfig RC_DEVICES
> > diff --git a/drivers/media/rc/Makefile b/drivers/media/rc/Makefile
> > index 2e1c87066f6c..12e1118430d0 100644
> > --- a/drivers/media/rc/Makefile
> > +++ b/drivers/media/rc/Makefile
> > @@ -5,6 +5,7 @@ obj-y += keymaps/
> >  obj-$(CONFIG_RC_CORE) += rc-core.o
> >  rc-core-y := rc-main.o rc-ir-raw.o
> >  rc-core-$(CONFIG_LIRC) += lirc_dev.o
> > +rc-core-$(CONFIG_IR_BPF_DECODER) += ir-bpf-decoder.o
> >  obj-$(CONFIG_IR_NEC_DECODER) += ir-nec-decoder.o
> >  obj-$(CONFIG_IR_RC5_DECODER) += ir-rc5-decoder.o
> >  obj-$(CONFIG_IR_RC6_DECODER) += ir-rc6-decoder.o
> > diff --git a/drivers/media/rc/ir-bpf-decoder.c b/drivers/media/rc/ir-bpf-decoder.c
> > new file mode 100644
> > index 000000000000..aaa5e208b1a5
> > --- /dev/null
> > +++ b/drivers/media/rc/ir-bpf-decoder.c
> > @@ -0,0 +1,93 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +// ir-bpf-decoder.c - handles bpf decoders
> > +//
> > +// Copyright (C) 2018 Sean Young <sean@mess.org>
> > +
> > +#include <linux/bpf.h>
> > +#include <linux/filter.h>
> > +#include "rc-core-priv.h"
> > +
> > +/*
> > + * BPF interface for raw IR decoder
> > + */
> > +const struct bpf_prog_ops ir_decoder_prog_ops = {
> > +};
> > +
> > +BPF_CALL_1(bpf_rc_repeat, struct ir_raw_event*, event)
> > +{
> > +       struct ir_raw_event_ctrl *ctrl;
> > +
> > +       ctrl = container_of(event, struct ir_raw_event_ctrl, prev_ev);
> > +
> > +       rc_repeat(ctrl->dev);
> > +       return 0;
> > +}
> > +
> > +static const struct bpf_func_proto rc_repeat_proto = {
> > +       .func      = bpf_rc_repeat,
> > +       .gpl_only  = true, // rc_repeat is EXPORT_SYMBOL_GPL
> > +       .ret_type  = RET_VOID,
> 
> I suggest using RET_INTEGER here since we do return an integer 0.
> RET_INTEGER will also make it easy to extend if you want to return
> a non-zero value for error code or other reasons.

Ok.

> > +       .arg1_type = ARG_PTR_TO_CTX,
> > +};
> > +
> > +BPF_CALL_4(bpf_rc_keydown, struct ir_raw_event*, event, u32, protocol,
> > +          u32, scancode, u32, toggle)
> > +{
> > +       struct ir_raw_event_ctrl *ctrl;
> > +
> > +       ctrl = container_of(event, struct ir_raw_event_ctrl, prev_ev);
> > +       rc_keydown(ctrl->dev, protocol, scancode, toggle != 0);
> > +       return 0;
> > +}
> > +
> > +static const struct bpf_func_proto rc_keydown_proto = {
> > +       .func      = bpf_rc_keydown,
> > +       .gpl_only  = true, // rc_keydown is EXPORT_SYMBOL_GPL
> > +       .ret_type  = RET_VOID,
> 
> ditto. RET_INTEGER is preferable.

Ok.

> > +       .arg1_type = ARG_PTR_TO_CTX,
> > +       .arg2_type = ARG_ANYTHING,
> > +       .arg3_type = ARG_ANYTHING,
> > +       .arg4_type = ARG_ANYTHING,
> > +};
> > +
> > +static const struct bpf_func_proto *ir_decoder_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> > +{
> > +       switch (func_id) {
> > +       case BPF_FUNC_rc_repeat:
> > +               return &rc_repeat_proto;
> > +       case BPF_FUNC_rc_keydown:
> > +               return &rc_keydown_proto;
> > +       case BPF_FUNC_map_lookup_elem:
> > +               return &bpf_map_lookup_elem_proto;
> > +       case BPF_FUNC_map_update_elem:
> > +               return &bpf_map_update_elem_proto;
> > +       case BPF_FUNC_map_delete_elem:
> > +               return &bpf_map_delete_elem_proto;
> > +       case BPF_FUNC_ktime_get_ns:
> > +               return &bpf_ktime_get_ns_proto;
> > +       case BPF_FUNC_tail_call:
> > +               return &bpf_tail_call_proto;
> > +       case BPF_FUNC_get_prandom_u32:
> > +               return &bpf_get_prandom_u32_proto;
> > +       default:
> > +               return NULL;
> > +       }
> > +}
> > +
> > +static bool ir_decoder_is_valid_access(int off, int size,
> > +                                      enum bpf_access_type type,
> > +                                      const struct bpf_prog *prog,
> > +                                      struct bpf_insn_access_aux *info)
> > +{
> > +       if (type == BPF_WRITE)
> > +               return false;
> > +       if (off < 0 || off + size > sizeof(struct ir_raw_event))
> > +               return false;
> 
> You probably need more than just checking the boundary.
> >From patch #3, the structure is:
>  struct ir_raw_event {
>         union {
>                 __u32           duration;
>                 __u32           carrier;
>         };
>         __u8                    duty_cycle;
> 
>         unsigned                pulse:1;
>         unsigned                reset:1;
>         unsigned                timeout:1;
>        unsigned                carrier_report:1;
> };
> 
> You would like the memory access to be aligned,
> so accessing duration/carrier with 4-byte alignment, and
> pulse/reset/timeout/carrier_report 4-byte alignment as well.
> 
> You could only allow __u32 access to duration/carrier.
> But if you want bpf program to access duration/carrier with
> code like (__u16)(ctx->duration), then the compiler may
> translate the load to a 2-byte load. You may need to handle
> endianness here. You can check net/core/filter.c function
> bpf_skb_is_valid_access for some examples.

Thanks, yes that makes sense. Actually exposing a struct with bit fields
isn't great. I think I can rework into something simpler (two u32 fields).

> > +
> > +       return true;
> > +}
> > +
> > +const struct bpf_verifier_ops ir_decoder_verifier_ops = {
> > +       .get_func_proto  = ir_decoder_func_proto,
> > +       .is_valid_access = ir_decoder_is_valid_access
> > +};
> > diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> > index 2b28fcf6f6ae..ee5355715ee0 100644
> > --- a/include/linux/bpf_types.h
> > +++ b/include/linux/bpf_types.h
> > @@ -25,6 +25,9 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT, raw_tracepoint)
> >  #ifdef CONFIG_CGROUP_BPF
> >  BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
> >  #endif
> > +#ifdef CONFIG_IR_BPF_DECODER
> > +BPF_PROG_TYPE(BPF_PROG_TYPE_RAWIR_DECODER, ir_decoder)
> > +#endif
> >
> >  BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
> >  BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index c5ec89732a8d..6ad053e831c0 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -137,6 +137,7 @@ enum bpf_prog_type {
> >         BPF_PROG_TYPE_SK_MSG,
> >         BPF_PROG_TYPE_RAW_TRACEPOINT,
> >         BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
> > +       BPF_PROG_TYPE_RAWIR_DECODER,
> >  };
> >
> >  enum bpf_attach_type {
> > @@ -755,6 +756,17 @@ union bpf_attr {
> >   *     @addr: pointer to struct sockaddr to bind socket to
> >   *     @addr_len: length of sockaddr structure
> >   *     Return: 0 on success or negative error code
> > + *
> > + * int bpf_rc_keydown(ctx, protocol, scancode, toggle)
> > + *     Report decoded scancode with toggle value
> > + *     @ctx: pointer to ctx
> > + *     @protocol: decoded protocol
> > + *     @scancode: decoded scancode
> > + *     @toggle: set to 1 if button was toggled, else 0
> > + *
> > + * int bpf_rc_repeat(ctx)
> > + *     Repeat the last decoded scancode
> > + *     @ctx: pointer to ctx
> 
> The comment format has changed dramatically for
> documentation reason. Could you rebase your change
> on top of bpf-next tree? You will need to rewrite the above
> helper description so tools can generate proper documentation
> for them.

Ah, I need to rebase on top of bpf-next.

Thanks!

> >   */
> >  #define __BPF_FUNC_MAPPER(FN)          \
> >         FN(unspec),                     \
> > @@ -821,7 +833,9 @@ union bpf_attr {
> >         FN(msg_apply_bytes),            \
> >         FN(msg_cork_bytes),             \
> >         FN(msg_pull_data),              \
> > -       FN(bind),
> > +       FN(bind),                       \
> > +       FN(rc_repeat),                  \
> > +       FN(rc_keydown),
> >
> >  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> >   * function eBPF program intends to call
> > --
> > 2.17.0
> >

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox