Netdev List
 help / color / mirror / Atom feed
* sparc64 and ARM64 JIT bug (was Re: LLVM 4.0 code generation bug)
From: David Miller @ 2017-05-02  3:02 UTC (permalink / raw)
  To: ast; +Cc: daniel, netdev, xi.wang, catalin.marinas
In-Reply-To: <e75404e5-c68d-6f08-afdc-e57174b88a32@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Mon, 1 May 2017 19:39:33 -0700

> On 5/1/17 7:31 PM, David Miller wrote:
>>
>> If the last BPF instruction before exit is a ldimm64, branches to the
>> exit point at the wrong location.
>>
>> Here is what I get from test_pkt_access.c on sparc:
>>
>> 0000000000000000 <process>:
>>    0:	b7 00 00 00 00 00 00 02 	mov	r0, 2
>>    8:	61 21 00 50 00 00 00 00 	ldw	r2, [r1+80]
>>   10:	61 11 00 4c 00 00 00 00 	ldw	r1, [r1+76]
>>   18:	bf 41 00 00 00 00 00 00 	mov	r4, r1
>>   20:	07 40 00 00 00 00 00 0e 	add	r4, 14
>>   28:	2d 42 00 25 00 00 00 00 	jgt	r4, r2, 148 <LBB0_11>
>>  ...
>> 0000000000000148 <LBB0_11>:
>>  148:	18 00 00 00 ff ff ff ff 	ldimm64	r0, 4294967295
>>  150:	00 00 00 00 00 00 00 00
>>
>> 0000000000000158 <LBB0_12>:
>>  158:	95 00 00 00 00 00 00 00 	exit	
 ...
> looks fine to me. it jumps to 0x158,
> since offset 0 is the next insn after jump which is 0x30
> That's how classic bpf defined jumps.

Ok, it seems that both arm64 and sparc64's JIT handle the above
situation improperly.

They both work by recording the instruction offsets in an array which
is indexed off by one.  It it built like this:

	for (i = 0; i < prog->len; i++) {
		const struct bpf_insn *insn = &prog->insnsi[i];
		int ret;

		ret = build_insn(insn, ctx);
		ctx->offset[i] = ctx->idx;

		if (ret > 0) {
			i++;
			continue;
		}
		if (ret)
			return ret;
	}

That is, we record the JIT'd instruction offset for BPF instruction
'idx' in array entry 'idx - 1'.

Then when we emit a relative branch, we lookup the destination offset
using "ctx->offset[this_insn_idx + insn->off]"

And this works most of the time.  It doesn't work for the scenerio
above, because 'idx - 1' is not necessarily the index of the previous
BPF instruction.  Instead, that might point to the second half of an
lddimm64 instruction.

This bug was introduced by commit
8eee539ddea09bccae2426f09b0ba6a18b72b691 ("arm64: bpf: fix
out-of-bounds read in bpf2a64_offset()") and I copied the logic into
sparc64 :-)

^ permalink raw reply

* Re: [PATCH v3 binutils] Add BPF support to binutils...
From: David Miller @ 2017-05-02  3:03 UTC (permalink / raw)
  To: ast; +Cc: daniel, aconole, netdev, xdp-newbies
In-Reply-To: <b96ef138-9862-5091-9041-76516f852e48@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Mon, 1 May 2017 19:49:21 -0700

> On 4/30/17 11:21 AM, David Miller wrote:
>> built with:
>>
>> 	clang -O2 -target bpfel -g -c x.c -o x.o
>>
>> readelf can see it just fine:
>>
>> [davem@localhost binutils]$ ./readelf --debug-dump=loc ./xel.o
>> Contents of the .debug_loc section:
>>
>>     Offset   Begin            End              Expression
>>     00000000 0000000000000000 0000000000000010 (DW_OP_reg1 (r1))
>>     00000013 <End of list>
>>     00000023 0000000000000010 0000000000000020 (DW_OP_constu:
>>     590618314553; DW_OP_stack_value)
>>     0000003d 0000000000000020 0000000000000030 (DW_OP_reg1 (r1))
>>     00000050 <End of list>
>>
>> But with big-endian:
>>
>> [davem@localhost binutils]$ ./readelf --debug-dump=loc ./xeb.o
>> readelf: Warning: Invalid pointer size (0) in compunit header, using 4
>> instead
>> readelf: Warning: Bogus end-of-siblings marker detected at offset 27
>> in .debug_info section
>> readelf: Warning: Bogus end-of-siblings marker detected at offset 28
>> in .debug_info section
>> readelf: Warning: DIE at offset 0x29 refers to abbreviation number 48
>> which does not exist
>> readelf: Warning: Unable to load/parse the .debug_info section, so
>> cannot interpret the .debug_loc section.
> 
> yeah. clang emitted dwarf for big-endian is broken.
> This dwarf stuff is too complicated for normal human beings.
> The tight packing making debugging it quite painful.

But doesn't the CLANG DWARF2 emission code look at the target
endianness?

^ permalink raw reply

* Re: [PATCH v3 binutils] Add BPF support to binutils...
From: Alexei Starovoitov @ 2017-05-02  3:14 UTC (permalink / raw)
  To: David Miller; +Cc: daniel, aconole, netdev, xdp-newbies
In-Reply-To: <20170501.230323.816591707269395321.davem@davemloft.net>

On 5/1/17 8:03 PM, David Miller wrote:
> From: Alexei Starovoitov <ast@fb.com>
> Date: Mon, 1 May 2017 19:49:21 -0700
>
>> On 4/30/17 11:21 AM, David Miller wrote:
>>> built with:
>>>
>>> 	clang -O2 -target bpfel -g -c x.c -o x.o
>>>
>>> readelf can see it just fine:
>>>
>>> [davem@localhost binutils]$ ./readelf --debug-dump=loc ./xel.o
>>> Contents of the .debug_loc section:
>>>
>>>     Offset   Begin            End              Expression
>>>     00000000 0000000000000000 0000000000000010 (DW_OP_reg1 (r1))
>>>     00000013 <End of list>
>>>     00000023 0000000000000010 0000000000000020 (DW_OP_constu:
>>>     590618314553; DW_OP_stack_value)
>>>     0000003d 0000000000000020 0000000000000030 (DW_OP_reg1 (r1))
>>>     00000050 <End of list>
>>>
>>> But with big-endian:
>>>
>>> [davem@localhost binutils]$ ./readelf --debug-dump=loc ./xeb.o
>>> readelf: Warning: Invalid pointer size (0) in compunit header, using 4
>>> instead
>>> readelf: Warning: Bogus end-of-siblings marker detected at offset 27
>>> in .debug_info section
>>> readelf: Warning: Bogus end-of-siblings marker detected at offset 28
>>> in .debug_info section
>>> readelf: Warning: DIE at offset 0x29 refers to abbreviation number 48
>>> which does not exist
>>> readelf: Warning: Unable to load/parse the .debug_info section, so
>>> cannot interpret the .debug_loc section.
>>
>> yeah. clang emitted dwarf for big-endian is broken.
>> This dwarf stuff is too complicated for normal human beings.
>> The tight packing making debugging it quite painful.
>
> But doesn't the CLANG DWARF2 emission code look at the target
> endianness?

it certainly does and on bpf backend side I'm not doing
anything special comparing to what other bi-endian architectures
like ppc and mips are doing. Obviously I missed something.

^ permalink raw reply

* [PATCH net-next iproute2 0/3] ip: Initial support for extack errors
From: David Ahern @ 2017-05-02  3:18 UTC (permalink / raw)
  To: netdev, stephen; +Cc: jakub.kicinski, David Ahern

Introduce a new function, rtnl_ack_extack, to allow commands to flip
to the new error reporting over time.

Convert iplink_modify to use the new function to display error strings
returned from ip link set commands.

David Ahern (3):
  netlink: import netlink message parsing from kernel
  netlink: Add support for extended ack to rtnl_talk
  ip link: Add extack handling for setlink

 include/libnetlink.h |  14 +++++
 include/nlattr.h     | 162 +++++++++++++++++++++++++++++++++++++++++++++++++++
 ip/iplink.c          |  18 +++++-
 lib/Makefile         |   2 +-
 lib/libnetlink.c     |  96 ++++++++++++++++++++++++++----
 lib/nlattr.c         | 145 +++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 423 insertions(+), 14 deletions(-)
 create mode 100644 include/nlattr.h
 create mode 100644 lib/nlattr.c

-- 
2.1.4

^ permalink raw reply

* [PATCH net-next iproute2 1/3] netlink: import netlink message parsing from kernel
From: David Ahern @ 2017-05-02  3:18 UTC (permalink / raw)
  To: netdev, stephen; +Cc: jakub.kicinski, David Ahern
In-Reply-To: <1493695105-9418-1-git-send-email-dsa@cumulusnetworks.com>

include/nlattr.h is pulled from include/net/netlink.h.
lib/nlattr.c is pulled from lib/nlattr.c

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/libnetlink.h |   8 +++
 include/nlattr.h     | 162 +++++++++++++++++++++++++++++++++++++++++++++++++++
 lib/Makefile         |   2 +-
 lib/libnetlink.c     |   4 --
 lib/nlattr.c         | 145 +++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 316 insertions(+), 5 deletions(-)
 create mode 100644 include/nlattr.h
 create mode 100644 lib/nlattr.c

diff --git a/include/libnetlink.h b/include/libnetlink.h
index c43ab0a2d9d9..e7c46f1870aa 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -11,6 +11,11 @@
 #include <linux/neighbour.h>
 #include <linux/netconf.h>
 #include <arpa/inet.h>
+#include "nlattr.h"
+
+#ifndef MIN
+#define MIN(a, b) ((a) < (b) ? (a) : (b))
+#endif
 
 struct rtnl_handle {
 	int			fd;
@@ -227,4 +232,7 @@ int rtnl_from_file(FILE *, rtnl_listen_filter_t handler,
  * messages from dump file */
 #define NLMSG_TSTAMP	15
 
+int nla_parse(struct nlattr **tb, int maxtype, const struct nlattr *head,
+	      int len, const struct nla_policy *policy);
+
 #endif /* __LIBNETLINK_H__ */
diff --git a/include/nlattr.h b/include/nlattr.h
new file mode 100644
index 000000000000..0859b6ce686c
--- /dev/null
+++ b/include/nlattr.h
@@ -0,0 +1,162 @@
+#ifndef __NLATTR_H
+#define __NLATTR_H
+
+#include <linux/netlink.h>
+
+/**
+ * Standard attribute types to specify validation policy
+ */
+enum {
+	NLA_UNSPEC,
+	NLA_U8,
+	NLA_U16,
+	NLA_U32,
+	NLA_U64,
+	NLA_STRING,
+	NLA_FLAG,
+	NLA_MSECS,
+	NLA_NESTED,
+	NLA_NESTED_COMPAT,
+	NLA_NUL_STRING,
+	NLA_BINARY,
+	NLA_S8,
+	NLA_S16,
+	NLA_S32,
+	NLA_S64,
+	__NLA_TYPE_MAX,
+};
+
+#define NLA_TYPE_MAX (__NLA_TYPE_MAX - 1)
+
+/**
+ * nla_type - attribute type
+ * @nla: netlink attribute
+ */
+static inline int nla_type(const struct nlattr *nla)
+{
+	return nla->nla_type & NLA_TYPE_MASK;
+}
+
+/**
+ * nla_len - length of payload
+ * @nla: netlink attribute
+ */
+static inline int nla_len(const struct nlattr *nla)
+{
+	return nla->nla_len - NLA_HDRLEN;
+}
+
+/**
+ * struct nla_policy - attribute validation policy
+ * @type: Type of attribute or NLA_UNSPEC
+ * @len: Type specific length of payload
+ *
+ * Policies are defined as arrays of this struct, the array must be
+ * accessible by attribute type up to the highest identifier to be expected.
+ *
+ * Meaning of `len' field:
+ *    NLA_STRING           Maximum length of string
+ *    NLA_NUL_STRING       Maximum length of string (excluding NUL)
+ *    NLA_FLAG             Unused
+ *    NLA_BINARY           Maximum length of attribute payload
+ *    NLA_NESTED           Don't use `len' field -- length verification is
+ *                         done by checking len of nested header (or empty)
+ *    NLA_NESTED_COMPAT    Minimum length of structure payload
+ *    NLA_U8, NLA_U16,
+ *    NLA_U32, NLA_U64,
+ *    NLA_S8, NLA_S16,
+ *    NLA_S32, NLA_S64,
+ *    NLA_MSECS            Leaving the length field zero will verify the
+ *                         given type fits, using it verifies minimum length
+ *                         just like "All other"
+ *    All other            Minimum length of attribute payload
+ *
+ * Example:
+ * static const struct nla_policy my_policy[ATTR_MAX+1] = {
+ *      [ATTR_FOO] = { .type = NLA_U16 },
+ *      [ATTR_BAR] = { .type = NLA_STRING, .len = BARSIZ },
+ *      [ATTR_BAZ] = { .len = sizeof(struct mystruct) },
+ * };
+ */
+struct nla_policy {
+	__u16	type;
+	__u16	len;
+};
+
+/**
+ * nla_ok - check if the netlink attribute fits into the remaining bytes
+ * @nla: netlink attribute
+ * @remaining: number of bytes remaining in attribute stream
+ */
+static inline int nla_ok(const struct nlattr *nla, int remaining)
+{
+	return remaining >= (int) sizeof(*nla) &&
+	       nla->nla_len >= sizeof(*nla) &&
+	       nla->nla_len <= remaining;
+}
+
+/**
+ * nla_next - next netlink attribute in attribute stream
+ * @nla: netlink attribute
+ * @remaining: number of bytes remaining in attribute stream
+ *
+ * Returns the next netlink attribute in the attribute stream and
+ * decrements remaining by the size of the current attribute.
+ */
+static inline struct nlattr *nla_next(const struct nlattr *nla, int *remaining)
+{
+	unsigned int totlen = NLA_ALIGN(nla->nla_len);
+
+	*remaining -= totlen;
+	return (struct nlattr *) ((char *) nla + totlen);
+}
+
+/**
+ * nla_for_each_attr - iterate over a stream of attributes
+ * @pos: loop counter, set to current attribute
+ * @head: head of attribute stream
+ * @len: length of attribute stream
+ * @rem: initialized to len, holds bytes currently remaining in stream
+ */
+#define nla_for_each_attr(pos, head, len, rem) \
+	for (pos = head, rem = len; \
+	     nla_ok(pos, rem); \
+	     pos = nla_next(pos, &(rem)))
+
+/**
+ * nla_data - head of payload
+ * @nla: netlink attribute
+ */
+static inline void *nla_data(const struct nlattr *nla)
+{
+	return (char *) nla + NLA_HDRLEN;
+}
+
+/**
+ * nla_get_u32 - return payload of u32 attribute
+ * @nla: u32 netlink attribute
+ */
+static inline __u32 nla_get_u32(const struct nlattr *nla)
+{
+	return *(__u32 *) nla_data(nla);
+}
+
+/**
+ * nla_get_u16 - return payload of u16 attribute
+ * @nla: u16 netlink attribute
+ */
+static inline __u16 nla_get_u16(const struct nlattr *nla)
+{
+	return *(__u16 *) nla_data(nla);
+}
+
+/**
+ * nlmsg_len - length of message payload
+ * @nlh: netlink message header
+ */
+static inline int nlmsg_len(const struct nlmsghdr *nlh)
+{
+	return nlh->nlmsg_len - NLMSG_HDRLEN;
+}
+
+#endif /* __NLATTR_H */
diff --git a/lib/Makefile b/lib/Makefile
index 1d24ca24b9a3..77fac8d59446 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -8,7 +8,7 @@ CFLAGS += -fPIC
 
 UTILOBJ = utils.o rt_names.o ll_types.o ll_proto.o ll_addr.o \
 	inet_proto.o namespace.o json_writer.o \
-	names.o color.o bpf.o exec.o fs.o
+	names.o color.o bpf.o exec.o fs.o nlattr.o
 
 NLOBJ=libgenl.o ll_map.o libnetlink.o
 
diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index 5b75b2db4e0b..b5ee751c6b86 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -30,10 +30,6 @@
 #define SOL_NETLINK 270
 #endif
 
-#ifndef MIN
-#define MIN(a, b) ((a) < (b) ? (a) : (b))
-#endif
-
 int rcvbuf = 1024 * 1024;
 
 void rtnl_close(struct rtnl_handle *rth)
diff --git a/lib/nlattr.c b/lib/nlattr.c
new file mode 100644
index 000000000000..2a3a031fdb65
--- /dev/null
+++ b/lib/nlattr.c
@@ -0,0 +1,145 @@
+/*
+ * NETLINK      Netlink attributes
+ *
+ *		Authors:	Thomas Graf <tgraf@suug.ch>
+ *				Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
+ */
+
+#include <errno.h>
+#include "nlattr.h"
+#include "libnetlink.h"
+
+static const __u8 nla_attr_minlen[NLA_TYPE_MAX+1] = {
+	[NLA_U8]	= sizeof(__u8),
+	[NLA_U16]	= sizeof(__u16),
+	[NLA_U32]	= sizeof(__u32),
+	[NLA_U64]	= sizeof(__u64),
+	[NLA_MSECS]	= sizeof(__u64),
+	[NLA_NESTED]	= NLA_HDRLEN,
+	[NLA_S8]	= sizeof(__s8),
+	[NLA_S16]	= sizeof(__s16),
+	[NLA_S32]	= sizeof(__s32),
+	[NLA_S64]	= sizeof(__s64),
+};
+
+static int validate_nla(const struct nlattr *nla, int maxtype,
+			const struct nla_policy *policy)
+{
+	const struct nla_policy *pt;
+	int minlen = 0, attrlen = nla_len(nla), type = nla_type(nla);
+
+	if (type <= 0 || type > maxtype)
+		return 0;
+
+	pt = &policy[type];
+
+	if (pt->type > NLA_TYPE_MAX)
+		return -EINVAL;
+
+	switch (pt->type) {
+	case NLA_FLAG:
+		if (attrlen > 0)
+			return -ERANGE;
+		break;
+
+	case NLA_NUL_STRING:
+		if (pt->len)
+			minlen = MIN(attrlen, pt->len + 1);
+		else
+			minlen = attrlen;
+
+		if (!minlen || memchr(nla_data(nla), '\0', minlen) == NULL)
+			return -EINVAL;
+		/* fall through */
+
+	case NLA_STRING:
+		if (attrlen < 1)
+			return -ERANGE;
+
+		if (pt->len) {
+			char *buf = nla_data(nla);
+
+			if (buf[attrlen - 1] == '\0')
+				attrlen--;
+
+			if (attrlen > pt->len)
+				return -ERANGE;
+		}
+		break;
+
+	case NLA_BINARY:
+		if (pt->len && attrlen > pt->len)
+			return -ERANGE;
+		break;
+
+	case NLA_NESTED_COMPAT:
+		if (attrlen < pt->len)
+			return -ERANGE;
+		if (attrlen < NLA_ALIGN(pt->len))
+			break;
+		if (attrlen < NLA_ALIGN(pt->len) + NLA_HDRLEN)
+			return -ERANGE;
+		nla = nla_data(nla) + NLA_ALIGN(pt->len);
+		if (attrlen < NLA_ALIGN(pt->len) + NLA_HDRLEN + nla_len(nla))
+			return -ERANGE;
+		break;
+	case NLA_NESTED:
+		/* a nested attributes is allowed to be empty; if its not,
+		 * it must have a size of at least NLA_HDRLEN.
+		 */
+		if (attrlen == 0)
+			break;
+	default:
+		if (pt->len)
+			minlen = pt->len;
+		else if (pt->type != NLA_UNSPEC)
+			minlen = nla_attr_minlen[pt->type];
+
+		if (attrlen < minlen)
+			return -ERANGE;
+	}
+
+	return 0;
+}
+
+/**
+ * nla_parse - Parse a stream of attributes into a tb buffer
+ * @tb: destination array with maxtype+1 elements
+ * @maxtype: maximum attribute type to be expected
+ * @head: head of attribute stream
+ * @len: length of attribute stream
+ * @policy: validation policy
+ *
+ * Parses a stream of attributes and stores a pointer to each attribute in
+ * the tb array accessible via the attribute type. Attributes with a type
+ * exceeding maxtype will be silently ignored for backwards compatibility
+ * reasons. policy may be set to NULL if no validation is required.
+ *
+ * Returns 0 on success or a negative error code.
+ */
+int nla_parse(struct nlattr **tb, int maxtype, const struct nlattr *head,
+	      int len, const struct nla_policy *policy)
+{
+	const struct nlattr *nla;
+	int rem, err;
+
+	memset(tb, 0, sizeof(struct nlattr *) * (maxtype + 1));
+
+	nla_for_each_attr(nla, head, len, rem) {
+		__u16 type = nla_type(nla);
+
+		if (type > 0 && type <= maxtype) {
+			if (policy) {
+				err = validate_nla(nla, maxtype, policy);
+				if (err < 0)
+					goto errout;
+			}
+
+			tb[type] = (struct nlattr *)nla;
+		}
+	}
+
+	err = 0;
+errout:
+	return err;
+}
-- 
2.1.4

^ permalink raw reply related

* [PATCH net-next iproute2 3/3] ip link: Add extack handling for setlink
From: David Ahern @ 2017-05-02  3:18 UTC (permalink / raw)
  To: netdev, stephen; +Cc: jakub.kicinski, David Ahern
In-Reply-To: <1493695105-9418-1-git-send-email-dsa@cumulusnetworks.com>

Flip iplink_modify to rtnl_talk_extack. For this first patch only
error messages returned from the kernel are displayed to the user.

Follow on patches can add parsing of the returned message and the
error offset to show which attribute caused an error.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 ip/iplink.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/ip/iplink.c b/ip/iplink.c
index ae1c70ebcc81..aad0220a63a7 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -861,6 +861,19 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
 	return ret - argc;
 }
 
+static int iplink_extack(const char *errmsg, __u32 off,
+			 struct nlmsghdr *err_nlh)
+{
+	int rc = 0;
+
+	if (errmsg) {
+		rc++;
+		fprintf(stderr, "Error: %s\n", errmsg);
+	}
+
+	return rc;
+}
+
 static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv)
 {
 	int len;
@@ -906,7 +919,8 @@ static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv)
 
 			req.i.ifi_index = 0;
 			addattr32(&req.n, sizeof(req), IFLA_GROUP, group);
-			if (rtnl_talk(&rth, &req.n, NULL, 0) < 0)
+			if (rtnl_talk_extack(&rth, &req.n, NULL, 0,
+					     iplink_extack) < 0)
 				return -2;
 			return 0;
 		}
@@ -1001,7 +1015,7 @@ static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv)
 		return -1;
 	}
 
-	if (rtnl_talk(&rth, &req.n, NULL, 0) < 0)
+	if (rtnl_talk_extack(&rth, &req.n, NULL, 0, iplink_extack) < 0)
 		return -2;
 
 	return 0;
-- 
2.1.4

^ permalink raw reply related

* [PATCH net-next iproute2 2/3] netlink: Add support for extended ack to rtnl_talk
From: David Ahern @ 2017-05-02  3:18 UTC (permalink / raw)
  To: netdev, stephen; +Cc: jakub.kicinski, David Ahern
In-Reply-To: <1493695105-9418-1-git-send-email-dsa@cumulusnetworks.com>

Add support for extended ack error reporting.

Add a new function rtnl_talk_extack that takes a callback as an input
arg. If a netlink response contains extack attributes, the callback is
is invoked with the the err string, offset in the message and a pointer
to the message returned by the kernel.

Adding a new function allows commands to be moved over to the
extended error reporting over time.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/libnetlink.h |  6 ++++
 lib/libnetlink.c     | 92 ++++++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 91 insertions(+), 7 deletions(-)

diff --git a/include/libnetlink.h b/include/libnetlink.h
index e7c46f1870aa..6b031454ce2b 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -71,6 +71,9 @@ typedef int (*rtnl_listen_filter_t)(const struct sockaddr_nl *,
 				    struct rtnl_ctrl_data *,
 				    struct nlmsghdr *n, void *);
 
+typedef int (*nl_ext_ack_fn_t)(const char *errmsg, __u32 off,
+			       struct nlmsghdr *inner_nlh);
+
 struct rtnl_dump_filter_arg {
 	rtnl_filter_t filter;
 	void *arg1;
@@ -87,6 +90,9 @@ int rtnl_dump_filter_nc(struct rtnl_handle *rth,
 int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
 	      struct nlmsghdr *answer, size_t len)
 	__attribute__((warn_unused_result));
+int rtnl_talk_extack(struct rtnl_handle *rtnl, struct nlmsghdr *n,
+	      struct nlmsghdr *answer, size_t len, nl_ext_ack_fn_t errfn)
+	__attribute__((warn_unused_result));
 int rtnl_talk_suppress_rtnl_errmsg(struct rtnl_handle *rtnl, struct nlmsghdr *n,
 				   struct nlmsghdr *answer, size_t len)
 	__attribute__((warn_unused_result));
diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index b5ee751c6b86..f6451dec1332 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -32,6 +32,61 @@
 
 int rcvbuf = 1024 * 1024;
 
+/* dump netlink extended ack error message */
+static int nl_dump_ext_err(struct nlmsghdr *nlh, nl_ext_ack_fn_t errfn)
+{
+	const struct nla_policy extack_policy[NLMSGERR_ATTR_MAX + 1] = {
+		[NLMSGERR_ATTR_MSG]	= { .type = NLA_STRING },
+		[NLMSGERR_ATTR_OFFS]	= { .type = NLA_U32 },
+	};
+	struct nlattr *tb[NLMSGERR_ATTR_MAX + 1], *attr;
+	struct nlmsghdr *err_nlh = NULL;
+	struct nlmsgerr *err;
+	char *errmsg = NULL;
+	int hlen, alen;
+	__u32 off = 0;
+
+	if (!errfn)
+		return 0;
+
+	/* no TLVs, nothing to do here */
+	if (!(nlh->nlmsg_flags & NLM_F_ACK_TLVS))
+		return 0;
+
+	err = (struct nlmsgerr *)NLMSG_DATA(nlh);
+	hlen = sizeof(*err);
+
+	/* if NLM_F_CAPPED is set then the inner err msg was capped */
+	if (!(nlh->nlmsg_flags & NLM_F_CAPPED))
+		hlen += nlmsg_len(&err->msg);
+
+	attr = (struct nlattr *) ((void *) err + hlen);
+	alen = nlh->nlmsg_len - hlen;
+
+	if (nla_parse(tb, NLMSGERR_ATTR_MAX, attr, alen, extack_policy) != 0) {
+		fprintf(stderr,
+			"Failed to parse extended error attributes\n");
+		return 0;
+	}
+
+
+	if (tb[NLMSGERR_ATTR_MSG])
+		errmsg = (char *) nla_data(tb[NLMSGERR_ATTR_MSG]);
+
+	if (tb[NLMSGERR_ATTR_OFFS]) {
+		off = nla_get_u32(tb[NLMSGERR_ATTR_OFFS]);
+
+		if (off > nlh->nlmsg_len) {
+			fprintf(stderr,
+				"Invalid offset for NLMSGERR_ATTR_OFFS\n");
+			off = 0;
+		} else if (!(nlh->nlmsg_flags & NLM_F_CAPPED))
+			err_nlh = &err->msg;
+	}
+
+	return errfn(errmsg, off, err_nlh);
+}
+
 void rtnl_close(struct rtnl_handle *rth)
 {
 	if (rth->fd >= 0) {
@@ -45,6 +100,7 @@ int rtnl_open_byproto(struct rtnl_handle *rth, unsigned int subscriptions,
 {
 	socklen_t addr_len;
 	int sndbuf = 32768;
+	int one = 1;
 
 	memset(rth, 0, sizeof(*rth));
 
@@ -67,6 +123,11 @@ int rtnl_open_byproto(struct rtnl_handle *rth, unsigned int subscriptions,
 		return -1;
 	}
 
+	if (setsockopt(rth->fd, SOL_NETLINK, NETLINK_EXT_ACK,
+		       &one, sizeof(one)) < 0) {
+		/* debug/verbose message that it is not supported */
+	}
+
 	memset(&rth->local, 0, sizeof(rth->local));
 	rth->local.nl_family = AF_NETLINK;
 	rth->local.nl_groups = subscriptions;
@@ -413,9 +474,19 @@ int rtnl_dump_filter_nc(struct rtnl_handle *rth,
 	return rtnl_dump_filter_l(rth, a);
 }
 
+static void rtnl_talk_error(struct nlmsghdr *h, struct nlmsgerr *err,
+			    nl_ext_ack_fn_t errfn)
+{
+	if (nl_dump_ext_err(h, errfn))
+		return;
+
+	fprintf(stderr, "RTNETLINK answers: %s\n",
+		strerror(-err->error));
+}
+
 static int __rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
 		       struct nlmsghdr *answer, size_t maxlen,
-		       bool show_rtnl_err)
+		       bool show_rtnl_err, nl_ext_ack_fn_t errfn)
 {
 	int status;
 	unsigned int seq;
@@ -502,10 +573,10 @@ static int __rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
 					return 0;
 				}
 
-				if (rtnl->proto != NETLINK_SOCK_DIAG && show_rtnl_err)
-					fprintf(stderr,
-						"RTNETLINK answers: %s\n",
-						strerror(-err->error));
+				if (rtnl->proto != NETLINK_SOCK_DIAG &&
+				    show_rtnl_err)
+					rtnl_talk_error(h, err, errfn);
+
 				errno = -err->error;
 				return -1;
 			}
@@ -537,13 +608,20 @@ static int __rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
 int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
 	      struct nlmsghdr *answer, size_t maxlen)
 {
-	return __rtnl_talk(rtnl, n, answer, maxlen, true);
+	return __rtnl_talk(rtnl, n, answer, maxlen, true, NULL);
+}
+
+int rtnl_talk_extack(struct rtnl_handle *rtnl, struct nlmsghdr *n,
+		     struct nlmsghdr *answer, size_t maxlen,
+		     nl_ext_ack_fn_t errfn)
+{
+	return __rtnl_talk(rtnl, n, answer, maxlen, true, errfn);
 }
 
 int rtnl_talk_suppress_rtnl_errmsg(struct rtnl_handle *rtnl, struct nlmsghdr *n,
 				   struct nlmsghdr *answer, size_t maxlen)
 {
-	return __rtnl_talk(rtnl, n, answer, maxlen, false);
+	return __rtnl_talk(rtnl, n, answer, maxlen, false, NULL);
 }
 
 int rtnl_listen_all_nsid(struct rtnl_handle *rth)
-- 
2.1.4

^ permalink raw reply related

* Re: sparc64 and ARM64 JIT bug
From: David Miller @ 2017-05-02  3:19 UTC (permalink / raw)
  To: ast; +Cc: daniel, netdev, xi.wang, catalin.marinas
In-Reply-To: <20170501.230234.787989809925411599.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Mon, 01 May 2017 23:02:34 -0400 (EDT)

> 	for (i = 0; i < prog->len; i++) {
> 		const struct bpf_insn *insn = &prog->insnsi[i];
> 		int ret;
> 
> 		ret = build_insn(insn, ctx);
> 		ctx->offset[i] = ctx->idx;
> 
> 		if (ret > 0) {
> 			i++;
> 			continue;
> 		}
> 		if (ret)
> 			return ret;
> 	}

Ok, the fix is to defer the ctx->offset[i] setting until after the
potential extra "i++" increment inside of the "if (ret > 0)" test.

This is how x86_64's JIT handles this.

I'm testing this fix on sparc64 now.

^ permalink raw reply

* Re: [PATCH net-next iproute2 0/3] ip: Initial support for extack errors
From: Jakub Kicinski @ 2017-05-02  3:34 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, stephen
In-Reply-To: <1493695105-9418-1-git-send-email-dsa@cumulusnetworks.com>

On Mon,  1 May 2017 20:18:22 -0700, David Ahern wrote:
> Introduce a new function, rtnl_ack_extack, to allow commands to flip
> to the new error reporting over time.
> 
> Convert iplink_modify to use the new function to display error strings
> returned from ip link set commands.

Tested-by: Jakub Kicinski <jakub.kicinski@netronome.com>

^ permalink raw reply

* Re: [PATCH v4 binutils] Add BPF support to binutils...
From: Alexei Starovoitov @ 2017-05-02  3:49 UTC (permalink / raw)
  To: David Miller; +Cc: daniel, netdev, xdp-newbies
In-Reply-To: <20170430.120750.651845251226226775.davem@davemloft.net>

On 4/30/17 9:07 AM, David Miller wrote:
> This is mainly a synchronization point, I still need to look
> more deeply into Alexei's -g issue.
>
> New in this version from v3:
>  - Remove tailcall from opcode table
>  - Rearrange relocations so that numbers match with LLVM ones
>  - Emit relocs properly so that dwarf2 debug info tests pass
>  - Handle negative load/store offsets properly, add tests
>
> Signed-off-by: David S. Miller <davem@davemloft.net>

dwarf on little endian works now :)

$ /w/binutils-gdb/bld/binutils/objdump -S test.o

test.o:     file format elf64-bpfle

Disassembly of section .text:

0000000000000000 <bpf_prog1>:
int bpf_prog1(void *ign)
{
   volatile unsigned long t = 0x8983984739ull;
    0:	18 01 00 00 39 47 98 83 	ldimm64	r0, 590618314553
    8:	00 00 00 00 89 00 00 00
   10:	7b 1a f8 ff 00 00 00 00 	stdw	[r1+-8], r10
   return *(unsigned long *)((0xffffffff8fff0002ull) + t);
   18:	79 a1 f8 ff 00 00 00 00 	lddw	r10, [r1+-8]

This is great milestone.

Also I finally figured out how to enable native+bpf:
../configure --enable-targets=bpf-elf,x86_64-elf

having support for both in one binary is a big deal :)

Only 'gdb' warns with dual arch support:
"
warning: A handler for the OS ABI "GNU/Linux" is not built into this 
configuration
of GDB.  Attempting to continue with the default bpf settings.
"

(gdb) x/10i bpf_prog1
    0x0 <bpf_prog1>:	ldimm64	r0, 590618314553
    0x10 <bpf_prog1+16>:	stdw	[r1+-8], r10
    0x18 <bpf_prog1+24>:	lddw	r10, [r1+-8]
    0x20 <bpf_prog1+32>:	add	r0, -1879113726
    0x28 <bpf_prog1+40>:	lddw	r1, [r0+0]
    0x30 <bpf_prog1+48>:	exit
    0x38:	Cannot access memory at address 0x38

the last line also seems wrong. Off by 1 error?

^ permalink raw reply

* [PATCH] sparc64: Fix BPF JIT wrt. branches and ldimm64 instructions.
From: David Miller @ 2017-05-02  3:50 UTC (permalink / raw)
  To: ast; +Cc: daniel, netdev, xi.wang, catalin.marinas


Like other JITs, sparc64 maintains an array of instruction offsets but
stores the entries off by one.  This is done because jumps to the exit
block are indexed to one past the last BPF instruction.

So if we size the array by the program length, we need to record the
previous instruction in order to stay within the array bounds.

This is explained in ARM JIT commit 8eee539ddea0 ("arm64: bpf: fix
out-of-bounds read in bpf2a64_offset()").

But this scheme requires a little bit of careful handling when the
instruction before the branch destination is a 64-bit load immediate.
It takes up 2 BPF instruction slots.

Therefore, we have to fill in the array entry for the second half of
the 64-bit load immediate instruction rather than for the one for the
beginning of that instruction.

Fixes: 7a12b5031c6b ("sparc64: Add eBPF JIT.")
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 arch/sparc/net/bpf_jit_comp_64.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c
index ec7d10d..21de774 100644
--- a/arch/sparc/net/bpf_jit_comp_64.c
+++ b/arch/sparc/net/bpf_jit_comp_64.c
@@ -1446,12 +1446,13 @@ static int build_body(struct jit_ctx *ctx)
 		int ret;
 
 		ret = build_insn(insn, ctx);
-		ctx->offset[i] = ctx->idx;
 
 		if (ret > 0) {
 			i++;
+			ctx->offset[i] = ctx->idx;
 			continue;
 		}
+		ctx->offset[i] = ctx->idx;
 		if (ret)
 			return ret;
 	}
-- 
2.1.2.532.g19b5d50

^ permalink raw reply related

* Re: [PATCH v4 binutils] Add BPF support to binutils...
From: David Miller @ 2017-05-02  3:51 UTC (permalink / raw)
  To: ast; +Cc: daniel, netdev, xdp-newbies
In-Reply-To: <33505cff-f730-ebac-c2d7-38f1793062b7@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Mon, 1 May 2017 20:49:21 -0700

> (gdb) x/10i bpf_prog1
>    0x0 <bpf_prog1>:	ldimm64	r0, 590618314553
>    0x10 <bpf_prog1+16>:	stdw	[r1+-8], r10
>    0x18 <bpf_prog1+24>:	lddw	r10, [r1+-8]
>    0x20 <bpf_prog1+32>:	add	r0, -1879113726
>    0x28 <bpf_prog1+40>:	lddw	r1, [r0+0]
>    0x30 <bpf_prog1+48>:	exit
>    0x38:	Cannot access memory at address 0x38
> 
> the last line also seems wrong. Off by 1 error?

Maybe, I'll look into it tomorrow.  I'll also post my fix for the
branch destination disassembler output tomorrow as well.

^ permalink raw reply

* bpf_test_finish()
From: David Miller @ 2017-05-02  3:56 UTC (permalink / raw)
  To: ast; +Cc: daniel, netdev


It dereferences a user pointer:

static int bpf_test_finish(union bpf_attr __user *uattr, const void *data,
			   u32 size, u32 retval, u32 duration)
{
	void __user *data_out = u64_to_user_ptr(uattr->test.data_out);
                                                ^^^^^^^^^^^^^^^^^^^^
Which of course doesn't work so well :-)

I really wish that didn't silently work on x86/x86_64.

You're going to have to do a "get_user(&uattr->test.data_out)"

^ permalink raw reply

* Re: bpf_test_finish()
From: Alexei Starovoitov @ 2017-05-02  4:46 UTC (permalink / raw)
  To: David Miller; +Cc: daniel, netdev
In-Reply-To: <20170501.235610.564976046138352257.davem@davemloft.net>

On 5/1/17 8:56 PM, David Miller wrote:
>
> It dereferences a user pointer:
>
> static int bpf_test_finish(union bpf_attr __user *uattr, const void *data,
> 			   u32 size, u32 retval, u32 duration)
> {
> 	void __user *data_out = u64_to_user_ptr(uattr->test.data_out);
>                                                 ^^^^^^^^^^^^^^^^^^^^
> Which of course doesn't work so well :-)
>
> I really wish that didn't silently work on x86/x86_64.

argh. my bad.
I'll send a patch first thing tomorrow unless Daniel beats me to it.
We have kattr there as well which has the whole bpf_attr copied into
kernel memory already. Should have taken data_out from there and
passed into bpf_test_finish().

^ permalink raw reply

* Re: [patch net-next 00/10] net: sched: introduce multichain support for filters
From: Cong Wang @ 2017-05-02  5:26 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	David Ahern, Eric Dumazet, Stephen Hemminger, Daniel Borkmann,
	Alexander Duyck, mlxsw, Simon Horman
In-Reply-To: <20170428223446.GB1905@nanopsycho.orion>

On Fri, Apr 28, 2017 at 3:34 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Fri, Apr 28, 2017 at 07:40:24PM CEST, xiyou.wangcong@gmail.com wrote:
>>On Thu, Apr 27, 2017 at 11:53 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> Thu, Apr 27, 2017 at 07:46:03PM CEST, xiyou.wangcong@gmail.com wrote:
>>>>On Thu, Apr 27, 2017 at 4:12 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>>> Simple example:
>>>>> $ tc qdisc add dev eth0 ingress
>>>>> $ tc filter add dev eth0 parent ffff: protocol ip pref 33 flower dst_mac 52:54:00:3d:c7:6d action goto chain 11
>>>>> $ tc filter add dev eth0 parent ffff: protocol ip pref 22 chain 11 flower dst_ip 192.168.40.1 action drop
>>>>> $ tc filter show dev eth0 root
>>>>
>>>>Interesting.
>>>>
>>>>I don't look into the code yet. If I understand the concepts correctly,
>>>>so with your patchset we can mark either filter with a chain No. to
>>>>choose which chain it belongs to _logically_ even though
>>>>_physically_ it is still in the old-fashion chain (prio, proto)?
>>>
>>> You have to see the code :)
>>
>>I don't understand why I have to, these are high-level concepts
>>and should be put in your cover letter (aka. design doc). You miss
>>a lot of information about the ordering here.
>
> Well, the description is one thing, but seeing the actual code should
> put the whole view. But if you are missing something, I can add it. What
> do you mean by "information about the ordering"?
>

By ordering, I mean:

1) before your patch, filters are ordered by prio and categorized by proto

2) after your patch, we can jump from one filter to a specified one, how
does this work or not work with the prio/proto?

^ permalink raw reply

* 10921 netdev
From: gestu @ 2017-05-02  5:38 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: 801737.zip --]
[-- Type: application/zip, Size: 4277 bytes --]

^ permalink raw reply

* Re: [patch net-next 00/10] net: sched: introduce multichain support for filters
From: Jiri Pirko @ 2017-05-02  5:50 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	David Ahern, Eric Dumazet, Stephen Hemminger, Daniel Borkmann,
	Alexander Duyck, mlxsw, Simon Horman
In-Reply-To: <CAM_iQpX8i5-YZRkKN9DyvjCeE8W4M33wiUUTHizsyGsnsE=0KQ@mail.gmail.com>

Tue, May 02, 2017 at 07:26:07AM CEST, xiyou.wangcong@gmail.com wrote:
>On Fri, Apr 28, 2017 at 3:34 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Fri, Apr 28, 2017 at 07:40:24PM CEST, xiyou.wangcong@gmail.com wrote:
>>>On Thu, Apr 27, 2017 at 11:53 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> Thu, Apr 27, 2017 at 07:46:03PM CEST, xiyou.wangcong@gmail.com wrote:
>>>>>On Thu, Apr 27, 2017 at 4:12 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>>>> Simple example:
>>>>>> $ tc qdisc add dev eth0 ingress
>>>>>> $ tc filter add dev eth0 parent ffff: protocol ip pref 33 flower dst_mac 52:54:00:3d:c7:6d action goto chain 11
>>>>>> $ tc filter add dev eth0 parent ffff: protocol ip pref 22 chain 11 flower dst_ip 192.168.40.1 action drop
>>>>>> $ tc filter show dev eth0 root
>>>>>
>>>>>Interesting.
>>>>>
>>>>>I don't look into the code yet. If I understand the concepts correctly,
>>>>>so with your patchset we can mark either filter with a chain No. to
>>>>>choose which chain it belongs to _logically_ even though
>>>>>_physically_ it is still in the old-fashion chain (prio, proto)?
>>>>
>>>> You have to see the code :)
>>>
>>>I don't understand why I have to, these are high-level concepts
>>>and should be put in your cover letter (aka. design doc). You miss
>>>a lot of information about the ordering here.
>>
>> Well, the description is one thing, but seeing the actual code should
>> put the whole view. But if you are missing something, I can add it. What
>> do you mean by "information about the ordering"?
>>
>
>By ordering, I mean:
>
>1) before your patch, filters are ordered by prio and categorized by proto
>
>2) after your patch, we can jump from one filter to a specified one, how
>does this work or not work with the prio/proto?

No, you can jump to another chain. And that chain is also ordered by
prio/proto.

Just imagine currently you have only chain 0. This patchset just extends
for other chains.

^ permalink raw reply

* Re: [patch net-next] net: sched: add helpers to handle extended actions
From: Jiri Pirko @ 2017-05-02  5:51 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: netdev, davem, xiyou.wangcong, mlxsw
In-Reply-To: <6450af04-a6d4-866e-82ee-424ff54b2ef3@mojatatu.com>

Sun, Apr 30, 2017 at 04:08:15PM CEST, jhs@mojatatu.com wrote:
>Jiri,
>
>With "goto chain X" this will have to be more generalized. Maybe
>we have 0xAXXXXXXX Where "A" recognizes the extension with
>current values ACT_JUMP(0x1) and GOTO_CHAIN(maybe 0x2)
>and the rest "XXXXXXX" is a free floating parameter values
>which carry the goto count for ACT_JUMP and GOTO_CHAIN chain-id.

That is exactly what this patch is doing.


>
>cheers,
>jamal
>
>On 17-04-28 12:13 PM, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> Jump is now the only one using value action opcode. This is going to
>> change soon. So introduce helpers to work with this. Convert TC_ACT_JUMP.
>> 
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  include/uapi/linux/pkt_cls.h | 15 ++++++++++++++-
>>  net/sched/act_api.c          |  2 +-
>>  2 files changed, 15 insertions(+), 2 deletions(-)
>> 
>> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>> index f1129e3..d613be3 100644
>> --- a/include/uapi/linux/pkt_cls.h
>> +++ b/include/uapi/linux/pkt_cls.h
>> @@ -37,7 +37,20 @@ enum {
>>  #define TC_ACT_QUEUED		5
>>  #define TC_ACT_REPEAT		6
>>  #define TC_ACT_REDIRECT		7
>> -#define TC_ACT_JUMP		0x10000000
>> +
>> +/* There is a special kind of actions called "extended actions",
>> + * which need a value parameter. These have a local opcode located in
>> + * the highest nibble, starting from 1. The rest of the bits
>> + * are used to carry the value. These two parts together make
>> + * a combined opcode.
>> + */
>> +#define __TC_ACT_EXT_SHIFT 28
>> +#define __TC_ACT_EXT(local) ((local) << __TC_ACT_EXT_SHIFT)
>> +#define TC_ACT_EXT_VAL_MASK ((1 << __TC_ACT_EXT_SHIFT) - 1)
>> +#define TC_ACT_EXT_CMP(combined, opcode) \
>> +	(((combined) & (~TC_ACT_EXT_VAL_MASK)) == opcode)
>> +
>> +#define TC_ACT_JUMP __TC_ACT_EXT(1)
>> 
>>  /* Action type identifiers*/
>>  enum {
>> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>> index 7f2cd70..a90e8f3 100644
>> --- a/net/sched/act_api.c
>> +++ b/net/sched/act_api.c
>> @@ -453,7 +453,7 @@ int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions,
>>  		if (ret == TC_ACT_REPEAT)
>>  			goto repeat;	/* we need a ttl - JHS */
>> 
>> -		if (ret & TC_ACT_JUMP) {
>> +		if (TC_ACT_EXT_CMP(ret, TC_ACT_JUMP)) {
>>  			jmp_prgcnt = ret & TCA_ACT_MAX_PRIO_MASK;
>>  			if (!jmp_prgcnt || (jmp_prgcnt > nr_actions)) {
>>  				/* faulty opcode, stop pipeline */
>> 
>

^ permalink raw reply

* [PATCH net v4 00/12] Fix possbile memleaks when fail to register_netdevice
From: gfree.wind @ 2017-05-02  5:58 UTC (permalink / raw)
  To: davem, jiri, mareklindner, sw, a, kuznet, jmorris, yoshfuji,
	kaber, steffen.klassert, herbert, netdev
  Cc: Gao Feng

From: Gao Feng <gfree.wind@foxmail.com>

These following drivers allocate kinds of resources in its ndo_init
func, free some of them or all in the destructor func. Then there is
one memleak that some errors happen after register_netdevice invokes
the ndo_init callback. Because only the ndo_uninit callback is invoked
in the error handler of register_netdevice, but destructor not.

In my original approach, I tried to free the resources in the newlink
func when fail to register_netdevice, like destructor did except not
free the net_dev. This method is not good when destructor is changed,
and the memleak could be not fixed when there is no newlink callback.

Now create one new func used to free the resources in the destructor,
and the ndo_uninit func also could invokes it when fail to register
the net_device by comparing the dev->reg_state with NETREG_UNINITIALIZED.
If there is no existing ndo_uninit, just add one.

This solution doesn't only make sure free all resources in any case,
but also follows the original desgin that some resources could be kept
until the destructor executes normally after register the device
successfully.

Gao Feng (12):
  driver: dummy: Fix one possbile memleak when fail to
    register_netdevice
  driver: ifb: Fix one possbile memleak when fail to register_netdevice
  driver: loopback: Fix one possbile memleak when fail to
    register_netdevice
  driver: team: Fix one possbile memleak when fail to register_netdevice
  driver: veth: Fix one possbile memleak when fail to register_netdevice
  net: ip6_gre: Fix one possbile memleak when fail to register_netdevice
  ip6_tunnel: Fix one possbile memleak when fail to register_netdevice
  net: ip6_vti: Fix one possbile memleak when fail to register_netdevice
  net: ip_tunnel: Fix one possbile memleak when fail to
    register_netdevice
  net: sit: Fix one possbile memleak when fail to register_netdevice
  net: vlan: Fix one possbile memleak when fail to register_netdevice
  net: batman-adv: Fix one possbile memleak when fail to
    register_netdevice

 drivers/net/dummy.c             | 14 +++++++++++---
 drivers/net/ifb.c               | 33 +++++++++++++++++++++++----------
 drivers/net/loopback.c          | 15 ++++++++++++++-
 drivers/net/team/team.c         | 15 ++++++++++++---
 drivers/net/veth.c              | 15 ++++++++++++++-
 net/8021q/vlan_dev.c            | 17 +++++++++++++----
 net/batman-adv/soft-interface.c | 18 +++++++++++++++---
 net/ipv4/ip_tunnel.c            | 11 ++++++++++-
 net/ipv6/ip6_gre.c              | 17 +++++++++++++----
 net/ipv6/ip6_tunnel.c           | 11 ++++++++++-
 net/ipv6/ip6_vti.c              | 11 ++++++++++-
 net/ipv6/sit.c                  | 17 +++++++++++++----
 12 files changed, 158 insertions(+), 36 deletions(-)

^ permalink raw reply

* [PATCH net v4 01/12] driver: dummy: Fix one possbile memleak when fail to register_netdevice
From: gfree.wind @ 2017-05-02  5:58 UTC (permalink / raw)
  To: davem, jiri, mareklindner, sw, a, kuznet, jmorris, yoshfuji,
	kaber, steffen.klassert, herbert, netdev
  Cc: Gao Feng
In-Reply-To: <cover.1493699451.git.gfree.wind@foxmail.com>

From: Gao Feng <gfree.wind@foxmail.com>

The dummy driver allocates dev->dstats and priv->vfinfo in its
ndo_init func dummy_dev_init, free the dev->dstats in the ndo_uninit
and free the priv->vfinfo in its destructor func. Then there is one
memleak that some errors happen after register_netdevice invokes the
ndo_init callback. Because only the ndo_uninit callback is invoked in
the error handler of register_netdevice, but destructor not.

Now create one new func dummy_destructor_free to free the mem in the
destructor, and the ndo_uninit func also invokes it when fail to
register the dummy device.

It's not only free all resources, but also follow the original desgin
that the priv->vfinfo is freed in the destructor normally after
register the device successfully.

Signed-off-by: Gao Feng <gfree.wind@foxmail.com>
---
 drivers/net/dummy.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index 2c80611..0b3c1cc 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -153,9 +153,19 @@ static int dummy_dev_init(struct net_device *dev)
 	return 0;
 }
 
+static void dummy_destructor_free(struct net_device *dev)
+{
+	struct dummy_priv *priv = netdev_priv(dev);
+
+	kfree(priv->vfinfo);
+}
+
 static void dummy_dev_uninit(struct net_device *dev)
 {
 	free_percpu(dev->dstats);
+	/* dev is not registered, perform the free instead of destructor */
+	if (dev->reg_state == NETREG_UNINITIALIZED)
+		dummy_destructor_free(dev);
 }
 
 static int dummy_change_carrier(struct net_device *dev, bool new_carrier)
@@ -310,9 +320,7 @@ static void dummy_get_drvinfo(struct net_device *dev,
 
 static void dummy_free_netdev(struct net_device *dev)
 {
-	struct dummy_priv *priv = netdev_priv(dev);
-
-	kfree(priv->vfinfo);
+	dummy_destructor_free(dev);
 	free_netdev(dev);
 }
 
-- 
1.9.1

^ permalink raw reply related

* [PATCH net v4 06/12] net: ip6_gre: Fix one possbile memleak when fail to register_netdevice
From: gfree.wind @ 2017-05-02  5:58 UTC (permalink / raw)
  To: davem, jiri, mareklindner, sw, a, kuznet, jmorris, yoshfuji,
	kaber, steffen.klassert, herbert, netdev
  Cc: Gao Feng
In-Reply-To: <cover.1493699451.git.gfree.wind@foxmail.com>

From: Gao Feng <gfree.wind@foxmail.com>

The ip6_gre allocates some resources in its ndo_init func, and
free some of them in its destructor func. Then there is one memleak
that some errors happen after register_netdevice invokes the ndo_init
callback. Because only the ndo_uninit callback is invoked in the error
handler of register_netdevice, but destructor not.

Now create one new func ip6_gre_destructor_free to free the mem in
the destructor, and ndo_uninit func also invokes it when fail to
register the ip6_gre device.

It's not only free all resources, but also follow the original desgin
that the resources are freed in the destructor normally after
register the device successfully.

Signed-off-by: Gao Feng <gfree.wind@foxmail.com>
---
 net/ipv6/ip6_gre.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 6fcb7cb..e53c3ac 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -355,6 +355,14 @@ static struct ip6_tnl *ip6gre_tunnel_locate(struct net *net,
 	return NULL;
 }
 
+static void ip6gre_destructor_free(struct net_device *dev)
+{
+	struct ip6_tnl *t = netdev_priv(dev);
+
+	dst_cache_destroy(&t->dst_cache);
+	free_percpu(dev->tstats);
+}
+
 static void ip6gre_tunnel_uninit(struct net_device *dev)
 {
 	struct ip6_tnl *t = netdev_priv(dev);
@@ -363,6 +371,10 @@ static void ip6gre_tunnel_uninit(struct net_device *dev)
 	ip6gre_tunnel_unlink(ign, t);
 	dst_cache_reset(&t->dst_cache);
 	dev_put(dev);
+
+	/* dev is not registered, perform the free instead of destructor */
+	if (dev->reg_state == NETREG_UNINITIALIZED)
+		ip6gre_destructor_free(dev);
 }
 
 
@@ -981,10 +993,7 @@ static int ip6gre_header(struct sk_buff *skb, struct net_device *dev,
 
 static void ip6gre_dev_free(struct net_device *dev)
 {
-	struct ip6_tnl *t = netdev_priv(dev);
-
-	dst_cache_destroy(&t->dst_cache);
-	free_percpu(dev->tstats);
+	ip6gre_destructor_free(dev);
 	free_netdev(dev);
 }
 
-- 
1.9.1

^ permalink raw reply related

* [PATCH net v4 02/12] driver: ifb: Fix one possbile memleak when fail to register_netdevice
From: gfree.wind @ 2017-05-02  5:58 UTC (permalink / raw)
  To: davem, jiri, mareklindner, sw, a, kuznet, jmorris, yoshfuji,
	kaber, steffen.klassert, herbert, netdev
  Cc: Gao Feng
In-Reply-To: <cover.1493699451.git.gfree.wind@foxmail.com>

From: Gao Feng <gfree.wind@foxmail.com>

The ifb driver allocates some resources in its ndo_init func, and free
them in its destructor func. Then there is one memleak that some errors
happen after register_netdevice invokes the ndo_init callback. Because
the destructor would not be invoked to free the resources.

Now create one new func ifb_destructor_free to free the mem in the
destructor, and add ndo_uninit func also invokes it when fail to register
the ifb device.

It's not only free all resources, but also follow the original desgin
that the resources are freed in the destructor normally after
register the device successfully.

Signed-off-by: Gao Feng <gfree.wind@foxmail.com>
---
 drivers/net/ifb.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 312fce7..b25aea1 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -180,6 +180,27 @@ static int ifb_dev_init(struct net_device *dev)
 	return 0;
 }
 
+static void ifb_destructor_free(struct net_device *dev)
+{
+	struct ifb_dev_private *dp = netdev_priv(dev);
+	struct ifb_q_private *txp = dp->tx_private;
+	int i;
+
+	for (i = 0; i < dev->num_tx_queues; i++, txp++) {
+		tasklet_kill(&txp->ifb_tasklet);
+		__skb_queue_purge(&txp->rq);
+		__skb_queue_purge(&txp->tq);
+	}
+	kfree(dp->tx_private);
+}
+
+static void ifb_dev_uninit(struct net_device *dev)
+{
+	/* dev is not registered, perform the free instead of destructor */
+	if (dev->reg_state == NETREG_UNINITIALIZED)
+		ifb_destructor_free(dev);
+}
+
 static const struct net_device_ops ifb_netdev_ops = {
 	.ndo_open	= ifb_open,
 	.ndo_stop	= ifb_close,
@@ -187,6 +208,7 @@ static int ifb_dev_init(struct net_device *dev)
 	.ndo_start_xmit	= ifb_xmit,
 	.ndo_validate_addr = eth_validate_addr,
 	.ndo_init	= ifb_dev_init,
+	.ndo_uninit	= ifb_dev_uninit,
 };
 
 #define IFB_FEATURES (NETIF_F_HW_CSUM | NETIF_F_SG  | NETIF_F_FRAGLIST	| \
@@ -197,16 +219,7 @@ static int ifb_dev_init(struct net_device *dev)
 
 static void ifb_dev_free(struct net_device *dev)
 {
-	struct ifb_dev_private *dp = netdev_priv(dev);
-	struct ifb_q_private *txp = dp->tx_private;
-	int i;
-
-	for (i = 0; i < dev->num_tx_queues; i++,txp++) {
-		tasklet_kill(&txp->ifb_tasklet);
-		__skb_queue_purge(&txp->rq);
-		__skb_queue_purge(&txp->tq);
-	}
-	kfree(dp->tx_private);
+	ifb_destructor_free(dev);
 	free_netdev(dev);
 }
 
-- 
1.9.1

^ permalink raw reply related

* [PATCH net v4 04/12] driver: team: Fix one possbile memleak when fail to register_netdevice
From: gfree.wind @ 2017-05-02  5:58 UTC (permalink / raw)
  To: davem, jiri, mareklindner, sw, a, kuznet, jmorris, yoshfuji,
	kaber, steffen.klassert, herbert, netdev
  Cc: Gao Feng
In-Reply-To: <cover.1493699451.git.gfree.wind@foxmail.com>

From: Gao Feng <gfree.wind@foxmail.com>

The team driver allocates some resources in its ndo_init func, and
free some of them in its destructor func. Then there is one memleak
that some errors happen after register_netdevice invokes the ndo_init
callback. Because only the ndo_uninit callback is invoked in the error
handler of register_netdevice, but destructor not.

Now create one new func team_destructor_free to free the mem in the
destructor, and ndo_uninit func also invokes it when fail to register
the team device.

It's not only free all resources, but also follow the original desgin
that the resources are freed in the destructor normally after
register the device successfully.

Signed-off-by: Gao Feng <gfree.wind@foxmail.com>
---
 drivers/net/team/team.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 85c0124..d8dd203 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1619,6 +1619,13 @@ static int team_init(struct net_device *dev)
 	return err;
 }
 
+static void team_destructor_free(struct net_device *dev)
+{
+	struct team *team = netdev_priv(dev);
+
+	free_percpu(team->pcpu_stats);
+}
+
 static void team_uninit(struct net_device *dev)
 {
 	struct team *team = netdev_priv(dev);
@@ -1636,13 +1643,15 @@ static void team_uninit(struct net_device *dev)
 	team_queue_override_fini(team);
 	mutex_unlock(&team->lock);
 	netdev_change_features(dev);
+
+	/* dev is not registered, perform the free instead of destructor */
+	if (dev->reg_state == NETREG_UNINITIALIZED)
+		team_destructor_free(dev);
 }
 
 static void team_destructor(struct net_device *dev)
 {
-	struct team *team = netdev_priv(dev);
-
-	free_percpu(team->pcpu_stats);
+	team_destructor_free(dev);
 	free_netdev(dev);
 }
 
-- 
1.9.1

^ permalink raw reply related

* [PATCH net v4 05/12] driver: veth: Fix one possbile memleak when fail to register_netdevice
From: gfree.wind @ 2017-05-02  5:58 UTC (permalink / raw)
  To: davem, jiri, mareklindner, sw, a, kuznet, jmorris, yoshfuji,
	kaber, steffen.klassert, herbert, netdev
  Cc: Gao Feng
In-Reply-To: <cover.1493699451.git.gfree.wind@foxmail.com>

From: Gao Feng <gfree.wind@foxmail.com>

The veth driver allocates some resources in its ndo_init func, and
free them in its destructor func. Then there is one memleak that some
errors happen after register_netdevice invokes the ndo_init callback.
Because the destructor would not be invoked to free the resources.

Now create one new func veth_destructor_free to free the mem in the
destructor, and add ndo_uninit func also invokes it when fail to register
the veth device.

It's not only free all resources, but also follow the original desgin
that the resources are freed in the destructor normally after
register the device successfully.

Signed-off-by: Gao Feng <gfree.wind@foxmail.com>
---
 drivers/net/veth.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 8c39d6d..418376a 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -224,9 +224,21 @@ static int veth_dev_init(struct net_device *dev)
 	return 0;
 }
 
-static void veth_dev_free(struct net_device *dev)
+static void veth_destructor_free(struct net_device *dev)
 {
 	free_percpu(dev->vstats);
+}
+
+static void veth_dev_uninit(struct net_device *dev)
+{
+	/* dev is not registered, perform the free instead of destructor */
+	if (dev->reg_state == NETREG_UNINITIALIZED)
+		veth_destructor_free(dev);
+}
+
+static void veth_dev_free(struct net_device *dev)
+{
+	veth_destructor_free(dev);
 	free_netdev(dev);
 }
 
@@ -284,6 +296,7 @@ static void veth_set_rx_headroom(struct net_device *dev, int new_hr)
 
 static const struct net_device_ops veth_netdev_ops = {
 	.ndo_init            = veth_dev_init,
+	.ndo_uninit          = veth_dev_uninit,
 	.ndo_open            = veth_open,
 	.ndo_stop            = veth_close,
 	.ndo_start_xmit      = veth_xmit,
-- 
1.9.1

^ permalink raw reply related

* [PATCH net v4 03/12] driver: loopback: Fix one possbile memleak when fail to register_netdevice
From: gfree.wind @ 2017-05-02  5:58 UTC (permalink / raw)
  To: davem, jiri, mareklindner, sw, a, kuznet, jmorris, yoshfuji,
	kaber, steffen.klassert, herbert, netdev
  Cc: Gao Feng
In-Reply-To: <cover.1493699451.git.gfree.wind@foxmail.com>

From: Gao Feng <gfree.wind@foxmail.com>

The loopback driver allocates some resources in its ndo_init func, and
free them in its destructor func. Then there is one memleak that some
errors happen after register_netdevice invokes the ndo_init callback.
Because the destructor would not be invoked to free the resources.

Now create one new func loopback_destructor_free to free the mem in
the destructor, and add ndo_uninit func also invokes it when fail to
register the loopback device.

It's not only free all resources, but also follow the original desgin
that the resources are freed in the destructor normally after
register the device successfully.

Signed-off-by: Gao Feng <gfree.wind@foxmail.com>
---
 drivers/net/loopback.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index b23b719..d7c1016 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -141,15 +141,28 @@ static int loopback_dev_init(struct net_device *dev)
 	return 0;
 }
 
-static void loopback_dev_free(struct net_device *dev)
+static void loopback_destructor_free(struct net_device *dev)
 {
 	dev_net(dev)->loopback_dev = NULL;
 	free_percpu(dev->lstats);
+}
+
+static void loopback_dev_uninit(struct net_device *dev)
+{
+	/* dev is not registered, perform the free instead of destructor */
+	if (dev->reg_state == NETREG_UNINITIALIZED)
+		loopback_destructor_free(dev);
+}
+
+static void loopback_dev_free(struct net_device *dev)
+{
+	loopback_destructor_free(dev);
 	free_netdev(dev);
 }
 
 static const struct net_device_ops loopback_ops = {
 	.ndo_init      = loopback_dev_init,
+	.ndo_uninit = loopback_dev_uninit,
 	.ndo_start_xmit= loopback_xmit,
 	.ndo_get_stats64 = loopback_get_stats64,
 	.ndo_set_mac_address = eth_mac_addr,
-- 
1.9.1

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox