Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] lib: fix multiple strlcpy definition
From: Phil Sutter @ 2017-09-26 15:55 UTC (permalink / raw)
  To: Baruch Siach; +Cc: Stephen Hemminger, netdev
In-Reply-To: <7f5fee59e46bfd3b4acf8ed9a8fbd8c7b4f1cd70.1506424129.git.baruch@tkos.co.il>

On Tue, Sep 26, 2017 at 02:08:49PM +0300, Baruch Siach wrote:
[...]
> diff --git a/configure b/configure
> index 7be8fb113cc9..787b2e061af9 100755
> --- a/configure
> +++ b/configure
> @@ -326,6 +326,27 @@ EOF
>      rm -f $TMPDIR/dbtest.c $TMPDIR/dbtest
>  }
>  
> +check_strlcpy()
> +{
> +    cat >$TMPDIR/strtest.c <<EOF
> +#include <string.h>
> +int main(int argc, char **argv) {
> +	char dst[10];
> +	strlcpy("test", dst, sizeof(dst));

You swapped source and destination here. It's not important for the
given use-case, but the resulting binary should segfault.

Apart from that, LGTM!

Cheers, Phil

^ permalink raw reply

* Re: [PATCH net-next 2/7] nfp: compile flower vxlan tunnel metadata match fields
From: John Hurley @ 2017-09-26 15:39 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Simon Horman, David Miller, Jakub Kicinski, Linux Netdev List,
	oss-drivers
In-Reply-To: <CAJ3xEMhio5+5BK3p5mq=TvA8SzAvjSJm75mN0XiChdWkW3C89g@mail.gmail.com>

On Tue, Sep 26, 2017 at 4:33 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Tue, Sep 26, 2017 at 6:11 PM, John Hurley <john.hurley@netronome.com> wrote:
>> On Tue, Sep 26, 2017 at 3:12 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>> On Tue, Sep 26, 2017 at 4:58 PM, John Hurley <john.hurley@netronome.com> wrote:
>>>> On Mon, Sep 25, 2017 at 7:35 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>>> On Mon, Sep 25, 2017 at 1:23 PM, Simon Horman
>>>>> <simon.horman@netronome.com> wrote:
>>>>>> From: John Hurley <john.hurley@netronome.com>
>>>>>>
>>>>>> Compile ovs-tc flower vxlan metadata match fields for offloading. Only
>>>>>
>>>>> anything in the npf kernel bits has direct relation to ovs? what?
>>>>>
>>>>
>>>> Sorry, this is a typo  and should refer to TC.
>>>>
>>>>>> +++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>>>>>> @@ -52,8 +52,25 @@
>>>>>>          BIT(FLOW_DISSECTOR_KEY_PORTS) | \
>>>>>>          BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) | \
>>>>>>          BIT(FLOW_DISSECTOR_KEY_VLAN) | \
>>>>>> +        BIT(FLOW_DISSECTOR_KEY_ENC_KEYID) | \
>>>>>> +        BIT(FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) | \
>>>>>> +        BIT(FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS) | \
>>>>>
>>>>> this series takes care of IPv6 tunnels too?
>>>>
>>>> IPv6 is not included in this set.
>>>> The reason the IPv6 bit is included here is to account for behavior we
>>>> have noticed in TC flower.
>>>> If, for example, I add a filter with the following match fields:
>>>> 'protocol ip flower enc_src_ip 10.0.0.1 enc_dst_ip 10.0.0.2
>>>> enc_dst_port 4789 enc_key_id 123'
>>>> The 'used_keys' value in the dissector marks both IPv4 and IPv6 encap
>>>> addresses as 'used'.
>>>> I am not sure if this is a bug in TC or that we are expected to check
>>>> the enc_control fields to determine if IPv4 or v6 addresses are used.
>>>
>>> you should have your code to check enc_control->addr_type to be
>>> FLOW_DISSECTOR_KEY_IPV4_ADDRS or IPV6_ADDRS
>>>
>>>
>>>> Including the IPv6 used_keys bit in our whitelist approach allows us
>>>> to accept legitimate IPv4 tunnel rules in these situations.
>>>
>>> mmm can please take a look on fl_init_dissector() and tell me if you
>>> see why FLOW_DISSECTOR_KEY_IPV6_ADDRS is set for ipv4 tunnels,
>>> I am not sure.
>>
>>
>> The fl_init_dissector uses the FL_KEY_SET_IF_MASKED macro to set an
>> array of keys which are then translated to the used_keys values.
>> The FL_KEY_SET_IF_MASKED takes a 'struct fl_flow_key' as input and
>> checks if any mask bits are set in a particular field - if so it
>> eventually marks it as used.
>> In struct fl_flow_key, the encap ipv4 and ipv6 addresses are
>> represented as a union of the 2.
>> Therefore, if we have masked bits set for IPv4, they are also being
>> set for the IPv6 field.
>
> I see, do you consider it a bug?

The code seems to insist that, if either IPv4 or IPv6 is in use then a
control encap key is also used:

if (FL_KEY_IS_MASKED(&mask->key, enc_ipv4) ||
   FL_KEY_IS_MASKED(&mask->key, enc_ipv6))
FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_ENC_CONTROL,
  enc_control);

Therefore, I think it should be ok to use this to determine the IP
type in use by the tunnel.

^ permalink raw reply

* [PATCH v2] ebtables: fix race condition in frame_filter_net_init()
From: Artem Savkov @ 2017-09-26 15:39 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, netdev, linux-kernel, netfilter-devel,
	Artem Savkov
In-Reply-To: <20170926124211.GA14971@breakpoint.cc>

It is possible for ebt_in_hook to be triggered before ebt_table is assigned
resulting in a NULL-pointer dereference. Make sure hooks are
registered as the last step.

Fixes: aee12a0a3727 ebtables: remove nf_hook_register usage
Signed-off-by: Artem Savkov <asavkov@redhat.com>
---
 include/linux/netfilter_bridge/ebtables.h |  7 ++++---
 net/bridge/netfilter/ebtable_broute.c     |  4 ++--
 net/bridge/netfilter/ebtable_filter.c     |  4 ++--
 net/bridge/netfilter/ebtable_nat.c        |  4 ++--
 net/bridge/netfilter/ebtables.c           | 17 ++++++++---------
 5 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h
index 2c2a5514b0df..528b24c78308 100644
--- a/include/linux/netfilter_bridge/ebtables.h
+++ b/include/linux/netfilter_bridge/ebtables.h
@@ -108,9 +108,10 @@ struct ebt_table {
 
 #define EBT_ALIGN(s) (((s) + (__alignof__(struct _xt_align)-1)) & \
 		     ~(__alignof__(struct _xt_align)-1))
-extern struct ebt_table *ebt_register_table(struct net *net,
-					    const struct ebt_table *table,
-					    const struct nf_hook_ops *);
+extern int ebt_register_table(struct net *net,
+			      const struct ebt_table *table,
+			      const struct nf_hook_ops *ops,
+			      struct ebt_table **res);
 extern void ebt_unregister_table(struct net *net, struct ebt_table *table,
 				 const struct nf_hook_ops *);
 extern unsigned int ebt_do_table(struct sk_buff *skb,
diff --git a/net/bridge/netfilter/ebtable_broute.c b/net/bridge/netfilter/ebtable_broute.c
index 2585b100ebbb..276b60262981 100644
--- a/net/bridge/netfilter/ebtable_broute.c
+++ b/net/bridge/netfilter/ebtable_broute.c
@@ -65,8 +65,8 @@ static int ebt_broute(struct sk_buff *skb)
 
 static int __net_init broute_net_init(struct net *net)
 {
-	net->xt.broute_table = ebt_register_table(net, &broute_table, NULL);
-	return PTR_ERR_OR_ZERO(net->xt.broute_table);
+	return ebt_register_table(net, &broute_table, NULL,
+				  &net->xt.broute_table);
 }
 
 static void __net_exit broute_net_exit(struct net *net)
diff --git a/net/bridge/netfilter/ebtable_filter.c b/net/bridge/netfilter/ebtable_filter.c
index 45a00dbdbcad..c41da5fac84f 100644
--- a/net/bridge/netfilter/ebtable_filter.c
+++ b/net/bridge/netfilter/ebtable_filter.c
@@ -93,8 +93,8 @@ static const struct nf_hook_ops ebt_ops_filter[] = {
 
 static int __net_init frame_filter_net_init(struct net *net)
 {
-	net->xt.frame_filter = ebt_register_table(net, &frame_filter, ebt_ops_filter);
-	return PTR_ERR_OR_ZERO(net->xt.frame_filter);
+	return ebt_register_table(net, &frame_filter, ebt_ops_filter,
+				  &net->xt.frame_filter);
 }
 
 static void __net_exit frame_filter_net_exit(struct net *net)
diff --git a/net/bridge/netfilter/ebtable_nat.c b/net/bridge/netfilter/ebtable_nat.c
index 57cd5bb154e7..08df7406ecb3 100644
--- a/net/bridge/netfilter/ebtable_nat.c
+++ b/net/bridge/netfilter/ebtable_nat.c
@@ -93,8 +93,8 @@ static const struct nf_hook_ops ebt_ops_nat[] = {
 
 static int __net_init frame_nat_net_init(struct net *net)
 {
-	net->xt.frame_nat = ebt_register_table(net, &frame_nat, ebt_ops_nat);
-	return PTR_ERR_OR_ZERO(net->xt.frame_nat);
+	return ebt_register_table(net, &frame_nat, ebt_ops_nat,
+				  &net->xt.frame_nat);
 }
 
 static void __net_exit frame_nat_net_exit(struct net *net)
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 83951f978445..aa81afe81f23 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1169,9 +1169,8 @@ static void __ebt_unregister_table(struct net *net, struct ebt_table *table)
 	kfree(table);
 }
 
-struct ebt_table *
-ebt_register_table(struct net *net, const struct ebt_table *input_table,
-		   const struct nf_hook_ops *ops)
+int ebt_register_table(struct net *net, const struct ebt_table *input_table,
+		       const struct nf_hook_ops *ops, struct ebt_table **res)
 {
 	struct ebt_table_info *newinfo;
 	struct ebt_table *t, *table;
@@ -1183,7 +1182,7 @@ ebt_register_table(struct net *net, const struct ebt_table *input_table,
 	    repl->entries == NULL || repl->entries_size == 0 ||
 	    repl->counters != NULL || input_table->private != NULL) {
 		BUGPRINT("Bad table data for ebt_register_table!!!\n");
-		return ERR_PTR(-EINVAL);
+		return -EINVAL;
 	}
 
 	/* Don't add one table to multiple lists. */
@@ -1252,16 +1251,16 @@ ebt_register_table(struct net *net, const struct ebt_table *input_table,
 	list_add(&table->list, &net->xt.tables[NFPROTO_BRIDGE]);
 	mutex_unlock(&ebt_mutex);
 
-	if (!ops)
-		return table;
+	WRITE_ONCE(*res, table);
 
 	ret = nf_register_net_hooks(net, ops, hweight32(table->valid_hooks));
 	if (ret) {
 		__ebt_unregister_table(net, table);
-		return ERR_PTR(ret);
+		*res = NULL;
+		return ret;
 	}
 
-	return table;
+	return 0;
 free_unlock:
 	mutex_unlock(&ebt_mutex);
 free_chainstack:
@@ -1276,7 +1275,7 @@ ebt_register_table(struct net *net, const struct ebt_table *input_table,
 free_table:
 	kfree(table);
 out:
-	return ERR_PTR(ret);
+	return ret;
 }
 
 void ebt_unregister_table(struct net *net, struct ebt_table *table,
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next 2/2] tools: bpf: add bpftool
From: Jakub Kicinski @ 2017-09-26 15:35 UTC (permalink / raw)
  To: netdev
  Cc: daniel, alexei.starovoitov, davem, hannes, dsahern, oss-drivers,
	Jakub Kicinski
In-Reply-To: <20170926153522.31500-1-jakub.kicinski@netronome.com>

Add a simple tool for querying and updating BPF objects on the system.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 tools/bpf/Makefile             |  18 +-
 tools/bpf/bpftool/Makefile     |  80 +++++
 tools/bpf/bpftool/common.c     | 214 ++++++++++++
 tools/bpf/bpftool/jit_disasm.c |  83 +++++
 tools/bpf/bpftool/main.c       | 212 ++++++++++++
 tools/bpf/bpftool/main.h       |  99 ++++++
 tools/bpf/bpftool/map.c        | 742 +++++++++++++++++++++++++++++++++++++++++
 tools/bpf/bpftool/prog.c       | 392 ++++++++++++++++++++++
 8 files changed, 1837 insertions(+), 3 deletions(-)
 create mode 100644 tools/bpf/bpftool/Makefile
 create mode 100644 tools/bpf/bpftool/common.c
 create mode 100644 tools/bpf/bpftool/jit_disasm.c
 create mode 100644 tools/bpf/bpftool/main.c
 create mode 100644 tools/bpf/bpftool/main.h
 create mode 100644 tools/bpf/bpftool/map.c
 create mode 100644 tools/bpf/bpftool/prog.c

diff --git a/tools/bpf/Makefile b/tools/bpf/Makefile
index ddf888010652..325a35e1c28e 100644
--- a/tools/bpf/Makefile
+++ b/tools/bpf/Makefile
@@ -3,6 +3,7 @@ prefix = /usr
 CC = gcc
 LEX = flex
 YACC = bison
+MAKE = make
 
 CFLAGS += -Wall -O2
 CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
@@ -13,7 +14,7 @@ CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
 %.lex.c: %.l
 	$(LEX) -o $@ $<
 
-all : bpf_jit_disasm bpf_dbg bpf_asm
+all: bpf_jit_disasm bpf_dbg bpf_asm bpftool
 
 bpf_jit_disasm : CFLAGS += -DPACKAGE='bpf_jit_disasm'
 bpf_jit_disasm : LDLIBS = -lopcodes -lbfd -ldl
@@ -26,10 +27,21 @@ bpf_asm : LDLIBS =
 bpf_asm : bpf_asm.o bpf_exp.yacc.o bpf_exp.lex.o
 bpf_exp.lex.o : bpf_exp.yacc.c
 
-clean :
+clean: bpftool_clean
 	rm -rf *.o bpf_jit_disasm bpf_dbg bpf_asm bpf_exp.yacc.* bpf_exp.lex.*
 
-install :
+install: bpftool_install
 	install bpf_jit_disasm $(prefix)/bin/bpf_jit_disasm
 	install bpf_dbg $(prefix)/bin/bpf_dbg
 	install bpf_asm $(prefix)/bin/bpf_asm
+
+bpftool:
+	$(MAKE) -C bpftool
+
+bpftool_install:
+	$(MAKE) -C bpftool install
+
+bpftool_clean:
+	$(MAKE) -C bpftool clean
+
+.PHONY: bpftool FORCE
diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
new file mode 100644
index 000000000000..a7151f47fb40
--- /dev/null
+++ b/tools/bpf/bpftool/Makefile
@@ -0,0 +1,80 @@
+include ../../scripts/Makefile.include
+
+include ../../scripts/utilities.mak
+
+ifeq ($(srctree),)
+srctree := $(patsubst %/,%,$(dir $(CURDIR)))
+srctree := $(patsubst %/,%,$(dir $(srctree)))
+srctree := $(patsubst %/,%,$(dir $(srctree)))
+#$(info Determined 'srctree' to be $(srctree))
+endif
+
+ifneq ($(objtree),)
+#$(info Determined 'objtree' to be $(objtree))
+endif
+
+ifneq ($(OUTPUT),)
+#$(info Determined 'OUTPUT' to be $(OUTPUT))
+# Adding $(OUTPUT) as a directory to look for source files,
+# because use generated output files as sources dependency
+# for flex/bison parsers.
+VPATH += $(OUTPUT)
+export VPATH
+endif
+
+ifeq ($(V),1)
+  Q =
+else
+  Q = @
+endif
+
+BPF_DIR	= $(srctree)/tools/lib/bpf/
+
+ifneq ($(OUTPUT),)
+  BPF_PATH=$(OUTPUT)
+else
+  BPF_PATH=$(BPF_DIR)
+endif
+
+LIBBPF = $(BPF_PATH)libbpf.a
+
+$(LIBBPF): FORCE
+	$(Q)$(MAKE) -C $(BPF_DIR) OUTPUT=$(OUTPUT) $(OUTPUT)libbpf.a FEATURES_DUMP=$(FEATURE_DUMP_EXPORT)
+
+$(LIBBPF)-clean:
+	$(call QUIET_CLEAN, libbpf)
+	$(Q)$(MAKE) -C $(BPF_DIR) OUTPUT=$(OUTPUT) clean >/dev/null
+
+prefix = /usr
+
+CC = gcc
+
+CFLAGS += -O2
+CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow
+CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf
+LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
+
+include $(wildcard *.d)
+
+all: $(OUTPUT)bpftool
+
+SRCS=$(wildcard *.c)
+OBJS=$(patsubst %.c,$(OUTPUT)%.o,$(SRCS))
+
+$(OUTPUT)bpftool: $(OBJS) $(LIBBPF)
+	$(QUIET_LINK)$(CC) $(CFLAGS) -o $@ $^ $(LIBS)
+
+$(OUTPUT)%.o: %.c
+	$(QUIET_CC)$(COMPILE.c) -MMD -o $@ $<
+
+clean: $(LIBBPF)-clean
+	$(call QUIET_CLEAN, bpftool)
+	$(Q)rm -rf $(OUTPUT)bpftool $(OUTPUT)*.o $(OUTPUT)*.d
+
+install:
+	install $(OUTPUT)bpftool $(prefix)/sbin/bpftool
+
+FORCE:
+
+.PHONY: all clean FORCE
+.DEFAULT_GOAL := all
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
new file mode 100644
index 000000000000..db7bb966c844
--- /dev/null
+++ b/tools/bpf/bpftool/common.c
@@ -0,0 +1,214 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      1. Redistributions of source code must retain the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer.
+ *
+ *      2. Redistributions in binary form must reproduce the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer in the documentation and/or other materials
+ *         provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/* Author: Jakub Kicinski <kubakici@wp.pl> */
+
+#include <errno.h>
+#include <libgen.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <linux/limits.h>
+#include <linux/magic.h>
+#include <sys/types.h>
+#include <sys/vfs.h>
+
+#include <bpf.h>
+
+#include "main.h"
+
+static bool is_bpffs(char *path)
+{
+	struct statfs st_fs;
+
+	if (statfs(path, &st_fs) < 0)
+		return false;
+
+	return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
+}
+
+int open_obj_pinned_any(char *path, enum bpf_obj_type exp_type)
+{
+	enum bpf_obj_type type;
+	int fd;
+
+	fd = bpf_obj_get(path);
+	if (fd < 1) {
+		err("bpf obj get (%s): %s\n", path,
+		    errno == EACCES && !is_bpffs(dirname(path)) ?
+		    "directory not in bpf file system (bpffs)" :
+		    strerror(errno));
+		return -1;
+	}
+
+	type = get_fd_type(fd);
+	if (type < 0) {
+		close(fd);
+		return type;
+	}
+	if (type != exp_type) {
+		err("incorrect object type: %s\n", get_fd_type_name(type));
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
+
+int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32))
+{
+	unsigned int id;
+	char *endptr;
+	int err;
+	int fd;
+
+	if (!is_prefix(*argv, "id")) {
+		err("expected 'id' got %s\n", *argv);
+		return -1;
+	}
+	NEXT_ARG();
+
+	id = strtoul(*argv, &endptr, 0);
+	if (*endptr) {
+		err("can't parse %s as ID\n", *argv);
+		return -1;
+	}
+	NEXT_ARG();
+
+	if (argc != 1)
+		usage();
+
+	fd = get_fd_by_id(id);
+	if (fd < 1) {
+		err("can't get prog by id (%u): %s\n", id, strerror(errno));
+		return -1;
+	}
+
+	err = bpf_obj_pin(fd, *argv);
+	close(fd);
+	if (err) {
+		err("can't pin the object (%s): %s\n", *argv,
+		    errno == EACCES && !is_bpffs(dirname(*argv)) ?
+		    "directory not in bpf file system (bpffs)" :
+		    strerror(errno));
+		return -1;
+	}
+
+	return 0;
+}
+
+const char *get_fd_type_name(enum bpf_obj_type type)
+{
+	static const char * const names[] = {
+		[BPF_OBJ_PROG]	= "prog",
+		[BPF_OBJ_MAP]	= "map",
+	};
+
+	if (type > 0 && type < ARRAY_SIZE(names) && names[type])
+		return names[type];
+
+	return "unknown";
+}
+
+int get_fd_type(int fd)
+{
+	char path[PATH_MAX];
+	char buf[512];
+	ssize_t n;
+
+	snprintf(path, sizeof(path), "/proc/%d/fd/%d", getpid(), fd);
+
+	n = readlink(path, buf, sizeof(buf));
+	if (n < 0) {
+		err("can't read link type: %s\n", strerror(errno));
+		return -1;
+	}
+	if (n == sizeof(path)) {
+		err("can't read link type: path too long!\n");
+		return -1;
+	}
+
+	if (strstr(buf, "bpf-map"))
+		return BPF_OBJ_MAP;
+	else if (strstr(buf, "bpf-prog"))
+		return BPF_OBJ_PROG;
+
+	return BPF_OBJ_UNKNOWN;
+}
+
+char *get_fdinfo(int fd, const char *key)
+{
+	char path[PATH_MAX];
+	char *line = NULL;
+	size_t line_n = 0;
+	ssize_t n;
+	FILE *fdi;
+
+	snprintf(path, sizeof(path), "/proc/%d/fdinfo/%d", getpid(), fd);
+
+	fdi = fopen(path, "r");
+	if (!fdi) {
+		err("can't open fdinfo: %s\n", strerror(errno));
+		return NULL;
+	}
+
+	while ((n = getline(&line, &line_n, fdi))) {
+		char *value;
+		int len;
+
+		if (!strstr(line, key))
+			continue;
+
+		fclose(fdi);
+
+		value = strchr(line, '\t');
+		if (!value) {
+			err("malformed fdinfo!?\n");
+			free(line);
+			return NULL;
+		}
+		value++;
+
+		len = strlen(value);
+		memmove(line, value, len);
+		line[len - 1] = '\0';
+
+		return line;
+	}
+
+	err("key '%s' not found in fdinfo\n", key);
+	fclose(fdi);
+	return NULL;
+}
diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c
new file mode 100644
index 000000000000..e2bcfbf9b824
--- /dev/null
+++ b/tools/bpf/bpftool/jit_disasm.c
@@ -0,0 +1,83 @@
+/*
+ * Based on:
+ *
+ * Minimal BPF JIT image disassembler
+ *
+ * Disassembles BPF JIT compiler emitted opcodes back to asm insn's for
+ * debugging or verification purposes.
+ *
+ * Copyright 2013 Daniel Borkmann <daniel@iogearbox.net>
+ * Licensed under the GNU General Public License, version 2.0 (GPLv2)
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <assert.h>
+#include <unistd.h>
+#include <string.h>
+#include <bfd.h>
+#include <dis-asm.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+static void get_exec_path(char *tpath, size_t size)
+{
+	ssize_t len;
+	char *path;
+
+	snprintf(tpath, size, "/proc/%d/exe", (int) getpid());
+	tpath[size - 1] = 0;
+
+	path = strdup(tpath);
+	assert(path);
+
+	len = readlink(path, tpath, size);
+	tpath[len] = 0;
+
+	free(path);
+}
+
+void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes)
+{
+	disassembler_ftype disassemble;
+	struct disassemble_info info;
+	int count, i, pc = 0;
+	char tpath[256];
+	bfd *bfdf;
+
+	memset(tpath, 0, sizeof(tpath));
+	get_exec_path(tpath, sizeof(tpath));
+
+	bfdf = bfd_openr(tpath, NULL);
+	assert(bfdf);
+	assert(bfd_check_format(bfdf, bfd_object));
+
+	init_disassemble_info(&info, stdout, (fprintf_ftype) fprintf);
+	info.arch = bfd_get_arch(bfdf);
+	info.mach = bfd_get_mach(bfdf);
+	info.buffer = image;
+	info.buffer_length = len;
+
+	disassemble_init_for_target(&info);
+
+	disassemble = disassembler(bfdf);
+	assert(disassemble);
+
+	do {
+		printf("%4x:\t", pc);
+
+		count = disassemble(pc, &info);
+
+		if (opcodes) {
+			printf("\n\t");
+			for (i = 0; i < count; ++i)
+				printf("%02x ", (uint8_t) image[pc + i]);
+		}
+		printf("\n");
+
+		pc += count;
+	} while (count > 0 && pc < len);
+
+	bfd_close(bfdf);
+}
diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
new file mode 100644
index 000000000000..622be3b3a28a
--- /dev/null
+++ b/tools/bpf/bpftool/main.c
@@ -0,0 +1,212 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      1. Redistributions of source code must retain the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer.
+ *
+ *      2. Redistributions in binary form must reproduce the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer in the documentation and/or other materials
+ *         provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/* Author: Jakub Kicinski <kubakici@wp.pl> */
+
+#include <bfd.h>
+#include <ctype.h>
+#include <errno.h>
+#include <linux/bpf.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <bpf.h>
+
+#include "main.h"
+
+const char *bin_name;
+static int last_argc;
+static char **last_argv;
+static int (*last_do_help)(int argc, char **argv);
+
+void usage(void)
+{
+	last_do_help(last_argc - 1, last_argv + 1);
+
+	exit(-1);
+}
+
+static int do_help(int argc, char **argv)
+{
+	fprintf(stderr,
+		"Usage: %s OBJECT { COMMAND | help }\n"
+		"       %s batch file FILE\n"
+		"\n"
+		"       OBJECT := { prog | map }\n",
+		bin_name, bin_name);
+
+	return 0;
+}
+
+int cmd_select(const struct cmd *cmds, int argc, char **argv,
+	       int (*help)(int argc, char **argv))
+{
+	unsigned int i;
+
+	last_argc = argc;
+	last_argv = argv;
+	last_do_help = help;
+
+	if (argc < 1 && cmds[0].func)
+		return cmds[0].func(argc, argv);
+
+	for (i = 0; cmds[i].func; i++)
+		if (is_prefix(*argv, cmds[i].cmd))
+			return cmds[i].func(argc - 1, argv + 1);
+
+	help(argc - 1, argv + 1);
+
+	return -1;
+}
+
+bool is_prefix(const char *pfx, const char *str)
+{
+	if (!pfx)
+		return false;
+	if (strlen(str) < strlen(pfx))
+		return false;
+
+	return !memcmp(str, pfx, strlen(pfx));
+}
+
+void print_hex(void *arg, unsigned int n, const char *sep)
+{
+	unsigned char *data = arg;
+	unsigned int i;
+
+	for (i = 0; i < n; i++) {
+		const char *pfx = "";
+
+		if (!i)
+			/* nothing */;
+		else if (!(i % 16))
+			printf("\n");
+		else if (!(i % 8))
+			printf("  ");
+		else
+			pfx = sep;
+
+		printf("%s%02hhx", i ? pfx : "", data[i]);
+	}
+}
+
+static int do_batch(int argc, char **argv);
+
+static const struct cmd cmds[] = {
+	{ "help",	do_help },
+	{ "batch",	do_batch },
+	{ "prog",	do_prog },
+	{ "map",	do_map },
+	{ 0 }
+};
+
+static int do_batch(int argc, char **argv)
+{
+	unsigned int lines = 0;
+	char *n_argv[4096];
+	char buf[65536];
+	int n_argc;
+	char *ptr;
+	FILE *fp;
+	int err;
+
+	if (argc < 2) {
+		err("too few parameters for batch\n");
+		return -1;
+	} else if (!is_prefix(*argv, "file")) {
+		err("expected 'file', got: %s\n", *argv);
+		return -1;
+	} else if (argc > 2) {
+		err("too many parameters for batch\n");
+		return -1;
+	}
+	NEXT_ARG();
+
+	fp = fopen(*argv, "r");
+	if (!fp) {
+		err("Can't open file (%s): %s\n", *argv, strerror(errno));
+		return -1;
+	}
+
+	while (fgets(buf, sizeof(buf), fp)) {
+		if (strlen(buf) == sizeof(buf) - 1) {
+			errno = E2BIG;
+			break;
+		}
+
+		ptr = buf;
+		n_argc = 0;
+		while (*ptr) {
+			if (isspace(*ptr)) {
+				ptr++;
+				continue;
+			}
+			n_argv[n_argc++] = ptr;
+
+			ptr += strcspn(ptr, " \t\n");
+			*ptr++ = 0;
+		}
+
+		if (!n_argc)
+			continue;
+
+		err = cmd_select(cmds, n_argc, n_argv, do_help);
+		if (err)
+			goto err_close;
+
+		lines++;
+	}
+
+	if (errno && errno != ENOENT) {
+		perror("reading batch file failed");
+		err = -1;
+	} else {
+		info("processed %d lines\n", lines);
+		err = 0;
+	}
+err_close:
+	fclose(fp);
+
+	return err;
+}
+
+int main(int argc, char **argv)
+{
+	bin_name = argv[0];
+	NEXT_ARG();
+
+	bfd_init();
+
+	return cmd_select(cmds, argc, argv, do_help);
+}
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
new file mode 100644
index 000000000000..85d2d7870a58
--- /dev/null
+++ b/tools/bpf/bpftool/main.h
@@ -0,0 +1,99 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      1. Redistributions of source code must retain the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer.
+ *
+ *      2. Redistributions in binary form must reproduce the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer in the documentation and/or other materials
+ *         provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/* Author: Jakub Kicinski <kubakici@wp.pl> */
+
+#ifndef __BPF_TOOL_H
+#define __BPF_TOOL_H
+
+#include <stdbool.h>
+#include <stdio.h>
+#include <linux/bpf.h>
+
+#define ARRAY_SIZE(a)	(sizeof(a) / sizeof(a[0]))
+
+#define err(msg...)	fprintf(stderr, "Error: " msg)
+#define warn(msg...)	fprintf(stderr, "Warning: " msg)
+#define info(msg...)	fprintf(stderr, msg)
+
+#define ptr_to_u64(ptr)	((__u64)(unsigned long)(ptr))
+
+#define min(a, b)							\
+	({ typeof(a) _a = (a); typeof(b) _b = (b); _a > _b ? _b : _a; })
+#define max(a, b)							\
+	({ typeof(a) _a = (a); typeof(b) _b = (b); _a < _b ? _b : _a; })
+
+#define NEXT_ARG()	({ argc--; argv++; if (argc < 0) usage(); })
+#define NEXT_ARGP()	({ (*argc)--; (*argv)++; if (*argc < 0) usage(); })
+#define BAD_ARG()	({ err("what is '%s'?\n", *argv); -1; })
+
+#define BPF_TAG_FMT	"%02hhx:%02hhx:%02hhx:%02hhx:"	\
+			"%02hhx:%02hhx:%02hhx:%02hhx"
+
+#define HELP_SPEC_PROGRAM						\
+	"PROG := { id PROG_ID | pinned FILE | tag PROG_TAG }"
+
+enum bpf_obj_type {
+	BPF_OBJ_UNKNOWN,
+	BPF_OBJ_PROG,
+	BPF_OBJ_MAP,
+};
+
+extern const char *bin_name;
+
+bool is_prefix(const char *pfx, const char *str);
+void print_hex(void *arg, unsigned int n, const char *sep);
+void usage(void) __attribute__((noreturn));
+
+struct cmd {
+	const char *cmd;
+	int (*func)(int argc, char **argv);
+};
+
+int cmd_select(const struct cmd *cmds, int argc, char **argv,
+	       int (*help)(int argc, char **argv));
+
+int get_fd_type(int fd);
+const char *get_fd_type_name(enum bpf_obj_type type);
+char *get_fdinfo(int fd, const char *key);
+int open_obj_pinned_any(char *path, enum bpf_obj_type exp_type);
+int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32));
+
+int do_prog(int argc, char **arg);
+int do_map(int argc, char **arg);
+
+int prog_parse_fd(int *argc, char ***argv);
+
+void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes);
+
+#endif
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
new file mode 100644
index 000000000000..db46986fef73
--- /dev/null
+++ b/tools/bpf/bpftool/map.c
@@ -0,0 +1,742 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      1. Redistributions of source code must retain the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer.
+ *
+ *      2. Redistributions in binary form must reproduce the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer in the documentation and/or other materials
+ *         provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/* Author: Jakub Kicinski <kubakici@wp.pl> */
+
+#include <ctype.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include <bpf.h>
+
+#include "main.h"
+
+static const char * const map_type_name[] = {
+	[BPF_MAP_TYPE_UNSPEC]		= "unspec",
+	[BPF_MAP_TYPE_HASH]		= "hash",
+	[BPF_MAP_TYPE_ARRAY]		= "array",
+	[BPF_MAP_TYPE_PROG_ARRAY]	= "prog_array",
+	[BPF_MAP_TYPE_PERF_EVENT_ARRAY]	= "perf_event_array",
+	[BPF_MAP_TYPE_PERCPU_HASH]	= "percpu_hash",
+	[BPF_MAP_TYPE_PERCPU_ARRAY]	= "percpu_array",
+	[BPF_MAP_TYPE_STACK_TRACE]	= "stack_trace",
+	[BPF_MAP_TYPE_CGROUP_ARRAY]	= "cgroup_array",
+	[BPF_MAP_TYPE_LRU_HASH]		= "lru_hash",
+	[BPF_MAP_TYPE_LRU_PERCPU_HASH]	= "lru_percpu_hash",
+	[BPF_MAP_TYPE_LPM_TRIE]		= "lpm_trie",
+	[BPF_MAP_TYPE_ARRAY_OF_MAPS]	= "array_of_maps",
+	[BPF_MAP_TYPE_HASH_OF_MAPS]	= "hash_of_maps",
+	[BPF_MAP_TYPE_DEVMAP]		= "devmap",
+	[BPF_MAP_TYPE_SOCKMAP]		= "sockmap",
+};
+
+static unsigned int get_possible_cpus(void)
+{
+	static unsigned int result;
+	char buf[128];
+	long int n;
+	char *ptr;
+	int fd;
+
+	if (result)
+		return result;
+
+	fd = open("/sys/devices/system/cpu/possible", O_RDONLY);
+	if (fd < 1) {
+		err("can't open sysfs possible cpus\n");
+		exit(-1);
+	}
+
+	n = read(fd, buf, sizeof(buf));
+	if (n < 2) {
+		err("can't read sysfs possible cpus\n");
+		exit(-1);
+	}
+	close(fd);
+
+	if (n == sizeof(buf)) {
+		err("read sysfs possible cpus overflow\n");
+		exit(-1);
+	}
+
+	ptr = buf;
+	n = 0;
+	while (*ptr && *ptr != '\n') {
+		unsigned int a, b;
+
+		if (sscanf(ptr, "%u-%u", &a, &b) == 2) {
+			n += b - a + 1;
+
+			ptr = strchr(ptr, '-') + 1;
+		} else if (sscanf(ptr, "%u", &a) == 1) {
+			n++;
+		}
+
+		while (isdigit(*ptr))
+			ptr++;
+		if (*ptr == ',')
+			ptr++;
+	}
+
+	result = n;
+
+	return result;
+}
+
+static bool map_is_per_cpu(__u32 type)
+{
+	return type == BPF_MAP_TYPE_PERCPU_HASH ||
+	       type == BPF_MAP_TYPE_PERCPU_ARRAY ||
+	       type == BPF_MAP_TYPE_LRU_PERCPU_HASH;
+}
+
+static bool map_is_map_of_maps(__u32 type)
+{
+	return type == BPF_MAP_TYPE_ARRAY_OF_MAPS ||
+	       type == BPF_MAP_TYPE_HASH_OF_MAPS;
+}
+
+static bool map_is_map_of_progs(__u32 type)
+{
+	return type == BPF_MAP_TYPE_PROG_ARRAY;
+}
+
+static void *alloc_value(struct bpf_map_info *info)
+{
+	if (map_is_per_cpu(info->type))
+		return malloc(info->value_size * get_possible_cpus());
+	else
+		return malloc(info->value_size);
+}
+
+static int map_parse_fd(int *argc, char ***argv)
+{
+	int fd;
+
+	if (is_prefix(**argv, "id")) {
+		unsigned int id;
+		char *endptr;
+
+		NEXT_ARGP();
+
+		id = strtoul(**argv, &endptr, 0);
+		if (*endptr) {
+			err("can't parse %s as ID\n", **argv);
+			return -1;
+		}
+		NEXT_ARGP();
+
+		fd = bpf_map_get_fd_by_id(id);
+		if (fd < 1)
+			err("get map by id (%u): %s\n", id, strerror(errno));
+		return fd;
+	} else if (is_prefix(**argv, "pinned")) {
+		char *path;
+
+		NEXT_ARGP();
+
+		path = **argv;
+		NEXT_ARGP();
+
+		return open_obj_pinned_any(path, BPF_OBJ_MAP);
+	}
+
+	err("expected 'id' or 'pinned', got: '%s'?\n", **argv);
+	return -1;
+}
+
+static int
+map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 *info_len)
+{
+	int err;
+	int fd;
+
+	fd = map_parse_fd(argc, argv);
+	if (fd < 1)
+		return -1;
+
+	err = bpf_obj_get_info_by_fd(fd, info, info_len);
+	if (err) {
+		err("can't get map info: %s\n", strerror(errno));
+		close(fd);
+		return err;
+	}
+
+	return fd;
+}
+
+static void print_entry(struct bpf_map_info *info, unsigned char *key,
+			unsigned char *value)
+{
+	unsigned int i, n;
+
+	if (!map_is_per_cpu(info->type)) {
+		/* Single line print */
+		if (info->key_size + info->value_size <= 24 &&
+		    max(info->key_size, info->value_size) <= 16) {
+			printf("key: ");
+			print_hex(key, info->key_size, " ");
+			printf("  value: ");
+			print_hex(value, info->value_size, " ");
+			printf("\n");
+			return;
+		}
+
+		printf("key: ");
+		print_hex(key, info->key_size, " ");
+		printf("\nvalue: ");
+		print_hex(value, info->value_size, " ");
+		printf("\n");
+
+		return;
+	}
+
+	n = get_possible_cpus();
+
+	printf("key:\n");
+	print_hex(key, info->key_size, " ");
+	printf("\n");
+	for (i = 0; i < n; i++) {
+		printf("value (CPU %02d):%c",
+		       i, info->value_size > 16 ? '\n' : ' ');
+		print_hex(value + i * info->value_size, info->value_size, " ");
+		printf("\n");
+	}
+}
+
+static char **parse_val(char **argv, const char *name, unsigned char *val,
+			unsigned int n)
+{
+	unsigned int i = 0;
+	char *endptr;
+
+	while (i < n && argv[i]) {
+		val[i] = strtoul(argv[i], &endptr, 0);
+		if (*endptr) {
+			err("error parsing byte: %s\n", argv[i]);
+			break;
+		}
+		i++;
+	}
+
+	if (i != n) {
+		err("%s expected %d bytes got %d\n", name, n, i);
+		return NULL;
+	}
+
+	return argv + i;
+}
+
+static int parse_elem(char **argv, struct bpf_map_info *info,
+		      void *key, void *value, __u32 key_size, __u32 value_size,
+		      __u32 *flags, __u32 **value_fd)
+{
+	if (!*argv) {
+		if (!key && !value)
+			return 0;
+		err("did not find %s\n", key ? "key" : "value");
+		return -1;
+	}
+
+	if (is_prefix(*argv, "key")) {
+		if (!key) {
+			if (key_size)
+				err("duplicate key\n");
+			else
+				err("unnecessary key\n");
+			return -1;
+		}
+
+		argv = parse_val(argv + 1, "key", key, key_size);
+		if (!argv)
+			return -1;
+
+		return parse_elem(argv, info, NULL, value, key_size, value_size,
+				  flags, value_fd);
+	} else if (is_prefix(*argv, "value")) {
+		int fd;
+
+		if (!value) {
+			if (value_size)
+				err("duplicate value\n");
+			else
+				err("unnecessary value\n");
+			return -1;
+		}
+
+		argv++;
+
+		if (map_is_map_of_maps(info->type)) {
+			int argc = 2;
+
+			if (value_size != 4) {
+				err("value smaller than 4B for map in map?\n");
+				return -1;
+			}
+			if (!argv[0] || !argv[1]) {
+				err("not enough value arguments for map in map\n");
+				return -1;
+			}
+
+			fd = map_parse_fd(&argc, &argv);
+			if (fd < 1)
+				return -1;
+
+			*value_fd = value;
+			**value_fd = fd;
+		} else if (map_is_map_of_progs(info->type)) {
+			int argc = 2;
+
+			if (value_size != 4) {
+				err("value smaller than 4B for map of progs?\n");
+				return -1;
+			}
+			if (!argv[0] || !argv[1]) {
+				err("not enough value arguments for map of progs\n");
+				return -1;
+			}
+
+			fd = prog_parse_fd(&argc, &argv);
+			if (fd < 1)
+				return -1;
+
+			*value_fd = value;
+			**value_fd = fd;
+		} else {
+			argv = parse_val(argv, "value", value, value_size);
+			if (!argv)
+				return -1;
+		}
+
+		return parse_elem(argv, info, key, NULL, key_size, value_size,
+				  flags, NULL);
+	} else if (is_prefix(*argv, "any") || is_prefix(*argv, "noexist") ||
+		   is_prefix(*argv, "exist")) {
+		if (!flags) {
+			err("flags specified multiple times: %s\n", *argv);
+			return -1;
+		}
+
+		if (is_prefix(*argv, "any"))
+			*flags = BPF_ANY;
+		else if (is_prefix(*argv, "noexist"))
+			*flags = BPF_NOEXIST;
+		else if (is_prefix(*argv, "exist"))
+			*flags = BPF_EXIST;
+
+		return parse_elem(argv + 1, info, key, value, key_size,
+				  value_size, NULL, value_fd);
+	}
+
+	err("expected key or value, got: %s\n", *argv);
+	return -1;
+}
+
+static int show_map_close(int fd, struct bpf_map_info *info)
+{
+	char *memlock;
+
+	memlock = get_fdinfo(fd, "memlock");
+	close(fd);
+
+	printf("   %u: ", info->id);
+	if (info->type < ARRAY_SIZE(map_type_name))
+		printf("%s  ", map_type_name[info->type]);
+	else
+		printf("type:%u  ", info->type);
+
+	printf("flags:0x%x  key:%uB  value:%uB  max_entries:%u",
+	       info->map_flags, info->key_size, info->value_size,
+	       info->max_entries);
+
+	if (memlock)
+		printf("  memlock:%sB", memlock);
+	free(memlock);
+
+	printf("\n");
+
+	return 0;
+}
+
+static int do_show(int argc, char **argv)
+{
+	struct bpf_map_info info = {};
+	__u32 len = sizeof(info);
+	__u32 id = 0;
+	int err;
+	int fd;
+
+	if (argc == 2) {
+		fd = map_parse_fd_and_info(&argc, &argv, &info, &len);
+		if (fd < 0)
+			return -1;
+
+		return show_map_close(fd, &info);
+	}
+
+	if (argc)
+		return BAD_ARG();
+
+	while (true) {
+		err = bpf_map_get_next_id(id, &id);
+		if (err) {
+			if (errno == ENOENT)
+				break;
+			err("can't get next map: %s\n", strerror(errno));
+			if (errno == EINVAL)
+				err("kernel too old?\n");
+			return -1;
+		}
+
+		fd = bpf_map_get_fd_by_id(id);
+		if (fd < 1) {
+			err("can't get map by id (%u): %s\n",
+			    id, strerror(errno));
+			return -1;
+		}
+
+		err = bpf_obj_get_info_by_fd(fd, &info, &len);
+		if (err) {
+			err("can't get map info: %s\n", strerror(errno));
+			close(fd);
+			return -1;
+		}
+
+		show_map_close(fd, &info);
+	}
+
+	return errno == ENOENT ? 0 : -1;
+}
+
+static int do_dump(int argc, char **argv)
+{
+	void *key, *value, *prev_key;
+	unsigned int num_elems = 0;
+	struct bpf_map_info info = {};
+	__u32 len = sizeof(info);
+	int err;
+	int fd;
+
+	if (argc != 2)
+		usage();
+
+	fd = map_parse_fd_and_info(&argc, &argv, &info, &len);
+	if (fd < 0)
+		return -1;
+
+	if (map_is_map_of_maps(info.type) || map_is_map_of_progs(info.type)) {
+		err("Dumping maps of maps and program maps not supported\n");
+		close(fd);
+		return -1;
+	}
+
+	key = malloc(info.key_size);
+	value = alloc_value(&info);
+	if (!key || !value) {
+		err("mem alloc failed\n");
+		err = -1;
+		goto exit_free;
+	}
+
+	prev_key = NULL;
+	while (true) {
+		err = bpf_map_get_next_key(fd, prev_key, key);
+		if (err) {
+			if (errno == ENOENT)
+				err = 0;
+			break;
+		}
+
+		err = bpf_map_lookup_elem(fd, key, value);
+		if (err) {
+			info("can't lookup element with key: ");
+			print_hex(key, info.key_size, " ");
+			printf("\n");
+			goto next_key;
+		}
+
+		print_entry(&info, key, value);
+next_key:
+		prev_key = key;
+		num_elems++;
+	}
+
+	printf("Found %u element%s\n", num_elems, num_elems != 1 ? "s" : "");
+
+exit_free:
+	free(key);
+	free(value);
+	close(fd);
+
+	return err;
+}
+
+static int do_update(int argc, char **argv)
+{
+	struct bpf_map_info info = {};
+	__u32 len = sizeof(info);
+	__u32 *value_fd = NULL;
+	__u32 flags = BPF_ANY;
+	void *key, *value;
+	int fd, err;
+
+	if (argc < 2)
+		usage();
+
+	fd = map_parse_fd_and_info(&argc, &argv, &info, &len);
+	if (fd < 0)
+		return -1;
+
+	key = malloc(info.key_size);
+	value = alloc_value(&info);
+	if (!key || !value) {
+		err("mem alloc failed");
+		err = -1;
+		goto exit_free;
+	}
+
+	err = parse_elem(argv, &info, key, value, info.key_size,
+			 info.value_size, &flags, &value_fd);
+	if (err)
+		goto exit_free;
+
+	err = bpf_map_update_elem(fd, key, value, flags);
+	if (err) {
+		err("update failed: %s\n", strerror(errno));
+		goto exit_free;
+	}
+
+exit_free:
+	if (value_fd)
+		close(*value_fd);
+	free(key);
+	free(value);
+	close(fd);
+
+	return err;
+}
+
+static int do_lookup(int argc, char **argv)
+{
+	struct bpf_map_info info = {};
+	__u32 len = sizeof(info);
+	void *key, *value;
+	int err;
+	int fd;
+
+	if (argc < 2)
+		usage();
+
+	fd = map_parse_fd_and_info(&argc, &argv, &info, &len);
+	if (fd < 0)
+		return -1;
+
+	key = malloc(info.key_size);
+	value = alloc_value(&info);
+	if (!key || !value) {
+		err("mem alloc failed");
+		err = -1;
+		goto exit_free;
+	}
+
+	err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, NULL);
+	if (err)
+		goto exit_free;
+
+	err = bpf_map_lookup_elem(fd, key, value);
+	if (!err) {
+		print_entry(&info, key, value);
+	} else if (errno == ENOENT) {
+		printf("key:\n");
+		print_hex(key, info.key_size, " ");
+		printf("\n\nNot found\n");
+	} else {
+		err("lookup failed: %s\n", strerror(errno));
+	}
+
+exit_free:
+	free(key);
+	free(value);
+	close(fd);
+
+	return err;
+}
+
+static int do_getnext(int argc, char **argv)
+{
+	struct bpf_map_info info = {};
+	__u32 len = sizeof(info);
+	void *key, *nextkey;
+	int err;
+	int fd;
+
+	if (argc < 2)
+		usage();
+
+	fd = map_parse_fd_and_info(&argc, &argv, &info, &len);
+	if (fd < 0)
+		return -1;
+
+	key = malloc(info.key_size);
+	nextkey = malloc(info.key_size);
+	if (!key || !nextkey) {
+		err("mem alloc failed");
+		err = -1;
+		goto exit_free;
+	}
+
+	if (argc) {
+		err = parse_elem(argv, &info, key, NULL, info.key_size, 0,
+				 NULL, NULL);
+		if (err)
+			goto exit_free;
+	} else {
+		free(key);
+		key = NULL;
+	}
+
+	err = bpf_map_get_next_key(fd, key, nextkey);
+	if (err) {
+		err("can't get next key: %s\n", strerror(errno));
+		goto exit_free;
+	}
+
+	if (key) {
+		printf("key:\n");
+		print_hex(key, info.key_size, " ");
+		printf("\n");
+	} else {
+		printf("key: None\n");
+	}
+
+	printf("next key:\n");
+	print_hex(nextkey, info.key_size, " ");
+	printf("\n");
+
+exit_free:
+	free(nextkey);
+	free(key);
+	close(fd);
+
+	return err;
+}
+
+static int do_delete(int argc, char **argv)
+{
+	struct bpf_map_info info = {};
+	__u32 len = sizeof(info);
+	void *key;
+	int err;
+	int fd;
+
+	if (argc < 2)
+		usage();
+
+	fd = map_parse_fd_and_info(&argc, &argv, &info, &len);
+	if (fd < 0)
+		return -1;
+
+	key = malloc(info.key_size);
+	if (!key) {
+		err("mem alloc failed");
+		err = -1;
+		goto exit_free;
+	}
+
+	err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, NULL);
+	if (err)
+		goto exit_free;
+
+	err = bpf_map_delete_elem(fd, key);
+	if (err)
+		err("delete failed: %s\n", strerror(errno));
+
+exit_free:
+	free(key);
+	close(fd);
+
+	return err;
+}
+
+static int do_pin(int argc, char **argv)
+{
+	return do_pin_any(argc, argv, bpf_map_get_fd_by_id);
+}
+
+static int do_help(int argc, char **argv)
+{
+	fprintf(stderr,
+		"Usage: %s %s show   [MAP]\n"
+		"       %s %s dump    MAP\n"
+		"       %s %s update  MAP  key BYTES value VALUE [UPDATE_FLAGS]\n"
+		"       %s %s lookup  MAP  key BYTES\n"
+		"       %s %s getnext MAP [key BYTES]\n"
+		"       %s %s delete  MAP  key BYTES\n"
+		"       %s %s pin     MAP  FILE\n"
+		"       %s %s help\n"
+		"\n"
+		"       MAP := { id MAP_ID | pinned FILE }\n"
+		"       " HELP_SPEC_PROGRAM "\n"
+		"       VALUE := { BYTES | MAP | PROG }\n"
+		"       UPDATE_FLAGS := { any | exist | noexist }\n"
+		"",
+		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
+		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
+		bin_name, argv[-2], bin_name, argv[-2]);
+
+	return 0;
+}
+
+static const struct cmd cmds[] = {
+	{ "show",	do_show },
+	{ "help",	do_help },
+	{ "dump",	do_dump },
+	{ "update",	do_update },
+	{ "lookup",	do_lookup },
+	{ "getnext",	do_getnext },
+	{ "delete",	do_delete },
+	{ "pin",	do_pin },
+	{ 0 }
+};
+
+int do_map(int argc, char **argv)
+{
+	return cmd_select(cmds, argc, argv, do_help);
+}
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
new file mode 100644
index 000000000000..3129159c593e
--- /dev/null
+++ b/tools/bpf/bpftool/prog.c
@@ -0,0 +1,392 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      1. Redistributions of source code must retain the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer.
+ *
+ *      2. Redistributions in binary form must reproduce the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer in the documentation and/or other materials
+ *         provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/* Author: Jakub Kicinski <kubakici@wp.pl> */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include <bpf.h>
+
+#include "main.h"
+
+static const char * const prog_type_name[] = {
+	[BPF_PROG_TYPE_UNSPEC]		= "unspec",
+	[BPF_PROG_TYPE_SOCKET_FILTER]	= "socket_filter",
+	[BPF_PROG_TYPE_KPROBE]		= "kprobe",
+	[BPF_PROG_TYPE_SCHED_CLS]	= "sched_cls",
+	[BPF_PROG_TYPE_SCHED_ACT]	= "sched_act",
+	[BPF_PROG_TYPE_TRACEPOINT]	= "tracepoint",
+	[BPF_PROG_TYPE_XDP]		= "xdp",
+	[BPF_PROG_TYPE_PERF_EVENT]	= "perf_event",
+	[BPF_PROG_TYPE_CGROUP_SKB]	= "cgroup_skb",
+	[BPF_PROG_TYPE_CGROUP_SOCK]	= "cgroup_sock",
+	[BPF_PROG_TYPE_LWT_IN]		= "lwt_in",
+	[BPF_PROG_TYPE_LWT_OUT]		= "lwt_out",
+	[BPF_PROG_TYPE_LWT_XMIT]	= "lwt_xmit",
+	[BPF_PROG_TYPE_SOCK_OPS]	= "sock_ops",
+	[BPF_PROG_TYPE_SK_SKB]		= "sk_skb",
+};
+
+static int prog_fd_by_tag(unsigned char *tag)
+{
+	struct bpf_prog_info info = {};
+	__u32 len = sizeof(info);
+	unsigned int id = 0;
+	int err;
+	int fd;
+
+	while (true) {
+		err = bpf_prog_get_next_id(id, &id);
+		if (err) {
+			err("%s\n", strerror(errno));
+			return -1;
+		}
+
+		fd = bpf_prog_get_fd_by_id(id);
+		if (fd < 1) {
+			err("can't get prog by id (%u): %s\n",
+			    id, strerror(errno));
+			return -1;
+		}
+
+		err = bpf_obj_get_info_by_fd(fd, &info, &len);
+		if (err) {
+			err("can't get prog info (%u): %s\n",
+			    id, strerror(errno));
+			close(fd);
+			return -1;
+		}
+
+		if (!memcmp(tag, info.tag, BPF_TAG_SIZE))
+			return fd;
+
+		close(fd);
+	}
+}
+
+int prog_parse_fd(int *argc, char ***argv)
+{
+	int fd;
+
+	if (is_prefix(**argv, "id")) {
+		unsigned int id;
+		char *endptr;
+
+		NEXT_ARGP();
+
+		id = strtoul(**argv, &endptr, 0);
+		if (*endptr) {
+			err("can't parse %s as ID\n", **argv);
+			return -1;
+		}
+		NEXT_ARGP();
+
+		fd = bpf_prog_get_fd_by_id(id);
+		if (fd < 1)
+			err("get by id (%u): %s\n", id, strerror(errno));
+		return fd;
+	} else if (is_prefix(**argv, "tag")) {
+		unsigned char tag[BPF_TAG_SIZE];
+
+		NEXT_ARGP();
+
+		if (sscanf(**argv, BPF_TAG_FMT, tag, tag + 1, tag + 2,
+			   tag + 3, tag + 4, tag + 5, tag + 6, tag + 7)
+		    != BPF_TAG_SIZE) {
+			err("can't parse tag\n");
+			return -1;
+		}
+		NEXT_ARGP();
+
+		return prog_fd_by_tag(tag);
+	} else if (is_prefix(**argv, "pinned")) {
+		char *path;
+
+		NEXT_ARGP();
+
+		path = **argv;
+		NEXT_ARGP();
+
+		return open_obj_pinned_any(path, BPF_OBJ_PROG);
+	}
+
+	err("expected 'id', 'tag' or 'pinned', got: '%s'?\n", **argv);
+	return -1;
+}
+
+static int show_prog(int fd)
+{
+	struct bpf_prog_info info = {};
+	__u32 len = sizeof(info);
+	char *memlock;
+	int err;
+
+	err = bpf_obj_get_info_by_fd(fd, &info, &len);
+	if (err) {
+		err("can't get prog info: %s\n", strerror(errno));
+		return -1;
+	}
+
+	printf("   %u: ", info.id);
+	if (info.type < ARRAY_SIZE(prog_type_name))
+		printf("%s  ", prog_type_name[info.type]);
+	else
+		printf("type:%u  ", info.type);
+
+	printf("tag ");
+	print_hex(info.tag, BPF_TAG_SIZE, ":");
+
+	printf("  xlated:%uB", info.xlated_prog_len);
+
+	if (info.jited_prog_len)
+		printf("  jited:%uB", info.jited_prog_len);
+	else
+		printf("  not jited");
+
+	memlock = get_fdinfo(fd, "memlock");
+	if (memlock)
+		printf("  memlock:%sB", memlock);
+	free(memlock);
+
+	printf("\n");
+
+	return 0;
+}
+
+static int do_show(int argc, char **argv)
+{	__u32 id = 0;
+	int err;
+	int fd;
+
+	if (argc == 2) {
+		fd = prog_parse_fd(&argc, &argv);
+		if (fd < 1)
+			return -1;
+
+		return show_prog(fd);
+	}
+
+	if (argc)
+		return BAD_ARG();
+
+	while (true) {
+		err = bpf_prog_get_next_id(id, &id);
+		if (err) {
+			if (errno == ENOENT)
+				break;
+			err("can't get next program: %s\n", strerror(errno));
+			if (errno == EINVAL)
+				err("kernel too old?\n");
+			return -1;
+		}
+
+		fd = bpf_prog_get_fd_by_id(id);
+		if (fd < 1) {
+			err("can't get prog by id (%u): %s\n",
+			    id, strerror(errno));
+			return -1;
+		}
+
+		err = show_prog(fd);
+		close(fd);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int do_dump(int argc, char **argv)
+{
+	struct bpf_prog_info info = {};
+	__u32 len = sizeof(info);
+	bool can_disasm = false;
+	unsigned int buf_size;
+	char *filepath = NULL;
+	bool opcodes = false;
+	unsigned char *buf;
+	__u32 *member_len;
+	__u64 *member_ptr;
+	ssize_t n;
+	int err;
+	int fd;
+
+	if (is_prefix(*argv, "jited")) {
+		member_len = &info.jited_prog_len;
+		member_ptr = &info.jited_prog_insns;
+		can_disasm = true;
+	} else if (is_prefix(*argv, "xlated")) {
+		member_len = &info.xlated_prog_len;
+		member_ptr = &info.xlated_prog_insns;
+	} else {
+		err("expected 'xlated' or 'jited', got: %s\n", *argv);
+		return -1;
+	}
+	NEXT_ARG();
+
+	if (argc < 2)
+		usage();
+
+	fd = prog_parse_fd(&argc, &argv);
+	if (fd < 0)
+		return -1;
+
+	if (is_prefix(*argv, "file")) {
+		NEXT_ARG();
+		if (!argc) {
+			err("expected file path\n");
+			return -1;
+		}
+
+		filepath = *argv;
+		NEXT_ARG();
+	} else if (is_prefix(*argv, "opcodes")) {
+		opcodes = true;
+		NEXT_ARG();
+	}
+
+	if (!filepath && !can_disasm) {
+		err("expected 'file' got %s\n", *argv);
+		return -1;
+	}
+	if (argc) {
+		usage();
+		return -1;
+	}
+
+	err = bpf_obj_get_info_by_fd(fd, &info, &len);
+	if (err) {
+		err("can't get prog info: %s\n", strerror(errno));
+		return -1;
+	}
+
+	if (!*member_len) {
+		info("no instructions returned\n");
+		close(fd);
+		return 0;
+	}
+
+	buf_size = *member_len;
+
+	buf = malloc(buf_size);
+	if (!buf) {
+		err("mem alloc failed\n");
+		close(fd);
+		return -1;
+	}
+
+	memset(&info, 0, sizeof(info));
+
+	*member_ptr = ptr_to_u64(buf);
+	*member_len = buf_size;
+
+	err = bpf_obj_get_info_by_fd(fd, &info, &len);
+	close(fd);
+	if (err) {
+		err("can't get prog info: %s\n", strerror(errno));
+		goto err_free;
+	}
+
+	if (*member_len > buf_size) {
+		info("too many instructions returned\n");
+		goto err_free;
+	}
+
+	if (filepath) {
+		fd = open(filepath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
+		if (fd < 1) {
+			err("can't open file %s: %s\n", filepath,
+			    strerror(errno));
+			goto err_free;
+		}
+
+		n = write(fd, buf, *member_len);
+		close(fd);
+		if (n != *member_len) {
+			err("error writing output file: %s\n",
+			    n < 0 ? strerror(errno) : "short write");
+			goto err_free;
+		}
+	} else {
+		disasm_print_insn(buf, *member_len, opcodes);
+	}
+
+	free(buf);
+
+	return 0;
+
+err_free:
+	free(buf);
+	return -1;
+}
+
+static int do_pin(int argc, char **argv)
+{
+	return do_pin_any(argc, argv, bpf_prog_get_fd_by_id);
+}
+
+static int do_help(int argc, char **argv)
+{
+	fprintf(stderr,
+		"Usage: %s %s show [PROG]\n"
+		"       %s %s dump xlated PROG  file FILE\n"
+		"       %s %s dump jited  PROG [file FILE] [opcodes]\n"
+		"       %s %s pin   PROG FILE\n"
+		"       %s %s help\n"
+		"\n"
+		"       " HELP_SPEC_PROGRAM "\n"
+		"",
+		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
+		bin_name, argv[-2], bin_name, argv[-2]);
+
+	return 0;
+}
+
+static const struct cmd cmds[] = {
+	{ "show",	do_show },
+	{ "dump",	do_dump },
+	{ "pin",	do_pin },
+	{ 0 }
+};
+
+int do_prog(int argc, char **argv)
+{
+	return cmd_select(cmds, argc, argv, do_help);
+}
-- 
2.14.1

^ permalink raw reply related

* [PATCH v2 net-next 2/2] bpf/verifier: improve disassembly of BPF_NEG instructions
From: Edward Cree @ 2017-09-26 15:35 UTC (permalink / raw)
  To: davem; +Cc: netdev, daniel, alexei.starovoitov, ys114321
In-Reply-To: <52270348-67f1-4e7a-cd2f-9d611ae94064@solarflare.com>

BPF_NEG takes only one operand, unlike the bulk of BPF_ALU[64] which are
 compound-assignments.  So give it its own format in print_bpf_insn().

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 kernel/bpf/verifier.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3aaa3262..04e0508 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -344,6 +344,11 @@ static void print_bpf_insn(const struct bpf_verifier_env *env,
 				verbose("BUG_alu64_%02x\n", insn->code);
 			else
 				print_bpf_end_insn(env, insn);
+		} else if (BPF_OP(insn->code) == BPF_NEG) {
+			verbose("(%02x) r%d = %s-r%d\n",
+				insn->code, insn->dst_reg,
+				class == BPF_ALU ? "(u32) " : "",
+				insn->dst_reg);
 		} else if (BPF_SRC(insn->code) == BPF_X) {
 			verbose("(%02x) %sr%d %s %sr%d\n",
 				insn->code, class == BPF_ALU ? "(u32) " : "",

^ permalink raw reply related

* [PATCH net-next 1/2] tools: rename tools/net directory to tools/bpf
From: Jakub Kicinski @ 2017-09-26 15:35 UTC (permalink / raw)
  To: netdev
  Cc: daniel, alexei.starovoitov, davem, hannes, dsahern, oss-drivers,
	Jakub Kicinski
In-Reply-To: <20170926153522.31500-1-jakub.kicinski@netronome.com>

We currently only have BPF tools in the tools/net directory.
We are about to add more BPF tools there, not necessarily
networking related, rename the directory and related Makefile
targets to bpf.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 MAINTAINERS                         |  3 +--
 tools/Makefile                      | 14 +++++++-------
 tools/{net => bpf}/Makefile         |  0
 tools/{net => bpf}/bpf_asm.c        |  0
 tools/{net => bpf}/bpf_dbg.c        |  0
 tools/{net => bpf}/bpf_exp.l        |  0
 tools/{net => bpf}/bpf_exp.y        |  0
 tools/{net => bpf}/bpf_jit_disasm.c |  0
 8 files changed, 8 insertions(+), 9 deletions(-)
 rename tools/{net => bpf}/Makefile (100%)
 rename tools/{net => bpf}/bpf_asm.c (100%)
 rename tools/{net => bpf}/bpf_dbg.c (100%)
 rename tools/{net => bpf}/bpf_exp.l (100%)
 rename tools/{net => bpf}/bpf_exp.y (100%)
 rename tools/{net => bpf}/bpf_jit_disasm.c (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6671f375f7fc..2f79b94a41ec 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2725,7 +2725,7 @@ F:	net/core/filter.c
 F:	net/sched/act_bpf.c
 F:	net/sched/cls_bpf.c
 F:	samples/bpf/
-F:	tools/net/bpf*
+F:	tools/bpf/
 F:	tools/testing/selftests/bpf/
 
 BROADCOM B44 10/100 ETHERNET DRIVER
@@ -9416,7 +9416,6 @@ F:	include/uapi/linux/in.h
 F:	include/uapi/linux/net.h
 F:	include/uapi/linux/netdevice.h
 F:	include/uapi/linux/net_namespace.h
-F:	tools/net/
 F:	tools/testing/selftests/net/
 F:	lib/random32.c
 
diff --git a/tools/Makefile b/tools/Makefile
index 9dfede37c8ff..df6fcb293fbc 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -19,7 +19,7 @@ include scripts/Makefile.include
 	@echo '  kvm_stat               - top-like utility for displaying kvm statistics'
 	@echo '  leds                   - LEDs  tools'
 	@echo '  liblockdep             - user-space wrapper for kernel locking-validator'
-	@echo '  net                    - misc networking tools'
+	@echo '  bpf                    - misc BPF tools'
 	@echo '  perf                   - Linux performance measurement and analysis tool'
 	@echo '  selftests              - various kernel selftests'
 	@echo '  spi                    - spi tools'
@@ -57,7 +57,7 @@ acpi: FORCE
 cpupower: FORCE
 	$(call descend,power/$@)
 
-cgroup firewire hv guest spi usb virtio vm net iio gpio objtool leds: FORCE
+cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds: FORCE
 	$(call descend,$@)
 
 liblockdep: FORCE
@@ -91,7 +91,7 @@ kvm_stat: FORCE
 
 all: acpi cgroup cpupower gpio hv firewire liblockdep \
 		perf selftests spi turbostat usb \
-		virtio vm net x86_energy_perf_policy \
+		virtio vm bpf x86_energy_perf_policy \
 		tmon freefall iio objtool kvm_stat
 
 acpi_install:
@@ -100,7 +100,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \
 cpupower_install:
 	$(call descend,power/$(@:_install=),install)
 
-cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install net_install objtool_install:
+cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install:
 	$(call descend,$(@:_install=),install)
 
 liblockdep_install:
@@ -124,7 +124,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \
 install: acpi_install cgroup_install cpupower_install gpio_install \
 		hv_install firewire_install iio_install liblockdep_install \
 		perf_install selftests_install turbostat_install usb_install \
-		virtio_install vm_install net_install x86_energy_perf_policy_install \
+		virtio_install vm_install bpf_install x86_energy_perf_policy_install \
 		tmon_install freefall_install objtool_install kvm_stat_install
 
 acpi_clean:
@@ -133,7 +133,7 @@ install: acpi_install cgroup_install cpupower_install gpio_install \
 cpupower_clean:
 	$(call descend,power/cpupower,clean)
 
-cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean net_clean iio_clean gpio_clean objtool_clean leds_clean:
+cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean:
 	$(call descend,$(@:_clean=),clean)
 
 liblockdep_clean:
@@ -169,7 +169,7 @@ install: acpi_install cgroup_install cpupower_install gpio_install \
 
 clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \
 		perf_clean selftests_clean turbostat_clean spi_clean usb_clean virtio_clean \
-		vm_clean net_clean iio_clean x86_energy_perf_policy_clean tmon_clean \
+		vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \
 		freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \
 		gpio_clean objtool_clean leds_clean
 
diff --git a/tools/net/Makefile b/tools/bpf/Makefile
similarity index 100%
rename from tools/net/Makefile
rename to tools/bpf/Makefile
diff --git a/tools/net/bpf_asm.c b/tools/bpf/bpf_asm.c
similarity index 100%
rename from tools/net/bpf_asm.c
rename to tools/bpf/bpf_asm.c
diff --git a/tools/net/bpf_dbg.c b/tools/bpf/bpf_dbg.c
similarity index 100%
rename from tools/net/bpf_dbg.c
rename to tools/bpf/bpf_dbg.c
diff --git a/tools/net/bpf_exp.l b/tools/bpf/bpf_exp.l
similarity index 100%
rename from tools/net/bpf_exp.l
rename to tools/bpf/bpf_exp.l
diff --git a/tools/net/bpf_exp.y b/tools/bpf/bpf_exp.y
similarity index 100%
rename from tools/net/bpf_exp.y
rename to tools/bpf/bpf_exp.y
diff --git a/tools/net/bpf_jit_disasm.c b/tools/bpf/bpf_jit_disasm.c
similarity index 100%
rename from tools/net/bpf_jit_disasm.c
rename to tools/bpf/bpf_jit_disasm.c
-- 
2.14.1

^ permalink raw reply related

* [PATCH net-next 0/2] tools: add bpftool
From: Jakub Kicinski @ 2017-09-26 15:35 UTC (permalink / raw)
  To: netdev
  Cc: daniel, alexei.starovoitov, davem, hannes, dsahern, oss-drivers,
	Jakub Kicinski

Hi!

I'm looking for a home for bpftool, Daniel suggested that 
tools/net could be a good place, since there are only BPF
utilities there already.

The tool should be complete for simple use cases and we
will continue extending it as we go along.  E.g. providing
disassembly of loaded programs directly using LLVM library
and JSON output are high on the priority list.

The first patch renames tools/net to tools/bpf, while the
second one adds the new code.


Jakub Kicinski (2):
  tools: rename tools/net directory to tools/bpf
  tools: bpf: add bpftool

 MAINTAINERS                         |   3 +-
 tools/Makefile                      |  20 +-
 tools/{net => bpf}/Makefile         |  18 +-
 tools/{net => bpf}/bpf_asm.c        |   0
 tools/{net => bpf}/bpf_dbg.c        |   0
 tools/{net => bpf}/bpf_exp.l        |   0
 tools/{net => bpf}/bpf_exp.y        |   0
 tools/{net => bpf}/bpf_jit_disasm.c |   0
 tools/bpf/bpftool/Makefile          |  80 ++++
 tools/bpf/bpftool/common.c          | 214 +++++++++++
 tools/bpf/bpftool/jit_disasm.c      |  83 ++++
 tools/bpf/bpftool/main.c            | 212 +++++++++++
 tools/bpf/bpftool/main.h            |  99 +++++
 tools/bpf/bpftool/map.c             | 728 ++++++++++++++++++++++++++++++++++++
 tools/bpf/bpftool/prog.c            | 392 +++++++++++++++++++
 15 files changed, 1834 insertions(+), 15 deletions(-)
 rename tools/{net => bpf}/Makefile (73%)
 rename tools/{net => bpf}/bpf_asm.c (100%)
 rename tools/{net => bpf}/bpf_dbg.c (100%)
 rename tools/{net => bpf}/bpf_exp.l (100%)
 rename tools/{net => bpf}/bpf_exp.y (100%)
 rename tools/{net => bpf}/bpf_jit_disasm.c (100%)
 create mode 100644 tools/bpf/bpftool/Makefile
 create mode 100644 tools/bpf/bpftool/common.c
 create mode 100644 tools/bpf/bpftool/jit_disasm.c
 create mode 100644 tools/bpf/bpftool/main.c
 create mode 100644 tools/bpf/bpftool/main.h
 create mode 100644 tools/bpf/bpftool/map.c
 create mode 100644 tools/bpf/bpftool/prog.c

-- 
2.14.1

^ permalink raw reply

* [PATCH v2 net-next 1/2] bpf/verifier: improve disassembly of BPF_END instructions
From: Edward Cree @ 2017-09-26 15:35 UTC (permalink / raw)
  To: davem; +Cc: netdev, daniel, alexei.starovoitov, ys114321
In-Reply-To: <52270348-67f1-4e7a-cd2f-9d611ae94064@solarflare.com>

print_bpf_insn() was treating all BPF_ALU[64] the same, but BPF_END has a
 different structure: it has a size in insn->imm (even if it's BPF_X) and
 uses the BPF_SRC (X or K) to indicate which endianness to use.  So it
 needs different code to print it.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 kernel/bpf/verifier.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 799b245..3aaa3262 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -325,26 +325,40 @@ static const char *const bpf_jmp_string[16] = {
 	[BPF_EXIT >> 4] = "exit",
 };
 
+static void print_bpf_end_insn(const struct bpf_verifier_env *env,
+			       const struct bpf_insn *insn)
+{
+	verbose("(%02x) r%d = %s%d r%d\n", insn->code, insn->dst_reg,
+		BPF_SRC(insn->code) == BPF_TO_BE ? "be" : "le",
+		insn->imm, insn->dst_reg);
+}
+
 static void print_bpf_insn(const struct bpf_verifier_env *env,
 			   const struct bpf_insn *insn)
 {
 	u8 class = BPF_CLASS(insn->code);
 
 	if (class == BPF_ALU || class == BPF_ALU64) {
-		if (BPF_SRC(insn->code) == BPF_X)
+		if (BPF_OP(insn->code) == BPF_END) {
+			if (class == BPF_ALU64)
+				verbose("BUG_alu64_%02x\n", insn->code);
+			else
+				print_bpf_end_insn(env, insn);
+		} else if (BPF_SRC(insn->code) == BPF_X) {
 			verbose("(%02x) %sr%d %s %sr%d\n",
 				insn->code, class == BPF_ALU ? "(u32) " : "",
 				insn->dst_reg,
 				bpf_alu_string[BPF_OP(insn->code) >> 4],
 				class == BPF_ALU ? "(u32) " : "",
 				insn->src_reg);
-		else
+		} else {
 			verbose("(%02x) %sr%d %s %s%d\n",
 				insn->code, class == BPF_ALU ? "(u32) " : "",
 				insn->dst_reg,
 				bpf_alu_string[BPF_OP(insn->code) >> 4],
 				class == BPF_ALU ? "(u32) " : "",
 				insn->imm);
+		}
 	} else if (class == BPF_STX) {
 		if (BPF_MODE(insn->code) == BPF_MEM)
 			verbose("(%02x) *(%s *)(r%d %+d) = r%d\n",

^ permalink raw reply related

* Re: [PATCH net-next 2/7] nfp: compile flower vxlan tunnel metadata match fields
From: Or Gerlitz @ 2017-09-26 15:33 UTC (permalink / raw)
  To: John Hurley
  Cc: Simon Horman, David Miller, Jakub Kicinski, Linux Netdev List,
	oss-drivers
In-Reply-To: <CAK+XE=kBG9761tN6uC3ziiqP-jrAjLJ_3J7M-cyfKvTqHVBk7A@mail.gmail.com>

On Tue, Sep 26, 2017 at 6:11 PM, John Hurley <john.hurley@netronome.com> wrote:
> On Tue, Sep 26, 2017 at 3:12 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>> On Tue, Sep 26, 2017 at 4:58 PM, John Hurley <john.hurley@netronome.com> wrote:
>>> On Mon, Sep 25, 2017 at 7:35 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>> On Mon, Sep 25, 2017 at 1:23 PM, Simon Horman
>>>> <simon.horman@netronome.com> wrote:
>>>>> From: John Hurley <john.hurley@netronome.com>
>>>>>
>>>>> Compile ovs-tc flower vxlan metadata match fields for offloading. Only
>>>>
>>>> anything in the npf kernel bits has direct relation to ovs? what?
>>>>
>>>
>>> Sorry, this is a typo  and should refer to TC.
>>>
>>>>> +++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>>>>> @@ -52,8 +52,25 @@
>>>>>          BIT(FLOW_DISSECTOR_KEY_PORTS) | \
>>>>>          BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) | \
>>>>>          BIT(FLOW_DISSECTOR_KEY_VLAN) | \
>>>>> +        BIT(FLOW_DISSECTOR_KEY_ENC_KEYID) | \
>>>>> +        BIT(FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) | \
>>>>> +        BIT(FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS) | \
>>>>
>>>> this series takes care of IPv6 tunnels too?
>>>
>>> IPv6 is not included in this set.
>>> The reason the IPv6 bit is included here is to account for behavior we
>>> have noticed in TC flower.
>>> If, for example, I add a filter with the following match fields:
>>> 'protocol ip flower enc_src_ip 10.0.0.1 enc_dst_ip 10.0.0.2
>>> enc_dst_port 4789 enc_key_id 123'
>>> The 'used_keys' value in the dissector marks both IPv4 and IPv6 encap
>>> addresses as 'used'.
>>> I am not sure if this is a bug in TC or that we are expected to check
>>> the enc_control fields to determine if IPv4 or v6 addresses are used.
>>
>> you should have your code to check enc_control->addr_type to be
>> FLOW_DISSECTOR_KEY_IPV4_ADDRS or IPV6_ADDRS
>>
>>
>>> Including the IPv6 used_keys bit in our whitelist approach allows us
>>> to accept legitimate IPv4 tunnel rules in these situations.
>>
>> mmm can please take a look on fl_init_dissector() and tell me if you
>> see why FLOW_DISSECTOR_KEY_IPV6_ADDRS is set for ipv4 tunnels,
>> I am not sure.
>
>
> The fl_init_dissector uses the FL_KEY_SET_IF_MASKED macro to set an
> array of keys which are then translated to the used_keys values.
> The FL_KEY_SET_IF_MASKED takes a 'struct fl_flow_key' as input and
> checks if any mask bits are set in a particular field - if so it
> eventually marks it as used.
> In struct fl_flow_key, the encap ipv4 and ipv6 addresses are
> represented as a union of the 2.
> Therefore, if we have masked bits set for IPv4, they are also being
> set for the IPv6 field.

I see, do you consider it a bug?

^ permalink raw reply

* [PATCH v2 net-next 0/2] bpf/verifier: disassembly improvements
From: Edward Cree @ 2017-09-26 15:32 UTC (permalink / raw)
  To: davem; +Cc: netdev, daniel, alexei.starovoitov, ys114321

Fix the output of print_bpf_insn() for ALU ops that don't look like
 compound assignment (i.e. BPF_END and BPF_NEG).

Sample output for a short test program:
0: (b4) (u32) r0 = (u32) 0
1: (dc) r0 = be32 r0
2: (84) r0 = (u32) -r0
3: (95) exit
processed 4 insns, stack depth 0

Edward Cree (2):
  bpf/verifier: improve disassembly of BPF_END instructions
  bpf/verifier: improve disassembly of BPF_NEG instructions

 kernel/bpf/verifier.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

^ permalink raw reply

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
From: Eric Dumazet @ 2017-09-26 15:30 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Tariq Toukan, David S . Miller, netdev, Eric W . Biederman,
	Eric Dumazet, Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha
In-Reply-To: <33E69A03-6594-423A-86E6-02029046BE7D@gmail.com>

On Tue, Sep 26, 2017 at 8:22 AM, Dmitry Torokhov
<dmitry.torokhov@gmail.com> wrote:

> It is in Greg's tree where all kobject patches should go through as far as I know.

Yes, I will fix this, adding a second memmove()

^ permalink raw reply

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
From: Tariq Toukan @ 2017-09-26 15:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Dmitry Torokhov, David S . Miller, netdev, Eric W . Biederman,
	Eric Dumazet, Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha
In-Reply-To: <CANn89iJ4io_gX-V8A4hE04cH=czGZYs8WGBOpaiBpWMMHOgVVw@mail.gmail.com>



On 26/09/2017 6:13 PM, Eric Dumazet wrote:
> On Tue, Sep 26, 2017 at 8:04 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>>
>> On 26/09/2017 3:51 PM, Eric Dumazet wrote:
>>> On Tue, Sep 26, 2017 at 4:21 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>>>>
>>>> Hi Eric,
>>>>
>>>> We see a regression introduced in this series, specifically in the
>>>> patches
>>>> touching lib/kobject_uevent.c.
>>>> We tried to figure out what is wrong there, but couldn't point it out.
>>>>
>>>> Bug is that mlx4 driver restart fails, because mlx4_core is still in use.
>>>> According to module dependencies, both mlx4_en and mlx4_ib should have
>>>> been
>>>> unloaded at this point
>>>> Please see log below.
>>>>
>>>> This looks to be some kind of a race, as the repro is not deterministic.
>>>> Probably the en/ib modules are now mistakenly reloaded.
>>>>
>>>> Any idea what could this be?
>>>>
>>>> Regards,
>>>> Tariq
>>>>
>>>>
>>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>>>> Unloading HCA driver:                                      [  OK  ]
>>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd start
>>>> Loading HCA driver and Access Layer:                       [  OK  ]
>>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>>>> Unloading mlx4_core                                        [FAILED]
>>>> rmmod: ERROR: Module mlx4_core is in use
>>> I have absolutely no idea. Please bisect.
>> We previously saw a similar issue, that was reported in mailing list.
>> Dmitry Torokhov suggested the following fix:
>> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flkml.org%2Flkml%2F2017%2F9%2F12%2F523&data=02%7C01%7Ctariqt%40mellanox.com%7C4a275c766aeb4224376e08d504f12193%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636420356043309380&sdata=GGeDFkX277R%2BKShsUPsePoAD6p5yaO2v0CteABtCrcY%3D&reserved=0
>>
>> And indeed, it solved the issue.
>>
>> We kept the suggested patch in our internal branch, and rebased.
>> Issue appeared again once your series was accepted.
>>
>> By bisecting, we see that the issue re-appears in this patch:
>> 4a336a23d619 kobject: copy env blob in one go
>>
>>> Are you really using netns in the first place ?
>> No. But seems like it still affects the modules load/unload.
>>
>> Regards,
>> Tariq
> Ah this makes sense now.
>
> Dmitry Torokhov hack breaks the assumption I used in my patch.
>
> Since it is not upstream yet, I believe that it will need more work
> before being in a proper state.
>
> Thanks.
I see. Thanks for the clarification.
I guess we'll keep only one patch for now, until issues are resolved.

Regards.

^ permalink raw reply

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
From: Dmitry Torokhov @ 2017-09-26 15:22 UTC (permalink / raw)
  To: Eric Dumazet, Tariq Toukan
  Cc: David S . Miller, netdev, Eric W . Biederman, Eric Dumazet,
	Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha
In-Reply-To: <CANn89iJ4io_gX-V8A4hE04cH=czGZYs8WGBOpaiBpWMMHOgVVw@mail.gmail.com>

On September 26, 2017 8:13:21 AM PDT, Eric Dumazet <edumazet@google.com> wrote:
>On Tue, Sep 26, 2017 at 8:04 AM, Tariq Toukan <tariqt@mellanox.com>
>wrote:
>>
>>
>> On 26/09/2017 3:51 PM, Eric Dumazet wrote:
>>>
>>> On Tue, Sep 26, 2017 at 4:21 AM, Tariq Toukan <tariqt@mellanox.com>
>wrote:
>>>>
>>>>
>>>> Hi Eric,
>>>>
>>>> We see a regression introduced in this series, specifically in the
>>>> patches
>>>> touching lib/kobject_uevent.c.
>>>> We tried to figure out what is wrong there, but couldn't point it
>out.
>>>>
>>>> Bug is that mlx4 driver restart fails, because mlx4_core is still
>in use.
>>>> According to module dependencies, both mlx4_en and mlx4_ib should
>have
>>>> been
>>>> unloaded at this point
>>>> Please see log below.
>>>>
>>>> This looks to be some kind of a race, as the repro is not
>deterministic.
>>>> Probably the en/ib modules are now mistakenly reloaded.
>>>>
>>>> Any idea what could this be?
>>>>
>>>> Regards,
>>>> Tariq
>>>>
>>>>
>>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>>>> Unloading HCA driver:                                      [  OK  ]
>>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd start
>>>> Loading HCA driver and Access Layer:                       [  OK  ]
>>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>>>> Unloading mlx4_core                                        [FAILED]
>>>> rmmod: ERROR: Module mlx4_core is in use
>>>
>>> I have absolutely no idea. Please bisect.
>>
>> We previously saw a similar issue, that was reported in mailing list.
>> Dmitry Torokhov suggested the following fix:
>> https://lkml.org/lkml/2017/9/12/523
>>
>> And indeed, it solved the issue.
>>
>> We kept the suggested patch in our internal branch, and rebased.
>> Issue appeared again once your series was accepted.
>>
>> By bisecting, we see that the issue re-appears in this patch:
>> 4a336a23d619 kobject: copy env blob in one go
>>
>>>
>>> Are you really using netns in the first place ?
>>
>> No. But seems like it still affects the modules load/unload.
>>
>> Regards,
>> Tariq
>
>Ah this makes sense now.
>
>Dmitry Torokhov hack breaks the assumption I used in my patch.
>
>Since it is not upstream yet, I believe that it will need more work
>before being in a proper state.

It is in Greg's tree where all kobject patches should go through as far as I know.


Thanks.

-- 
Dmitry

^ permalink raw reply

* Re: [PATCH V2] r8152: add Linksys USB3GIGV1 id
From: Doug Anderson @ 2017-09-26 15:19 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Hayes Wang, linux-usb, David S . Miller, LKML, netdev,
	Greg Kroah-Hartman
In-Reply-To: <20170926010925.114436-1-grundler@chromium.org>

Hi

On Mon, Sep 25, 2017 at 6:09 PM, Grant Grundler <grundler@chromium.org> wrote:
> This linksys dongle by default comes up in cdc_ether mode.
> This patch allows r8152 to claim the device:
>    Bus 002 Device 002: ID 13b1:0041 Linksys
>
> Signed-off-by: Grant Grundler <grundler@chromium.org>
> ---
>  drivers/net/usb/cdc_ether.c | 8 ++++++++
>  drivers/net/usb/r8152.c     | 2 ++
>  2 files changed, 10 insertions(+)
>
> V2: add LINKSYS_VENDOR_ID to cdc_ether blacklist

I understand that in v1 people pointed out that if we didn't add this
to the cdc_ether blacklist that we might end up picking the wrong
driver.  ...but one thing concerns me: what happens if someone has the
CDC_ETHER driver configured in their system but _not_ the R8152
driver?  All of a sudden this USB Ethernet adapter which used to work
fine with the CDC Ethernet driver will stop working.

I know that for at least some of the adapters in the CDC Ethernet
blacklist it was claimed that the CDC Ethernet support in the adapter
was kinda broken anyway so the blacklist made sense.  ...but for the
Linksys Gigabit adapter the CDC Ethernet driver seems to work OK, it's
just not quite as full featured / efficient as the R8152 driver.

Is that not a concern?  I guess you could tell people in this
situation that they simply need to enable the R8152 driver to get
continued support for their Ethernet adapter?

-Doug

^ permalink raw reply

* Re: [PATCH v2 2/2] ip_tunnel: add mpls over gre encapsulation
From: Roopa Prabhu @ 2017-09-26 15:15 UTC (permalink / raw)
  To: Amine Kherbouche; +Cc: netdev@vger.kernel.org, xeb, David Lamparter
In-Reply-To: <c40295d24ec5207f5be695a2f888bfa840e2ef2c.1506416988.git.amine.kherbouche@6wind.com>

On Tue, Sep 26, 2017 at 2:22 AM, Amine Kherbouche
<amine.kherbouche@6wind.com> wrote:
> This commit introduces the MPLSoGRE support (RFC 4023), using ip tunnel
> API.
>
> Encap:
>   - Add a new iptunnel type mpls.
>   - Share tx path: gre type mpls loaded from skb->protocol.
>
> Decap:
>   - pull gre hdr and call mpls_forward().
>
> Signed-off-by: Amine Kherbouche <amine.kherbouche@6wind.com>
> ---
>  include/net/gre.h              |  3 +++
>  include/uapi/linux/if_tunnel.h |  1 +
>  net/ipv4/gre_demux.c           | 22 ++++++++++++++++++++++
>  net/ipv4/ip_gre.c              |  9 +++++++++
>  net/ipv6/ip6_gre.c             |  7 +++++++
>  net/mpls/af_mpls.c             | 40 ++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 82 insertions(+)
>
> diff --git a/include/net/gre.h b/include/net/gre.h
> index d25d836..88a8343 100644
> --- a/include/net/gre.h
> +++ b/include/net/gre.h
> @@ -35,6 +35,9 @@ struct net_device *gretap_fb_dev_create(struct net *net, const char *name,
>                                        u8 name_assign_type);
>  int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
>                      bool *csum_err, __be16 proto, int nhs);
> +#if IS_ENABLED(CONFIG_MPLS)
> +int mpls_gre_rcv(struct sk_buff *skb, int gre_hdr_len);
> +#endif
>
>  static inline int gre_calc_hlen(__be16 o_flags)
>  {
> diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
> index 2e52088..a2f48c0 100644
> --- a/include/uapi/linux/if_tunnel.h
> +++ b/include/uapi/linux/if_tunnel.h
> @@ -84,6 +84,7 @@ enum tunnel_encap_types {
>         TUNNEL_ENCAP_NONE,
>         TUNNEL_ENCAP_FOU,
>         TUNNEL_ENCAP_GUE,
> +       TUNNEL_ENCAP_MPLS,
>  };
>
>  #define TUNNEL_ENCAP_FLAG_CSUM         (1<<0)
> diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c
> index b798862..a6a937e 100644
> --- a/net/ipv4/gre_demux.c
> +++ b/net/ipv4/gre_demux.c
> @@ -23,6 +23,9 @@
>  #include <linux/netdevice.h>
>  #include <linux/if_tunnel.h>
>  #include <linux/spinlock.h>
> +#if IS_ENABLED(CONFIG_MPLS)
> +#include <linux/mpls.h>
> +#endif
>  #include <net/protocol.h>
>  #include <net/gre.h>
>
> @@ -122,6 +125,25 @@ int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
>  }
>  EXPORT_SYMBOL(gre_parse_header);
>
> +#if IS_ENABLED(CONFIG_MPLS)
> +int mpls_gre_rcv(struct sk_buff *skb, int gre_hdr_len)
> +{
> +       if (unlikely(!pskb_may_pull(skb, gre_hdr_len)))
> +               goto drop;
> +
> +       /* Pop GRE hdr and reset the skb */
> +       skb_pull(skb, gre_hdr_len);
> +       skb_reset_network_header(skb);
> +
> +       mpls_forward(skb, skb->dev, NULL, NULL);

pls check return value

can mpls_gre_rcv be moved to af_mpls.c ?


> +
> +       return 0;
> +drop:
> +       return NET_RX_DROP;
> +}
> +EXPORT_SYMBOL(mpls_gre_rcv);
> +#endif
> +
>  static int gre_rcv(struct sk_buff *skb)
>  {
>         const struct gre_protocol *proto;
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index 9cee986..dd4431c 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -412,10 +412,19 @@ static int gre_rcv(struct sk_buff *skb)
>                         return 0;
>         }
>
> +#if IS_ENABLED(CONFIG_MPLS)
> +       if (unlikely(tpi.proto == htons(ETH_P_MPLS_UC))) {
> +               if (mpls_gre_rcv(skb, hdr_len))
> +                       goto drop;
> +               return 0;
> +       }
> +#endif
> +
>         if (ipgre_rcv(skb, &tpi, hdr_len) == PACKET_RCVD)
>                 return 0;
>
>         icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
> +

unnecessary new line..

>  drop:
>         kfree_skb(skb);
>         return 0;
> diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> index c82d41e..e52396d 100644
> --- a/net/ipv6/ip6_gre.c
> +++ b/net/ipv6/ip6_gre.c
> @@ -476,6 +476,13 @@ static int gre_rcv(struct sk_buff *skb)
>         if (hdr_len < 0)
>                 goto drop;
>
> +#if IS_ENABLED(CONFIG_MPLS)
> +       if (unlikely(tpi.proto == htons(ETH_P_MPLS_UC))) {
> +               if (mpls_gre_rcv(skb, hdr_len))
> +                       goto drop;
> +               return 0;
> +       }

+newline

also would be nice if the IS_ENABLED could be moved to around mpls_gre_rcv.


> +#endif
>         if (iptunnel_pull_header(skb, hdr_len, tpi.proto, false))
>                 goto drop;
>
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index 36ea2ad..5505074 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -16,6 +16,7 @@
>  #include <net/arp.h>
>  #include <net/ip_fib.h>
>  #include <net/netevent.h>
> +#include <net/ip_tunnels.h>
>  #include <net/netns/generic.h>
>  #if IS_ENABLED(CONFIG_IPV6)
>  #include <net/ipv6.h>
> @@ -39,6 +40,40 @@ static int one = 1;
>  static int label_limit = (1 << 20) - 1;
>  static int ttl_max = 255;
>
> +size_t ipgre_mpls_encap_hlen(struct ip_tunnel_encap *e)
> +{
> +       return sizeof(struct mpls_shim_hdr);
> +}
> +
> +int ipgre_mpls_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
> +                           u8 *protocol, struct flowi4 *fl4)
> +{
> +       return 0;
> +}
> +
> +static const struct ip_tunnel_encap_ops mpls_iptun_ops = {
> +       .encap_hlen     = ipgre_mpls_encap_hlen,
> +       .build_header   = ipgre_mpls_build_header,

There are checks for build header before calling it in iptunnel code,
so, any reason
you can't skip declaring .build_header ?


> +};
> +
> +static int ipgre_tunnel_encap_add_mpls_ops(void)
> +{
> +       int ret = -1;
> +
> +#if IS_ENABLED(CONFIG_NET_IP_TUNNEL)
> +       ret = ip_tunnel_encap_add_ops(&mpls_iptun_ops, TUNNEL_ENCAP_MPLS);
> +#endif
> +
> +       return ret;
> +}
> +
> +static void ipgre_tunnel_encap_del_mpls_ops(void)
> +{
> +#if IS_ENABLED(CONFIG_NET_IP_TUNNEL)
> +       ip_tunnel_encap_del_ops(&mpls_iptun_ops, TUNNEL_ENCAP_MPLS);
> +#endif
> +}
> +
>  static void rtmsg_lfib(int event, u32 label, struct mpls_route *rt,
>                        struct nlmsghdr *nlh, struct net *net, u32 portid,
>                        unsigned int nlm_flags);
> @@ -2486,6 +2521,10 @@ static int __init mpls_init(void)
>                       0);
>         rtnl_register(PF_MPLS, RTM_GETNETCONF, mpls_netconf_get_devconf,
>                       mpls_netconf_dump_devconf, 0);
> +       err = ipgre_tunnel_encap_add_mpls_ops();
> +       if (err)
> +               pr_err("Can't add mpls over gre tunnel ops\n");
> +

This will throw an error  if CONFIG_NET_IP_TUNNEL is not enabled.
Can you pls put the CONFIG_NET_IP_TUNNEL around
ipgre_tunnel_encap_add_mpls_ops ?
see CONFIG_INET checks in the rest of af_mpls as example.
same for del_ops


>         err = 0;
>  out:
>         return err;
> @@ -2503,6 +2542,7 @@ static void __exit mpls_exit(void)
>         dev_remove_pack(&mpls_packet_type);
>         unregister_netdevice_notifier(&mpls_dev_notifier);
>         unregister_pernet_subsys(&mpls_net_ops);
> +       ipgre_tunnel_encap_del_mpls_ops();
>  }
>  module_exit(mpls_exit);
>
> --
> 2.1.4
>

^ permalink raw reply

* [PATCH][next] cxgb4: make function ch_flower_stats_cb, fixes warning
From: Colin King @ 2017-09-26 15:14 UTC (permalink / raw)
  To: Ganesh Goudar, netdev; +Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

The function ch_flower_stats_cb is local to the source and does not need
to be in global scope, so make it static.

Cleans up sparse warnings:
symbol 'ch_flower_stats_cb' was not declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
index a36bd66d2834..92a311767381 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
@@ -366,7 +366,7 @@ int cxgb4_tc_flower_destroy(struct net_device *dev,
 	return ret;
 }
 
-void ch_flower_stats_cb(unsigned long data)
+static void ch_flower_stats_cb(unsigned long data)
 {
 	struct adapter *adap = (struct adapter *)data;
 	struct ch_tc_flower_entry *flower_entry;
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
From: Eric Dumazet @ 2017-09-26 15:13 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Dmitry Torokhov, David S . Miller, netdev, Eric W . Biederman,
	Eric Dumazet, Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha
In-Reply-To: <266d30e7-5164-48e8-b802-56bb93558823@mellanox.com>

On Tue, Sep 26, 2017 at 8:04 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>
>
> On 26/09/2017 3:51 PM, Eric Dumazet wrote:
>>
>> On Tue, Sep 26, 2017 at 4:21 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>>>
>>>
>>> Hi Eric,
>>>
>>> We see a regression introduced in this series, specifically in the
>>> patches
>>> touching lib/kobject_uevent.c.
>>> We tried to figure out what is wrong there, but couldn't point it out.
>>>
>>> Bug is that mlx4 driver restart fails, because mlx4_core is still in use.
>>> According to module dependencies, both mlx4_en and mlx4_ib should have
>>> been
>>> unloaded at this point
>>> Please see log below.
>>>
>>> This looks to be some kind of a race, as the repro is not deterministic.
>>> Probably the en/ib modules are now mistakenly reloaded.
>>>
>>> Any idea what could this be?
>>>
>>> Regards,
>>> Tariq
>>>
>>>
>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>>> Unloading HCA driver:                                      [  OK  ]
>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd start
>>> Loading HCA driver and Access Layer:                       [  OK  ]
>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>>> Unloading mlx4_core                                        [FAILED]
>>> rmmod: ERROR: Module mlx4_core is in use
>>
>> I have absolutely no idea. Please bisect.
>
> We previously saw a similar issue, that was reported in mailing list.
> Dmitry Torokhov suggested the following fix:
> https://lkml.org/lkml/2017/9/12/523
>
> And indeed, it solved the issue.
>
> We kept the suggested patch in our internal branch, and rebased.
> Issue appeared again once your series was accepted.
>
> By bisecting, we see that the issue re-appears in this patch:
> 4a336a23d619 kobject: copy env blob in one go
>
>>
>> Are you really using netns in the first place ?
>
> No. But seems like it still affects the modules load/unload.
>
> Regards,
> Tariq

Ah this makes sense now.

Dmitry Torokhov hack breaks the assumption I used in my patch.

Since it is not upstream yet, I believe that it will need more work
before being in a proper state.

Thanks.

^ permalink raw reply

* Re: [PATCH] p54: don't unregister leds when they are inited
From: Andrey Konovalov @ 2017-09-26 15:12 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Christian Lamparter, Kalle Valo, linux-wireless, netdev, LKML,
	Dmitry Vyukov, Kostya Serebryany
In-Reply-To: <1506438516.22427.21.camel@sipsolutions.net>

On Tue, Sep 26, 2017 at 5:08 PM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> Subject should say *not* initialized?

Yes, sent v2.

>
> johannes

^ permalink raw reply

* [PATCH v2] p54: don't unregister leds when they are not initialized
From: Andrey Konovalov @ 2017-09-26 15:11 UTC (permalink / raw)
  To: Christian Lamparter, Kalle Valo, linux-wireless, netdev,
	linux-kernel
  Cc: Dmitry Vyukov, Kostya Serebryany, Andrey Konovalov

ieee80211_register_hw() in p54_register_common() may fail and leds won't
get initialized. Currently p54_unregister_common() doesn't check that and
always calls p54_unregister_leds(). The fix is to check priv->registered
flag before calling p54_unregister_leds().

Found by syzkaller.

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 1 PID: 1404 Comm: kworker/1:1 Not tainted
4.14.0-rc1-42251-gebb2c2437d80-dirty #205
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: usb_hub_wq hub_event
Call Trace:
 __dump_stack lib/dump_stack.c:16
 dump_stack+0x292/0x395 lib/dump_stack.c:52
 register_lock_class+0x6c4/0x1a00 kernel/locking/lockdep.c:769
 __lock_acquire+0x27e/0x4550 kernel/locking/lockdep.c:3385
 lock_acquire+0x259/0x620 kernel/locking/lockdep.c:4002
 flush_work+0xf0/0x8c0 kernel/workqueue.c:2886
 __cancel_work_timer+0x51d/0x870 kernel/workqueue.c:2961
 cancel_delayed_work_sync+0x1f/0x30 kernel/workqueue.c:3081
 p54_unregister_leds+0x6c/0xc0 drivers/net/wireless/intersil/p54/led.c:160
 p54_unregister_common+0x3d/0xb0 drivers/net/wireless/intersil/p54/main.c:856
 p54u_disconnect+0x86/0x120 drivers/net/wireless/intersil/p54/p54usb.c:1073
 usb_unbind_interface+0x21c/0xa90 drivers/usb/core/driver.c:423
 __device_release_driver drivers/base/dd.c:861
 device_release_driver_internal+0x4f4/0x5c0 drivers/base/dd.c:893
 device_release_driver+0x1e/0x30 drivers/base/dd.c:918
 bus_remove_device+0x2f4/0x4b0 drivers/base/bus.c:565
 device_del+0x5c4/0xab0 drivers/base/core.c:1985
 usb_disable_device+0x1e9/0x680 drivers/usb/core/message.c:1170
 usb_disconnect+0x260/0x7a0 drivers/usb/core/hub.c:2124
 hub_port_connect drivers/usb/core/hub.c:4754
 hub_port_connect_change drivers/usb/core/hub.c:5009
 port_event drivers/usb/core/hub.c:5115
 hub_event+0x1318/0x3740 drivers/usb/core/hub.c:5195
 process_one_work+0xc7f/0x1db0 kernel/workqueue.c:2119
 process_scheduled_works kernel/workqueue.c:2179
 worker_thread+0xb2b/0x1850 kernel/workqueue.c:2255
 kthread+0x3a1/0x470 kernel/kthread.c:231
 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---

changes in v2:
- fixed typo in patch subject

---
 drivers/net/wireless/intersil/p54/main.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/intersil/p54/main.c b/drivers/net/wireless/intersil/p54/main.c
index d5a3bf91a03e..ab6d39e12069 100644
--- a/drivers/net/wireless/intersil/p54/main.c
+++ b/drivers/net/wireless/intersil/p54/main.c
@@ -852,12 +852,11 @@ void p54_unregister_common(struct ieee80211_hw *dev)
 {
 	struct p54_common *priv = dev->priv;

-#ifdef CONFIG_P54_LEDS
-	p54_unregister_leds(priv);
-#endif /* CONFIG_P54_LEDS */
-
 	if (priv->registered) {
 		priv->registered = false;
+#ifdef CONFIG_P54_LEDS
+		p54_unregister_leds(priv);
+#endif /* CONFIG_P54_LEDS */
 		ieee80211_unregister_hw(dev);
 	}

-- 
2.14.1.821.g8fa685d3b7-goog

^ permalink raw reply related

* Re: [PATCH net-next 2/7] nfp: compile flower vxlan tunnel metadata match fields
From: John Hurley @ 2017-09-26 15:11 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Simon Horman, David Miller, Jakub Kicinski, Linux Netdev List,
	oss-drivers
In-Reply-To: <CAJ3xEMgmC+4V+Yr9PeO1wJ=ACBoPMTrasd7Joyx--F9PjV1yvg@mail.gmail.com>

On Tue, Sep 26, 2017 at 3:12 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Tue, Sep 26, 2017 at 4:58 PM, John Hurley <john.hurley@netronome.com> wrote:
>> On Mon, Sep 25, 2017 at 7:35 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>> On Mon, Sep 25, 2017 at 1:23 PM, Simon Horman
>>> <simon.horman@netronome.com> wrote:
>>>> From: John Hurley <john.hurley@netronome.com>
>>>>
>>>> Compile ovs-tc flower vxlan metadata match fields for offloading. Only
>>>
>>> anything in the npf kernel bits has direct relation to ovs? what?
>>>
>>
>> Sorry, this is a typo  and should refer to TC.
>>
>>>> +++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>>>> @@ -52,8 +52,25 @@
>>>>          BIT(FLOW_DISSECTOR_KEY_PORTS) | \
>>>>          BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) | \
>>>>          BIT(FLOW_DISSECTOR_KEY_VLAN) | \
>>>> +        BIT(FLOW_DISSECTOR_KEY_ENC_KEYID) | \
>>>> +        BIT(FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) | \
>>>> +        BIT(FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS) | \
>>>
>>> this series takes care of IPv6 tunnels too?
>>
>> IPv6 is not included in this set.
>> The reason the IPv6 bit is included here is to account for behavior we
>> have noticed in TC flower.
>> If, for example, I add a filter with the following match fields:
>> 'protocol ip flower enc_src_ip 10.0.0.1 enc_dst_ip 10.0.0.2
>> enc_dst_port 4789 enc_key_id 123'
>> The 'used_keys' value in the dissector marks both IPv4 and IPv6 encap
>> addresses as 'used'.
>> I am not sure if this is a bug in TC or that we are expected to check
>> the enc_control fields to determine if IPv4 or v6 addresses are used.
>
> you should have your code to check enc_control->addr_type to be
> FLOW_DISSECTOR_KEY_IPV4_ADDRS or IPV6_ADDRS
>
>
>> Including the IPv6 used_keys bit in our whitelist approach allows us
>> to accept legitimate IPv4 tunnel rules in these situations.
>
> mmm can please take a look on fl_init_dissector() and tell me if you
> see why FLOW_DISSECTOR_KEY_IPV6_ADDRS is set for ipv4 tunnels,
> I am not sure.


The fl_init_dissector uses the FL_KEY_SET_IF_MASKED macro to set an
array of keys which are then translated to the used_keys values.
The FL_KEY_SET_IF_MASKED takes a 'struct fl_flow_key' as input and
checks if any mask bits are set in a particular field - if so it
eventually marks it as used.
In struct fl_flow_key, the encap ipv4 and ipv6 addresses are
represented as a union of the 2.
Therefore, if we have masked bits set for IPv4, they are also being
set for the IPv6 field.


>
>> If it is found to be IPv6 when the rule is parsed, it will be rejected here.

^ permalink raw reply

* Re: [PATCH] p54: don't unregister leds when they are inited
From: Johannes Berg @ 2017-09-26 15:08 UTC (permalink / raw)
  To: Andrey Konovalov, Christian Lamparter, Kalle Valo, linux-wireless,
	netdev, linux-kernel
  Cc: Dmitry Vyukov, Kostya Serebryany
In-Reply-To: <384966f3a79915b4617d808c40e63072aa8b868d.1506438160.git.andreyknvl@google.com>

Subject should say *not* initialized?

johannes

^ permalink raw reply

* Re: [PATCH] tun: bail out from tun_get_user() if the skb is empty
From: Alexander Potapenko @ 2017-09-26 15:06 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Dmitry Vyukov, syzkaller, netdev, LKML
In-Reply-To: <CANn89i+0uj=N2aWc6wUq4EK1jHmEVoxw0UcCPiFoggGTea++Dw@mail.gmail.com>

On Tue, Sep 26, 2017 at 5:02 PM, 'Eric Dumazet' via syzkaller
<syzkaller@googlegroups.com> wrote:
> On Tue, Sep 26, 2017 at 7:53 AM, Alexander Potapenko <glider@google.com> wrote:
>> KMSAN (https://github.com/google/kmsan) reported accessing uninitialized
>> skb->data[0] in the case the skb is empty (i.e. skb->len is 0):
>>
>> ================================================
>> BUG: KMSAN: use of uninitialized memory in tun_get_user+0x19ba/0x3770
>> CPU: 0 PID: 3051 Comm: probe Not tainted 4.13.0+ #3140
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> Call Trace:
>> ...
>>  __msan_warning_32+0x66/0xb0 mm/kmsan/kmsan_instr.c:477
>>  tun_get_user+0x19ba/0x3770 drivers/net/tun.c:1301
>>  tun_chr_write_iter+0x19f/0x300 drivers/net/tun.c:1365
>>  call_write_iter ./include/linux/fs.h:1743
>>  new_sync_write fs/read_write.c:457
>>  __vfs_write+0x6c3/0x7f0 fs/read_write.c:470
>>  vfs_write+0x3e4/0x770 fs/read_write.c:518
>>  SYSC_write+0x12f/0x2b0 fs/read_write.c:565
>>  SyS_write+0x55/0x80 fs/read_write.c:557
>>  do_syscall_64+0x242/0x330 arch/x86/entry/common.c:284
>>  entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:245
>> ...
>> origin:
>> ...
>>  kmsan_poison_shadow+0x6e/0xc0 mm/kmsan/kmsan.c:211
>>  slab_alloc_node mm/slub.c:2732
>>  __kmalloc_node_track_caller+0x351/0x370 mm/slub.c:4351
>>  __kmalloc_reserve net/core/skbuff.c:138
>>  __alloc_skb+0x26a/0x810 net/core/skbuff.c:231
>>  alloc_skb ./include/linux/skbuff.h:903
>>  alloc_skb_with_frags+0x1d7/0xc80 net/core/skbuff.c:4756
>>  sock_alloc_send_pskb+0xabf/0xfe0 net/core/sock.c:2037
>>  tun_alloc_skb drivers/net/tun.c:1144
>>  tun_get_user+0x9a8/0x3770 drivers/net/tun.c:1274
>>  tun_chr_write_iter+0x19f/0x300 drivers/net/tun.c:1365
>>  call_write_iter ./include/linux/fs.h:1743
>>  new_sync_write fs/read_write.c:457
>>  __vfs_write+0x6c3/0x7f0 fs/read_write.c:470
>>  vfs_write+0x3e4/0x770 fs/read_write.c:518
>>  SYSC_write+0x12f/0x2b0 fs/read_write.c:565
>>  SyS_write+0x55/0x80 fs/read_write.c:557
>>  do_syscall_64+0x242/0x330 arch/x86/entry/common.c:284
>>  return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:245
>> ================================================
>>
>> Make sure tun_get_user() doesn't touch skb->data[0] unless there is
>> actual data.
>>
>> C reproducer below:
>> ==========================
>>     // autogenerated by syzkaller (http://github.com/google/syzkaller)
>>
>>     #define _GNU_SOURCE
>>
>>     #include <fcntl.h>
>>     #include <linux/if_tun.h>
>>     #include <netinet/ip.h>
>>     #include <net/if.h>
>>     #include <string.h>
>>     #include <sys/ioctl.h>
>>
>>     int main()
>>     {
>>       int sock = socket(PF_INET, SOCK_STREAM, IPPROTO_IP);
>>       int tun_fd = open("/dev/net/tun", O_RDWR);
>>       struct ifreq req;
>>       memset(&req, 0, sizeof(struct ifreq));
>>       strcpy((char*)&req.ifr_name, "gre0");
>>       req.ifr_flags = IFF_UP | IFF_MULTICAST;
>>       ioctl(tun_fd, TUNSETIFF, &req);
>>       ioctl(sock, SIOCSIFFLAGS, "gre0");
>>       write(tun_fd, "hi", 0);
>>       return 0;
>>     }
>> ==========================
>>
>> Signed-off-by: Alexander Potapenko <glider@google.com>
>> ---
>>  drivers/net/tun.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 3c9985f29950..25d96f54a729 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1496,6 +1496,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>>         switch (tun->flags & TUN_TYPE_MASK) {
>>         case IFF_TUN:
>>                 if (tun->flags & IFF_NO_PI) {
>> +                       if (!skb->len)
>> +                               return -EINVAL;
>>                         switch (skb->data[0] & 0xf0) {
>>                         case 0x40:
>>                                 pi.proto = htons(ETH_P_IP);
>> --
>> 2.14.1.821.g8fa685d3b7-goog
>>
>
> Good catch, but...
>
> You are replacing an harmless read by a memory leak, which is far more
> problematic :/
Ah, you're right, I should have done kfree_skb() and incremented the
stats. That's also rx_dropped, correct?
> --
> You received this message because you are subscribed to the Google Groups "syzkaller" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply

* Re: [RESEND] Re: usb/net/p54: trying to register non-static key in p54_unregister_leds
From: Andrey Konovalov @ 2017-09-26 15:06 UTC (permalink / raw)
  To: Christian Lamparter
  Cc: Johannes Berg, Kalle Valo, linux-wireless, netdev, LKML,
	Dmitry Vyukov, Kostya Serebryany, syzkaller, Stephen Boyd,
	Tejun Heo, Yong Zhang
In-Reply-To: <2589427.Vd4nrgaY4N@debian64>

On Sat, Sep 23, 2017 at 9:37 PM, 'Christian Lamparter' via syzkaller
<syzkaller@googlegroups.com> wrote:
> This got rejected by gmail once. Let's see if it works now.
>
> On Thursday, September 21, 2017 8:22:45 PM CEST Andrey Konovalov wrote:
>> On Wed, Sep 20, 2017 at 9:55 PM, Johannes Berg
>> <johannes@sipsolutions.net> wrote:
>> > On Wed, 2017-09-20 at 21:27 +0200, Christian Lamparter wrote:
>> >
>> >> It seems this is caused as a result of:
>> >>     -> lock_map_acquire(&work->lockdep_map);
>> >>           lock_map_release(&work->lockdep_map);
>> >>
>> >>     in flush_work() [0]
>> >
>> > Agree.
>> >
>> >> This was added by:
>> >>
>> >>       commit 0976dfc1d0cd80a4e9dfaf87bd8744612bde475a
>> >>       Author: Stephen Boyd <sboyd@codeaurora.org>
>> >>       Date:   Fri Apr 20 17:28:50 2012 -0700
>> >>
>> >>       workqueue: Catch more locking problems with flush_work()
>> >
>> > Yes, but that doesn't matter.
>> >
>> >> Looking at the Stephen's patch, it's clear that it was made
>> >> with "static DECLARE_WORK(work, my_work)" in mind. However
>> >> p54's led_work is "per-device", hence it is stored in the
>> >> devices context p54_common, which is dynamically allocated.
>> >> So, maybe revert Stephen's patch?
>> >
>> > I disagree - as the lockdep warning says:
>> >
>> >> > INFO: trying to register non-static key.
>> >> > the code is fine but needs lockdep annotation.
>> >> > turning off the locking correctness validator.
>> >
>> > What it needs is to actually correctly go through initializing the work
>> > at least once.
>> >
>> > Without more information, I can't really say what's going on, but I
>> > assume that something is failing and p54_unregister_leds() is getting
>> > invoked without p54_init_leds() having been invoked, so essentially
>> > it's trying to flush a work that was never initialized?
>> >
>> > INIT_DELAYED_WORK() does, after all, initialize the lockdep map
>> > properly via __INIT_WORK().
>
> Ok, thanks. This does indeed explain it.
>
> But this also begs the question: Is this really working then?
> From what I can tell, if CONFIG_LOCKDEP is not set then there's no BUG
> no WARN, no other splat or any other odd system behaviour. Does
> [cancel | flush]_[delayed_]work[_sync] really "just work" by *accident*,
> as long the delayed_work | work_struct is zeroed out?
> And should it work in the future as well?
>
>> Since I'm able to reproduce this, please let me know if you need me to
>> collect some debug traces to help with the triage.
>
> Do you want to take a shot at making a patch too? At a quick glance, it
> should be enough to move the [#ifdef CONFIG_P54_LEDS ... #endif] block
> in p54_unregister_common() into the if (priv->registered) { block
> (preferably before the ieee80211_unregister_hw(dev).

Just mailed a patch.

>
> Regards,
> Christian
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply

* [PATCH] p54: don't unregister leds when they are inited
From: Andrey Konovalov @ 2017-09-26 15:05 UTC (permalink / raw)
  To: Christian Lamparter, Kalle Valo, linux-wireless, netdev,
	linux-kernel
  Cc: Dmitry Vyukov, Kostya Serebryany, Andrey Konovalov

ieee80211_register_hw() in p54_register_common() may fail and leds won't
get initialized. Currently p54_unregister_common() doesn't check that and
always calls p54_unregister_leds(). The fix is to check priv->registered
flag before calling p54_unregister_leds().

Found by syzkaller.

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 1 PID: 1404 Comm: kworker/1:1 Not tainted
4.14.0-rc1-42251-gebb2c2437d80-dirty #205
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: usb_hub_wq hub_event
Call Trace:
 __dump_stack lib/dump_stack.c:16
 dump_stack+0x292/0x395 lib/dump_stack.c:52
 register_lock_class+0x6c4/0x1a00 kernel/locking/lockdep.c:769
 __lock_acquire+0x27e/0x4550 kernel/locking/lockdep.c:3385
 lock_acquire+0x259/0x620 kernel/locking/lockdep.c:4002
 flush_work+0xf0/0x8c0 kernel/workqueue.c:2886
 __cancel_work_timer+0x51d/0x870 kernel/workqueue.c:2961
 cancel_delayed_work_sync+0x1f/0x30 kernel/workqueue.c:3081
 p54_unregister_leds+0x6c/0xc0 drivers/net/wireless/intersil/p54/led.c:160
 p54_unregister_common+0x3d/0xb0 drivers/net/wireless/intersil/p54/main.c:856
 p54u_disconnect+0x86/0x120 drivers/net/wireless/intersil/p54/p54usb.c:1073
 usb_unbind_interface+0x21c/0xa90 drivers/usb/core/driver.c:423
 __device_release_driver drivers/base/dd.c:861
 device_release_driver_internal+0x4f4/0x5c0 drivers/base/dd.c:893
 device_release_driver+0x1e/0x30 drivers/base/dd.c:918
 bus_remove_device+0x2f4/0x4b0 drivers/base/bus.c:565
 device_del+0x5c4/0xab0 drivers/base/core.c:1985
 usb_disable_device+0x1e9/0x680 drivers/usb/core/message.c:1170
 usb_disconnect+0x260/0x7a0 drivers/usb/core/hub.c:2124
 hub_port_connect drivers/usb/core/hub.c:4754
 hub_port_connect_change drivers/usb/core/hub.c:5009
 port_event drivers/usb/core/hub.c:5115
 hub_event+0x1318/0x3740 drivers/usb/core/hub.c:5195
 process_one_work+0xc7f/0x1db0 kernel/workqueue.c:2119
 process_scheduled_works kernel/workqueue.c:2179
 worker_thread+0xb2b/0x1850 kernel/workqueue.c:2255
 kthread+0x3a1/0x470 kernel/kthread.c:231
 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 drivers/net/wireless/intersil/p54/main.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/intersil/p54/main.c b/drivers/net/wireless/intersil/p54/main.c
index d5a3bf91a03e..ab6d39e12069 100644
--- a/drivers/net/wireless/intersil/p54/main.c
+++ b/drivers/net/wireless/intersil/p54/main.c
@@ -852,12 +852,11 @@ void p54_unregister_common(struct ieee80211_hw *dev)
 {
 	struct p54_common *priv = dev->priv;

-#ifdef CONFIG_P54_LEDS
-	p54_unregister_leds(priv);
-#endif /* CONFIG_P54_LEDS */
-
 	if (priv->registered) {
 		priv->registered = false;
+#ifdef CONFIG_P54_LEDS
+		p54_unregister_leds(priv);
+#endif /* CONFIG_P54_LEDS */
 		ieee80211_unregister_hw(dev);
 	}

-- 
2.14.1.821.g8fa685d3b7-goog

^ permalink raw reply related

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
From: Tariq Toukan @ 2017-09-26 15:04 UTC (permalink / raw)
  To: Eric Dumazet, Dmitry Torokhov
  Cc: David S . Miller, netdev, Eric W . Biederman, Eric Dumazet,
	Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha
In-Reply-To: <CANn89i+LE6He_E3VHwUTpLAVW+uUGo+FPs_zg1bs4P3CVcCL6Q@mail.gmail.com>



On 26/09/2017 3:51 PM, Eric Dumazet wrote:
> On Tue, Sep 26, 2017 at 4:21 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>>
>> Hi Eric,
>>
>> We see a regression introduced in this series, specifically in the patches
>> touching lib/kobject_uevent.c.
>> We tried to figure out what is wrong there, but couldn't point it out.
>>
>> Bug is that mlx4 driver restart fails, because mlx4_core is still in use.
>> According to module dependencies, both mlx4_en and mlx4_ib should have been
>> unloaded at this point
>> Please see log below.
>>
>> This looks to be some kind of a race, as the repro is not deterministic.
>> Probably the en/ib modules are now mistakenly reloaded.
>>
>> Any idea what could this be?
>>
>> Regards,
>> Tariq
>>
>>
>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>> Unloading HCA driver:                                      [  OK  ]
>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd start
>> Loading HCA driver and Access Layer:                       [  OK  ]
>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>> Unloading mlx4_core                                        [FAILED]
>> rmmod: ERROR: Module mlx4_core is in use
> I have absolutely no idea. Please bisect.
We previously saw a similar issue, that was reported in mailing list.
Dmitry Torokhov suggested the following fix:
https://lkml.org/lkml/2017/9/12/523

And indeed, it solved the issue.

We kept the suggested patch in our internal branch, and rebased.
Issue appeared again once your series was accepted.

By bisecting, we see that the issue re-appears in this patch:
4a336a23d619 kobject: copy env blob in one go

>
> Are you really using netns in the first place ?
No. But seems like it still affects the modules load/unload.

Regards,
Tariq

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox