Netdev List
 help / color / mirror / Atom feed
* [PATCH bpf-next v2 0/5] bpf: list BTF objects loaded on system
From: Quentin Monnet @ 2019-08-20  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet

Hi,
This set adds a new command BPF_BTF_GET_NEXT_ID to the bpf() system call,
adds the relevant API function in libbpf, and uses it in bpftool to list
all BTF objects loaded on the system (and to dump the ids of maps and
programs associated with them, if any).

The main motivation of listing BTF objects is introspection and debugging
purposes. By getting BPF program and map information, it should already be
possible to list all BTF objects associated to at least one map or one
program. But there may be unattached BTF objects, held by a file descriptor
from a user space process only, and we may want to list them too.

As a side note, it also turned useful for examining the BTF objects
attached to offloaded programs, which would not show in program information
because the BTF id is not copied when retrieving such info. A fix is in
progress on that side.

v2:
- Rebase patch with new libbpf function on top of Andrii's changes
  regarding libbpf versioning.

Quentin Monnet (5):
  bpf: add new BPF_BTF_GET_NEXT_ID syscall command
  tools: bpf: synchronise BPF UAPI header with tools
  libbpf: refactor bpf_*_get_next_id() functions
  libbpf: add bpf_btf_get_next_id() to cycle through BTF objects
  tools: bpftool: implement "bpftool btf show|list"

 include/linux/bpf.h                           |   3 +
 include/uapi/linux/bpf.h                      |   1 +
 kernel/bpf/btf.c                              |   4 +-
 kernel/bpf/syscall.c                          |   4 +
 .../bpf/bpftool/Documentation/bpftool-btf.rst |   7 +
 tools/bpf/bpftool/bash-completion/bpftool     |  20 +-
 tools/bpf/bpftool/btf.c                       | 342 +++++++++++++++++-
 tools/include/uapi/linux/bpf.h                |   1 +
 tools/lib/bpf/bpf.c                           |  24 +-
 tools/lib/bpf/bpf.h                           |   1 +
 tools/lib/bpf/libbpf.map                      |   2 +
 11 files changed, 389 insertions(+), 20 deletions(-)

-- 
2.17.1


^ permalink raw reply

* [PATCH bpf-next v2 1/5] bpf: add new BPF_BTF_GET_NEXT_ID syscall command
From: Quentin Monnet @ 2019-08-20  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet
In-Reply-To: <20190820093154.14042-1-quentin.monnet@netronome.com>

Add a new command for the bpf() system call: BPF_BTF_GET_NEXT_ID is used
to cycle through all BTF objects loaded on the system.

The motivation is to be able to inspect (list) all BTF objects presents
on the system.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/linux/bpf.h      | 3 +++
 include/uapi/linux/bpf.h | 1 +
 kernel/bpf/btf.c         | 4 ++--
 kernel/bpf/syscall.c     | 4 ++++
 4 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 15ae49862b82..5b9d22338606 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -24,6 +24,9 @@ struct seq_file;
 struct btf;
 struct btf_type;
 
+extern struct idr btf_idr;
+extern spinlock_t btf_idr_lock;
+
 /* map is generic key/value storage optionally accesible by eBPF programs */
 struct bpf_map_ops {
 	/* funcs callable from userspace (via syscall) */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0ef594ac3899..8aa6126f0b6e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -106,6 +106,7 @@ enum bpf_cmd {
 	BPF_TASK_FD_QUERY,
 	BPF_MAP_LOOKUP_AND_DELETE_ELEM,
 	BPF_MAP_FREEZE,
+	BPF_BTF_GET_NEXT_ID,
 };
 
 enum bpf_map_type {
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 5fcc7a17eb5a..e716a64b2f7f 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -195,8 +195,8 @@
 	     i < btf_type_vlen(struct_type);					\
 	     i++, member++)
 
-static DEFINE_IDR(btf_idr);
-static DEFINE_SPINLOCK(btf_idr_lock);
+DEFINE_IDR(btf_idr);
+DEFINE_SPINLOCK(btf_idr_lock);
 
 struct btf {
 	void *data;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index cf8052b016e7..c0f62fd67c6b 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2884,6 +2884,10 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 		err = bpf_obj_get_next_id(&attr, uattr,
 					  &map_idr, &map_idr_lock);
 		break;
+	case BPF_BTF_GET_NEXT_ID:
+		err = bpf_obj_get_next_id(&attr, uattr,
+					  &btf_idr, &btf_idr_lock);
+		break;
 	case BPF_PROG_GET_FD_BY_ID:
 		err = bpf_prog_get_fd_by_id(&attr);
 		break;
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf-next v2 2/5] tools: bpf: synchronise BPF UAPI header with tools
From: Quentin Monnet @ 2019-08-20  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet
In-Reply-To: <20190820093154.14042-1-quentin.monnet@netronome.com>

Synchronise the bpf.h header under tools, to report the addition of the
new BPF_BTF_GET_NEXT_ID syscall command for bpf().

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 tools/include/uapi/linux/bpf.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 0ef594ac3899..8aa6126f0b6e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -106,6 +106,7 @@ enum bpf_cmd {
 	BPF_TASK_FD_QUERY,
 	BPF_MAP_LOOKUP_AND_DELETE_ELEM,
 	BPF_MAP_FREEZE,
+	BPF_BTF_GET_NEXT_ID,
 };
 
 enum bpf_map_type {
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf-next v2 3/5] libbpf: refactor bpf_*_get_next_id() functions
From: Quentin Monnet @ 2019-08-20  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet
In-Reply-To: <20190820093154.14042-1-quentin.monnet@netronome.com>

In preparation for the introduction of a similar function for retrieving
the id of the next BTF object, consolidate the code from
bpf_prog_get_next_id() and bpf_map_get_next_id() in libbpf.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 tools/lib/bpf/bpf.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index c7d7993c44bb..1439e99c9be5 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -568,7 +568,7 @@ int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr)
 	return ret;
 }
 
-int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id)
+static int bpf_obj_get_next_id(__u32 start_id, __u32 *next_id, int cmd)
 {
 	union bpf_attr attr;
 	int err;
@@ -576,26 +576,21 @@ int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id)
 	memset(&attr, 0, sizeof(attr));
 	attr.start_id = start_id;
 
-	err = sys_bpf(BPF_PROG_GET_NEXT_ID, &attr, sizeof(attr));
+	err = sys_bpf(cmd, &attr, sizeof(attr));
 	if (!err)
 		*next_id = attr.next_id;
 
 	return err;
 }
 
-int bpf_map_get_next_id(__u32 start_id, __u32 *next_id)
+int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id)
 {
-	union bpf_attr attr;
-	int err;
-
-	memset(&attr, 0, sizeof(attr));
-	attr.start_id = start_id;
-
-	err = sys_bpf(BPF_MAP_GET_NEXT_ID, &attr, sizeof(attr));
-	if (!err)
-		*next_id = attr.next_id;
+	return bpf_obj_get_next_id(start_id, next_id, BPF_PROG_GET_NEXT_ID);
+}
 
-	return err;
+int bpf_map_get_next_id(__u32 start_id, __u32 *next_id)
+{
+	return bpf_obj_get_next_id(start_id, next_id, BPF_MAP_GET_NEXT_ID);
 }
 
 int bpf_prog_get_fd_by_id(__u32 id)
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf-next v2 4/5] libbpf: add bpf_btf_get_next_id() to cycle through BTF objects
From: Quentin Monnet @ 2019-08-20  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet
In-Reply-To: <20190820093154.14042-1-quentin.monnet@netronome.com>

Add an API function taking a BTF object id and providing the id of the
next BTF object in the kernel. This can be used to list all BTF objects
loaded on the system.

v2:
- Rebase on top of Andrii's changes regarding libbpf versioning.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 tools/lib/bpf/bpf.c      | 5 +++++
 tools/lib/bpf/bpf.h      | 1 +
 tools/lib/bpf/libbpf.map | 2 ++
 3 files changed, 8 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 1439e99c9be5..cbb933532981 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -593,6 +593,11 @@ int bpf_map_get_next_id(__u32 start_id, __u32 *next_id)
 	return bpf_obj_get_next_id(start_id, next_id, BPF_MAP_GET_NEXT_ID);
 }
 
+int bpf_btf_get_next_id(__u32 start_id, __u32 *next_id)
+{
+	return bpf_obj_get_next_id(start_id, next_id, BPF_BTF_GET_NEXT_ID);
+}
+
 int bpf_prog_get_fd_by_id(__u32 id)
 {
 	union bpf_attr attr;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index ff42ca043dc8..0db01334740f 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -156,6 +156,7 @@ LIBBPF_API int bpf_prog_test_run(int prog_fd, int repeat, void *data,
 				 __u32 *retval, __u32 *duration);
 LIBBPF_API int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id);
 LIBBPF_API int bpf_map_get_next_id(__u32 start_id, __u32 *next_id);
+LIBBPF_API int bpf_btf_get_next_id(__u32 start_id, __u32 *next_id);
 LIBBPF_API int bpf_prog_get_fd_by_id(__u32 id);
 LIBBPF_API int bpf_map_get_fd_by_id(__u32 id);
 LIBBPF_API int bpf_btf_get_fd_by_id(__u32 id);
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 4e72df8e98ba..664ce8e7a60e 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -186,4 +186,6 @@ LIBBPF_0.0.4 {
 } LIBBPF_0.0.3;
 
 LIBBPF_0.0.5 {
+	global:
+		bpf_btf_get_next_id;
 } LIBBPF_0.0.4;
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf-next v2 5/5] tools: bpftool: implement "bpftool btf show|list"
From: Quentin Monnet @ 2019-08-20  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet
In-Reply-To: <20190820093154.14042-1-quentin.monnet@netronome.com>

Add a "btf list" (alias: "btf show") subcommand to bpftool in order to
dump all BTF objects loaded on a system.

When running the command, hash tables are built in bpftool to retrieve
all the associations between BTF objects and BPF maps and programs. This
allows for printing all such associations when listing the BTF objects.

The command is added at the top of the subcommands for "bpftool btf", so
that typing only "bpftool btf" also comes down to listing the programs.
We could not have this with the previous command ("dump"), which
required a BTF object id, so it should not break any previous behaviour.
This also makes the "btf" command behaviour consistent with "prog" or
"map".

Bash completion is updated to use "bpftool btf" instead of "bpftool
prog" to list the BTF ids, as it looks more consistent.

Example output (plain):

    # bpftool btf show
    9: size 2989B  prog_ids 21  map_ids 15
    17: size 2847B  prog_ids 36  map_ids 30,29,28
    26: size 2847B

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../bpf/bpftool/Documentation/bpftool-btf.rst |   7 +
 tools/bpf/bpftool/bash-completion/bpftool     |  20 +-
 tools/bpf/bpftool/btf.c                       | 342 +++++++++++++++++-
 3 files changed, 363 insertions(+), 6 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-btf.rst b/tools/bpf/bpftool/Documentation/bpftool-btf.rst
index 6694a0fc8f99..39615f8e145b 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-btf.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-btf.rst
@@ -19,6 +19,7 @@ SYNOPSIS
 BTF COMMANDS
 =============
 
+|	**bpftool** **btf** { **show** | **list** } [**id** *BTF_ID*]
 |	**bpftool** **btf dump** *BTF_SRC* [**format** *FORMAT*]
 |	**bpftool** **btf help**
 |
@@ -29,6 +30,12 @@ BTF COMMANDS
 
 DESCRIPTION
 ===========
+	**bpftool btf { show | list }** [**id** *BTF_ID*]
+		  Show information about loaded BTF objects. If a BTF ID is
+		  specified, show information only about given BTF object,
+		  otherwise list all BTF objects currently loaded on the
+		  system.
+
 	**bpftool btf dump** *BTF_SRC*
 		  Dump BTF entries from a given *BTF_SRC*.
 
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 4549fd424069..2ffd351f9dbf 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -73,8 +73,8 @@ _bpftool_get_prog_tags()
 
 _bpftool_get_btf_ids()
 {
-    COMPREPLY+=( $( compgen -W "$( bpftool -jp prog 2>&1 | \
-        command sed -n 's/.*"btf_id": \(.*\),\?$/\1/p' )" -- "$cur" ) )
+    COMPREPLY+=( $( compgen -W "$( bpftool -jp btf 2>&1 | \
+        command sed -n 's/.*"id": \(.*\),$/\1/p' )" -- "$cur" ) )
 }
 
 _bpftool_get_obj_map_names()
@@ -670,7 +670,7 @@ _bpftool()
                                 map)
                                     _bpftool_get_map_ids
                                     ;;
-                                dump)
+                                $command)
                                     _bpftool_get_btf_ids
                                     ;;
                             esac
@@ -698,9 +698,21 @@ _bpftool()
                             ;;
                     esac
                     ;;
+                show|list)
+                    case $prev in
+                        $command)
+                            COMPREPLY+=( $( compgen -W "id" -- "$cur" ) )
+                            ;;
+                        id)
+                            _bpftool_get_btf_ids
+                            ;;
+                    esac
+                    return 0
+                    ;;
                 *)
                     [[ $prev == $object ]] && \
-                        COMPREPLY=( $( compgen -W 'dump help' -- "$cur" ) )
+                        COMPREPLY=( $( compgen -W 'dump help show list' \
+                            -- "$cur" ) )
                     ;;
             esac
             ;;
diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
index 8805637f1a7e..9a9376d1d3df 100644
--- a/tools/bpf/bpftool/btf.c
+++ b/tools/bpf/bpftool/btf.c
@@ -11,6 +11,7 @@
 #include <bpf.h>
 #include <libbpf.h>
 #include <linux/btf.h>
+#include <linux/hashtable.h>
 
 #include "btf.h"
 #include "json_writer.h"
@@ -35,6 +36,16 @@ static const char * const btf_kind_str[NR_BTF_KINDS] = {
 	[BTF_KIND_DATASEC]	= "DATASEC",
 };
 
+struct btf_attach_table {
+	DECLARE_HASHTABLE(table, 16);
+};
+
+struct btf_attach_point {
+	__u32 obj_id;
+	__u32 btf_id;
+	struct hlist_node hash;
+};
+
 static const char *btf_int_enc_str(__u8 encoding)
 {
 	switch (encoding) {
@@ -522,6 +533,330 @@ static int do_dump(int argc, char **argv)
 	return err;
 }
 
+static int btf_parse_fd(int *argc, char ***argv)
+{
+	unsigned int id;
+	char *endptr;
+	int fd;
+
+	if (!is_prefix(*argv[0], "id")) {
+		p_err("expected 'id', got: '%s'?", **argv);
+		return -1;
+	}
+	NEXT_ARGP();
+
+	id = strtoul(**argv, &endptr, 0);
+	if (*endptr) {
+		p_err("can't parse %s as ID", **argv);
+		return -1;
+	}
+	NEXT_ARGP();
+
+	fd = bpf_btf_get_fd_by_id(id);
+	if (fd < 0)
+		p_err("can't get BTF object by id (%u): %s",
+		      id, strerror(errno));
+
+	return fd;
+}
+
+static void delete_btf_table(struct btf_attach_table *tab)
+{
+	struct btf_attach_point *obj;
+	struct hlist_node *tmp;
+
+	unsigned int bkt;
+
+	hash_for_each_safe(tab->table, bkt, tmp, obj, hash) {
+		hash_del(&obj->hash);
+		free(obj);
+	}
+}
+
+static int
+build_btf_type_table(struct btf_attach_table *tab, enum bpf_obj_type type,
+		     void *info, __u32 *len)
+{
+	static const char * const names[] = {
+		[BPF_OBJ_UNKNOWN]	= "unknown",
+		[BPF_OBJ_PROG]		= "prog",
+		[BPF_OBJ_MAP]		= "map",
+	};
+	struct btf_attach_point *obj_node;
+	__u32 btf_id, id = 0;
+	int err;
+	int fd;
+
+	while (true) {
+		switch (type) {
+		case BPF_OBJ_PROG:
+			err = bpf_prog_get_next_id(id, &id);
+			break;
+		case BPF_OBJ_MAP:
+			err = bpf_map_get_next_id(id, &id);
+			break;
+		default:
+			err = -1;
+			p_err("unexpected object type: %d", type);
+			goto err_free;
+		}
+		if (err) {
+			if (errno == ENOENT) {
+				err = 0;
+				break;
+			}
+			p_err("can't get next %s: %s%s", names[type],
+			      strerror(errno),
+			      errno == EINVAL ? " -- kernel too old?" : "");
+			goto err_free;
+		}
+
+		switch (type) {
+		case BPF_OBJ_PROG:
+			fd = bpf_prog_get_fd_by_id(id);
+			break;
+		case BPF_OBJ_MAP:
+			fd = bpf_map_get_fd_by_id(id);
+			break;
+		default:
+			err = -1;
+			p_err("unexpected object type: %d", type);
+			goto err_free;
+		}
+		if (fd < 0) {
+			if (errno == ENOENT)
+				continue;
+			p_err("can't get %s by id (%u): %s", names[type], id,
+			      strerror(errno));
+			err = -1;
+			goto err_free;
+		}
+
+		memset(info, 0, *len);
+		err = bpf_obj_get_info_by_fd(fd, info, len);
+		close(fd);
+		if (err) {
+			p_err("can't get %s info: %s", names[type],
+			      strerror(errno));
+			goto err_free;
+		}
+
+		switch (type) {
+		case BPF_OBJ_PROG:
+			btf_id = ((struct bpf_prog_info *)info)->btf_id;
+			break;
+		case BPF_OBJ_MAP:
+			btf_id = ((struct bpf_map_info *)info)->btf_id;
+			break;
+		default:
+			err = -1;
+			p_err("unexpected object type: %d", type);
+			goto err_free;
+		}
+		if (!btf_id)
+			continue;
+
+		obj_node = calloc(1, sizeof(*obj_node));
+		if (!obj_node) {
+			p_err("failed to allocate memory: %s", strerror(errno));
+			goto err_free;
+		}
+
+		obj_node->obj_id = id;
+		obj_node->btf_id = btf_id;
+		hash_add(tab->table, &obj_node->hash, obj_node->btf_id);
+	}
+
+	return 0;
+
+err_free:
+	delete_btf_table(tab);
+	return err;
+}
+
+static int
+build_btf_tables(struct btf_attach_table *btf_prog_table,
+		 struct btf_attach_table *btf_map_table)
+{
+	struct bpf_prog_info prog_info;
+	__u32 prog_len = sizeof(prog_info);
+	struct bpf_map_info map_info;
+	__u32 map_len = sizeof(map_info);
+	int err = 0;
+
+	err = build_btf_type_table(btf_prog_table, BPF_OBJ_PROG, &prog_info,
+				   &prog_len);
+	if (err)
+		return err;
+
+	err = build_btf_type_table(btf_map_table, BPF_OBJ_MAP, &map_info,
+				   &map_len);
+	if (err) {
+		delete_btf_table(btf_prog_table);
+		return err;
+	}
+
+	return 0;
+}
+
+static void
+show_btf_plain(struct bpf_btf_info *info, int fd,
+	       struct btf_attach_table *btf_prog_table,
+	       struct btf_attach_table *btf_map_table)
+{
+	struct btf_attach_point *obj;
+	int n;
+
+	printf("%u: ", info->id);
+	printf("size %uB", info->btf_size);
+
+	n = 0;
+	hash_for_each_possible(btf_prog_table->table, obj, hash, info->id) {
+		if (obj->btf_id == info->id)
+			printf("%s%u", n++ == 0 ? "  prog_ids " : ",",
+			       obj->obj_id);
+	}
+
+	n = 0;
+	hash_for_each_possible(btf_map_table->table, obj, hash, info->id) {
+		if (obj->btf_id == info->id)
+			printf("%s%u", n++ == 0 ? "  map_ids " : ",",
+			       obj->obj_id);
+	}
+
+	printf("\n");
+}
+
+static void
+show_btf_json(struct bpf_btf_info *info, int fd,
+	      struct btf_attach_table *btf_prog_table,
+	      struct btf_attach_table *btf_map_table)
+{
+	struct btf_attach_point *obj;
+
+	jsonw_start_object(json_wtr);	/* btf object */
+	jsonw_uint_field(json_wtr, "id", info->id);
+	jsonw_uint_field(json_wtr, "size", info->btf_size);
+
+	jsonw_name(json_wtr, "prog_ids");
+	jsonw_start_array(json_wtr);	/* prog_ids */
+	hash_for_each_possible(btf_prog_table->table, obj, hash,
+			       info->id) {
+		if (obj->btf_id == info->id)
+			jsonw_uint(json_wtr, obj->obj_id);
+	}
+	jsonw_end_array(json_wtr);	/* prog_ids */
+
+	jsonw_name(json_wtr, "map_ids");
+	jsonw_start_array(json_wtr);	/* map_ids */
+	hash_for_each_possible(btf_map_table->table, obj, hash,
+			       info->id) {
+		if (obj->btf_id == info->id)
+			jsonw_uint(json_wtr, obj->obj_id);
+	}
+	jsonw_end_array(json_wtr);	/* map_ids */
+	jsonw_end_object(json_wtr);	/* btf object */
+}
+
+static int
+show_btf(int fd, struct btf_attach_table *btf_prog_table,
+	 struct btf_attach_table *btf_map_table)
+{
+	struct bpf_btf_info info = {};
+	__u32 len = sizeof(info);
+	int err;
+
+	err = bpf_obj_get_info_by_fd(fd, &info, &len);
+	if (err) {
+		p_err("can't get BTF object info: %s", strerror(errno));
+		return -1;
+	}
+
+	if (json_output)
+		show_btf_json(&info, fd, btf_prog_table, btf_map_table);
+	else
+		show_btf_plain(&info, fd, btf_prog_table, btf_map_table);
+
+	return 0;
+}
+
+static int do_show(int argc, char **argv)
+{
+	struct btf_attach_table btf_prog_table;
+	struct btf_attach_table btf_map_table;
+	int err, fd = -1;
+	__u32 id = 0;
+
+	if (argc == 2) {
+		fd = btf_parse_fd(&argc, &argv);
+		if (fd < 0)
+			return -1;
+	}
+
+	if (argc) {
+		if (fd >= 0)
+			close(fd);
+		return BAD_ARG();
+	}
+
+	hash_init(btf_prog_table.table);
+	hash_init(btf_map_table.table);
+	err = build_btf_tables(&btf_prog_table, &btf_map_table);
+	if (err) {
+		if (fd >= 0)
+			close(fd);
+		return err;
+	}
+
+	if (fd >= 0) {
+		err = show_btf(fd, &btf_prog_table, &btf_map_table);
+		close(fd);
+		goto exit_free;
+	}
+
+	if (json_output)
+		jsonw_start_array(json_wtr);	/* root array */
+
+	while (true) {
+		err = bpf_btf_get_next_id(id, &id);
+		if (err) {
+			if (errno == ENOENT) {
+				err = 0;
+				break;
+			}
+			p_err("can't get next BTF object: %s%s",
+			      strerror(errno),
+			      errno == EINVAL ? " -- kernel too old?" : "");
+			err = -1;
+			break;
+		}
+
+		fd = bpf_btf_get_fd_by_id(id);
+		if (fd < 0) {
+			if (errno == ENOENT)
+				continue;
+			p_err("can't get BTF object by id (%u): %s",
+			      id, strerror(errno));
+			err = -1;
+			break;
+		}
+
+		err = show_btf(fd, &btf_prog_table, &btf_map_table);
+		close(fd);
+		if (err)
+			break;
+	}
+
+	if (json_output)
+		jsonw_end_array(json_wtr);	/* root array */
+
+exit_free:
+	delete_btf_table(&btf_prog_table);
+	delete_btf_table(&btf_map_table);
+
+	return err;
+}
+
 static int do_help(int argc, char **argv)
 {
 	if (json_output) {
@@ -530,7 +865,8 @@ static int do_help(int argc, char **argv)
 	}
 
 	fprintf(stderr,
-		"Usage: %s btf dump BTF_SRC [format FORMAT]\n"
+		"Usage: %s btf { show | list } [id BTF_ID]\n"
+		"       %s btf dump BTF_SRC [format FORMAT]\n"
 		"       %s btf help\n"
 		"\n"
 		"       BTF_SRC := { id BTF_ID | prog PROG | map MAP [{key | value | kv | all}] | file FILE }\n"
@@ -539,12 +875,14 @@ static int do_help(int argc, char **argv)
 		"       " HELP_SPEC_PROGRAM "\n"
 		"       " HELP_SPEC_OPTIONS "\n"
 		"",
-		bin_name, bin_name);
+		bin_name, bin_name, bin_name);
 
 	return 0;
 }
 
 static const struct cmd cmds[] = {
+	{ "show",	do_show },
+	{ "list",	do_show },
 	{ "help",	do_help },
 	{ "dump",	do_dump },
 	{ 0 }
-- 
2.17.1


^ permalink raw reply related

* Re: Help needed - Kernel lockup while running ipsec
From: Florian Westphal @ 2019-08-20  9:38 UTC (permalink / raw)
  To: Vakul Garg; +Cc: Florian Westphal, netdev@vger.kernel.org
In-Reply-To: <DB7PR04MB4620487074796FBC015AFD098BAB0@DB7PR04MB4620.eurprd04.prod.outlook.com>

Vakul Garg <vakul.garg@nxp.com> wrote:
> 
> 
> > -----Original Message-----
> > From: Florian Westphal <fw@strlen.de>
> > Sent: Tuesday, August 20, 2019 2:53 PM
> > To: Vakul Garg <vakul.garg@nxp.com>
> > Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> > Subject: Re: Help needed - Kernel lockup while running ipsec
> > 
> > Vakul Garg <vakul.garg@nxp.com> wrote:
> > > > > With kernel 4.14.122, I am getting a kernel softlockup while
> > > > > running single
> > > > static ipsec tunnel.
> > > > > The problem reproduces mostly after running 8-10 hours of ipsec
> > > > > encap
> > > > test (on my dual core arm board).
> > > > >
> > > > > I found that in function xfrm_policy_lookup_bytype(), the policy
> > > > > in variable
> > > > 'ret' shows refcnt=0 under problem situation.
> > > > > This creates an infinite loop in  xfrm_policy_lookup_bytype() and
> > > > > hence the
> > > > lockup.
> > > > >
> > > > > Can some body please provide me pointers about 'refcnt'?
> > > > > Is it legitimate for 'refcnt' to become '0'? Under what condition
> > > > > can it
> > > > become '0'?
> > > >
> > > > Yes, when policy is destroyed and the last user calls
> > > > xfrm_pol_put() which will invoke call_rcu to free the structure.
> > >
> > > It seems that policy reference count never gets decremented during packet
> > ipsec encap.
> > > It is getting incremented for every frame that hits the policy.
> > > In setkey -DP output, I see refcnt to be wrapping around after '0'.
> > 
> > Thats a bug.  Does this affect 4.14 only or does this happen on current tree
> > as well?
>  
> I am yet to try it on 4.19.
> Can you help me with the right fix? Which part of code should it get decremented?
> I am not conversant with xfrm code.

Normally policy reference counts get decremented when the skb is free'd, via dst
destruction (xfrm_dst_destroy()).

Do you see a dst leak as well?


^ permalink raw reply

* Re: [PATCH -next] bpf: Use PTR_ERR_OR_ZERO in xsk_map_inc()
From: Dan Carpenter @ 2019-08-20  9:44 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Björn Töpel, YueHaibing, Karlsson, Magnus,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, Netdev,
	bpf, kernel-janitors
In-Reply-To: <CAJ+HfNhRf+=yN6eOOZ1zp8=VicT-k6nHLO6r+f__O5X3M+N=ug@mail.gmail.com>

On Tue, Aug 20, 2019 at 11:25:29AM +0200, Björn Töpel wrote:
> On Tue, 20 Aug 2019 at 10:59, Dan Carpenter <dan.carpenter@oracle.com> wrote:
> >
> > On Tue, Aug 20, 2019 at 09:28:26AM +0200, Björn Töpel wrote:
> > > For future patches: Prefix AF_XDP socket work with "xsk:" and use "PATCH
> > > bpf-next" to let the developers know what tree you're aiming for.
> >
> > There are over 300 trees in linux-next.  It impossible to try remember
> > everyone's trees.  No one else has this requirement.
> >
> 
> Net/bpf are different, and I wanted to point that out to lessen the
> burden for the maintainers. It's documented in:
> 
> Documentation/bpf/bpf_devel_QA.rst.
> Documentation/networking/netdev-FAQ.rst

Ah...  I hadn't realized that BPF patches were confusing to Dave.

I actually do keep track of net and net-next.  I do quite a bit of extra
stuff for netdev patches.  So what about if we used [PATCH] for bpf and
[PATCH net] and [PATCH net-next] for networking?

I will do that.

regards,
dan carpenter

^ permalink raw reply

* Re: [PATCH net-next v3 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts
From: Miroslav Lichvar @ 2019-08-20  9:49 UTC (permalink / raw)
  To: Hubert Feurstein
  Cc: netdev, linux-kernel, Andrew Lunn, Richard Cochran,
	Florian Fainelli, Heiner Kallweit, Vladimir Oltean,
	David S. Miller
In-Reply-To: <20190820084833.6019-3-hubert.feurstein@vahle.at>

On Tue, Aug 20, 2019 at 10:48:31AM +0200, Hubert Feurstein wrote:

> +	/* PTP offset compensation:
> +	 * After the MDIO access is completed (from the chip perspective), the
> +	 * switch chip will snapshot the PHC timestamp. To make sure our system
> +	 * timestamp corresponds to the PHC timestamp, we have to add the
> +	 * duration of this MDIO access to sts->post_ts. Linuxptp's phc2sys
> +	 * takes the average of pre_ts and post_ts to calculate the final
> +	 * system timestamp. With this in mind, we have to add ptp_sts_offset
> +	 * twice to post_ts, in order to not introduce an constant time offset.
> +	 */
> +	if (sts)
> +		timespec64_add_ns(&sts->post_ts, 2 * bus->ptp_sts_offset);

This correction looks good to me.

Is the MDIO write delay constant in reality, or does it at least have
an upper bound? That is, is it always true that the post_ts timestamp
does not point to a time before the PHC timestamp was actually taken?

This is important to not break the estimation of maximum error in the
measured offset. Applications using the ioctl may assume that the
maximum error is (post_ts-pre_ts)/2 (i.e. half of the delay printed by
phc2sys). That would not work if the delay could be occasionally 50
microseconds for instance, i.e. the post_ts timestamp would be earlier
than the PHC timestamp.

-- 
Miroslav Lichvar

^ permalink raw reply

* Re: [PATCH] can: flexcan: free error skb if enqueueing failed
From: Sean Nyekjaer @ 2019-08-20  9:49 UTC (permalink / raw)
  To: Martin Hundebøll, Wolfgang Grandegger, Marc Kleine-Budde,
	linux-can
  Cc: David S . Miller, netdev, Joakim Zhang
In-Reply-To: <20190715185308.104333-1-martin@geanix.com>

CC'ing Joakim Zhang

On 15/07/2019 20.53, Martin Hundebøll wrote:
> If the call to can_rx_offload_queue_sorted() fails, the passed skb isn't
> consumed, so the caller must do so.
> 
> Fixes: 30164759db1b ("can: flexcan: make use of rx-offload's irq_offload_fifo")
> Signed-off-by: Martin Hundebøll <martin@geanix.com>
> ---
>   drivers/net/can/flexcan.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
> index 1c66fb2ad76b..21f39e805d42 100644
> --- a/drivers/net/can/flexcan.c
> +++ b/drivers/net/can/flexcan.c
> @@ -688,7 +688,8 @@ static void flexcan_irq_bus_err(struct net_device *dev, u32 reg_esr)
>   	if (tx_errors)
>   		dev->stats.tx_errors++;
>   
> -	can_rx_offload_queue_sorted(&priv->offload, skb, timestamp);
> +	if (can_rx_offload_queue_sorted(&priv->offload, skb, timestamp))
> +		kfree_skb(skb);
>   }
>   
>   static void flexcan_irq_state(struct net_device *dev, u32 reg_esr)
> @@ -732,7 +733,8 @@ static void flexcan_irq_state(struct net_device *dev, u32 reg_esr)
>   	if (unlikely(new_state == CAN_STATE_BUS_OFF))
>   		can_bus_off(dev);
>   
> -	can_rx_offload_queue_sorted(&priv->offload, skb, timestamp);
> +	if (can_rx_offload_queue_sorted(&priv->offload, skb, timestamp))
> +		kfree_skb(skb);
>   }
>   
>   static inline struct flexcan_priv *rx_offload_to_priv(struct can_rx_offload *offload)
> 

^ permalink raw reply

* [PATCH bpf-next] bpf: add BTF ids in procfs for file descriptors to BTF objects
From: Quentin Monnet @ 2019-08-20  9:52 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet

Implement the show_fdinfo hook for BTF FDs file operations, and make it
print the id and the size of the BTF object. This allows for a quick
retrieval of the BTF id from its FD; or it can help understanding what
type of object (BTF) the file descriptor points to.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 kernel/bpf/btf.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 5fcc7a17eb5a..39e184f1b27c 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3376,6 +3376,19 @@ void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
 	btf_type_ops(t)->seq_show(btf, t, type_id, obj, 0, m);
 }
 
+#ifdef CONFIG_PROC_FS
+static void bpf_btf_show_fdinfo(struct seq_file *m, struct file *filp)
+{
+	const struct btf *btf = filp->private_data;
+
+	seq_printf(m,
+		   "btf_id:\t%u\n"
+		   "data_size:\t%u\n",
+		   btf->id,
+		   btf->data_size);
+}
+#endif
+
 static int btf_release(struct inode *inode, struct file *filp)
 {
 	btf_put(filp->private_data);
@@ -3383,6 +3396,9 @@ static int btf_release(struct inode *inode, struct file *filp)
 }
 
 const struct file_operations btf_fops = {
+#ifdef CONFIG_PROC_FS
+	.show_fdinfo	= bpf_btf_show_fdinfo,
+#endif
 	.release	= btf_release,
 };
 
-- 
2.17.1


^ permalink raw reply related

* RE: Help needed - Kernel lockup while running ipsec
From: Vakul Garg @ 2019-08-20  9:52 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev@vger.kernel.org
In-Reply-To: <20190820093800.GN2588@breakpoint.cc>



> -----Original Message-----
> From: Florian Westphal <fw@strlen.de>
> Sent: Tuesday, August 20, 2019 3:08 PM
> To: Vakul Garg <vakul.garg@nxp.com>
> Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> Subject: Re: Help needed - Kernel lockup while running ipsec
> 
> Vakul Garg <vakul.garg@nxp.com> wrote:
> >
> >
> > > -----Original Message-----
> > > From: Florian Westphal <fw@strlen.de>
> > > Sent: Tuesday, August 20, 2019 2:53 PM
> > > To: Vakul Garg <vakul.garg@nxp.com>
> > > Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> > > Subject: Re: Help needed - Kernel lockup while running ipsec
> > >
> > > Vakul Garg <vakul.garg@nxp.com> wrote:
> > > > > > With kernel 4.14.122, I am getting a kernel softlockup while
> > > > > > running single
> > > > > static ipsec tunnel.
> > > > > > The problem reproduces mostly after running 8-10 hours of
> > > > > > ipsec encap
> > > > > test (on my dual core arm board).
> > > > > >
> > > > > > I found that in function xfrm_policy_lookup_bytype(), the
> > > > > > policy in variable
> > > > > 'ret' shows refcnt=0 under problem situation.
> > > > > > This creates an infinite loop in  xfrm_policy_lookup_bytype()
> > > > > > and hence the
> > > > > lockup.
> > > > > >
> > > > > > Can some body please provide me pointers about 'refcnt'?
> > > > > > Is it legitimate for 'refcnt' to become '0'? Under what
> > > > > > condition can it
> > > > > become '0'?
> > > > >
> > > > > Yes, when policy is destroyed and the last user calls
> > > > > xfrm_pol_put() which will invoke call_rcu to free the structure.
> > > >
> > > > It seems that policy reference count never gets decremented during
> > > > packet
> > > ipsec encap.
> > > > It is getting incremented for every frame that hits the policy.
> > > > In setkey -DP output, I see refcnt to be wrapping around after '0'.
> > >
> > > Thats a bug.  Does this affect 4.14 only or does this happen on
> > > current tree as well?
> >
> > I am yet to try it on 4.19.
> > Can you help me with the right fix? Which part of code should it get
> decremented?
> > I am not conversant with xfrm code.
> 
> Normally policy reference counts get decremented when the skb is free'd, via
> dst destruction (xfrm_dst_destroy()).
> 
> Do you see a dst leak as well?

Can you please guide me how to detect it?

(I am checking refcount on recent kernel and will let you know.)

^ permalink raw reply

* Re: [PATCH net-next 3/6] net: dsa: Delete the VID from the upstream port as well
From: Vladimir Oltean @ 2019-08-20  9:54 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: Florian Fainelli, Andrew Lunn, Ido Schimmel, Roopa Prabhu,
	nikolay, David S. Miller, netdev
In-Reply-To: <20190820015138.GB975@t480s.localdomain>

Hi Vivien!

On Tue, 20 Aug 2019 at 08:51, Vivien Didelot <vivien.didelot@gmail.com> wrote:
>
> Vladimir,
>
> On Tue, 20 Aug 2019 02:59:59 +0300, Vladimir Oltean <olteanv@gmail.com> wrote:
> > Commit b2f81d304cee ("net: dsa: add CPU and DSA ports as VLAN members")
> > is littering a lot. After deleting a VLAN added on a DSA port, it still
> > remains installed in the hardware filter of the upstream port. Fix this.
>
> Littering a lot, really?
>
> FYI we are not removing the target VLAN from the hardware yet because it would
> be too expensive to cache data in DSA core in order to know if the VID is not
> used by any other slave port of the fabric anymore, and thus safe to remove.
>
> Keeping the VID programmed for DSA and CPU ports is simpler for the moment,
> as an hardware VLAN with only these ports as members is unlikely to harm.
>
> >
> > Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
> > ---
> >  net/dsa/switch.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/net/dsa/switch.c b/net/dsa/switch.c
> > index 09d9286b27cc..84ab2336131e 100644
> > --- a/net/dsa/switch.c
> > +++ b/net/dsa/switch.c
> > @@ -295,11 +295,20 @@ static int dsa_switch_vlan_del(struct dsa_switch *ds,
> >                              struct dsa_notifier_vlan_info *info)
> >  {
> >       const struct switchdev_obj_port_vlan *vlan = info->vlan;
> > +     int port;
> >
> >       if (!ds->ops->port_vlan_del)
> >               return -EOPNOTSUPP;
> >
> > +     /* Build a mask of VLAN members */
> > +     bitmap_zero(ds->bitmap, ds->num_ports);
> >       if (ds->index == info->sw_index)
> > +             set_bit(info->port, ds->bitmap);
> > +     for (port = 0; port < ds->num_ports; port++)
> > +             if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))
> > +                     set_bit(port, ds->bitmap);
> > +
> > +     for_each_set_bit(port, ds->bitmap, ds->num_ports)
> >               return ds->ops->port_vlan_del(ds, info->port, vlan);
>
> You return right away from the loop? You use info->port instead of port?
>
> >
> >       return 0;
>
> Even if you patch wasn't badly broken, "bridge vlan del" targeting a single
> switch port would also remove the VLAN from the CPU port and thus breaking
> offloaded 802.1q. It would also remove it from the DSA ports interconnecting
> multiple switches, thus breaking the 802.1q conduit for the whole fabric.
>
> So you're not fixing anything here, but you're breaking single-chip and
> cross-chip hardware VLAN. Seriously wtf is this patch?
>
> NAK!
>
>
>         Vivien

I can agree that this isn't one of my brightest moments. But at least
we get to see Cunningham's law in action :)
When dsa_8021q is cleaning up the switch's VLAN table for the bridge
to use it, it is good to really clean it up, i.e. not leave any VLAN
installed on the upstream ports.
But I think this is just an academical concern at this point. In
vlan_filtering mode, the CPU port will accept VLAN frames with the
dsa_8021q ID's, but they will eventually get dropped due to no
destination. The real breaker is the pvid change. If something like
patch 4/6 gets accepted I will drop this one.

Regards,
-Vladimir

^ permalink raw reply

* Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Christophe de Dinechin @ 2019-08-20  9:58 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Alex Williamson, Jiri Pirko, David S . Miller, Kirti Wankhede,
	Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	cjia@nvidia.com, netdev@vger.kernel.org
In-Reply-To: <AM0PR05MB48668B6221E477A873688CDBD1AB0@AM0PR05MB4866.eurprd05.prod.outlook.com>


Parav Pandit writes:

> + Dave.
>
> Hi Jiri, Dave, Alex, Kirti, Cornelia,
>
> Please provide your feedback on it, how shall we proceed?
>
> Hence, I would like to discuss below options.
>
> Option-1: mdev index
> Introduce an optional mdev index/handle as u32 during mdev create time.
> User passes mdev index/handle as input.
>
> phys_port_name=mIndex=m%u
> mdev_index will be available in sysfs as mdev attribute for udev to name the mdev's netdev.
>
> example mdev create command:
> UUID=$(uuidgen)
> echo $UUID index=10 > /sys/class/net/ens2f0/mdev_supported_types/mlx5_core_mdev/create
> example netdevs:
> repnetdev=ens2f0_m10	/*ens2f0 is parent PF's netdevice */
> mdev_netdev=enm10
>
> Pros:
> 1. mdevctl and any other existing tools are unaffected.
> 2. netdev stack, ovs and other switching platforms are unaffected.
> 3. achieves unique phys_port_name for representor netdev
> 4. achieves unique mdev eth netdev name for the mdev using udev/systemd extension.
> 5. Aligns well with mdev and netdev subsystem and similar to existing sriov bdf's.
>
> Option-2: shorter mdev name
> Extend mdev to have shorter mdev device name in addition to UUID.
> such as 'foo', 'bar'.
> Mdev will continue to have UUID.
> phys_port_name=mdev_name
>
> Pros:
> 1. All same as option-1, except mdevctl needs upgrade for newer usage.
> It is common practice to upgrade iproute2 package along with the kernel.
> Similar practice to be done with mdevctl.
> 2. Newer users of mdevctl who wants to work with non_UUID names, will use newer mdevctl/tools.
> Cons:
> 1. Dual naming scheme of mdev might affect some of the existing tools.
> It's unclear how/if it actually affects.
> mdevctl [2] is very recently developed and can be enhanced for dual naming scheme.
>
> Option-3: mdev uuid alias
> Instead of shorter mdev name or mdev index, have alpha-numeric name alias.
> Alias is an optional mdev sysfs attribute such as 'foo', 'bar'.
> example mdev create command:
> UUID=$(uuidgen)
> echo $UUID alias=foo > /sys/class/net/ens2f0/mdev_supported_types/mlx5_core_mdev/create
> example netdevs:
> examle netdevs:
> repnetdev = ens2f0_mfoo
> mdev_netdev=enmfoo
>
> Pros:
> 1. All same as option-1.
> 2. Doesn't affect existing mdev naming scheme.
> Cons:
> 1. Index scheme of option-1 is better which can number large number of mdevs with fewer characters, simplifying the management tool.

I believe that Alex pointed out another "Cons" to all three options,
which is that it forces user-space to resolve potential race conditions
when creating an index or short name or alias.

Also, what happens if `index=10` is not provided on the command-line?
Does that make the device unusable for your purpose?


--
Cheers,
Christophe de Dinechin (IRC c3d)

^ permalink raw reply

* Re: [PATCH] ipvs: change type of delta and previous_delta in ip_vs_seq.
From: Julian Anastasov @ 2019-08-20 10:00 UTC (permalink / raw)
  To: zhang kai; +Cc: Wensong Zhang, Simon Horman, lvs-devel, netdev
In-Reply-To: <20190820003718.GA16620@toolchain>


	Hello,

	Cc list trimmed...

On Tue, 20 Aug 2019, zhang kai wrote:

> In NAT forwarding mode, Applications may decrease the size of packets,
> and TCP sequences will get smaller, so both of variables will be negetive
> values in this case.

	As long as nobody cares about their sign, the type should not
matter. You can not solve all signed/unsigned mismatches with such
small patch. Or you are seeing some problem, may be in debug?

> 
> Signed-off-by: zhang kai <zhangkaiheb@126.com>
> ---
>  include/net/ip_vs.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
> index 3759167f91f5..de7e75063c7c 100644
> --- a/include/net/ip_vs.h
> +++ b/include/net/ip_vs.h
> @@ -346,8 +346,8 @@ enum ip_vs_sctp_states {
>   */
>  struct ip_vs_seq {
>  	__u32			init_seq;	/* Add delta from this seq */
> -	__u32			delta;		/* Delta in sequence numbers */
> -	__u32			previous_delta;	/* Delta in sequence numbers
> +	__s32			delta;		/* Delta in sequence numbers */
> +	__s32			previous_delta;	/* Delta in sequence numbers
>  						 * before last resized pkt */
>  };
>  
> -- 
> 2.17.1

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [PATCH net-next v2 6/9] net: macsec: hardware offloading infrastructure
From: Antoine Tenart @ 2019-08-20 10:01 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: Igor Russkikh, Andrew Lunn, Antoine Tenart, davem@davemloft.net,
	f.fainelli@gmail.com, hkallweit1@gmail.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	thomas.petazzoni@bootlin.com, alexandre.belloni@bootlin.com,
	allan.nielsen@microchip.com, camelia.groza@nxp.com,
	Simon Edelhaus, Pavel Belous
In-Reply-To: <20190816132959.GC8697@bistromath.localdomain>

Hi Sabrina,

On Fri, Aug 16, 2019 at 03:29:59PM +0200, Sabrina Dubroca wrote:
> 2019-08-13, 16:18:40 +0000, Igor Russkikh wrote:
> > On 13.08.2019 16:17, Andrew Lunn wrote:
> 
> > That could be a strong limitation in
> > cases when user sees HW macsec offload is broken or work differently, and he/she
> > wants to replace it with SW one.
> 
> Agreed, I think an offload that cannot be disabled is quite problematic.
> 
> > MACSec is a complex feature, and it may happen something is missing in HW.
> > Trivial example is 256bit encryption, which is not always a musthave in HW
> > implementations.
> 
> +1
> 
> > 2) I think, Antoine, its not totally true that otherwise the user macsec API
> > will be broken/changed. netlink api is the same, the only thing we may want to
> > add is an optional parameter to force selection of SW macsec engine.
> 
> Yes, I think we need an offload on/off parameter (and IMO it should
> probably be off by default). Then, if offloading is requested but
> cannot be satisfied (unsupported key length, too many SAs, etc), or if
> incompatible settings are requested (mixing offloaded and
> non-offloaded SCs on a device that cannot do it), return an error.
> 
> If we also export that offload parameter during netlink dumps, we can
> inspect the state of the system, which helps for debugging.

So it seems the ability to enable or disable the offloading on a given
interface is the main missing feature. I'll add that, however I'll
probably (at least at first):

- Have the interface to be fully offloaded or fully handled in s/w (with
  errors being thrown if a given configuration isn't supported). Having
  both at the same time on a given interface would be tricky because of
  the MACsec validation parameter.

- Won't allow to enable/disable the offloading of there are rules in
  place, as we're not sure the same rules would be accepted by the other
  implementation.

I'm not sure if we should allow to mix the implementations on a given
physical interface (by having two MACsec interfaces attached) as the
validation would be impossible to do (we would have no idea if a
packet was correctly handled by the offloading part or just not being
a MACsec packet in the first place, in Rx).

I agree the offloading should be disabled by default, and only enabled
by an user explicitly.

Thanks,
Antoine

-- 
Antoine Ténart, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* Re: [PATCH net-next v2 6/9] net: macsec: hardware offloading infrastructure
From: Antoine Tenart @ 2019-08-20 10:03 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Igor Russkikh, Antoine Tenart, davem@davemloft.net,
	sd@queasysnail.net, f.fainelli@gmail.com, hkallweit1@gmail.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	thomas.petazzoni@bootlin.com, alexandre.belloni@bootlin.com,
	allan.nielsen@microchip.com, camelia.groza@nxp.com,
	Simon Edelhaus, Pavel Belous
In-Reply-To: <20190813162823.GH15047@lunn.ch>

Hi Andrew,

On Tue, Aug 13, 2019 at 06:28:23PM +0200, Andrew Lunn wrote:
> > 1) With current implementation it's impossible to install SW macsec engine onto
> > the device which supports HW offload. That could be a strong limitation in
> > cases when user sees HW macsec offload is broken or work differently, and he/she
> > wants to replace it with SW one.
> > MACSec is a complex feature, and it may happen something is missing in HW.
> > Trivial example is 256bit encryption, which is not always a musthave in HW
> > implementations.
> 
> It would also be nice to add extra information to the netlink API to
> indicate if HW or SW is being used. In other places where we offload
> to accelerators we have such additional information.

Agreed, in addition to being able to enable/disable the offloading we
should have a way to know if a MACsec interface is being offloaded or
not.

Thanks!
Antoine

-- 
Antoine Ténart, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* [PATCH bpf-next] xsk: proper socket state check in xsk_poll
From: Björn Töpel @ 2019-08-20 10:04 UTC (permalink / raw)
  To: syzbot+c82697e3043781e08802, ast, daniel, netdev
  Cc: bjorn.topel, bpf, davem, hawk, jakub.kicinski, john.fastabend,
	jonathan.lemon, kafai, linux-kernel, magnus.karlsson,
	songliubraving, syzkaller-bugs, xdp-newbies, yhs, hdanton
In-Reply-To: <0000000000009167320590823a8c@google.com>

From: Björn Töpel <bjorn.topel@intel.com>

The poll() implementation for AF_XDP sockets did not perform the
proper state checks, prior accessing the socket umem. This patch fixes
that by performing a xsk_is_bound() check.

Suggested-by: Hillf Danton <hdanton@sina.com>
Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xsk.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index ee4428a892fa..08bed5e92af4 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -356,13 +356,20 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 	return err;
 }
 
+static bool xsk_is_bound(struct xdp_sock *xs)
+{
+	struct net_device *dev = READ_ONCE(xs->dev);
+
+	return dev && xs->state == XSK_BOUND;
+}
+
 static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 {
 	bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
 	struct sock *sk = sock->sk;
 	struct xdp_sock *xs = xdp_sk(sk);
 
-	if (unlikely(!xs->dev))
+	if (unlikely(!xsk_is_bound(xs)))
 		return -ENXIO;
 	if (unlikely(!(xs->dev->flags & IFF_UP)))
 		return -ENETDOWN;
@@ -383,6 +390,9 @@ static unsigned int xsk_poll(struct file *file, struct socket *sock,
 	struct net_device *dev = xs->dev;
 	struct xdp_umem *umem = xs->umem;
 
+	if (unlikely(!xsk_is_bound(xs)))
+		return mask;
+
 	if (umem->need_wakeup)
 		dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
 						umem->need_wakeup);
@@ -417,7 +427,7 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
 {
 	struct net_device *dev = xs->dev;
 
-	if (!dev || xs->state != XSK_BOUND)
+	if (!xsk_is_bound(xs))
 		return;
 
 	xs->state = XSK_UNBOUND;
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net-next v2 6/9] net: macsec: hardware offloading infrastructure
From: Antoine Tenart @ 2019-08-20 10:07 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: Antoine Tenart, Igor Russkikh, davem@davemloft.net,
	andrew@lunn.ch, f.fainelli@gmail.com, hkallweit1@gmail.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	thomas.petazzoni@bootlin.com, alexandre.belloni@bootlin.com,
	allan.nielsen@microchip.com, camelia.groza@nxp.com,
	Simon Edelhaus, Pavel Belous
In-Reply-To: <20190816132500.GA8697@bistromath.localdomain>

Hi Sabrina,

On Fri, Aug 16, 2019 at 03:25:00PM +0200, Sabrina Dubroca wrote:
> 2019-08-13, 10:58:17 +0200, Antoine Tenart wrote:
> > 
> > As for the need for xmit / handle_frame ops (for a MAC w/ MACsec
> > offloading), I'd say the xmit / handle_frame ops of the real net device
> > driver could be used as the one of the MACsec virtual interface do not
> > do much (regardless of the implementation choice discussed above).
> 
> There's no "handle_frame" op on a real device. macsec_handle_frame is
> an rx_handler specificity that grabs packets from a real device and
> sends them into a virtual device stacked on top of it. A real device
> just hands packets over to the stack via NAPI.

Right, so if there is a need for a custom implementation of xmit /
hamdle_frame we could let the offloading driver provide it. I'll
probably let the first MAC implementation add it, as we agreed not to
add an API with no user (and mine is done in the PHY).

Thanks!
Antoine

-- 
Antoine Ténart, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* Re: [PATCH net-next v2 5/9] net: phy: add MACsec ops in phy_device
From: Antoine Tenart @ 2019-08-20 10:07 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Antoine Tenart, davem, sd, andrew, hkallweit1, netdev,
	linux-kernel, thomas.petazzoni, alexandre.belloni, allan.nielsen,
	camelia.groza, Simon.Edelhaus
In-Reply-To: <1521a28b-a0af-b3fb-d1bf-af82ec2f3d47@gmail.com>

Hi Florian,

On Wed, Aug 14, 2019 at 04:15:03PM -0700, Florian Fainelli wrote:
> On 8/8/19 7:05 AM, Antoine Tenart wrote:
> > This patch adds a reference to MACsec ops in the phy_device, to allow
> > PHYs to support offloading MACsec operations. The phydev lock will be
> > held while calling those helpers.
> > 
> > Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
> > ---
> >  include/linux/phy.h | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/include/linux/phy.h b/include/linux/phy.h
> > index 462b90b73f93..6947a19587e4 100644
> > --- a/include/linux/phy.h
> > +++ b/include/linux/phy.h
> > @@ -22,6 +22,10 @@
> >  #include <linux/workqueue.h>
> >  #include <linux/mod_devicetable.h>
> >  
> > +#ifdef CONFIG_MACSEC
> > +#include <net/macsec.h>
> > +#endif
> 
> #if IS_ENABLED(CONFIG_MACSEC)
> 
> > +
> >  #include <linux/atomic.h>
> >  
> >  #define PHY_DEFAULT_FEATURES	(SUPPORTED_Autoneg | \
> > @@ -345,6 +349,7 @@ struct phy_c45_device_ids {
> >   * attached_dev: The attached enet driver's device instance ptr
> >   * adjust_link: Callback for the enet controller to respond to
> >   * changes in the link state.
> > + * macsec_ops: MACsec offloading ops.
> >   *
> >   * speed, duplex, pause, supported, advertising, lp_advertising,
> >   * and autoneg are used like in mii_if_info
> > @@ -438,6 +443,11 @@ struct phy_device {
> >  
> >  	void (*phy_link_change)(struct phy_device *, bool up, bool do_carrier);
> >  	void (*adjust_link)(struct net_device *dev);
> > +
> > +#if defined(CONFIG_MACSEC)
> > +	/* MACsec management functions */
> > +	const struct macsec_ops *macsec_ops;
> > +#endif
> 
> #if IS_ENABLED(CONFIG_MACSEC)
> 
> likewise.

I'll fix it.

Thanks!
Antoine

-- 
Antoine Ténart, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* RE: [PATCH] can: flexcan: free error skb if enqueueing failed
From: Joakim Zhang @ 2019-08-20 10:12 UTC (permalink / raw)
  To: Sean Nyekjaer, Martin Hundebøll, Wolfgang Grandegger,
	Marc Kleine-Budde, linux-can@vger.kernel.org
  Cc: David S . Miller, netdev@vger.kernel.org
In-Reply-To: <6bddb702-e9ba-1c9e-7d7a-eb974d2e0fdd@geanix.com>


> -----Original Message-----
> From: Sean Nyekjaer <sean@geanix.com>
> Sent: 2019年8月20日 17:50
> To: Martin Hundebøll <martin@geanix.com>; Wolfgang Grandegger
> <wg@grandegger.com>; Marc Kleine-Budde <mkl@pengutronix.de>;
> linux-can@vger.kernel.org
> Cc: David S . Miller <davem@davemloft.net>; netdev@vger.kernel.org; Joakim
> Zhang <qiangqing.zhang@nxp.com>
> Subject: Re: [PATCH] can: flexcan: free error skb if enqueueing failed
> 
> CC'ing Joakim Zhang

Looks good, so add my tag:
Acked-by: Joakim Zhang <qiangqing.zhang@nxp.com>

Best Regards,
Joakim Zhang
> On 15/07/2019 20.53, Martin Hundebøll wrote:
> > If the call to can_rx_offload_queue_sorted() fails, the passed skb
> > isn't consumed, so the caller must do so.
> >
> > Fixes: 30164759db1b ("can: flexcan: make use of rx-offload's
> > irq_offload_fifo")
> > Signed-off-by: Martin Hundebøll <martin@geanix.com>
> > ---
> >   drivers/net/can/flexcan.c | 6 ++++--
> >   1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
> > index 1c66fb2ad76b..21f39e805d42 100644
> > --- a/drivers/net/can/flexcan.c
> > +++ b/drivers/net/can/flexcan.c
> > @@ -688,7 +688,8 @@ static void flexcan_irq_bus_err(struct net_device
> *dev, u32 reg_esr)
> >   	if (tx_errors)
> >   		dev->stats.tx_errors++;
> >
> > -	can_rx_offload_queue_sorted(&priv->offload, skb, timestamp);
> > +	if (can_rx_offload_queue_sorted(&priv->offload, skb, timestamp))
> > +		kfree_skb(skb);
> >   }
> >
> >   static void flexcan_irq_state(struct net_device *dev, u32 reg_esr)
> > @@ -732,7 +733,8 @@ static void flexcan_irq_state(struct net_device *dev,
> u32 reg_esr)
> >   	if (unlikely(new_state == CAN_STATE_BUS_OFF))
> >   		can_bus_off(dev);
> >
> > -	can_rx_offload_queue_sorted(&priv->offload, skb, timestamp);
> > +	if (can_rx_offload_queue_sorted(&priv->offload, skb, timestamp))
> > +		kfree_skb(skb);
> >   }
> >
> >   static inline struct flexcan_priv *rx_offload_to_priv(struct
> > can_rx_offload *offload)
> >

^ permalink raw reply

* Re: [PATCH 2/3] macb: Update compatibility string for SiFive FU540-C000
From: Nicolas.Ferre @ 2019-08-20 10:16 UTC (permalink / raw)
  To: schwab, paul.walmsley, davem, jakub.kicinski
  Cc: yash.shah, robh+dt, netdev, devicetree, linux-kernel, linux-riscv,
	mark.rutland, palmer, aou, ynezz, sachin.ghadi
In-Reply-To: <mvm5zmskxs3.fsf@suse.de>

On 20/08/2019 at 11:10, Andreas Schwab wrote:
> External E-Mail
> 
> 
> On Aug 13 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:
> 
>> Dave, Nicolas,
>>
>> On Mon, 22 Jul 2019, Yash Shah wrote:
>>
>>> On Fri, Jul 19, 2019 at 5:36 PM <Nicolas.Ferre@microchip.com> wrote:
>>>>
>>>> On 19/07/2019 at 13:10, Yash Shah wrote:
>>>>> Update the compatibility string for SiFive FU540-C000 as per the new
>>>>> string updated in the binding doc.
>>>>> Reference: https://lkml.org/lkml/2019/7/17/200
>>>>
>>>> Maybe referring to lore.kernel.org is better:
>>>> https://lore.kernel.org/netdev/CAJ2_jOFEVZQat0Yprg4hem4jRrqkB72FKSeQj4p8P5KA-+rgww@mail.gmail.com/
>>>
>>> Sure. Will keep that in mind for future reference.
>>>
>>>>
>>>>> Signed-off-by: Yash Shah <yash.shah@sifive.com>
>>>>
>>>> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
>>>
>>> Thanks.
>>
>> Am assuming you'll pick this up for the -net tree for v5.4-rc1 or earlier.
>> If not, please let us know.
> 
> This is still missing in v5.4-rc5, which means that networking is broken.

Andreas, Paul,

The patchwork state for the 2 first patches of this series is "Changes 
Requested". It's probably due to my advice of using lore.kernel.org (or 
something else).

I'm perfectly fine in accepting the patches are they are today but can't 
change their patchwork state myself. We would need Dave or Jakub to take 
them.

Dave, Jakub,

All tags are collected in patchwork and all should be fine on DT (Rob) 
side as well.
Please tell me if you're waiting some sign from me.

Best regards,
-- 
Nicolas Ferre

^ permalink raw reply

* Re: [PATCH net-next 0/6] Dynamic toggling of vlan_filtering for SJA1105 DSA
From: Vladimir Oltean @ 2019-08-20 10:18 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: Florian Fainelli, Andrew Lunn, Ido Schimmel, Roopa Prabhu,
	nikolay, David S. Miller, netdev
In-Reply-To: <20190820040443.GB4919@t480s.localdomain>

On Tue, 20 Aug 2019 at 11:04, Vivien Didelot <vivien.didelot@gmail.com> wrote:
>
> On Tue, 20 Aug 2019 02:59:56 +0300, Vladimir Oltean <olteanv@gmail.com> wrote:
> > This patchset addresses a few limitations in DSA and the bridge core
> > that made it impossible for this sequence of commands to work:
> >
> >   ip link add name br0 type bridge
> >   ip link set dev swp2 master br0
> >   echo 1 > /sys/class/net/br0/bridge/vlan_filtering
> >
> > Only this sequence was previously working:
> >
> >   ip link add name br0 type bridge vlan_filtering 1
> >   ip link set dev swp2 master br0
>
> This is not quite true, these sequences of commands do "work". What I see
> though is that with the first sequence, the PVID 1 won't be programmed in
> the hardware. But the second sequence does program the hardware.
>
> But following bridge members will be correctly programmed with the VLAN
> though. The sequence below programs the hardware with VLAN 1 for swp3 as
> well as CPU and DSA ports, but not for swp2:
>
>     ip link add name br0 type bridge
>     ip link set dev swp2 master br0
>     echo 1 > /sys/class/net/br0/bridge/vlan_filtering
>     ip link set dev swp3 master br0
>
> This is unfortunately also true for any 802.1Q VLANs. For example, only VID
> 43 is programmed with the following sequence, but not VID 1 and VID 42:
>
>     ip link add name br0 type bridge
>     ip link set dev swp2 master br0
>     bridge vlan add dev swp2 vid 42
>     echo 1 > /sys/class/net/br0/bridge/vlan_filtering
>     bridge vlan add dev swp2 vid 43
>
> So I understand that because VLANs are not propagated by DSA to the hardware
> when VLAN filtering is disabled, a port may not be programmed with its
> bridge's default PVID, and this is causing a problem for tag_8021q.
>
> Please reword so that we understand better what is the issue being fixed here.
>

Not exactly, no.
The fact that a port in DSA is not programmed with the bridge's
default_pvid is just a fact.
Right now, letting the bridge install the default_pvid into the ports
even breaks dsa_8021q when enslaving the ports to a vlan_filtering=0
bridge, but that is perhaps only a timing issue and can be resolved in
other ways.

To understand my issue I need to make a side-note.
The sja1105 needs to be reset at runtime for changing some settings. I
am working on a separate patchset that tries to make those switch
resets as seamless as possible. The end goal is to not disrupt
linuxptp (ptp4l, phc2sys) operation when enabling the tc-taprio
offload. That means:
- saving and restoring the PTP time after reset
- preventing switch reset to happen during TX timestamping
- preventing any clock operation on the PTP timer during switch reset
Since toggling vlan_filtering also needs to reset the switch, I am
simply using that as a handy tool to test how seamless the switch
reset is.
But currently, toggling vlan_filtering breaks traffic, due to nobody
restoring the correct (from the bridge's perspective) pvid on the
ports. So I need to fix this to continue the testing.
On the user ports, I can just query the bridge, and that's exactly
what I'm doing.
On the CPU and DSA ports, I can't query the bridge because the bridge
knows nothing about them. And I also can't query the pvid from the DSA
core - it can change it, but not know what it is. So I need to be
proactive and make DSA avoid changing their pvid needlessly.
With this set of changes, the goal was for traffic to 'just work' when
changing the vlan_filtering setting back and forth. This would also
benefit other future users of dsa_8021q.
There is still work to be done. dsa_8021q uses VID range 1024-2559
privately. If the bridge had been using any VLAN in that range during
vlan_filtering=1, then dsa_8021q should also restore that VLAN.
Currently I'm not doing that because I'm still not clear whether it's
just as simple as calling br_vlan_get_info(slave, {rx,tx}_vid,
&vinfo); and adding that back (I don't completely understand the
interaction with the implicit rules on the CPU ports - I am concerned
that if the switch driver sees the VID is already installed on the CPU
port, it will just skip the command).
Ido Schimmel mentioned that static FDB entries with VLAN != PVID need
to be preserved. I am already doing something in that direction there
- in vlan_filtering=0 mode I am switching to shared VLAN learning
mode. This means the switch is ignoring the VLAN from the FDB entries,
and just uses the DMAC for L2 routing. When going back to the
vlan_filtering=1 mode, the VLAN from the static FDB entries goes back
in use. I think that's ok.
There is also stuff that cannot be reasonably expected to be preserved
across switch resets. Learned FDB entries is one thing. Traffic
(including regular, but not management, traffic originated from the
CPU) will be temporarily dropped.

> >
> > On SJA1105, the situation is further complicated by the fact that
> > toggling vlan_filtering is causing a switch reset. However, the hardware
> > state restoration logic is already there in the driver. It is a matter
> > of the layers above which need a few fixups.
> >
> > Also see this discussion thread:
> > https://www.spinics.net/lists/netdev/msg581042.html
> >
> > Patch 1/6 is not functionally related but also related to dsa_8021q
> > handling of VLANs and this is a good opportunity to bring up the subject
> > for discussion.
>
> So please send 1/6 as a separate patch and bring up the discussion there.
>

Ok.

>
> Thanks,
>
>         Vivien

Regards,
-Vladimir

^ permalink raw reply

* Re: [PATCH net-next 4/6] net: dsa: Don't program the VLAN as pvid on the upstream port
From: Vladimir Oltean @ 2019-08-20 10:22 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: Florian Fainelli, Andrew Lunn, Ido Schimmel, Roopa Prabhu,
	nikolay, David S. Miller, netdev
In-Reply-To: <20190820020709.GD975@t480s.localdomain>

Hi Vivien,

On Tue, 20 Aug 2019 at 09:07, Vivien Didelot <vivien.didelot@gmail.com> wrote:
>
> On Tue, 20 Aug 2019 03:00:00 +0300, Vladimir Oltean <olteanv@gmail.com> wrote:
> > Commit b2f81d304cee ("net: dsa: add CPU and DSA ports as VLAN members")
> > programs the VLAN from the bridge into the specified port as well as the
> > upstream port, with the same set of flags.
> >
> > Consider the typical case of installing pvid 1 on user port 1, pvid 2 on
> > user port 2, etc. The upstream port would end up having a pvid equal to
> > the last user port whose pvid was programmed from the bridge. Less than
> > useful.
> >
> > So just don't change the pvid of the upstream port and let it be
> > whatever the driver set it internally to be.
> >
> > Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
> > ---
> >  net/dsa/switch.c | 16 ++++++++++++----
> >  1 file changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/dsa/switch.c b/net/dsa/switch.c
> > index 84ab2336131e..02ccc53f1926 100644
> > --- a/net/dsa/switch.c
> > +++ b/net/dsa/switch.c
> > @@ -239,17 +239,21 @@ dsa_switch_vlan_prepare_bitmap(struct dsa_switch *ds,
> >                              const struct switchdev_obj_port_vlan *vlan,
> >                              const unsigned long *bitmap)
> >  {
> > +     struct switchdev_obj_port_vlan v = *vlan;
> >       int port, err;
> >
> >       if (!ds->ops->port_vlan_prepare || !ds->ops->port_vlan_add)
> >               return -EOPNOTSUPP;
> >
> >       for_each_set_bit(port, bitmap, ds->num_ports) {
> > -             err = dsa_port_vlan_check(ds, port, vlan);
> > +             if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))
> > +                     v.flags &= ~BRIDGE_VLAN_INFO_PVID;
>
> So you keep the BRIDGE_VLAN_INFO_PVID flag cleared for all other ports that
> come after any CPU or DSA port?
>

It looks like the convenient hardware decision of making the CPU port
on my board also be the numerically highest one strikes again :)
I always find bugs when I change the CPU port to another number.
This is another example (also related to the inclusion of upstream
ports in the VLAN bitmap, like they all seem to be):
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=d34d2baa9173f6e0c0f22d005d18e83d1cb54d8d

> > +
> > +             err = dsa_port_vlan_check(ds, port, &v);
> >               if (err)
> >                       return err;
> >
> > -             err = ds->ops->port_vlan_prepare(ds, port, vlan);
> > +             err = ds->ops->port_vlan_prepare(ds, port, &v);
> >               if (err)
> >                       return err;
> >       }
> > @@ -262,10 +266,14 @@ dsa_switch_vlan_add_bitmap(struct dsa_switch *ds,
> >                          const struct switchdev_obj_port_vlan *vlan,
> >                          const unsigned long *bitmap)
> >  {
> > +     struct switchdev_obj_port_vlan v = *vlan;
> >       int port;
> >
> > -     for_each_set_bit(port, bitmap, ds->num_ports)
> > -             ds->ops->port_vlan_add(ds, port, vlan);
> > +     for_each_set_bit(port, bitmap, ds->num_ports) {
> > +             if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))
> > +                     v.flags &= ~BRIDGE_VLAN_INFO_PVID;
>
> Same here. Did you intend to initialize your switchdev_obj_port_vlan structure
> _within_ the for_each_set_bit loop maybe?
>

Thanks for pointing this out.

> > +             ds->ops->port_vlan_add(ds, port, &v);
> > +     }
> >  }
> >
> >  static int dsa_switch_vlan_add(struct dsa_switch *ds,
>
> Do you even test your patches?

No.

Regards,
-Vladimir

^ permalink raw reply

* Re: [PATCH REPOST 1/2] can: flexcan: fix deadlock when using self wakeup
From: Sean Nyekjaer @ 2019-08-20 10:25 UTC (permalink / raw)
  To: Joakim Zhang, mkl@pengutronix.de, linux-can@vger.kernel.org
  Cc: wg@grandegger.com, netdev@vger.kernel.org, dl-linux-imx,
	Martin Hundebøll
In-Reply-To: <20190816081749.19300-2-qiangqing.zhang@nxp.com>



On 16/08/2019 10.20, Joakim Zhang wrote:
> As reproted by Sean Nyekjaer below:
> When suspending, when there is still can traffic on the interfaces the
> flexcan immediately wakes the platform again. As it should :-). But it
> throws this error msg:
> [ 3169.378661] PM: noirq suspend of devices failed
> 
> On the way down to suspend the interface that throws the error message does
> call flexcan_suspend but fails to call flexcan_noirq_suspend. That means the
> flexcan_enter_stop_mode is called, but on the way out of suspend the driver
> only calls flexcan_resume and skips flexcan_noirq_resume, thus it doesn't call
> flexcan_exit_stop_mode. This leaves the flexcan in stop mode, and with the
> current driver it can't recover from this even with a soft reboot, it requires
> a hard reboot.
> 
> The best way to exit stop mode is in Wake Up interrupt context, and then
> suspend() and resume() functions can be symmetric. However, stop mode
> request and ack will be controlled by SCU(System Control Unit) firmware(manage
> clock,power,stop mode, etc. by Cortex-M4 core) in coming i.MX8(QM/QXP). And SCU
> firmware interface can't be available in interrupt context.
> 
> For compatibillity, the wake up mechanism can't be symmetric, so we need
> in_stop_mode hack.
> 
> Fixes: de3578c198c6 ("can: flexcan: add self wakeup support")
> Reported-by: Sean Nyekjaer <sean@geanix.com>
> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
> 

Unfortunatly it's still possible to reproduce the deadlock with this 
patch...

[  689.921717] flexcan: probe of 2094000.flexcan failed with error -110

My test setup:
PC with CAN-USB dongle connected to can0 and can1.

PC:
$ while true; do cansend can0 '123#DEADBEEF'; done

iMX6ull:
root@iwg26:~# systemctl suspend 
 

[  365.858054] systemd[1]: Reached target Sleep.
root@iwg26:~# [  365.939826] systemd[1]: Starting Suspend...
[  366.115839] systemd-sleep[248]: Suspending system...
[  366.517949] dpm_run_callback(): platform_pm_suspend+0x0/0x5c returns -110
[  366.518249] PM: Device 2094000.flexcan failed to suspend: error -110
[  366.518406] PM: Some devices failed to suspend, or early wake event 
detected
[  366.732162] dpm_run_callback(): platform_pm_suspend+0x0/0x5c returns -110
[  366.732285] PM: Device 2090000.flexcan failed to suspend: error -110
[  366.732330] PM: Some devices failed to suspend, or early wake event 
detected
[  366.890637] systemd-sleep[248]: System resumed.
[  366.923062] systemd[1]: Started Suspend.
[  366.942819] systemd[1]: sleep.target: Unit not needed anymore. Stopping.
[  366.954791] systemd[1]: Stopped target Sleep.
[  366.962402] systemd[1]: Reached target Suspend.
[  366.977546] systemd-logind[135]: Operation 'sleep' finished.
[  366.979194] systemd[1]: suspend.target: Unit not needed anymore. 
Stopping.
[  366.993831] systemd[1]: Stopped target Suspend.
[  367.139972] systemd-networkd[220]: usb0: Lost carrier
[  367.294077] systemd-networkd[220]: usb0: Gained carrier

root@iwg26:~# candump can0 | head -n 2 

   can0  123   [4]  DE AD BE EF
   can0  123   [4]  DE AD BE EF
root@iwg26:~# candump can1 | head -n 2 

   can1  123   [4]  DE AD BE EF
   can1  123   [4]  DE AD BE EF
root@iwg26:~# systemctl suspend 

root@iwg26:~# [  385.106658] systemd[1]: Reached target Sleep.
[  385.147602] systemd[1]: Starting Suspend...
[  385.246421] systemd-sleep[260]: Suspending system...
[  385.634733] dpm_run_callback(): platform_pm_suspend+0x0/0x5c returns -110
[  385.634855] PM: Device 2090000.flexcan failed to suspend: error -110
[  385.634897] PM: Some devices failed to suspend, or early wake event 
detected
[  385.856251] PM: noirq suspend of devices failed
[  385.998364] systemd-sleep[260]: System resumed.
[  386.023390] systemd[1]: Started Suspend.
[  386.031570] systemd[1]: sleep.target: Unit not needed anymore. Stopping.
[  386.055886] systemd[1]: Stopped target Sleep.
[  386.061430] systemd[1]: Reached target Suspend.
[  386.066142] systemd[1]: suspend.target: Unit not needed anymore. 
Stopping.
[  386.112575] systemd-networkd[220]: usb0: Lost carrier
[  386.116797] systemd-logind[135]: Operation 'sleep' finished.
[  386.146161] systemd[1]: Stopped target Suspend.
[  386.260866] systemd-networkd[220]: usb0: Gained carrier
root@iwg26:~# candump can0 | head -n 2
   can0  123   [4]  DE AD BE EF
   can0  123   [4]  DE AD BE EF
root@iwg26:~# candump can1 | head -n 2 

   can1  123   [4]  DE AD BE EF
   can1  123   [4]  DE AD BE EF
root@iwg26:~# systemctl suspend 

[  396.919303] systemd[1]: Reached target Sleep.
root@iwg26:~# [  396.964722] systemd[1]: Starting Suspend...
[  397.067336] systemd-sleep[268]: Suspending system...
[  397.574571] PM: noirq suspend of devices failed
[  397.834731] PM: noirq suspend of devices failed
[  397.807996] systemd-networkd[220]: usb0: Lost carrier
[  398.156295] dpm_run_callback(): platform_pm_suspend+0x0/0x5c returns -110
[  398.156339] PM: Device 2094000.flexcan failed to suspend: error -110
[  398.156509] PM: Some devices failed to suspend, or early wake event 
detected
[  398.053555] systemd-sleep[268]: Failed to write /sys/power/state: 
Device or resource busy
[  398.074751] systemd[1]: systemd-suspend.service: Main process exited, 
code=exited, status=1/FAILURE
[  398.076779] systemd[1]: systemd-suspend.service: Failed with result 
'exit-code'.
[  398.109255] systemd[1]: Failed to start Suspend.
[  398.118704] systemd[1]: Dependency failed for Suspend.
[  398.136283] systemd-logind[135]: Operation 'sleep' finished.
[  398.137770] systemd[1]: suspend.target: Job suspend.target/start 
failed with result 'dependency'.
[  398.139105] systemd[1]: sleep.target: Unit not needed anymore. Stopping.
[  398.167590] systemd[1]: Stopped target Sleep.
[  398.201558] systemd-networkd[220]: usb0: Gained carrier

root@iwg26:~# candump can0 | head -n 2
   can0  123   [4]  DE AD BE EF
   can0  123   [4]  DE AD BE EF
root@iwg26:~# candump can1 | head -n 2

nothing on can1 anymore :-(

root@iwg26:~# rmmod flexcan
[  622.884746] systemd-networkd[220]: can1: Lost carrier
[  623.046766] systemd-networkd[220]: can0: Lost carrier
root@iwg26:~# insmod /mnt/flexcan.ko
[  628.323981] flexcan 2094000.flexcan: registering netdev failed

and can1 fails to register with:
[  628.347485] flexcan: probe of 2094000.flexcan failed with error -110

/Sean

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox