Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next 04/11] selftests: pmtu: Introduce tests for IPv4/IPv6 over VxLAN over IPv6
From: David Ahern @ 2018-11-07 19:28 UTC (permalink / raw)
  To: Stefano Brivio, David S. Miller; +Cc: Sabrina Dubroca, Xin Long, netdev
In-Reply-To: <366b75ae560cc2d0b3a0f69b84d43b621c8fcce4.1541533786.git.sbrivio@redhat.com>

On 11/6/18 2:39 PM, Stefano Brivio wrote:
> Use a router between endpoints, implemented via namespaces, set a low MTU
> between router and destination endpoint, exceed it and check PMTU value in
> route exceptions.
> 
> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---
> This only introduces tests over VxLAN over IPv6 right now. I'll introduce
> tests over IPv4 (they can be added trivially) once DF configuration support
> is accepted into iproute2.

you can add them now and wrapped in a 'does ip support the df option'
check. That is needed regardless of order (kernel vs iproute2).

^ permalink raw reply

* Re: [PATCH net-next 07/11] selftests: pmtu: Introduce tests for IPv4/IPv6 over GENEVE over IPv6
From: David Ahern @ 2018-11-07 19:28 UTC (permalink / raw)
  To: Stefano Brivio, David S. Miller; +Cc: Sabrina Dubroca, Xin Long, netdev
In-Reply-To: <30bc36eea991cd6f35418c9ddc4cf322ed99e6e9.1541533786.git.sbrivio@redhat.com>

On 11/6/18 2:39 PM, Stefano Brivio wrote:
> This only introduces tests over GENEVE over IPv6 right now. I'll introduce
> tests over IPv4 (they can be added trivially) once DF configuration support
> is accepted into iproute2.

same here.

^ permalink raw reply

* Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO
From: Edward Cree @ 2018-11-07 19:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin Lau, Yonghong Song, Alexei Starovoitov,
	daniel@iogearbox.net, netdev@vger.kernel.org, Kernel Team
In-Reply-To: <20181107005955.qwgbytqtziaggdve@ast-mbp>

On 07/11/18 00:59, Alexei Starovoitov wrote:
> Function name and function argument names are part of the same debug info.
> Splitting them makes no sense.
... except where combining them involves creating pain elsewhere.
Sure the function name *could* go in the type record, but there
 still needs to be a separate function record in a functions
 table (because types are not instances), and that being the
 case the latter _may as well_ be where the name lives so that
 multiple functions (and pointers to them) can all share the
 same type record when the param names etc. all match.

> struct name and struct field names live in the same BTF record.
No, the struct _type_ name lives in the same record as the field
 names, but an _instance_ name doesn't.  I.e. in
    struct foo {int x;} bar;
 the BTF type record holds the names 'foo' and 'x', but not 'bar'
 because that's not the name of the _type_.  Indeed there isn't
 room in the record for both 'foo' and 'bar' because there's only
 one name_off field for the type.
And I argue that the name of a function is more like 'bar' than
 'foo' here, not least from the point of view of which C namespace
 they occupy.

> Similarly function name and function argument names should be
> in the same BTF record, so we can reuse most of the BTF validation
> and BTF parsing logic by doing so.
I think it's incredibly short-sighted to focus on 'what can most
 easily be done with the existing implementation' when designing a
 file format which is intended to have 'long legs'.

> assembler is not a high level language.
I never said it was.
> I believe it's a proper trade-off to make C easier to use
> in expense of some ugliness in your ebpf_asm.
Please respond to the arguments I make, rather than unrelated
 arguments that you might imagine me making.  Asm is merely a
cause of my present interest in BTF, it is not the lens
 through which I see the whole thing.

> Let's keep 'nasty hack' claims out of this discussion.
> I find the current BTF design and KIND_FUNC addition to be elegant
> and appropriate.
Whereas I don't, and I don't feel like my core criticisms have
 been addressed _at all_.  The only answer I get to "BTF should
 store type and instance information in separate records" is
 "it's a debuginfo", no indication of why that's a meaningful
 noun let alone why it implies they should be conflated in the
 format.
And please explain what's "elegant" about how map types are
 currently handled.

> BTF is not *type* only format. It's debug info format.
> Trying to make BTF into type only is not going to work.
> It's already more than type only as I showed earlier.
Again, as I have *repeatedly* said, I am not trying to remove
 non-type information from BTF.  I am just trying to organise
 BTF to consist of separate _parts_ for types and instances,
 rather than forcing both into the same Procrustean bed.

(I don't feel like we're making progress in understanding one
 another here; maybe we should have a telephone discussion?
 Sadly I'm not going to Plumbers, else that would be the
 perfect place to thrash this out.)

-Ed

^ permalink raw reply

* [PATCH bpf-next 0/2] bpftool: support loading flow dissector
From: Stanislav Fomichev @ 2018-11-07 19:35 UTC (permalink / raw)
  To: netdev, linux-kselftest, ast, daniel, shuah
  Cc: jakub.kicinski, quentin.monnet, guro, jiong.wang, sdf,
	bhole_prashant_q7, john.fastabend, jbenc, treeze.taeung, yhs, osk,
	sandipan

This patch series adds support for loading and attaching flow dissector
programs from the bpftool:

* first patch fixes flow dissector section name in the selftests (so
  libbpf auto-detection works)
* second patch adds actual support to the bpftool

See second patch for the description/details.

Stanislav Fomichev (2):
  selftests/bpf: rename flow dissector section to flow_dissector
  bpftool: support loading flow dissector

 .../bpftool/Documentation/bpftool-prog.rst    |  16 ++-
 tools/bpf/bpftool/common.c                    |  32 +++--
 tools/bpf/bpftool/main.h                      |   1 +
 tools/bpf/bpftool/prog.c                      | 135 +++++++++++++++---
 tools/testing/selftests/bpf/bpf_flow.c        |   2 +-
 .../selftests/bpf/test_flow_dissector.sh      |   2 +-
 6 files changed, 143 insertions(+), 45 deletions(-)

-- 
2.19.1.930.g4563a0d9d0-goog

^ permalink raw reply

* [PATCH bpf-next 1/2] selftests/bpf: rename flow dissector section to flow_dissector
From: Stanislav Fomichev @ 2018-11-07 19:35 UTC (permalink / raw)
  To: netdev, linux-kselftest, ast, daniel, shuah
  Cc: jakub.kicinski, quentin.monnet, guro, jiong.wang, sdf,
	bhole_prashant_q7, john.fastabend, jbenc, treeze.taeung, yhs, osk,
	sandipan
In-Reply-To: <20181107193552.77894-1-sdf@google.com>

Makes it compatible with the logic that derives program type
from section name in libbpf_prog_type_by_name.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/testing/selftests/bpf/bpf_flow.c             | 2 +-
 tools/testing/selftests/bpf/test_flow_dissector.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/bpf_flow.c b/tools/testing/selftests/bpf/bpf_flow.c
index 107350a7821d..b9798f558ca7 100644
--- a/tools/testing/selftests/bpf/bpf_flow.c
+++ b/tools/testing/selftests/bpf/bpf_flow.c
@@ -116,7 +116,7 @@ static __always_inline int parse_eth_proto(struct __sk_buff *skb, __be16 proto)
 	return BPF_DROP;
 }
 
-SEC("dissect")
+SEC("flow_dissector")
 int _dissect(struct __sk_buff *skb)
 {
 	if (!skb->vlan_present)
diff --git a/tools/testing/selftests/bpf/test_flow_dissector.sh b/tools/testing/selftests/bpf/test_flow_dissector.sh
index c0fb073b5eab..d23d4da66b83 100755
--- a/tools/testing/selftests/bpf/test_flow_dissector.sh
+++ b/tools/testing/selftests/bpf/test_flow_dissector.sh
@@ -59,7 +59,7 @@ else
 fi
 
 # Attach BPF program
-./flow_dissector_load -p bpf_flow.o -s dissect
+./flow_dissector_load -p bpf_flow.o -s flow_dissector
 
 # Setup
 tc qdisc add dev lo ingress
-- 
2.19.1.930.g4563a0d9d0-goog

^ permalink raw reply related

* [PATCH bpf-next 2/2] bpftool: support loading flow dissector
From: Stanislav Fomichev @ 2018-11-07 19:35 UTC (permalink / raw)
  To: netdev, linux-kselftest, ast, daniel, shuah
  Cc: jakub.kicinski, quentin.monnet, guro, jiong.wang, sdf,
	bhole_prashant_q7, john.fastabend, jbenc, treeze.taeung, yhs, osk,
	sandipan
In-Reply-To: <20181107193552.77894-1-sdf@google.com>

This commit adds support for loading/attaching/detaching flow
dissector program. The structure of the flow dissector program is
assumed to be the same as in the selftests:

* flow_dissector section with the main entry point
* a bunch of tail call progs
* a jmp_table map that is populated with the tail call progs

When `bpftool load` is called with a flow_dissector prog (i.e. when the
first section is flow_dissector of 'type flow_dissector' argument is
passed), we load and pin all the programs and build the jump table.

The last argument of `bpftool attach` is made optional for this use
case.

Example:
bpftool prog load tools/testing/selftests/bpf/bpf_flow.o \
	/sys/fs/bpf/flow type flow_dissector
bpftool prog attach pinned /sys/fs/bpf/flow/flow_dissector/0 flow_dissector

Tested by using the above two lines to load the prog in
the test_flow_dissector.sh selftest.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 .../bpftool/Documentation/bpftool-prog.rst    |  16 ++-
 tools/bpf/bpftool/common.c                    |  32 +++--
 tools/bpf/bpftool/main.h                      |   1 +
 tools/bpf/bpftool/prog.c                      | 135 +++++++++++++++---
 4 files changed, 141 insertions(+), 43 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index ac4e904b10fb..3caa9153435b 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -25,8 +25,8 @@ MAP COMMANDS
 |	**bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | **opcodes**}]
 |	**bpftool** **prog pin** *PROG* *FILE*
 |	**bpftool** **prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
-|       **bpftool** **prog attach** *PROG* *ATTACH_TYPE* *MAP*
-|       **bpftool** **prog detach** *PROG* *ATTACH_TYPE* *MAP*
+|       **bpftool** **prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
+|       **bpftool** **prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
 |	**bpftool** **prog help**
 |
 |	*MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
@@ -39,7 +39,9 @@ MAP COMMANDS
 |		**cgroup/bind4** | **cgroup/bind6** | **cgroup/post_bind4** | **cgroup/post_bind6** |
 |		**cgroup/connect4** | **cgroup/connect6** | **cgroup/sendmsg4** | **cgroup/sendmsg6**
 |	}
-|       *ATTACH_TYPE* := { **msg_verdict** | **skb_verdict** | **skb_parse** }
+|       *ATTACH_TYPE* := {
+|	|	**msg_verdict** | **skb_verdict** | **skb_parse** | **flow_dissector**
+|	}
 
 
 DESCRIPTION
@@ -97,13 +99,13 @@ DESCRIPTION
 		  contain a dot character ('.'), which is reserved for future
 		  extensions of *bpffs*.
 
-        **bpftool prog attach** *PROG* *ATTACH_TYPE* *MAP*
+        **bpftool prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
                   Attach bpf program *PROG* (with type specified by *ATTACH_TYPE*)
-                  to the map *MAP*.
+                  to the optional map *MAP*.
 
-        **bpftool prog detach** *PROG* *ATTACH_TYPE* *MAP*
+        **bpftool prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
                   Detach bpf program *PROG* (with type specified by *ATTACH_TYPE*)
-                  from the map *MAP*.
+                  from the optional map *MAP*.
 
 	**bpftool prog help**
 		  Print short help message.
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 25af85304ebe..963881142dfb 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -169,34 +169,24 @@ int open_obj_pinned_any(char *path, enum bpf_obj_type exp_type)
 	return fd;
 }
 
-int do_pin_fd(int fd, const char *name)
+int mount_bpffs_for_pin(const char *name)
 {
 	char err_str[ERR_MAX_LEN];
 	char *file;
 	char *dir;
 	int err = 0;
 
-	err = bpf_obj_pin(fd, name);
-	if (!err)
-		goto out;
-
 	file = malloc(strlen(name) + 1);
 	strcpy(file, name);
 	dir = dirname(file);
 
-	if (errno != EPERM || is_bpffs(dir)) {
-		p_err("can't pin the object (%s): %s", name, strerror(errno));
+	if (is_bpffs(dir)) {
+		/* nothing to do if already mounted */
 		goto out_free;
 	}
 
-	/* Attempt to mount bpffs, then retry pinning. */
 	err = mnt_bpffs(dir, err_str, ERR_MAX_LEN);
-	if (!err) {
-		err = bpf_obj_pin(fd, name);
-		if (err)
-			p_err("can't pin the object (%s): %s", name,
-			      strerror(errno));
-	} else {
+	if (err) {
 		err_str[ERR_MAX_LEN - 1] = '\0';
 		p_err("can't mount BPF file system to pin the object (%s): %s",
 		      name, err_str);
@@ -204,10 +194,22 @@ int do_pin_fd(int fd, const char *name)
 
 out_free:
 	free(file);
-out:
 	return err;
 }
 
+int do_pin_fd(int fd, const char *name)
+{
+	int err = mount_bpffs_for_pin(name);
+
+	if (err) {
+		p_err("can't mount bpffs for pin %s: %s",
+		      name, strerror(errno));
+		return err;
+	}
+
+	return bpf_obj_pin(fd, name);
+}
+
 int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32))
 {
 	unsigned int id;
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 28322ace2856..1383824c9baf 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -129,6 +129,7 @@ const char *get_fd_type_name(enum bpf_obj_type type);
 char *get_fdinfo(int fd, const char *key);
 int open_obj_pinned(char *path);
 int open_obj_pinned_any(char *path, enum bpf_obj_type exp_type);
+int mount_bpffs_for_pin(const char *name);
 int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32));
 int do_pin_fd(int fd, const char *name);
 
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 5302ee282409..f3a07ec3a444 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -81,6 +81,7 @@ static const char * const attach_type_strings[] = {
 	[BPF_SK_SKB_STREAM_PARSER] = "stream_parser",
 	[BPF_SK_SKB_STREAM_VERDICT] = "stream_verdict",
 	[BPF_SK_MSG_VERDICT] = "msg_verdict",
+	[BPF_FLOW_DISSECTOR] = "flow_dissector",
 	[__MAX_BPF_ATTACH_TYPE] = NULL,
 };
 
@@ -724,9 +725,10 @@ int map_replace_compar(const void *p1, const void *p2)
 static int do_attach(int argc, char **argv)
 {
 	enum bpf_attach_type attach_type;
-	int err, mapfd, progfd;
+	int err, progfd;
+	int mapfd = 0;
 
-	if (!REQ_ARGS(5)) {
+	if (!REQ_ARGS(3)) {
 		p_err("too few parameters for map attach");
 		return -EINVAL;
 	}
@@ -741,10 +743,11 @@ static int do_attach(int argc, char **argv)
 		return -EINVAL;
 	}
 	NEXT_ARG();
-
-	mapfd = map_parse_fd(&argc, &argv);
-	if (mapfd < 0)
-		return mapfd;
+	if (argc > 0) {
+		mapfd = map_parse_fd(&argc, &argv);
+		if (mapfd < 0)
+			return mapfd;
+	}
 
 	err = bpf_prog_attach(progfd, mapfd, attach_type, 0);
 	if (err) {
@@ -760,9 +763,10 @@ static int do_attach(int argc, char **argv)
 static int do_detach(int argc, char **argv)
 {
 	enum bpf_attach_type attach_type;
-	int err, mapfd, progfd;
+	int err, progfd;
+	int mapfd = 0;
 
-	if (!REQ_ARGS(5)) {
+	if (!REQ_ARGS(3)) {
 		p_err("too few parameters for map detach");
 		return -EINVAL;
 	}
@@ -777,10 +781,11 @@ static int do_detach(int argc, char **argv)
 		return -EINVAL;
 	}
 	NEXT_ARG();
-
-	mapfd = map_parse_fd(&argc, &argv);
-	if (mapfd < 0)
-		return mapfd;
+	if (argc > 0) {
+		mapfd = map_parse_fd(&argc, &argv);
+		if (mapfd < 0)
+			return mapfd;
+	}
 
 	err = bpf_prog_detach2(progfd, mapfd, attach_type);
 	if (err) {
@@ -792,6 +797,56 @@ static int do_detach(int argc, char **argv)
 		jsonw_null(json_wtr);
 	return 0;
 }
+
+/* Flow dissector consists of a main program and a jump table for each
+ * supported protocol. The assumption here is that the first prog is the main
+ * one and the other progs are used in the tail calls. In this routine we
+ * build the jump table for the non-main progs.
+ */
+static int build_flow_dissector_jmp_table(struct bpf_object *obj,
+					  struct bpf_program *prog,
+					  const char *jmp_table_map)
+{
+	struct bpf_map *jmp_table;
+	struct bpf_program *pos;
+	int i = 0;
+	int prog_fd, jmp_table_fd, fd;
+
+	prog_fd = bpf_program__fd(prog);
+	if (prog_fd < 0) {
+		p_err("failed to get fd of main prog");
+		return prog_fd;
+	}
+
+	jmp_table = bpf_object__find_map_by_name(obj, jmp_table_map);
+	if (jmp_table == NULL) {
+		p_err("failed to find '%s' map", jmp_table_map);
+		return -1;
+	}
+
+	jmp_table_fd = bpf_map__fd(jmp_table);
+	if (jmp_table_fd < 0) {
+		p_err("failed to get fd of jmp_table");
+		return jmp_table_fd;
+	}
+
+	bpf_object__for_each_program(pos, obj) {
+		fd = bpf_program__fd(pos);
+		if (fd < 0) {
+			p_err("failed to get fd of '%s'",
+			      bpf_program__title(pos, false));
+			return fd;
+		}
+
+		if (fd != prog_fd) {
+			bpf_map_update_elem(jmp_table_fd, &i, &fd, BPF_ANY);
+			++i;
+		}
+	}
+
+	return 0;
+}
+
 static int do_load(int argc, char **argv)
 {
 	enum bpf_attach_type expected_attach_type;
@@ -800,7 +855,7 @@ static int do_load(int argc, char **argv)
 	};
 	struct map_replace *map_replace = NULL;
 	unsigned int old_map_fds = 0;
-	struct bpf_program *prog;
+	struct bpf_program *prog, *pos;
 	struct bpf_object *obj;
 	struct bpf_map *map;
 	const char *pinfile;
@@ -918,13 +973,19 @@ static int do_load(int argc, char **argv)
 		goto err_free_reuse_maps;
 	}
 
-	prog = bpf_program__next(NULL, obj);
+	if (attr.prog_type == BPF_PROG_TYPE_FLOW_DISSECTOR) {
+		/* for the flow dissector type, the entry point is in the
+		 * section flow_dissector; other progs are tail calls
+		 */
+		prog = bpf_object__find_program_by_title(obj, "flow_dissector");
+	} else {
+		prog = bpf_program__next(NULL, obj);
+	}
 	if (!prog) {
 		p_err("object file doesn't contain any bpf program");
 		goto err_close_obj;
 	}
 
-	bpf_program__set_ifindex(prog, ifindex);
 	if (attr.prog_type == BPF_PROG_TYPE_UNSPEC) {
 		const char *sec_name = bpf_program__title(prog, false);
 
@@ -936,8 +997,13 @@ static int do_load(int argc, char **argv)
 			goto err_close_obj;
 		}
 	}
-	bpf_program__set_type(prog, attr.prog_type);
-	bpf_program__set_expected_attach_type(prog, expected_attach_type);
+
+	bpf_object__for_each_program(pos, obj) {
+		bpf_program__set_ifindex(pos, ifindex);
+		bpf_program__set_type(pos, attr.prog_type);
+		bpf_program__set_expected_attach_type(pos,
+						      expected_attach_type);
+	}
 
 	qsort(map_replace, old_map_fds, sizeof(*map_replace),
 	      map_replace_compar);
@@ -1001,8 +1067,34 @@ static int do_load(int argc, char **argv)
 		goto err_close_obj;
 	}
 
-	if (do_pin_fd(bpf_program__fd(prog), pinfile))
+	err = mount_bpffs_for_pin(pinfile);
+	if (err) {
+		p_err("failed to mount bpffs for pin '%s'", pinfile);
 		goto err_close_obj;
+	}
+
+	if (attr.prog_type == BPF_PROG_TYPE_FLOW_DISSECTOR) {
+		err = build_flow_dissector_jmp_table(obj, prog, "jmp_table");
+		if (err) {
+			p_err("failed to build flow dissector jump table");
+			goto err_close_obj;
+		}
+		/* flow dissector consist of multiple programs,
+		 * we want to pin them all
+		 */
+		err = bpf_object__pin(obj, pinfile);
+		if (err) {
+			p_err("failed to pin flow dissector object");
+			goto err_close_obj;
+		}
+	} else {
+		err = bpf_obj_pin(bpf_program__fd(prog), pinfile);
+		if (err) {
+			p_err("failed to pin program %s",
+			      bpf_program__title(prog, false));
+			goto err_close_obj;
+		}
+	}
 
 	if (json_output)
 		jsonw_null(json_wtr);
@@ -1037,8 +1129,8 @@ static int do_help(int argc, char **argv)
 		"       %s %s pin   PROG FILE\n"
 		"       %s %s load  OBJ  FILE [type TYPE] [dev NAME] \\\n"
 		"                         [map { idx IDX | name NAME } MAP]\n"
-		"       %s %s attach PROG ATTACH_TYPE MAP\n"
-		"       %s %s detach PROG ATTACH_TYPE MAP\n"
+		"       %s %s attach PROG ATTACH_TYPE [MAP]\n"
+		"       %s %s detach PROG ATTACH_TYPE [MAP]\n"
 		"       %s %s help\n"
 		"\n"
 		"       " HELP_SPEC_MAP "\n"
@@ -1050,7 +1142,8 @@ static int do_help(int argc, char **argv)
 		"                 cgroup/bind4 | cgroup/bind6 | cgroup/post_bind4 |\n"
 		"                 cgroup/post_bind6 | cgroup/connect4 | cgroup/connect6 |\n"
 		"                 cgroup/sendmsg4 | cgroup/sendmsg6 }\n"
-		"       ATTACH_TYPE := { msg_verdict | skb_verdict | skb_parse }\n"
+		"       ATTACH_TYPE := { msg_verdict | skb_verdict | skb_parse |\n"
+		"                        flow_dissector }\n"
 		"       " HELP_SPEC_OPTIONS "\n"
 		"",
 		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
-- 
2.19.1.930.g4563a0d9d0-goog

^ permalink raw reply related

* [PATCH net-next 0/5] net: phy: improve and simplify phylib state machine
From: Heiner Kallweit @ 2018-11-07 19:41 UTC (permalink / raw)
  To: Florian Fainelli, Andrew Lunn, David Miller; +Cc: netdev@vger.kernel.org

This patch series is based on two axioms:

- During autoneg a PHY always reports the link being down

- Info in clause 22/45 registers doesn't allow to differentiate between
  these two states:
  1. Link is physically down
  2. A link partner is connected and PHY is autonegotiating
  In both cases "link up" and "aneg finished" bits aren't set.
  One consequence is that having separate states PHY_NOLINK and PHY_AN
  isn't needed.

By using these two axioms the state machine can be significantly
simplified.

Heiner Kallweit (5):
  net: phy: remove useless check in state machine case PHY_NOLINK
  net: phy: remove useless check in state machine case PHY_RESUMING
  net: phy: add phy_check_link_status
  net: phy: remove state PHY_AN
  net: phy: use phy_check_link_status in more places in the state machine

 drivers/net/phy/phy.c | 172 +++++++++++-------------------------------
 include/linux/phy.h   |  19 +----
 2 files changed, 46 insertions(+), 145 deletions(-)

-- 
2.19.1

^ permalink raw reply

* Re: [PATCH net-next 0/7] nfp: more set actions and notifier refactor
From: David Miller @ 2018-11-07 19:46 UTC (permalink / raw)
  To: jakub.kicinski; +Cc: netdev, oss-drivers
In-Reply-To: <20181107010734.29935-1-jakub.kicinski@netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Tue,  6 Nov 2018 17:07:27 -0800

> This series brings updates to flower offload code.  First Pieter adds
> support for setting TTL, ToS, Flow Label and Hop Limit fields in IPv4
> and IPv6 headers.
> 
> Remaining 5 patches deal with factoring out netdev notifiers from flower
> code.  We already have two instances, and more is coming, so it's time
> to move to one central notifier which then feeds individual feature
> handlers.
> 
> I start that part by cleaning up the existing notifiers.  Next a central
> notifier is added, and used by flower offloads.

Looks good, series applied, thanks Jakub.

^ permalink raw reply

* [PATCH net-next 1/5] net: phy: remove useless check in state machine case PHY_NOLINK
From: Heiner Kallweit @ 2018-11-07 19:43 UTC (permalink / raw)
  To: Florian Fainelli, Andrew Lunn, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <922c223b-7bc0-e0ec-345d-2034b796af91@gmail.com>

If aneg is enabled and the PHY reports the link as up then definitely
aneg finished successfully. Therefore this check is useless and
can be removed.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 476578746..87c6d304c 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -970,17 +970,6 @@ void phy_state_machine(struct work_struct *work)
 			break;
 
 		if (phydev->link) {
-			if (AUTONEG_ENABLE == phydev->autoneg) {
-				err = phy_aneg_done(phydev);
-				if (err < 0)
-					break;
-
-				if (!err) {
-					phydev->state = PHY_AN;
-					phydev->link_timeout = PHY_AN_TIMEOUT;
-					break;
-				}
-			}
 			phydev->state = PHY_RUNNING;
 			phy_link_up(phydev);
 		}
-- 
2.19.1

^ permalink raw reply related

* [PATCH net-next 2/5] net: phy: remove useless check in state machine case PHY_RESUMING
From: Heiner Kallweit @ 2018-11-07 19:44 UTC (permalink / raw)
  To: Florian Fainelli, Andrew Lunn, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <922c223b-7bc0-e0ec-345d-2034b796af91@gmail.com>

If aneg isn't finished yet then the PHY reports the link as down.
There's no benefit in setting the state to PHY_AN because the next
state machine run would set the status to PHY_NOLINK anyway (except
in the meantime aneg has been finished and link is up). Therefore
we can set the state to PHY_RUNNING or PHY_NOLINK directly.

In addition change the do_carrier parameter in phy_link_down() to true.
If carrier was marked as up before (what should never be the case because
PHY was in state PHY_HALTED before) then we should mark it as down now.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 87c6d304c..14dffa0da 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -1022,17 +1022,6 @@ void phy_state_machine(struct work_struct *work)
 		}
 		break;
 	case PHY_RESUMING:
-		if (AUTONEG_ENABLE == phydev->autoneg) {
-			err = phy_aneg_done(phydev);
-			if (err < 0) {
-				break;
-			} else if (!err) {
-				phydev->state = PHY_AN;
-				phydev->link_timeout = PHY_AN_TIMEOUT;
-				break;
-			}
-		}
-
 		err = phy_read_status(phydev);
 		if (err)
 			break;
@@ -1042,7 +1031,7 @@ void phy_state_machine(struct work_struct *work)
 			phy_link_up(phydev);
 		} else	{
 			phydev->state = PHY_NOLINK;
-			phy_link_down(phydev, false);
+			phy_link_down(phydev, true);
 		}
 		break;
 	}
-- 
2.19.1

^ permalink raw reply related

* [PATCH net-next 3/5] net: phy: add phy_check_link_status
From: Heiner Kallweit @ 2018-11-07 19:45 UTC (permalink / raw)
  To: Florian Fainelli, Andrew Lunn, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <922c223b-7bc0-e0ec-345d-2034b796af91@gmail.com>

In few places in the state machine the state is set to PHY_RUNNING or
PHY_NOLINK after doing a phy_read_status(). So factor this out to
phy_check_link_status().

First use it in phy_start_aneg(): By setting the state to PHY_RUNNING
or PHY_NOLINK directly we can remove the code to handle the case that
we're using interrupts and aneg was finished already.

Definition of phy_link_up and phy_link_down needs to be moved because
they are called in the new function.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy.c | 70 ++++++++++++++++++++++++-------------------
 1 file changed, 40 insertions(+), 30 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 14dffa0da..87ed00030 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -62,6 +62,17 @@ static const char *phy_state_to_str(enum phy_state st)
 	return NULL;
 }
 
+static void phy_link_up(struct phy_device *phydev)
+{
+	phydev->phy_link_change(phydev, true, true);
+	phy_led_trigger_change_speed(phydev);
+}
+
+static void phy_link_down(struct phy_device *phydev, bool do_carrier)
+{
+	phydev->phy_link_change(phydev, false, do_carrier);
+	phy_led_trigger_change_speed(phydev);
+}
 
 /**
  * phy_print_status - Convenience function to print out the current phy status
@@ -493,6 +504,34 @@ static int phy_config_aneg(struct phy_device *phydev)
 	return genphy_config_aneg(phydev);
 }
 
+/**
+ * phy_check_link_status - check link status and set state accordingly
+ * @phydev: the phy_device struct
+ *
+ * Description: Check for link and whether autoneg was triggered / is running
+ * and set state accordingly
+ */
+static int phy_check_link_status(struct phy_device *phydev)
+{
+	int err;
+
+	WARN_ON(!mutex_is_locked(&phydev->lock));
+
+	err = phy_read_status(phydev);
+	if (err)
+		return err;
+
+	if (phydev->link && phydev->state != PHY_RUNNING) {
+		phydev->state = PHY_RUNNING;
+		phy_link_up(phydev);
+	} else if (!phydev->link && phydev->state != PHY_NOLINK) {
+		phydev->state = PHY_NOLINK;
+		phy_link_down(phydev, true);
+	}
+
+	return 0;
+}
+
 /**
  * phy_start_aneg - start auto-negotiation for this PHY device
  * @phydev: the phy_device struct
@@ -504,7 +543,6 @@ static int phy_config_aneg(struct phy_device *phydev)
  */
 int phy_start_aneg(struct phy_device *phydev)
 {
-	bool trigger = 0;
 	int err;
 
 	if (!phydev->drv)
@@ -524,32 +562,16 @@ int phy_start_aneg(struct phy_device *phydev)
 
 	if (phydev->state != PHY_HALTED) {
 		if (AUTONEG_ENABLE == phydev->autoneg) {
-			phydev->state = PHY_AN;
-			phydev->link_timeout = PHY_AN_TIMEOUT;
+			err = phy_check_link_status(phydev);
 		} else {
 			phydev->state = PHY_FORCING;
 			phydev->link_timeout = PHY_FORCE_TIMEOUT;
 		}
 	}
 
-	/* Re-schedule a PHY state machine to check PHY status because
-	 * negotiation may already be done and aneg interrupt may not be
-	 * generated.
-	 */
-	if (!phy_polling_mode(phydev) && phydev->state == PHY_AN) {
-		err = phy_aneg_done(phydev);
-		if (err > 0) {
-			trigger = true;
-			err = 0;
-		}
-	}
-
 out_unlock:
 	mutex_unlock(&phydev->lock);
 
-	if (trigger)
-		phy_trigger_machine(phydev);
-
 	return err;
 }
 EXPORT_SYMBOL(phy_start_aneg);
@@ -893,18 +915,6 @@ void phy_start(struct phy_device *phydev)
 }
 EXPORT_SYMBOL(phy_start);
 
-static void phy_link_up(struct phy_device *phydev)
-{
-	phydev->phy_link_change(phydev, true, true);
-	phy_led_trigger_change_speed(phydev);
-}
-
-static void phy_link_down(struct phy_device *phydev, bool do_carrier)
-{
-	phydev->phy_link_change(phydev, false, do_carrier);
-	phy_led_trigger_change_speed(phydev);
-}
-
 /**
  * phy_state_machine - Handle the state machine
  * @work: work_struct that describes the work to be done
-- 
2.19.1

^ permalink raw reply related

* [PATCH net-next 4/5] net: phy: remove state PHY_AN
From: Heiner Kallweit @ 2018-11-07 19:46 UTC (permalink / raw)
  To: Florian Fainelli, Andrew Lunn, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <922c223b-7bc0-e0ec-345d-2034b796af91@gmail.com>

After the recent changes in the state machine state PHY_AN isn't used
any longer and can be removed.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy.c | 27 ---------------------------
 include/linux/phy.h   | 19 +------------------
 2 files changed, 1 insertion(+), 45 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 87ed00030..226824804 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -50,7 +50,6 @@ static const char *phy_state_to_str(enum phy_state st)
 	PHY_STATE_STR(READY)
 	PHY_STATE_STR(PENDING)
 	PHY_STATE_STR(UP)
-	PHY_STATE_STR(AN)
 	PHY_STATE_STR(RUNNING)
 	PHY_STATE_STR(NOLINK)
 	PHY_STATE_STR(FORCING)
@@ -944,32 +943,6 @@ void phy_state_machine(struct work_struct *work)
 	case PHY_UP:
 		needs_aneg = true;
 
-		phydev->link_timeout = PHY_AN_TIMEOUT;
-
-		break;
-	case PHY_AN:
-		err = phy_read_status(phydev);
-		if (err < 0)
-			break;
-
-		/* If the link is down, give up on negotiation for now */
-		if (!phydev->link) {
-			phydev->state = PHY_NOLINK;
-			phy_link_down(phydev, true);
-			break;
-		}
-
-		/* Check if negotiation is done.  Break if there's an error */
-		err = phy_aneg_done(phydev);
-		if (err < 0)
-			break;
-
-		/* If AN is done, we're running */
-		if (err > 0) {
-			phydev->state = PHY_RUNNING;
-			phy_link_up(phydev);
-		} else if (0 == phydev->link_timeout--)
-			needs_aneg = true;
 		break;
 	case PHY_NOLINK:
 		if (!phy_polling_mode(phydev))
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 9e4d49ef4..2090277ea 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -178,7 +178,6 @@ static inline const char *phy_modes(phy_interface_t interface)
 #define PHY_INIT_TIMEOUT	100000
 #define PHY_STATE_TIME		1
 #define PHY_FORCE_TIMEOUT	10
-#define PHY_AN_TIMEOUT		10
 
 #define PHY_MAX_ADDR	32
 
@@ -297,24 +296,10 @@ struct phy_device *mdiobus_scan(struct mii_bus *bus, int addr);
  *
  * UP: The PHY and attached device are ready to do work.
  * Interrupts should be started here.
- * - timer moves to AN
- *
- * AN: The PHY is currently negotiating the link state.  Link is
- * therefore down for now.  phy_timer will set this state when it
- * detects the state is UP.  config_aneg will set this state
- * whenever called with phydev->autoneg set to AUTONEG_ENABLE.
- * - If autonegotiation finishes, but there's no link, it sets
- *   the state to NOLINK.
- * - If aneg finishes with link, it sets the state to RUNNING,
- *   and calls adjust_link
- * - If autonegotiation did not finish after an arbitrary amount
- *   of time, autonegotiation should be tried again if the PHY
- *   supports "magic" autonegotiation (back to AN)
- * - If it didn't finish, and no magic_aneg, move to FORCING.
+ * - timer moves to NOLINK or RUNNING
  *
  * NOLINK: PHY is up, but not currently plugged in.
  * - If the timer notes that the link comes back, we move to RUNNING
- * - config_aneg moves to AN
  * - phy_stop moves to HALTED
  *
  * FORCING: PHY is being configured with forced settings
@@ -329,7 +314,6 @@ struct phy_device *mdiobus_scan(struct mii_bus *bus, int addr);
  *   link state is polled every other cycle of this state machine,
  *   which makes it every other second)
  * - irq will set CHANGELINK
- * - config_aneg will set AN
  * - phy_stop moves to HALTED
  *
  * CHANGELINK: PHY experienced a change in link state
@@ -353,7 +337,6 @@ enum phy_state {
 	PHY_READY,
 	PHY_PENDING,
 	PHY_UP,
-	PHY_AN,
 	PHY_RUNNING,
 	PHY_NOLINK,
 	PHY_FORCING,
-- 
2.19.1

^ permalink raw reply related

* [PATCH net-next 5/5] net: phy: use phy_check_link_status in more places in the state machine
From: Heiner Kallweit @ 2018-11-07 19:47 UTC (permalink / raw)
  To: Florian Fainelli, Andrew Lunn, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <922c223b-7bc0-e0ec-345d-2034b796af91@gmail.com>

Use phy_check_link_status in more places in the state machine.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy.c | 53 ++++---------------------------------------
 1 file changed, 5 insertions(+), 48 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 226824804..dd5bff955 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -945,17 +945,13 @@ void phy_state_machine(struct work_struct *work)
 
 		break;
 	case PHY_NOLINK:
+	case PHY_RUNNING:
 		if (!phy_polling_mode(phydev))
 			break;
-
-		err = phy_read_status(phydev);
-		if (err)
-			break;
-
-		if (phydev->link) {
-			phydev->state = PHY_RUNNING;
-			phy_link_up(phydev);
-		}
+		/* fall through */
+	case PHY_CHANGELINK:
+	case PHY_RESUMING:
+		err = phy_check_link_status(phydev);
 		break;
 	case PHY_FORCING:
 		err = genphy_update_link(phydev);
@@ -971,32 +967,6 @@ void phy_state_machine(struct work_struct *work)
 			phy_link_down(phydev, false);
 		}
 		break;
-	case PHY_RUNNING:
-		if (!phy_polling_mode(phydev))
-			break;
-
-		err = phy_read_status(phydev);
-		if (err)
-			break;
-
-		if (!phydev->link) {
-			phydev->state = PHY_NOLINK;
-			phy_link_down(phydev, true);
-		}
-		break;
-	case PHY_CHANGELINK:
-		err = phy_read_status(phydev);
-		if (err)
-			break;
-
-		if (phydev->link) {
-			phydev->state = PHY_RUNNING;
-			phy_link_up(phydev);
-		} else {
-			phydev->state = PHY_NOLINK;
-			phy_link_down(phydev, true);
-		}
-		break;
 	case PHY_HALTED:
 		if (phydev->link) {
 			phydev->link = 0;
@@ -1004,19 +974,6 @@ void phy_state_machine(struct work_struct *work)
 			do_suspend = true;
 		}
 		break;
-	case PHY_RESUMING:
-		err = phy_read_status(phydev);
-		if (err)
-			break;
-
-		if (phydev->link) {
-			phydev->state = PHY_RUNNING;
-			phy_link_up(phydev);
-		} else	{
-			phydev->state = PHY_NOLINK;
-			phy_link_down(phydev, true);
-		}
-		break;
 	}
 
 	mutex_unlock(&phydev->lock);
-- 
2.19.1

^ permalink raw reply related

* Re: [PATCH net-next 0/5] net: phy: improve and simplify phylib state machine
From: Andrew Lunn @ 2018-11-07 19:48 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Florian Fainelli, David Miller, netdev@vger.kernel.org
In-Reply-To: <922c223b-7bc0-e0ec-345d-2034b796af91@gmail.com>

On Wed, Nov 07, 2018 at 08:41:52PM +0100, Heiner Kallweit wrote:
> This patch series is based on two axioms:
> 
> - During autoneg a PHY always reports the link being down

Hi Heiner

I think that is a risky assumption to make.

What happens if this assumption is incorrect?

     Andrew

^ permalink raw reply

* Re: [PATCH net-next 04/11] selftests: pmtu: Introduce tests for IPv4/IPv6 over VxLAN over IPv6
From: David Miller @ 2018-11-07 19:48 UTC (permalink / raw)
  To: dsahern; +Cc: sbrivio, sd, lucien.xin, netdev
In-Reply-To: <3f6578a6-623b-a5cc-68fd-29b25a4585e8@gmail.com>

From: David Ahern <dsahern@gmail.com>
Date: Wed, 7 Nov 2018 12:28:21 -0700

> you can add them now and wrapped in a 'does ip support the df option'
> check. That is needed regardless of order (kernel vs iproute2).

Indeed, that's a good point.

^ permalink raw reply

* Re: [PATCH net-next 04/11] selftests: pmtu: Introduce tests for IPv4/IPv6 over VxLAN over IPv6
From: Stefano Brivio @ 2018-11-07 19:54 UTC (permalink / raw)
  To: David Ahern; +Cc: David S. Miller, Sabrina Dubroca, Xin Long, netdev
In-Reply-To: <3f6578a6-623b-a5cc-68fd-29b25a4585e8@gmail.com>

On Wed, 7 Nov 2018 12:28:21 -0700
David Ahern <dsahern@gmail.com> wrote:

> On 11/6/18 2:39 PM, Stefano Brivio wrote:
> > Use a router between endpoints, implemented via namespaces, set a low MTU
> > between router and destination endpoint, exceed it and check PMTU value in
> > route exceptions.
> > 
> > Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
> > Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> > ---
> > This only introduces tests over VxLAN over IPv6 right now. I'll introduce
> > tests over IPv4 (they can be added trivially) once DF configuration support
> > is accepted into iproute2.  
> 
> you can add them now and wrapped in a 'does ip support the df option'
> check. That is needed regardless of order (kernel vs iproute2).

True, I thought about that, but then I also thought that if we end up
with a different syntax for the iproute2 command, that becomes ugly.

Then yes, the check would be there -- it's actually already there, that
|| return 1 after the ip-link command.

-- 
Stefano

^ permalink raw reply

* Re: [PATCH net-next 00/10] udp: implement GRO support
From: Willem de Bruijn @ 2018-11-07 19:56 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Network Development, David Miller, Willem de Bruijn,
	steffen.klassert, Subash Abhinov Kasiviswanathan
In-Reply-To: <cover.1541588248.git.pabeni@redhat.com>

On Wed, Nov 7, 2018 at 5:43 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> This series implements GRO support for UDP sockets, as the RX counterpart
> of commit bec1f6f69736 ("udp: generate gso with UDP_SEGMENT").
> The core functionality is implemented by the second patch, introducing a new
> sockopt to enable UDP_GRO, while patch 3 implements support for passing the
> segment size to the user space via a new cmsg.
> UDP GRO performs a socket lookup for each ingress packets and aggregate datagram
> directed to UDP GRO enabled sockets with constant l4 tuple.
>
> UDP GRO packets can land on non GRO-enabled sockets, e.g. due to iptables NAT
> rules, and that could potentially confuse existing applications.
>
> The solution adopted here is to de-segment the GRO packet before enqueuing
> as needed. Since we must cope with packet reinsertion after de-segmentation,
> the relevant code is factored-out in ipv4 and ipv6 specific helpers and exposed
> to UDP usage.
>
> While the current code can probably be improved, this safeguard ,implemented in
> the patches 4-7, allows future enachements to enable UDP GSO offload on more
> virtual devices eventually even on forwarded packets.
>
> The last 4 for patches implement some performance and functional self-tests,
> re-using the existing udpgso infrastructure. The problematic scenario described
> above is explicitly tested.
>
> This revision of the series try to address the feedback provided by Willem and
> Subash on previous iteration.
>
> Paolo Abeni (10):
>   udp: implement complete book-keeping for encap_needed
>   udp: implement GRO for plain UDP sockets.
>   udp: add support for UDP_GRO cmsg
>   ip: factor out protocol delivery helper
>   ipv6: factor out protocol delivery helper
>   udp: cope with UDP GRO packet misdirection
>   selftests: add GRO support to udp bench rx program
>   selftests: add dummy xdp test helper
>   selftests: add some benchmark for UDP GRO
>   selftests: add functionals test for UDP GRO
>
>  include/linux/udp.h                           |  25 ++-
>  include/net/ip.h                              |   1 +
>  include/net/ipv6.h                            |   2 +
>  include/net/udp.h                             |  45 ++++-
>  include/net/udp_tunnel.h                      |   6 +
>  include/uapi/linux/udp.h                      |   1 +
>  net/ipv4/ip_input.c                           |  73 ++++----
>  net/ipv4/udp.c                                |  54 +++++-
>  net/ipv4/udp_offload.c                        | 109 +++++++++---
>  net/ipv6/ip6_input.c                          |  28 ++--
>  net/ipv6/udp.c                                |  41 ++++-
>  net/ipv6/udp_offload.c                        |   6 +-
>  tools/testing/selftests/bpf/Makefile          |   3 +-
>  tools/testing/selftests/bpf/xdp_dummy.c       |  13 ++
>  tools/testing/selftests/net/Makefile          |   1 +
>  tools/testing/selftests/net/udpgro.sh         | 148 +++++++++++++++++
>  tools/testing/selftests/net/udpgro_bench.sh   |  95 +++++++++++
>  tools/testing/selftests/net/udpgso_bench.sh   |   2 +-
>  tools/testing/selftests/net/udpgso_bench_rx.c | 156 ++++++++++++++++--
>  tools/testing/selftests/net/udpgso_bench_tx.c |  22 ++-
>  20 files changed, 708 insertions(+), 123 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/xdp_dummy.c
>  create mode 100755 tools/testing/selftests/net/udpgro.sh
>  create mode 100755 tools/testing/selftests/net/udpgro_bench.sh
>
> --
> 2.17.2
>

For the series:

Acked-by: Willem de Bruijn <willemb@google.com>

(Let me know if I need to Ack each patch directly)

Looks great. Thanks for addressing all comments, Paolo. I applied
the series mbox from patchwork to the same commit as the previous
RFCs to be able to incrementally review with git diff.

^ permalink raw reply

* Re: [PATCH net-next 00/10] udp: implement GRO support
From: David Miller @ 2018-11-07 20:03 UTC (permalink / raw)
  To: willemdebruijn.kernel; +Cc: pabeni, netdev, willemb, steffen.klassert, subashab
In-Reply-To: <CAF=yD-JG2wsRgAMW7Vv10yQdSDPGAg0akDJwDY1t+ySwQC-L9A@mail.gmail.com>

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Wed, 7 Nov 2018 13:56:52 -0600

> For the series:
> 
> Acked-by: Willem de Bruijn <willemb@google.com>
> 
> (Let me know if I need to Ack each patch directly)

You do not.

This is why I like header postings so much, reviewers can just
reply to it to ACK the entire series.

^ permalink raw reply

* Re: [PATCH iproute2 net-next 1/2] iplink_vxlan: Add DF configuration
From: David Ahern @ 2018-11-07 20:03 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: netdev, Stephen Hemminger
In-Reply-To: <d490d1617f253a974e17a9d2cba42c75435e05c2.1541535725.git.sbrivio@redhat.com>

On 11/6/18 2:39 PM, Stefano Brivio wrote:
> diff --git a/ip/iplink_vxlan.c b/ip/iplink_vxlan.c
> index 7fc0e2b4eb06..86afbe1334f0 100644
> --- a/ip/iplink_vxlan.c
> +++ b/ip/iplink_vxlan.c
> @@ -31,6 +31,7 @@ static void print_explain(FILE *f)
>  		"                 [ local ADDR ]\n"
>  		"                 [ ttl TTL ]\n"
>  		"                 [ tos TOS ]\n"
> +		"                 [ df DF ]\n"
>  		"                 [ flowlabel LABEL ]\n"
>  		"                 [ dev PHYS_DEV ]\n"
>  		"                 [ dstport PORT ]\n"

Since it is the df bit, that user option seems fine to me. Should be ok
to use that for your probe on iproute2 support.

That said, the man-page update should spell out what df refers to.

^ permalink raw reply

* Re: [PATCH bpf-next 2/2] bpftool: support loading flow dissector
From: Jakub Kicinski @ 2018-11-07 20:03 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: netdev, linux-kselftest, ast, daniel, shuah, quentin.monnet, guro,
	jiong.wang, bhole_prashant_q7, john.fastabend, jbenc,
	treeze.taeung, yhs, osk, sandipan
In-Reply-To: <20181107193552.77894-3-sdf@google.com>

On Wed,  7 Nov 2018 11:35:52 -0800, Stanislav Fomichev wrote:
> This commit adds support for loading/attaching/detaching flow
> dissector program. The structure of the flow dissector program is
> assumed to be the same as in the selftests:
> 
> * flow_dissector section with the main entry point
> * a bunch of tail call progs
> * a jmp_table map that is populated with the tail call progs
> 
> When `bpftool load` is called with a flow_dissector prog (i.e. when the
> first section is flow_dissector of 'type flow_dissector' argument is
> passed), we load and pin all the programs and build the jump table.
> 
> The last argument of `bpftool attach` is made optional for this use
> case.
> 
> Example:
> bpftool prog load tools/testing/selftests/bpf/bpf_flow.o \
> 	/sys/fs/bpf/flow type flow_dissector
> bpftool prog attach pinned /sys/fs/bpf/flow/flow_dissector/0 flow_dissector
> 
> Tested by using the above two lines to load the prog in
> the test_flow_dissector.sh selftest.
> 
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  .../bpftool/Documentation/bpftool-prog.rst    |  16 ++-
>  tools/bpf/bpftool/common.c                    |  32 +++--
>  tools/bpf/bpftool/main.h                      |   1 +
>  tools/bpf/bpftool/prog.c                      | 135 +++++++++++++++---

Please add the new attach type to bash completions.

>  4 files changed, 141 insertions(+), 43 deletions(-)
> 
> diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> index ac4e904b10fb..3caa9153435b 100644
> --- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> +++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> @@ -25,8 +25,8 @@ MAP COMMANDS
>  |	**bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | **opcodes**}]
>  |	**bpftool** **prog pin** *PROG* *FILE*
>  |	**bpftool** **prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
> -|       **bpftool** **prog attach** *PROG* *ATTACH_TYPE* *MAP*
> -|       **bpftool** **prog detach** *PROG* *ATTACH_TYPE* *MAP*
> +|       **bpftool** **prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
> +|       **bpftool** **prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
>  |	**bpftool** **prog help**
>  |
>  |	*MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
> @@ -39,7 +39,9 @@ MAP COMMANDS
>  |		**cgroup/bind4** | **cgroup/bind6** | **cgroup/post_bind4** | **cgroup/post_bind6** |
>  |		**cgroup/connect4** | **cgroup/connect6** | **cgroup/sendmsg4** | **cgroup/sendmsg6**
>  |	}
> -|       *ATTACH_TYPE* := { **msg_verdict** | **skb_verdict** | **skb_parse** }
> +|       *ATTACH_TYPE* := {
> +|	|	**msg_verdict** | **skb_verdict** | **skb_parse** | **flow_dissector**
> +|	}
>  
>  
>  DESCRIPTION
> @@ -97,13 +99,13 @@ DESCRIPTION
>  		  contain a dot character ('.'), which is reserved for future
>  		  extensions of *bpffs*.
>  
> -        **bpftool prog attach** *PROG* *ATTACH_TYPE* *MAP*
> +        **bpftool prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
>                    Attach bpf program *PROG* (with type specified by *ATTACH_TYPE*)
> -                  to the map *MAP*.
> +                  to the optional map *MAP*.

Perhaps we can do better on help?  Attach BPF program *PROG* (with type
specified by *ATTACH_TYPE*).  Most *ATTACH_TYPEs* require a *MAP*
parameter, with the exception of *flow_dissector* which is attached to
current networking name space.

> -        **bpftool prog detach** *PROG* *ATTACH_TYPE* *MAP*
> +        **bpftool prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
>                    Detach bpf program *PROG* (with type specified by *ATTACH_TYPE*)
> -                  from the map *MAP*.
> +                  from the optional map *MAP*.
>  
>  	**bpftool prog help**
>  		  Print short help message.
> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> index 25af85304ebe..963881142dfb 100644
> --- a/tools/bpf/bpftool/common.c
> +++ b/tools/bpf/bpftool/common.c

> @@ -204,10 +194,22 @@ int do_pin_fd(int fd, const char *name)
>  
>  out_free:
>  	free(file);
> -out:
>  	return err;
>  }
>  
> +int do_pin_fd(int fd, const char *name)
> +{
> +	int err = mount_bpffs_for_pin(name);

Please don't initialize the error variable with a non-trivial function
call.

> +	if (err) {
> +		p_err("can't mount bpffs for pin %s: %s",
> +		      name, strerror(errno));

I think mount_bpffs_for_pin() will already print an error.  We can't
print two errors, because it will break JSON output.

> +		return err;
> +	}
> +
> +	return bpf_obj_pin(fd, name);
> +}
> +
>  int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32))
>  {
>  	unsigned int id;

> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index 5302ee282409..f3a07ec3a444 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -81,6 +81,7 @@ static const char * const attach_type_strings[] = {
>  	[BPF_SK_SKB_STREAM_PARSER] = "stream_parser",
>  	[BPF_SK_SKB_STREAM_VERDICT] = "stream_verdict",
>  	[BPF_SK_MSG_VERDICT] = "msg_verdict",
> +	[BPF_FLOW_DISSECTOR] = "flow_dissector",
>  	[__MAX_BPF_ATTACH_TYPE] = NULL,
>  };
>  
> @@ -724,9 +725,10 @@ int map_replace_compar(const void *p1, const void *p2)
>  static int do_attach(int argc, char **argv)
>  {
>  	enum bpf_attach_type attach_type;
> -	int err, mapfd, progfd;
> +	int err, progfd;
> +	int mapfd = 0;
>  
> -	if (!REQ_ARGS(5)) {
> +	if (!REQ_ARGS(3)) {
>  		p_err("too few parameters for map attach");
>  		return -EINVAL;
>  	}
> @@ -741,10 +743,11 @@ static int do_attach(int argc, char **argv)
>  		return -EINVAL;
>  	}
>  	NEXT_ARG();
> -
> -	mapfd = map_parse_fd(&argc, &argv);
> -	if (mapfd < 0)
> -		return mapfd;
> +	if (argc > 0) {

Flow dissector can't need a map right?  I think explicitly checking for
the correct number of arguments once attach type is known would be good.

> +		mapfd = map_parse_fd(&argc, &argv);
> +		if (mapfd < 0)
> +			return mapfd;
> +	}
>  
>  	err = bpf_prog_attach(progfd, mapfd, attach_type, 0);
>  	if (err) {
> @@ -760,9 +763,10 @@ static int do_attach(int argc, char **argv)
>  static int do_detach(int argc, char **argv)
>  {
>  	enum bpf_attach_type attach_type;
> -	int err, mapfd, progfd;
> +	int err, progfd;
> +	int mapfd = 0;
>  
> -	if (!REQ_ARGS(5)) {
> +	if (!REQ_ARGS(3)) {
>  		p_err("too few parameters for map detach");
>  		return -EINVAL;
>  	}
> @@ -777,10 +781,11 @@ static int do_detach(int argc, char **argv)
>  		return -EINVAL;
>  	}
>  	NEXT_ARG();
> -
> -	mapfd = map_parse_fd(&argc, &argv);
> -	if (mapfd < 0)
> -		return mapfd;
> +	if (argc > 0) {
> +		mapfd = map_parse_fd(&argc, &argv);
> +		if (mapfd < 0)
> +			return mapfd;
> +	}
>  
>  	err = bpf_prog_detach2(progfd, mapfd, attach_type);
>  	if (err) {
> @@ -792,6 +797,56 @@ static int do_detach(int argc, char **argv)
>  		jsonw_null(json_wtr);
>  	return 0;
>  }
> +
> +/* Flow dissector consists of a main program and a jump table for each
> + * supported protocol. The assumption here is that the first prog is the main
> + * one and the other progs are used in the tail calls. In this routine we
> + * build the jump table for the non-main progs.
> + */
> +static int build_flow_dissector_jmp_table(struct bpf_object *obj,
> +					  struct bpf_program *prog,
> +					  const char *jmp_table_map)
> +{
> +	struct bpf_map *jmp_table;
> +	struct bpf_program *pos;
> +	int i = 0;
> +	int prog_fd, jmp_table_fd, fd;

Please order variables longest to shortest.

> +	prog_fd = bpf_program__fd(prog);
> +	if (prog_fd < 0) {
> +		p_err("failed to get fd of main prog");
> +		return prog_fd;
> +	}
> +
> +	jmp_table = bpf_object__find_map_by_name(obj, jmp_table_map);
> +	if (jmp_table == NULL) {

nit: !jmp_table

> +		p_err("failed to find '%s' map", jmp_table_map);
> +		return -1;
> +	}
> +
> +	jmp_table_fd = bpf_map__fd(jmp_table);
> +	if (jmp_table_fd < 0) {
> +		p_err("failed to get fd of jmp_table");
> +		return jmp_table_fd;
> +	}
> +
> +	bpf_object__for_each_program(pos, obj) {
> +		fd = bpf_program__fd(pos);
> +		if (fd < 0) {
> +			p_err("failed to get fd of '%s'",
> +			      bpf_program__title(pos, false));
> +			return fd;
> +		}
> +
> +		if (fd != prog_fd) {
> +			bpf_map_update_elem(jmp_table_fd, &i, &fd, BPF_ANY);
> +			++i;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  static int do_load(int argc, char **argv)
>  {
>  	enum bpf_attach_type expected_attach_type;
> @@ -800,7 +855,7 @@ static int do_load(int argc, char **argv)
>  	};
>  	struct map_replace *map_replace = NULL;
>  	unsigned int old_map_fds = 0;
> -	struct bpf_program *prog;
> +	struct bpf_program *prog, *pos;

variable order

>  	struct bpf_object *obj;
>  	struct bpf_map *map;
>  	const char *pinfile;
> @@ -918,13 +973,19 @@ static int do_load(int argc, char **argv)
>  		goto err_free_reuse_maps;
>  	}
>  
> -	prog = bpf_program__next(NULL, obj);
> +	if (attr.prog_type == BPF_PROG_TYPE_FLOW_DISSECTOR) {
> +		/* for the flow dissector type, the entry point is in the
> +		 * section flow_dissector; other progs are tail calls
> +		 */
> +		prog = bpf_object__find_program_by_title(obj, "flow_dissector");
> +	} else {
> +		prog = bpf_program__next(NULL, obj);
> +	}
>  	if (!prog) {
>  		p_err("object file doesn't contain any bpf program");
>  		goto err_close_obj;
>  	}
>  
> -	bpf_program__set_ifindex(prog, ifindex);
>  	if (attr.prog_type == BPF_PROG_TYPE_UNSPEC) {
>  		const char *sec_name = bpf_program__title(prog, false);
>  
> @@ -936,8 +997,13 @@ static int do_load(int argc, char **argv)
>  			goto err_close_obj;
>  		}
>  	}
> -	bpf_program__set_type(prog, attr.prog_type);
> -	bpf_program__set_expected_attach_type(prog, expected_attach_type);
> +
> +	bpf_object__for_each_program(pos, obj) {
> +		bpf_program__set_ifindex(pos, ifindex);
> +		bpf_program__set_type(pos, attr.prog_type);
> +		bpf_program__set_expected_attach_type(pos,
> +						      expected_attach_type);
> +	}
>  
>  	qsort(map_replace, old_map_fds, sizeof(*map_replace),
>  	      map_replace_compar);
> @@ -1001,8 +1067,34 @@ static int do_load(int argc, char **argv)
>  		goto err_close_obj;
>  	}
>  
> -	if (do_pin_fd(bpf_program__fd(prog), pinfile))
> +	err = mount_bpffs_for_pin(pinfile);
> +	if (err) {
> +		p_err("failed to mount bpffs for pin '%s'", pinfile);

Probably would be a duplicated error again?

>  		goto err_close_obj;
> +	}
> +
> +	if (attr.prog_type == BPF_PROG_TYPE_FLOW_DISSECTOR) {
> +		err = build_flow_dissector_jmp_table(obj, prog, "jmp_table");
> +		if (err) {
> +			p_err("failed to build flow dissector jump table");
> +			goto err_close_obj;
> +		}
> +		/* flow dissector consist of multiple programs,
> +		 * we want to pin them all

Why pin them all shouldn't the main program be the only one pinned?

> +		 */
> +		err = bpf_object__pin(obj, pinfile);
> +		if (err) {
> +			p_err("failed to pin flow dissector object");
> +			goto err_close_obj;
> +		}
> +	} else {
> +		err = bpf_obj_pin(bpf_program__fd(prog), pinfile);
> +		if (err) {
> +			p_err("failed to pin program %s",
> +			      bpf_program__title(prog, false));
> +			goto err_close_obj;
> +		}
> +	}

^ permalink raw reply

* Re: [PATCH net-next 0/5] net: phy: improve and simplify phylib state machine
From: Heiner Kallweit @ 2018-11-07 20:05 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Florian Fainelli, David Miller, netdev@vger.kernel.org
In-Reply-To: <20181107194816.GC9599@lunn.ch>

On 07.11.2018 20:48, Andrew Lunn wrote:
> On Wed, Nov 07, 2018 at 08:41:52PM +0100, Heiner Kallweit wrote:
>> This patch series is based on two axioms:
>>
>> - During autoneg a PHY always reports the link being down
> 
> Hi Heiner
> 
> I think that is a risky assumption to make.
> 
I wasn't sure initially too (found no clear rule in 802.3 clause 22)
and therefore asked around. Florian agrees to the assumption,
see here: https://www.spinics.net/lists/netdev/msg519242.html

If a PHY reports the link as up then every user would assume that
data can be transferred. But that's not the case during aneg.
Therefore reporting the link as up during aneg wouldn't make sense.

> What happens if this assumption is incorrect?
> 
Then we have to flush this patch series down the drain ;)
At least I would have to check in detail which parts need to be
changed. I clearly mention the assumptions so that every
reviewer can check whether he agrees.

>      Andrew
> 
Heiner

^ permalink raw reply

* Re: [PATCH bpf-next 2/2] bpftool: support loading flow dissector
From: Quentin Monnet @ 2018-11-07 20:08 UTC (permalink / raw)
  To: Stanislav Fomichev, netdev, linux-kselftest, ast, daniel, shuah
  Cc: jakub.kicinski, guro, jiong.wang, bhole_prashant_q7,
	john.fastabend, jbenc, treeze.taeung, yhs, osk, sandipan
In-Reply-To: <20181107193552.77894-3-sdf@google.com>

Hi Stanislav,

2018-11-07 11:35 UTC-0800 ~ Stanislav Fomichev <sdf@google.com>
> This commit adds support for loading/attaching/detaching flow
> dissector program. The structure of the flow dissector program is
> assumed to be the same as in the selftests:
> 
> * flow_dissector section with the main entry point
> * a bunch of tail call progs
> * a jmp_table map that is populated with the tail call progs
> 
> When `bpftool load` is called with a flow_dissector prog (i.e. when the
> first section is flow_dissector of 'type flow_dissector' argument is
> passed), we load and pin all the programs and build the jump table.
> 
> The last argument of `bpftool attach` is made optional for this use
> case.
> 
> Example:
> bpftool prog load tools/testing/selftests/bpf/bpf_flow.o \
> 	/sys/fs/bpf/flow type flow_dissector
> bpftool prog attach pinned /sys/fs/bpf/flow/flow_dissector/0 flow_dissector
> 
> Tested by using the above two lines to load the prog in
> the test_flow_dissector.sh selftest.
> 
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  .../bpftool/Documentation/bpftool-prog.rst    |  16 ++-
>  tools/bpf/bpftool/common.c                    |  32 +++--
>  tools/bpf/bpftool/main.h                      |   1 +
>  tools/bpf/bpftool/prog.c                      | 135 +++++++++++++++---
>  4 files changed, 141 insertions(+), 43 deletions(-)
> 
> diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> index ac4e904b10fb..3caa9153435b 100644
> --- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> +++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> @@ -25,8 +25,8 @@ MAP COMMANDS
>  |	**bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | **opcodes**}]
>  |	**bpftool** **prog pin** *PROG* *FILE*
>  |	**bpftool** **prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
> -|       **bpftool** **prog attach** *PROG* *ATTACH_TYPE* *MAP*
> -|       **bpftool** **prog detach** *PROG* *ATTACH_TYPE* *MAP*
> +|       **bpftool** **prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
> +|       **bpftool** **prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
>  |	**bpftool** **prog help**
>  |
>  |	*MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
> @@ -39,7 +39,9 @@ MAP COMMANDS
>  |		**cgroup/bind4** | **cgroup/bind6** | **cgroup/post_bind4** | **cgroup/post_bind6** |
>  |		**cgroup/connect4** | **cgroup/connect6** | **cgroup/sendmsg4** | **cgroup/sendmsg6**
>  |	}
> -|       *ATTACH_TYPE* := { **msg_verdict** | **skb_verdict** | **skb_parse** }
> +|       *ATTACH_TYPE* := {
> +|	|	**msg_verdict** | **skb_verdict** | **skb_parse** | **flow_dissector**

        ^
Nitpick: Could you please remove the above pipe?

> +|	}
>  
>  
>  DESCRIPTION
> @@ -97,13 +99,13 @@ DESCRIPTION
>  		  contain a dot character ('.'), which is reserved for future
>  		  extensions of *bpffs*.
>  
> -        **bpftool prog attach** *PROG* *ATTACH_TYPE* *MAP*
> +        **bpftool prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
>                    Attach bpf program *PROG* (with type specified by *ATTACH_TYPE*)
> -                  to the map *MAP*.
> +                  to the optional map *MAP*.
>  
> -        **bpftool prog detach** *PROG* *ATTACH_TYPE* *MAP*
> +        **bpftool prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
>                    Detach bpf program *PROG* (with type specified by *ATTACH_TYPE*)
> -                  from the map *MAP*.
> +                  from the optional map *MAP*.
>  
>  	**bpftool prog help**
>  		  Print short help message.
> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> index 25af85304ebe..963881142dfb 100644
> --- a/tools/bpf/bpftool/common.c
> +++ b/tools/bpf/bpftool/common.c
> @@ -169,34 +169,24 @@ int open_obj_pinned_any(char *path, enum bpf_obj_type exp_type)
>  	return fd;
>  }
>  
> -int do_pin_fd(int fd, const char *name)
> +int mount_bpffs_for_pin(const char *name)
>  {
>  	char err_str[ERR_MAX_LEN];
>  	char *file;
>  	char *dir;
>  	int err = 0;
>  
> -	err = bpf_obj_pin(fd, name);
> -	if (!err)
> -		goto out;
> -
>  	file = malloc(strlen(name) + 1);
>  	strcpy(file, name);
>  	dir = dirname(file);
>  
> -	if (errno != EPERM || is_bpffs(dir)) {
> -		p_err("can't pin the object (%s): %s", name, strerror(errno));
> +	if (is_bpffs(dir)) {
> +		/* nothing to do if already mounted */
>  		goto out_free;
>  	}
>  
> -	/* Attempt to mount bpffs, then retry pinning. */
>  	err = mnt_bpffs(dir, err_str, ERR_MAX_LEN);
> -	if (!err) {
> -		err = bpf_obj_pin(fd, name);
> -		if (err)
> -			p_err("can't pin the object (%s): %s", name,
> -			      strerror(errno));
> -	} else {
> +	if (err) {
>  		err_str[ERR_MAX_LEN - 1] = '\0';
>  		p_err("can't mount BPF file system to pin the object (%s): %s",
>  		      name, err_str);
> @@ -204,10 +194,22 @@ int do_pin_fd(int fd, const char *name)
>  
>  out_free:
>  	free(file);
> -out:
>  	return err;
>  }
>  
> +int do_pin_fd(int fd, const char *name)
> +{
> +	int err = mount_bpffs_for_pin(name);
> +
> +	if (err) {
> +		p_err("can't mount bpffs for pin %s: %s",
> +		      name, strerror(errno));
> +		return err;
> +	}
> +
> +	return bpf_obj_pin(fd, name);
> +}
> +
>  int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32))
>  {
>  	unsigned int id;
> diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
> index 28322ace2856..1383824c9baf 100644
> --- a/tools/bpf/bpftool/main.h
> +++ b/tools/bpf/bpftool/main.h
> @@ -129,6 +129,7 @@ const char *get_fd_type_name(enum bpf_obj_type type);
>  char *get_fdinfo(int fd, const char *key);
>  int open_obj_pinned(char *path);
>  int open_obj_pinned_any(char *path, enum bpf_obj_type exp_type);
> +int mount_bpffs_for_pin(const char *name);
>  int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32));
>  int do_pin_fd(int fd, const char *name);
>  
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index 5302ee282409..f3a07ec3a444 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -81,6 +81,7 @@ static const char * const attach_type_strings[] = {
>  	[BPF_SK_SKB_STREAM_PARSER] = "stream_parser",
>  	[BPF_SK_SKB_STREAM_VERDICT] = "stream_verdict",
>  	[BPF_SK_MSG_VERDICT] = "msg_verdict",
> +	[BPF_FLOW_DISSECTOR] = "flow_dissector",
>  	[__MAX_BPF_ATTACH_TYPE] = NULL,
>  };
>  
> @@ -724,9 +725,10 @@ int map_replace_compar(const void *p1, const void *p2)
>  static int do_attach(int argc, char **argv)
>  {
>  	enum bpf_attach_type attach_type;
> -	int err, mapfd, progfd;
> +	int err, progfd;
> +	int mapfd = 0;
>  
> -	if (!REQ_ARGS(5)) {
> +	if (!REQ_ARGS(3)) {
>  		p_err("too few parameters for map attach");
>  		return -EINVAL;
>  	}
> @@ -741,10 +743,11 @@ static int do_attach(int argc, char **argv)
>  		return -EINVAL;
>  	}
>  	NEXT_ARG();
> -
> -	mapfd = map_parse_fd(&argc, &argv);
> -	if (mapfd < 0)
> -		return mapfd;
> +	if (argc > 0) {
> +		mapfd = map_parse_fd(&argc, &argv);
> +		if (mapfd < 0)
> +			return mapfd;
> +	}
>  
>  	err = bpf_prog_attach(progfd, mapfd, attach_type, 0);
>  	if (err) {
> @@ -760,9 +763,10 @@ static int do_attach(int argc, char **argv)
>  static int do_detach(int argc, char **argv)
>  {
>  	enum bpf_attach_type attach_type;
> -	int err, mapfd, progfd;
> +	int err, progfd;
> +	int mapfd = 0;
>  
> -	if (!REQ_ARGS(5)) {
> +	if (!REQ_ARGS(3)) {
>  		p_err("too few parameters for map detach");
>  		return -EINVAL;
>  	}
> @@ -777,10 +781,11 @@ static int do_detach(int argc, char **argv)
>  		return -EINVAL;
>  	}
>  	NEXT_ARG();
> -
> -	mapfd = map_parse_fd(&argc, &argv);
> -	if (mapfd < 0)
> -		return mapfd;
> +	if (argc > 0) {
> +		mapfd = map_parse_fd(&argc, &argv);
> +		if (mapfd < 0)
> +			return mapfd;
> +	}
>  
>  	err = bpf_prog_detach2(progfd, mapfd, attach_type);
>  	if (err) {
> @@ -792,6 +797,56 @@ static int do_detach(int argc, char **argv)
>  		jsonw_null(json_wtr);
>  	return 0;
>  }
> +
> +/* Flow dissector consists of a main program and a jump table for each
> + * supported protocol. The assumption here is that the first prog is the main
> + * one and the other progs are used in the tail calls. In this routine we
> + * build the jump table for the non-main progs.
> + */
> +static int build_flow_dissector_jmp_table(struct bpf_object *obj,
> +					  struct bpf_program *prog,
> +					  const char *jmp_table_map)
> +{
> +	struct bpf_map *jmp_table;
> +	struct bpf_program *pos;
> +	int i = 0;
> +	int prog_fd, jmp_table_fd, fd;
> +
> +	prog_fd = bpf_program__fd(prog);
> +	if (prog_fd < 0) {
> +		p_err("failed to get fd of main prog");
> +		return prog_fd;
> +	}
> +
> +	jmp_table = bpf_object__find_map_by_name(obj, jmp_table_map);
> +	if (jmp_table == NULL) {
> +		p_err("failed to find '%s' map", jmp_table_map);
> +		return -1;
> +	}
> +
> +	jmp_table_fd = bpf_map__fd(jmp_table);
> +	if (jmp_table_fd < 0) {
> +		p_err("failed to get fd of jmp_table");
> +		return jmp_table_fd;
> +	}
> +
> +	bpf_object__for_each_program(pos, obj) {
> +		fd = bpf_program__fd(pos);
> +		if (fd < 0) {
> +			p_err("failed to get fd of '%s'",
> +			      bpf_program__title(pos, false));
> +			return fd;
> +		}
> +
> +		if (fd != prog_fd) {
> +			bpf_map_update_elem(jmp_table_fd, &i, &fd, BPF_ANY);
> +			++i;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  static int do_load(int argc, char **argv)
>  {
>  	enum bpf_attach_type expected_attach_type;
> @@ -800,7 +855,7 @@ static int do_load(int argc, char **argv)
>  	};
>  	struct map_replace *map_replace = NULL;
>  	unsigned int old_map_fds = 0;
> -	struct bpf_program *prog;
> +	struct bpf_program *prog, *pos;
>  	struct bpf_object *obj;
>  	struct bpf_map *map;
>  	const char *pinfile;
> @@ -918,13 +973,19 @@ static int do_load(int argc, char **argv)
>  		goto err_free_reuse_maps;
>  	}
>  
> -	prog = bpf_program__next(NULL, obj);
> +	if (attr.prog_type == BPF_PROG_TYPE_FLOW_DISSECTOR) {
> +		/* for the flow dissector type, the entry point is in the
> +		 * section flow_dissector; other progs are tail calls
> +		 */
> +		prog = bpf_object__find_program_by_title(obj, "flow_dissector");

Is the section always called like this? Or could we use instead the same
assumption as above, i.e. the main program is the first one?

> +	} else {
> +		prog = bpf_program__next(NULL, obj);
> +	}
>  	if (!prog) {
>  		p_err("object file doesn't contain any bpf program");
>  		goto err_close_obj;
>  	}
>  
> -	bpf_program__set_ifindex(prog, ifindex);
>  	if (attr.prog_type == BPF_PROG_TYPE_UNSPEC) {
>  		const char *sec_name = bpf_program__title(prog, false);
>  
> @@ -936,8 +997,13 @@ static int do_load(int argc, char **argv)
>  			goto err_close_obj;
>  		}
>  	}
> -	bpf_program__set_type(prog, attr.prog_type);
> -	bpf_program__set_expected_attach_type(prog, expected_attach_type);
> +
> +	bpf_object__for_each_program(pos, obj) {
> +		bpf_program__set_ifindex(pos, ifindex);
> +		bpf_program__set_type(pos, attr.prog_type);
> +		bpf_program__set_expected_attach_type(pos,
> +						      expected_attach_type);
> +	}

I believe it is possible to load program of several types or attach
types from a same object file? If so, we would need to get individual
prog types and attach types for each program with
libbpf_prog_type_by_name().

>  
>  	qsort(map_replace, old_map_fds, sizeof(*map_replace),
>  	      map_replace_compar);
> @@ -1001,8 +1067,34 @@ static int do_load(int argc, char **argv)
>  		goto err_close_obj;
>  	}
>  
> -	if (do_pin_fd(bpf_program__fd(prog), pinfile))
> +	err = mount_bpffs_for_pin(pinfile);
> +	if (err) {
> +		p_err("failed to mount bpffs for pin '%s'", pinfile);
>  		goto err_close_obj;
> +	}
> +
> +	if (attr.prog_type == BPF_PROG_TYPE_FLOW_DISSECTOR) {
> +		err = build_flow_dissector_jmp_table(obj, prog, "jmp_table");
> +		if (err) {
> +			p_err("failed to build flow dissector jump table");
> +			goto err_close_obj;
> +		}
> +		/* flow dissector consist of multiple programs,
> +		 * we want to pin them all
> +		 */
> +		err = bpf_object__pin(obj, pinfile);
> +		if (err) {
> +			p_err("failed to pin flow dissector object");
> +			goto err_close_obj;
> +		}

What happens for the programs previously loaded and pinned in one of the
program fails? Do they remain loaded and pinned even if all programs
were not successfully loaded? Or should we consider removing all links
we added instead and go back to a "clean" state?

> +	} else {
> +		err = bpf_obj_pin(bpf_program__fd(prog), pinfile);
> +		if (err) {
> +			p_err("failed to pin program %s",
> +			      bpf_program__title(prog, false));
> +			goto err_close_obj;
> +		}

I don't have the same opinion as Jakub for pinning :). I was hoping we
could also load additional programs (for tail calls) for
non-flow_dissector programs. Could this be an occasion to update the
code in that direction?

> +	}
>  
>  	if (json_output)
>  		jsonw_null(json_wtr);
> @@ -1037,8 +1129,8 @@ static int do_help(int argc, char **argv)
>  		"       %s %s pin   PROG FILE\n"
>  		"       %s %s load  OBJ  FILE [type TYPE] [dev NAME] \\\n"
>  		"                         [map { idx IDX | name NAME } MAP]\n"
> -		"       %s %s attach PROG ATTACH_TYPE MAP\n"
> -		"       %s %s detach PROG ATTACH_TYPE MAP\n"
> +		"       %s %s attach PROG ATTACH_TYPE [MAP]\n"
> +		"       %s %s detach PROG ATTACH_TYPE [MAP]\n"
>  		"       %s %s help\n"
>  		"\n"
>  		"       " HELP_SPEC_MAP "\n"
> @@ -1050,7 +1142,8 @@ static int do_help(int argc, char **argv)
>  		"                 cgroup/bind4 | cgroup/bind6 | cgroup/post_bind4 |\n"
>  		"                 cgroup/post_bind6 | cgroup/connect4 | cgroup/connect6 |\n"
>  		"                 cgroup/sendmsg4 | cgroup/sendmsg6 }\n"
> -		"       ATTACH_TYPE := { msg_verdict | skb_verdict | skb_parse }\n"
> +		"       ATTACH_TYPE := { msg_verdict | skb_verdict | skb_parse |\n"
> +		"                        flow_dissector }\n"
>  		"       " HELP_SPEC_OPTIONS "\n"
>  		"",
>  		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
> 

Thanks a lot for the set!
Quentin

^ permalink raw reply

* Re: [PATCH net-next 0/5] net: phy: improve and simplify phylib state machine
From: Andrew Lunn @ 2018-11-07 20:21 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Florian Fainelli, David Miller, netdev@vger.kernel.org
In-Reply-To: <5c2d1fba-0c94-210f-0d9b-e1d8f70fbee0@gmail.com>

On Wed, Nov 07, 2018 at 09:05:49PM +0100, Heiner Kallweit wrote:
> On 07.11.2018 20:48, Andrew Lunn wrote:
> > On Wed, Nov 07, 2018 at 08:41:52PM +0100, Heiner Kallweit wrote:
> >> This patch series is based on two axioms:
> >>
> >> - During autoneg a PHY always reports the link being down
> > 
> > Hi Heiner
> > 
> > I think that is a risky assumption to make.
> > 
> I wasn't sure initially too (found no clear rule in 802.3 clause 22)
> and therefore asked around. Florian agrees to the assumption,
> see here: https://www.spinics.net/lists/netdev/msg519242.html
> 
> If a PHY reports the link as up then every user would assume that
> data can be transferred. But that's not the case during aneg.
> Therefore reporting the link as up during aneg wouldn't make sense.

Hi Heiner

If auto-neg has already been completed once before, i can see a lazy
hardware designed not reporting link down, or at least, not until
auto-neg actually fails.

And what about if link is down for too short a time for us to notice?
I've seen some code fail because the kernel went off and did something
else for too long, and a state change was missed. 

> > What happens if this assumption is incorrect?
> > 
> Then we have to flush this patch series down the drain ;)
> At least I would have to check in detail which parts need to be
> changed. I clearly mention the assumptions so that every
> reviewer can check whether he agrees.

Thanks for doing that. I want to be happy this is safe, and not going
to introduce regressions.

   Andrew

^ permalink raw reply

* Re: [PATCH iproute2 net-next 1/2] iplink_vxlan: Add DF configuration
From: Stefano Brivio @ 2018-11-07 20:30 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, Stephen Hemminger
In-Reply-To: <01b41b92-1cba-a81f-523c-99c00750538a@gmail.com>

On Wed, 7 Nov 2018 13:03:34 -0700
David Ahern <dsahern@gmail.com> wrote:

> On 11/6/18 2:39 PM, Stefano Brivio wrote:
> > diff --git a/ip/iplink_vxlan.c b/ip/iplink_vxlan.c
> > index 7fc0e2b4eb06..86afbe1334f0 100644
> > --- a/ip/iplink_vxlan.c
> > +++ b/ip/iplink_vxlan.c
> > @@ -31,6 +31,7 @@ static void print_explain(FILE *f)
> >  		"                 [ local ADDR ]\n"
> >  		"                 [ ttl TTL ]\n"
> >  		"                 [ tos TOS ]\n"
> > +		"                 [ df DF ]\n"
> >  		"                 [ flowlabel LABEL ]\n"
> >  		"                 [ dev PHYS_DEV ]\n"
> >  		"                 [ dstport PORT ]\n"  
> 
> Since it is the df bit, that user option seems fine to me. Should be ok
> to use that for your probe on iproute2 support.

Okay, then I'll use it in v2 of the kernel series.

> That said, the man-page update should spell out what df refers to.

Sure, I'll add that.

-- 
Stefano

^ permalink raw reply

* Re: [PATCH bpf-next 2/2] bpftool: support loading flow dissector
From: Jakub Kicinski @ 2018-11-07 20:32 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: Stanislav Fomichev, netdev, linux-kselftest, ast, daniel, shuah,
	guro, jiong.wang, bhole_prashant_q7, john.fastabend, jbenc,
	treeze.taeung, yhs, osk, sandipan
In-Reply-To: <484ea7f5-0d4c-b828-c5fe-1fd4d9bf847d@netronome.com>

On Wed, 7 Nov 2018 20:08:53 +0000, Quentin Monnet wrote:
> > +		err = bpf_obj_pin(bpf_program__fd(prog), pinfile);
> > +		if (err) {
> > +			p_err("failed to pin program %s",
> > +			      bpf_program__title(prog, false));
> > +			goto err_close_obj;
> > +		}  
> 
> I don't have the same opinion as Jakub for pinning :). I was hoping we
> could also load additional programs (for tail calls) for
> non-flow_dissector programs. Could this be an occasion to update the
> code in that direction?

Do you mean having the bpftool construct an array for tail calling
automatically when loading an object?  Or do a "mass pin" of all
programs in an object file?

I'm not convinced about this strategy of auto assembling a tail call
array by assuming that a flow dissector object carries programs for
protocols in order (apart from the main program which doesn't have to
be first, for some reason).

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox