Netdev List


* [PATCH bpf-next v5 0/8] xdp: Avoid unloading xdp prog not attached by sample
From: Maciej Fijalkowski @ 2019-02-01  0:19 UTC
  To: daniel, ast; +Cc: netdev, jakub.kicinski, brouer, john.fastabend

Hi!
This patchset tries to address the following situation:
* the user loads a particular xdp sample application that does stats polling
* the user loads another sample application on the same interface
* then, the user sends SIGINT/SIGTERM to the app that was attached first
* the second application ends up with its xdp program unloaded, because the
  first app's exit path removes whatever program is attached to the interface

Patch 1 adds a libbpf helper for getting a map fd by map name.
In patch 2, Jesper removes the read_trace_pipe usage from xdp_redirect_cpu,
which was a blocker for converting this sample to libbpf.
Patch 3 converts a number of XDP samples to libbpf.
Patch 4 adjusts RLIMIT_MEMLOCK for two samples touched in this patchset.
Patch 5 adds extack messages for the cases where dev_change_xdp_fd returns
an error, so the user can see why the xdp program was not attached to the
interface.
Patch 6 makes the samples behave like iproute2 when loading an xdp prog by
introducing a "force" flag.
Patch 7 introduces a libbpf function that queries the driver from
userspace for the id of the currently attached xdp prog.

Patch 8 uses it in the samples that do stats polling: the signal handler
reads the prog id currently on the interface and compares it with the
previously stored one before removing anything.
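
A minimal, self-contained sketch of the attach/detach behavior that patches 6
and 8 aim for. It does not call libbpf: struct fake_link, fake_attach(),
attach_flags() and detach_if_ours() are hypothetical stand-ins, the interface
state is a plain struct rather than what bpf_get_link_xdp_id() would report,
and the flag values are copied from linux/if_link.h. It assumes the samples
follow iproute2's convention of setting XDP_FLAGS_UPDATE_IF_NOEXIST unless
"force" is requested.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Values mirrored from linux/if_link.h so the sketch builds on its own. */
#define XDP_FLAGS_UPDATE_IF_NOEXIST (1U << 0)
#define XDP_FLAGS_SKB_MODE          (1U << 1)

/* Hypothetical model of one interface: id of the attached XDP program,
 * 0 meaning none. The real samples would ask the kernel through
 * bpf_get_link_xdp_id() instead of reading a struct field. */
struct fake_link {
	uint32_t prog_id;
};

/* iproute2-style flags: refuse to replace an existing program unless
 * "force" was requested. */
static uint32_t attach_flags(bool force, bool skb_mode)
{
	uint32_t flags = force ? 0 : XDP_FLAGS_UPDATE_IF_NOEXIST;

	if (skb_mode)
		flags |= XDP_FLAGS_SKB_MODE;
	return flags;
}

/* Models the attach: fails (-1) when a program is already present and
 * XDP_FLAGS_UPDATE_IF_NOEXIST is set, otherwise installs prog_id. */
static int fake_attach(struct fake_link *link, uint32_t prog_id,
		       uint32_t flags)
{
	if (link->prog_id && (flags & XDP_FLAGS_UPDATE_IF_NOEXIST))
		return -1;
	link->prog_id = prog_id;
	return 0;
}

/* Exit path: detach only if the program on the link is still the one
 * this sample attached. Returns true if it detached. */
static bool detach_if_ours(struct fake_link *link, uint32_t our_id)
{
	if (link->prog_id == 0) {
		printf("no program attached, nothing to remove\n");
		return false;
	}
	if (link->prog_id != our_id) {
		printf("program on interface changed, not removing\n");
		return false;
	}
	link->prog_id = 0; /* stands in for detaching with fd -1 */
	return true;
}
```

Usage: sample A attaches id 11, sample B replaces it with id 22 only when
forced, and A's exit then leaves B's program in place.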

Thanks!

v1->v2:
* add a libbpf helper for getting a prog via relative index
* include xdp_redirect_cpu into conversion

v2->v3: mostly addressing Daniel's/Jesper's comments
* get rid of the helper from v1->v2
* pass the program name to xdp_redirect_cpu instead of its number

v3->v4:
* fix help message in xdp_sample_pkts

v4->v5:
* in get_link_xdp_fd, assign prog_id only when libbpf_nl_get_link
  returned 0
* add extack messages in dev_change_xdp_fd
* check the return value of bpf_get_link_xdp_id when exiting from sample progs

Jesper Dangaard Brouer (1):
  samples/bpf: xdp_redirect_cpu have not need for read_trace_pipe

Maciej Fijalkowski (7):
  libbpf: Add a helper for retrieving a map fd for a given name
  samples/bpf: Convert XDP samples to libbpf usage
  samples/bpf: Extend RLIMIT_MEMLOCK for xdp_{sample_pkts, router_ipv4}
  xdp: Provide extack messages when prog attachment failed
  samples/bpf: Add a "force" flag to XDP samples
  libbpf: Add a support for getting xdp prog id on ifindex
  samples/bpf: Check the prog id before exiting

 net/core/dev.c                      |  12 ++-
 samples/bpf/Makefile                |   8 +-
 samples/bpf/xdp1_user.c             |  34 ++++++-
 samples/bpf/xdp_adjust_tail_user.c  |  38 +++++--
 samples/bpf/xdp_redirect_cpu_user.c | 196 +++++++++++++++++++++++++-----------
 samples/bpf/xdp_redirect_map_user.c | 106 +++++++++++++++----
 samples/bpf/xdp_redirect_user.c     | 103 ++++++++++++++++---
 samples/bpf/xdp_router_ipv4_user.c  | 179 +++++++++++++++++++++++---------
 samples/bpf/xdp_rxq_info_user.c     |  41 ++++++--
 samples/bpf/xdp_sample_pkts_user.c  |  81 ++++++++++++---
 samples/bpf/xdp_tx_iptunnel_user.c  |  71 ++++++++++---
 samples/bpf/xdpsock_user.c          |  30 +++++-
 tools/lib/bpf/libbpf.c              |   6 ++
 tools/lib/bpf/libbpf.h              |   4 +
 tools/lib/bpf/libbpf.map            |   2 +
 tools/lib/bpf/netlink.c             |  85 ++++++++++++++++
 16 files changed, 796 insertions(+), 200 deletions(-)

-- 
2.16.1


* [PATCH bpf-next v5 1/8] libbpf: Add a helper for retrieving a map fd for a given name
From: Maciej Fijalkowski @ 2019-02-01  0:19 UTC
  To: daniel, ast; +Cc: netdev, jakub.kicinski, brouer, john.fastabend
In-Reply-To: <20190201001954.4130-1-maciej.fijalkowski@intel.com>

XDP samples mostly interact with eBPF maps through their file
descriptors. For an eBPF program that contains multiple maps, it can be
tiresome to iterate through them and call bpf_map__fd for each one.
Add a helper, mostly based on bpf_object__find_map_by_name, that returns
the map fd instead of the struct bpf_map pointer.

Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 tools/lib/bpf/libbpf.c   | 6 ++++++
 tools/lib/bpf/libbpf.h   | 3 +++
 tools/lib/bpf/libbpf.map | 1 +
 3 files changed, 10 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 2ccde17957e6..03bc01ca2577 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -2884,6 +2884,12 @@ bpf_object__find_map_by_name(struct bpf_object *obj, const char *name)
 	return NULL;
 }
 
+int
+bpf_object__find_map_fd_by_name(struct bpf_object *obj, const char *name)
+{
+	return bpf_map__fd(bpf_object__find_map_by_name(obj, name));
+}
+
 struct bpf_map *
 bpf_object__find_map_by_offset(struct bpf_object *obj, size_t offset)
 {
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 62ae6cb93da1..931be6f3408c 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -264,6 +264,9 @@ struct bpf_map;
 LIBBPF_API struct bpf_map *
 bpf_object__find_map_by_name(struct bpf_object *obj, const char *name);
 
+LIBBPF_API int
+bpf_object__find_map_fd_by_name(struct bpf_object *obj, const char *name);
+
 /*
  * Get bpf_map through the offset of corresponding struct bpf_map_def
  * in the BPF object file.
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 266bc95d0142..b183c6c3b990 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -130,4 +130,5 @@ LIBBPF_0.0.2 {
 		bpf_probe_helper;
 		bpf_probe_map_type;
 		bpf_probe_prog_type;
+		bpf_object__find_map_fd_by_name;
 } LIBBPF_0.0.1;
-- 
2.16.1
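
For illustration only, a self-contained model of why the one-line helper in
patch 1 is safe to call with a name that does not exist. It does not link
against libbpf: struct sketch_map, struct sketch_object and the three
functions are hypothetical stand-ins, and it assumes, as in libbpf of this
period, that bpf_map__fd() returns -EINVAL for a NULL map. Under that
assumption a missing name surfaces as a negative fd, which is why the
converted samples only need a "< 0" check.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for libbpf's struct bpf_map and struct
 * bpf_object, reduced to the fields this sketch needs. */
struct sketch_map {
	const char *name;
	int fd;
};

struct sketch_object {
	struct sketch_map *maps;
	size_t nr_maps;
};

/* Models the lookup by name: NULL when no map matches. */
static struct sketch_map *find_map_by_name(struct sketch_object *obj,
					   const char *name)
{
	for (size_t i = 0; i < obj->nr_maps; i++)
		if (strcmp(obj->maps[i].name, name) == 0)
			return &obj->maps[i];
	return NULL;
}

/* Models bpf_map__fd(): a NULL map yields -EINVAL instead of crashing. */
static int map_fd(struct sketch_map *map)
{
	return map ? map->fd : -EINVAL;
}

/* The composition the helper performs: a missing name simply becomes
 * a negative fd. */
static int find_map_fd_by_name(struct sketch_object *obj, const char *name)
{
	return map_fd(find_map_by_name(obj, name));
}
```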



* [PATCH bpf-next v5 2/8] samples/bpf: xdp_redirect_cpu have not need for read_trace_pipe
From: Maciej Fijalkowski @ 2019-02-01  0:19 UTC
  To: daniel, ast; +Cc: netdev, jakub.kicinski, brouer, john.fastabend
In-Reply-To: <20190201001954.4130-1-maciej.fijalkowski@intel.com>

From: Jesper Dangaard Brouer <brouer@redhat.com>

The xdp_redirect_cpu sample does not use the bpf_trace_printk helper.
Thus it makes no sense for the --debug option to read from
/sys/kernel/debug/tracing/trace_pipe via read_trace_pipe.
Simply remove it.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 samples/bpf/xdp_redirect_cpu_user.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/samples/bpf/xdp_redirect_cpu_user.c b/samples/bpf/xdp_redirect_cpu_user.c
index 2d23054aaccf..f141e752ca0a 100644
--- a/samples/bpf/xdp_redirect_cpu_user.c
+++ b/samples/bpf/xdp_redirect_cpu_user.c
@@ -51,7 +51,6 @@ static const struct option long_options[] = {
 	{"help",	no_argument,		NULL, 'h' },
 	{"dev",		required_argument,	NULL, 'd' },
 	{"skb-mode",	no_argument,		NULL, 'S' },
-	{"debug",	no_argument,		NULL, 'D' },
 	{"sec",		required_argument,	NULL, 's' },
 	{"prognum",	required_argument,	NULL, 'p' },
 	{"qsize",	required_argument,	NULL, 'q' },
@@ -563,7 +562,6 @@ int main(int argc, char **argv)
 	bool use_separators = true;
 	bool stress_mode = false;
 	char filename[256];
-	bool debug = false;
 	int added_cpus = 0;
 	int longindex = 0;
 	int interval = 2;
@@ -624,9 +622,6 @@ int main(int argc, char **argv)
 		case 'S':
 			xdp_flags |= XDP_FLAGS_SKB_MODE;
 			break;
-		case 'D':
-			debug = true;
-			break;
 		case 'x':
 			stress_mode = true;
 			break;
@@ -688,11 +683,6 @@ int main(int argc, char **argv)
 		return EXIT_FAIL_XDP;
 	}
 
-	if (debug) {
-		printf("Debug-mode reading trace pipe (fix #define DEBUG)\n");
-		read_trace_pipe();
-	}
-
 	stats_poll(interval, use_separators, prog_num, stress_mode);
 	return EXIT_OK;
 }
-- 
2.16.1



* [PATCH bpf-next v5 3/8] samples/bpf: Convert XDP samples to libbpf usage
From: Maciej Fijalkowski @ 2019-02-01  0:19 UTC
  To: daniel, ast; +Cc: netdev, jakub.kicinski, brouer, john.fastabend
In-Reply-To: <20190201001954.4130-1-maciej.fijalkowski@intel.com>

Some of the XDP samples that attach the bpf program to the interface
via libbpf's bpf_set_link_xdp_fd still use bpf_load.c for loading and
manipulating the eBPF program and maps. Convert them to do this through
libbpf and remove bpf_load from the picture.

While at it, remove what looks like a debug leftover in
xdp_redirect_map_user.c.

In xdp_redirect_cpu, change the way the program to be loaded onto the
interface is chosen: the user now needs to pass the program's section
name instead of its relative number. In case of a typo, print out the
section names to choose from.
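
A hedged sketch of the selection behavior described above: look the program
up by section name and, on a mismatch, list the names the user can choose
from. The title table is a hypothetical stand-in for walking the object with
bpf_object__for_each_program() and bpf_program__title(); only
"xdp_cpu_map5_lb_hash_ip_pairs" (the default in this patch) comes from the
sample, and "xdp_prog_placeholder" is an invented placeholder.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical section titles standing in for the programs found in
 * the loaded object file. */
static const char *const prog_titles[] = {
	"xdp_prog_placeholder",
	"xdp_cpu_map5_lb_hash_ip_pairs",
};
#define NR_PROGS (sizeof(prog_titles) / sizeof(prog_titles[0]))

/* Return the index of the program whose title matches name, or -1
 * after printing the titles the user can choose from. */
static int select_prog(const char *name)
{
	size_t i;

	for (i = 0; i < NR_PROGS; i++)
		if (strcmp(prog_titles[i], name) == 0)
			return (int)i;

	fprintf(stderr, "unknown --progname '%s', available programs:\n",
		name);
	for (i = 0; i < NR_PROGS; i++)
		fprintf(stderr, " %s\n", prog_titles[i]);
	return -1;
}
```

Matching on the section name keeps the command line stable even if programs
are added to or reordered in the _kern.c file, which a relative number does
not.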

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 samples/bpf/Makefile                |   8 +-
 samples/bpf/xdp_redirect_cpu_user.c | 145 +++++++++++++++++++++++++-----------
 samples/bpf/xdp_redirect_map_user.c |  47 ++++++++----
 samples/bpf/xdp_redirect_user.c     |  44 ++++++++---
 samples/bpf/xdp_router_ipv4_user.c  |  75 +++++++++++++------
 samples/bpf/xdp_tx_iptunnel_user.c  |  37 ++++++---
 6 files changed, 253 insertions(+), 103 deletions(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index db1a91dfa702..a0ef7eddd0b3 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -87,18 +87,18 @@ test_cgrp2_sock2-objs := bpf_load.o test_cgrp2_sock2.o
 xdp1-objs := xdp1_user.o
 # reuse xdp1 source intentionally
 xdp2-objs := xdp1_user.o
-xdp_router_ipv4-objs := bpf_load.o xdp_router_ipv4_user.o
+xdp_router_ipv4-objs := xdp_router_ipv4_user.o
 test_current_task_under_cgroup-objs := bpf_load.o $(CGROUP_HELPERS) \
 				       test_current_task_under_cgroup_user.o
 trace_event-objs := bpf_load.o trace_event_user.o $(TRACE_HELPERS)
 sampleip-objs := bpf_load.o sampleip_user.o $(TRACE_HELPERS)
 tc_l2_redirect-objs := bpf_load.o tc_l2_redirect_user.o
 lwt_len_hist-objs := bpf_load.o lwt_len_hist_user.o
-xdp_tx_iptunnel-objs := bpf_load.o xdp_tx_iptunnel_user.o
+xdp_tx_iptunnel-objs := xdp_tx_iptunnel_user.o
 test_map_in_map-objs := bpf_load.o test_map_in_map_user.o
 per_socket_stats_example-objs := cookie_uid_helper_example.o
-xdp_redirect-objs := bpf_load.o xdp_redirect_user.o
-xdp_redirect_map-objs := bpf_load.o xdp_redirect_map_user.o
+xdp_redirect-objs := xdp_redirect_user.o
+xdp_redirect_map-objs := xdp_redirect_map_user.o
 xdp_redirect_cpu-objs := bpf_load.o xdp_redirect_cpu_user.o
 xdp_monitor-objs := bpf_load.o xdp_monitor_user.o
 xdp_rxq_info-objs := xdp_rxq_info_user.o
diff --git a/samples/bpf/xdp_redirect_cpu_user.c b/samples/bpf/xdp_redirect_cpu_user.c
index f141e752ca0a..8645ddc2da0e 100644
--- a/samples/bpf/xdp_redirect_cpu_user.c
+++ b/samples/bpf/xdp_redirect_cpu_user.c
@@ -24,12 +24,8 @@ static const char *__doc__ =
 /* How many xdp_progs are defined in _kern.c */
 #define MAX_PROG 6
 
-/* Wanted to get rid of bpf_load.h and fake-"libbpf.h" (and instead
- * use bpf/libbpf.h), but cannot as (currently) needed for XDP
- * attaching to a device via bpf_set_link_xdp_fd()
- */
 #include <bpf/bpf.h>
-#include "bpf_load.h"
+#include "bpf/libbpf.h"
 
 #include "bpf_util.h"
 
@@ -38,6 +34,15 @@ static char ifname_buf[IF_NAMESIZE];
 static char *ifname;
 
 static __u32 xdp_flags;
+static int cpu_map_fd;
+static int rx_cnt_map_fd;
+static int redirect_err_cnt_map_fd;
+static int cpumap_enqueue_cnt_map_fd;
+static int cpumap_kthread_cnt_map_fd;
+static int cpus_available_map_fd;
+static int cpus_count_map_fd;
+static int cpus_iterator_map_fd;
+static int exception_cnt_map_fd;
 
 /* Exit return codes */
 #define EXIT_OK		0
@@ -52,7 +57,7 @@ static const struct option long_options[] = {
 	{"dev",		required_argument,	NULL, 'd' },
 	{"skb-mode",	no_argument,		NULL, 'S' },
 	{"sec",		required_argument,	NULL, 's' },
-	{"prognum",	required_argument,	NULL, 'p' },
+	{"progname",	required_argument,	NULL, 'p' },
 	{"qsize",	required_argument,	NULL, 'q' },
 	{"cpu",		required_argument,	NULL, 'c' },
 	{"stress-mode", no_argument,		NULL, 'x' },
@@ -70,7 +75,17 @@ static void int_exit(int sig)
 	exit(EXIT_OK);
 }
 
-static void usage(char *argv[])
+static void print_avail_progs(struct bpf_object *obj)
+{
+	struct bpf_program *pos;
+
+	bpf_object__for_each_program(pos, obj) {
+		if (bpf_program__is_xdp(pos))
+			printf(" %s\n", bpf_program__title(pos, false));
+	}
+}
+
+static void usage(char *argv[], struct bpf_object *obj)
 {
 	int i;
 
@@ -88,6 +103,8 @@ static void usage(char *argv[])
 				long_options[i].val);
 		printf("\n");
 	}
+	printf("\n Programs to be used for --progname:\n");
+	print_avail_progs(obj);
 	printf("\n");
 }
 
@@ -262,7 +279,7 @@ static __u64 calc_errs_pps(struct datarec *r,
 
 static void stats_print(struct stats_record *stats_rec,
 			struct stats_record *stats_prev,
-			int prog_num)
+			char *prog_name)
 {
 	unsigned int nr_cpus = bpf_num_possible_cpus();
 	double pps = 0, drop = 0, err = 0;
@@ -272,7 +289,7 @@ static void stats_print(struct stats_record *stats_rec,
 	int i;
 
 	/* Header */
-	printf("Running XDP/eBPF prog_num:%d\n", prog_num);
+	printf("Running XDP/eBPF prog_name:%s\n", prog_name);
 	printf("%-15s %-7s %-14s %-11s %-9s\n",
 	       "XDP-cpumap", "CPU:to", "pps", "drop-pps", "extra-info");
 
@@ -423,20 +440,20 @@ static void stats_collect(struct stats_record *rec)
 {
 	int fd, i;
 
-	fd = map_fd[1]; /* map: rx_cnt */
+	fd = rx_cnt_map_fd;
 	map_collect_percpu(fd, 0, &rec->rx_cnt);
 
-	fd = map_fd[2]; /* map: redirect_err_cnt */
+	fd = redirect_err_cnt_map_fd;
 	map_collect_percpu(fd, 1, &rec->redir_err);
 
-	fd = map_fd[3]; /* map: cpumap_enqueue_cnt */
+	fd = cpumap_enqueue_cnt_map_fd;
 	for (i = 0; i < MAX_CPUS; i++)
 		map_collect_percpu(fd, i, &rec->enq[i]);
 
-	fd = map_fd[4]; /* map: cpumap_kthread_cnt */
+	fd = cpumap_kthread_cnt_map_fd;
 	map_collect_percpu(fd, 0, &rec->kthread);
 
-	fd = map_fd[8]; /* map: exception_cnt */
+	fd = exception_cnt_map_fd;
 	map_collect_percpu(fd, 0, &rec->exception);
 }
 
@@ -461,7 +478,7 @@ static int create_cpu_entry(__u32 cpu, __u32 queue_size,
 	/* Add a CPU entry to cpumap, as this allocate a cpu entry in
 	 * the kernel for the cpu.
 	 */
-	ret = bpf_map_update_elem(map_fd[0], &cpu, &queue_size, 0);
+	ret = bpf_map_update_elem(cpu_map_fd, &cpu, &queue_size, 0);
 	if (ret) {
 		fprintf(stderr, "Create CPU entry failed (err:%d)\n", ret);
 		exit(EXIT_FAIL_BPF);
@@ -470,23 +487,22 @@ static int create_cpu_entry(__u32 cpu, __u32 queue_size,
 	/* Inform bpf_prog's that a new CPU is available to select
 	 * from via some control maps.
 	 */
-	/* map_fd[5] = cpus_available */
-	ret = bpf_map_update_elem(map_fd[5], &avail_idx, &cpu, 0);
+	ret = bpf_map_update_elem(cpus_available_map_fd, &avail_idx, &cpu, 0);
 	if (ret) {
 		fprintf(stderr, "Add to avail CPUs failed\n");
 		exit(EXIT_FAIL_BPF);
 	}
 
 	/* When not replacing/updating existing entry, bump the count */
-	/* map_fd[6] = cpus_count */
-	ret = bpf_map_lookup_elem(map_fd[6], &key, &curr_cpus_count);
+	ret = bpf_map_lookup_elem(cpus_count_map_fd, &key, &curr_cpus_count);
 	if (ret) {
 		fprintf(stderr, "Failed reading curr cpus_count\n");
 		exit(EXIT_FAIL_BPF);
 	}
 	if (new) {
 		curr_cpus_count++;
-		ret = bpf_map_update_elem(map_fd[6], &key, &curr_cpus_count, 0);
+		ret = bpf_map_update_elem(cpus_count_map_fd, &key,
+					  &curr_cpus_count, 0);
 		if (ret) {
 			fprintf(stderr, "Failed write curr cpus_count\n");
 			exit(EXIT_FAIL_BPF);
@@ -509,8 +525,8 @@ static void mark_cpus_unavailable(void)
 	int ret, i;
 
 	for (i = 0; i < MAX_CPUS; i++) {
-		/* map_fd[5] = cpus_available */
-		ret = bpf_map_update_elem(map_fd[5], &i, &invalid_cpu, 0);
+		ret = bpf_map_update_elem(cpus_available_map_fd, &i,
+					  &invalid_cpu, 0);
 		if (ret) {
 			fprintf(stderr, "Failed marking CPU unavailable\n");
 			exit(EXIT_FAIL_BPF);
@@ -530,7 +546,7 @@ static void stress_cpumap(void)
 	create_cpu_entry(1, 16000, 0, false);
 }
 
-static void stats_poll(int interval, bool use_separators, int prog_num,
+static void stats_poll(int interval, bool use_separators, char *prog_name,
 		       bool stress_mode)
 {
 	struct stats_record *record, *prev;
@@ -546,7 +562,7 @@ static void stats_poll(int interval, bool use_separators, int prog_num,
 	while (1) {
 		swap(&prev, &record);
 		stats_collect(record);
-		stats_print(record, prev, prog_num);
+		stats_print(record, prev, prog_name);
 		sleep(interval);
 		if (stress_mode)
 			stress_cpumap();
@@ -556,17 +572,51 @@ static void stats_poll(int interval, bool use_separators, int prog_num,
 	free_stats_record(prev);
 }
 
+static int init_map_fds(struct bpf_object *obj)
+{
+	cpu_map_fd = bpf_object__find_map_fd_by_name(obj, "cpu_map");
+	rx_cnt_map_fd = bpf_object__find_map_fd_by_name(obj, "rx_cnt");
+	redirect_err_cnt_map_fd =
+		bpf_object__find_map_fd_by_name(obj, "redirect_err_cnt");
+	cpumap_enqueue_cnt_map_fd =
+		bpf_object__find_map_fd_by_name(obj, "cpumap_enqueue_cnt");
+	cpumap_kthread_cnt_map_fd =
+		bpf_object__find_map_fd_by_name(obj, "cpumap_kthread_cnt");
+	cpus_available_map_fd =
+		bpf_object__find_map_fd_by_name(obj, "cpus_available");
+	cpus_count_map_fd = bpf_object__find_map_fd_by_name(obj, "cpus_count");
+	cpus_iterator_map_fd =
+		bpf_object__find_map_fd_by_name(obj, "cpus_iterator");
+	exception_cnt_map_fd =
+		bpf_object__find_map_fd_by_name(obj, "exception_cnt");
+
+	if (cpu_map_fd < 0 || rx_cnt_map_fd < 0 ||
+	    redirect_err_cnt_map_fd < 0 || cpumap_enqueue_cnt_map_fd < 0 ||
+	    cpumap_kthread_cnt_map_fd < 0 || cpus_available_map_fd < 0 ||
+	    cpus_count_map_fd < 0 || cpus_iterator_map_fd < 0 ||
+	    exception_cnt_map_fd < 0)
+		return -ENOENT;
+
+	return 0;
+}
+
 int main(int argc, char **argv)
 {
 	struct rlimit r = {10 * 1024 * 1024, RLIM_INFINITY};
+	char *prog_name = "xdp_cpu_map5_lb_hash_ip_pairs";
+	struct bpf_prog_load_attr prog_load_attr = {
+		.prog_type	= BPF_PROG_TYPE_UNSPEC,
+	};
 	bool use_separators = true;
 	bool stress_mode = false;
+	struct bpf_program *prog;
+	struct bpf_object *obj;
 	char filename[256];
 	int added_cpus = 0;
 	int longindex = 0;
 	int interval = 2;
-	int prog_num = 5;
 	int add_cpu = -1;
+	int prog_fd;
 	__u32 qsize;
 	int opt;
 
@@ -579,22 +629,25 @@ int main(int argc, char **argv)
 	qsize = 128+64;
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+	prog_load_attr.file = filename;
 
 	if (setrlimit(RLIMIT_MEMLOCK, &r)) {
 		perror("setrlimit(RLIMIT_MEMLOCK)");
 		return 1;
 	}
 
-	if (load_bpf_file(filename)) {
-		fprintf(stderr, "ERR in load_bpf_file(): %s", bpf_log_buf);
+	if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
 		return EXIT_FAIL;
-	}
 
-	if (!prog_fd[0]) {
-		fprintf(stderr, "ERR: load_bpf_file: %s\n", strerror(errno));
+	if (prog_fd < 0) {
+		fprintf(stderr, "ERR: bpf_prog_load_xattr: %s\n",
+			strerror(errno));
+		return EXIT_FAIL;
+	}
+	if (init_map_fds(obj) < 0) {
+		fprintf(stderr, "bpf_object__find_map_fd_by_name failed\n");
 		return EXIT_FAIL;
 	}
-
 	mark_cpus_unavailable();
 
 	/* Parse commands line args */
@@ -630,13 +683,7 @@ int main(int argc, char **argv)
 			break;
 		case 'p':
 			/* Selecting eBPF prog to load */
-			prog_num = atoi(optarg);
-			if (prog_num < 0 || prog_num >= MAX_PROG) {
-				fprintf(stderr,
-					"--prognum too large err(%d):%s\n",
-					errno, strerror(errno));
-				goto error;
-			}
+			prog_name = optarg;
 			break;
 		case 'c':
 			/* Add multiple CPUs */
@@ -656,21 +703,21 @@ int main(int argc, char **argv)
 		case 'h':
 		error:
 		default:
-			usage(argv);
+			usage(argv, obj);
 			return EXIT_FAIL_OPTION;
 		}
 	}
 	/* Required option */
 	if (ifindex == -1) {
 		fprintf(stderr, "ERR: required option --dev missing\n");
-		usage(argv);
+		usage(argv, obj);
 		return EXIT_FAIL_OPTION;
 	}
 	/* Required option */
 	if (add_cpu == -1) {
 		fprintf(stderr, "ERR: required option --cpu missing\n");
 		fprintf(stderr, " Specify multiple --cpu option to add more\n");
-		usage(argv);
+		usage(argv, obj);
 		return EXIT_FAIL_OPTION;
 	}
 
@@ -678,11 +725,23 @@ int main(int argc, char **argv)
 	signal(SIGINT, int_exit);
 	signal(SIGTERM, int_exit);
 
-	if (bpf_set_link_xdp_fd(ifindex, prog_fd[prog_num], xdp_flags) < 0) {
+	prog = bpf_object__find_program_by_title(obj, prog_name);
+	if (!prog) {
+		fprintf(stderr, "bpf_object__find_program_by_title failed\n");
+		return EXIT_FAIL;
+	}
+
+	prog_fd = bpf_program__fd(prog);
+	if (prog_fd < 0) {
+		fprintf(stderr, "bpf_program__fd failed\n");
+		return EXIT_FAIL;
+	}
+
+	if (bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags) < 0) {
 		fprintf(stderr, "link set xdp fd failed\n");
 		return EXIT_FAIL_XDP;
 	}
 
-	stats_poll(interval, use_separators, prog_num, stress_mode);
+	stats_poll(interval, use_separators, prog_name, stress_mode);
 	return EXIT_OK;
 }
diff --git a/samples/bpf/xdp_redirect_map_user.c b/samples/bpf/xdp_redirect_map_user.c
index 4445e76854b5..60d46eea225b 100644
--- a/samples/bpf/xdp_redirect_map_user.c
+++ b/samples/bpf/xdp_redirect_map_user.c
@@ -22,15 +22,16 @@
 #include <libgen.h>
 #include <sys/resource.h>
 
-#include "bpf_load.h"
 #include "bpf_util.h"
 #include <bpf/bpf.h>
+#include "bpf/libbpf.h"
 
 static int ifindex_in;
 static int ifindex_out;
 static bool ifindex_out_xdp_dummy_attached = true;
 
 static __u32 xdp_flags;
+static int rxcnt_map_fd;
 
 static void int_exit(int sig)
 {
@@ -53,7 +54,7 @@ static void poll_stats(int interval, int ifindex)
 		int i;
 
 		sleep(interval);
-		assert(bpf_map_lookup_elem(map_fd[1], &key, values) == 0);
+		assert(bpf_map_lookup_elem(rxcnt_map_fd, &key, values) == 0);
 		for (i = 0; i < nr_cpus; i++)
 			sum += (values[i] - prev[i]);
 		if (sum)
@@ -76,9 +77,16 @@ static void usage(const char *prog)
 int main(int argc, char **argv)
 {
 	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+	struct bpf_prog_load_attr prog_load_attr = {
+		.prog_type	= BPF_PROG_TYPE_XDP,
+	};
+	struct bpf_program *prog, *dummy_prog;
+	int prog_fd, dummy_prog_fd;
 	const char *optstr = "SN";
-	char filename[256];
+	struct bpf_object *obj;
 	int ret, opt, key = 0;
+	char filename[256];
+	int tx_port_map_fd;
 
 	while ((opt = getopt(argc, argv, optstr)) != -1) {
 		switch (opt) {
@@ -109,24 +117,40 @@ int main(int argc, char **argv)
 	printf("input: %d output: %d\n", ifindex_in, ifindex_out);
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+	prog_load_attr.file = filename;
+
+	if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
+		return 1;
 
-	if (load_bpf_file(filename)) {
-		printf("%s", bpf_log_buf);
+	prog = bpf_program__next(NULL, obj);
+	dummy_prog = bpf_program__next(prog, obj);
+	if (!prog || !dummy_prog) {
+		printf("finding a prog in obj file failed\n");
+		return 1;
+	}
+	/* bpf_prog_load_xattr gives us the pointer to first prog's fd,
+	 * so we're missing only the fd for dummy prog
+	 */
+	dummy_prog_fd = bpf_program__fd(dummy_prog);
+	if (prog_fd < 0 || dummy_prog_fd < 0) {
+		printf("bpf_prog_load_xattr: %s\n", strerror(errno));
 		return 1;
 	}
 
-	if (!prog_fd[0]) {
-		printf("load_bpf_file: %s\n", strerror(errno));
+	tx_port_map_fd = bpf_object__find_map_fd_by_name(obj, "tx_port");
+	rxcnt_map_fd = bpf_object__find_map_fd_by_name(obj, "rxcnt");
+	if (tx_port_map_fd < 0 || rxcnt_map_fd < 0) {
+		printf("bpf_object__find_map_fd_by_name failed\n");
 		return 1;
 	}
 
-	if (bpf_set_link_xdp_fd(ifindex_in, prog_fd[0], xdp_flags) < 0) {
+	if (bpf_set_link_xdp_fd(ifindex_in, prog_fd, xdp_flags) < 0) {
 		printf("ERROR: link set xdp fd failed on %d\n", ifindex_in);
 		return 1;
 	}
 
 	/* Loading dummy XDP prog on out-device */
-	if (bpf_set_link_xdp_fd(ifindex_out, prog_fd[1],
+	if (bpf_set_link_xdp_fd(ifindex_out, dummy_prog_fd,
 			    (xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST)) < 0) {
 		printf("WARN: link set xdp fd failed on %d\n", ifindex_out);
 		ifindex_out_xdp_dummy_attached = false;
@@ -135,11 +159,8 @@ int main(int argc, char **argv)
 	signal(SIGINT, int_exit);
 	signal(SIGTERM, int_exit);
 
-	printf("map[0] (vports) = %i, map[1] (map) = %i, map[2] (count) = %i\n",
-		map_fd[0], map_fd[1], map_fd[2]);
-
 	/* populate virtual to physical port map */
-	ret = bpf_map_update_elem(map_fd[0], &key, &ifindex_out, 0);
+	ret = bpf_map_update_elem(tx_port_map_fd, &key, &ifindex_out, 0);
 	if (ret) {
 		perror("bpf_update_elem");
 		goto out;
diff --git a/samples/bpf/xdp_redirect_user.c b/samples/bpf/xdp_redirect_user.c
index 81a69e36cb78..93404820df68 100644
--- a/samples/bpf/xdp_redirect_user.c
+++ b/samples/bpf/xdp_redirect_user.c
@@ -22,15 +22,16 @@
 #include <libgen.h>
 #include <sys/resource.h>
 
-#include "bpf_load.h"
 #include "bpf_util.h"
 #include <bpf/bpf.h>
+#include "bpf/libbpf.h"
 
 static int ifindex_in;
 static int ifindex_out;
 static bool ifindex_out_xdp_dummy_attached = true;
 
 static __u32 xdp_flags;
+static int rxcnt_map_fd;
 
 static void int_exit(int sig)
 {
@@ -53,7 +54,7 @@ static void poll_stats(int interval, int ifindex)
 		int i;
 
 		sleep(interval);
-		assert(bpf_map_lookup_elem(map_fd[1], &key, values) == 0);
+		assert(bpf_map_lookup_elem(rxcnt_map_fd, &key, values) == 0);
 		for (i = 0; i < nr_cpus; i++)
 			sum += (values[i] - prev[i]);
 		if (sum)
@@ -77,9 +78,16 @@ static void usage(const char *prog)
 int main(int argc, char **argv)
 {
 	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+	struct bpf_prog_load_attr prog_load_attr = {
+		.prog_type	= BPF_PROG_TYPE_XDP,
+	};
+	struct bpf_program *prog, *dummy_prog;
+	int prog_fd, tx_port_map_fd, opt;
 	const char *optstr = "SN";
+	struct bpf_object *obj;
 	char filename[256];
-	int ret, opt, key = 0;
+	int dummy_prog_fd;
+	int ret, key = 0;
 
 	while ((opt = getopt(argc, argv, optstr)) != -1) {
 		switch (opt) {
@@ -110,24 +118,40 @@ int main(int argc, char **argv)
 	printf("input: %d output: %d\n", ifindex_in, ifindex_out);
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+	prog_load_attr.file = filename;
 
-	if (load_bpf_file(filename)) {
-		printf("%s", bpf_log_buf);
+	if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
+		return 1;
+
+	prog = bpf_program__next(NULL, obj);
+	dummy_prog = bpf_program__next(prog, obj);
+	if (!prog || !dummy_prog) {
+		printf("finding a prog in obj file failed\n");
+		return 1;
+	}
+	/* bpf_prog_load_xattr gives us the pointer to first prog's fd,
+	 * so we're missing only the fd for dummy prog
+	 */
+	dummy_prog_fd = bpf_program__fd(dummy_prog);
+	if (prog_fd < 0 || dummy_prog_fd < 0) {
+		printf("bpf_prog_load_xattr: %s\n", strerror(errno));
 		return 1;
 	}
 
-	if (!prog_fd[0]) {
-		printf("load_bpf_file: %s\n", strerror(errno));
+	tx_port_map_fd = bpf_object__find_map_fd_by_name(obj, "tx_port");
+	rxcnt_map_fd = bpf_object__find_map_fd_by_name(obj, "rxcnt");
+	if (tx_port_map_fd < 0 || rxcnt_map_fd < 0) {
+		printf("bpf_object__find_map_fd_by_name failed\n");
 		return 1;
 	}
 
-	if (bpf_set_link_xdp_fd(ifindex_in, prog_fd[0], xdp_flags) < 0) {
+	if (bpf_set_link_xdp_fd(ifindex_in, prog_fd, xdp_flags) < 0) {
 		printf("ERROR: link set xdp fd failed on %d\n", ifindex_in);
 		return 1;
 	}
 
 	/* Loading dummy XDP prog on out-device */
-	if (bpf_set_link_xdp_fd(ifindex_out, prog_fd[1],
+	if (bpf_set_link_xdp_fd(ifindex_out, dummy_prog_fd,
 			    (xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST)) < 0) {
 		printf("WARN: link set xdp fd failed on %d\n", ifindex_out);
 		ifindex_out_xdp_dummy_attached = false;
@@ -137,7 +161,7 @@ int main(int argc, char **argv)
 	signal(SIGTERM, int_exit);
 
 	/* bpf redirect port */
-	ret = bpf_map_update_elem(map_fd[0], &key, &ifindex_out, 0);
+	ret = bpf_map_update_elem(tx_port_map_fd, &key, &ifindex_out, 0);
 	if (ret) {
 		perror("bpf_update_elem");
 		goto out;
diff --git a/samples/bpf/xdp_router_ipv4_user.c b/samples/bpf/xdp_router_ipv4_user.c
index b2b4dfa776c8..cea2306f5ab7 100644
--- a/samples/bpf/xdp_router_ipv4_user.c
+++ b/samples/bpf/xdp_router_ipv4_user.c
@@ -15,7 +15,6 @@
 #include <string.h>
 #include <sys/socket.h>
 #include <unistd.h>
-#include "bpf_load.h"
 #include <bpf/bpf.h>
 #include <arpa/inet.h>
 #include <fcntl.h>
@@ -25,11 +24,17 @@
 #include <sys/ioctl.h>
 #include <sys/syscall.h>
 #include "bpf_util.h"
+#include "bpf/libbpf.h"
 
 int sock, sock_arp, flags = 0;
 static int total_ifindex;
 int *ifindex_list;
 char buf[8192];
+static int lpm_map_fd;
+static int rxcnt_map_fd;
+static int arp_table_map_fd;
+static int exact_match_map_fd;
+static int tx_port_map_fd;
 
 static int get_route_table(int rtm_family);
 static void int_exit(int sig)
@@ -186,7 +191,8 @@ static void read_route(struct nlmsghdr *nh, int nll)
 				bpf_set_link_xdp_fd(ifindex_list[i], -1, flags);
 			exit(0);
 		}
-		assert(bpf_map_update_elem(map_fd[4], &route.iface, &route.iface, 0) == 0);
+		assert(bpf_map_update_elem(tx_port_map_fd,
+					   &route.iface, &route.iface, 0) == 0);
 		if (rtm_family == AF_INET) {
 			struct trie_value {
 				__u8 prefix[4];
@@ -207,11 +213,16 @@ static void read_route(struct nlmsghdr *nh, int nll)
 			direct_entry.arp.dst = 0;
 			if (route.dst_len == 32) {
 				if (nh->nlmsg_type == RTM_DELROUTE) {
-					assert(bpf_map_delete_elem(map_fd[3], &route.dst) == 0);
+					assert(bpf_map_delete_elem(exact_match_map_fd,
+								   &route.dst) == 0);
 				} else {
-					if (bpf_map_lookup_elem(map_fd[2], &route.dst, &direct_entry.arp.mac) == 0)
+					if (bpf_map_lookup_elem(arp_table_map_fd,
+								&route.dst,
+								&direct_entry.arp.mac) == 0)
 						direct_entry.arp.dst = route.dst;
-					assert(bpf_map_update_elem(map_fd[3], &route.dst, &direct_entry, 0) == 0);
+					assert(bpf_map_update_elem(exact_match_map_fd,
+								   &route.dst,
+								   &direct_entry, 0) == 0);
 				}
 			}
 			for (i = 0; i < 4; i++)
@@ -225,7 +236,7 @@ static void read_route(struct nlmsghdr *nh, int nll)
 			       route.gw, route.dst_len,
 			       route.metric,
 			       route.iface_name);
-			if (bpf_map_lookup_elem(map_fd[0], prefix_key,
+			if (bpf_map_lookup_elem(lpm_map_fd, prefix_key,
 						prefix_value) < 0) {
 				for (i = 0; i < 4; i++)
 					prefix_value->prefix[i] = prefix_key->data[i];
@@ -234,7 +245,7 @@ static void read_route(struct nlmsghdr *nh, int nll)
 				prefix_value->gw = route.gw;
 				prefix_value->metric = route.metric;
 
-				assert(bpf_map_update_elem(map_fd[0],
+				assert(bpf_map_update_elem(lpm_map_fd,
 							   prefix_key,
 							   prefix_value, 0
 							   ) == 0);
@@ -247,7 +258,7 @@ static void read_route(struct nlmsghdr *nh, int nll)
 					       prefix_key->data[2],
 					       prefix_key->data[3],
 					       prefix_key->prefixlen);
-					assert(bpf_map_delete_elem(map_fd[0],
+					assert(bpf_map_delete_elem(lpm_map_fd,
 								   prefix_key
 								   ) == 0);
 					/* Rereading the route table to check if
@@ -275,8 +286,7 @@ static void read_route(struct nlmsghdr *nh, int nll)
 					prefix_value->ifindex = route.iface;
 					prefix_value->gw = route.gw;
 					prefix_value->metric = route.metric;
-					assert(bpf_map_update_elem(
-								   map_fd[0],
+					assert(bpf_map_update_elem(lpm_map_fd,
 								   prefix_key,
 								   prefix_value,
 								   0) == 0);
@@ -401,7 +411,8 @@ static void read_arp(struct nlmsghdr *nh, int nll)
 		arp_entry.mac = atol(mac);
 		printf("%x\t\t%llx\n", arp_entry.dst, arp_entry.mac);
 		if (ndm_family == AF_INET) {
-			if (bpf_map_lookup_elem(map_fd[3], &arp_entry.dst,
+			if (bpf_map_lookup_elem(exact_match_map_fd,
+						&arp_entry.dst,
 						&direct_entry) == 0) {
 				if (nh->nlmsg_type == RTM_DELNEIGH) {
 					direct_entry.arp.dst = 0;
@@ -410,16 +421,17 @@ static void read_arp(struct nlmsghdr *nh, int nll)
 					direct_entry.arp.dst = arp_entry.dst;
 					direct_entry.arp.mac = arp_entry.mac;
 				}
-				assert(bpf_map_update_elem(map_fd[3],
+				assert(bpf_map_update_elem(exact_match_map_fd,
 							   &arp_entry.dst,
 							   &direct_entry, 0
 							   ) == 0);
 				memset(&direct_entry, 0, sizeof(direct_entry));
 			}
 			if (nh->nlmsg_type == RTM_DELNEIGH) {
-				assert(bpf_map_delete_elem(map_fd[2], &arp_entry.dst) == 0);
+				assert(bpf_map_delete_elem(arp_table_map_fd,
+							   &arp_entry.dst) == 0);
 			} else if (nh->nlmsg_type == RTM_NEWNEIGH) {
-				assert(bpf_map_update_elem(map_fd[2],
+				assert(bpf_map_update_elem(arp_table_map_fd,
 							   &arp_entry.dst,
 							   &arp_entry.mac, 0
 							   ) == 0);
@@ -553,7 +565,8 @@ static int monitor_route(void)
 		for (key = 0; key < nr_keys; key++) {
 			__u64 sum = 0;
 
-			assert(bpf_map_lookup_elem(map_fd[1], &key, values) == 0);
+			assert(bpf_map_lookup_elem(rxcnt_map_fd,
+						   &key, values) == 0);
 			for (i = 0; i < nr_cpus; i++)
 				sum += (values[i] - prev[key][i]);
 			if (sum)
@@ -596,11 +609,18 @@ static int monitor_route(void)
 
 int main(int ac, char **argv)
 {
+	struct bpf_prog_load_attr prog_load_attr = {
+		.prog_type	= BPF_PROG_TYPE_XDP,
+	};
+	struct bpf_object *obj;
 	char filename[256];
 	char **ifname_list;
+	int prog_fd;
 	int i = 1;
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+	prog_load_attr.file = filename;
+
 	if (ac < 2) {
 		printf("usage: %s [-S] Interface name list\n", argv[0]);
 		return 1;
@@ -614,15 +634,28 @@ int main(int ac, char **argv)
 		total_ifindex = ac - 1;
 		ifname_list = (argv + 1);
 	}
-	if (load_bpf_file(filename)) {
-		printf("%s", bpf_log_buf);
+
+	if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
 		return 1;
-	}
+
 	printf("\n**************loading bpf file*********************\n\n\n");
-	if (!prog_fd[0]) {
-		printf("load_bpf_file: %s\n", strerror(errno));
+	if (!prog_fd) {
+		printf("bpf_prog_load_xattr: %s\n", strerror(errno));
 		return 1;
 	}
+
+	lpm_map_fd = bpf_object__find_map_fd_by_name(obj, "lpm_map");
+	rxcnt_map_fd = bpf_object__find_map_fd_by_name(obj, "rxcnt");
+	arp_table_map_fd = bpf_object__find_map_fd_by_name(obj, "arp_table");
+	exact_match_map_fd = bpf_object__find_map_fd_by_name(obj,
+							     "exact_match");
+	tx_port_map_fd = bpf_object__find_map_fd_by_name(obj, "tx_port");
+	if (lpm_map_fd < 0 || rxcnt_map_fd < 0 || arp_table_map_fd < 0 ||
+	    exact_match_map_fd < 0 || tx_port_map_fd < 0) {
+		printf("bpf_object__find_map_fd_by_name failed\n");
+		return 1;
+	}
+
 	ifindex_list = (int *)malloc(total_ifindex * sizeof(int *));
 	for (i = 0; i < total_ifindex; i++) {
 		ifindex_list[i] = if_nametoindex(ifname_list[i]);
@@ -633,7 +666,7 @@ int main(int ac, char **argv)
 		}
 	}
 	for (i = 0; i < total_ifindex; i++) {
-		if (bpf_set_link_xdp_fd(ifindex_list[i], prog_fd[0], flags) < 0) {
+		if (bpf_set_link_xdp_fd(ifindex_list[i], prog_fd, flags) < 0) {
 			printf("link set xdp fd failed\n");
 			int recovery_index = i;
 
diff --git a/samples/bpf/xdp_tx_iptunnel_user.c b/samples/bpf/xdp_tx_iptunnel_user.c
index a4ccc33adac0..5093d8220da5 100644
--- a/samples/bpf/xdp_tx_iptunnel_user.c
+++ b/samples/bpf/xdp_tx_iptunnel_user.c
@@ -17,7 +17,7 @@
 #include <netinet/ether.h>
 #include <unistd.h>
 #include <time.h>
-#include "bpf_load.h"
+#include "bpf/libbpf.h"
 #include <bpf/bpf.h>
 #include "bpf_util.h"
 #include "xdp_tx_iptunnel_common.h"
@@ -26,6 +26,7 @@
 
 static int ifindex = -1;
 static __u32 xdp_flags = 0;
+static int rxcnt_map_fd;
 
 static void int_exit(int sig)
 {
@@ -53,7 +54,8 @@ static void poll_stats(unsigned int kill_after_s)
 		for (proto = 0; proto < nr_protos; proto++) {
 			__u64 sum = 0;
 
-			assert(bpf_map_lookup_elem(map_fd[0], &proto, values) == 0);
+			assert(bpf_map_lookup_elem(rxcnt_map_fd, &proto,
+						   values) == 0);
 			for (i = 0; i < nr_cpus; i++)
 				sum += (values[i] - prev[proto][i]);
 
@@ -138,15 +140,19 @@ static int parse_ports(const char *port_str, int *min_port, int *max_port)
 
 int main(int argc, char **argv)
 {
+	struct bpf_prog_load_attr prog_load_attr = {
+		.prog_type	= BPF_PROG_TYPE_XDP,
+	};
+	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+	int min_port = 0, max_port = 0, vip2tnl_map_fd;
+	const char *optstr = "i:a:p:s:d:m:T:P:SNh";
 	unsigned char opt_flags[256] = {};
 	unsigned int kill_after_s = 0;
-	const char *optstr = "i:a:p:s:d:m:T:P:SNh";
-	int min_port = 0, max_port = 0;
 	struct iptnl_info tnl = {};
-	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+	struct bpf_object *obj;
 	struct vip vip = {};
 	char filename[256];
-	int opt;
+	int opt, prog_fd;
 	int i;
 
 	tnl.family = AF_UNSPEC;
@@ -232,29 +238,36 @@ int main(int argc, char **argv)
 	}
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+	prog_load_attr.file = filename;
 
-	if (load_bpf_file(filename)) {
-		printf("%s", bpf_log_buf);
+	if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
 		return 1;
-	}
 
-	if (!prog_fd[0]) {
+	if (!prog_fd) {
 		printf("load_bpf_file: %s\n", strerror(errno));
 		return 1;
 	}
 
+	rxcnt_map_fd = bpf_object__find_map_fd_by_name(obj, "rxcnt");
+	vip2tnl_map_fd = bpf_object__find_map_fd_by_name(obj, "vip2tnl");
+	if (vip2tnl_map_fd < 0 || rxcnt_map_fd < 0) {
+		printf("bpf_object__find_map_fd_by_name failed\n");
+		return 1;
+	}
+
 	signal(SIGINT, int_exit);
 	signal(SIGTERM, int_exit);
 
 	while (min_port <= max_port) {
 		vip.dport = htons(min_port++);
-		if (bpf_map_update_elem(map_fd[1], &vip, &tnl, BPF_NOEXIST)) {
+		if (bpf_map_update_elem(vip2tnl_map_fd, &vip, &tnl,
+					BPF_NOEXIST)) {
 			perror("bpf_map_update_elem(&vip2tnl)");
 			return 1;
 		}
 	}
 
-	if (bpf_set_link_xdp_fd(ifindex, prog_fd[0], xdp_flags) < 0) {
+	if (bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags) < 0) {
 		printf("link set xdp fd failed\n");
 		return 1;
 	}
-- 
2.16.1



* [PATCH bpf-next v5 5/8] xdp: Provide extack messages when prog attachment failed
From: Maciej Fijalkowski @ 2019-02-01  0:19 UTC
  To: daniel, ast; +Cc: netdev, jakub.kicinski, brouer, john.fastabend
In-Reply-To: <20190201001954.4130-1-maciej.fijalkowski@intel.com>

To give the user a more meaningful message when attaching an XDP program
to a network interface fails, add extack messages to the error paths of
dev_change_xdp_fd().

Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 net/core/dev.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 8e276e0192a1..bfa4be42afff 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7983,8 +7983,10 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 	query = flags & XDP_FLAGS_HW_MODE ? XDP_QUERY_PROG_HW : XDP_QUERY_PROG;
 
 	bpf_op = bpf_chk = ops->ndo_bpf;
-	if (!bpf_op && (flags & (XDP_FLAGS_DRV_MODE | XDP_FLAGS_HW_MODE)))
+	if (!bpf_op && (flags & (XDP_FLAGS_DRV_MODE | XDP_FLAGS_HW_MODE))) {
+		NL_SET_ERR_MSG(extack, "underlying driver does not support XDP in native mode");
 		return -EOPNOTSUPP;
+	}
 	if (!bpf_op || (flags & XDP_FLAGS_SKB_MODE))
 		bpf_op = generic_xdp_install;
 	if (bpf_op == bpf_chk)
@@ -7992,11 +7994,15 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 
 	if (fd >= 0) {
 		if (__dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG) ||
-		    __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG_HW))
+		    __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG_HW)) {
+			NL_SET_ERR_MSG(extack, "native and generic XDP can't be active at the same time");
 			return -EEXIST;
+		}
 		if ((flags & XDP_FLAGS_UPDATE_IF_NOEXIST) &&
-		    __dev_xdp_query(dev, bpf_op, query))
+		    __dev_xdp_query(dev, bpf_op, query)) {
+			NL_SET_ERR_MSG(extack, "XDP program already attached");
 			return -EBUSY;
+		}
 
 		prog = bpf_prog_get_type_dev(fd, BPF_PROG_TYPE_XDP,
 					     bpf_op == ops->ndo_bpf);
-- 
2.16.1



* [PATCH bpf-next v5 6/8] samples/bpf: Add a "force" flag to XDP samples
From: Maciej Fijalkowski @ 2019-02-01  0:19 UTC
  To: daniel, ast; +Cc: netdev, jakub.kicinski, brouer, john.fastabend
In-Reply-To: <20190201001954.4130-1-maciej.fijalkowski@intel.com>

Make the XDP samples consistent with iproute2 behavior by setting
XDP_FLAGS_UPDATE_IF_NOEXIST by default when attaching the XDP program to
an interface, so that an already attached program is not silently
replaced. Provide a -F ("force") option that clears this flag, so it is
not passed to the bpf_set_link_xdp_fd() call and the program is loaded
regardless.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 samples/bpf/xdp1_user.c             | 10 +++++---
 samples/bpf/xdp_adjust_tail_user.c  |  8 ++++--
 samples/bpf/xdp_redirect_cpu_user.c |  8 ++++--
 samples/bpf/xdp_redirect_map_user.c | 10 +++++---
 samples/bpf/xdp_redirect_user.c     | 10 +++++---
 samples/bpf/xdp_router_ipv4_user.c  | 50 +++++++++++++++++++++++++++----------
 samples/bpf/xdp_rxq_info_user.c     |  8 ++++--
 samples/bpf/xdp_sample_pkts_user.c  | 40 +++++++++++++++++++++++------
 samples/bpf/xdp_tx_iptunnel_user.c  |  8 ++++--
 samples/bpf/xdpsock_user.c          |  7 ++++--
 10 files changed, 119 insertions(+), 40 deletions(-)

diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index 8bfda95c77ad..505bce207165 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -22,7 +22,7 @@
 #include "bpf/libbpf.h"
 
 static int ifindex;
-static __u32 xdp_flags;
+static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 
 static void int_exit(int sig)
 {
@@ -63,7 +63,8 @@ static void usage(const char *prog)
 		"usage: %s [OPTS] IFACE\n\n"
 		"OPTS:\n"
 		"    -S    use skb-mode\n"
-		"    -N    enforce native mode\n",
+		"    -N    enforce native mode\n"
+		"    -F    force loading prog\n",
 		prog);
 }
 
@@ -73,7 +74,7 @@ int main(int argc, char **argv)
 	struct bpf_prog_load_attr prog_load_attr = {
 		.prog_type	= BPF_PROG_TYPE_XDP,
 	};
-	const char *optstr = "SN";
+	const char *optstr = "FSN";
 	int prog_fd, map_fd, opt;
 	struct bpf_object *obj;
 	struct bpf_map *map;
@@ -87,6 +88,9 @@ int main(int argc, char **argv)
 		case 'N':
 			xdp_flags |= XDP_FLAGS_DRV_MODE;
 			break;
+		case 'F':
+			xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
+			break;
 		default:
 			usage(basename(argv[0]));
 			return 1;
diff --git a/samples/bpf/xdp_adjust_tail_user.c b/samples/bpf/xdp_adjust_tail_user.c
index 3042ce37dae8..049bddf7778b 100644
--- a/samples/bpf/xdp_adjust_tail_user.c
+++ b/samples/bpf/xdp_adjust_tail_user.c
@@ -24,7 +24,7 @@
 #define STATS_INTERVAL_S 2U
 
 static int ifindex = -1;
-static __u32 xdp_flags;
+static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 
 static void int_exit(int sig)
 {
@@ -60,6 +60,7 @@ static void usage(const char *cmd)
 	printf("    -T <stop-after-X-seconds> Default: 0 (forever)\n");
 	printf("    -S use skb-mode\n");
 	printf("    -N enforce native mode\n");
+	printf("    -F force loading prog\n");
 	printf("    -h Display this help\n");
 }
 
@@ -70,8 +71,8 @@ int main(int argc, char **argv)
 		.prog_type	= BPF_PROG_TYPE_XDP,
 	};
 	unsigned char opt_flags[256] = {};
+	const char *optstr = "i:T:SNFh";
 	unsigned int kill_after_s = 0;
-	const char *optstr = "i:T:SNh";
 	int i, prog_fd, map_fd, opt;
 	struct bpf_object *obj;
 	struct bpf_map *map;
@@ -96,6 +97,9 @@ int main(int argc, char **argv)
 		case 'N':
 			xdp_flags |= XDP_FLAGS_DRV_MODE;
 			break;
+		case 'F':
+			xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
+			break;
 		default:
 			usage(argv[0]);
 			return 1;
diff --git a/samples/bpf/xdp_redirect_cpu_user.c b/samples/bpf/xdp_redirect_cpu_user.c
index 8645ddc2da0e..0224afb55845 100644
--- a/samples/bpf/xdp_redirect_cpu_user.c
+++ b/samples/bpf/xdp_redirect_cpu_user.c
@@ -33,7 +33,7 @@ static int ifindex = -1;
 static char ifname_buf[IF_NAMESIZE];
 static char *ifname;
 
-static __u32 xdp_flags;
+static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 static int cpu_map_fd;
 static int rx_cnt_map_fd;
 static int redirect_err_cnt_map_fd;
@@ -62,6 +62,7 @@ static const struct option long_options[] = {
 	{"cpu",		required_argument,	NULL, 'c' },
 	{"stress-mode", no_argument,		NULL, 'x' },
 	{"no-separators", no_argument,		NULL, 'z' },
+	{"force",	no_argument,		NULL, 'F' },
 	{0, 0, NULL,  0 }
 };
 
@@ -651,7 +652,7 @@ int main(int argc, char **argv)
 	mark_cpus_unavailable();
 
 	/* Parse commands line args */
-	while ((opt = getopt_long(argc, argv, "hSd:",
+	while ((opt = getopt_long(argc, argv, "hSd:s:p:q:c:xzF",
 				  long_options, &longindex)) != -1) {
 		switch (opt) {
 		case 'd':
@@ -700,6 +701,9 @@ int main(int argc, char **argv)
 		case 'q':
 			qsize = atoi(optarg);
 			break;
+		case 'F':
+			xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
+			break;
 		case 'h':
 		error:
 		default:
diff --git a/samples/bpf/xdp_redirect_map_user.c b/samples/bpf/xdp_redirect_map_user.c
index 60d46eea225b..470e1a7e8810 100644
--- a/samples/bpf/xdp_redirect_map_user.c
+++ b/samples/bpf/xdp_redirect_map_user.c
@@ -30,7 +30,7 @@ static int ifindex_in;
 static int ifindex_out;
 static bool ifindex_out_xdp_dummy_attached = true;
 
-static __u32 xdp_flags;
+static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 static int rxcnt_map_fd;
 
 static void int_exit(int sig)
@@ -70,7 +70,8 @@ static void usage(const char *prog)
 		"usage: %s [OPTS] IFINDEX_IN IFINDEX_OUT\n\n"
 		"OPTS:\n"
 		"    -S    use skb-mode\n"
-		"    -N    enforce native mode\n",
+		"    -N    enforce native mode\n"
+		"    -F    force loading prog\n",
 		prog);
 }
 
@@ -82,7 +83,7 @@ int main(int argc, char **argv)
 	};
 	struct bpf_program *prog, *dummy_prog;
 	int prog_fd, dummy_prog_fd;
-	const char *optstr = "SN";
+	const char *optstr = "FSN";
 	struct bpf_object *obj;
 	int ret, opt, key = 0;
 	char filename[256];
@@ -96,6 +97,9 @@ int main(int argc, char **argv)
 		case 'N':
 			xdp_flags |= XDP_FLAGS_DRV_MODE;
 			break;
+		case 'F':
+			xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
+			break;
 		default:
 			usage(basename(argv[0]));
 			return 1;
diff --git a/samples/bpf/xdp_redirect_user.c b/samples/bpf/xdp_redirect_user.c
index 93404820df68..be6058cda97c 100644
--- a/samples/bpf/xdp_redirect_user.c
+++ b/samples/bpf/xdp_redirect_user.c
@@ -30,7 +30,7 @@ static int ifindex_in;
 static int ifindex_out;
 static bool ifindex_out_xdp_dummy_attached = true;
 
-static __u32 xdp_flags;
+static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 static int rxcnt_map_fd;
 
 static void int_exit(int sig)
@@ -70,7 +70,8 @@ static void usage(const char *prog)
 		"usage: %s [OPTS] IFINDEX_IN IFINDEX_OUT\n\n"
 		"OPTS:\n"
 		"    -S    use skb-mode\n"
-		"    -N    enforce native mode\n",
+		"    -N    enforce native mode\n"
+		"    -F    force loading prog\n",
 		prog);
 }
 
@@ -83,7 +84,7 @@ int main(int argc, char **argv)
 	};
 	struct bpf_program *prog, *dummy_prog;
 	int prog_fd, tx_port_map_fd, opt;
-	const char *optstr = "SN";
+	const char *optstr = "FSN";
 	struct bpf_object *obj;
 	char filename[256];
 	int dummy_prog_fd;
@@ -97,6 +98,9 @@ int main(int argc, char **argv)
 		case 'N':
 			xdp_flags |= XDP_FLAGS_DRV_MODE;
 			break;
+		case 'F':
+			xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
+			break;
 		default:
 			usage(basename(argv[0]));
 			return 1;
diff --git a/samples/bpf/xdp_router_ipv4_user.c b/samples/bpf/xdp_router_ipv4_user.c
index c63c6beec7d6..208d6a996478 100644
--- a/samples/bpf/xdp_router_ipv4_user.c
+++ b/samples/bpf/xdp_router_ipv4_user.c
@@ -26,8 +26,9 @@
 #include "bpf_util.h"
 #include "bpf/libbpf.h"
 #include <sys/resource.h>
+#include <libgen.h>
 
-int sock, sock_arp, flags = 0;
+int sock, sock_arp, flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 static int total_ifindex;
 int *ifindex_list;
 char buf[8192];
@@ -608,33 +609,56 @@ static int monitor_route(void)
 	return ret;
 }
 
+static void usage(const char *prog)
+{
+	fprintf(stderr,
+		"%s: %s [OPTS] interface name list\n\n"
+		"OPTS:\n"
+		"    -S    use skb-mode\n"
+		"    -F    force loading prog\n",
+		__func__, prog);
+}
+
 int main(int ac, char **argv)
 {
 	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
 	struct bpf_prog_load_attr prog_load_attr = {
 		.prog_type	= BPF_PROG_TYPE_XDP,
 	};
+	const char *optstr = "SF";
 	struct bpf_object *obj;
 	char filename[256];
 	char **ifname_list;
-	int prog_fd;
+	int prog_fd, opt;
 	int i = 1;
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
 	prog_load_attr.file = filename;
 
-	if (ac < 2) {
-		printf("usage: %s [-S] Interface name list\n", argv[0]);
-		return 1;
+	total_ifindex = ac - 1;
+	ifname_list = (argv + 1);
+
+	while ((opt = getopt(ac, argv, optstr)) != -1) {
+		switch (opt) {
+		case 'S':
+			flags |= XDP_FLAGS_SKB_MODE;
+			total_ifindex--;
+			ifname_list++;
+			break;
+		case 'F':
+			flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
+			total_ifindex--;
+			ifname_list++;
+			break;
+		default:
+			usage(basename(argv[0]));
+			return 1;
+		}
 	}
-	if (!strcmp(argv[1], "-S")) {
-		flags = XDP_FLAGS_SKB_MODE;
-		total_ifindex = ac - 2;
-		ifname_list = (argv + 2);
-	} else {
-		flags = 0;
-		total_ifindex = ac - 1;
-		ifname_list = (argv + 1);
+
+	if (optind == ac) {
+		usage(basename(argv[0]));
+		return 1;
 	}
 
 	if (setrlimit(RLIMIT_MEMLOCK, &r)) {
diff --git a/samples/bpf/xdp_rxq_info_user.c b/samples/bpf/xdp_rxq_info_user.c
index ef26f882f92f..e7a98c2a440f 100644
--- a/samples/bpf/xdp_rxq_info_user.c
+++ b/samples/bpf/xdp_rxq_info_user.c
@@ -30,7 +30,7 @@ static int ifindex = -1;
 static char ifname_buf[IF_NAMESIZE];
 static char *ifname;
 
-static __u32 xdp_flags;
+static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 
 static struct bpf_map *stats_global_map;
 static struct bpf_map *rx_queue_index_map;
@@ -52,6 +52,7 @@ static const struct option long_options[] = {
 	{"action",	required_argument,	NULL, 'a' },
 	{"readmem", 	no_argument,		NULL, 'r' },
 	{"swapmac", 	no_argument,		NULL, 'm' },
+	{"force",	no_argument,		NULL, 'F' },
 	{0, 0, NULL,  0 }
 };
 
@@ -487,7 +488,7 @@ int main(int argc, char **argv)
 	}
 
 	/* Parse commands line args */
-	while ((opt = getopt_long(argc, argv, "hSd:",
+	while ((opt = getopt_long(argc, argv, "FhSrmzd:s:a:",
 				  long_options, &longindex)) != -1) {
 		switch (opt) {
 		case 'd':
@@ -524,6 +525,9 @@ int main(int argc, char **argv)
 		case 'm':
 			cfg_options |= SWAP_MAC;
 			break;
+		case 'F':
+			xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
+			break;
 		case 'h':
 		error:
 		default:
diff --git a/samples/bpf/xdp_sample_pkts_user.c b/samples/bpf/xdp_sample_pkts_user.c
index 5f5828ee0761..62f34827c775 100644
--- a/samples/bpf/xdp_sample_pkts_user.c
+++ b/samples/bpf/xdp_sample_pkts_user.c
@@ -13,6 +13,8 @@
 #include <libbpf.h>
 #include <bpf/bpf.h>
 #include <sys/resource.h>
+#include <libgen.h>
+#include <linux/if_link.h>
 
 #include "perf-sys.h"
 #include "trace_helpers.h"
@@ -21,12 +23,13 @@
 static int pmu_fds[MAX_CPUS], if_idx;
 static struct perf_event_mmap_page *headers[MAX_CPUS];
 static char *if_name;
+static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 
 static int do_attach(int idx, int fd, const char *name)
 {
 	int err;
 
-	err = bpf_set_link_xdp_fd(idx, fd, 0);
+	err = bpf_set_link_xdp_fd(idx, fd, xdp_flags);
 	if (err < 0)
 		printf("ERROR: failed to attach program to %s\n", name);
 
@@ -98,21 +101,42 @@ static void sig_handler(int signo)
 	exit(0);
 }
 
+static void usage(const char *prog)
+{
+	fprintf(stderr,
+		"%s: %s [OPTS] <ifname|ifindex>\n\n"
+		"OPTS:\n"
+		"    -F    force loading prog\n",
+		__func__, prog);
+}
+
 int main(int argc, char **argv)
 {
 	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
 	struct bpf_prog_load_attr prog_load_attr = {
 		.prog_type	= BPF_PROG_TYPE_XDP,
 	};
+	const char *optstr = "F";
+	int prog_fd, map_fd, opt;
 	struct bpf_object *obj;
 	struct bpf_map *map;
-	int prog_fd, map_fd;
 	char filename[256];
 	int ret, err, i;
 	int numcpus;
 
-	if (argc < 2) {
-		printf("Usage: %s <ifname>\n", argv[0]);
+	while ((opt = getopt(argc, argv, optstr)) != -1) {
+		switch (opt) {
+		case 'F':
+			xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
+			break;
+		default:
+			usage(basename(argv[0]));
+			return 1;
+		}
+	}
+
+	if (optind == argc) {
+		usage(basename(argv[0]));
 		return 1;
 	}
 
@@ -143,16 +167,16 @@ int main(int argc, char **argv)
 	}
 	map_fd = bpf_map__fd(map);
 
-	if_idx = if_nametoindex(argv[1]);
+	if_idx = if_nametoindex(argv[optind]);
 	if (!if_idx)
-		if_idx = strtoul(argv[1], NULL, 0);
+		if_idx = strtoul(argv[optind], NULL, 0);
 
 	if (!if_idx) {
 		fprintf(stderr, "Invalid ifname\n");
 		return 1;
 	}
-	if_name = argv[1];
-	err = do_attach(if_idx, prog_fd, argv[1]);
+	if_name = argv[optind];
+	err = do_attach(if_idx, prog_fd, if_name);
 	if (err)
 		return err;
 
diff --git a/samples/bpf/xdp_tx_iptunnel_user.c b/samples/bpf/xdp_tx_iptunnel_user.c
index 5093d8220da5..e3de60930d27 100644
--- a/samples/bpf/xdp_tx_iptunnel_user.c
+++ b/samples/bpf/xdp_tx_iptunnel_user.c
@@ -25,7 +25,7 @@
 #define STATS_INTERVAL_S 2U
 
 static int ifindex = -1;
-static __u32 xdp_flags = 0;
+static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 static int rxcnt_map_fd;
 
 static void int_exit(int sig)
@@ -83,6 +83,7 @@ static void usage(const char *cmd)
 	printf("    -P <IP-Protocol> Default is TCP\n");
 	printf("    -S use skb-mode\n");
 	printf("    -N enforce native mode\n");
+	printf("    -F Force loading the XDP prog\n");
 	printf("    -h Display this help\n");
 }
 
@@ -145,7 +146,7 @@ int main(int argc, char **argv)
 	};
 	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
 	int min_port = 0, max_port = 0, vip2tnl_map_fd;
-	const char *optstr = "i:a:p:s:d:m:T:P:SNh";
+	const char *optstr = "i:a:p:s:d:m:T:P:FSNh";
 	unsigned char opt_flags[256] = {};
 	unsigned int kill_after_s = 0;
 	struct iptnl_info tnl = {};
@@ -217,6 +218,9 @@ int main(int argc, char **argv)
 		case 'N':
 			xdp_flags |= XDP_FLAGS_DRV_MODE;
 			break;
+		case 'F':
+			xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
+			break;
 		default:
 			usage(argv[0]);
 			return 1;
diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 57ecadc58403..188723784768 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -68,7 +68,7 @@ enum benchmark_type {
 };
 
 static enum benchmark_type opt_bench = BENCH_RXDROP;
-static u32 opt_xdp_flags;
+static u32 opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 static const char *opt_if = "";
 static int opt_ifindex;
 static int opt_queue;
@@ -682,7 +682,7 @@ static void parse_command_line(int argc, char **argv)
 	opterr = 0;
 
 	for (;;) {
-		c = getopt_long(argc, argv, "rtli:q:psSNn:cz", long_options,
+		c = getopt_long(argc, argv, "Frtli:q:psSNn:cz", long_options,
 				&option_index);
 		if (c == -1)
 			break;
@@ -725,6 +725,9 @@ static void parse_command_line(int argc, char **argv)
 		case 'c':
 			opt_xdp_bind_flags |= XDP_COPY;
 			break;
+		case 'F':
+			opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
+			break;
 		default:
 			usage(basename(argv[0]));
 		}
-- 
2.16.1



* [PATCH bpf-next v5 8/8] samples/bpf: Check the prog id before exiting
From: Maciej Fijalkowski @ 2019-02-01  0:19 UTC
  To: daniel, ast; +Cc: netdev, jakub.kicinski, brouer, john.fastabend
In-Reply-To: <20190201001954.4130-1-maciej.fijalkowski@intel.com>

In the signal handler of the polling XDP samples that were previously
converted to libbpf, check the ID of the program currently attached to
the interface before detaching it. This avoids unloading a program that
was not attached by the sample that is exiting. Also handle the case
where bpf_get_link_xdp_id() does not return an error but no XDP program
is found on the interface.

Reported-by: Michal Papaj <michal.papaj@intel.com>
Reported-by: Jakub Spizewski <jakub.spizewski@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 samples/bpf/xdp1_user.c             | 24 ++++++++++++++++-
 samples/bpf/xdp_adjust_tail_user.c  | 30 +++++++++++++++++++---
 samples/bpf/xdp_redirect_cpu_user.c | 35 ++++++++++++++++++++-----
 samples/bpf/xdp_redirect_map_user.c | 49 ++++++++++++++++++++++++++++++++---
 samples/bpf/xdp_redirect_user.c     | 49 ++++++++++++++++++++++++++++++++---
 samples/bpf/xdp_router_ipv4_user.c  | 51 ++++++++++++++++++++++++-------------
 samples/bpf/xdp_rxq_info_user.c     | 33 ++++++++++++++++++++----
 samples/bpf/xdp_sample_pkts_user.c  | 34 +++++++++++++++++++++----
 samples/bpf/xdp_tx_iptunnel_user.c  | 28 +++++++++++++++++---
 samples/bpf/xdpsock_user.c          | 23 ++++++++++++++++-
 10 files changed, 308 insertions(+), 48 deletions(-)

diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index 505bce207165..6a64e93365e1 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -23,10 +23,22 @@
 
 static int ifindex;
 static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+static __u32 prog_id;
 
 static void int_exit(int sig)
 {
-	bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
+	__u32 curr_prog_id = 0;
+
+	if (bpf_get_link_xdp_id(ifindex, &curr_prog_id, xdp_flags)) {
+		printf("bpf_get_link_xdp_id failed\n");
+		exit(1);
+	}
+	if (prog_id == curr_prog_id)
+		bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
+	else if (!curr_prog_id)
+		printf("couldn't find a prog id on a given interface\n");
+	else
+		printf("program on interface changed, not removing\n");
 	exit(0);
 }
 
@@ -74,11 +86,14 @@ int main(int argc, char **argv)
 	struct bpf_prog_load_attr prog_load_attr = {
 		.prog_type	= BPF_PROG_TYPE_XDP,
 	};
+	struct bpf_prog_info info = {};
+	__u32 info_len = sizeof(info);
 	const char *optstr = "FSN";
 	int prog_fd, map_fd, opt;
 	struct bpf_object *obj;
 	struct bpf_map *map;
 	char filename[256];
+	int err;
 
 	while ((opt = getopt(argc, argv, optstr)) != -1) {
 		switch (opt) {
@@ -139,6 +154,13 @@ int main(int argc, char **argv)
 		return 1;
 	}
 
+	err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len);
+	if (err) {
+		printf("can't get prog info - %s\n", strerror(errno));
+		return err;
+	}
+	prog_id = info.id;
+
 	poll_stats(map_fd, 2);
 
 	return 0;
diff --git a/samples/bpf/xdp_adjust_tail_user.c b/samples/bpf/xdp_adjust_tail_user.c
index 049bddf7778b..07e1b9269e49 100644
--- a/samples/bpf/xdp_adjust_tail_user.c
+++ b/samples/bpf/xdp_adjust_tail_user.c
@@ -25,11 +25,24 @@
 
 static int ifindex = -1;
 static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+static __u32 prog_id;
 
 static void int_exit(int sig)
 {
-	if (ifindex > -1)
-		bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
+	__u32 curr_prog_id = 0;
+
+	if (ifindex > -1) {
+		if (bpf_get_link_xdp_id(ifindex, &curr_prog_id, xdp_flags)) {
+			printf("bpf_get_link_xdp_id failed\n");
+			exit(1);
+		}
+		if (prog_id == curr_prog_id)
+			bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
+		else if (!curr_prog_id)
+			printf("couldn't find a prog id on a given iface\n");
+		else
+			printf("program on interface changed, not removing\n");
+	}
 	exit(0);
 }
 
@@ -72,11 +85,14 @@ int main(int argc, char **argv)
 	};
 	unsigned char opt_flags[256] = {};
 	const char *optstr = "i:T:SNFh";
+	struct bpf_prog_info info = {};
+	__u32 info_len = sizeof(info);
 	unsigned int kill_after_s = 0;
 	int i, prog_fd, map_fd, opt;
 	struct bpf_object *obj;
 	struct bpf_map *map;
 	char filename[256];
+	int err;
 
 	for (i = 0; i < strlen(optstr); i++)
 		if (optstr[i] != 'h' && 'a' <= optstr[i] && optstr[i] <= 'z')
@@ -146,9 +162,15 @@ int main(int argc, char **argv)
 		return 1;
 	}
 
-	poll_stats(map_fd, kill_after_s);
+	err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len);
+	if (err) {
+		printf("can't get prog info - %s\n", strerror(errno));
+		return 1;
+	}
+	prog_id = info.id;
 
-	bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
+	poll_stats(map_fd, kill_after_s);
+	int_exit(0);
 
 	return 0;
 }
diff --git a/samples/bpf/xdp_redirect_cpu_user.c b/samples/bpf/xdp_redirect_cpu_user.c
index 0224afb55845..586b294d72d3 100644
--- a/samples/bpf/xdp_redirect_cpu_user.c
+++ b/samples/bpf/xdp_redirect_cpu_user.c
@@ -32,6 +32,7 @@ static const char *__doc__ =
 static int ifindex = -1;
 static char ifname_buf[IF_NAMESIZE];
 static char *ifname;
+static __u32 prog_id;
 
 static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 static int cpu_map_fd;
@@ -68,11 +69,24 @@ static const struct option long_options[] = {
 
 static void int_exit(int sig)
 {
-	fprintf(stderr,
-		"Interrupted: Removing XDP program on ifindex:%d device:%s\n",
-		ifindex, ifname);
-	if (ifindex > -1)
-		bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
+	__u32 curr_prog_id = 0;
+
+	if (ifindex > -1) {
+		if (bpf_get_link_xdp_id(ifindex, &curr_prog_id, xdp_flags)) {
+			printf("bpf_get_link_xdp_id failed\n");
+			exit(EXIT_FAIL);
+		}
+		if (prog_id == curr_prog_id) {
+			fprintf(stderr,
+				"Interrupted: Removing XDP program on ifindex:%d device:%s\n",
+				ifindex, ifname);
+			bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
+		} else if (!curr_prog_id) {
+			printf("couldn't find a prog id on a given iface\n");
+		} else {
+			printf("program on interface changed, not removing\n");
+		}
+	}
 	exit(EXIT_OK);
 }
 
@@ -608,6 +622,8 @@ int main(int argc, char **argv)
 	struct bpf_prog_load_attr prog_load_attr = {
 		.prog_type	= BPF_PROG_TYPE_UNSPEC,
 	};
+	struct bpf_prog_info info = {};
+	__u32 info_len = sizeof(info);
 	bool use_separators = true;
 	bool stress_mode = false;
 	struct bpf_program *prog;
@@ -617,9 +633,9 @@ int main(int argc, char **argv)
 	int longindex = 0;
 	int interval = 2;
 	int add_cpu = -1;
+	int opt, err;
 	int prog_fd;
 	__u32 qsize;
-	int opt;
 
 	/* Notice: choosing he queue size is very important with the
 	 * ixgbe driver, because it's driver page recycling trick is
@@ -746,6 +762,13 @@ int main(int argc, char **argv)
 		return EXIT_FAIL_XDP;
 	}
 
+	err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len);
+	if (err) {
+		printf("can't get prog info - %s\n", strerror(errno));
+		return err;
+	}
+	prog_id = info.id;
+
 	stats_poll(interval, use_separators, prog_name, stress_mode);
 	return EXIT_OK;
 }
diff --git a/samples/bpf/xdp_redirect_map_user.c b/samples/bpf/xdp_redirect_map_user.c
index 470e1a7e8810..327226be5a06 100644
--- a/samples/bpf/xdp_redirect_map_user.c
+++ b/samples/bpf/xdp_redirect_map_user.c
@@ -29,15 +29,41 @@
 static int ifindex_in;
 static int ifindex_out;
 static bool ifindex_out_xdp_dummy_attached = true;
+static __u32 prog_id;
+static __u32 dummy_prog_id;
 
 static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 static int rxcnt_map_fd;
 
 static void int_exit(int sig)
 {
-	bpf_set_link_xdp_fd(ifindex_in, -1, xdp_flags);
-	if (ifindex_out_xdp_dummy_attached)
-		bpf_set_link_xdp_fd(ifindex_out, -1, xdp_flags);
+	__u32 curr_prog_id = 0;
+
+	if (bpf_get_link_xdp_id(ifindex_in, &curr_prog_id, xdp_flags)) {
+		printf("bpf_get_link_xdp_id failed\n");
+		exit(1);
+	}
+	if (prog_id == curr_prog_id)
+		bpf_set_link_xdp_fd(ifindex_in, -1, xdp_flags);
+	else if (!curr_prog_id)
+		printf("couldn't find a prog id on iface IN\n");
+	else
+		printf("program on iface IN changed, not removing\n");
+
+	if (ifindex_out_xdp_dummy_attached) {
+		curr_prog_id = 0;
+		if (bpf_get_link_xdp_id(ifindex_out, &curr_prog_id,
+					xdp_flags)) {
+			printf("bpf_get_link_xdp_id failed\n");
+			exit(1);
+		}
+		if (dummy_prog_id == curr_prog_id)
+			bpf_set_link_xdp_fd(ifindex_out, -1, xdp_flags);
+		else if (!curr_prog_id)
+			printf("couldn't find a prog id on iface OUT\n");
+		else
+			printf("program on iface OUT changed, not removing\n");
+	}
 	exit(0);
 }
 
@@ -82,6 +108,8 @@ int main(int argc, char **argv)
 		.prog_type	= BPF_PROG_TYPE_XDP,
 	};
 	struct bpf_program *prog, *dummy_prog;
+	struct bpf_prog_info info = {};
+	__u32 info_len = sizeof(info);
 	int prog_fd, dummy_prog_fd;
 	const char *optstr = "FSN";
 	struct bpf_object *obj;
@@ -153,6 +181,13 @@ int main(int argc, char **argv)
 		return 1;
 	}
 
+	ret = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len);
+	if (ret) {
+		printf("can't get prog info - %s\n", strerror(errno));
+		return ret;
+	}
+	prog_id = info.id;
+
 	/* Loading dummy XDP prog on out-device */
 	if (bpf_set_link_xdp_fd(ifindex_out, dummy_prog_fd,
 			    (xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST)) < 0) {
@@ -160,6 +195,14 @@ int main(int argc, char **argv)
 		ifindex_out_xdp_dummy_attached = false;
 	}
 
+	memset(&info, 0, sizeof(info));
+	ret = bpf_obj_get_info_by_fd(dummy_prog_fd, &info, &info_len);
+	if (ret) {
+		printf("can't get prog info - %s\n", strerror(errno));
+		return ret;
+	}
+	dummy_prog_id = info.id;
+
 	signal(SIGINT, int_exit);
 	signal(SIGTERM, int_exit);
 
diff --git a/samples/bpf/xdp_redirect_user.c b/samples/bpf/xdp_redirect_user.c
index be6058cda97c..a5d8ad3129ed 100644
--- a/samples/bpf/xdp_redirect_user.c
+++ b/samples/bpf/xdp_redirect_user.c
@@ -29,15 +29,41 @@
 static int ifindex_in;
 static int ifindex_out;
 static bool ifindex_out_xdp_dummy_attached = true;
+static __u32 prog_id;
+static __u32 dummy_prog_id;
 
 static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 static int rxcnt_map_fd;
 
 static void int_exit(int sig)
 {
-	bpf_set_link_xdp_fd(ifindex_in, -1, xdp_flags);
-	if (ifindex_out_xdp_dummy_attached)
-		bpf_set_link_xdp_fd(ifindex_out, -1, xdp_flags);
+	__u32 curr_prog_id = 0;
+
+	if (bpf_get_link_xdp_id(ifindex_in, &curr_prog_id, xdp_flags)) {
+		printf("bpf_get_link_xdp_id failed\n");
+		exit(1);
+	}
+	if (prog_id == curr_prog_id)
+		bpf_set_link_xdp_fd(ifindex_in, -1, xdp_flags);
+	else if (!curr_prog_id)
+		printf("couldn't find a prog id on iface IN\n");
+	else
+		printf("program on iface IN changed, not removing\n");
+
+	if (ifindex_out_xdp_dummy_attached) {
+		curr_prog_id = 0;
+		if (bpf_get_link_xdp_id(ifindex_out, &curr_prog_id,
+					xdp_flags)) {
+			printf("bpf_get_link_xdp_id failed\n");
+			exit(1);
+		}
+		if (dummy_prog_id == curr_prog_id)
+			bpf_set_link_xdp_fd(ifindex_out, -1, xdp_flags);
+		else if (!curr_prog_id)
+			printf("couldn't find a prog id on iface OUT\n");
+		else
+			printf("program on iface OUT changed, not removing\n");
+	}
 	exit(0);
 }
 
@@ -84,6 +110,8 @@ int main(int argc, char **argv)
 	};
 	struct bpf_program *prog, *dummy_prog;
 	int prog_fd, tx_port_map_fd, opt;
+	struct bpf_prog_info info = {};
+	__u32 info_len = sizeof(info);
 	const char *optstr = "FSN";
 	struct bpf_object *obj;
 	char filename[256];
@@ -154,6 +182,13 @@ int main(int argc, char **argv)
 		return 1;
 	}
 
+	ret = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len);
+	if (ret) {
+		printf("can't get prog info - %s\n", strerror(errno));
+		return ret;
+	}
+	prog_id = info.id;
+
 	/* Loading dummy XDP prog on out-device */
 	if (bpf_set_link_xdp_fd(ifindex_out, dummy_prog_fd,
 			    (xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST)) < 0) {
@@ -161,6 +196,14 @@ int main(int argc, char **argv)
 		ifindex_out_xdp_dummy_attached = false;
 	}
 
+	memset(&info, 0, sizeof(info));
+	ret = bpf_obj_get_info_by_fd(dummy_prog_fd, &info, &info_len);
+	if (ret) {
+		printf("can't get prog info - %s\n", strerror(errno));
+		return ret;
+	}
+	dummy_prog_id = info.id;
+
 	signal(SIGINT, int_exit);
 	signal(SIGTERM, int_exit);
 
diff --git a/samples/bpf/xdp_router_ipv4_user.c b/samples/bpf/xdp_router_ipv4_user.c
index 208d6a996478..79fe7bc26ab4 100644
--- a/samples/bpf/xdp_router_ipv4_user.c
+++ b/samples/bpf/xdp_router_ipv4_user.c
@@ -30,7 +30,8 @@
 
 int sock, sock_arp, flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 static int total_ifindex;
-int *ifindex_list;
+static int *ifindex_list;
+static __u32 *prog_id_list;
 char buf[8192];
 static int lpm_map_fd;
 static int rxcnt_map_fd;
@@ -41,23 +42,34 @@ static int tx_port_map_fd;
 static int get_route_table(int rtm_family);
 static void int_exit(int sig)
 {
+	__u32 prog_id = 0;
 	int i = 0;
 
-	for (i = 0; i < total_ifindex; i++)
-		bpf_set_link_xdp_fd(ifindex_list[i], -1, flags);
+	for (i = 0; i < total_ifindex; i++) {
+		if (bpf_get_link_xdp_id(ifindex_list[i], &prog_id, flags)) {
+			printf("bpf_get_link_xdp_id on iface %d failed\n",
+			       ifindex_list[i]);
+			exit(1);
+		}
+		if (prog_id_list[i] == prog_id)
+			bpf_set_link_xdp_fd(ifindex_list[i], -1, flags);
+		else if (!prog_id)
+			printf("couldn't find a prog id on iface %d\n",
+			       ifindex_list[i]);
+		else
+			printf("program on iface %d changed, not removing\n",
+			       ifindex_list[i]);
+		prog_id = 0;
+	}
 	exit(0);
 }
 
 static void close_and_exit(int sig)
 {
-	int i = 0;
-
 	close(sock);
 	close(sock_arp);
 
-	for (i = 0; i < total_ifindex; i++)
-		bpf_set_link_xdp_fd(ifindex_list[i], -1, flags);
-	exit(0);
+	int_exit(0);
 }
 
 /* Get the mac address of the interface given interface name */
@@ -186,13 +198,8 @@ static void read_route(struct nlmsghdr *nh, int nll)
 		route.iface_name = alloca(sizeof(char *) * IFNAMSIZ);
 		route.iface_name = if_indextoname(route.iface, route.iface_name);
 		route.mac = getmac(route.iface_name);
-		if (route.mac == -1) {
-			int i = 0;
-
-			for (i = 0; i < total_ifindex; i++)
-				bpf_set_link_xdp_fd(ifindex_list[i], -1, flags);
-			exit(0);
-		}
+		if (route.mac == -1)
+			int_exit(0);
 		assert(bpf_map_update_elem(tx_port_map_fd,
 					   &route.iface, &route.iface, 0) == 0);
 		if (rtm_family == AF_INET) {
@@ -625,12 +632,14 @@ int main(int ac, char **argv)
 	struct bpf_prog_load_attr prog_load_attr = {
 		.prog_type	= BPF_PROG_TYPE_XDP,
 	};
+	struct bpf_prog_info info = {};
+	__u32 info_len = sizeof(info);
 	const char *optstr = "SF";
 	struct bpf_object *obj;
 	char filename[256];
 	char **ifname_list;
 	int prog_fd, opt;
-	int i = 1;
+	int err, i = 1;
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
 	prog_load_attr.file = filename;
@@ -687,7 +696,7 @@ int main(int ac, char **argv)
 		return 1;
 	}
 
-	ifindex_list = (int *)malloc(total_ifindex * sizeof(int *));
+	ifindex_list = (int *)calloc(total_ifindex, sizeof(int *));
 	for (i = 0; i < total_ifindex; i++) {
 		ifindex_list[i] = if_nametoindex(ifname_list[i]);
 		if (!ifindex_list[i]) {
@@ -696,6 +705,7 @@ int main(int ac, char **argv)
 			return 1;
 		}
 	}
+	prog_id_list = (__u32 *)calloc(total_ifindex, sizeof(__u32 *));
 	for (i = 0; i < total_ifindex; i++) {
 		if (bpf_set_link_xdp_fd(ifindex_list[i], prog_fd, flags) < 0) {
 			printf("link set xdp fd failed\n");
@@ -706,6 +716,13 @@ int main(int ac, char **argv)
 
 			return 1;
 		}
+		err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len);
+		if (err) {
+			printf("can't get prog info - %s\n", strerror(errno));
+			return err;
+		}
+		prog_id_list[i] = info.id;
+		memset(&info, 0, sizeof(info));
 		printf("Attached to %d\n", ifindex_list[i]);
 	}
 	signal(SIGINT, int_exit);
diff --git a/samples/bpf/xdp_rxq_info_user.c b/samples/bpf/xdp_rxq_info_user.c
index e7a98c2a440f..1210f3b170f0 100644
--- a/samples/bpf/xdp_rxq_info_user.c
+++ b/samples/bpf/xdp_rxq_info_user.c
@@ -29,6 +29,7 @@ static const char *__doc__ = " XDP RX-queue info extract example\n\n"
 static int ifindex = -1;
 static char ifname_buf[IF_NAMESIZE];
 static char *ifname;
+static __u32 prog_id;
 
 static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 
@@ -58,11 +59,24 @@ static const struct option long_options[] = {
 
 static void int_exit(int sig)
 {
-	fprintf(stderr,
-		"Interrupted: Removing XDP program on ifindex:%d device:%s\n",
-		ifindex, ifname);
-	if (ifindex > -1)
-		bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
+	__u32 curr_prog_id = 0;
+
+	if (ifindex > -1) {
+		if (bpf_get_link_xdp_id(ifindex, &curr_prog_id, xdp_flags)) {
+			printf("bpf_get_link_xdp_id failed\n");
+			exit(EXIT_FAIL);
+		}
+		if (prog_id == curr_prog_id) {
+			fprintf(stderr,
+				"Interrupted: Removing XDP program on ifindex:%d device:%s\n",
+				ifindex, ifname);
+			bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
+		} else if (!curr_prog_id) {
+			printf("couldn't find a prog id on a given iface\n");
+		} else {
+			printf("program on interface changed, not removing\n");
+		}
+	}
 	exit(EXIT_OK);
 }
 
@@ -447,6 +461,8 @@ int main(int argc, char **argv)
 	struct bpf_prog_load_attr prog_load_attr = {
 		.prog_type	= BPF_PROG_TYPE_XDP,
 	};
+	struct bpf_prog_info info = {};
+	__u32 info_len = sizeof(info);
 	int prog_fd, map_fd, opt, err;
 	bool use_separators = true;
 	struct config cfg = { 0 };
@@ -580,6 +596,13 @@ int main(int argc, char **argv)
 		return EXIT_FAIL_XDP;
 	}
 
+	err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len);
+	if (err) {
+		printf("can't get prog info - %s\n", strerror(errno));
+		return err;
+	}
+	prog_id = info.id;
+
 	stats_poll(interval, action, cfg_options);
 	return EXIT_OK;
 }
diff --git a/samples/bpf/xdp_sample_pkts_user.c b/samples/bpf/xdp_sample_pkts_user.c
index 62f34827c775..dc66345a929a 100644
--- a/samples/bpf/xdp_sample_pkts_user.c
+++ b/samples/bpf/xdp_sample_pkts_user.c
@@ -24,25 +24,49 @@ static int pmu_fds[MAX_CPUS], if_idx;
 static struct perf_event_mmap_page *headers[MAX_CPUS];
 static char *if_name;
 static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+static __u32 prog_id;
 
 static int do_attach(int idx, int fd, const char *name)
 {
+	struct bpf_prog_info info = {};
+	__u32 info_len = sizeof(info);
 	int err;
 
 	err = bpf_set_link_xdp_fd(idx, fd, xdp_flags);
-	if (err < 0)
+	if (err < 0) {
 		printf("ERROR: failed to attach program to %s\n", name);
+		return err;
+	}
+
+	err = bpf_obj_get_info_by_fd(fd, &info, &info_len);
+	if (err) {
+		printf("can't get prog info - %s\n", strerror(errno));
+		return err;
+	}
+	prog_id = info.id;
 
 	return err;
 }
 
 static int do_detach(int idx, const char *name)
 {
-	int err;
+	__u32 curr_prog_id = 0;
+	int err = 0;
 
-	err = bpf_set_link_xdp_fd(idx, -1, 0);
-	if (err < 0)
-		printf("ERROR: failed to detach program from %s\n", name);
+	err = bpf_get_link_xdp_id(idx, &curr_prog_id, 0);
+	if (err) {
+		printf("bpf_get_link_xdp_id failed\n");
+		return err;
+	}
+	if (prog_id == curr_prog_id) {
+		err = bpf_set_link_xdp_fd(idx, -1, 0);
+		if (err < 0)
+			printf("ERROR: failed to detach prog from %s\n", name);
+	} else if (!curr_prog_id) {
+		printf("couldn't find a prog id on a %s\n", name);
+	} else {
+		printf("program on interface changed, not removing\n");
+	}
 
 	return err;
 }
diff --git a/samples/bpf/xdp_tx_iptunnel_user.c b/samples/bpf/xdp_tx_iptunnel_user.c
index e3de60930d27..4a1511eb7812 100644
--- a/samples/bpf/xdp_tx_iptunnel_user.c
+++ b/samples/bpf/xdp_tx_iptunnel_user.c
@@ -27,11 +27,24 @@
 static int ifindex = -1;
 static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 static int rxcnt_map_fd;
+static __u32 prog_id;
 
 static void int_exit(int sig)
 {
-	if (ifindex > -1)
-		bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
+	__u32 curr_prog_id = 0;
+
+	if (ifindex > -1) {
+		if (bpf_get_link_xdp_id(ifindex, &curr_prog_id, xdp_flags)) {
+			printf("bpf_get_link_xdp_id failed\n");
+			exit(1);
+		}
+		if (prog_id == curr_prog_id)
+			bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
+		else if (!curr_prog_id)
+			printf("couldn't find a prog id on a given iface\n");
+		else
+			printf("program on interface changed, not removing\n");
+	}
 	exit(0);
 }
 
@@ -148,13 +161,15 @@ int main(int argc, char **argv)
 	int min_port = 0, max_port = 0, vip2tnl_map_fd;
 	const char *optstr = "i:a:p:s:d:m:T:P:FSNh";
 	unsigned char opt_flags[256] = {};
+	struct bpf_prog_info info = {};
+	__u32 info_len = sizeof(info);
 	unsigned int kill_after_s = 0;
 	struct iptnl_info tnl = {};
 	struct bpf_object *obj;
 	struct vip vip = {};
 	char filename[256];
 	int opt, prog_fd;
-	int i;
+	int i, err;
 
 	tnl.family = AF_UNSPEC;
 	vip.protocol = IPPROTO_TCP;
@@ -276,6 +291,13 @@ int main(int argc, char **argv)
 		return 1;
 	}
 
+	err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len);
+	if (err) {
+		printf("can't get prog info - %s\n", strerror(errno));
+		return err;
+	}
+	prog_id = info.id;
+
 	poll_stats(kill_after_s);
 
 	bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 188723784768..f73055e0191f 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -76,6 +76,7 @@ static int opt_poll;
 static int opt_shared_packet_buffer;
 static int opt_interval = 1;
 static u32 opt_xdp_bind_flags;
+static __u32 prog_id;
 
 struct xdp_umem_uqueue {
 	u32 cached_prod;
@@ -631,9 +632,20 @@ static void *poller(void *arg)
 
 static void int_exit(int sig)
 {
+	__u32 curr_prog_id = 0;
+
 	(void)sig;
 	dump_stats();
-	bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);
+	if (bpf_get_link_xdp_id(opt_ifindex, &curr_prog_id, opt_xdp_flags)) {
+		printf("bpf_get_link_xdp_id failed\n");
+		exit(EXIT_FAILURE);
+	}
+	if (prog_id == curr_prog_id)
+		bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);
+	else if (!curr_prog_id)
+		printf("couldn't find a prog id on a given interface\n");
+	else
+		printf("program on interface changed, not removing\n");
 	exit(EXIT_SUCCESS);
 }
 
@@ -907,6 +919,8 @@ int main(int argc, char **argv)
 		.prog_type	= BPF_PROG_TYPE_XDP,
 	};
 	int prog_fd, qidconf_map, xsks_map;
+	struct bpf_prog_info info = {};
+	__u32 info_len = sizeof(info);
 	struct bpf_object *obj;
 	char xdp_filename[256];
 	struct bpf_map *map;
@@ -953,6 +967,13 @@ int main(int argc, char **argv)
 		exit(EXIT_FAILURE);
 	}
 
+	ret = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len);
+	if (ret) {
+		printf("can't get prog info - %s\n", strerror(errno));
+		return 1;
+	}
+	prog_id = info.id;
+
 	ret = bpf_map_update_elem(qidconf_map, &key, &opt_queue, 0);
 	if (ret) {
 		fprintf(stderr, "ERROR: bpf_map_update_elem qidconf\n");
-- 
2.16.1
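Every int_exit()/do_detach() change in the samples above applies the same three-way decision: detach only if the id stored at attach time still matches the id currently reported for the interface. The sketch below isolates that rule as a pure function so it can be read and tested without libbpf; decide_detach() and the enum are hypothetical names, not part of libbpf or the samples.

```c
#include <stdint.h>

/* Possible outcomes when a sample cleans up on SIGINT/SIGTERM. */
enum detach_action {
	DETACH_REMOVE,        /* our program is still attached: detach it */
	DETACH_NOTHING_THERE, /* nothing is attached to the interface */
	DETACH_NOT_OURS,      /* another program replaced ours: leave it */
};

/*
 * Hypothetical helper.
 * stored_id: id saved right after attaching (info.id from
 *            bpf_obj_get_info_by_fd() in the samples).
 * curr_id:   id currently reported for the interface (from
 *            bpf_get_link_xdp_id()); 0 means nothing is attached.
 * The equality test comes first, in the same order as the samples.
 */
static enum detach_action decide_detach(uint32_t stored_id, uint32_t curr_id)
{
	if (stored_id == curr_id)
		return DETACH_REMOVE;
	if (!curr_id)
		return DETACH_NOTHING_THERE;
	return DETACH_NOT_OURS;
}
```

Only DETACH_REMOVE should lead to bpf_set_link_xdp_fd(ifindex, -1, flags); the other two outcomes map to the "couldn't find a prog id" and "program ... changed, not removing" messages. xdp_router_ipv4 keeps one stored id per interface in prog_id_list but applies the same rule in its loop.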



* [PATCH bpf-next v5 7/8] libbpf: Add a support for getting xdp prog id on ifindex
From: Maciej Fijalkowski @ 2019-02-01  0:19 UTC
  To: daniel, ast; +Cc: netdev, jakub.kicinski, brouer, john.fastabend
In-Reply-To: <20190201001954.4130-1-maciej.fijalkowski@intel.com>

Since we have dedicated netlink attributes for xdp setup on a
particular interface, it is now possible to retrieve the id of the
program that is currently attached to the interface. The use case is
targeted at the sample xdp programs, which will store the program id
right after loading the bpf program onto the iface. On shutdown, the
sample will make sure that it can unload the program by querying the
iface again and verifying that both program ids match.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 tools/lib/bpf/libbpf.h   |  1 +
 tools/lib/bpf/libbpf.map |  1 +
 tools/lib/bpf/netlink.c  | 85 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 87 insertions(+)

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 931be6f3408c..43c77e98df6f 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -317,6 +317,7 @@ LIBBPF_API int bpf_prog_load(const char *file, enum bpf_prog_type type,
 			     struct bpf_object **pobj, int *prog_fd);
 
 LIBBPF_API int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags);
+LIBBPF_API int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags);
 
 enum bpf_perf_event_ret {
 	LIBBPF_PERF_EVENT_DONE	= 0,
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index b183c6c3b990..d0e023a75d72 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -131,4 +131,5 @@ LIBBPF_0.0.2 {
 		bpf_probe_map_type;
 		bpf_probe_prog_type;
 		bpf_object__find_map_fd_by_name;
+		bpf_get_link_xdp_id;
 } LIBBPF_0.0.1;
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index 0ce67aea8f3b..ce3ec81b71c0 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -21,6 +21,12 @@
 typedef int (*__dump_nlmsg_t)(struct nlmsghdr *nlmsg, libbpf_dump_nlmsg_t,
 			      void *cookie);
 
+struct xdp_id_md {
+	int ifindex;
+	__u32 flags;
+	__u32 id;
+};
+
 int libbpf_netlink_open(__u32 *nl_pid)
 {
 	struct sockaddr_nl sa;
@@ -196,6 +202,85 @@ static int __dump_link_nlmsg(struct nlmsghdr *nlh,
 	return dump_link_nlmsg(cookie, ifi, tb);
 }
 
+static unsigned char get_xdp_id_attr(unsigned char mode, __u32 flags)
+{
+	if (mode != XDP_ATTACHED_MULTI)
+		return IFLA_XDP_PROG_ID;
+	if (flags & XDP_FLAGS_DRV_MODE)
+		return IFLA_XDP_DRV_PROG_ID;
+	if (flags & XDP_FLAGS_HW_MODE)
+		return IFLA_XDP_HW_PROG_ID;
+	if (flags & XDP_FLAGS_SKB_MODE)
+		return IFLA_XDP_SKB_PROG_ID;
+
+	return IFLA_XDP_UNSPEC;
+}
+
+static int get_xdp_id(void *cookie, void *msg, struct nlattr **tb)
+{
+	struct nlattr *xdp_tb[IFLA_XDP_MAX + 1];
+	struct xdp_id_md *xdp_id = cookie;
+	struct ifinfomsg *ifinfo = msg;
+	unsigned char mode, xdp_attr;
+	int ret;
+
+	if (xdp_id->ifindex && xdp_id->ifindex != ifinfo->ifi_index)
+		return 0;
+
+	if (!tb[IFLA_XDP])
+		return 0;
+
+	ret = libbpf_nla_parse_nested(xdp_tb, IFLA_XDP_MAX, tb[IFLA_XDP], NULL);
+	if (ret)
+		return ret;
+
+	if (!xdp_tb[IFLA_XDP_ATTACHED])
+		return 0;
+
+	mode = libbpf_nla_getattr_u8(xdp_tb[IFLA_XDP_ATTACHED]);
+	if (mode == XDP_ATTACHED_NONE)
+		return 0;
+
+	xdp_attr = get_xdp_id_attr(mode, xdp_id->flags);
+	if (!xdp_attr || !xdp_tb[xdp_attr])
+		return 0;
+
+	xdp_id->id = libbpf_nla_getattr_u32(xdp_tb[xdp_attr]);
+
+	return 0;
+}
+
+int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags)
+{
+	struct xdp_id_md xdp_id = {};
+	int sock, ret;
+	__u32 nl_pid;
+	__u32 mask;
+
+	if (flags & ~XDP_FLAGS_MASK)
+		return -EINVAL;
+
+	/* Check whether the single {HW,DRV,SKB} mode is set */
+	flags &= (XDP_FLAGS_SKB_MODE | XDP_FLAGS_DRV_MODE | XDP_FLAGS_HW_MODE);
+	mask = flags - 1;
+	if (flags && flags & mask)
+		return -EINVAL;
+
+	sock = libbpf_netlink_open(&nl_pid);
+	if (sock < 0)
+		return sock;
+
+	xdp_id.ifindex = ifindex;
+	xdp_id.flags = flags;
+
+	ret = libbpf_nl_get_link(sock, nl_pid, get_xdp_id, &xdp_id);
+	if (!ret)
+		*prog_id = xdp_id.id;
+
+	close(sock);
+	return ret;
+}
+
 int libbpf_nl_get_link(int sock, unsigned int nl_pid,
 		       libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
 {
-- 
2.16.1
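The single-mode check in bpf_get_link_xdp_id() above relies on the x & (x - 1) idiom: it clears the lowest set bit, so the result is non-zero only when more than one mode bit is set. A standalone sketch of that validation, redefining the XDP flag values locally (as they appear in include/uapi/linux/if_link.h around the time of this series) under hypothetical SKETCH_ names so it does not depend on kernel headers:

```c
#include <errno.h>
#include <stdint.h>

/* Local copies of the uapi flag values; names are hypothetical. */
#define SKETCH_XDP_FLAGS_UPDATE_IF_NOEXIST (1U << 0)
#define SKETCH_XDP_FLAGS_SKB_MODE          (1U << 1)
#define SKETCH_XDP_FLAGS_DRV_MODE          (1U << 2)
#define SKETCH_XDP_FLAGS_HW_MODE           (1U << 3)
#define SKETCH_XDP_FLAGS_MODES (SKETCH_XDP_FLAGS_SKB_MODE | \
				SKETCH_XDP_FLAGS_DRV_MODE | \
				SKETCH_XDP_FLAGS_HW_MODE)
#define SKETCH_XDP_FLAGS_MASK (SKETCH_XDP_FLAGS_UPDATE_IF_NOEXIST | \
			       SKETCH_XDP_FLAGS_MODES)

/* Returns 0 if 'flags' has only known bits and at most one mode bit. */
static int check_single_mode(uint32_t flags)
{
	uint32_t modes;

	/* Reject bits outside the known flag set. */
	if (flags & ~SKETCH_XDP_FLAGS_MASK)
		return -EINVAL;

	modes = flags & SKETCH_XDP_FLAGS_MODES;
	/* modes & (modes - 1) is non-zero only if 2+ mode bits are set;
	 * for modes == 0 it evaluates to 0 & 0xffffffff == 0. */
	if (modes & (modes - 1))
		return -EINVAL;

	return 0;
}
```

Because 0 & (0 - 1) is already 0, the leading `flags &&` in the patch's `if (flags && flags & mask)` is a readability guard rather than a correctness requirement.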



* [PATCH bpf-next v5 4/8] samples/bpf: Extend RLIMIT_MEMLOCK for xdp_{sample_pkts, router_ipv4}
From: Maciej Fijalkowski @ 2019-02-01  0:19 UTC
  To: daniel, ast; +Cc: netdev, jakub.kicinski, brouer, john.fastabend
In-Reply-To: <20190201001954.4130-1-maciej.fijalkowski@intel.com>

There is a common problem with xdp samples that happens when a user wants
to run a particular sample while some bpf program is already loaded. The
default 64kB RLIMIT_MEMLOCK resource limit will then cause the following
error (assuming that the failing xdp sample was converted to libbpf
usage):

libbpf: Error in bpf_object__probe_name():Operation not permitted(1).
Couldn't load basic 'r0 = 0' BPF program.
libbpf: failed to load object './xdp_sample_pkts_kern.o'

Fix it in xdp_sample_pkts and xdp_router_ipv4 by setting RLIMIT_MEMLOCK
to RLIM_INFINITY.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 samples/bpf/xdp_router_ipv4_user.c | 7 +++++++
 samples/bpf/xdp_sample_pkts_user.c | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/samples/bpf/xdp_router_ipv4_user.c b/samples/bpf/xdp_router_ipv4_user.c
index cea2306f5ab7..c63c6beec7d6 100644
--- a/samples/bpf/xdp_router_ipv4_user.c
+++ b/samples/bpf/xdp_router_ipv4_user.c
@@ -25,6 +25,7 @@
 #include <sys/syscall.h>
 #include "bpf_util.h"
 #include "bpf/libbpf.h"
+#include <sys/resource.h>
 
 int sock, sock_arp, flags = 0;
 static int total_ifindex;
@@ -609,6 +610,7 @@ static int monitor_route(void)
 
 int main(int ac, char **argv)
 {
+	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
 	struct bpf_prog_load_attr prog_load_attr = {
 		.prog_type	= BPF_PROG_TYPE_XDP,
 	};
@@ -635,6 +637,11 @@ int main(int ac, char **argv)
 		ifname_list = (argv + 1);
 	}
 
+	if (setrlimit(RLIMIT_MEMLOCK, &r)) {
+		perror("setrlimit(RLIMIT_MEMLOCK)");
+		return 1;
+	}
+
 	if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
 		return 1;
 
diff --git a/samples/bpf/xdp_sample_pkts_user.c b/samples/bpf/xdp_sample_pkts_user.c
index 8dd87c1eb560..5f5828ee0761 100644
--- a/samples/bpf/xdp_sample_pkts_user.c
+++ b/samples/bpf/xdp_sample_pkts_user.c
@@ -12,6 +12,7 @@
 #include <signal.h>
 #include <libbpf.h>
 #include <bpf/bpf.h>
+#include <sys/resource.h>
 
 #include "perf-sys.h"
 #include "trace_helpers.h"
@@ -99,6 +100,7 @@ static void sig_handler(int signo)
 
 int main(int argc, char **argv)
 {
+	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
 	struct bpf_prog_load_attr prog_load_attr = {
 		.prog_type	= BPF_PROG_TYPE_XDP,
 	};
@@ -114,6 +116,11 @@ int main(int argc, char **argv)
 		return 1;
 	}
 
+	if (setrlimit(RLIMIT_MEMLOCK, &r)) {
+		perror("setrlimit(RLIMIT_MEMLOCK)");
+		return 1;
+	}
+
 	numcpus = get_nprocs();
 	if (numcpus > MAX_CPUS)
 		numcpus = MAX_CPUS;
-- 
2.16.1
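Patch 4's fix boils down to one setrlimit() call made before the BPF object is loaded, since kernels of that time charged BPF program and map memory against RLIMIT_MEMLOCK. A minimal standalone sketch of that step; the fallback that raises only the soft limit up to the hard limit is an assumption added here for unprivileged runs and is not part of the patch (it will not help if the hard limit itself is too low).

```c
#include <sys/resource.h>

/* Returns 0 on success, -1 on failure. */
static int raise_memlock_limit(void)
{
	struct rlimit r = { RLIM_INFINITY, RLIM_INFINITY };

	/* What the patched samples do: lift the limit entirely. */
	if (!setrlimit(RLIMIT_MEMLOCK, &r))
		return 0;

	/* Assumed fallback (not in the patch): without privileges, the
	 * soft limit may still be raised up to the current hard limit. */
	if (getrlimit(RLIMIT_MEMLOCK, &r))
		return -1;
	r.rlim_cur = r.rlim_max;
	return setrlimit(RLIMIT_MEMLOCK, &r);
}
```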



* Re: [PATCH bpf-next v5 5/8] xdp: Provide extack messages when prog attachment failed
From: Jakub Kicinski @ 2019-02-01  0:32 UTC
  To: Maciej Fijalkowski; +Cc: daniel, ast, netdev, brouer, john.fastabend
In-Reply-To: <20190201001954.4130-6-maciej.fijalkowski@intel.com>

On Fri,  1 Feb 2019 01:19:51 +0100, Maciej Fijalkowski wrote:
> In order to provide more meaningful messages to user when the process of
> loading xdp program onto network interface failed, let's add extack
> messages within dev_change_xdp_fd.
> 
> Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>


* Re: [PATCHv2 net] sctp: check and update stream->out_curr when allocating stream_out
From: Marcelo Ricardo Leitner @ 2019-02-01  0:39 UTC
  To: Tuxdriver; +Cc: Xin Long, network dev, linux-sctp, davem
In-Reply-To: <1689af9c618.2807.d241da8dbb85b87157d6a44ac288e71f@tuxdriver.com>

On Tue, Jan 29, 2019 at 07:58:07PM +0100, Tuxdriver wrote:
> I was initially under the impression that with Kent's repost, the radixtree
> (which is what I think you meant by rhashtables) updates would be merged

Oops! Yep.. I had meant flex_arrays actually.

> imminently, but that doesn't seem to be the case.  I'd really like to know
> what the hold up there is, as that patch seems to have been stalled for
> months.  I hate the notion of breaking the radixtree patch, but if its
> status is indeterminate, then, yes, we probably need to go with Xin's patch
> for the short term, and let Kent fix it up in due course.

Dave, can you please consider applying this patch? The conflict
resolution will be easy: just ignore the changes introduced by this
patch.

This is the radixtree conversion:
https://lwn.net/ml/linux-kernel/20181217131929.11727-1-kent.overstreet@gmail.com/
It seems that went into limbo after
https://lwn.net/ml/linux-kernel/20181217210021.GA7144@kmo-pixel/
Maybe Kent should have reposted, but he didn't reply either.

My reasoning is below. Please also note that this is
triggerable by users and remotely, as remote peers may request to add
'in' streams, which implies adding 'out' streams on the local peer.
(https://tools.ietf.org/html/rfc6525#section-5.2.6)

> 
> Neil
> 
> On January 29, 2019 1:06:33 PM Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com> wrote:
> 
> > On Thu, Nov 29, 2018 at 02:42:56PM +0800, Xin Long wrote:
> > > Now when using stream reconfig to add out streams, stream->out
> > > will get re-allocated, and all old streams' information will
> > > be copied to the new ones and the old ones will be freed.
> > > 
> > > So without stream->out_curr updated, next time when trying to
> > > send from stream->out_curr stream, a panic would be caused.
> > > 
> > > This patch is to check and update stream->out_curr when
> > > allocating stream_out.
> > > 
> > > v1->v2:
> > >   - define fa_index() to get elem index from stream->out_curr.
> > > 
> > > Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations")
> > > Reported-by: Ying Xu <yinxu@redhat.com>
> > > Reported-by: syzbot+e33a3a138267ca119c7d@syzkaller.appspotmail.com
> > > Signed-off-by: Xin Long <lucien.xin@gmail.com>
> > 
> > We are sort of mixing things up here. We have a bug in the SCTP stack that
> > triggers panics. As good practice recommends, the code should be as
> > generic as possible, so the SCTP-only fix was dropped in favor of a more
> > generic one, fixing rhashtables instead. Okay. But then we discovered
> > rhashtables are going away and we are now waiting on a restructuring
> > to fix the panic. That's not good, especially because it cannot and
> > should not be backported into -stable trees.
> > 
> > That said, we should not wait for the restructuring to _implicitly_
> > fix the bug. We should pursue both fixes here:
> > - Apply this patch, to fix SCTP stack and allow it to be easily
> >  backportable.
> > - Apply the generic fix, which is the restructuring, whenever it
> >  actually lands.
> > 
> > Thoughts?
> > 
> > Thanks,
> > Marcelo
> 

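The failure mode in Xin Long's quoted commit message, a cached out_curr pointer into stream->out surviving the reallocation of that array, can be illustrated in user space. A minimal sketch, assuming a plain calloc'd array instead of the kernel's flex_array and hypothetical sketch_* names; it mirrors the idea of the fix (turn out_curr into an index while the old array is valid, in the spirit of the fa_index() helper mentioned in v1->v2, then re-derive it) rather than reproducing the actual patch.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the SCTP stream structures. */
struct sketch_stream_out {
	unsigned int id;
	unsigned int queued;               /* example per-stream state */
};

struct sketch_stream {
	struct sketch_stream_out *out;      /* array of outbound streams */
	size_t outcnt;
	struct sketch_stream_out *out_curr; /* points INTO 'out', or NULL */
};

/* Grow 'out' to new_cnt entries: allocate a new array, copy the old
 * entries, free the old array, and re-point out_curr at the copy. */
static int sketch_stream_grow(struct sketch_stream *s, size_t new_cnt)
{
	struct sketch_stream_out *fresh;
	int have_cur = s->out_curr != NULL;
	size_t cur_idx = 0, i;

	if (new_cnt <= s->outcnt)
		return 0;

	/* Record out_curr as an index while the old array is still valid. */
	if (have_cur)
		cur_idx = (size_t)(s->out_curr - s->out);

	fresh = calloc(new_cnt, sizeof(*fresh));
	if (!fresh)
		return -ENOMEM;
	if (s->outcnt)
		memcpy(fresh, s->out, s->outcnt * sizeof(*fresh));
	for (i = s->outcnt; i < new_cnt; i++)
		fresh[i].id = (unsigned int)i;

	free(s->out);
	s->out = fresh;
	s->outcnt = new_cnt;

	/* Skipping this step leaves out_curr dangling into freed memory,
	 * which is the use-after-free behind the reported panic. */
	s->out_curr = have_cur ? &fresh[cur_idx] : NULL;
	return 0;
}
```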

* Re: [RFC net-next 1/4] net: Reserve protocol identifiers for EnOcean
From: Alexander Aring @ 2019-02-01  0:58 UTC
  To: Andreas Färber
  Cc: linux-lpwan, linux-wpan, Alexander Aring, Stefan Schmidt, netdev,
	linux-kernel, support, Jonathan Cameron, Rob Herring
In-Reply-To: <6a220f80-1c75-12e6-1f8c-53b76412257a@suse.de>

Hi,

On Wed, Jan 30, 2019 at 02:42:29AM +0100, Andreas Färber wrote:
> Hi Alex,
> 
> Am 29.01.19 um 13:57 schrieb Alexander Aring:
> > On Tue, Jan 29, 2019 at 06:01:27AM +0100, Andreas Färber wrote:
> >> EnOcean wireless technology is based on ASK (ERP1) and FSK (ERP2) modulations
> >> for sub-GHz and on IEEE 802.15.4 for 2.4 GHz.
> >>
> > 
> > I am not sure what you try to do here. If I see that correctly you
> > want to add for some special protocol vendor specific transceiver which
> > is underneath an 802.15.4 transceiver a new ARPHRD type and even more
> > for each modulation what it supports?
> 
> No. EnOcean uses a 4-byte node ID across PHY layers, which I am using a
> single ARPHRD_ENOCEAN for (which you conveniently cut off above).
> 
> As indicated above, the 868 MHz transceiver is _not_ using 802.15.4 PHY
> or MAC to my knowledge. It does sound like you spotted "IEEE 802.15.4"
> and literally blended out all the rest...
> 

Ah okay, I am curious about that. As far as I understand now, this has
nothing to do with LoRa? I had missed that point.

Is the PHY layer open? Do they actually refer to 802.15.4 in their
specs, but the PHY layer is changed by... preamble, phy header, MTU?

> > 
> > If it's a 802.15.4 transceiver why not using the 802.15.4 subsystem?
> > 
> > For me it sounds more like a HardMAC transceiver driver for doing the
> > vendor protocol. The different modulations is part of a 802.15.4 phy
> > device class. Similar like in wireless.
> 
> I've tried to design this exactly so that one _could_ implement it based
> on 802.15.4 PHY framework for 2.4 GHz or based on an FSK PHY for sub-GHz
> as a soft-MAC, layered similarly to LoRaWAN vs. LoRa, alongside the ESP
> serdev driver in this series.
> 
> In ESP3 the only 802.15.4 specific operations are getting/setting the
> channel (COMMAND_2_4 packet type), and there's a CO_GET_FREQUENCY_INFO
> command to discover frequency and protocol, with 802.15.4 having a
> different ID than ERP2 (and I spot a value 0x30 for "Long Range" :-)).
> So in theory it might be possible to instantiate an 802.15.4 PHY after
> discovering that ESP3 value, but neither is this a generic 802.15.4 PHY
> nor a generic FSK PHY, and none of that relates to above ARPHRD really.
> 

I'll keep it in mind, thanks.

> PF_PACKET with SOCK_DGRAM for ETH_P_ERP2 gives me the subtelegram
> contents to transmit via ESP, whereas SOCK_RAW would give the full frame
> to transmit via FSK PHY. By avoiding a custom PF_ENOCEAN we seem to lose
> the ability to prepend any protocol headers on the skb for SOCK_DGRAM.
> 

I am not quite following here. SOCK_RAW giving the full frame and SOCK_DGRAM
giving the payload sounds like how I would expect it to work.

So switching the protocol switches from ESP to FSK, which is PHY layer
behaviour?

> Did you actually read my P.S. in the cover letter? I was glad to avoid
> much PF_ socket boilerplate code here (as a playground for LoRa), and
> now you're complaining about a single ARPHRD constant! :-/
> By that standard we could stop implementing anything new... If you're
> worried about number space, why has no one commented on the values added
> for LoRa and other previous wireless technologies? No one had any such
> comments on my LoRa RFC, nor on Jian-Hong's LoRaWAN patches, so I've
> been reserving new ARPHRD_ constants for each technology I work on. If
> ARPHRD_NONE would be a better value to use for PHY layers, no one
> bothered to point it out so far! Nor did anyone suggest to Jian-Hong to
> reuse ARPHRD_EUI64! And yet I spot nothing more suitable for EnOcean
> addresses than a custom value. Fact is, the net_device wants some value.
> Note that you have two ARPHRD constants assigned for 802.15.4 alone, so
> please be fair to others.
> 

Indeed we only need one. :-)

> An 802.15.4 PHY won't help me for 868 MHz FSK - by my reading 802.15.4
> is PSK (BPSK/OQPSK), thus incompatible with ASK/OOK and FSK/MSK.
> 
> As noted in the cover letter, Semtech chips have FSK and OOK support
> alongside LoRa modulation; so I am looking into FSK PHY support, both
> for those chips as well as for some pure FSK/OOK transceivers posted to
> linux-lpwan list (and potentially more, given time):
> https://lists.infradead.org/pipermail/linux-lpwan/2019-January/000116.html
> https://lists.infradead.org/pipermail/linux-lpwan/2019-January/000117.html
> https://en.opensuse.org/User:A_faerber/LoRa_interop
> 
> Therefore an FSK PHY's netlink interface will need to be able to handle
> the requirements of upper-layer protocols, such as:
> * Wireless M-Bus (which I could not yet find a suitable 868MHz hard-MAC
> for to test against, only 169MHz; Si4432 has an Application Note AN451),
> * KNX RF (which I have not come across a hard-MAC for either),
> * Sigfox downstream (cf. mm002 LoRa driver as hard-MAC; no public docs),
> * Z-Wave (not enough docs to implement much more for now), and here
> * EnOcean Radio Protocol 2.
> 
> In general I want to make sure my implementations can work with both
> soft- and hard-MAC hardware out there, as demonstrated for LoRaWAN.
> Pointing a user with hard-MAC device to a theoretical generic subsystem
> of your preference doesn't help them, nor does it help to split the
> community into separate hard-MAC vs. soft-MAC implementation camps that
> make it hard for users to switch.

agree.

> * For example, when looking for how to actually use the Pine64 Z-Wave
> adapter, back in the day I merely found an OpenHAB Raspbian(?) image
> that as an openSUSE contributor I would surely not block my board with;
> no explanations, no instructions, nothing. And when you have a pure Java
> application on the one hand and a C/Python/whatever application on the
> other, chances are that the kernel is the only common point of reuse. I
> surely mentioned that I hate any userspace applications that attempt to
> detect hardware on spidev/i2c-dev/tty without using the kernel-provided
> facilities such as DT; finally, serdev allows to move any such
> hardware-dependent tty code into the kernel - we just need to figure out
> how to best expose functionality there (and ideally grow some more
> helpers). Just note how patch 3/4 reuses the kernel's crc8
> implementation instead of re-implementing it from scratch. Similarly I'd
> love for my AT based LoRa drivers to share more serdev code, despite
> line ending and response styles differing greatly (think
> serdev_device_readline w/args?); binary protocols like ESP here are
> luckily not affected as much. It could also use some more/better
> documentation, some of the return values are wrong.
> * As another example, we seem to be lacking a generic SDR subsystem:
> People with SDR hardware seem to use either downstream kernel modules,
> possibly application-generated, or closed-source userspace libraries?
> Neither seems able to currently reuse the net subsystem for protocols.
> And yet I've been asked repeatedly to design drivers in a way they could
> be used with SDR, too, but without any way to actually test that today.
> Has anyone talked to the SDR chipset/equipment vendors to remedy that?
> The one I was in contact with simply chose not to reply again to date...
> 
> For ETH_P_ we seem to be far away from 0xffff, so I don't see a problem
> there? Not only was it the easiest thing to implement & test short-term,
> but as outlined in the cover letter I saw no way here to turn that into
> a non-net-subsystem because the data transmitted is not self-describing
> (mostly battery-less sensors/actuators with ca. 4 byte data payload).
> You must know that your device with id 0x12345678 conforms to profile X.
> 
> Is describing remote devices in DT an option? (CC'ing Rob and Jonathan)
> 
> /.../uart@foo {
> 	enocean {
> 		compatible = "enocean,esp3";
> 		#address-cells = <1>;
> 		#size-cells = <0>;
> 
> 		window-handle@41424344 {
> 			compatible = "manufacturer1,handle";
> 			reg = <0x41424344>;
> 			enocean,equipment-profile = <1 2 3>;

What are these profiles? For declaring you actually can support some
"window-handle"? Can this be changed during runtime?

Is this some kind of device class specification by EnOcean that needs to
be programmed into their transceiver, so that a management layer running
in firmware can handle it?

> 			#io-channel-cells = <1>;
> 		};
> 
> 		light-switch@41424348 {
> 			compatible = "manufacturer2,rocker";
> 			reg = <0x41424348>;
> 			enocean,equipment-profile = <4 5 6>;
> 			#io-channel-cells = <2>;
> 		};
> 	};
> };
> 
> Pro: This would allow to abstract sensors (iio?) and actuators (gpio?).
>      Cf. https://patchwork.ozlabs.org/patch/1028209/ for comparison.
> Con: How to deal with it on ACPI or on DT platforms without Overlays?
>      How would the kernel preserve remote device state across reboots?
> 
> So no, I don't think we can or should shoehorn non-802.15.4 PHYs into
> your ieee802154 PHY layer. If you see ways to share code between the
> various wireless PHYs, that would be great, but at present it seems like
> mostly boilerplate code with nothing in your phy struct applying to FSK
> or LoRa. Compare my cfglora series pointed to and Xue Liu's recent sysfs
> patch under discussion. If no more comments turn up on my cfglora series
> I'll copy it into a cfgfsk, so that I can integrate both into sx127x as
> a base for further discussions at Netdevconf. Thanks.
> 

Sharing code always sounds like a good idea.
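As an illustration of the kind of code sharing discussed here: the quoted message notes that patch 3/4 reuses the kernel's crc8 helper instead of re-implementing it. The userspace sketch below shows roughly what such a table-driven, MSB-first CRC-8 computes, modelled loosely on the kernel's crc8_populate_msb()/crc8() pair. The function names are hypothetical stand-ins, not the kernel API, and the polynomial 0x07 with initial value 0x00 is our assumption about what ESP3 uses; check it against the EnOcean ESP3 specification before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC8_TABLE_SIZE 256

/* Build an MSB-first lookup table for `poly` (hypothetical userspace
 * stand-in for the kernel's crc8_populate_msb()). */
static void esp3_crc8_populate(uint8_t table[CRC8_TABLE_SIZE], uint8_t poly)
{
	for (int i = 0; i < CRC8_TABLE_SIZE; i++) {
		uint8_t crc = (uint8_t)i;

		/* Shift each bit out of the top, folding in the polynomial. */
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ poly)
					   : (uint8_t)(crc << 1);
		table[i] = crc;
	}
}

/* Feed `len` bytes through the table, starting from `crc`
 * (hypothetical stand-in for the kernel's crc8()). */
static uint8_t esp3_crc8(const uint8_t table[CRC8_TABLE_SIZE],
			 const uint8_t *data, size_t len, uint8_t crc)
{
	assert(table != NULL && (data != NULL || len == 0));
	while (len--)
		crc = table[crc ^ *data++];
	return crc;
}
```

With the table populated for 0x07 and a starting value of 0x00, the standard check input "123456789" yields 0xf4, the published check value for this CRC-8 variant. Because the running CRC is passed in, a frame can also be checksummed piecewise.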

Thanks.

- Alex

* Re: [PATCH net-next] net/mlx5e: Fix code style issue in mlx driver
From: Tonghao Zhang @ 2019-02-01  1:12 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: David Miller, Saeed Mahameed, Linux Netdev List
In-Reply-To: <CAJ3xEMhQq43QBcfWMneaOwTPqSQmh_HmEmi4tO1NGx8nu-vCXQ@mail.gmail.com>

On Fri, Feb 1, 2019 at 4:51 AM Or Gerlitz <gerlitz.or@gmail.com> wrote:
>
> On Thu, Jan 31, 2019 at 10:49 PM Or Gerlitz <gerlitz.or@gmail.com> wrote:
> >
> > On Thu, Jan 31, 2019 at 4:11 PM <xiangxia.m.yue@gmail.com> wrote:
> > > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > >
> > > Add the tab before '}' and keep the code style consistent.
> > >
> > > Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > LGTM
>
> > Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
>
> oops, for files starting with en_ prefix we use net/mlx5e: prefix for the patch
> title (ethernet) and for the others net/mlx5: (core) -- please fix and
OK, thanks for your tips
> re-send, add my R.B
> and also make sure to copy Dave Miller

* Re: [PATCH net-next v2 01/12] net: bridge: multicast: Propagate br_mc_disabled_update() return
From: Florian Fainelli @ 2019-02-01  1:19 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev@vger.kernel.org, andrew@lunn.ch, vivien.didelot@gmail.com,
	davem@davemloft.net, Jiri Pirko, ilias.apalodimas@linaro.org,
	ivan.khoronzhuk@linaro.org, roopa@cumulusnetworks.com,
	nikolay@cumulusnetworks.com, Petr Machata
In-Reply-To: <20190131075013.GA27839@splinter>

On 1/30/19 11:50 PM, Ido Schimmel wrote:
> On Wed, Jan 30, 2019 at 05:00:57PM -0800, Florian Fainelli wrote:
>> On 1/29/19 11:36 PM, Ido Schimmel wrote:
>>> On Tue, Jan 29, 2019 at 04:55:37PM -0800, Florian Fainelli wrote:
>>>> -static void br_mc_disabled_update(struct net_device *dev, bool value)
>>>> +static int br_mc_disabled_update(struct net_device *dev, bool value)
>>>>  {
>>>>  	struct switchdev_attr attr = {
>>>>  		.orig_dev = dev,
>>>>  		.id = SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED,
>>>> -		.flags = SWITCHDEV_F_DEFER,
>>>> +		.flags = SWITCHDEV_F_DEFER | SWITCHDEV_F_SKIP_EOPNOTSUPP,
>>>
>>> Actually, since the operation is deferred I don't think the return value
>>> from the driver is ever checked. Can you test it?
>>
>> You are right, you get a WARN() from switchdev_attr_port_set_now(), but
>> this does not propagate back to the caller, so you can still create a
>> bridge device and enslave a device successfully despite getting warnings
>> on the console.
>>
>>>
>>> I think it would be good to convert the attributes to use the switchdev
>>> notifier like commit d17d9f5e5143 ("switchdev: Replace port obj add/del
>>> SDO with a notification") did for objects. Then you can have your
>>> listener veto the operation in the same context it is happening.
>>
>> Alright, working on it. Would you do that just for the attr_set, or for
>> attr_get as well (to be symmetrical)?
> 
> Yes, then we can get rid of switchdev_ops completely.
> 

OK, so here is what I have so far:

https://github.com/ffainelli/linux/pull/new/switchdev-attr

although I am seeing some invalid context operations with DSA that I am
debugging. Does this look like the right way to go from your perspective?
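To make the deferred-versus-notifier distinction above concrete: with SWITCHDEV_F_DEFER the driver's error only surfaces later in deferred work, where it can trigger a WARN() but never reach the caller, whereas a listener notified synchronously can veto the change in the same context. The following is a deliberately simplified userspace model of that difference; it is not the kernel's notifier or switchdev API, and every name in it is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum { DEMO_ATTR_MC_DISABLED = 1 }; /* hypothetical attribute id */

/* A listener returns 0 to accept the change or -errno to veto it. */
typedef int (*demo_attr_listener)(int attr_id, bool value);

#define DEMO_MAX_LISTENERS 4

struct demo_attr_chain {
	demo_attr_listener listeners[DEMO_MAX_LISTENERS];
	size_t count;
	/* State for the deferred path. */
	bool pending;
	int pending_id;
	bool pending_value;
	int late_err; /* only known after the caller has already returned */
};

static int demo_chain_register(struct demo_attr_chain *c, demo_attr_listener fn)
{
	if (c->count == DEMO_MAX_LISTENERS)
		return -ENOSPC;
	c->listeners[c->count++] = fn;
	return 0;
}

/* Synchronous notification: the first veto reaches the caller. */
static int demo_attr_set_sync(struct demo_attr_chain *c, int attr_id, bool value)
{
	for (size_t i = 0; i < c->count; i++) {
		int err = c->listeners[i](attr_id, value);

		if (err)
			return err;
	}
	return 0;
}

/* Deferred notification: queue the change and report success at once. */
static int demo_attr_set_deferred(struct demo_attr_chain *c, int attr_id, bool value)
{
	c->pending = true;
	c->pending_id = attr_id;
	c->pending_value = value;
	return 0;
}

/* Later, from "work" context: an error here can only be logged. */
static void demo_attr_run_deferred(struct demo_attr_chain *c)
{
	if (!c->pending)
		return;
	c->late_err = demo_attr_set_sync(c, c->pending_id, c->pending_value);
	c->pending = false;
}

/* Example driver listener that cannot support the attribute. */
static int demo_driver_listener(int attr_id, bool value)
{
	if (attr_id == DEMO_ATTR_MC_DISABLED && value)
		return -EOPNOTSUPP;
	return 0;
}
```

In the deferred path the caller sees 0 even though the driver refuses the change, which matches the behaviour observed with the bridge; the synchronous path returns -EOPNOTSUPP directly.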
-- 
Florian

* [PATCH net-next v2 1/1] net/mlx5: Fix code style issue in mlx driver
From: xiangxia.m.yue @ 2019-01-29 21:23 UTC (permalink / raw)
  To: davem; +Cc: netdev, Tonghao Zhang

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Add the tab before '}' and keep the code style consistent.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
---
v2: change the patch title
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 79f122b..9c5aac1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -618,7 +618,8 @@ static struct mlx5_flow_group *alloc_flow_group(struct mlx5_flow_steering *steer
 	if (ret) {
 		kmem_cache_free(steering->fgs_cache, fg);
 		return ERR_PTR(ret);
-}
+	}
+
 	ida_init(&fg->fte_allocator);
 	fg->mask.match_criteria_enable = match_criteria_enable;
 	memcpy(&fg->mask.match_criteria, match_criteria,
-- 
1.8.3.1


* Re: [PATCH bpf-next v5 2/5] bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
From: Willem de Bruijn @ 2019-02-01  1:47 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Peter Oskolkov, Alexei Starovoitov, Network Development,
	Peter Oskolkov, David Ahern
In-Reply-To: <38629ba4-3201-46e2-5dff-e3a816d1af13@iogearbox.net>

On Thu, Jan 31, 2019 at 5:04 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 01/31/2019 12:51 AM, Peter Oskolkov wrote:
> > This patch implements BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
> > BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN
> > and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
> > to packets (e.g. IP/GRE, GUE, IPIP).
> >
> > This is useful when thousands of different short-lived flows should be
> > encapped, each with different and dynamically determined destination.
> > Although lwtunnels can be used in some of these scenarios, the ability
> > to dynamically generate encap headers adds more flexibility, e.g.
> > when routing depends on the state of the host (reflected in global bpf
> > maps).
> >
> > Signed-off-by: Peter Oskolkov <posk@google.com>

> > +int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
> > +{
> > +     struct iphdr *iph;
> > +     bool ipv4;
> > +     int err;
> > +
> > +     if (unlikely(len < sizeof(struct iphdr) || len > LWT_BPF_MAX_HEADROOM))
> > +             return -EINVAL;
> > +
> > +     /* validate protocol and length */
> > +     iph = (struct iphdr *)hdr;
> > +     if (iph->version == 4) {
> > +             ipv4 = true;
> > +             if (unlikely(len < iph->ihl * 4))
> > +                     return -EINVAL;
> > +     } else if (iph->version == 6) {
> > +             ipv4 = false;
> > +             if (unlikely(len < sizeof(struct ipv6hdr)))
> > +                     return -EINVAL;
> > +     } else {
> > +             return -EINVAL;
> > +     }
> > +
> > +     if (ingress)
> > +             err = skb_cow_head(skb, len + skb->mac_len);
> > +     else
> > +             err = skb_cow_head(skb,
> > +                                len + LL_RESERVED_SPACE(skb_dst(skb)->dev));
> > +     if (unlikely(err))
> > +             return err;
> > +
> > +     /* push the encap headers and fix pointers */
> > +     skb_reset_inner_headers(skb);
> > +     skb->encapsulation = 1;
> > +     skb_push(skb, len);
> > +     if (ingress)
> > +             skb_postpush_rcsum(skb, iph, len);
> > +     skb_reset_network_header(skb);
> > +     memcpy(skb_network_header(skb), hdr, len);
> > +     bpf_compute_data_pointers(skb);
>
> Does this work transparently with GSO as well or would we need to
> update shared info for this (like in nat64 case, for example)?

Good point. It does need to update the gso_type to include the tunnel
type, similar to iptunnel_handle_offloads.

Only, the nice feature of this interface is that it is encap protocol
independent, which implies that it does not know the correct type.

I don't think that we want to allow programs to write the gso_type themselves.

With GSO_PARTIAL, perhaps specifying the exact tunnel type can be
avoided as long as it is a fixed prefix to replicate?

The transport layer size does not change, so no need to recompute gso_segs?

Either way, this seems non-trivial enough to me to do in a separate
follow-on patch. For now just fail if skb_is_gso.
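The length and version checks at the top of bpf_lwt_push_ip_encap() can be exercised outside the kernel. The userspace sketch below applies the same rules to a raw header buffer; it reads the version and IHL nibbles directly from the first byte instead of using struct iphdr, and the 256-byte limit is an assumed stand-in for LWT_BPF_MAX_HEADROOM, not a value taken from the patch. GSO handling, which Willem suggests rejecting for now via skb_is_gso, is not modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_IPV4_MIN_HDR_LEN 20  /* sizeof(struct iphdr), no options */
#define DEMO_IPV6_HDR_LEN     40  /* sizeof(struct ipv6hdr) */
#define DEMO_MAX_HEADROOM     256 /* assumed stand-in for LWT_BPF_MAX_HEADROOM */

/* Mirror of the checks in the quoted patch: returns 4 or 6 for an
 * acceptable outer IPv4/IPv6 header, or -EINVAL otherwise. */
static int demo_validate_encap_hdr(const uint8_t *hdr, size_t len)
{
	unsigned int version, ihl;

	if (len < DEMO_IPV4_MIN_HDR_LEN || len > DEMO_MAX_HEADROOM)
		return -EINVAL;

	version = hdr[0] >> 4; /* high nibble of the first byte */
	ihl = hdr[0] & 0x0f;   /* IPv4 header length in 32-bit words */

	if (version == 4) {
		/* The buffer must cover the whole IPv4 header, options included. */
		if (len < ihl * 4)
			return -EINVAL;
		return 4;
	}
	if (version == 6) {
		if (len < DEMO_IPV6_HDR_LEN)
			return -EINVAL;
		return 6;
	}
	return -EINVAL;
}
```

For example, a 20-byte buffer starting with 0x45 is accepted as IPv4, while one starting with 0x46 (IHL of 6, i.e. 24 bytes) is rejected because the options are not fully present.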

* Re: [PATCH net] virtio_net: Account for tx bytes and packets on sending xdp_frames
From: Toshiaki Makita @ 2019-02-01  1:53 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, David Miller
  Cc: mst, jasowang, netdev, virtualization, dsahern, hawk,
	Toke Høiland-Jørgensen
In-Reply-To: <20190131211555.3b15c81f@carbon>

On 2019/02/01 5:15, Jesper Dangaard Brouer wrote:
> On Thu, 31 Jan 2019 09:45:23 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
> 
>> From: "Michael S. Tsirkin" <mst@redhat.com>
>> Date: Thu, 31 Jan 2019 10:25:17 -0500
>>
>>> On Thu, Jan 31, 2019 at 08:40:30PM +0900, Toshiaki Makita wrote:  
>>>> Previously virtnet_xdp_xmit() did not account for device tx counters,
> >>>> which caused confusion.
>>>> To be consistent with SKBs, account them on freeing xdp_frames.
>>>>
>>>> Reported-by: David Ahern <dsahern@gmail.com>
>>>> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>  
>>>
>>> Well we count them on receive so I guess it makes sense for consistency
>>>
>>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>>>
>>> however, I really wonder whether adding more and more standard net stack
>>> things like this will end up costing most of XDP its speed.
>>>
>>> Should we instead make sure *not* to account XDP packets
>>> in any counters at all? XDP programs can use maps
>>> to do their own counting...  
>>
>> This has been definitely a discussion point, and something we should
>> develop a clear, strong, policy on.
>>
> >> David, Jesper, care to chime in on where we ended up in the last thread
> >> discussing this?
> 
> IMHO packets RX and TX on a device need to be accounted in standard
> counters, regardless of XDP.  For XDP RX the packet is counted as RX,
> regardless of whether XDP chooses XDP_DROP.  On XDP TX, which is via
> XDP_REDIRECT or XDP_TX, the driver that transmits the packet needs to
> account the packet in a TX counter (this is often delayed to DMA TX
> completion handling).  We cannot break the expectation that RX and TX
> counters are visible to userspace stats tools. XDP should not make these
> packets invisible.
> 
> Performance-wise, I don't see an issue, as updating these counters
> (packets and bytes) can be done in bulk, either when the driver's NAPI RX
> function ends, or in TX DMA completion, as most drivers already do today.
> Furthermore, most drivers keep these in per-RX-ring data areas, which
> are only summed when userspace reads them.

Agreed.

> A separate question (and project) raised by David Ahern was whether we
> should have more detailed stats on the different XDP action return
> codes, as an easy means for sysadmins to diagnose running XDP programs.
> That is something that requires more discussion, as it can impact
> performance, and likely needs to be opt-in.  My opinion is yes, we should
> do this for the sake of better User eXperience, BUT *only* if we can find
> a technical solution that does not hurt performance.

Basically the situation for the detailed stats is the same as standard
stats, at least in virtio_net. Stats are updated as a bulk, and the
counters reside in RX/TX ring structures.
Presumably this kind of implementation would be OK performance-wise?
But as other drivers may have different situations, if it is generally
difficult to avoid performance penalty I'm OK with making them opt-in as
a standard way.
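The accounting scheme described above (counters kept per RX ring, updated once per NAPI poll rather than per packet, and only summed when userspace reads them) can be sketched in userspace C as follows. The structure and function names are hypothetical and do not correspond to virtio_net's actual types. Real drivers additionally use u64_stats_sync or similar so that 64-bit counters can be read consistently on 32-bit hosts; this single-threaded sketch omits that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_RINGS 4

/* Per-ring counters: each ring only ever writes its own copy. */
struct demo_ring_stats {
	uint64_t packets;
	uint64_t bytes;
	uint64_t xdp_drops; /* example of a per-action counter */
};

struct demo_dev {
	struct demo_ring_stats rx[DEMO_NUM_RINGS];
};

enum demo_verdict { DEMO_PASS, DEMO_DROP };

/* One simulated NAPI poll: accumulate in locals, publish once at the end. */
static void demo_rx_poll(struct demo_dev *dev, size_t ring,
			 const uint32_t *lens, const enum demo_verdict *v,
			 size_t n)
{
	uint64_t pkts = 0, bytes = 0, drops = 0;
	struct demo_ring_stats *s = &dev->rx[ring];

	for (size_t i = 0; i < n; i++) {
		/* Every received frame counts as RX, even if XDP drops it. */
		pkts++;
		bytes += lens[i];
		if (v[i] == DEMO_DROP)
			drops++;
	}
	/* Bulk update: one write per counter per poll. */
	s->packets += pkts;
	s->bytes += bytes;
	s->xdp_drops += drops;
}

/* Slow path: only runs when userspace asks for the statistics. */
static struct demo_ring_stats demo_get_stats(const struct demo_dev *dev)
{
	struct demo_ring_stats total = {0};

	for (size_t i = 0; i < DEMO_NUM_RINGS; i++) {
		total.packets += dev->rx[i].packets;
		total.bytes += dev->rx[i].bytes;
		total.xdp_drops += dev->rx[i].xdp_drops;
	}
	return total;
}
```

The per-packet cost is a few increments of local variables; the summation across rings is paid only on the read path, which is the property both messages rely on.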

-- 
Toshiaki Makita


* Re: [PATCH net-next 1/1] openvswitch: Declare ovs key structures using macros
From: Pravin Shelar @ 2019-02-01  2:39 UTC (permalink / raw)
  To: Eli Britstein
  Cc: David Miller, netdev@vger.kernel.org, blp@ovn.org,
	dev@openvswitch.org, Roi Dayan, simon.horman@netronome.com
In-Reply-To: <0b6faaac-e0ea-b2a4-e8a3-42e48436ed1d@mellanox.com>

Can you send patch with this information in commit msg?


On Thu, Jan 31, 2019 at 3:32 AM Eli Britstein <elibr@mellanox.com> wrote:
>
> ping
>
> for the using patch, i put below the v1 of it. here is v2:
>
> https://patchwork.ozlabs.org/patch/1023406/
>
>
> On 1/27/2019 8:37 AM, Eli Britstein wrote:
> >
> > On 1/27/2019 1:04 AM, David Miller wrote:
> >> From: Eli Britstein <elibr@mellanox.com>
> >> Date: Thu, 24 Jan 2019 11:46:47 +0200
> >>
> >>> Declare ovs key structures using macros to enable retrieving fields
> >>> information, with no functional change.
> >>>
> >>> Signed-off-by: Eli Britstein <elibr@mellanox.com>
> >>> Reviewed-by: Roi Dayan <roid@mellanox.com>
> > >> I agree with Pravin, this needs a much better commit message.
> >>
> >> Maybe even better to submit this alongside whatever is supposed
> >> to use these new macros.
> >
> > This patch is equivalent to a work done in the OVS tree.
> >
> > https://patchwork.ozlabs.org/patch/1023405/
> >
> > As a standalone it doesn't serve any purpose (as mentioned - no
> > functional change).
> >
> > It serves as a pre-step towards another patch in the OVS:
> >
> > https://patchwork.ozlabs.org/patch/1022794/
> >
> > So, the purpose of doing it in the kernel is just to keep this H file
> > identical. Once it is approved for the kernel, we will be able to
> > proceed with it in the OVS.
> >
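For readers unfamiliar with the technique: declaring a structure "using macros" typically means an X-macro, where a single field list is expanded once into the struct definition and again into a table of field metadata (name, offset, size), so the two cannot drift apart. The sketch below is a minimal illustration under that assumption; the macro names are hypothetical and not the ones used in the patch, and the field list follows struct ovs_key_ipv4 from the openvswitch uapi header as we understand it, with fixed-width userspace types in place of __be32/__u8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Single source of truth for the field list: FIELD(type, name). */
#define DEMO_OVS_KEY_IPV4_FIELDS(FIELD) \
	FIELD(uint32_t, ipv4_src)       \
	FIELD(uint32_t, ipv4_dst)       \
	FIELD(uint8_t, ipv4_proto)      \
	FIELD(uint8_t, ipv4_tos)        \
	FIELD(uint8_t, ipv4_ttl)        \
	FIELD(uint8_t, ipv4_frag)

/* Expansion 1: the structure itself. */
#define DEMO_DECLARE_MEMBER(type, name) type name;
struct demo_ovs_key_ipv4 {
	DEMO_OVS_KEY_IPV4_FIELDS(DEMO_DECLARE_MEMBER)
};

/* Expansion 2: per-field metadata generated from the same list. */
struct demo_field_info {
	const char *name;
	size_t offset;
	size_t size;
};

#define DEMO_FIELD_INFO(type, name) \
	{ #name, offsetof(struct demo_ovs_key_ipv4, name), sizeof(type) },
static const struct demo_field_info demo_ovs_key_ipv4_fields[] = {
	DEMO_OVS_KEY_IPV4_FIELDS(DEMO_FIELD_INFO)
};

#define DEMO_NUM_FIELDS \
	(sizeof(demo_ovs_key_ipv4_fields) / sizeof(demo_ovs_key_ipv4_fields[0]))
```

Adding a field to the list updates both the structure and the metadata table, which is presumably what keeping the header identical between the kernel and OVS trees relies on.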

* Re: [PATCH bpf-next v5 5/8] xdp: Provide extack messages when prog attachment failed
From: Jakub Kicinski @ 2019-02-01  3:11 UTC (permalink / raw)
  To: daniel, ast, David Miller
  Cc: Maciej Fijalkowski, netdev, brouer, john.fastabend, David Ahern
In-Reply-To: <20190201001954.4130-6-maciej.fijalkowski@intel.com>

On Fri,  1 Feb 2019 01:19:51 +0100, Maciej Fijalkowski wrote:
>  		if (__dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG) ||
> -		    __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG_HW))
> +		    __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG_HW)) {
> +			NL_SET_ERR_MSG(extack, "native and generic XDP can't be active at the same time");
>  			return -EEXIST;
> +		}

This reminds me, since we allowed native/driver and offloaded XDP
programs to coexist in a25717d2b604 ("xdp: support simultaneous 
driver and hw XDP attachment") I got an internal feature request 
to also allow generic and native mode.  Would anyone object to that?

Apart from a touch up to test_offload.py I don't think anything 
would care.  netlink can already carry multiple IDs, iproute2
understands it, too..

(Obviously as a follow up after this set gets merged.)

* Re: [PATCH net-next] net: hns3: Fix potential NULL dereference on allocation error
From: YueHaibing @ 2019-02-01  3:15 UTC (permalink / raw)
  To: davem, yisen.zhuang, salil.mehta, lipeng321; +Cc: linux-kernel, netdev
In-Reply-To: <20190125031333.17196-1-yuehaibing@huawei.com>

ping ...

On 2019/1/25 11:13, YueHaibing wrote:
> hclge_mac_update_stats_complete doesn't check for NULL
> returns of kcalloc, it may result in an Oops.
> 
> Fixes: d174ea75c96a ("net: hns3: add statistics for PFC frames and MAC control frames")
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> ---
>  drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> index 64b1589..7971606 100644
> --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> @@ -343,6 +343,9 @@ static int hclge_mac_update_stats_complete(struct hclge_dev *hdev, u32 desc_num)
>  	int ret;
>  
>  	desc = kcalloc(desc_num, sizeof(struct hclge_desc), GFP_KERNEL);
> +	if (!desc)
> +		return -ENOMEM;
> +
>  	hclge_cmd_setup_basic_desc(&desc[0], HCLGE_OPC_STATS_MAC_ALL, true);
>  	ret = hclge_cmd_send(&hdev->hw, desc, desc_num);
>  	if (ret) {
> 


* Re: [PATCH net-next 06/10] net: introduce a net_device_ops macsec helper
From: Florian Fainelli @ 2019-02-01  3:50 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: davem, sd, andrew, hkallweit1, netdev, linux-kernel,
	thomas.petazzoni, alexandre.belloni, quentin.schulz,
	allan.nielsen
In-Reply-To: <20190124092349.GE3662@kwain>



On 1/24/19 1:23 AM, Antoine Tenart wrote:
> Hi Florian,
> 
> On Wed, Jan 23, 2019 at 12:16:08PM -0800, Florian Fainelli wrote:
>> On 1/23/19 7:56 AM, Antoine Tenart wrote:
>>> This patch introduces a net_device_ops MACsec helper to allow net device
>>> drivers to implement a MACsec offloading solution.
>>>
>>> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
>>> ---
>>>  include/linux/netdevice.h | 8 ++++++++
>>>  1 file changed, 8 insertions(+)
>>>
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index e675ef97a426..ee2f40dca515 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -53,6 +53,10 @@
>>>  #include <uapi/linux/pkt_cls.h>
>>>  #include <linux/hashtable.h>
>>>  
>>> +#ifdef CONFIG_MACSEC
>>> +#include <net/macsec.h>
>>> +#endif
>>
>> You can provide a forward declaration for struct netdev_macsec and not
>> have to include that header file.
> 
> OK.
> 
>>> +
>>>  struct netpoll_info;
>>>  struct device;
>>>  struct phy_device;
>>> @@ -1441,6 +1445,10 @@ struct net_device_ops {
>>>  						u32 flags);
>>>  	int			(*ndo_xsk_async_xmit)(struct net_device *dev,
>>>  						      u32 queue_id);
>>> +#ifdef CONFIG_MACSEC
>>> +	int			(*ndo_macsec)(struct net_device *dev,
>>> +					      struct netdev_macsec *macsec);
>>
>> You would really want to define an API which is more oriented towards
>> configuring/deconfiguring a MACsec association here, e.g.: similar to
>> what the IPsec offload ndos offer.
> 
> This means mostly moving from a single function using a command field to
> multiple specialized functions to add/remove each element of MACsec
> configuration.
> 
> I don't have strong opinion on the single helper vs a structure
> containing pointers to specialized ones, but out of curiosity what's the
> benefit of such a move? Future additions and maintainability?

Having multiple operations typically allows for better granularity when
you have HW that may not be capable of offloading an entire protocol;
that way you can easily implement fallbacks within the core of that
protocol handling in Linux.
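To make the granularity argument concrete: with one callback per configuration step, the core can test each pointer and fall back to software for whatever the hardware does not implement, rather than decoding a command field inside a single ndo_macsec(). The sketch below is hypothetical throughout; the ops structure, its members and the fallback helper are illustrative names, not the API defined by this series or by the IPsec offload code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical context passed to every offload callback. */
struct demo_macsec_ctx {
	unsigned long long sci; /* secure channel identifier */
	unsigned char an;       /* association number, 0..3 */
};

/* One callback per configuration step; NULL means "not offloaded". */
struct demo_macsec_ops {
	int (*add_rxsc)(struct demo_macsec_ctx *ctx);
	int (*add_rxsa)(struct demo_macsec_ctx *ctx);
	int (*add_txsa)(struct demo_macsec_ctx *ctx);
};

/* Stand-in for the existing software MACsec path. */
static int demo_sw_add_rxsa(struct demo_macsec_ctx *ctx)
{
	return ctx->an < 4 ? 0 : -EINVAL;
}

/* Core helper: try the hardware first and fall back to software when the
 * driver does not implement this step. Sets *offloaded to true only if
 * the hardware accepted the SA. */
static int demo_macsec_add_rxsa(const struct demo_macsec_ops *ops,
				struct demo_macsec_ctx *ctx, bool *offloaded)
{
	*offloaded = false;
	if (ops && ops->add_rxsa) {
		int err = ops->add_rxsa(ctx);

		if (err != -EOPNOTSUPP) {
			*offloaded = (err == 0);
			return err;
		}
	}
	return demo_sw_add_rxsa(ctx);
}

static int demo_hw_add_rxsc(struct demo_macsec_ctx *ctx)
{
	(void)ctx;
	return 0;
}

static int demo_hw_add_rxsa(struct demo_macsec_ctx *ctx)
{
	(void)ctx;
	return 0;
}

/* Driver that offloads secure channels but not secure associations. */
static const struct demo_macsec_ops demo_partial_hw_ops = {
	.add_rxsc = demo_hw_add_rxsc,
};

/* Driver that offloads both. */
static const struct demo_macsec_ops demo_full_hw_ops = {
	.add_rxsc = demo_hw_add_rxsc,
	.add_rxsa = demo_hw_add_rxsa,
};
```

With a single command-based callback the core cannot tell in advance which steps a driver handles; with separate pointers the partial driver above transparently gets the software path for SAs.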

Maybe if you just rename this netdev_macsec_context that will make it
clearer what this does.

> 
>> It is not clear to me whether after your patch series we still need to
>> create a macsec virtual device, and that gets offloaded onto its real
>> device/PHY device, or if we don't need that all?
> 
> After this series, we will still need the virtual MACsec interface. When
> using hardware offloading this interface isn't doing much, but it's the
> interface used to configure all the MACsec connections.

By not doing much, you mean its data path is basically unused? That
would be quite a deviation from any other type of offload that Linux has
AFAICT, for instance on VLAN devices you still have some amount of data
on the VLAN net_device, etc.

> 
> This is because, and that's specific to MACsec (vs IPsec), a software
> implementation is already supported and it's using a virtual interface
> to perform all the MACsec related operations (vs hooks in the Rx/Tx
> paths). I really wanted to avoid having two interfaces and ways of
> configuring MACsec depending on if the offloading is used.

The virtual network device makes sense when there is some special
treatment (encap/decap, encryption/decryption) that must happen before
sending a frame/PDU onto the wire. It's the same thing here AFAICT, but
since the HW supports doing it in the PHY directly, it's a tough one.

> 
> This should also allow in the future to disable at run-time the
> offloading on a given interface, and to still have MACsec working in
> software (or the opposite, with extra work). For this to work, the
> virtual interface still has to provide Rx and Tx functions so that
> programs can bind onto the same interface, regardless of if the
> offloading is enabled.

It would really be good to hear from Sabrina since she authored support
for MACsec to begin with.
-- 
Florian

* Re: [PATCH net-next v2 7/7] ethtool: add compat for devlink info
From: kbuild test robot @ 2019-02-01  4:06 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: kbuild-all, davem, netdev, oss-drivers, jiri, andrew, f.fainelli,
	mkubecek, eugenem, jonathan.lemon, Jakub Kicinski
In-Reply-To: <20190130190513.25718-8-jakub.kicinski@netronome.com>

[-- Attachment #1: Type: text/plain, Size: 1080 bytes --]

Hi Jakub,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Jakub-Kicinski/devlink-add-device-driver-information-API/20190131-222221
config: m68k-sun3_defconfig (attached as .config)
compiler: m68k-linux-gnu-gcc (Debian 8.2.0-11) 8.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=8.2.0 make.cross ARCH=m68k 

All errors (new ones prefixed by >>):

   m68k-linux-gnu-ld: drivers/rtc/proc.o: in function `is_rtc_hctosys.isra.0':
   proc.c:(.text+0x178): undefined reference to `strcmp'
   m68k-linux-gnu-ld: net/core/ethtool.o: in function `ethtool_get_drvinfo':
>> ethtool.c:(.text+0xc08): undefined reference to `devlink_compat_running_versions'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 12123 bytes --]

* Re: [PATCH net-next] mdio_bus: Fix PTR_ERR() usage after initialization to constant
From: Al Viro @ 2019-02-01  4:24 UTC (permalink / raw)
  To: YueHaibing
  Cc: Andrew Lunn, davem, f.fainelli, hkallweit1, linux-kernel, netdev
In-Reply-To: <eeb532d5-45a7-648f-f835-82448465762a@huawei.com>

On Tue, Jan 29, 2019 at 11:30:27AM +0800, YueHaibing wrote:
> >>  		gpiod = fwnode_get_named_gpiod(&mdiodev->dev.of_node->fwnode,
> >>  					       "reset-gpios", 0, GPIOD_OUT_LOW,
> >>  					       "PHY reset");
> >> -	if (PTR_ERR(gpiod) == -ENOENT ||
> >> -	    PTR_ERR(gpiod) == -ENOSYS)
> >> -		gpiod = NULL;
> >> -	else if (IS_ERR(gpiod))
> >> -		return PTR_ERR(gpiod);
> >> +	if (IS_ERR(gpiod)) {
> >> +		ret = PTR_ERR(gpiod);
> >> +		if (ret == -ENOENT || ret == -ENOSYS)
> >> +			gpiod = NULL;
> >> +		else
> >> +			return ret;
> >> +	}

Rule of thumb: PTR_ERR(p) == -E... is almost always better off
as p == ERR_PTR(-E...)
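The rule works because ERR_PTR() encodes a small negative errno as a pointer value in the top page of the address space, so a specific error can be tested with a plain pointer comparison instead of decoding with PTR_ERR() first. The sketch below re-creates simplified userspace versions of the three helpers (the kernel's own definitions in include/linux/err.h carry extra annotations) and rewrites the quoted gpiod check in that style. demo_get_reset_gpio() is a hypothetical stand-in for fwnode_get_named_gpiod(), and -EIO stands in for any other error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace copies of the helpers in include/linux/err.h. */
#define DEMO_MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers occupy the last DEMO_MAX_ERRNO addresses. */
	return (unsigned long)ptr >= (unsigned long)-DEMO_MAX_ERRNO;
}

struct demo_gpio_desc {
	int line;
};

/* Hypothetical lookup: returns a descriptor or ERR_PTR(-errno). */
static struct demo_gpio_desc *demo_get_reset_gpio(int mode)
{
	static struct demo_gpio_desc desc = { .line = 3 };

	switch (mode) {
	case 0:
		return &desc;
	case 1:
		return ERR_PTR(-ENOENT); /* no "reset-gpios" property */
	case 2:
		return ERR_PTR(-ENOSYS); /* GPIO support not built in */
	default:
		return ERR_PTR(-EIO);
	}
}

/* The quoted check, written as pointer comparisons:
 * "no reset GPIO" is not an error, anything else is propagated. */
static int demo_setup_reset(int mode, struct demo_gpio_desc **out)
{
	struct demo_gpio_desc *gpiod = demo_get_reset_gpio(mode);

	if (gpiod == ERR_PTR(-ENOENT) || gpiod == ERR_PTR(-ENOSYS))
		gpiod = NULL;
	else if (IS_ERR(gpiod))
		return (int)PTR_ERR(gpiod);

	*out = gpiod;
	return 0;
}
```

The comparisons never call PTR_ERR() on a valid pointer, which sidesteps the class of bug the original patch was fixing.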

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
From: David Chang @ 2019-02-01  4:27 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Peter Ceiley, Realtek linux nic maintainers, netdev,
	Martti Laaksonen
In-Reply-To: <1fd93860-47a9-268c-318f-03d5d70e721b@gmail.com>

On Jan 31, 2019 at 19:28:20 +0100, Heiner Kallweit wrote:
> Thanks for testing, Peter!
> So we have an ASPM-related issue indeed. I'm aware that there are certain
> incompatibilities between board chipsets and network chip versions
> (although it's not known which combinations are affected).
> And we don't know whether it's a hardware or BIOS issue.
> 
> Older driver versions dealt with this by simply disabling ASPM in general.
> As a result all systems with a supported Realtek chip didn't reach higher
> package power-saving states, resulting in significantly reduced battery
> lifetime on notebooks.
> The network driver has no stake in dealing with the ASPM policies, this
> is handled by lower PCI layers.
> 
> Unfortunately we can't detect ASPM incompatibilities at runtime. Maybe
> we could build some heuristics based on rx_missed percentage, but it's
> not clear that ASPM issues always show the same symptoms.
> 
> So for now people with affected systems have to set a proper
> pcie_aspm.policy parameter.
> Just what is not clear to me is why pcie_aspm=off doesn't help.
> 
> @David:
> I assume you'll check with the affected user to test the ASPM policy
> parameter.

Unfortunately, we did not see any performance improvement when
using both kernel parameters.
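When comparing results like these, it can help to confirm which ASPM policy is actually in effect. The policy is exposed in /sys/module/pcie_aspm/parameters/policy as a list of policies with the active one in brackets (for example "[default] performance powersave powersupersave"). The small C helper below extracts the bracketed entry; the function names are hypothetical, and the file format is our assumption, so check it against the running kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copy the bracketed (active) policy name out of a line such as
 * "[default] performance powersave powersupersave".
 * Returns 0 on success, -1 if no non-empty bracketed entry is found
 * or it does not fit in `out`. */
static int demo_active_aspm_policy(const char *line, char *out, size_t outlen)
{
	const char *open = strchr(line, '[');
	const char *close;
	size_t n;

	if (!open)
		return -1;
	close = strchr(open + 1, ']');
	if (!close)
		return -1;
	n = (size_t)(close - open - 1);
	if (n == 0 || n >= outlen)
		return -1;
	memcpy(out, open + 1, n);
	out[n] = '\0';
	return 0;
}

/* Read the first line of the policy file and parse it. */
static int demo_read_aspm_policy(const char *path, char *out, size_t outlen)
{
	char line[128];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return demo_active_aspm_policy(line, out, outlen);
}
```

Calling demo_read_aspm_policy("/sys/module/pcie_aspm/parameters/policy", buf, sizeof(buf)) after boot would show whether a pcie_aspm.policy= parameter actually took effect.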

@Peter, thanks for the information.

regards,
David
> 
> Heiner
> 
> 
> On 31.01.2019 13:09, Peter Ceiley wrote:
> > Hi Heiner,
> > 
> > A quick update on my testing with different pcie_aspm settings:
> > 
> > pcie_aspm=off | no change
> > pcie_aspm.policy=default | no change
> > pcie_aspm.policy=performance | issue resolved
> > pcie_aspm.policy=powersave | issue resolved
> > pcie_aspm.policy=powersupersave | issue resolved
> > 
> > It seems the new driver does not play nicely with the default ASPM policy.
> > 
> > As requested, I've included an output of ethtool below when experiencing
> > the issue - note that no errors are recorded.
> > 
> > # ethtool -S enp3s0
> > NIC statistics:
> >      tx_packets: 2749
> >      rx_packets: 4089
> >      tx_errors: 0
> >      rx_errors: 0
> >      rx_missed: 0
> >      align_errors: 0
> >      tx_single_collisions: 0
> >      tx_multi_collisions: 0
> >      unicast: 4078
> >      broadcast: 9
> >      multicast: 2
> >      tx_aborted: 0
> >      tx_underrun: 0
> > 
> > David, I hope this helps for your user as well. I appreciate you sharing
> > the bug ticket - thanks.
> > 
> > Heiner, thanks very much for your help to date.
> > 
> > Regards,
> > 
> > Peter.
> > 
> > On Thu, 31 Jan 2019 at 18:23, David Chang <dchang@suse.com> wrote:
> >>
> >> Hi Heiner,
> >>
> >> On Jan 31, 2019 at 07:35:30 +0100, Heiner Kallweit wrote:
> >>> Hi David, two more things:
> >>>
> >>> 1. Could you please test a recent linux-next kernel?
> >>> 2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
> >>>    and compare them.
> >>
> >> I'm sorry that I do not have the issue machine handy. I would ask
> >> our user to do the test. Thanks!
> >>
> >> Regards,
> >> David
> >>
> >>>
> >>> Heiner
> >>>
> >>>
> >>> On 31.01.2019 07:21, Heiner Kallweit wrote:
> >>>> David, thanks for the link to the bug ticket.
> >>>> I think only a proper bisect can help to find the offending commit.
> >>>>
> >>>> Heiner
> >>>>
> >>>>
> >>>> On 31.01.2019 03:32, David Chang wrote:
> >>>>> Hi,
> >>>>>
> >>>>> We had a similar case here.
> >>>>> - Realtek r8169 receive performance regression in kernel 4.19
> >>>>>   https://bugzilla.suse.com/show_bug.cgi?id=1119649
> >>>>>
> >>>>> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> >>>>> The major symptom is there are many rx_missed count.
> >>>>>
> >>>>>
> >>>>> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
> >>>>>> Hi Peter,
> >>>>>>
> >>>>>> recently I had somebody where pcie_aspm=off for whatever reason didn't
> >>>>>> do the trick, can you also check with pcie_aspm.policy=performance.
> >>>>>
> >>>>> We will give it a try later.
> >>>>>
> >>>>>> And please check with "ethtool -S <if>" whether the chip statistics
> >>>>>> show a significant number of errors.
> >>>>>>
> >>>>>> If this doesn't help you may have to bisect to find the offending commit.
> >>>>>
> >>>>> We tried reverting the driver to each of the following earlier
> >>>>> commits, but with no luck.
> >>>>>
> >>>>> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> >>>>> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> >>>>> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> >>>>> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> >>>>> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
> >>>>>
> >>>>> Thanks,
> >>>>> David Chang
> >>>>>
> >>>>>>
> >>>>>> Heiner
> >>>>>>
> >>>>>>
> >>>>>> On 30.01.2019 10:59, Peter Ceiley wrote:
> >>>>>>> Hi Heiner,
> >>>>>>>
> >>>>>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> >>>>>>> and this made no difference.
> >>>>>>>
> >>>>>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> >>>>>>> subsequently loaded the module in the running 4.19.18 kernel. I can
> >>>>>>> confirm that this immediately resolved the issue and access to the NFS
> >>>>>>> shares operated as expected.
> >>>>>>>
> >>>>>>> I presume this means it is an issue with the r8169 driver included in
> >>>>>>> 4.19 onwards?
> >>>>>>>
> >>>>>>> To answer your last questions:
> >>>>>>>
> >>>>>>> Base Board Information
> >>>>>>>     Manufacturer: Alienware
> >>>>>>>     Product Name: 0PGRP5
> >>>>>>>     Version: A02
> >>>>>>>
> >>>>>>> ... and yes, the RTL8168 is the onboard network chip.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>>
> >>>>>>> Peter.
> >>>>>>>
> >>>>>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Hi Peter,
> >>>>>>>>
> >>>>>>>> I think the vendor driver doesn't enable ASPM per default.
> >>>>>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
> >>>>>>>> A few older systems seem to have issues with ASPM, what kind of
> >>>>>>>> system / mainboard are you using? The RTL8168 is the onboard
> >>>>>>>> network chip?
> >>>>>>>>
> >>>>>>>> Rgds, Heiner
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
> >>>>>>>>> Hi Heiner,
> >>>>>>>>>
> >>>>>>>>> Thanks, I'll do some more testing. It might not be the driver - I
> >>>>>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
> >>>>>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> >>>>>>>>> a good idea.
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>>
> >>>>>>>>> Peter.
> >>>>>>>>>
> >>>>>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi Peter,
> >>>>>>>>>>
> >>>>>>>>>> at a first glance it doesn't look like a typical driver issue.
> >>>>>>>>>> What you could do:
> >>>>>>>>>>
> >>>>>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
> >>>>>>>>>>
> >>>>>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> >>>>>>>>>>
> >>>>>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
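
The bisect suggestion can be illustrated with a small Python sketch of the binary search that `git bisect` automates (`git bisect start <bad> <good>`, then `git bisect good` or `git bisect bad` after testing each proposed commit). The sketch assumes a linear, oldest-to-newest list of commits in which the first is known good and the last is known bad. Real kernel history contains merges; git handles those, but this sketch does not. `is_bad` is a hypothetical stand-in for building and testing a single commit.

```python
from typing import Callable, List, Tuple


def first_bad(commits: List[str], is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Binary-search an oldest-to-newest commit list for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    Returns the first bad commit and how many commits had to be tested.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tests


if __name__ == "__main__":
    history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
    # Pretend the regression was introduced in c4 (hypothetical data).
    print(first_bad(history, lambda c: int(c[1:]) >= 4))  # ('c4', 3)
```

Each test halves the remaining range, so n candidate commits need roughly log2(n) builds. That is why a bisect usually converges faster than trying individual commits one at a time.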
> >>>>>>>>>>
> >>>>>>>>>> Any specific reason why you think the root cause is in the driver and not
> >>>>>>>>>> elsewhere in the network subsystem?
> >>>>>>>>>>
> >>>>>>>>>> Heiner
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
> >>>>>>>>>>> Hi Heiner,
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks for getting back to me.
> >>>>>>>>>>>
> >>>>>>>>>>> No, I don't use jumbo packets.
> >>>>>>>>>>>
> >>>>>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
> >>>>>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
> >>>>>>>>>>> establishing a connection and is most notable, for example, on my
> >>>>>>>>>>> mounted NFS shares where it takes seconds (up to tens of seconds on
> >>>>>>>>>>> larger directories) to list the contents of each directory. Once a
> >>>>>>>>>>> transfer begins on a file, I appear to get good bandwidth.
> >>>>>>>>>>>
> >>>>>>>>>>> I'm unsure of the best scientific data to provide you in order to
> >>>>>>>>>>> troubleshoot this issue. Running the following
> >>>>>>>>>>>
> >>>>>>>>>>>     netstat -s |grep retransmitted
> >>>>>>>>>>>
> >>>>>>>>>>> shows a steady increase in retransmitted segments each time I list the
> >>>>>>>>>>> contents of a remote directory, for example, running 'ls' on a
> >>>>>>>>>>> directory containing 345 media files did the following using kernel
> >>>>>>>>>>> 4.19.18:
> >>>>>>>>>>>
> >>>>>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
> >>>>>>>>>>> the following:
> >>>>>>>>>>>     real    0m19.867s
> >>>>>>>>>>>     user    0m0.012s
> >>>>>>>>>>>     sys    0m0.036s
> >>>>>>>>>>>
> >>>>>>>>>>> The same command shows no retransmitted segments running kernel
> >>>>>>>>>>> 4.18.16 and 'time' showed:
> >>>>>>>>>>>     real    0m0.300s
> >>>>>>>>>>>     user    0m0.004s
> >>>>>>>>>>>     sys    0m0.007s
> >>>>>>>>>>>
> >>>>>>>>>>> ifconfig does not show any RX/TX errors or dropped packets in either case.
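
The retransmission measurement above can be scripted. The following is a minimal Python sketch. It assumes that the counter `netstat -s` reports as "segments retransmitted" is the `RetransSegs` field of the `Tcp:` header/value lines in /proc/net/snmp. That counter is system-wide, so any other TCP traffic during the run is counted too. The helper names are hypothetical.

```python
import subprocess
import time

SNMP_PATH = "/proc/net/snmp"


def retrans_segs(snmp_text):
    """Return TCP RetransSegs from /proc/net/snmp text (a header line, then a values line)."""
    tcp = [line.split() for line in snmp_text.splitlines() if line.startswith("Tcp:")]
    header, values = tcp[0], tcp[1]
    return int(values[header.index("RetransSegs")])


def read_retrans(path=SNMP_PATH):
    with open(path) as f:
        return retrans_segs(f.read())


def retrans_delta(command):
    """Run `command`; return (system-wide retransmitted segments, wall-clock seconds)."""
    before = read_retrans()
    start = time.monotonic()
    subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
    elapsed = time.monotonic() - start
    return read_retrans() - before, elapsed
```

For example, `retrans_delta(["ls", "/mnt/nas/media"])`, with a hypothetical mount point, gives the retransmission count and the elapsed time for one directory listing. You can run it once on each kernel and compare the results.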
> >>>>>>>>>>>
> >>>>>>>>>>> dmesg XID:
> >>>>>>>>>>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> >>>>>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
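
When you compare reports such as this RTL8168g (XID 4c000800) with the RTL8168h/8111h (XID 54100880) from the SUSE report quoted in this thread, it helps to extract the chip line from dmesg text. The following is a minimal Python sketch. Its regular expression is an assumption based only on the two r8169 probe lines quoted here. Mail-client line wraps are collapsed first.

```python
import re

# Pattern inferred only from the two r8169 probe lines quoted in this thread.
XID_LINE = re.compile(
    r"r8169 (?P<pci>\S+) (?P<ifname>[^\s:]+): (?P<chip>RTL[^,]+),.*?XID (?P<xid>[0-9a-fA-F]+)"
)


def parse_xid(probe_text):
    """Return pci/ifname/chip/xid from one r8169 probe message, or None.

    Whitespace (including mail-client line wraps) is collapsed first.
    """
    flat = " ".join(probe_text.split())
    match = XID_LINE.search(flat)
    return match.groupdict() if match else None
```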
> >>>>>>>>>>>
> >>>>>>>>>>> # lspci -vv
> >>>>>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> >>>>>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> >>>>>>>>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> >>>>>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
> >>>>>>>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> >>>>>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>>>>>>>>>>     Latency: 0, Cache Line Size: 64 bytes
> >>>>>>>>>>>     Interrupt: pin A routed to IRQ 19
> >>>>>>>>>>>     Region 0: I/O ports at d000 [size=256]
> >>>>>>>>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> >>>>>>>>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> >>>>>>>>>>>     Capabilities: [40] Power Management version 3
> >>>>>>>>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> >>>>>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> >>>>>>>>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >>>>>>>>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> >>>>>>>>>>>         Address: 0000000000000000  Data: 0000
> >>>>>>>>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
> >>>>>>>>>>>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> >>>>>>>>>>> <512ns, L1 <64us
> >>>>>>>>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> >>>>>>>>>>> SlotPowerLimit 10.000W
> >>>>>>>>>>>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
> >>>>>>>>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >>>>>>>>>>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
> >>>>>>>>>>>         DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>>>>>>>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> >>>>>>>>>>> Latency L0s unlimited, L1 <64us
> >>>>>>>>>>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> >>>>>>>>>>>         LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> >>>>>>>>>>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>>>>>>>>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
> >>>>>>>>>>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >>>>>>>>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> >>>>>>>>>>> OBFF Via message/WAKE#
> >>>>>>>>>>>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> >>>>>>>>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> >>>>>>>>>>> OBFF Disabled
> >>>>>>>>>>>              AtomicOpsCtl: ReqEn-
> >>>>>>>>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> >>>>>>>>>>>              Transmit Margin: Normal Operating Range,
> >>>>>>>>>>> EnterModifiedCompliance- ComplianceSOS-
> >>>>>>>>>>>              Compliance De-emphasis: -6dB
> >>>>>>>>>>>         LnkSta2: Current De-emphasis Level: -6dB,
> >>>>>>>>>>> EqualizationComplete-, EqualizationPhase1-
> >>>>>>>>>>>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >>>>>>>>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> >>>>>>>>>>>         Vector table: BAR=4 offset=00000000
> >>>>>>>>>>>         PBA: BAR=4 offset=00000800
> >>>>>>>>>>>     Capabilities: [d0] Vital Product Data
> >>>>>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
> >>>>>>>>>>>         Not readable
> >>>>>>>>>>>     Capabilities: [100 v1] Advanced Error Reporting
> >>>>>>>>>>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>>>>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>>>>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >>>>>>>>>>>         CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
> >>>>>>>>>>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> >>>>>>>>>>>         AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> >>>>>>>>>>> ECRCChkCap+ ECRCChkEn-
> >>>>>>>>>>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> >>>>>>>>>>>         HeaderLog: 00000000 00000000 00000000 00000000
> >>>>>>>>>>>     Capabilities: [140 v1] Virtual Channel
> >>>>>>>>>>>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
> >>>>>>>>>>>         Arb:    Fixed- WRR32- WRR64- WRR128-
> >>>>>>>>>>>         Ctrl:    ArbSelect=Fixed
> >>>>>>>>>>>         Status:    InProgress-
> >>>>>>>>>>>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> >>>>>>>>>>>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> >>>>>>>>>>>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> >>>>>>>>>>>             Status:    NegoPending- InProgress-
> >>>>>>>>>>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> >>>>>>>>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
> >>>>>>>>>>>         Max snoop latency: 71680ns
> >>>>>>>>>>>         Max no snoop latency: 71680ns
> >>>>>>>>>>>     Kernel driver in use: r8169
> >>>>>>>>>>>     Kernel modules: r8169
> >>>>>>>>>>>
> >>>>>>>>>>> Please let me know if you have any other ideas in terms of testing.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks!
> >>>>>>>>>>>
> >>>>>>>>>>> Peter.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I have been experiencing very poor network performance since Kernel
> >>>>>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
> >>>>>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
> >>>>>>>>>>>>> 4.20.4 & 4.19.18).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
> >>>>>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
> >>>>>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
> >>>>>>>>>>>>> differ in that I still have a network connection. I have attempted to
> >>>>>>>>>>>>> reload the driver on a running system, but this does not improve the
> >>>>>>>>>>>>> situation.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> lshw shows:
> >>>>>>>>>>>>>        description: Ethernet interface
> >>>>>>>>>>>>>        product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>>>>>>>        vendor: Realtek Semiconductor Co., Ltd.
> >>>>>>>>>>>>>        physical id: 0
> >>>>>>>>>>>>>        bus info: pci@0000:03:00.0
> >>>>>>>>>>>>>        logical name: enp3s0
> >>>>>>>>>>>>>        version: 0c
> >>>>>>>>>>>>>        serial:
> >>>>>>>>>>>>>        size: 1Gbit/s
> >>>>>>>>>>>>>        capacity: 1Gbit/s
> >>>>>>>>>>>>>        width: 64 bits
> >>>>>>>>>>>>>        clock: 33MHz
> >>>>>>>>>>>>>        capabilities: pm msi pciexpress msix vpd bus_master cap_list
> >>>>>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
> >>>>>>>>>>>>> 1000bt-fd autonegotiation
> >>>>>>>>>>>>>        configuration: autonegotiation=on broadcast=yes driver=r8169
> >>>>>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
> >>>>>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
> >>>>>>>>>>>>>        resources: irq:19 ioport:d000(size=256)
> >>>>>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Kind Regards,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Peter.
> >>>>>>>>>>>>>
> >>>>>>>>>>>> Hi Peter,
> >>>>>>>>>>>>
> >>>>>>>>>>>> the description "poor network performance" is quite vague, therefore:
> >>>>>>>>>>>>
> >>>>>>>>>>>> - Can you provide any measurements?
> >>>>>>>>>>>> - iperf results before and after
> >>>>>>>>>>>> - statistics about dropped packets (rx and/or tx)
> >>>>>>>>>>>> - Do you use jumbo packets?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Also helpful would be the "lspci -vv" output for the network card and
> >>>>>>>>>>>> the dmesg output line with the chip XID.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Heiner
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> > 
> 
> 

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
From: David Chang @ 2019-02-01  4:29 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Peter Ceiley, Realtek linux nic maintainers, netdev,
	Martti Laaksonen
In-Reply-To: <4d832c16-8830-b746-a818-6026c2e6725c@gmail.com>

On Jan 31, 2019 at 07:35:30 +0100, Heiner Kallweit wrote:
> Hi David, two more things:
> 
> 1. Could you please test a recent linux-next kernel?

Not tested yet. Will do if possible.

> 2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
>    and compare them.

For your information.

[with pcie_aspm=off]
--- v4.18.15	2019-02-01 12:11:56.019051828 +0800
+++ v4.9.11	2019-02-01 12:12:26.827439645 +0800
@@ -3,18 +3,19 @@
 Offset          Values
 ------          ------
 0x0000:         ec 8e b5 5a 2c f5 00 00 48 00 40 00 80 00 80 00
-0x0010:         00 10 38 0e 04 00 00 00 78 00 06 00 00 00 00 00
-0x0020:         00 f0 9b f6 03 00 00 00 00 00 00 00 00 00 00 00
+0x0010:         00 f0 ba 0d 04 00 00 00 78 00 06 00 00 00 00 00
+0x0020:         00 d0 35 f7 03 00 00 00 00 00 00 00 00 00 00 00
 0x0030:         00 00 00 00 00 00 00 0c 00 00 00 00 3f 80 00 00
-0x0040:         80 0f 10 57 0e cf 02 00 00 cf ba 34 00 00 00 00
-0x0050:         10 00 cf 18 60 11 00 01 11 11 11 00 00 00 00 00
+0x0040:         80 0f 10 57 0e cf 02 00 00 d8 c7 50 00 00 00 00
+0x0050:         10 00 cf 98 60 11 01 01 11 11 11 00 00 00 00 00
 0x0060:         00 00 00 00 3c 10 00 81 2c f0 00 80 93 00 80 f0
-0x0070:         00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 00 00
+0x0070:         00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 76 d0
 0x0080:         8b 06 01 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x0090:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x00a0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-0x00b0:         7f 04 00 00 00 00 00 00 e1 c1 05 d2 00 00 00 00
+0x00b0:         7f 04 00 00 00 00 00 00 ad 79 01 d2 00 00 00 00
 0x00c0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x00d0:         21 00 00 32 0e 00 00 00 00 00 00 40 06 11 fd 00
-0x00e0:         e1 20 51 51 00 30 94 f6 03 00 00 00 27 00 00 00
+0x00e0:         e1 20 51 51 00 e0 35 f7 03 00 00 00 27 00 00 00
 0x00f0:         3f 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00

[pcie_aspm.policy=performance]
--- v4.18.15-p	2019-02-01 12:18:46.919221060 +0800
+++ v4.9.11-p	2019-02-01 12:19:09.207474824 +0800
@@ -3,18 +3,19 @@
 Offset          Values
 ------          ------
 0x0000:         ec 8e b5 5a 2c f5 00 00 48 00 40 00 80 00 80 00
-0x0010:         00 f0 bc 0d 04 00 00 00 78 00 06 00 00 00 00 00
-0x0020:         00 60 2e f7 03 00 00 00 00 00 00 00 00 00 00 00
+0x0010:         00 c0 22 09 04 00 00 00 78 00 06 00 00 00 00 00
+0x0020:         00 f0 e5 f4 03 00 00 00 00 00 00 00 00 00 00 00
 0x0030:         00 00 00 00 00 00 00 0c 00 00 00 00 3f 80 00 00
-0x0040:         80 0f 10 57 0e cf 02 00 00 53 50 1a 00 00 00 00
-0x0050:         10 00 cf 18 60 11 00 01 11 11 11 00 00 00 00 00
+0x0040:         80 0f 10 57 0e cf 02 00 00 d2 35 7b 00 00 00 00
+0x0050:         10 00 cf 98 60 11 01 01 11 11 11 00 00 00 00 00
 0x0060:         00 00 00 00 3c 10 00 81 2c f0 00 80 93 00 80 f0
-0x0070:         00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 00 00
+0x0070:         00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 a4 a0
 0x0080:         8b 06 01 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x0090:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x00a0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-0x00b0:         7f 04 00 00 00 00 00 00 e1 c1 05 d2 00 00 00 00
+0x00b0:         7f 04 00 00 00 00 00 00 ad 79 01 d2 00 00 00 00
 0x00c0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x00d0:         21 00 00 32 0e 00 00 00 00 00 00 40 06 11 fd 00
-0x00e0:         e1 20 51 51 00 70 2e f7 03 00 00 00 27 00 00 00
+0x00e0:         e1 20 51 51 00 00 e6 f4 03 00 00 00 27 00 00 00
 0x00f0:         3f 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00
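
A register comparison like this can also be done byte by byte rather than line by line. The following is a minimal Python sketch. It assumes the plain `0xOFFSET: xx xx ...` row layout shown above, and the function names are hypothetical.

```python
def parse_dump(text):
    """Map byte offset -> value for every '0xNNNN: xx xx ...' row of an ethtool -d dump."""
    regs = {}
    for line in text.splitlines():
        head, sep, rest = line.strip().partition(":")
        if not sep or not head.startswith("0x"):
            continue  # skip 'Offset Values', '------' and other non-data lines
        base = int(head, 16)
        for i, byte in enumerate(rest.split()):
            regs[base + i] = int(byte, 16)
    return regs


def diff_dumps(old_text, new_text):
    """Return (offset, old, new) for every byte that differs between two raw dumps."""
    old, new = parse_dump(old_text), parse_dump(new_text)
    return [(off, old[off], new[off])
            for off in sorted(old.keys() & new.keys())
            if old[off] != new[off]]


def format_diff(changes):
    """Render one 'offset: old -> new' line per differing byte."""
    return "\n".join(f"0x{off:04x}: {a:02x} -> {b:02x}" for off, a, b in changes)
```

In both pairs of dumps above, some bytes change identically: 0x0053 (18 -> 98, bit 7 newly set), 0x0056 (00 -> 01, bit 0 newly set) and 0x00b8-0x00ba. Other differing bytes also vary from run to run. One possible reading assumes the register map from the r8169 driver source: Config2 at 0x53 with a clock-request enable in bit 7, and Config5 at 0x56 with an ASPM enable in bit 0. On that assumption, the newer kernel would be enabling ASPM/CLKREQ in the chip itself. This is an interpretation to verify against the driver source, not something stated in the thread.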

Thanks,
David

> Heiner
> 
> 
> On 31.01.2019 07:21, Heiner Kallweit wrote:
> > David, thanks for the link to the bug ticket.
> > I think only a proper bisect can help to find the offending commit.
> > 
> > Heiner
> > 
> > 
> > On 31.01.2019 03:32, David Chang wrote:
> >> Hi,
> >>
> >> We had a similar case here.
> >> - Realtek r8169 receive performance regression in kernel 4.19
> >>   https://bugzilla.suse.com/show_bug.cgi?id=1119649
> >>
> >> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> >> The main symptom is a high rx_missed count.
> >>
> >>
> >> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
> >>> Hi Peter,
> >>>
> >>> Recently I had a case where pcie_aspm=off for whatever reason didn't
> >>> do the trick. Can you also check with pcie_aspm.policy=performance?
> >>
> >> We will give it a try later.
> >>
> >>> And please check with "ethtool -S <if>" whether the chip statistics
> >>> show a significant number of errors.
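
The `ethtool -S <if>` check can be narrowed to the counters that matter. The following is a minimal Python sketch. It parses the `name: value` lines and keeps the non-zero counters whose names look error-related. The keyword list is an assumption, because counter names are driver-specific. It was chosen so that `rx_missed`, the counter reported in the SUSE case, is included.

```python
def parse_ethtool_stats(text):
    """Parse 'name: value' lines of `ethtool -S` output into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and value.strip().isdigit():
            stats[name] = int(value)
    return stats


# Assumed substrings; adjust to the counter names the driver actually exposes.
ERROR_HINTS = ("err", "missed", "abort", "underrun", "drop")


def error_counters(stats, hints=ERROR_HINTS):
    """Keep only non-zero counters whose name contains one of `hints`."""
    return {k: v for k, v in stats.items() if v and any(h in k for h in hints)}
```

You can take one snapshot before and one after a slow directory listing, then subtract them. This shows more clearly which counter grows along with the symptom.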
> >>>
> >>> If this doesn't help you may have to bisect to find the offending commit.
> >>
> >> We tried rolling the driver back to a few earlier commits, listed below,
> >> but with no luck.
> >>
> >> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> >> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> >> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> >> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> >> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
> >>
> >> Thanks,
> >> David Chang
> >>
> >>>
> >>> Heiner
> >>>
> >>>
> >>> On 30.01.2019 10:59, Peter Ceiley wrote:
> >>>> Hi Heiner,
> >>>>
> >>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> >>>> and this made no difference.
> >>>>
> >>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> >>>> subsequently loaded the module in the running 4.19.18 kernel. I can
> >>>> confirm that this immediately resolved the issue and access to the NFS
> >>>> shares operated as expected.
> >>>>
> >>>> I presume this means it is an issue with the r8169 driver included in
> >>>> 4.19 onwards?
> >>>>
> >>>> To answer your last questions:
> >>>>
> >>>> Base Board Information
> >>>>     Manufacturer: Alienware
> >>>>     Product Name: 0PGRP5
> >>>>     Version: A02
> >>>>
> >>>> ... and yes, the RTL8168 is the onboard network chip.
> >>>>
> >>>> Regards,
> >>>>
> >>>> Peter.
> >>>>
> >>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>
> >>>>> Hi Peter,
> >>>>>
> >>>>> I think the vendor driver doesn't enable ASPM by default.
> >>>>> So it's worth trying to disable ASPM in the BIOS or via sysfs.
> >>>>> A few older systems seem to have issues with ASPM. What kind of
> >>>>> system / mainboard are you using? Is the RTL8168 the onboard
> >>>>> network chip?
> >>>>>
> >>>>> Rgds, Heiner
> >>>>>
> >>>>>
> >>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
> >>>>>> Hi Heiner,
> >>>>>>
> >>>>>> Thanks, I'll do some more testing. It might not be the driver - I
> >>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
> >>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> >>>>>> a good idea.
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>> Peter.
> >>>>>>
> >>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi Peter,
> >>>>>>>
> >>>>>>> at a first glance it doesn't look like a typical driver issue.
> >>>>>>> What you could do:
> >>>>>>>
> >>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
> >>>>>>>
> >>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> >>>>>>>
> >>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
> >>>>>>>
> >>>>>>> Any specific reason why you think the root cause is in the driver and not
> >>>>>>> elsewhere in the network subsystem?
> >>>>>>>
> >>>>>>> Heiner
> >>>>>>>
> >>>>>>>
> >>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
> >>>>>>>> Hi Heiner,
> >>>>>>>>
> >>>>>>>> Thanks for getting back to me.
> >>>>>>>>
> >>>>>>>> No, I don't use jumbo packets.
> >>>>>>>>
> >>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
> >>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
> >>>>>>>> establishing a connection and is most notable, for example, on my
> >>>>>>>> mounted NFS shares where it takes seconds (up to tens of seconds on
> >>>>>>>> larger directories) to list the contents of each directory. Once a
> >>>>>>>> transfer begins on a file, I appear to get good bandwidth.
> >>>>>>>>
> >>>>>>>> I'm unsure of the best scientific data to provide you in order to
> >>>>>>>> troubleshoot this issue. Running the following
> >>>>>>>>
> >>>>>>>>     netstat -s |grep retransmitted
> >>>>>>>>
> >>>>>>>> shows a steady increase in retransmitted segments each time I list the
> >>>>>>>> contents of a remote directory, for example, running 'ls' on a
> >>>>>>>> directory containing 345 media files did the following using kernel
> >>>>>>>> 4.19.18:
> >>>>>>>>
> >>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
> >>>>>>>> the following:
> >>>>>>>>     real    0m19.867s
> >>>>>>>>     user    0m0.012s
> >>>>>>>>     sys    0m0.036s
> >>>>>>>>
> >>>>>>>> The same command shows no retransmitted segments running kernel
> >>>>>>>> 4.18.16 and 'time' showed:
> >>>>>>>>     real    0m0.300s
> >>>>>>>>     user    0m0.004s
> >>>>>>>>     sys    0m0.007s
> >>>>>>>>
> >>>>>>>> ifconfig does not show any RX/TX errors or dropped packets in either case.
> >>>>>>>>
> >>>>>>>> dmesg XID:
> >>>>>>>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> >>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
> >>>>>>>>
> >>>>>>>> # lspci -vv
> >>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> >>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> >>>>>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> >>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
> >>>>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> >>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>>>>>>>     Latency: 0, Cache Line Size: 64 bytes
> >>>>>>>>     Interrupt: pin A routed to IRQ 19
> >>>>>>>>     Region 0: I/O ports at d000 [size=256]
> >>>>>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> >>>>>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> >>>>>>>>     Capabilities: [40] Power Management version 3
> >>>>>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> >>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> >>>>>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >>>>>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> >>>>>>>>         Address: 0000000000000000  Data: 0000
> >>>>>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
> >>>>>>>>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> >>>>>>>> <512ns, L1 <64us
> >>>>>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> >>>>>>>> SlotPowerLimit 10.000W
> >>>>>>>>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
> >>>>>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >>>>>>>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
> >>>>>>>>         DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>>>>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> >>>>>>>> Latency L0s unlimited, L1 <64us
> >>>>>>>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> >>>>>>>>         LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> >>>>>>>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>>>>>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
> >>>>>>>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >>>>>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> >>>>>>>> OBFF Via message/WAKE#
> >>>>>>>>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> >>>>>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> >>>>>>>> OBFF Disabled
> >>>>>>>>              AtomicOpsCtl: ReqEn-
> >>>>>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> >>>>>>>>              Transmit Margin: Normal Operating Range,
> >>>>>>>> EnterModifiedCompliance- ComplianceSOS-
> >>>>>>>>              Compliance De-emphasis: -6dB
> >>>>>>>>         LnkSta2: Current De-emphasis Level: -6dB,
> >>>>>>>> EqualizationComplete-, EqualizationPhase1-
> >>>>>>>>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >>>>>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> >>>>>>>>         Vector table: BAR=4 offset=00000000
> >>>>>>>>         PBA: BAR=4 offset=00000800
> >>>>>>>>     Capabilities: [d0] Vital Product Data
> >>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
> >>>>>>>>         Not readable
> >>>>>>>>     Capabilities: [100 v1] Advanced Error Reporting
> >>>>>>>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >>>>>>>>         CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
> >>>>>>>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> >>>>>>>>         AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> >>>>>>>> ECRCChkCap+ ECRCChkEn-
> >>>>>>>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> >>>>>>>>         HeaderLog: 00000000 00000000 00000000 00000000
> >>>>>>>>     Capabilities: [140 v1] Virtual Channel
> >>>>>>>>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
> >>>>>>>>         Arb:    Fixed- WRR32- WRR64- WRR128-
> >>>>>>>>         Ctrl:    ArbSelect=Fixed
> >>>>>>>>         Status:    InProgress-
> >>>>>>>>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> >>>>>>>>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> >>>>>>>>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> >>>>>>>>             Status:    NegoPending- InProgress-
> >>>>>>>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> >>>>>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
> >>>>>>>>         Max snoop latency: 71680ns
> >>>>>>>>         Max no snoop latency: 71680ns
> >>>>>>>>     Kernel driver in use: r8169
> >>>>>>>>     Kernel modules: r8169
> >>>>>>>>
> >>>>>>>> Please let me know if you have any other ideas in terms of testing.
> >>>>>>>>
> >>>>>>>> Thanks!
> >>>>>>>>
> >>>>>>>> Peter.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> I have been experiencing very poor network performance since Kernel
> >>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
> >>>>>>>>>>
> >>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
> >>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
> >>>>>>>>>> 4.20.4 & 4.19.18).
> >>>>>>>>>>
> >>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
> >>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
> >>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
> >>>>>>>>>> differ in that I still have a network connection. I have attempted to
> >>>>>>>>>> reload the driver on a running system, but this does not improve the
> >>>>>>>>>> situation.
> >>>>>>>>>>
> >>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
> >>>>>>>>>>
> >>>>>>>>>> lshw shows:
> >>>>>>>>>>        description: Ethernet interface
> >>>>>>>>>>        product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>>>>        vendor: Realtek Semiconductor Co., Ltd.
> >>>>>>>>>>        physical id: 0
> >>>>>>>>>>        bus info: pci@0000:03:00.0
> >>>>>>>>>>        logical name: enp3s0
> >>>>>>>>>>        version: 0c
> >>>>>>>>>>        serial:
> >>>>>>>>>>        size: 1Gbit/s
> >>>>>>>>>>        capacity: 1Gbit/s
> >>>>>>>>>>        width: 64 bits
> >>>>>>>>>>        clock: 33MHz
> >>>>>>>>>>        capabilities: pm msi pciexpress msix vpd bus_master cap_list
> >>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
> >>>>>>>>>> 1000bt-fd autonegotiation
> >>>>>>>>>>        configuration: autonegotiation=on broadcast=yes driver=r8169
> >>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
> >>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
> >>>>>>>>>>        resources: irq:19 ioport:d000(size=256)
> >>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
> >>>>>>>>>>
> >>>>>>>>>> Kind Regards,
> >>>>>>>>>>
> >>>>>>>>>> Peter.
> >>>>>>>>>>
> >>>>>>>>> Hi Peter,
> >>>>>>>>>
> >>>>>>>>> the description "poor network performance" is quite vague, therefore:
> >>>>>>>>>
> >>>>>>>>> - Can you provide any measurements?
> >>>>>>>>> - iperf results before and after
> >>>>>>>>> - statistics about dropped packets (rx and/or tx)
> >>>>>>>>> - Do you use jumbo packets?
> >>>>>>>>>
> >>>>>>>>> Also helpful would be the "lspci -vv" output for the network card and
> >>>>>>>>> the dmesg output line with the chip XID.
> >>>>>>>>>
> >>>>>>>>> Heiner
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> > 
> 
> 

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox