Netdev List

* [PATCH 19/29] samples/bpf: Make samples more libbpf-centric
From: Arnaldo Carvalho de Melo @ 2016-12-20 17:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Joe Stringer, Alexei Starovoitov, Daniel Borkmann,
	Wang Nan, netdev, Arnaldo Carvalho de Melo
In-Reply-To: <20161220170358.4350-1-acme@kernel.org>

From: Joe Stringer <joe@ovn.org>

Switch all of the sample code to use the function names from
tools/lib/bpf so that the two are consistent, and make the samples
declare their own log buffers. This allows the next commit to be purely
devoted to getting rid of the duplicate library in samples/bpf.
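
As a rough illustration of the "declare their own log buffers" part, the
sketch below shows the caller-owned buffer pattern in isolation. It is not
code from this patch: fake_load_program() and load_or_report() are
hypothetical names, the stand-in takes only the two new trailing
parameters (log_buf, log_buf_sz), it never calls bpf(2), and the log text
it writes is made up. Only BPF_LOG_BUF_SIZE and bpf_log_buf mirror names
used in the diff.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Same value the patch gives BPF_LOG_BUF_SIZE in samples/bpf/libbpf.h. */
#define BPF_LOG_BUF_SIZE (256 * 1024)

/* After this patch the buffer is defined by the sample (or bpf_load.c). */
char bpf_log_buf[BPF_LOG_BUF_SIZE];

/*
 * Hypothetical stand-in for bpf_load_program(). It does not touch the
 * kernel: it clears the caller's buffer, as the real function does,
 * writes a made-up log message, and reports failure.
 */
static int fake_load_program(char *log_buf, size_t log_buf_sz)
{
	log_buf[0] = 0;
	snprintf(log_buf, log_buf_sz, "simulated verifier log\n");
	return -1;
}

/* Caller side: hand over buffer and size, print the log on failure. */
static int load_or_report(void)
{
	int fd = fake_load_program(bpf_log_buf, BPF_LOG_BUF_SIZE);

	if (fd < 0)
		printf("load failed\n%s", bpf_log_buf);
	return fd;
}
```

A sample's main() would call load_or_report() where it previously relied
on the library-owned buffer. Passing the size explicitly means the callee
no longer needs to know how large the caller's buffer is.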

Committer notes:

Testing it:

In a Fedora Rawhide container, with clang/llvm 3.9, sharing the host
Linux kernel git tree:

  # make O=/tmp/build/linux/ headers_install
  # make O=/tmp/build/linux -C samples/bpf/

Since I forgot to make the container privileged, I tested the binaries
it generated outside the container:

  # uname -a
  Linux jouet 4.9.0-rc8+ #1 SMP Mon Dec 12 11:20:49 BRT 2016 x86_64 x86_64 x86_64 GNU/Linux
  # cd /var/lib/docker/devicemapper/mnt/c43e09a53ff56c86a07baf79847f00e2cc2a17a1e2220e1adbf8cbc62734feda/rootfs/tmp/build/linux/samples/bpf/
  # ls -la offwaketime
  -rwxr-xr-x. 1 root root 24200 Dec 15 12:19 offwaketime
  # file offwaketime
  offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=c940d3f127d5e66cdd680e42d885cb0b64f8a0e4, not stripped
  # readelf -SW offwaketime_kern.o  | grep PROGBITS
  [ 2] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  4
  [ 3] kprobe/try_to_wake_up PROGBITS        0000000000000000 000040 0000d8 00  AX  0   0  8
  [ 5] tracepoint/sched/sched_switch PROGBITS        0000000000000000 000118 000318 00  AX  0   0  8
  [ 7] maps              PROGBITS        0000000000000000 000430 000050 00  WA  0   0  4
  [ 8] license           PROGBITS        0000000000000000 000480 000004 00  WA  0   0  1
  [ 9] version           PROGBITS        0000000000000000 000484 000004 00  WA  0   0  4
  # ./offwaketime | head -5
  swapper/1;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 106
  CPU 0/KVM;entry_SYSCALL_64_fastpath;sys_ioctl;do_vfs_ioctl;kvm_vcpu_ioctl;kvm_arch_vcpu_ioctl_run;kvm_vcpu_block;schedule;__schedule;-;try_to_wake_up;swake_up_locked;swake_up;apic_timer_expired;apic_timer_fn;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary;;swapper/3 2
  Compositor;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;futex_requeue;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;SoftwareVsyncTh 5
  firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 13
  JS Helper;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;firefox 2
  #

Signed-off-by: Joe Stringer <joe@ovn.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20161214224342.12858-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 samples/bpf/bpf_load.c                            | 17 +++++++++---
 samples/bpf/bpf_load.h                            |  3 +++
 samples/bpf/fds_example.c                         |  9 ++++---
 samples/bpf/lathist_user.c                        |  2 +-
 samples/bpf/libbpf.c                              | 23 ++++++++--------
 samples/bpf/libbpf.h                              | 18 ++++++-------
 samples/bpf/lwt_len_hist_user.c                   |  6 +++--
 samples/bpf/offwaketime_user.c                    |  8 +++---
 samples/bpf/sampleip_user.c                       |  4 +--
 samples/bpf/sock_example.c                        | 12 +++++----
 samples/bpf/sockex1_user.c                        |  6 ++---
 samples/bpf/sockex2_user.c                        |  4 +--
 samples/bpf/sockex3_user.c                        |  4 +--
 samples/bpf/spintest_user.c                       |  8 +++---
 samples/bpf/tc_l2_redirect_user.c                 |  4 +--
 samples/bpf/test_cgrp2_array_pin.c                |  4 +--
 samples/bpf/test_cgrp2_attach.c                   | 11 +++++---
 samples/bpf/test_cgrp2_attach2.c                  |  7 +++--
 samples/bpf/test_cgrp2_sock.c                     |  6 +++--
 samples/bpf/test_current_task_under_cgroup_user.c |  8 +++---
 samples/bpf/test_lru_dist.c                       | 32 +++++++++++------------
 samples/bpf/test_probe_write_user_user.c          |  2 +-
 samples/bpf/trace_event_user.c                    | 14 +++++-----
 samples/bpf/trace_output_user.c                   |  2 +-
 samples/bpf/tracex2_user.c                        | 10 +++----
 samples/bpf/tracex3_user.c                        |  4 +--
 samples/bpf/tracex4_user.c                        |  4 +--
 samples/bpf/tracex6_user.c                        |  2 +-
 samples/bpf/xdp1_user.c                           |  2 +-
 samples/bpf/xdp_tx_iptunnel_user.c                |  6 ++---
 30 files changed, 133 insertions(+), 109 deletions(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index e30b6de94f2e..f5b186c46b7c 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -22,7 +22,6 @@
 #include <poll.h>
 #include <ctype.h>
 #include "libbpf.h"
-#include "bpf_helpers.h"
 #include "bpf_load.h"
 
 #define DEBUGFS "/sys/kernel/debug/tracing/"
@@ -30,17 +29,26 @@
 static char license[128];
 static int kern_version;
 static bool processed_sec[128];
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
 int map_fd[MAX_MAPS];
 int prog_fd[MAX_PROGS];
 int event_fd[MAX_PROGS];
 int prog_cnt;
 int prog_array_fd = -1;
 
+struct bpf_map_def {
+	unsigned int type;
+	unsigned int key_size;
+	unsigned int value_size;
+	unsigned int max_entries;
+	unsigned int map_flags;
+};
+
 static int populate_prog_array(const char *event, int prog_fd)
 {
 	int ind = atoi(event), err;
 
-	err = bpf_update_elem(prog_array_fd, &ind, &prog_fd, BPF_ANY);
+	err = bpf_map_update_elem(prog_array_fd, &ind, &prog_fd, BPF_ANY);
 	if (err < 0) {
 		printf("failed to store prog_fd in prog_array\n");
 		return -1;
@@ -87,9 +95,10 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 		return -1;
 	}
 
-	fd = bpf_prog_load(prog_type, prog, size, license, kern_version);
+	fd = bpf_load_program(prog_type, prog, size, license, kern_version,
+			      bpf_log_buf, BPF_LOG_BUF_SIZE);
 	if (fd < 0) {
-		printf("bpf_prog_load() err=%d\n%s", errno, bpf_log_buf);
+		printf("bpf_load_program() err=%d\n%s", errno, bpf_log_buf);
 		return -1;
 	}
 
diff --git a/samples/bpf/bpf_load.h b/samples/bpf/bpf_load.h
index fb46a421ab41..c827827299b3 100644
--- a/samples/bpf/bpf_load.h
+++ b/samples/bpf/bpf_load.h
@@ -1,12 +1,15 @@
 #ifndef __BPF_LOAD_H
 #define __BPF_LOAD_H
 
+#include "libbpf.h"
+
 #define MAX_MAPS 32
 #define MAX_PROGS 32
 
 extern int map_fd[MAX_MAPS];
 extern int prog_fd[MAX_PROGS];
 extern int event_fd[MAX_PROGS];
+extern char bpf_log_buf[BPF_LOG_BUF_SIZE];
 extern int prog_cnt;
 
 /* parses elf file compiled by llvm .c->.o
diff --git a/samples/bpf/fds_example.c b/samples/bpf/fds_example.c
index 625e797be6ef..8a4fc4ef3993 100644
--- a/samples/bpf/fds_example.c
+++ b/samples/bpf/fds_example.c
@@ -58,8 +58,9 @@ static int bpf_prog_create(const char *object)
 		assert(!load_bpf_file((char *)object));
 		return prog_fd[0];
 	} else {
-		return bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER,
-				     insns, sizeof(insns), "GPL", 0);
+		return bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER,
+					insns, sizeof(insns), "GPL", 0,
+					bpf_log_buf, BPF_LOG_BUF_SIZE);
 	}
 }
 
@@ -83,12 +84,12 @@ static int bpf_do_map(const char *file, uint32_t flags, uint32_t key,
 	}
 
 	if ((flags & BPF_F_KEY_VAL) == BPF_F_KEY_VAL) {
-		ret = bpf_update_elem(fd, &key, &value, 0);
+		ret = bpf_map_update_elem(fd, &key, &value, 0);
 		printf("bpf: fd:%d u->(%u:%u) ret:(%d,%s)\n", fd, key, value,
 		       ret, strerror(errno));
 		assert(ret == 0);
 	} else if (flags & BPF_F_KEY) {
-		ret = bpf_lookup_elem(fd, &key, &value);
+		ret = bpf_map_lookup_elem(fd, &key, &value);
 		printf("bpf: fd:%d l->(%u):%u ret:(%d,%s)\n", fd, key, value,
 		       ret, strerror(errno));
 		assert(ret == 0);
diff --git a/samples/bpf/lathist_user.c b/samples/bpf/lathist_user.c
index 65da8c1576de..6477bad5b4e2 100644
--- a/samples/bpf/lathist_user.c
+++ b/samples/bpf/lathist_user.c
@@ -73,7 +73,7 @@ static void get_data(int fd)
 	for (c = 0; c < MAX_CPU; c++) {
 		for (i = 0; i < MAX_ENTRIES; i++) {
 			key = c * MAX_ENTRIES + i;
-			bpf_lookup_elem(fd, &key, &value);
+			bpf_map_lookup_elem(fd, &key, &value);
 
 			cpu_hist[c].data[i] = value;
 			if (value > cpu_hist[c].max)
diff --git a/samples/bpf/libbpf.c b/samples/bpf/libbpf.c
index 9ce707bf02a7..6f076abdca35 100644
--- a/samples/bpf/libbpf.c
+++ b/samples/bpf/libbpf.c
@@ -32,7 +32,7 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
 	return syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
 }
 
-int bpf_update_elem(int fd, void *key, void *value, unsigned long long flags)
+int bpf_map_update_elem(int fd, void *key, void *value, unsigned long long flags)
 {
 	union bpf_attr attr = {
 		.map_fd = fd,
@@ -44,7 +44,7 @@ int bpf_update_elem(int fd, void *key, void *value, unsigned long long flags)
 	return syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
 }
 
-int bpf_lookup_elem(int fd, void *key, void *value)
+int bpf_map_lookup_elem(int fd, void *key, void *value)
 {
 	union bpf_attr attr = {
 		.map_fd = fd,
@@ -55,7 +55,7 @@ int bpf_lookup_elem(int fd, void *key, void *value)
 	return syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
 }
 
-int bpf_delete_elem(int fd, void *key)
+int bpf_map_delete_elem(int fd, void *key)
 {
 	union bpf_attr attr = {
 		.map_fd = fd,
@@ -65,7 +65,7 @@ int bpf_delete_elem(int fd, void *key)
 	return syscall(__NR_bpf, BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
 }
 
-int bpf_get_next_key(int fd, void *key, void *next_key)
+int bpf_map_get_next_key(int fd, void *key, void *next_key)
 {
 	union bpf_attr attr = {
 		.map_fd = fd,
@@ -78,19 +78,18 @@ int bpf_get_next_key(int fd, void *key, void *next_key)
 
 #define ROUND_UP(x, n) (((x) + (n) - 1u) & ~((n) - 1u))
 
-char bpf_log_buf[LOG_BUF_SIZE];
-
-int bpf_prog_load(enum bpf_prog_type prog_type,
-		  const struct bpf_insn *insns, int prog_len,
-		  const char *license, int kern_version)
+int bpf_load_program(enum bpf_prog_type prog_type,
+		     const struct bpf_insn *insns, int prog_len,
+		     const char *license, int kern_version,
+		     char *log_buf, size_t log_buf_sz)
 {
 	union bpf_attr attr = {
 		.prog_type = prog_type,
 		.insns = ptr_to_u64((void *) insns),
 		.insn_cnt = prog_len / sizeof(struct bpf_insn),
 		.license = ptr_to_u64((void *) license),
-		.log_buf = ptr_to_u64(bpf_log_buf),
-		.log_size = LOG_BUF_SIZE,
+		.log_buf = ptr_to_u64(log_buf),
+		.log_size = log_buf_sz,
 		.log_level = 1,
 	};
 
@@ -99,7 +98,7 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
 	 */
 	attr.kern_version = kern_version;
 
-	bpf_log_buf[0] = 0;
+	log_buf[0] = 0;
 
 	return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
 }
diff --git a/samples/bpf/libbpf.h b/samples/bpf/libbpf.h
index 94a901d86fc2..20e3457857ca 100644
--- a/samples/bpf/libbpf.h
+++ b/samples/bpf/libbpf.h
@@ -6,14 +6,15 @@ struct bpf_insn;
 
 int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
 		   int max_entries, int map_flags);
-int bpf_update_elem(int fd, void *key, void *value, unsigned long long flags);
-int bpf_lookup_elem(int fd, void *key, void *value);
-int bpf_delete_elem(int fd, void *key);
-int bpf_get_next_key(int fd, void *key, void *next_key);
+int bpf_map_update_elem(int fd, void *key, void *value, unsigned long long flags);
+int bpf_map_lookup_elem(int fd, void *key, void *value);
+int bpf_map_delete_elem(int fd, void *key);
+int bpf_map_get_next_key(int fd, void *key, void *next_key);
 
-int bpf_prog_load(enum bpf_prog_type prog_type,
-		  const struct bpf_insn *insns, int insn_len,
-		  const char *license, int kern_version);
+int bpf_load_program(enum bpf_prog_type prog_type,
+		     const struct bpf_insn *insns, int insn_len,
+		     const char *license, int kern_version,
+		     char *log_buf, size_t log_buf_sz);
 
 int bpf_prog_attach(int prog_fd, int attachable_fd, enum bpf_attach_type type);
 int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
@@ -21,8 +22,7 @@ int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
 int bpf_obj_pin(int fd, const char *pathname);
 int bpf_obj_get(const char *pathname);
 
-#define LOG_BUF_SIZE (256 * 1024)
-extern char bpf_log_buf[LOG_BUF_SIZE];
+#define BPF_LOG_BUF_SIZE (256 * 1024)
 
 /* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
 
diff --git a/samples/bpf/lwt_len_hist_user.c b/samples/bpf/lwt_len_hist_user.c
index 05d783fc5daf..ec8f3bbcbef3 100644
--- a/samples/bpf/lwt_len_hist_user.c
+++ b/samples/bpf/lwt_len_hist_user.c
@@ -14,6 +14,8 @@
 #define MAX_INDEX 64
 #define MAX_STARS 38
 
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
 static void stars(char *str, long val, long max, int width)
 {
 	int i;
@@ -41,13 +43,13 @@ int main(int argc, char **argv)
 		return -1;
 	}
 
-	while (bpf_get_next_key(map_fd, &key, &next_key) == 0) {
+	while (bpf_map_get_next_key(map_fd, &key, &next_key) == 0) {
 		if (next_key >= MAX_INDEX) {
 			fprintf(stderr, "Key %lu out of bounds\n", next_key);
 			continue;
 		}
 
-		bpf_lookup_elem(map_fd, &next_key, values);
+		bpf_map_lookup_elem(map_fd, &next_key, values);
 
 		sum = 0;
 		for (i = 0; i < nr_cpus; i++)
diff --git a/samples/bpf/offwaketime_user.c b/samples/bpf/offwaketime_user.c
index 6f002a9c24fa..9cce2a66bd66 100644
--- a/samples/bpf/offwaketime_user.c
+++ b/samples/bpf/offwaketime_user.c
@@ -49,14 +49,14 @@ static void print_stack(struct key_t *key, __u64 count)
 	int i;
 
 	printf("%s;", key->target);
-	if (bpf_lookup_elem(map_fd[3], &key->tret, ip) != 0) {
+	if (bpf_map_lookup_elem(map_fd[3], &key->tret, ip) != 0) {
 		printf("---;");
 	} else {
 		for (i = PERF_MAX_STACK_DEPTH - 1; i >= 0; i--)
 			print_ksym(ip[i]);
 	}
 	printf("-;");
-	if (bpf_lookup_elem(map_fd[3], &key->wret, ip) != 0) {
+	if (bpf_map_lookup_elem(map_fd[3], &key->wret, ip) != 0) {
 		printf("---;");
 	} else {
 		for (i = 0; i < PERF_MAX_STACK_DEPTH; i++)
@@ -77,8 +77,8 @@ static void print_stacks(int fd)
 	struct key_t key = {}, next_key;
 	__u64 value;
 
-	while (bpf_get_next_key(fd, &key, &next_key) == 0) {
-		bpf_lookup_elem(fd, &next_key, &value);
+	while (bpf_map_get_next_key(fd, &key, &next_key) == 0) {
+		bpf_map_lookup_elem(fd, &next_key, &value);
 		print_stack(&next_key, value);
 		key = next_key;
 	}
diff --git a/samples/bpf/sampleip_user.c b/samples/bpf/sampleip_user.c
index 260a6bdd6413..5ac5adf75931 100644
--- a/samples/bpf/sampleip_user.c
+++ b/samples/bpf/sampleip_user.c
@@ -95,8 +95,8 @@ static void print_ip_map(int fd)
 
 	/* fetch IPs and counts */
 	key = 0, i = 0;
-	while (bpf_get_next_key(fd, &key, &next_key) == 0) {
-		bpf_lookup_elem(fd, &next_key, &value);
+	while (bpf_map_get_next_key(fd, &key, &next_key) == 0) {
+		bpf_map_lookup_elem(fd, &next_key, &value);
 		counts[i].ip = next_key;
 		counts[i++].count = value;
 		key = next_key;
diff --git a/samples/bpf/sock_example.c b/samples/bpf/sock_example.c
index 28b60baa9fa8..d6b91e9a38ad 100644
--- a/samples/bpf/sock_example.c
+++ b/samples/bpf/sock_example.c
@@ -28,6 +28,8 @@
 #include <stddef.h>
 #include "libbpf.h"
 
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
 static int test_sock(void)
 {
 	int sock = -1, map_fd, prog_fd, i, key;
@@ -55,8 +57,8 @@ static int test_sock(void)
 		BPF_EXIT_INSN(),
 	};
 
-	prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, prog, sizeof(prog),
-				"GPL", 0);
+	prog_fd = bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, prog, sizeof(prog),
+				   "GPL", 0, bpf_log_buf, BPF_LOG_BUF_SIZE);
 	if (prog_fd < 0) {
 		printf("failed to load prog '%s'\n", strerror(errno));
 		goto cleanup;
@@ -72,13 +74,13 @@ static int test_sock(void)
 
 	for (i = 0; i < 10; i++) {
 		key = IPPROTO_TCP;
-		assert(bpf_lookup_elem(map_fd, &key, &tcp_cnt) == 0);
+		assert(bpf_map_lookup_elem(map_fd, &key, &tcp_cnt) == 0);
 
 		key = IPPROTO_UDP;
-		assert(bpf_lookup_elem(map_fd, &key, &udp_cnt) == 0);
+		assert(bpf_map_lookup_elem(map_fd, &key, &udp_cnt) == 0);
 
 		key = IPPROTO_ICMP;
-		assert(bpf_lookup_elem(map_fd, &key, &icmp_cnt) == 0);
+		assert(bpf_map_lookup_elem(map_fd, &key, &icmp_cnt) == 0);
 
 		printf("TCP %lld UDP %lld ICMP %lld packets\n",
 		       tcp_cnt, udp_cnt, icmp_cnt);
diff --git a/samples/bpf/sockex1_user.c b/samples/bpf/sockex1_user.c
index 678ce4693551..9454448bf198 100644
--- a/samples/bpf/sockex1_user.c
+++ b/samples/bpf/sockex1_user.c
@@ -32,13 +32,13 @@ int main(int ac, char **argv)
 		int key;
 
 		key = IPPROTO_TCP;
-		assert(bpf_lookup_elem(map_fd[0], &key, &tcp_cnt) == 0);
+		assert(bpf_map_lookup_elem(map_fd[0], &key, &tcp_cnt) == 0);
 
 		key = IPPROTO_UDP;
-		assert(bpf_lookup_elem(map_fd[0], &key, &udp_cnt) == 0);
+		assert(bpf_map_lookup_elem(map_fd[0], &key, &udp_cnt) == 0);
 
 		key = IPPROTO_ICMP;
-		assert(bpf_lookup_elem(map_fd[0], &key, &icmp_cnt) == 0);
+		assert(bpf_map_lookup_elem(map_fd[0], &key, &icmp_cnt) == 0);
 
 		printf("TCP %lld UDP %lld ICMP %lld bytes\n",
 		       tcp_cnt, udp_cnt, icmp_cnt);
diff --git a/samples/bpf/sockex2_user.c b/samples/bpf/sockex2_user.c
index 8a4085c2d117..6a40600d5a83 100644
--- a/samples/bpf/sockex2_user.c
+++ b/samples/bpf/sockex2_user.c
@@ -39,8 +39,8 @@ int main(int ac, char **argv)
 		int key = 0, next_key;
 		struct pair value;
 
-		while (bpf_get_next_key(map_fd[0], &key, &next_key) == 0) {
-			bpf_lookup_elem(map_fd[0], &next_key, &value);
+		while (bpf_map_get_next_key(map_fd[0], &key, &next_key) == 0) {
+			bpf_map_lookup_elem(map_fd[0], &next_key, &value);
 			printf("ip %s bytes %lld packets %lld\n",
 			       inet_ntoa((struct in_addr){htonl(next_key)}),
 			       value.bytes, value.packets);
diff --git a/samples/bpf/sockex3_user.c b/samples/bpf/sockex3_user.c
index 3fcfd8c4b2a3..9099c4255f23 100644
--- a/samples/bpf/sockex3_user.c
+++ b/samples/bpf/sockex3_user.c
@@ -54,8 +54,8 @@ int main(int argc, char **argv)
 
 		sleep(1);
 		printf("IP     src.port -> dst.port               bytes      packets\n");
-		while (bpf_get_next_key(map_fd[2], &key, &next_key) == 0) {
-			bpf_lookup_elem(map_fd[2], &next_key, &value);
+		while (bpf_map_get_next_key(map_fd[2], &key, &next_key) == 0) {
+			bpf_map_lookup_elem(map_fd[2], &next_key, &value);
 			printf("%s.%05d -> %s.%05d %12lld %12lld\n",
 			       inet_ntoa((struct in_addr){htonl(next_key.src)}),
 			       next_key.port16[0],
diff --git a/samples/bpf/spintest_user.c b/samples/bpf/spintest_user.c
index 311ede532230..80676c25fa50 100644
--- a/samples/bpf/spintest_user.c
+++ b/samples/bpf/spintest_user.c
@@ -31,8 +31,8 @@ int main(int ac, char **argv)
 	for (i = 0; i < 5; i++) {
 		key = 0;
 		printf("kprobing funcs:");
-		while (bpf_get_next_key(map_fd[0], &key, &next_key) == 0) {
-			bpf_lookup_elem(map_fd[0], &next_key, &value);
+		while (bpf_map_get_next_key(map_fd[0], &key, &next_key) == 0) {
+			bpf_map_lookup_elem(map_fd[0], &next_key, &value);
 			assert(next_key == value);
 			sym = ksym_search(value);
 			printf(" %s", sym->name);
@@ -41,8 +41,8 @@ int main(int ac, char **argv)
 		if (key)
 			printf("\n");
 		key = 0;
-		while (bpf_get_next_key(map_fd[0], &key, &next_key) == 0)
-			bpf_delete_elem(map_fd[0], &next_key);
+		while (bpf_map_get_next_key(map_fd[0], &key, &next_key) == 0)
+			bpf_map_delete_elem(map_fd[0], &next_key);
 		sleep(1);
 	}
 
diff --git a/samples/bpf/tc_l2_redirect_user.c b/samples/bpf/tc_l2_redirect_user.c
index 4013c5337b91..28995a776560 100644
--- a/samples/bpf/tc_l2_redirect_user.c
+++ b/samples/bpf/tc_l2_redirect_user.c
@@ -60,9 +60,9 @@ int main(int argc, char **argv)
 	}
 
 	/* bpf_tunnel_key.remote_ipv4 expects host byte orders */
-	ret = bpf_update_elem(array_fd, &array_key, &ifindex, 0);
+	ret = bpf_map_update_elem(array_fd, &array_key, &ifindex, 0);
 	if (ret) {
-		perror("bpf_update_elem");
+		perror("bpf_map_update_elem");
 		goto out;
 	}
 
diff --git a/samples/bpf/test_cgrp2_array_pin.c b/samples/bpf/test_cgrp2_array_pin.c
index 70e86f7be69d..8a1b8b5d8def 100644
--- a/samples/bpf/test_cgrp2_array_pin.c
+++ b/samples/bpf/test_cgrp2_array_pin.c
@@ -85,9 +85,9 @@ int main(int argc, char **argv)
 		}
 	}
 
-	ret = bpf_update_elem(array_fd, &array_key, &cg2_fd, 0);
+	ret = bpf_map_update_elem(array_fd, &array_key, &cg2_fd, 0);
 	if (ret) {
-		perror("bpf_update_elem");
+		perror("bpf_map_update_elem");
 		goto out;
 	}
 
diff --git a/samples/bpf/test_cgrp2_attach.c b/samples/bpf/test_cgrp2_attach.c
index a19484c45b79..8283ef86d392 100644
--- a/samples/bpf/test_cgrp2_attach.c
+++ b/samples/bpf/test_cgrp2_attach.c
@@ -36,6 +36,8 @@ enum {
 	MAP_KEY_BYTES,
 };
 
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
 static int prog_load(int map_fd, int verdict)
 {
 	struct bpf_insn prog[] = {
@@ -67,8 +69,9 @@ static int prog_load(int map_fd, int verdict)
 		BPF_EXIT_INSN(),
 	};
 
-	return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SKB,
-			     prog, sizeof(prog), "GPL", 0);
+	return bpf_load_program(BPF_PROG_TYPE_CGROUP_SKB,
+				prog, sizeof(prog), "GPL", 0,
+				bpf_log_buf, BPF_LOG_BUF_SIZE);
 }
 
 static int usage(const char *argv0)
@@ -108,10 +111,10 @@ static int attach_filter(int cg_fd, int type, int verdict)
 	}
 	while (1) {
 		key = MAP_KEY_PACKETS;
-		assert(bpf_lookup_elem(map_fd, &key, &pkt_cnt) == 0);
+		assert(bpf_map_lookup_elem(map_fd, &key, &pkt_cnt) == 0);
 
 		key = MAP_KEY_BYTES;
-		assert(bpf_lookup_elem(map_fd, &key, &byte_cnt) == 0);
+		assert(bpf_map_lookup_elem(map_fd, &key, &byte_cnt) == 0);
 
 		printf("cgroup received %lld packets, %lld bytes\n",
 		       pkt_cnt, byte_cnt);
diff --git a/samples/bpf/test_cgrp2_attach2.c b/samples/bpf/test_cgrp2_attach2.c
index ddfac42ed4df..fc6092fdc3b0 100644
--- a/samples/bpf/test_cgrp2_attach2.c
+++ b/samples/bpf/test_cgrp2_attach2.c
@@ -32,6 +32,8 @@
 #define BAR		"/foo/bar/"
 #define PING_CMD	"ping -c1 -w1 127.0.0.1"
 
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
 static int prog_load(int verdict)
 {
 	int ret;
@@ -40,8 +42,9 @@ static int prog_load(int verdict)
 		BPF_EXIT_INSN(),
 	};
 
-	ret = bpf_prog_load(BPF_PROG_TYPE_CGROUP_SKB,
-			     prog, sizeof(prog), "GPL", 0);
+	ret = bpf_load_program(BPF_PROG_TYPE_CGROUP_SKB,
+			       prog, sizeof(prog), "GPL", 0,
+			       bpf_log_buf, BPF_LOG_BUF_SIZE);
 
 	if (ret < 0) {
 		log_err("Loading program");
diff --git a/samples/bpf/test_cgrp2_sock.c b/samples/bpf/test_cgrp2_sock.c
index d467b3c1c55c..43b4bde5d05c 100644
--- a/samples/bpf/test_cgrp2_sock.c
+++ b/samples/bpf/test_cgrp2_sock.c
@@ -23,6 +23,8 @@
 
 #include "libbpf.h"
 
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
 static int prog_load(int idx)
 {
 	struct bpf_insn prog[] = {
@@ -34,8 +36,8 @@ static int prog_load(int idx)
 		BPF_EXIT_INSN(),
 	};
 
-	return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
-			     "GPL", 0);
+	return bpf_load_program(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
+				"GPL", 0, bpf_log_buf, BPF_LOG_BUF_SIZE);
 }
 
 static int usage(const char *argv0)
diff --git a/samples/bpf/test_current_task_under_cgroup_user.c b/samples/bpf/test_current_task_under_cgroup_user.c
index 95aaaa846130..65b5fb51c1db 100644
--- a/samples/bpf/test_current_task_under_cgroup_user.c
+++ b/samples/bpf/test_current_task_under_cgroup_user.c
@@ -36,7 +36,7 @@ int main(int argc, char **argv)
 	if (!cg2)
 		goto err;
 
-	if (bpf_update_elem(map_fd[0], &idx, &cg2, BPF_ANY)) {
+	if (bpf_map_update_elem(map_fd[0], &idx, &cg2, BPF_ANY)) {
 		log_err("Adding target cgroup to map");
 		goto err;
 	}
@@ -50,7 +50,7 @@ int main(int argc, char **argv)
 	 */
 
 	sync();
-	bpf_lookup_elem(map_fd[1], &idx, &remote_pid);
+	bpf_map_lookup_elem(map_fd[1], &idx, &remote_pid);
 
 	if (local_pid != remote_pid) {
 		fprintf(stderr,
@@ -64,10 +64,10 @@ int main(int argc, char **argv)
 		goto err;
 
 	remote_pid = 0;
-	bpf_update_elem(map_fd[1], &idx, &remote_pid, BPF_ANY);
+	bpf_map_update_elem(map_fd[1], &idx, &remote_pid, BPF_ANY);
 
 	sync();
-	bpf_lookup_elem(map_fd[1], &idx, &remote_pid);
+	bpf_map_lookup_elem(map_fd[1], &idx, &remote_pid);
 
 	if (local_pid == remote_pid) {
 		fprintf(stderr, "BPF cgroup negative test did not work\n");
diff --git a/samples/bpf/test_lru_dist.c b/samples/bpf/test_lru_dist.c
index 316230a0ed23..d96dc88d3b04 100644
--- a/samples/bpf/test_lru_dist.c
+++ b/samples/bpf/test_lru_dist.c
@@ -134,7 +134,7 @@ static int pfect_lru_lookup_or_insert(struct pfect_lru *lru,
 	int seen = 0;
 
 	lru->total++;
-	if (!bpf_lookup_elem(lru->map_fd, &key, &node)) {
+	if (!bpf_map_lookup_elem(lru->map_fd, &key, &node)) {
 		if (node) {
 			list_move(&node->list, &lru->list);
 			return 1;
@@ -151,7 +151,7 @@ static int pfect_lru_lookup_or_insert(struct pfect_lru *lru,
 		node = list_last_entry(&lru->list,
 				       struct pfect_lru_node,
 				       list);
-		bpf_update_elem(lru->map_fd, &node->key, &null_node, BPF_EXIST);
+		bpf_map_update_elem(lru->map_fd, &node->key, &null_node, BPF_EXIST);
 	}
 
 	node->key = key;
@@ -159,10 +159,10 @@ static int pfect_lru_lookup_or_insert(struct pfect_lru *lru,
 
 	lru->nr_misses++;
 	if (seen) {
-		assert(!bpf_update_elem(lru->map_fd, &key, &node, BPF_EXIST));
+		assert(!bpf_map_update_elem(lru->map_fd, &key, &node, BPF_EXIST));
 	} else {
 		lru->nr_unique++;
-		assert(!bpf_update_elem(lru->map_fd, &key, &node, BPF_NOEXIST));
+		assert(!bpf_map_update_elem(lru->map_fd, &key, &node, BPF_NOEXIST));
 	}
 
 	return seen;
@@ -285,11 +285,11 @@ static void do_test_lru_dist(int task, void *data)
 
 		pfect_lru_lookup_or_insert(&pfect_lru, key);
 
-		if (!bpf_lookup_elem(lru_map_fd, &key, &value))
+		if (!bpf_map_lookup_elem(lru_map_fd, &key, &value))
 			continue;
 
-		if (bpf_update_elem(lru_map_fd, &key, &value, BPF_NOEXIST)) {
-			printf("bpf_update_elem(lru_map_fd, %llu): errno:%d\n",
+		if (bpf_map_update_elem(lru_map_fd, &key, &value, BPF_NOEXIST)) {
+			printf("bpf_map_update_elem(lru_map_fd, %llu): errno:%d\n",
 			       key, errno);
 			assert(0);
 		}
@@ -358,19 +358,19 @@ static void test_lru_loss0(int map_type, int map_flags)
 	for (key = 1; key <= 1000; key++) {
 		int start_key, end_key;
 
-		assert(bpf_update_elem(map_fd, &key, value, BPF_NOEXIST) == 0);
+		assert(bpf_map_update_elem(map_fd, &key, value, BPF_NOEXIST) == 0);
 
 		start_key = 101;
 		end_key = min(key, 900);
 
 		while (start_key <= end_key) {
-			bpf_lookup_elem(map_fd, &start_key, value);
+			bpf_map_lookup_elem(map_fd, &start_key, value);
 			start_key++;
 		}
 	}
 
 	for (key = 1; key <= 1000; key++) {
-		if (bpf_lookup_elem(map_fd, &key, value)) {
+		if (bpf_map_lookup_elem(map_fd, &key, value)) {
 			if (key <= 100)
 				old_unused_losses++;
 			else if (key <= 900)
@@ -408,10 +408,10 @@ static void test_lru_loss1(int map_type, int map_flags)
 	value[0] = 1234;
 
 	for (key = 1; key <= 1000; key++)
-		assert(!bpf_update_elem(map_fd, &key, value, BPF_NOEXIST));
+		assert(!bpf_map_update_elem(map_fd, &key, value, BPF_NOEXIST));
 
 	for (key = 1; key <= 1000; key++) {
-		if (bpf_lookup_elem(map_fd, &key, value))
+		if (bpf_map_lookup_elem(map_fd, &key, value))
 			nr_losses++;
 	}
 
@@ -436,7 +436,7 @@ static void do_test_parallel_lru_loss(int task, void *data)
 	next_ins_key = stable_base;
 	value[0] = 1234;
 	for (i = 0; i < nr_stable_elems; i++) {
-		assert(bpf_update_elem(map_fd, &next_ins_key, value,
+		assert(bpf_map_update_elem(map_fd, &next_ins_key, value,
 				       BPF_NOEXIST) == 0);
 		next_ins_key++;
 	}
@@ -448,9 +448,9 @@ static void do_test_parallel_lru_loss(int task, void *data)
 
 		if (rn % 10) {
 			key = rn % nr_stable_elems + stable_base;
-			bpf_lookup_elem(map_fd, &key, value);
+			bpf_map_lookup_elem(map_fd, &key, value);
 		} else {
-			bpf_update_elem(map_fd, &next_ins_key, value,
+			bpf_map_update_elem(map_fd, &next_ins_key, value,
 					BPF_NOEXIST);
 			next_ins_key++;
 		}
@@ -458,7 +458,7 @@ static void do_test_parallel_lru_loss(int task, void *data)
 
 	key = stable_base;
 	for (i = 0; i < nr_stable_elems; i++) {
-		if (bpf_lookup_elem(map_fd, &key, value))
+		if (bpf_map_lookup_elem(map_fd, &key, value))
 			nr_losses++;
 		key++;
 	}
diff --git a/samples/bpf/test_probe_write_user_user.c b/samples/bpf/test_probe_write_user_user.c
index a44bf347bedd..b5bf178a6ecc 100644
--- a/samples/bpf/test_probe_write_user_user.c
+++ b/samples/bpf/test_probe_write_user_user.c
@@ -50,7 +50,7 @@ int main(int ac, char **argv)
 	mapped_addr_in->sin_port = htons(5555);
 	mapped_addr_in->sin_addr.s_addr = inet_addr("255.255.255.255");
 
-	assert(!bpf_update_elem(map_fd[0], &mapped_addr, &serv_addr, BPF_ANY));
+	assert(!bpf_map_update_elem(map_fd[0], &mapped_addr, &serv_addr, BPF_ANY));
 
 	assert(listen(serverfd, 5) == 0);
 
diff --git a/samples/bpf/trace_event_user.c b/samples/bpf/trace_event_user.c
index 9a130d31ecf2..704fe9fa77b2 100644
--- a/samples/bpf/trace_event_user.c
+++ b/samples/bpf/trace_event_user.c
@@ -61,14 +61,14 @@ static void print_stack(struct key_t *key, __u64 count)
 	int i;
 
 	printf("%3lld %s;", count, key->comm);
-	if (bpf_lookup_elem(map_fd[1], &key->kernstack, ip) != 0) {
+	if (bpf_map_lookup_elem(map_fd[1], &key->kernstack, ip) != 0) {
 		printf("---;");
 	} else {
 		for (i = PERF_MAX_STACK_DEPTH - 1; i >= 0; i--)
 			print_ksym(ip[i]);
 	}
 	printf("-;");
-	if (bpf_lookup_elem(map_fd[1], &key->userstack, ip) != 0) {
+	if (bpf_map_lookup_elem(map_fd[1], &key->userstack, ip) != 0) {
 		printf("---;");
 	} else {
 		for (i = PERF_MAX_STACK_DEPTH - 1; i >= 0; i--)
@@ -98,10 +98,10 @@ static void print_stacks(void)
 	int fd = map_fd[0], stack_map = map_fd[1];
 
 	sys_read_seen = sys_write_seen = false;
-	while (bpf_get_next_key(fd, &key, &next_key) == 0) {
-		bpf_lookup_elem(fd, &next_key, &value);
+	while (bpf_map_get_next_key(fd, &key, &next_key) == 0) {
+		bpf_map_lookup_elem(fd, &next_key, &value);
 		print_stack(&next_key, value);
-		bpf_delete_elem(fd, &next_key);
+		bpf_map_delete_elem(fd, &next_key);
 		key = next_key;
 	}
 
@@ -111,8 +111,8 @@ static void print_stacks(void)
 	}
 
 	/* clear stack map */
-	while (bpf_get_next_key(stack_map, &stackid, &next_id) == 0) {
-		bpf_delete_elem(stack_map, &next_id);
+	while (bpf_map_get_next_key(stack_map, &stackid, &next_id) == 0) {
+		bpf_map_delete_elem(stack_map, &next_id);
 		stackid = next_id;
 	}
 }
diff --git a/samples/bpf/trace_output_user.c b/samples/bpf/trace_output_user.c
index 661a7d052f2c..3bedd945def1 100644
--- a/samples/bpf/trace_output_user.c
+++ b/samples/bpf/trace_output_user.c
@@ -162,7 +162,7 @@ static void test_bpf_perf_event(void)
 	pmu_fd = perf_event_open(&attr, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, 0);
 
 	assert(pmu_fd >= 0);
-	assert(bpf_update_elem(map_fd[0], &key, &pmu_fd, BPF_ANY) == 0);
+	assert(bpf_map_update_elem(map_fd[0], &key, &pmu_fd, BPF_ANY) == 0);
 	ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
 }
 
diff --git a/samples/bpf/tracex2_user.c b/samples/bpf/tracex2_user.c
index 3e225e331f66..ded9804c5034 100644
--- a/samples/bpf/tracex2_user.c
+++ b/samples/bpf/tracex2_user.c
@@ -48,12 +48,12 @@ static void print_hist_for_pid(int fd, void *task)
 	long max_value = 0;
 	int i, ind;
 
-	while (bpf_get_next_key(fd, &key, &next_key) == 0) {
+	while (bpf_map_get_next_key(fd, &key, &next_key) == 0) {
 		if (memcmp(&next_key, task, SIZE)) {
 			key = next_key;
 			continue;
 		}
-		bpf_lookup_elem(fd, &next_key, values);
+		bpf_map_lookup_elem(fd, &next_key, values);
 		value = 0;
 		for (i = 0; i < nr_cpus; i++)
 			value += values[i];
@@ -83,7 +83,7 @@ static void print_hist(int fd)
 	int task_cnt = 0;
 	int i;
 
-	while (bpf_get_next_key(fd, &key, &next_key) == 0) {
+	while (bpf_map_get_next_key(fd, &key, &next_key) == 0) {
 		int found = 0;
 
 		for (i = 0; i < task_cnt; i++)
@@ -136,8 +136,8 @@ int main(int ac, char **argv)
 
 	for (i = 0; i < 5; i++) {
 		key = 0;
-		while (bpf_get_next_key(map_fd[0], &key, &next_key) == 0) {
-			bpf_lookup_elem(map_fd[0], &next_key, &value);
+		while (bpf_map_get_next_key(map_fd[0], &key, &next_key) == 0) {
+			bpf_map_lookup_elem(map_fd[0], &next_key, &value);
 			printf("location 0x%lx count %ld\n", next_key, value);
 			key = next_key;
 		}
diff --git a/samples/bpf/tracex3_user.c b/samples/bpf/tracex3_user.c
index d0851cb4fa8d..8f7d199d5945 100644
--- a/samples/bpf/tracex3_user.c
+++ b/samples/bpf/tracex3_user.c
@@ -28,7 +28,7 @@ static void clear_stats(int fd)
 
 	memset(values, 0, sizeof(values));
 	for (key = 0; key < SLOTS; key++)
-		bpf_update_elem(fd, &key, values, BPF_ANY);
+		bpf_map_update_elem(fd, &key, values, BPF_ANY);
 }
 
 const char *color[] = {
@@ -89,7 +89,7 @@ static void print_hist(int fd)
 	int i;
 
 	for (key = 0; key < SLOTS; key++) {
-		bpf_lookup_elem(fd, &key, values);
+		bpf_map_lookup_elem(fd, &key, values);
 		value = 0;
 		for (i = 0; i < nr_cpus; i++)
 			value += values[i];
diff --git a/samples/bpf/tracex4_user.c b/samples/bpf/tracex4_user.c
index bc4a3bdea6ed..03449f773cb1 100644
--- a/samples/bpf/tracex4_user.c
+++ b/samples/bpf/tracex4_user.c
@@ -37,8 +37,8 @@ static void print_old_objects(int fd)
 	key = write(1, "\e[1;1H\e[2J", 12); /* clear screen */
 
 	key = -1;
-	while (bpf_get_next_key(map_fd[0], &key, &next_key) == 0) {
-		bpf_lookup_elem(map_fd[0], &next_key, &v);
+	while (bpf_map_get_next_key(map_fd[0], &key, &next_key) == 0) {
+		bpf_map_lookup_elem(map_fd[0], &next_key, &v);
 		key = next_key;
 		if (val - v.val < 1000000000ll)
 			/* object was allocated more then 1 sec ago */
diff --git a/samples/bpf/tracex6_user.c b/samples/bpf/tracex6_user.c
index 8ea4976cfcf1..179297cb4d35 100644
--- a/samples/bpf/tracex6_user.c
+++ b/samples/bpf/tracex6_user.c
@@ -36,7 +36,7 @@ static void test_bpf_perf_event(void)
 			goto exit;
 		}
 
-		bpf_update_elem(map_fd[0], &i, &pmu_fd[i], BPF_ANY);
+		bpf_map_update_elem(map_fd[0], &i, &pmu_fd[i], BPF_ANY);
 		ioctl(pmu_fd[i], PERF_EVENT_IOC_ENABLE, 0);
 	}
 
diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index 5f040a0d7712..d2be65d1fd86 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -43,7 +43,7 @@ static void poll_stats(int interval)
 		for (key = 0; key < nr_keys; key++) {
 			__u64 sum = 0;
 
-			assert(bpf_lookup_elem(map_fd[0], &key, values) == 0);
+			assert(bpf_map_lookup_elem(map_fd[0], &key, values) == 0);
 			for (i = 0; i < nr_cpus; i++)
 				sum += (values[i] - prev[key][i]);
 			if (sum)
diff --git a/samples/bpf/xdp_tx_iptunnel_user.c b/samples/bpf/xdp_tx_iptunnel_user.c
index 7a71f5c74684..70e192fc61aa 100644
--- a/samples/bpf/xdp_tx_iptunnel_user.c
+++ b/samples/bpf/xdp_tx_iptunnel_user.c
@@ -51,7 +51,7 @@ static void poll_stats(unsigned int kill_after_s)
 		for (proto = 0; proto < nr_protos; proto++) {
 			__u64 sum = 0;
 
-			assert(bpf_lookup_elem(map_fd[0], &proto, values) == 0);
+			assert(bpf_map_lookup_elem(map_fd[0], &proto, values) == 0);
 			for (i = 0; i < nr_cpus; i++)
 				sum += (values[i] - prev[proto][i]);
 
@@ -237,8 +237,8 @@ int main(int argc, char **argv)
 
 	while (min_port <= max_port) {
 		vip.dport = htons(min_port++);
-		if (bpf_update_elem(map_fd[1], &vip, &tnl, BPF_NOEXIST)) {
-			perror("bpf_update_elem(&vip2tnl)");
+		if (bpf_map_update_elem(map_fd[1], &vip, &tnl, BPF_NOEXIST)) {
+			perror("bpf_map_update_elem(&vip2tnl)");
 			return 1;
 		}
 	}
-- 
2.9.3
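
The loops renamed in this patch share one idiom: start from a key, ask
the map for the key that follows it, act on that entry, then advance.
The sketch below models that idiom against a tiny in-memory table
rather than a real BPF map, so it runs without privileges. `toy_map`,
`toy_get_next_key`, `toy_delete` and `toy_drain` are hypothetical
stand-ins, not libbpf functions. The one behaviour assumed from the
kernel is that asking for the successor of a key that is not present
yields the first key, as hash maps do, which is why deleting the entry
just visited does not end the walk.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 8

/* Hypothetical in-memory stand-in for a BPF hash map with int keys. */
struct toy_map {
	int keys[TOY_SLOTS];
	long values[TOY_SLOTS];
	int used[TOY_SLOTS];
};

/* Return the slot holding key, or -1 if the key is absent. */
static int toy_find(const struct toy_map *m, int key)
{
	for (int i = 0; i < TOY_SLOTS; i++)
		if (m->used[i] && m->keys[i] == key)
			return i;
	return -1;
}

/*
 * Store the key that follows *key in *next_key and return 0, or return
 * -1 when no keys remain. An absent key restarts from the first used
 * slot (assumed to mirror hash map behaviour).
 */
static int toy_get_next_key(const struct toy_map *m, const int *key,
			    int *next_key)
{
	int start = toy_find(m, *key) + 1;	/* absent key: -1 + 1 == 0 */

	for (int i = start; i < TOY_SLOTS; i++) {
		if (m->used[i]) {
			*next_key = m->keys[i];
			return 0;
		}
	}
	return -1;
}

/* Remove key from the table; return -1 if it was not there. */
static int toy_delete(struct toy_map *m, int key)
{
	int slot = toy_find(m, key);

	if (slot < 0)
		return -1;
	m->used[slot] = 0;
	return 0;
}

/* Sum and remove every entry, in the style of print_stacks(). */
static long toy_drain(struct toy_map *m)
{
	int key = -1, next_key;
	long sum = 0;

	while (toy_get_next_key(m, &key, &next_key) == 0) {
		int slot = toy_find(m, next_key);

		assert(slot >= 0);
		sum += m->values[slot];
		toy_delete(m, next_key);
		key = next_key;		/* advance, as the samples do */
	}
	return sum;
}
```

Starting from key = -1 relies on -1 not being a stored key; if it were,
the walk would begin after that entry and skip it.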

^ permalink raw reply related

* [PATCH 25/29] samples/bpf: Switch over to libbpf
From: Arnaldo Carvalho de Melo @ 2016-12-20 17:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Joe Stringer, Alexei Starovoitov, Daniel Borkmann,
	Wang Nan, netdev, Arnaldo Carvalho de Melo
In-Reply-To: <20161220170358.4350-1-acme@kernel.org>

From: Joe Stringer <joe@ovn.org>

Now that libbpf under tools/lib/bpf/* is synced with the version from
samples/bpf, we can get rid of most of the libbpf library here.

Committer notes:

Built it in a docker fedora rawhide container and ran it on the f25 host.
It seems to work just like it did before this patch, i.e. the switch to
tools/lib/bpf/ doesn't seem to have introduced problems. Joe said he
tested it with all the entries in samples/bpf/ and other code he found:

  [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux headers_install
  <SNIP>
  [root@f5065a7d6272 linux]# rm -rf /tmp/build/linux/samples/bpf/
  [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/
  make[1]: Entering directory '/tmp/build/linux'
    CHK     include/config/kernel.release
    HOSTCC  scripts/basic/fixdep
    GEN     ./Makefile
    CHK     include/generated/uapi/linux/version.h
    Using /git/linux as source for kernel
    CHK     include/generated/utsrelease.h
    HOSTCC  scripts/basic/bin2c
    HOSTCC  arch/x86/tools/relocs_32.o
    HOSTCC  arch/x86/tools/relocs_64.o
    LD      samples/bpf/built-in.o
  <SNIP>
    HOSTCC  samples/bpf/fds_example.o
    HOSTCC  samples/bpf/sockex1_user.o
  /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
  /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
        insns, insns_cnt, "GPL", 0,
        ^~~~~
  In file included from /git/linux/samples/bpf/libbpf.h:5:0,
                   from /git/linux/samples/bpf/bpf_load.h:4,
                   from /git/linux/samples/bpf/fds_example.c:15:
  /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *'
   int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
       ^~~~~~~~~~~~~~~~
    HOSTCC  samples/bpf/sockex2_user.o
  <SNIP>
    HOSTCC  samples/bpf/xdp_tx_iptunnel_user.o
  clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.2.1/include -I/git/linux/arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated  -I/git/linux/include -I./include -I/git/linux/arch/x86/include/uapi -I/git/linux/include/uapi -I./include/generated/uapi -include /git/linux/include/linux/kconfig.h  \
	  -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
	  -Wno-compare-distinct-pointer-types \
	  -Wno-gnu-variable-sized-type-not-at-end \
	  -Wno-address-of-packed-member -Wno-tautological-compare \
	  -O2 -emit-llvm -c /git/linux/samples/bpf/sockex1_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/sockex1_kern.o
    HOSTLD  samples/bpf/tc_l2_redirect
  <SNIP>
    HOSTLD  samples/bpf/lwt_len_hist
    HOSTLD  samples/bpf/xdp_tx_iptunnel
  make[1]: Leaving directory '/tmp/build/linux'
  [root@f5065a7d6272 linux]#

And then, in the host:

  [root@jouet bpf]# mount | grep "docker.*devicemapper\/"
  /dev/mapper/docker-253:0-1705076-9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 on /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 type xfs (rw,relatime,context="system_u:object_r:container_file_t:s0:c73,c276",nouuid,attr2,inode64,sunit=1024,swidth=1024,noquota)
  [root@jouet bpf]# cd /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9/rootfs/tmp/build/linux/samples/bpf/
  [root@jouet bpf]# file offwaketime
  offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=f423d171e0487b2f802b6a792657f0f3c8f6d155, not stripped
  [root@jouet bpf]# readelf -SW offwaketime
  offwaketime         offwaketime_kern.o  offwaketime_user.o
  [root@jouet bpf]# readelf -SW offwaketime_kern.o
  There are 11 section headers, starting at offset 0x700:

  Section Headers:
    [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
    [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
    [ 1] .strtab           STRTAB          0000000000000000 000658 0000a8 00      0   0  1
    [ 2] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  4
    [ 3] kprobe/try_to_wake_up PROGBITS        0000000000000000 000040 0000d8 00  AX  0   0  8
    [ 4] .relkprobe/try_to_wake_up REL             0000000000000000 0005a8 000020 10     10   3  8
    [ 5] tracepoint/sched/sched_switch PROGBITS        0000000000000000 000118 000318 00  AX  0   0  8
    [ 6] .reltracepoint/sched/sched_switch REL             0000000000000000 0005c8 000090 10     10   5  8
    [ 7] maps              PROGBITS        0000000000000000 000430 000050 00  WA  0   0  4
    [ 8] license           PROGBITS        0000000000000000 000480 000004 00  WA  0   0  1
    [ 9] version           PROGBITS        0000000000000000 000484 000004 00  WA  0   0  4
    [10] .symtab           SYMTAB          0000000000000000 000488 000120 18      1   4  8
  Key to Flags:
    W (write), A (alloc), X (execute), M (merge), S (strings)
    I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
    O (extra OS processing required) o (OS specific), p (processor specific)
  [root@jouet bpf]# ./offwaketime | head -3
  qemu-system-x86;entry_SYSCALL_64_fastpath;sys_ppoll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;hrtimer_wakeup;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel;start_cpu;;swapper/0 4
  firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 1
  swapper/2;start_cpu;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 61
  [root@jouet bpf]#

Signed-off-by: Joe Stringer <joe@ovn.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: https://github.com/joestringer/linux/commit/5c40f54a52b1f437123c81e21873f4b4b1f9bd55.patch
Link: http://lkml.kernel.org/n/tip-xr8twtx7sjh5821g8qw47yxk@git.kernel.org
[ Use -I$(srctree)/tools/lib/ to support out of source code tree builds, as noticed by Wang Nan ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

---
 samples/bpf/Makefile             |  68 +++++++++++++-----------
 samples/bpf/README.rst           |   4 +-
 samples/bpf/bpf_load.c           |   3 +-
 samples/bpf/fds_example.c        |   3 +-
 samples/bpf/libbpf.c             | 111 ---------------------------------------
 samples/bpf/libbpf.h             |  19 +------
 samples/bpf/sock_example.c       |   3 +-
 samples/bpf/test_cgrp2_attach.c  |   3 +-
 samples/bpf/test_cgrp2_attach2.c |   3 +-
 samples/bpf/test_cgrp2_sock.c    |   3 +-
 10 files changed, 52 insertions(+), 168 deletions(-)
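
Most of the hunks below make the same adjustment. The removed
samples/bpf copy of bpf_load_program() took the program length in bytes
and divided by sizeof(struct bpf_insn) itself, while the callers are now
changed to pass an instruction count. A minimal sketch of that
conversion follows. It uses a local `toy_insn` struct as a stand-in for
`struct bpf_insn` (assumed to be 8 bytes, as in the uapi header) so that
it builds without kernel headers; `insn_count_from_bytes` and `toy_prog`
are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct bpf_insn with the same 8-byte size. */
struct toy_insn {
	uint8_t code;	/* opcode */
	uint8_t regs;	/* dst_reg:4 and src_reg:4 in the real struct */
	int16_t off;	/* signed offset */
	int32_t imm;	/* signed immediate */
};

/* Convert a byte length, as the old wrapper took, to an instruction count. */
static size_t insn_count_from_bytes(size_t size_in_bytes)
{
	/* A program is a whole number of instructions. */
	assert(size_in_bytes % sizeof(struct toy_insn) == 0);
	return size_in_bytes / sizeof(struct toy_insn);
}

/* Two-instruction program shaped like "r0 = 1; exit" in fds_example.c. */
static const struct toy_insn toy_prog[] = {
	{ .code = 0xb7, .imm = 1 },	/* BPF_ALU64 | BPF_MOV | BPF_K */
	{ .code = 0x95 },		/* BPF_JMP | BPF_EXIT */
};
```

Passing sizeof(toy_prog) where a count is expected would overstate the
program length eightfold, which is why each caller gains an insns_cnt
variable.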

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index f2219c1489e5..81b0ef2f7994 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -35,40 +35,43 @@ hostprogs-y += tc_l2_redirect
 hostprogs-y += lwt_len_hist
 hostprogs-y += xdp_tx_iptunnel
 
-test_lru_dist-objs := test_lru_dist.o libbpf.o
-sock_example-objs := sock_example.o libbpf.o
-fds_example-objs := bpf_load.o libbpf.o fds_example.o
-sockex1-objs := bpf_load.o libbpf.o sockex1_user.o
-sockex2-objs := bpf_load.o libbpf.o sockex2_user.o
-sockex3-objs := bpf_load.o libbpf.o sockex3_user.o
-tracex1-objs := bpf_load.o libbpf.o tracex1_user.o
-tracex2-objs := bpf_load.o libbpf.o tracex2_user.o
-tracex3-objs := bpf_load.o libbpf.o tracex3_user.o
-tracex4-objs := bpf_load.o libbpf.o tracex4_user.o
-tracex5-objs := bpf_load.o libbpf.o tracex5_user.o
-tracex6-objs := bpf_load.o libbpf.o tracex6_user.o
-test_probe_write_user-objs := bpf_load.o libbpf.o test_probe_write_user_user.o
-trace_output-objs := bpf_load.o libbpf.o trace_output_user.o
-lathist-objs := bpf_load.o libbpf.o lathist_user.o
-offwaketime-objs := bpf_load.o libbpf.o offwaketime_user.o
-spintest-objs := bpf_load.o libbpf.o spintest_user.o
-map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o
-test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o
-test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o
-test_cgrp2_attach-objs := libbpf.o test_cgrp2_attach.o
-test_cgrp2_attach2-objs := libbpf.o test_cgrp2_attach2.o cgroup_helpers.o
-test_cgrp2_sock-objs := libbpf.o test_cgrp2_sock.o
-test_cgrp2_sock2-objs := bpf_load.o libbpf.o test_cgrp2_sock2.o
-xdp1-objs := bpf_load.o libbpf.o xdp1_user.o
+# Libbpf dependencies
+LIBBPF := libbpf.o ../../tools/lib/bpf/bpf.o
+
+test_lru_dist-objs := test_lru_dist.o $(LIBBPF)
+sock_example-objs := sock_example.o $(LIBBPF)
+fds_example-objs := bpf_load.o $(LIBBPF) fds_example.o
+sockex1-objs := bpf_load.o $(LIBBPF) sockex1_user.o
+sockex2-objs := bpf_load.o $(LIBBPF) sockex2_user.o
+sockex3-objs := bpf_load.o $(LIBBPF) sockex3_user.o
+tracex1-objs := bpf_load.o $(LIBBPF) tracex1_user.o
+tracex2-objs := bpf_load.o $(LIBBPF) tracex2_user.o
+tracex3-objs := bpf_load.o $(LIBBPF) tracex3_user.o
+tracex4-objs := bpf_load.o $(LIBBPF) tracex4_user.o
+tracex5-objs := bpf_load.o $(LIBBPF) tracex5_user.o
+tracex6-objs := bpf_load.o $(LIBBPF) tracex6_user.o
+test_probe_write_user-objs := bpf_load.o $(LIBBPF) test_probe_write_user_user.o
+trace_output-objs := bpf_load.o $(LIBBPF) trace_output_user.o
+lathist-objs := bpf_load.o $(LIBBPF) lathist_user.o
+offwaketime-objs := bpf_load.o $(LIBBPF) offwaketime_user.o
+spintest-objs := bpf_load.o $(LIBBPF) spintest_user.o
+map_perf_test-objs := bpf_load.o $(LIBBPF) map_perf_test_user.o
+test_overhead-objs := bpf_load.o $(LIBBPF) test_overhead_user.o
+test_cgrp2_array_pin-objs := $(LIBBPF) test_cgrp2_array_pin.o
+test_cgrp2_attach-objs := $(LIBBPF) test_cgrp2_attach.o
+test_cgrp2_attach2-objs := $(LIBBPF) test_cgrp2_attach2.o cgroup_helpers.o
+test_cgrp2_sock-objs := $(LIBBPF) test_cgrp2_sock.o
+test_cgrp2_sock2-objs := bpf_load.o $(LIBBPF) test_cgrp2_sock2.o
+xdp1-objs := bpf_load.o $(LIBBPF) xdp1_user.o
 # reuse xdp1 source intentionally
-xdp2-objs := bpf_load.o libbpf.o xdp1_user.o
-test_current_task_under_cgroup-objs := bpf_load.o libbpf.o cgroup_helpers.o \
+xdp2-objs := bpf_load.o $(LIBBPF) xdp1_user.o
+test_current_task_under_cgroup-objs := bpf_load.o $(LIBBPF) cgroup_helpers.o \
 				       test_current_task_under_cgroup_user.o
-trace_event-objs := bpf_load.o libbpf.o trace_event_user.o
-sampleip-objs := bpf_load.o libbpf.o sampleip_user.o
-tc_l2_redirect-objs := bpf_load.o libbpf.o tc_l2_redirect_user.o
-lwt_len_hist-objs := bpf_load.o libbpf.o lwt_len_hist_user.o
-xdp_tx_iptunnel-objs := bpf_load.o libbpf.o xdp_tx_iptunnel_user.o
+trace_event-objs := bpf_load.o $(LIBBPF) trace_event_user.o
+sampleip-objs := bpf_load.o $(LIBBPF) sampleip_user.o
+tc_l2_redirect-objs := bpf_load.o $(LIBBPF) tc_l2_redirect_user.o
+lwt_len_hist-objs := bpf_load.o $(LIBBPF) lwt_len_hist_user.o
+xdp_tx_iptunnel-objs := bpf_load.o $(LIBBPF) xdp_tx_iptunnel_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -104,6 +107,7 @@ always += lwt_len_hist_kern.o
 always += xdp_tx_iptunnel_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
+HOSTCFLAGS += -I$(srctree)/tools/lib/
 HOSTCFLAGS += -I$(srctree)/tools/testing/selftests/bpf/
 
 HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
diff --git a/samples/bpf/README.rst b/samples/bpf/README.rst
index a43eae3f0551..79f9a58f1872 100644
--- a/samples/bpf/README.rst
+++ b/samples/bpf/README.rst
@@ -1,8 +1,8 @@
 eBPF sample programs
 ====================
 
-This directory contains a mini eBPF library, test stubs, verifier
-test-suite and examples for using eBPF.
+This directory contains a test stubs, verifier test-suite and examples
+for using eBPF. The examples use libbpf from tools/lib/bpf.
 
 Build dependencies
 ==================
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index f5b186c46b7c..1bfb43394013 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -66,6 +66,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 	bool is_perf_event = strncmp(event, "perf_event", 10) == 0;
 	bool is_cgroup_skb = strncmp(event, "cgroup/skb", 10) == 0;
 	bool is_cgroup_sk = strncmp(event, "cgroup/sock", 11) == 0;
+	size_t insns_cnt = size / sizeof(struct bpf_insn);
 	enum bpf_prog_type prog_type;
 	char buf[256];
 	int fd, efd, err, id;
@@ -95,7 +96,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 		return -1;
 	}
 
-	fd = bpf_load_program(prog_type, prog, size, license, kern_version,
+	fd = bpf_load_program(prog_type, prog, insns_cnt, license, kern_version,
 			      bpf_log_buf, BPF_LOG_BUF_SIZE);
 	if (fd < 0) {
 		printf("bpf_load_program() err=%d\n%s", errno, bpf_log_buf);
diff --git a/samples/bpf/fds_example.c b/samples/bpf/fds_example.c
index 8a4fc4ef3993..6245062844d1 100644
--- a/samples/bpf/fds_example.c
+++ b/samples/bpf/fds_example.c
@@ -53,13 +53,14 @@ static int bpf_prog_create(const char *object)
 		BPF_MOV64_IMM(BPF_REG_0, 1),
 		BPF_EXIT_INSN(),
 	};
+	size_t insns_cnt = sizeof(insns) / sizeof(struct bpf_insn);
 
 	if (object) {
 		assert(!load_bpf_file((char *)object));
 		return prog_fd[0];
 	} else {
 		return bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER,
-					insns, sizeof(insns), "GPL", 0,
+					insns, insns_cnt, "GPL", 0,
 					bpf_log_buf, BPF_LOG_BUF_SIZE);
 	}
 }
diff --git a/samples/bpf/libbpf.c b/samples/bpf/libbpf.c
index 6f076abdca35..3391225ad7e9 100644
--- a/samples/bpf/libbpf.c
+++ b/samples/bpf/libbpf.c
@@ -4,8 +4,6 @@
 #include <linux/unistd.h>
 #include <unistd.h>
 #include <string.h>
-#include <linux/netlink.h>
-#include <linux/bpf.h>
 #include <errno.h>
 #include <net/ethernet.h>
 #include <net/if.h>
@@ -13,96 +11,6 @@
 #include <arpa/inet.h>
 #include "libbpf.h"
 
-static __u64 ptr_to_u64(void *ptr)
-{
-	return (__u64) (unsigned long) ptr;
-}
-
-int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
-		   int max_entries, int map_flags)
-{
-	union bpf_attr attr = {
-		.map_type = map_type,
-		.key_size = key_size,
-		.value_size = value_size,
-		.max_entries = max_entries,
-		.map_flags = map_flags,
-	};
-
-	return syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
-}
-
-int bpf_map_update_elem(int fd, void *key, void *value, unsigned long long flags)
-{
-	union bpf_attr attr = {
-		.map_fd = fd,
-		.key = ptr_to_u64(key),
-		.value = ptr_to_u64(value),
-		.flags = flags,
-	};
-
-	return syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
-}
-
-int bpf_map_lookup_elem(int fd, void *key, void *value)
-{
-	union bpf_attr attr = {
-		.map_fd = fd,
-		.key = ptr_to_u64(key),
-		.value = ptr_to_u64(value),
-	};
-
-	return syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
-}
-
-int bpf_map_delete_elem(int fd, void *key)
-{
-	union bpf_attr attr = {
-		.map_fd = fd,
-		.key = ptr_to_u64(key),
-	};
-
-	return syscall(__NR_bpf, BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
-}
-
-int bpf_map_get_next_key(int fd, void *key, void *next_key)
-{
-	union bpf_attr attr = {
-		.map_fd = fd,
-		.key = ptr_to_u64(key),
-		.next_key = ptr_to_u64(next_key),
-	};
-
-	return syscall(__NR_bpf, BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
-}
-
-#define ROUND_UP(x, n) (((x) + (n) - 1u) & ~((n) - 1u))
-
-int bpf_load_program(enum bpf_prog_type prog_type,
-		     const struct bpf_insn *insns, int prog_len,
-		     const char *license, int kern_version,
-		     char *log_buf, size_t log_buf_sz)
-{
-	union bpf_attr attr = {
-		.prog_type = prog_type,
-		.insns = ptr_to_u64((void *) insns),
-		.insn_cnt = prog_len / sizeof(struct bpf_insn),
-		.license = ptr_to_u64((void *) license),
-		.log_buf = ptr_to_u64(log_buf),
-		.log_size = log_buf_sz,
-		.log_level = 1,
-	};
-
-	/* assign one field outside of struct init to make sure any
-	 * padding is zero initialized
-	 */
-	attr.kern_version = kern_version;
-
-	log_buf[0] = 0;
-
-	return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
-}
-
 int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type)
 {
 	union bpf_attr attr = {
@@ -124,25 +32,6 @@ int bpf_prog_detach(int target_fd, enum bpf_attach_type type)
 	return syscall(__NR_bpf, BPF_PROG_DETACH, &attr, sizeof(attr));
 }
 
-int bpf_obj_pin(int fd, const char *pathname)
-{
-	union bpf_attr attr = {
-		.pathname	= ptr_to_u64((void *)pathname),
-		.bpf_fd		= fd,
-	};
-
-	return syscall(__NR_bpf, BPF_OBJ_PIN, &attr, sizeof(attr));
-}
-
-int bpf_obj_get(const char *pathname)
-{
-	union bpf_attr attr = {
-		.pathname	= ptr_to_u64((void *)pathname),
-	};
-
-	return syscall(__NR_bpf, BPF_OBJ_GET, &attr, sizeof(attr));
-}
-
 int open_raw_sock(const char *name)
 {
 	struct sockaddr_ll sll;
diff --git a/samples/bpf/libbpf.h b/samples/bpf/libbpf.h
index 20e3457857ca..cf7d2386d1f9 100644
--- a/samples/bpf/libbpf.h
+++ b/samples/bpf/libbpf.h
@@ -2,28 +2,13 @@
 #ifndef __LIBBPF_H
 #define __LIBBPF_H
 
-struct bpf_insn;
-
-int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
-		   int max_entries, int map_flags);
-int bpf_map_update_elem(int fd, void *key, void *value, unsigned long long flags);
-int bpf_map_lookup_elem(int fd, void *key, void *value);
-int bpf_map_delete_elem(int fd, void *key);
-int bpf_map_get_next_key(int fd, void *key, void *next_key);
+#include <bpf/bpf.h>
 
-int bpf_load_program(enum bpf_prog_type prog_type,
-		     const struct bpf_insn *insns, int insn_len,
-		     const char *license, int kern_version,
-		     char *log_buf, size_t log_buf_sz);
+struct bpf_insn;
 
 int bpf_prog_attach(int prog_fd, int attachable_fd, enum bpf_attach_type type);
 int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
 
-int bpf_obj_pin(int fd, const char *pathname);
-int bpf_obj_get(const char *pathname);
-
-#define BPF_LOG_BUF_SIZE (256 * 1024)
-
 /* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
 
 #define BPF_ALU64_REG(OP, DST, SRC)				\
diff --git a/samples/bpf/sock_example.c b/samples/bpf/sock_example.c
index d6b91e9a38ad..5546f8aac37e 100644
--- a/samples/bpf/sock_example.c
+++ b/samples/bpf/sock_example.c
@@ -56,8 +56,9 @@ static int test_sock(void)
 		BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
 		BPF_EXIT_INSN(),
 	};
+	size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
 
-	prog_fd = bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, prog, sizeof(prog),
+	prog_fd = bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, prog, insns_cnt,
 				   "GPL", 0, bpf_log_buf, BPF_LOG_BUF_SIZE);
 	if (prog_fd < 0) {
 		printf("failed to load prog '%s'\n", strerror(errno));
diff --git a/samples/bpf/test_cgrp2_attach.c b/samples/bpf/test_cgrp2_attach.c
index 8283ef86d392..504058631ffc 100644
--- a/samples/bpf/test_cgrp2_attach.c
+++ b/samples/bpf/test_cgrp2_attach.c
@@ -68,9 +68,10 @@ static int prog_load(int map_fd, int verdict)
 		BPF_MOV64_IMM(BPF_REG_0, verdict), /* r0 = verdict */
 		BPF_EXIT_INSN(),
 	};
+	size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
 
 	return bpf_load_program(BPF_PROG_TYPE_CGROUP_SKB,
-				prog, sizeof(prog), "GPL", 0,
+				prog, insns_cnt, "GPL", 0,
 				bpf_log_buf, BPF_LOG_BUF_SIZE);
 }
 
diff --git a/samples/bpf/test_cgrp2_attach2.c b/samples/bpf/test_cgrp2_attach2.c
index fc6092fdc3b0..6e69be37f87f 100644
--- a/samples/bpf/test_cgrp2_attach2.c
+++ b/samples/bpf/test_cgrp2_attach2.c
@@ -41,9 +41,10 @@ static int prog_load(int verdict)
 		BPF_MOV64_IMM(BPF_REG_0, verdict), /* r0 = verdict */
 		BPF_EXIT_INSN(),
 	};
+	size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
 
 	ret = bpf_load_program(BPF_PROG_TYPE_CGROUP_SKB,
-			       prog, sizeof(prog), "GPL", 0,
+			       prog, insns_cnt, "GPL", 0,
 			       bpf_log_buf, BPF_LOG_BUF_SIZE);
 
 	if (ret < 0) {
diff --git a/samples/bpf/test_cgrp2_sock.c b/samples/bpf/test_cgrp2_sock.c
index 43b4bde5d05c..0791b949cbe4 100644
--- a/samples/bpf/test_cgrp2_sock.c
+++ b/samples/bpf/test_cgrp2_sock.c
@@ -35,8 +35,9 @@ static int prog_load(int idx)
 		BPF_MOV64_IMM(BPF_REG_0, 1), /* r0 = verdict */
 		BPF_EXIT_INSN(),
 	};
+	size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
 
-	return bpf_load_program(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
+	return bpf_load_program(BPF_PROG_TYPE_CGROUP_SOCK, prog, insns_cnt,
 				"GPL", 0, bpf_log_buf, BPF_LOG_BUF_SIZE);
 }
 
-- 
2.9.3

^ permalink raw reply related

* Re: wl1251 & mac address & calibration data
From: Pali Rohár @ 2016-12-20 17:06 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Kalle Valo, Arend Van Spriel, Daniel Wagner, Luis R. Rodriguez,
	Tom Gundersen, Johannes Berg, Ming Lei, Mimi Zohar,
	Bjorn Andersson, Rafał Miłecki, Sebastian Reichel,
	Pavel Machek, Michal Kazior, Ivaylo Dimitrov, Aaro Koskinen,
	linux-wireless, Network Development, linux-kernel@vger.kernel.org
In-Reply-To: <20161220165658.GI4920@atomide.com>

[-- Attachment #1: Type: Text/Plain, Size: 2331 bytes --]

On Tuesday 20 December 2016 17:56:58 Tony Lindgren wrote:
> * Kalle Valo <kvalo@codeaurora.org> [161220 03:47]:
> > Arend Van Spriel <arend.vanspriel@broadcom.com> writes:
> > > On 18-12-2016 13:09, Pali Rohár wrote:
> > >> File wl1251-nvs.bin is provided by linux-firmware package and
> > >> contains default data which should be overriden by model
> > >> specific calibrated data.
> > > 
> > > Ah. Someone thought it was a good idea to provide the "one ring
> > > to rule them all". Nice.
> > 
> > Yes, that was a bad idea. wl1251-nvs.bin in linux-firmware.git
> > should be renamed to wl1251-nvs.bin.example, or something like
> > that, as it should be only installed to a real system only if
> > there's no real calibration data available (only for developers to
> > use, not real users).
> 
> Makes sense to me. Note that with the recent changes to wlcore, we
> can now easily provide board specific calibration firmware simply by
> adding a new compatible value. So for n900, we could have something
> like compatible = "ti,wl1251-n900" and have it point to n900
> specific calibration file wl1251-nvs-n900.bin. Of course this won't
> help with the mac address, or any of the device specific data..
> 
> That is assuming the calibration values are the same for each similar
> device and don't have to be generated for each device. And naturally
> wl1251 needs simlar changes done to make use of devices specific
> calibration files.
> 
> Regards,
> 
> Tony

As I wrote in another thread, "wl1251 NVS calibration data format", the
calibration data for wl1251 (wl1251-nvs.bin) also contains the MAC
address, which the kernel sends to the wl1251 chip. The kernel just does
not use it.

So... my idea now is:

1) extend the request_firmware function family with the ability to try a
userspace helper first and fall back to VFS

2) teach wl1251.ko to parse the MAC address from wl1251-nvs.bin and use
it (unless it is empty or 00:00:20:07:03:09, which is the value in the
example file from the linux-firmware package)

3) write a Nokia N900-specific userspace helper that provides the data
when the kernel requests wl1251-nvs.bin. The helper reads the MAC
address and calibration data from CAL, places the MAC address into the
calibration data and sends the result to the kernel.

Are you OK with this idea?
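
Step 2 hinges on deciding whether the address found in the NVS data is
worth using. A minimal sketch of that check, written as plain userspace
C rather than driver code: `nvs_mac_is_usable` is a hypothetical name,
the offset of the address inside wl1251-nvs.bin is deliberately not
modelled, and the placeholder value is the one quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Placeholder address from the linux-firmware example file, as quoted above. */
static const uint8_t nvs_example_mac[MAC_LEN] = {
	0x00, 0x00, 0x20, 0x07, 0x03, 0x09
};

/* Return true if mac is neither all zeroes nor the example placeholder. */
static bool nvs_mac_is_usable(const uint8_t *mac)
{
	static const uint8_t zero_mac[MAC_LEN];	/* zero-initialised */

	assert(mac != NULL);
	if (memcmp(mac, zero_mac, MAC_LEN) == 0)
		return false;	/* empty address */
	if (memcmp(mac, nvs_example_mac, MAC_LEN) == 0)
		return false;	/* generic default, not device specific */
	return true;
}
```

A driver would presumably fall back to its existing address source
whenever this returns false.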

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* [PATCH v3] stmmac: enable rx queues
From: Joao Pinto @ 2016-12-20 17:09 UTC (permalink / raw)
  To: peppe.cavallaro, davem, seraphin.bonnaffe
  Cc: hock.leong.kweh, niklas.cassel, pavel, linux-kernel, netdev,
	Joao Pinto

When the hardware is synthesized with multiple queues, all queues are
disabled by default. This patch adds the RX queue configuration.
This patch was successfully tested on a Synopsys QoS reference design.

Signed-off-by: Joao Pinto <jpinto@synopsys.com>
---
changes v2 -> v3 (Seraphin Bonnaffe):
- GMAC_RX_QUEUE_CLEAR macro simplified
changes v1 -> v2 (Niklas Cassel and Seraphin Bonnaffe):
- Instead of using the number of DMA channels, use the number of queues
- Create 2 flavors of RX queue enable macros: AV and DCB (AV by default)
- Make sure that the RX queue related bits are cleared before setting
- Check if rx_queue_enable is available before executing
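
The read-modify-write described in the changelog (clear the queue's
bits, then set the AV enable bit) can be tried outside the kernel. A
minimal sketch, with the register modelled as a plain uint32_t and
local `TOY_*` macros standing in for the kernel's BIT() and GENMASK().
The two-bits-per-queue layout and the 4-bit "count minus one" fields
are taken from the macros in this patch; the bound of 16 queues is only
an arithmetic limit that keeps the shifts within 32 bits, not a claim
about the hardware. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BIT(n)		(UINT32_C(1) << (n))
/* Mask that clears both enable bits of one queue. */
#define TOY_RX_QUEUE_CLEAR(q)	(~(UINT32_C(3) << ((q) * 2)))
#define TOY_RX_AV_ENABLE(q)	TOY_BIT((q) * 2)
#define TOY_RX_DCB_ENABLE(q)	TOY_BIT(((q) * 2) + 1)

/* Return reg with the given queue switched to AV mode. */
static uint32_t toy_rx_queue_enable_av(uint32_t reg, unsigned int queue)
{
	assert(queue < 16);			/* 2 bits per queue in 32 bits */
	reg &= TOY_RX_QUEUE_CLEAR(queue);	/* drop any previous mode */
	reg |= TOY_RX_AV_ENABLE(queue);		/* set the AV enable bit */
	return reg;
}

/* Decode a 4-bit field that stores "number of queues minus one". */
static unsigned int toy_queue_count(uint32_t hw_cap, unsigned int shift)
{
	return ((hw_cap >> shift) & 0xf) + 1;
}
```

Clearing before setting matters when a queue was previously in DCB
mode: without it both bits of the pair would end up set.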

 drivers/net/ethernet/stmicro/stmmac/common.h      |  5 +++++
 drivers/net/ethernet/stmicro/stmmac/dwmac4.h      |  8 ++++++++
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 12 ++++++++++++
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c  |  5 +++++
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 22 ++++++++++++++++++++++
 5 files changed, 52 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index b13a144..6c96291 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -323,6 +323,9 @@ struct dma_features {
 	/* TX and RX number of channels */
 	unsigned int number_rx_channel;
 	unsigned int number_tx_channel;
+	/* TX and RX number of queues */
+	unsigned int number_rx_queues;
+	unsigned int number_tx_queues;
 	/* Alternate (enhanced) DESC mode */
 	unsigned int enh_desc;
 };
@@ -454,6 +457,8 @@ struct stmmac_ops {
 	void (*core_init)(struct mac_device_info *hw, int mtu);
 	/* Enable and verify that the IPC module is supported */
 	int (*rx_ipc)(struct mac_device_info *hw);
+	/* Enable RX Queues */
+	void (*rx_queue_enable)(struct mac_device_info *hw, u32 queue);
 	/* Dump MAC registers */
 	void (*dump_regs)(struct mac_device_info *hw);
 	/* Handle extra events on specific interrupts hw dependent */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
index 3e8d4fe..b524598 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
@@ -22,6 +22,7 @@
 #define GMAC_HASH_TAB_32_63		0x00000014
 #define GMAC_RX_FLOW_CTRL		0x00000090
 #define GMAC_QX_TX_FLOW_CTRL(x)		(0x70 + x * 4)
+#define GMAC_RXQ_CTRL0			0x000000a0
 #define GMAC_INT_STATUS			0x000000b0
 #define GMAC_INT_EN			0x000000b4
 #define GMAC_PCS_BASE			0x000000e0
@@ -44,6 +45,11 @@
 
 #define GMAC_MAX_PERFECT_ADDRESSES	128
 
+/* MAC RX Queue Enable */
+#define GMAC_RX_QUEUE_CLEAR(queue)	~(GENMASK(1, 0) << ((queue) * 2))
+#define GMAC_RX_AV_QUEUE_ENABLE(queue)	BIT((queue) * 2)
+#define GMAC_RX_DCB_QUEUE_ENABLE(queue)	BIT(((queue) * 2) + 1)
+
 /* MAC Flow Control RX */
 #define GMAC_RX_FLOW_CTRL_RFE		BIT(0)
 
@@ -133,6 +139,8 @@ enum power_event {
 /* MAC HW features2 bitmap */
 #define GMAC_HW_FEAT_TXCHCNT		GENMASK(21, 18)
 #define GMAC_HW_FEAT_RXCHCNT		GENMASK(15, 12)
+#define GMAC_HW_FEAT_TXQCNT		GENMASK(9, 6)
+#define GMAC_HW_FEAT_RXQCNT		GENMASK(3, 0)
 
 /* MAC HW ADDR regs */
 #define GMAC_HI_DCS			GENMASK(18, 16)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index eaed7cb..ecfbf57 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -59,6 +59,17 @@ static void dwmac4_core_init(struct mac_device_info *hw, int mtu)
 	writel(value, ioaddr + GMAC_INT_EN);
 }
 
+static void dwmac4_rx_queue_enable(struct mac_device_info *hw, u32 queue)
+{
+	void __iomem *ioaddr = hw->pcsr;
+	u32 value = readl(ioaddr + GMAC_RXQ_CTRL0);
+
+	value &= GMAC_RX_QUEUE_CLEAR(queue);
+	value |= GMAC_RX_AV_QUEUE_ENABLE(queue);
+
+	writel(value, ioaddr + GMAC_RXQ_CTRL0);
+}
+
 static void dwmac4_dump_regs(struct mac_device_info *hw)
 {
 	void __iomem *ioaddr = hw->pcsr;
@@ -392,6 +403,7 @@ static void dwmac4_debug(void __iomem *ioaddr, struct stmmac_extra_stats *x)
 static const struct stmmac_ops dwmac4_ops = {
 	.core_init = dwmac4_core_init,
 	.rx_ipc = dwmac4_rx_ipc_enable,
+	.rx_queue_enable = dwmac4_rx_queue_enable,
 	.dump_regs = dwmac4_dump_regs,
 	.host_irq_status = dwmac4_irq_status,
 	.flow_ctrl = dwmac4_flow_ctrl,
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
index 8196ab5..377d1b4 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
@@ -303,6 +303,11 @@ static void dwmac4_get_hw_feature(void __iomem *ioaddr,
 		((hw_cap & GMAC_HW_FEAT_RXCHCNT) >> 12) + 1;
 	dma_cap->number_tx_channel =
 		((hw_cap & GMAC_HW_FEAT_TXCHCNT) >> 18) + 1;
+	/* TX and RX number of queues */
+	dma_cap->number_rx_queues =
+		((hw_cap & GMAC_HW_FEAT_RXQCNT) >> 0) + 1;
+	dma_cap->number_tx_queues =
+		((hw_cap & GMAC_HW_FEAT_TXQCNT) >> 6) + 1;
 
 	/* IEEE 1588-2002 */
 	dma_cap->time_stamp = 0;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 3e40578..bc9cff9 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1271,6 +1271,24 @@ static void free_dma_desc_resources(struct stmmac_priv *priv)
 }
 
 /**
+ *  stmmac_mac_enable_rx_queues - Enable MAC rx queues
+ *  @priv: driver private structure
+ *  Description: It is used for enabling the rx queues in the MAC
+ */
+static void stmmac_mac_enable_rx_queues(struct stmmac_priv *priv)
+{
+	int rx_count = priv->dma_cap.number_rx_queues;
+	int queue = 0;
+
+	/* If GMAC does not have multiqueues, then this is not necessary*/
+	if (rx_count == 1)
+		return;
+
+	for (queue = 0; queue < rx_count; queue++)
+		priv->hw->mac->rx_queue_enable(priv->hw, queue);
+}
+
+/**
  *  stmmac_dma_operation_mode - HW DMA operation mode
  *  @priv: driver private structure
  *  Description: it is used for configuring the DMA operation mode register in
@@ -1691,6 +1709,10 @@ static int stmmac_hw_setup(struct net_device *dev, bool init_ptp)
 	/* Initialize the MAC Core */
 	priv->hw->mac->core_init(priv->hw, dev->mtu);
 
+	/* Initialize MAC RX Queues */
+	if (priv->hw->mac->rx_queue_enable)
+		stmmac_mac_enable_rx_queues(priv);
+
 	ret = priv->hw->mac->rx_ipc(priv->hw);
 	if (!ret) {
 		netdev_warn(priv->dev, "RX IPC Checksum Offload disabled\n");
-- 
2.9.3
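
The register arithmetic in the patch above can be checked outside the kernel. What follows is a minimal user-space sketch, assuming only what the diff shows: each RX queue owns a two-bit field in GMAC_RXQ_CTRL0, and the queue counts in the feature register are stored as "count minus one". The names BIT32, GENMASK32, rxq_ctrl0_enable_av and hw_feat_*_queues are stand-ins invented for this illustration, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's BIT() and GENMASK() helpers. */
#define BIT32(n)        ((uint32_t)1 << (n))
#define GENMASK32(h, l) ((~(uint32_t)0 >> (31 - (h))) & (~(uint32_t)0 << (l)))

/* Two mode bits per RX queue, mirroring the macros added to dwmac4.h. */
#define RXQ_CLEAR(q)      (~(GENMASK32(1, 0) << ((q) * 2)))
#define RXQ_AV_ENABLE(q)  BIT32((q) * 2)
#define RXQ_DCB_ENABLE(q) BIT32(((q) * 2) + 1)

/* Same read-modify-write as dwmac4_rx_queue_enable(), minus the MMIO. */
static inline uint32_t rxq_ctrl0_enable_av(uint32_t value, unsigned int queue)
{
	assert(queue < 16);	/* keep the shift inside a 32-bit word */
	value &= RXQ_CLEAR(queue);	/* drop both mode bits of this queue */
	value |= RXQ_AV_ENABLE(queue);	/* then set the low (AV) bit */
	return value;
}

/* Queue counts are encoded as "count - 1" in the feature bitmap. */
static inline unsigned int hw_feat_rx_queues(uint32_t hw_cap)
{
	return ((hw_cap & GENMASK32(3, 0)) >> 0) + 1;
}

static inline unsigned int hw_feat_tx_queues(uint32_t hw_cap)
{
	return ((hw_cap & GENMASK32(9, 6)) >> 6) + 1;
}
```

Clearing the field before setting it means a queue whose DCB bit was already set ends up with only the AV bit set, rather than with both bits.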

* Re: wl1251 & mac address & calibration data
From: Kalle Valo @ 2016-12-20 17:11 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Arend Van Spriel, Pali Rohár, Daniel Wagner,
	Luis R. Rodriguez, Tom Gundersen, Johannes Berg, Ming Lei,
	Mimi Zohar, Bjorn Andersson, Rafał Miłecki,
	Sebastian Reichel, Pavel Machek, Michal Kazior, Ivaylo Dimitrov,
	Aaro Koskinen, linux-wireless, Network Development
In-Reply-To: <20161220165658.GI4920@atomide.com>

Tony Lindgren <tony@atomide.com> writes:

> * Kalle Valo <kvalo@codeaurora.org> [161220 03:47]:
>> Arend Van Spriel <arend.vanspriel@broadcom.com> writes:
>> 
>> > On 18-12-2016 13:09, Pali Rohár wrote:
>> >
>> >> File wl1251-nvs.bin is provided by linux-firmware package and contains 
>> >> default data which should be overriden by model specific calibrated 
>> >> data.
>> >
>> > Ah. Someone thought it was a good idea to provide the "one ring to rule
>> > them all". Nice.
>> 
>> Yes, that was a bad idea. wl1251-nvs.bin in linux-firmware.git should be
>> renamed to wl1251-nvs.bin.example, or something like that, as it should
>> be only installed to a real system only if there's no real calibration
>> data available (only for developers to use, not real users).
>
> Makes sense to me. Note that with the recent changes to wlcore, we can
> now easily provide board specific calibration firmware simply by adding a
> new compatible value. So for n900, we could have something like
> compatible = "ti,wl1251-n900" and have it point to n900 specific calibration
> file wl1251-nvs-n900.bin. Of course this won't help with the mac address,
> or any of the device specific data..
>
> That is assuming the calibration values are the same for each similar
> device and don't have to be generated for each device. And naturally wl1251
> needs simlar changes done to make use of devices specific calibration files.

No, these are unique per each sold device. Every N900 was calibrated at
the factory and they all have different calibration data which is stored
to the flash. So when N900 boots (and in _every_ boot) it has to load
the calibration data from the flash and provide it to the wl1251 driver
somehow.

-- 
Kalle Valo

* Re: wl1251 & mac address & calibration data
From: Tony Lindgren @ 2016-12-20 17:21 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Arend Van Spriel, Pali Rohár, Daniel Wagner,
	Luis R. Rodriguez, Tom Gundersen, Johannes Berg, Ming Lei,
	Mimi Zohar, Bjorn Andersson, Rafał Miłecki,
	Sebastian Reichel, Pavel Machek, Michal Kazior, Ivaylo Dimitrov,
	Aaro Koskinen, linux-wireless, Network Development,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <87bmw6ttim.fsf-HodKDYzPHsUD5k0oWYwrnHL1okKdlPRT@public.gmane.org>

* Kalle Valo <kvalo-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org> [161220 09:12]:
> Tony Lindgren <tony-4v6yS6AI5VpBDgjK7y7TUQ@public.gmane.org> writes:
> 
> > * Kalle Valo <kvalo-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org> [161220 03:47]:
> >> Arend Van Spriel <arend.vanspriel-dY08KVG/lbpWk0Htik3J/w@public.gmane.org> writes:
> >> 
> >> > On 18-12-2016 13:09, Pali Rohár wrote:
> >> >
> >> >> File wl1251-nvs.bin is provided by linux-firmware package and contains 
> >> >> default data which should be overriden by model specific calibrated 
> >> >> data.
> >> >
> >> > Ah. Someone thought it was a good idea to provide the "one ring to rule
> >> > them all". Nice.
> >> 
> >> Yes, that was a bad idea. wl1251-nvs.bin in linux-firmware.git should be
> >> renamed to wl1251-nvs.bin.example, or something like that, as it should
> >> be only installed to a real system only if there's no real calibration
> >> data available (only for developers to use, not real users).
> >
> > Makes sense to me. Note that with the recent changes to wlcore, we can
> > now easily provide board specific calibration firmware simply by adding a
> > new compatible value. So for n900, we could have something like
> > compatible = "ti,wl1251-n900" and have it point to n900 specific calibration
> > file wl1251-nvs-n900.bin. Of course this won't help with the mac address,
> > or any of the device specific data..
> >
> > That is assuming the calibration values are the same for each similar
> > device and don't have to be generated for each device. And naturally wl1251
> > needs simlar changes done to make use of devices specific calibration files.
> 
> No, these are unique per each sold device. Every N900 was calibrated at
> the factory and they all have different calibration data which is stored
> to the flash. So when N900 boots (and in _every_ boot) it has to load
> the calibration data from the flash and provide it to the wl1251 driver
> somehow.

Urgh, OK. So much for that idea then.

Thanks,

Tony

* Re: Potential issues (security and otherwise) with the current cgroup-bpf API
From: Andy Lutomirski @ 2016-12-20 17:23 UTC (permalink / raw)
  To: Daniel Mack
  Cc: Alexei Starovoitov, Andy Lutomirski, Mickaël Salaün,
	Kees Cook, Jann Horn, Tejun Heo, David Ahern, David S. Miller,
	Thomas Graf, Michael Kerrisk, Peter Zijlstra, Linux API,
	linux-kernel@vger.kernel.org, Network Development
In-Reply-To: <bdc18fe4-f90d-49f0-d0a6-a28f87335df2@zonque.org>

On Tue, Dec 20, 2016 at 2:21 AM, Daniel Mack <daniel@zonque.org> wrote:
> Hi,
>
> On 12/20/2016 04:50 AM, Andy Lutomirski wrote:
>> You mean BPF_CGROUP_RUN_PROG_INET_SOCK(sk)?  There is nothing bpf
>> specfic about the hook except that the name of this macro has "BPF" in
>> it.  There is nothing whatsoever that's bpf-specific about the context
>> -- sk is not bpf-specific at all.
>>
>> The only thing bpf-specific about it is that it currently only invokes
>> bpf programs.  That could easily change.
>
> I'm not sure if I follow. The code as it currently stands only supports
> attaching bpf programs to cgroups which have been created using
> BPF_PROG_LOAD. If cgroups would support other program types in the
> future, then they would need to be stored in different data types
> anyway, and the bpf syscall multiplexer would be the wrong entry point
> to access them anyway.

To clarify, since this thread has gotten excessively long and twisted,
I think it's important that, for hooks attached to a cgroup, you be
able to tell in a generic way whether something is plugged into the
hook.  The natural way to see a cgroup's configuration is to read from
cgroupfs, so I think that reading from cgroupfs should show you that a
BPF program is attached and also give enough information that, once
bpf programs become dumpable, you can dump the program (using the
bpf() syscall or whatever).

Obviously the interface to *attach* a BPF program to a hook will need
to be at least a little bit BPF-specific.  But there's nothing
particularly BPF-specific about detaching, and if a control file were
to exist, writing "detach" or "none" to it seems natural.

>
> Whether we add bpf-specific code to the cgroup file parsers or
> cgroup-specific code to the bpf layer does not make much of a semantic
> difference, does it? As a matter of fact, my very first implementation
> of this patch set implemented a cgroup controller that would allow
> writing strings like "ingress 5" to its control file, where 5 is the fd
> number that came out of BPF_PROG_LOAD. The main reason we decided to
> ditch that was that echoing fd numbers into a text file seemed way worse
> than going through a proper syscall layer with it, and ioctls are
> unavailable on pseudo-fs.

There isn't a big semantic difference between
'open("/cgroup/NAME/some.control.file", O_WRONLY); ioctl(...,
CGROUP_ATTACH_BPF, ...)' and 'open("/cgroup/NAME/some.control.file",
O_WRONLY); bpf(BPF_PROG_ATTACH, ...);'.  There is, however, a semantic
difference when you do open("/cgroup/NAME", O_RDONLY | O_DIRECTORY)
because the permission check is much weaker.

The reason I suggest ioctl() and not write() is that write() MUST NOT
make its behavior depend on the caller's credentials, file table, etc.
Imagine what would happen if you did
'sudo -u eviltext >/cgroup/NAME/control.file'.  (This particular mistake
has been repeated many times in the kernel, in drivers, networking,
namespaces, core code, etc, and it's resulted in a big pile of privilege
escalation bugs.)  So write("bpf:<actual BPF instructions>") is safe
(but unusably awkward, I think), whereas write("bpf:fd 5") is unsafe.

>
> The idea was rather to allow attaching bpf programs to other things than
> just cgroups as well, which is why we called the member of 'union
> bpf_attr' 'target_fd', and a cgroup is just one type a target here.

I would make that a separate operation.  If someone adds the ability
to attach an ebpf program to, say, seccomp (I'm quite sure this will
happen eventually), it should be attached using seccomp(), not bpf(),
for example).  The people writing seccomp filters will thank you for
making the syscall in question reflect what object (the cgroup, for
example) is being modified.

>
>>> i'm assuming 'baadf00d' is bpf program fd expressed a text string?
>>> and kernel needs to parse above? will you allow capital and lower
>>> case for 'bpf:' ? and mixed case too? spaces and tabs allowed or not?
>>> can program fd expressed as decimal or hex or both?
>>> how do you return the error? as a text string for user space
>>> to parse?
>>
>> No.  The kernel does not parse it because you cannot write this to the
>> file.  You set a bpf filter with ioctl and pass an fd.
>
> An ioctl on what file, exactly?

There are at least two plausible models.

My preference would be to do an ioctl on a new
/cgroup/NAME/network_hooks.inet_ingress file.  Reading that file tells
you whether something is attached and hopefully also gives enough
information (a hash of the BPF program, perhaps) to dump the actual
program using future bpf() interfaces.  write() and ioctl() can be
used to configure it as appropriate.

Another option that I like less would be to have a
/cgroup/NAME/cgroup.bpf that lists all the active hooks along with
their contents.  You would do an ioctl() on that to program a hook and
you could read it to see what's there.

FWIW, everywhere I say ioctl(), the bpf() syscall would be okay, too.
It doesn't make a semantic difference, except that I dislike
BPF_PROG_DETACH because that particular command isn't BPF-specific at
all.

>
>> If you *read*
>> the file, you get the same bpf program hash that fdinfo on the bpf
>> object would show -- this is for debugging and (eventually) CRIU.
>
> We need a debugging facility at some point, I agree to that. As the code
> currently stands, that would rather need to go into the bpf(2) syscall
> though, as setting a program through bpf(2) and reading it through
> cgroupfs is really nasty.

But knowing that you have to call bpf() to tell whether bpf hooks are
installed in a cgroup is nasty.  Everything else uses ls and cat --
why should BPF be special here?

>> So if I set up a cgroup that's monitored and call it /cgroup/a and
>> enable delegation and if the program running there wants to do its own
>> monitoring in /cgroup/a/b (via delegation), then you really want the
>> outer monitor to silently drop events coming from /cgroup/a/b?
>
> That's a fair point, and we've discussed it as well. The issue is, as
> Alexei already pointed out, that we do not want to traverse the tree up
> to the root for nested cgroups due to the runtime costs in the
> networking fast-path. After all, we're running the bpf program for each
> packet in flight. Hence, we opted for the approach to only look at the
> leaf node for now, with the ability to open it up further in the future
> using flags during attach etc.

Careful here!  You don't look only at the leaf node for now.  You do a
fancy traversal and choose the nearest node that has a hook set up.
This gives you almost all the complexity of evaluating all of the
installed hooks with none of the benefit.  It also is, IMO, much more
dangerous than only looking at the leaf node.  Consider:

mkdir /cgroup/foo
BPF_PROG_ATTACH(some program to foo)
mkdir /cgroup/foo/bar
chown -R some_user /cgroup/foo/bar

If the kernel only looked at the leaf, then the program that did the
above would not expect that the program would constrain
/cgroup/foo/bar's activity.  But, as it stands, the program *would*
expect /cgroup/foo/bar to be constrained, except that, whenever the
capable() check changes to ns_capable() (which will happen eventually
one way or another), then the bad guy can create /cgroup/foo/bar/baz,
install a new no-op hook there, and break the security assumption.

IOW, I think that totally non-recursive hooks are okay from a security
perspective, albeit rather strange, but the current design is not okay
from a security perspective.

>
>> The current approach to bpf hooks will bite you down the road.  David
>> Ahern is already proposing using it for something that is not tracing
>> at all, and someone will want that in a container, and there will be a
>> problem.
>
> Hmm, I thought we've sorted out the concerns about that by making sure
> that we
>
> a) lock-down the API sufficiently so it doesn't cause any security
> issues in its current form, and

This argument is why iptables + userns has become a security mess, for
example.  Designing an API assuming that the bad guys will never be
permitted to use it causes quite a bit of pain when, a few years
later, bad guys are permitted to use it.

>
> b) make it possible to extend the functionality in the future by adding
> flags to the command struct etc.
>
> And I hoped we achieved that after discussing it for so long.
>
>> How about slowing down a wee bit and trying to come up with cgroup
>> hook semantics that work for all of these use cases?
>
> I'm all for discussing things, but I don't this was done in a rush.
>
> I do agree though that adding functionality to cgroups that is not
> limited to resource control is a delicate thing to do, which is why I
> cc'ed cgroups@ in my patches. I should have also added linux-api@ I
> guess, sorry I missed that.
>
>> I think my proposal is quite close to workable.
>
> So let's talk about how to proceed. I've seen different bits of your
> proposal in different mails, and I think a summary of it would help the
> discussion.

So here's a fleshed-out possible version that's a bit of a compromise
after sleeping on this.  There's plenty of room to tweak this.

Each cgroup gets a new file cgroup.hooks.  Reading it shows a list of
active hooks.  (A hook can be a string like "network.inet_ingress".)

You can write a command like "-network.inet_ingress off" to it to
disable network.inet_ingress.  You can write a command like
"+network.inet_ingress" to it to enable the network.inet_ingress hook.

When a hook (e.g. network.inet_ingress) is enabled, a new file appears
in the cgroup called "hooks.network.inet_ingress".  You can read it
to get an indication of what is currently installed in that slot.  You
can write "none" to it to cause nothing to be installed in that slot.
(This replaces BPF_PROG_DETACH.)  You can open it for write and use
bpf() or perhaps ioctl() to attach a bpf program.  Maybe you can also
use bpf() to dump the bpf program, but, regardless, if a bpf program
is there, read() will return some string that contains "bpf" and maybe
some other useful information.

If a SELinux policy wants to lock down the network.inet_ingress hook,
it uses existing mechanisms to label the hooks.network.inet_ingress
file when it appears and to restrict opening it for write.

I think this is all quite straightforward to implement and will result
in clean code.  I could probably make some decent progress toward it
over the next couple days.

--Andy
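
The difference between the "nearest node with a hook" lookup and a strictly leaf-only lookup, as discussed in the message above, can be modeled in a few lines. This is a sketch under stated assumptions: struct cg_node, effective_hook() and leaf_only_hook() are hypothetical names for a toy tree, not the kernel's cgroup structures or its actual traversal code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a cgroup node; not the kernel's struct cgroup. */
struct cg_node {
	const char *name;
	struct cg_node *parent;	/* NULL for the root */
	const char *hook;	/* NULL when nothing is attached here */
};

/* "Nearest node wins": walk up until some node has a hook attached. */
static inline const char *effective_hook(const struct cg_node *cg)
{
	for (; cg; cg = cg->parent)
		if (cg->hook)
			return cg->hook;
	return NULL;
}

/* Strictly non-recursive alternative: only the node itself counts. */
static inline const char *leaf_only_hook(const struct cg_node *cg)
{
	return cg->hook;
}
```

With foo carrying a filter and bar and baz nested below it, effective_hook() returns foo's filter for both descendants. As soon as a hook is attached to baz, effective_hook(baz) returns that hook instead and foo's filter no longer applies there, which is the override described in the /cgroup/foo/bar/baz scenario.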

* Re: [PATCH] ethernet: sfc: Add Kconfig entry for vendor Solarflare
From: Edward Cree @ 2016-12-20 17:29 UTC (permalink / raw)
  To: Tobias Klauser, netdev; +Cc: linux-net-drivers, bkenward
In-Reply-To: <20161220133826.1478-1-tklauser@distanz.ch>

On 20/12/16 13:38, Tobias Klauser wrote:
> Since commit
>
>   5a6681e22c14 ("sfc: separate out SFC4000 ("Falcon") support into new sfc-falcon driver")
>
> there are two drivers for Solarflare devices, but both still show up
> directly beneath "Ethernet driver support" in the Kconfig. Follow the
> pattern of other vendors and group them beneath an own vendor Kconfig
> entry for Solarflare.
>
> Cc: Edward Cree <ecree@solarflare.com>
> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Edward Cree <ecree@solarflare.com>

* Re: [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock
From: Jones Desougi @ 2016-12-20 17:58 UTC (permalink / raw)
  To: Geoff Lansberry, linux-wireless-u79uwXL29TY76Z2rM5mHXA
  Cc: lauro.venancio-430g2QfJUUCGglJvpFV4uA,
	aloisio.almeida-430g2QfJUUCGglJvpFV4uA,
	sameo-VuQAYsv1563Yd54FQh9/CA, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	mark.rutland-5wv7dgnIgG8, netdev-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	mgreer-luAo+O/VEmrlveNOaEYElw, justin-R+k406RtEhcAvxtiuMwx3w
In-Reply-To: <1482250592-4268-1-git-send-email-glansberry-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On 2016-12-20 17:16, Geoff Lansberry wrote:
> From: Geoff Lansberry <geoff-R+k406RtEhcAvxtiuMwx3w@public.gmane.org>
> 
> The TRF7970A has configuration options to support hardware designs
> which use a 27.12MHz clock. This commit adds a device tree option
> 'clock-frequency' to support configuring the this chip for default
> 13.56MHz clock or the optional 27.12MHz clock.
> ---
>  .../devicetree/bindings/net/nfc/trf7970a.txt       |  4 ++
>  drivers/nfc/trf7970a.c                             | 50 +++++++++++++++++-----
>  2 files changed, 43 insertions(+), 11 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> index 32b35a0..e262ac1 100644
> --- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> +++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> @@ -21,6 +21,8 @@ Optional SoC Specific Properties:
>  - t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum
>    where an extra byte is returned by Read Multiple Block commands issued
>    to Type 5 tags.
> +- clock-frequency: Set to specify that the input frequency to the trf7970a is 13560000Hz or 27120000Hz
> +
You're adding an empty line here that is removed in the next patch.

>  
>  Example (for ARM-based BeagleBone with TRF7970A on SPI1):
>  
> @@ -43,6 +45,8 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
>  		irq-status-read-quirk;
>  		en2-rf-quirk;
>  		t5t-rmb-extra-byte-quirk;
> +		vdd_io_1v8;
This does not belong here, and so no need to remove in the next patch.

> +		clock-frequency = <27120000>;
>  		status = "okay";
>  	};
>  };
> diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c
> index 26c9dbb..4e051e9 100644
> --- a/drivers/nfc/trf7970a.c
> +++ b/drivers/nfc/trf7970a.c
> @@ -124,6 +124,9 @@
>  		 NFC_PROTO_ISO15693_MASK | NFC_PROTO_NFC_DEP_MASK)
>  
>  #define TRF7970A_AUTOSUSPEND_DELAY		30000 /* 30 seconds */
> +#define TRF7970A_13MHZ_CLOCK_FREQUENCY		13560000
> +#define TRF7970A_27MHZ_CLOCK_FREQUENCY		27120000
> +
>  
>  #define TRF7970A_RX_SKB_ALLOC_SIZE		256
>  
> @@ -1056,12 +1059,11 @@ static int trf7970a_init(struct trf7970a *trf)
>  
>  	trf->chip_status_ctrl &= ~TRF7970A_CHIP_STATUS_RF_ON;
>  
> -	ret = trf7970a_write(trf, TRF7970A_MODULATOR_SYS_CLK_CTRL, 0);
> +	ret = trf7970a_write(trf, TRF7970A_MODULATOR_SYS_CLK_CTRL,
> +			trf->modulator_sys_clk_ctrl);
>  	if (ret)
>  		goto err_out;
>  
> -	trf->modulator_sys_clk_ctrl = 0;
> -
>  	ret = trf7970a_write(trf, TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS,
>  			TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS_WLH_96 |
>  			TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS_WLL_32);
> @@ -1181,27 +1183,37 @@ static int trf7970a_in_config_rf_tech(struct trf7970a *trf, int tech)
>  	switch (tech) {
>  	case NFC_DIGITAL_RF_TECH_106A:
>  		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_14443A_106;
> -		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_OOK;
> +		trf->modulator_sys_clk_ctrl =
> +			(trf->modulator_sys_clk_ctrl & 0xF8) |
> +			TRF7970A_MODULATOR_DEPTH_OOK;
>  		trf->guard_time = TRF7970A_GUARD_TIME_NFCA;
>  		break;
>  	case NFC_DIGITAL_RF_TECH_106B:
>  		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_14443B_106;
> -		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
> +		trf->modulator_sys_clk_ctrl =
> +			(trf->modulator_sys_clk_ctrl & 0xF8) |
> +			TRF7970A_MODULATOR_DEPTH_ASK10;
>  		trf->guard_time = TRF7970A_GUARD_TIME_NFCB;
>  		break;
>  	case NFC_DIGITAL_RF_TECH_212F:
>  		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_FELICA_212;
> -		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
> +		trf->modulator_sys_clk_ctrl =
> +			(trf->modulator_sys_clk_ctrl & 0xF8) |
> +			TRF7970A_MODULATOR_DEPTH_ASK10;
>  		trf->guard_time = TRF7970A_GUARD_TIME_NFCF;
>  		break;
>  	case NFC_DIGITAL_RF_TECH_424F:
>  		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_FELICA_424;
> -		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
> +		trf->modulator_sys_clk_ctrl =
> +			(trf->modulator_sys_clk_ctrl & 0xF8) |
> +			TRF7970A_MODULATOR_DEPTH_ASK10;
>  		trf->guard_time = TRF7970A_GUARD_TIME_NFCF;
>  		break;
>  	case NFC_DIGITAL_RF_TECH_ISO15693:
>  		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_15693_SGL_1OF4_2648;
> -		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_OOK;
> +		trf->modulator_sys_clk_ctrl =
> +			(trf->modulator_sys_clk_ctrl & 0xF8) |
> +			TRF7970A_MODULATOR_DEPTH_OOK;
>  		trf->guard_time = TRF7970A_GUARD_TIME_15693;
>  		break;
>  	default:
> @@ -1571,17 +1583,23 @@ static int trf7970a_tg_config_rf_tech(struct trf7970a *trf, int tech)
>  		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_NFC_NFC_CE_MODE |
>  			TRF7970A_ISO_CTRL_NFC_CE |
>  			TRF7970A_ISO_CTRL_NFC_CE_14443A;
> -		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_OOK;
> +		trf->modulator_sys_clk_ctrl =
> +			(trf->modulator_sys_clk_ctrl & 0xF8) |
> +			TRF7970A_MODULATOR_DEPTH_OOK;
>  		break;
>  	case NFC_DIGITAL_RF_TECH_212F:
>  		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_NFC_NFC_CE_MODE |
>  			TRF7970A_ISO_CTRL_NFC_NFCF_212;
> -		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
> +		trf->modulator_sys_clk_ctrl =
> +			(trf->modulator_sys_clk_ctrl & 0xF8) |
> +			TRF7970A_MODULATOR_DEPTH_ASK10;
>  		break;
>  	case NFC_DIGITAL_RF_TECH_424F:
>  		trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_NFC_NFC_CE_MODE |
>  			TRF7970A_ISO_CTRL_NFC_NFCF_424;
> -		trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
> +		trf->modulator_sys_clk_ctrl =
> +			(trf->modulator_sys_clk_ctrl & 0xF8) |
> +			TRF7970A_MODULATOR_DEPTH_ASK10;
>  		break;
>  	default:
>  		dev_dbg(trf->dev, "Unsupported rf technology: %d\n", tech);
> @@ -1987,6 +2005,7 @@ static int trf7970a_probe(struct spi_device *spi)
>  	struct device_node *np = spi->dev.of_node;
>  	struct trf7970a *trf;
>  	int uvolts, autosuspend_delay, ret;
> +	u32 clk_freq = 13560000;
Use TRF7970A_13MHZ_CLOCK_FREQUENCY here?

>  
>  	if (!np) {
>  		dev_err(&spi->dev, "No Device Tree entry\n");
> @@ -2043,6 +2062,15 @@ static int trf7970a_probe(struct spi_device *spi)
>  		return ret;
>  	}
>  
> +	of_property_read_u32(np, "clock-frequency", &clk_freq);
> +	if ((clk_freq != TRF7970A_27MHZ_CLOCK_FREQUENCY) ||
> +		(clk_freq != TRF7970A_27MHZ_CLOCK_FREQUENCY)) {
Two comparisons with 27MHz, missing 13MHz.

> +		dev_err(trf->dev,
> +			"clock-frequency (%u Hz) unsupported\n",
> +			clk_freq);
> +		return -EINVAL;
> +	}
> +
>  	if (of_property_read_bool(np, "en2-rf-quirk"))
>  		trf->quirks |= TRF7970A_QUIRK_EN2_MUST_STAY_LOW;
>  
> 
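
On the clock-frequency check flagged above: as posted, both operands compare against the 27.12MHz constant, so the condition reduces to "not 27.12MHz" and the default 13.56MHz value is rejected. Swapping one operand for the 13.56MHz constant while keeping "||" would reject every value, since no frequency equals both constants; the two comparisons need "&&". A minimal sketch follows, reusing the two constants from the patch. The function names are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TRF7970A_13MHZ_CLOCK_FREQUENCY 13560000
#define TRF7970A_27MHZ_CLOCK_FREQUENCY 27120000

/* Condition as posted: both operands compare against 27.12 MHz. */
static inline bool rejected_as_posted(uint32_t clk_freq)
{
	return (clk_freq != TRF7970A_27MHZ_CLOCK_FREQUENCY) ||
	       (clk_freq != TRF7970A_27MHZ_CLOCK_FREQUENCY);
}

/* Intended check: reject only when neither supported value matches. */
static inline bool rejected_fixed(uint32_t clk_freq)
{
	return (clk_freq != TRF7970A_13MHZ_CLOCK_FREQUENCY) &&
	       (clk_freq != TRF7970A_27MHZ_CLOCK_FREQUENCY);
}
```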

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock
From: Mark Greer @ 2016-12-20 18:11 UTC (permalink / raw)
  To: Geoff Lansberry
  Cc: linux-wireless, lauro.venancio, aloisio.almeida, sameo, robh+dt,
	mark.rutland, netdev, devicetree, linux-kernel, justin
In-Reply-To: <1482250592-4268-1-git-send-email-glansberry@gmail.com>

Hi Geoff.

Please put the version in your subjects when submitting anything but the
initial version of a patch (e.g., [PATCH v2 1/3]).

Which series do you want reviewed?

Mark
--

* Re: ipv6: handle -EFAULT from skb_copy_bits
From: Dave Jones @ 2016-12-20 18:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20161219.203623.119653805184192345.davem@davemloft.net>

On Mon, Dec 19, 2016 at 08:36:23PM -0500, David Miller wrote:
 > From: Dave Jones <davej@codemonkey.org.uk>
 > Date: Mon, 19 Dec 2016 19:40:13 -0500
 > 
 > > On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote:
 > > 
 > >  > Unfortunately, this made no difference.  I spent some time today trying
 > >  > to make a better reproducer, but failed. I'll revisit again tomorrow.
 > >  > 
 > >  > Maybe I need >1 process/thread to trigger this.  That would explain why
 > >  > I can trigger it with Trinity.
 > > 
 > > scratch that last part, I finally just repro'd it with a single process.
 > 
 > Thanks for the info, I'll try to think about this some more.

I threw in some debug printks right before that BUG_ON.
it's always this:

skb->len=31 skb->data_len=0 offset:30 total_len:9

Shouldn't we have kicked out data_len=0 skb's somewhere before we got this far ?

	Dave

* Re: ipv6: handle -EFAULT from skb_copy_bits
From: David Miller @ 2016-12-20 18:28 UTC (permalink / raw)
  To: davej; +Cc: netdev
In-Reply-To: <20161220181728.dd2cynjwrceruwcu@codemonkey.org.uk>

From: Dave Jones <davej@codemonkey.org.uk>
Date: Tue, 20 Dec 2016 13:17:28 -0500

> On Mon, Dec 19, 2016 at 08:36:23PM -0500, David Miller wrote:
>  > From: Dave Jones <davej@codemonkey.org.uk>
>  > Date: Mon, 19 Dec 2016 19:40:13 -0500
>  > 
>  > > On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote:
>  > > 
>  > >  > Unfortunately, this made no difference.  I spent some time today trying
>  > >  > to make a better reproducer, but failed. I'll revisit again tomorrow.
>  > >  > 
>  > >  > Maybe I need >1 process/thread to trigger this.  That would explain why
>  > >  > I can trigger it with Trinity.
>  > > 
>  > > scratch that last part, I finally just repro'd it with a single process.
>  > 
>  > Thanks for the info, I'll try to think about this some more.
> 
> I threw in some debug printks right before that BUG_ON.
> it's always this:
> 
> skb->len=31 skb->data_len=0 offset:30 total_len:9
> 
> Shouldn't we have kicked out data_len=0 skb's somewhere before we got this far ?

skb->data_len is just the length of any non-linear data in the SKB.

This has to do with the SKB buffer layout and geometry, not whether
the packet is "fragmented" in the protocol sense.

So no, this isn't a criterion for packets being filtered out by this
point.

Can you try to capture what sk->sk_socket->type and
inet_sk(sk)->hdrincl are set to at the time of the crash?

Thanks.
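
The relationship between the two length fields can be written out as a small user-space model. This is a sketch, assuming that a copy routine rejects any range extending past skb->len; struct skb_lens and the helper names are hypothetical stand-ins, not the kernel's sk_buff API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two sk_buff length fields discussed. */
struct skb_lens {
	unsigned int len;	/* total bytes: linear + non-linear */
	unsigned int data_len;	/* bytes held outside the linear area */
};

/* Bytes in the linear area. */
static inline unsigned int headlen(const struct skb_lens *s)
{
	return s->len - s->data_len;
}

/* data_len == 0 only says that everything sits in the linear area. */
static inline bool is_nonlinear(const struct skb_lens *s)
{
	return s->data_len != 0;
}

/* A copy of n bytes at offset fits only if it ends at or before len. */
static inline bool copy_fits(const struct skb_lens *s, int offset, int n)
{
	return offset >= 0 && n >= 0 && offset <= (int)s->len - n;
}
```

With the reported values (len=31, data_len=0), the buffer is fully linear with 31 bytes available, and at offset 30 only a single byte remains, so under this model any copy of two or more bytes from that offset fails the bounds check.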

* Re: kernel/bpf/verifier.c: 4 * possible unintended fallthrough ?
From: Josef Bacik @ 2016-12-20 18:28 UTC (permalink / raw)
  To: David Binderman
  Cc: Alexei Starovoitov, ast@kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <VI1PR08MB10228EE1E2FA0EBFEA78DE919C900@VI1PR08MB1022.eurprd08.prod.outlook.com>

On Tue, Dec 20, 2016 at 11:34 AM, David Binderman <dcb314@hotmail.com> 
wrote:
> 
> Hello there,
> 
>> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>> I've tried 4.9 and 5.2 and don't see this warning.
> 
> As expected - I used a development version of gcc.
> Latest released version is 6.2
> 
>> Is this 6.x gcc?
> 
> 7.0 would be more accurate.
> 
>> I suspect it will have such warnings all over the kernel.
> 
> Indeed it has hundreds, but the subject under discussion is file
> kernel/bpf/verifier.c.
> 
> I am still not sure if I have found a fallthrough bug or not.

You haven't, this is intended so is a useless warning.  Thanks,

Josef
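
For readers unfamiliar with the warning under discussion: newer gcc versions can flag a case label that is reached by falling out of the previous case, and they accept a "fall through" comment as a marker that the behaviour is deliberate. The sketch below is a generic illustration with an invented function, classify(); it is not taken from kernel/bpf/verifier.c.

```c
#include <assert.h>

/* Invented example: case 2 deliberately continues into case 1. */
static inline int classify(int op)
{
	int cost = 0;

	switch (op) {
	case 2:
		cost += 10;
		/* fall through */
	case 1:
		cost += 1;
		break;
	default:
		cost = -1;
		break;
	}
	return cost;
}
```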

* Re: [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock
From: Geoff Lansberry @ 2016-12-20 18:29 UTC (permalink / raw)
  To: Mark Greer
  Cc: linux-wireless, Lauro Ramos Venancio, Aloisio Almeida Jr,
	Samuel Ortiz, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	mark.rutland-5wv7dgnIgG8, netdev-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Justin Bronder
In-Reply-To: <20161220181157.GB30273-luAo+O/VEmrlveNOaEYElw@public.gmane.org>

On Tue, Dec 20, 2016 at 1:11 PM, Mark Greer <mgreer-luAo+O/VEmrlveNOaEYElw@public.gmane.org> wrote:
> Hi Geoff.
>
> Please put the version in your subjects when submitting anything but the
> initial version of a patch (e.g., [PATCH v2 1/3]).
>
> Which series do you want reviewed?
>
> Mark
> --
Sorry about the double posting, I had forgotten to erase the patches I
generated while rebasing and checking, and I'll have to figure out how
to add that v2 line to the automatically generated subject line if I
end up submitting another round.

Please review the three most recent patches, which have the send time
of 17:16.

Best Regards,
Geoff

^ permalink raw reply

* Re: [PATCH net-next 1/1] driver: ipvlan: Define common functions to decrease duplicated codes used to add or del IP address
From: David Miller @ 2016-12-20 18:30 UTC (permalink / raw)
  To: fgao; +Cc: maheshb, edumazet, netdev, gfree.wind
In-Reply-To: <1482110645-1853-1-git-send-email-fgao@ikuai8.com>

From: fgao@ikuai8.com
Date: Mon, 19 Dec 2016 09:24:05 +0800

>  It is sent again because the first email is sent during net-next closing.

It is still closed, and will not open again for at least one week.

^ permalink raw reply

* Re: mlx4: Bug in XDP_TX + 16 rx-queues
From: Martin KaFai Lau @ 2016-12-20 18:31 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Tariq Toukan, Saeed Mahameed, netdev@vger.kernel.org,
	Alexei Starovoitov
In-Reply-To: <a586ecb7-e290-8c70-714b-2eea0b94c7fb@gmail.com>

On Tue, Dec 20, 2016 at 02:02:05PM +0200, Tariq Toukan wrote:
> Thanks Martin, nice catch!
>
>
> On 20/12/2016 1:37 AM, Martin KaFai Lau wrote:
> >Hi Tariq,
> >
> >On Sat, Dec 17, 2016 at 02:18:03AM -0800, Martin KaFai Lau wrote:
> >>Hi All,
> >>
> >>I have been debugging with XDP_TX and 16 rx-queues.
> >>
> >>1) When 16 rx-queues is used and an XDP prog is doing XDP_TX,
> >>it seems that the packet cannot be XDP_TX out if the pkt
> >>is received from some particular CPUs (/rx-queues).
> >>
> >>2) If 8 rx-queues is used, it does not have problem.
> >>
> >>3) The 16 rx-queues problem also went away after reverting these
> >>two patches:
> >>15fca2c8eb41 net/mlx4_en: Add ethtool statistics for XDP cases
> >>67f8b1dcb9ee net/mlx4_en: Refactor the XDP forwarding rings scheme
> >>
> >After taking a closer look at 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings scheme")
> >and armed with the fact that '>8 rx-queues does not work', I have
> >made the attached change that fixed the issue.
> >
> >Making change in mlx4_en_fill_qp_context() could be an easier fix
> >but I think this change will be easier for discussion purpose.
> >
> >I don't want to lie that I know anything about how this variable
> >works in CX3.  If this change makes sense, I can cook up a diff.
> >Otherwise, can you shed some light on what could be happening
> >and hopefully can lead to a diff?
> >
> >Thanks
> >--Martin
> >
> >
> >diff --git i/drivers/net/ethernet/mellanox/mlx4/en_netdev.c w/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> >index bcd955339058..b3bfb987e493 100644
> >--- i/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> >+++ w/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> >@@ -1638,10 +1638,10 @@ int mlx4_en_start_port(struct net_device *dev)
> >
> >  	/* Configure tx cq's and rings */
> >  	for (t = 0 ; t < MLX4_EN_NUM_TX_TYPES; t++) {
> >-		u8 num_tx_rings_p_up = t == TX ? priv->num_tx_rings_p_up : 1;
> The bug lies in this line.
> Number of rings per UP in case of TX_XDP should be
> priv->tx_ring_num[TX_XDP], not 1.
> Please try the following fix.
> I can prepare and send it once the window opens again.
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index bcd955339058..edbe200ac2fa 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -1638,7 +1638,8 @@ int mlx4_en_start_port(struct net_device *dev)
>
>         /* Configure tx cq's and rings */
>         for (t = 0 ; t < MLX4_EN_NUM_TX_TYPES; t++) {
> -               u8 num_tx_rings_p_up = t == TX ? priv->num_tx_rings_p_up : 1;
> +               u8 num_tx_rings_p_up = t == TX ?
> +                       priv->num_tx_rings_p_up : priv->tx_ring_num[t];
>
>                 for (i = 0; i < priv->tx_ring_num[t]; i++) {
>                         /* Configure cq */
>
Thanks for confirming the bug is related to the user_prio argument.

I have some questions:

1. Just to confirm the intention of the change.  Your change is essentially
   always passing 0 to the user_prio parameter for the TX_XDP type by
   doing (i / priv->tx_ring_num[t])?  If yes, would it be clearer to
   always pass 0 instead?

   And yes, it also works in our test.  Please post an official patch
   if it is the fix.

2. Can you explain a little how user_prio affects the tx behavior?
   e.g. What is the difference between different user_prio values
   like 0, 1, 2...etc?

3. Mostly a follow up on (2).
   In mlx4_en_get_profile(), num_tx_rings_p_up (of the struct mlx4_en_profile)
   depends on mlx4_low_memory_profile() and the number of cpus.  Do
   similar bounds apply to the 'u8 num_tx_rings_p_up' here for the
   TX_XDP type?
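
The arithmetic behind question (1) can be sketched outside the driver.
This is a stand-alone illustration, not mlx4 code: the helper names are
made up, and the 0..7 range for user_prio is an assumption inferred from
the 0x07 mask in the earlier diff, not something confirmed in this thread.

```c
#include <assert.h>

/* Assumption for this sketch: valid user priorities are 0..7. */
#define SKETCH_MAX_USER_PRIO 7

/* The expression passed to mlx4_en_activate_tx_ring(): i / rings-per-UP. */
int user_prio_div(int ring, int rings_per_up)
{
	assert(rings_per_up > 0);
	return ring / rings_per_up;
}

/* The masking variant from the earlier diff: i & 0x07. */
int user_prio_mask(int ring)
{
	return ring & 0x07;
}

/* Largest user_prio produced by the division over rings 0..nr_rings-1. */
int max_user_prio_div(int nr_rings, int rings_per_up)
{
	int i, max = 0;

	for (i = 0; i < nr_rings; i++) {
		int up = user_prio_div(i, rings_per_up);

		if (up > max)
			max = up;
	}
	return max;
}
```

With a divisor of 1, 8 rings stay within 0..7 while 16 rings reach 15;
with the divisor set to the ring count, every ring maps to 0, which would
be consistent with the "more than 8 rx-queues" symptom if the assumed
range holds.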

Thanks,
Martin

> >-
> >  		for (i = 0; i < priv->tx_ring_num[t]; i++) {
> >  			/* Configure cq */
> >+			int user_prio;
> >+
> >  			cq = priv->tx_cq[t][i];
> >  			err = mlx4_en_activate_cq(priv, cq, i);
> >  			if (err) {
> >@@ -1660,9 +1660,14 @@ int mlx4_en_start_port(struct net_device *dev)
> >
> >  			/* Configure ring */
> >  			tx_ring = priv->tx_ring[t][i];
> >+			if (t != TX_XDP)
> >+				user_prio = i / priv->num_tx_rings_p_up;
> >+			else
> >+				user_prio = i & 0x07;
> >+
> >  			err = mlx4_en_activate_tx_ring(priv, tx_ring,
> >  						       cq->mcq.cqn,
> >-						       i / num_tx_rings_p_up);
> >+						       user_prio);
> >  			if (err) {
> >  				en_err(priv, "Failed allocating Tx ring\n");
> >  				mlx4_en_deactivate_cq(priv, cq);
> Regards,
> Tariq Toukan.

^ permalink raw reply

* Re: [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock
From: Mark Greer @ 2016-12-20 18:35 UTC (permalink / raw)
  To: Geoff Lansberry
  Cc: linux-wireless, Lauro Ramos Venancio, Aloisio Almeida Jr,
	Samuel Ortiz, robh+dt, mark.rutland, netdev, devicetree,
	linux-kernel, Justin Bronder
In-Reply-To: <CAO7Z3WLYMZ3rRGKpksYnRksbC=A1WNog913UJFiPsAT7ymOnaw@mail.gmail.com>

On Tue, Dec 20, 2016 at 01:29:13PM -0500, Geoff Lansberry wrote:
> On Tue, Dec 20, 2016 at 1:11 PM, Mark Greer <mgreer@animalcreek.com> wrote:
> > Hi Geoff.
> >
> > Please put the version in your subjects when submitting anything but the
> > initial version of a patch (e.g., [PATCH v2 1/3]).
> >
> > Which series do you want reviewed?
> >
> > Mark
> > --
> Sorry about the double posting, I had forgotten to erase the patches I
> generated while rebasing and checking, and I'll have to figure out how
> to add that v2 line to the automatically generated subject line if I
> end up submitting another round.

Hint: -v <n> option of 'git format-patch'

> Please review the three most recent patches, which have the send time
> of 17:16.

Okay, thanks.

Mark

^ permalink raw reply

* Re: Potential issues (security and otherwise) with the current cgroup-bpf API
From: Daniel Mack @ 2016-12-20 18:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alexei Starovoitov, Andy Lutomirski, Mickaël Salaün,
	Kees Cook, Jann Horn, Tejun Heo, David Ahern, David S. Miller,
	Thomas Graf, Michael Kerrisk, Peter Zijlstra, Linux API,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Network Development
In-Reply-To: <CALCETrXyp2ddf4HRsEoN=qEwTBaezOUX2XWj6nxPcbc4t13svw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Hi,

On 12/20/2016 06:23 PM, Andy Lutomirski wrote:
> On Tue, Dec 20, 2016 at 2:21 AM, Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org> wrote:

> To clarify, since this thread has gotten excessively long and twisted,
> I think it's important that, for hooks attached to a cgroup, you be
> able to tell in a generic way whether something is plugged into the
> hook.  The natural way to see a cgroup's configuration is to read from
> cgroupfs, so I think that reading from cgroupfs should show you that a
> BPF program is attached and also give enough information that, once
> bpf programs become dumpable, you can dump the program (using the
> bpf() syscall or whatever).

[...]

> There isn't a big semantic difference between
> 'open("/cgroup/NAME/some.control.file", O_WRONLY); ioctl(...,
> CGROUP_ATTACH_BPF, ...)' and 'open("/cgroup/NAME/some.control.file",
> O_WRONLY); bpf(BPF_PROG_ATTACH, ...);'.  There is, however, a semantic
> difference when you do open("/cgroup/NAME", O_RDONLY | O_DIRECTORY)
> because the permission check is much weaker.

Okay, if you have such a control file, you can of course do something
like that. When we discussed things back then with Tejun however, we
concluded that a controller that is not completely controllable through
control knobs that can be written and read via cat is meaningless.
That's why this has become a 'hidden' cgroup feature.

With your proposed API, you'd first go to the bpf(2) syscall in order to
get a prog fd, and then come back to some sort of cgroup API to put the
fd in there. That's quite a mix and match, which is why we considered
the API cleaner in its current form, as everything that is related to
bpf is encapsulated behind a single syscall.

> My preference would be to do an ioctl on a new
> /cgroup/NAME/network_hooks.inet_ingress file.  Reading that file tells
> you whether something is attached and hopefully also gives enough
> information (a hash of the BPF program, perhaps) to dump the actual
> program using future bpf() interfaces.  write() and ioctl() can be
> used to configure it as appropriate.

So am I reading this right? You're proposing to add ioctl() hooks to
kernfs/cgroupfs? That would open more possibilities of course, but I'm
not sure where that rabbit hole leads us eventually.

> Another option that I like less would be to have a
> /cgroup/NAME/cgroup.bpf that lists all the active hooks along with
> their contents.  You would do an ioctl() on that to program a hook and
> you could read it to see what's there.

Yes, read() could, in theory, give you similar information to ioctl(),
but in human-readable form.

> FWIW, everywhere I say ioctl(), the bpf() syscall would be okay, too.
> It doesn't make a semantic difference, except that I dislike
> BPF_PROG_DETACH because that particular command isn't BPF-specific at
> all.

Well, I think it is; it pops the bpf program from a target and drops the
reference on it. It's not much code, but it's certainly bpf-specific.

>>> So if I set up a cgroup that's monitored and call it /cgroup/a and
>>> enable delegation and if the program running there wants to do its own
>>> monitoring in /cgroup/a/b (via delegation), then you really want the
>>> outer monitor to silently drop events coming from /cgroup/a/b?
>>
>> That's a fair point, and we've discussed it as well. The issue is, as
>> Alexei already pointed out, that we do not want to traverse the tree up
>> to the root for nested cgroups due to the runtime costs in the
>> networking fast-path. After all, we're running the bpf program for each
>> packet in flight. Hence, we opted for the approach to only look at the
>> leaf node for now, with the ability to open it up further in the future
>> using flags during attach etc.
> 
> Careful here!  You don't look only at the leaf node for now.  You do a
> fancy traversal and choose the nearest node that has a hook set up.

But we do the 'complex' operation at attach time or when a cgroup is
created, both of which are slow-path operations. In the fast-path, we
only look at the leaf, which may or may not have an effective program
installed. And that's of course much cheaper than doing the traversal
for each packet.
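
The split between slow path and fast path described above can be modelled
in a few lines. This is a toy sketch only: the struct and function names
are invented for illustration and are not the kernel's data structures.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

/* Toy stand-ins; not the kernel's struct bpf_prog or struct cgroup. */
struct toy_prog { int id; };

struct toy_cgroup {
	struct toy_cgroup *parent;
	struct toy_cgroup *children[TOY_MAX_CHILDREN];
	int nr_children;
	struct toy_prog *prog;       /* program attached directly here */
	struct toy_prog *effective;  /* nearest program at or above here */
};

/* Slow path: recompute 'effective' for cg and its whole subtree. */
static void toy_update_effective(struct toy_cgroup *cg)
{
	int i;

	if (cg->prog)
		cg->effective = cg->prog;
	else
		cg->effective = cg->parent ? cg->parent->effective : NULL;

	for (i = 0; i < cg->nr_children; i++)
		toy_update_effective(cg->children[i]);
}

/* Slow path: link a new child, which inherits the parent's program. */
void toy_mkdir(struct toy_cgroup *parent, struct toy_cgroup *child)
{
	assert(parent->nr_children < TOY_MAX_CHILDREN);
	child->parent = parent;
	child->nr_children = 0;
	child->prog = NULL;
	parent->children[parent->nr_children++] = child;
	toy_update_effective(child);
}

/* Slow path: attach (or detach with prog == NULL) and propagate down. */
void toy_attach(struct toy_cgroup *cg, struct toy_prog *prog)
{
	cg->prog = prog;
	toy_update_effective(cg);
}

/* Fast path: a single pointer read at the leaf, no tree walk. */
struct toy_prog *toy_fast_path(const struct toy_cgroup *leaf)
{
	return leaf->effective;
}
```

In this model a program attached lower in the tree replaces the inherited
one for that subtree, which is the "nearest node that has a hook" behaviour
mentioned in the quoted text.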

> mkdir /cgroup/foo
> BPF_PROG_ATTACH(some program to foo)
> mkdir /cgroup/foo/bar
> chown -R some_user /cgroup/foo/bar
> 
> If the kernel only looked at the leaf, then the program that did the
> above would not expect that the program would constrain
> /cgroup/foo/bar's activity.  But, as it stands, the program *would*
> expect /cgroup/foo/bar to be constrained, except that, whenever the
> capable() check changes to ns_capable() (which will happen eventually
> one way or another), then the bad guy can create /cgroup/foo/bar/baz,
> install a new no-op hook there, and break the security assumption.
> 
> IOW, I think that totally non-recursive hooks are okay from a security
> perspective, albeit rather strange, but the current design is not okay
> from a security perspective.

We locked down the ability to override any of these programs with
CAP_NET_ADMIN, which is also what it takes to flush iptables, right?
What's the difference?

> So here's a fleshed-out possible version that's a bit of a compromise
> after sleeping on this.  There's plenty of room to tweak this.
> 
> Each cgroup gets a new file cgroup.hooks.  Reading it shows a list of
> active hooks.  (A hook can be a string like "network.inet_ingress".)
> 
> You can write a command like "-network.inet_ingress off" to it to
> disable network.inet_ingress.  You can write a command like
> "+network.inet_ingress" to it to enable the network.inet_ingress hook.
> 
> When a hook (e.g. network.inet_ingress) is enabled, a new file appears
> in the cgroup called "hooks.network.inet_ingress").  You can read it
> to get an indication of what is currently installed in that slot.  You
> can write "none" to it to cause nothing to be installed in that slot.
> (This replaces BPF_PROG_DETACH.).  You can open it for write and use
> bpf() or perhaps ioctl() to attach a bpf program.  Maybe you can also
> use bpf() to dump the bpf program, but, regardless, if a bpf program
> is there, read() will return some string that contains "bpf" and maybe
> some other useful information.

I can see where you're going, but I don't know yet if I like this
approach better, given that you would still need a binary interface at
least at attach time, and that such an interface would use a resource
returned from bpf(2). The ability to read from control files in order to
see what's going on is nice though.

I'd like to have Tejun's and Alexei's opinion on this - as I said, I had
something like that (albeit much simpler) in one of my very early
drafts, but we agreed to do the hookup the other way around, for
stated reasons.


Thanks,
Daniel

^ permalink raw reply

* Re: Potential issues (security and otherwise) with the current cgroup-bpf API
From: Andy Lutomirski @ 2016-12-20 18:49 UTC (permalink / raw)
  To: Daniel Mack
  Cc: Alexei Starovoitov, Andy Lutomirski, Mickaël Salaün,
	Kees Cook, Jann Horn, Tejun Heo, David Ahern, David S. Miller,
	Thomas Graf, Michael Kerrisk, Peter Zijlstra, Linux API,
	linux-kernel@vger.kernel.org, Network Development
In-Reply-To: <9e378fb1-23ff-a239-d915-3d9aa26a999e@zonque.org>

On Tue, Dec 20, 2016 at 10:36 AM, Daniel Mack <daniel@zonque.org> wrote:
> Hi,
>
> On 12/20/2016 06:23 PM, Andy Lutomirski wrote:
>> On Tue, Dec 20, 2016 at 2:21 AM, Daniel Mack <daniel@zonque.org> wrote:
>
>> To clarify, since this thread has gotten excessively long and twisted,
>> I think it's important that, for hooks attached to a cgroup, you be
>> able to tell in a generic way whether something is plugged into the
>> hook.  The natural way to see a cgroup's configuration is to read from
>> cgroupfs, so I think that reading from cgroupfs should show you that a
>> BPF program is attached and also give enough information that, once
>> bpf programs become dumpable, you can dump the program (using the
>> bpf() syscall or whatever).
>
> [...]
>
>> There isn't a big semantic difference between
>> 'open("/cgroup/NAME/some.control.file", O_WRONLY); ioctl(...,
>> CGROUP_ATTACH_BPF, ...)' and 'open("/cgroup/NAME/some.control.file",
>> O_WRONLY); bpf(BPF_PROG_ATTACH, ...);'.  There is, however, a semantic
>> difference when you do open("/cgroup/NAME", O_RDONLY | O_DIRECTORY)
>> because the permission check is much weaker.
>
> Okay, if you have such a control file, you can of course do something
> like that. When we discussed things back then with Tejun however, we
> concluded that a controller that is not completely controllable through
> control knobs that can be written and read via cat is meaningless.
> That's why this has become a 'hidden' cgroup feature.
>
> With your proposed API, you'd first go to the bpf(2) syscall in order to
> get a prog fd, and then come back to some sort of cgroup API to put the
> fd in there. That's quite a mix and match, which is why we considered
> the API cleaner in its current form, as everything that is related to
> bpf is encapsulated behind a single syscall.

You already have to do bpf() to get a prog fd, then open() to get a
cgroup fd, then bpf() or ioctl() to attach, so this isn't much
different, and it's exactly the same number of syscalls.

>
>> My preference would be to do an ioctl on a new
>> /cgroup/NAME/network_hooks.inet_ingress file.  Reading that file tells
>> you whether something is attached and hopefully also gives enough
>> information (a hash of the BPF program, perhaps) to dump the actual
>> program using future bpf() interfaces.  write() and ioctl() can be
>> used to configure it as appropriate.
>
> So am I reading this right? You're proposing to add ioctl() hooks to
> kernfs/cgroupfs? That would open more possibilities of course, but I'm
> not sure where that rabbit hole leads us eventually.

Indeed.  I already have a test patch to add ioctl() to kernfs.  Adding
it to cgroupfs shouldn't be much more complicated.

>
>> Another option that I like less would be to have a
>> /cgroup/NAME/cgroup.bpf that lists all the active hooks along with
>> their contents.  You would do an ioctl() on that to program a hook and
>> you could read it to see what's there.
>
> Yes, read() could, in theory, give you similar information to ioctl(),
> but in human-readable form.
>
>> FWIW, everywhere I say ioctl(), the bpf() syscall would be okay, too.
>> It doesn't make a semantic difference, except that I dislike
>> BPF_PROG_DETACH because that particular command isn't BPF-specific at
>> all.
>
> Well, I think it is; it pops the bpf program from a target and drops the
> reference on it. It's not much code, but it's certainly bpf-specific.

I mean the interface isn't bpf-specific.  If there was something that
wasn't bpf attached to the target, you'd still want an API to detach
it.

>
>>>> So if I set up a cgroup that's monitored and call it /cgroup/a and
>>>> enable delegation and if the program running there wants to do its own
>>>> monitoring in /cgroup/a/b (via delegation), then you really want the
>>>> outer monitor to silently drop events coming from /cgroup/a/b?
>>>
>>> That's a fair point, and we've discussed it as well. The issue is, as
>>> Alexei already pointed out, that we do not want to traverse the tree up
>>> to the root for nested cgroups due to the runtime costs in the
>>> networking fast-path. After all, we're running the bpf program for each
>>> packet in flight. Hence, we opted for the approach to only look at the
>>> leaf node for now, with the ability to open it up further in the future
>>> using flags during attach etc.
>>
>> Careful here!  You don't look only at the leaf node for now.  You do a
>> fancy traversal and choose the nearest node that has a hook set up.
>
> But we do the 'complex' operation at attach time or when a cgroup is
> created, both of which are slow-path operations. In the fast-path, we
> only look at the leaf, which may or may not have an effective program
> installed. And that's of course much cheaper than doing the traversal
> for each packet.

You would never traverse the full hierarchy for each packet.  You'd
have a linked list of programs that are attached, kind of like how
there's an "effective" array right now.  I sent out pseudocode earlier
in the thread.
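
A rough model of the linked-list idea, for readers who do not have the
earlier pseudocode at hand. This is a hypothetical sketch and not that
pseudocode: all names are invented, programs are plain function pointers,
and for brevity it assumes a hook is attached before any children are
created (a fuller version would refresh descendant lists on attach).

```c
#include <assert.h>
#include <stddef.h>

/* Toy program: returns nonzero to allow a packet of the given length. */
typedef int (*chain_filter_fn)(int pkt_len);

struct chain_link {
	chain_filter_fn fn;
	const struct chain_link *next;  /* nearest ancestor's hook, if any */
};

struct chain_cgroup {
	const struct chain_link *chain; /* head of the precomputed list */
	struct chain_link own;          /* storage for a hook attached here */
};

/* Slow path: a new child starts with the parent's list unchanged. */
void chain_mkdir(const struct chain_cgroup *parent, struct chain_cgroup *child)
{
	child->chain = parent ? parent->chain : NULL;
	child->own.fn = NULL;
	child->own.next = NULL;
}

/* Slow path: attaching prepends to the inherited list, it never replaces. */
void chain_attach(struct chain_cgroup *cg, chain_filter_fn fn)
{
	assert(fn != NULL);
	assert(cg->own.fn == NULL);     /* sketch: one hook per cgroup */
	cg->own.fn = fn;
	cg->own.next = cg->chain;
	cg->chain = &cg->own;
}

/* Fast path: run each program on the list; any denial drops the packet. */
int chain_run(const struct chain_cgroup *cg, int pkt_len)
{
	const struct chain_link *l;

	for (l = cg->chain; l; l = l->next)
		if (!l->fn(pkt_len))
			return 0;
	return 1;
}

/* Two example filters for trying the sketch out. */
int chain_allow_all(int pkt_len) { (void)pkt_len; return 1; }
int chain_max_100(int pkt_len) { return pkt_len <= 100; }
```

The per-packet cost is proportional to the number of attached programs on
the path, not to the depth of the hierarchy, and a no-op hook installed in
a nested cgroup cannot switch off a filter attached further up.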

>
>> mkdir /cgroup/foo
>> BPF_PROG_ATTACH(some program to foo)
>> mkdir /cgroup/foo/bar
>> chown -R some_user /cgroup/foo/bar
>>
>> If the kernel only looked at the leaf, then the program that did the
>> above would not expect that the program would constrain
>> /cgroup/foo/bar's activity.  But, as it stands, the program *would*
>> expect /cgroup/foo/bar to be constrained, except that, whenever the
>> capable() check changes to ns_capable() (which will happen eventually
>> one way or another), then the bad guy can create /cgroup/foo/bar/baz,
>> install a new no-op hook there, and break the security assumption.
>>
>> IOW, I think that totally non-recursive hooks are okay from a security
>> perspective, albeit rather strange, but the current design is not okay
>> from a security perspective.
>
> We locked down the ability to override any of these programs with
> CAP_NET_ADMIN, which is also what it takes to flush iptables, right?
> What's the difference?

For iptables, it's ns_capable() now, and there have been a number of
holes in it.  For cgroup, it's going to turn into ns_capable() sooner
or later, and it would be nice to be ready for it.

--Andy

^ permalink raw reply

* Re: [PATCH perf/core REBASE 3/5] tools lib bpf: Add bpf_prog_{attach,detach}
From: Joe Stringer @ 2016-12-20 18:50 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: LKML, netdev, Wang Nan, ast, Daniel Borkmann
In-Reply-To: <20161220143217.GC32756@kernel.org>

On 20 December 2016 at 06:32, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> Em Tue, Dec 20, 2016 at 11:18:51AM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Wed, Dec 14, 2016 at 02:43:40PM -0800, Joe Stringer escreveu:
>> > Commit d8c5b17f2bc0 ("samples: bpf: add userspace example for attaching
>> > eBPF programs to cgroups") added these functions to samples/libbpf, but
>> > during this merge all of the samples libbpf functionality is shifting to
>> > tools/lib/bpf. Shift these functions there.
>> >
>> > Signed-off-by: Joe Stringer <joe@ovn.org>
>> > ---
>> > Arnaldo, this is a new patch you didn't previously review which I've
>> > prepared due to the conflict with net-next. I figured it's better to try
>> > to get samples/bpf properly switched over this window rather than defer the
>> > problem and end up having to deal with another merge problem next time
>> > around. I hope that is fine for you. If not, this patch onwards will need
>> > to be dropped
>> >
>> > It's a simple copy/paste/delete with a minor change for sys_bpf() vs
>> > syscall().
>> > ---
>> >  samples/bpf/libbpf.c | 21 ---------------------
>> >  samples/bpf/libbpf.h |  3 ---
>> >  tools/lib/bpf/bpf.c  | 21 +++++++++++++++++++++
>> >  tools/lib/bpf/bpf.h  |  3 +++
>> >  4 files changed, 24 insertions(+), 24 deletions(-)
>> >
>> > diff --git a/samples/bpf/libbpf.c b/samples/bpf/libbpf.c
>> > index 3391225ad7e9..d9af876b4a2c 100644
>> > --- a/samples/bpf/libbpf.c
>> > +++ b/samples/bpf/libbpf.c
>> > @@ -11,27 +11,6 @@
>> >  #include <arpa/inet.h>
>> >  #include "libbpf.h"
>> >
>> > -int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type)
>> > -{
>> > -   union bpf_attr attr = {
>> > -           .target_fd = target_fd,
>> > -           .attach_bpf_fd = prog_fd,
>> > -           .attach_type = type,
>> > -   };
>> > -
>> > -   return syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
>>
>> This one makes it fail for CentOS 5 and 6, others may fail as well,
>> still building, investigating...
>
> Ok, fixed it by making it follow the model of the other sys_bpf wrappers
> setting up that bpf_attr union wrt initializing unnamed struct members:
>
>  int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type)
>  {
> -       union bpf_attr attr = {
> -               .target_fd = target_fd,
> -               .attach_bpf_fd = prog_fd,
> -               .attach_type = type,
> -       };
> +       union bpf_attr attr;
> +
> +       bzero(&attr, sizeof(attr));
> +       attr.target_fd     = target_fd;
> +       attr.attach_bpf_fd = prog_fd;
> +       attr.attach_type   = type;
>
>         return sys_bpf(BPF_PROG_ATTACH, &attr, sizeof(attr));
>  }

Ah, I just shifted these across originally so the delta would be
minimal but now I know why this code is like this. Thanks.
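
The zero-then-assign pattern from the diff above can be tried in isolation.
The union below is a mock written for this illustration, not the real
union bpf_attr from <linux/bpf.h>; only the three attach fields are
mirrored, and memset() stands in for bzero(). The likely reason the
designated initializer failed on older distributions is that older GCC
releases do not accept initializers naming members of an anonymous struct,
but that is an assumption and is not stated in the thread.

```c
#include <assert.h>
#include <string.h>

/* Mock of a bpf_attr-like union with anonymous struct members. */
union mock_attr {
	struct {                /* anonymous: fields of some other command */
		unsigned int map_type;
		unsigned int key_size;
	};
	struct {                /* anonymous: fields of the attach command */
		unsigned int target_fd;
		unsigned int attach_bpf_fd;
		unsigned int attach_type;
	};
	unsigned char pad[48];  /* makes the union larger than either struct */
};

/* Zero the whole union first, then assign the members one by one. */
void mock_fill_attach(union mock_attr *attr, int prog_fd, int target_fd,
		      unsigned int type)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));  /* same effect as bzero() */
	attr->target_fd     = target_fd;
	attr->attach_bpf_fd = prog_fd;
	attr->attach_type   = type;
}
```

Zeroing the full union also clears the bytes beyond the attach fields, so
nothing uninitialized is handed to the syscall wrapper.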

^ permalink raw reply

* Re: [PATCH net 0/3] Fix integration of eee-broken-modes
From: David Miller @ 2016-12-20 18:51 UTC (permalink / raw)
  To: jbrunet
  Cc: netdev, devicetree, f.fainelli, carlo, khilman,
	martin.blumenstingl, neolynx, andrew, narmstrong, linux-amlogic,
	linux-arm-kernel, linux-kernel, julia.lawall, yegorslists,
	afaerber
In-Reply-To: <1482159938-13239-1-git-send-email-jbrunet@baylibre.com>

From: Jerome Brunet <jbrunet@baylibre.com>
Date: Mon, 19 Dec 2016 16:05:35 +0100

> The purpose of this series is to fix the integration of the ethernet phy
> property "eee-broken-modes" [0]
> 
> The v3 of this series has been merged, missing a fix (error reported by
> kbuild robot) available in the v4 [1]
> 
> More importantly, Florian opposed adding a DT property mapping a device
> register this directly [2]. The concern was that the property could be
> abused to implement platform configuration policy. After discussing it,
> I think we agreed that such information about the HW (defect) should appear
> in the platform DT. However, the preferred way is to add a boolean property
> for each EEE broken mode.
> 
> [0]: http://lkml.kernel.org/r/1480326409-25419-1-git-send-email-jbrunet@baylibre.com
> [1]: http://lkml.kernel.org/r/1480348229-25672-1-git-send-email-jbrunet@baylibre.com
> [2]: http://lkml.kernel.org/r/e14a3b0c-dc34-be14-48b3-518a0ad0c080@gmail.com

Series applied, thank you.

^ permalink raw reply

* Re: Soft lockup in tc_classify
From: Shahar Klein @ 2016-12-20  6:22 UTC (permalink / raw)
  To: Cong Wang
  Cc: shahark, Or Gerlitz, Daniel Borkmann, Linux Netdev List,
	Roi Dayan, David Miller, Jiri Pirko, John Fastabend,
	Hadar Hen Zion
In-Reply-To: <CAM_iQpXUQYvvXonEXe0czd4osL5YxZ+G5B-PUddautcHnGOtQw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3353 bytes --]



On 12/19/2016 7:58 PM, Cong Wang wrote:
> Hello,
>
> On Mon, Dec 19, 2016 at 8:39 AM, Shahar Klein <shahark@mellanox.com> wrote:
>>
>>
>> On 12/13/2016 12:51 AM, Cong Wang wrote:
>>>
>>> On Mon, Dec 12, 2016 at 1:18 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>>
>>>> On Mon, Dec 12, 2016 at 3:28 PM, Daniel Borkmann <daniel@iogearbox.net>
>>>> wrote:
>>>>
>>>>> Note that there's still the RCU fix missing for the deletion race that
>>>>> Cong will still send out, but you say that the only thing you do is to
>>>>> add a single rule, but no other operation in involved during that test?
>>>>
>>>>
>>>> What's missing to have the deletion race fixed? making a patch or
>>>> testing to a patch which was sent?
>>>
>>>
>>> If you think it would help for this problem, here is my patch rebased
>>> on the latest net-next.
>>>
>>> Again, I don't see how it could help this case yet, especially I don't
>>> see how we could have a loop in this singly linked list.
>>>
>>
>> I've applied cong's patch and hit a different lockup(full log attached):
>
>
> Are you sure this is really different? For me, it is still inside the loop
> in tc_classify(), with only a slightly different offset.
>
>
>>
>> Daniel suggested I'll add a print:
>>                 case RTM_DELTFILTER:
>> -                   err = tp->ops->delete(tp, fh);
>> +                 printk(KERN_ERR "DEBUGG:SK %s:%d\n", __func__, __LINE__);
>> +                 err = tp->ops->delete(tp, fh, &last);
>>                         if (err == 0) {
>>
>> and I couldn't see this print in the output.....
>
> Hmm, that is odd, if this never prints, then my patch should not make any
> difference.
>
> There are still two other cases where we could change tp->next, so do you
> mind to add two more printk's for debugging?
>
> Attached is the delta patch.
>
> Thanks!
>

I've added a slightly different debug print:
@@ -368,11 +375,12 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
                 if (tp_created) {
                         RCU_INIT_POINTER(tp->next, rtnl_dereference(*back));
                         rcu_assign_pointer(*back, tp);
+                 printk(KERN_ERR "DEBUGG:SK add/change filter by: %pf tp=%p tp->next=%p\n", tp->ops->get, tp, tp->next);
                 }
                 }
                 tfilter_notify(net, skb, n, tp, fh, RTM_NEWTFILTER, false);

full output attached:

[  283.290271] Mirror/redirect action on
[  283.305031] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=ffff9432d704df60 tp->next=          (null)
[  283.322563] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=ffff9436e718d240 tp->next=          (null)
[  283.359997] GACT probability on
[  283.365923] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=ffff9436e718d3c0 tp->next=ffff9436e718d240
[  283.378725] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=ffff9436e718d3c0 tp->next=ffff9436e718d3c0
[  283.391310] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=ffff9436e718d3c0 tp->next=ffff9436e718d3c0
[  283.403923] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=ffff9436e718d3c0 tp->next=ffff9436e718d3c0
[  283.416542] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=ffff9436e718d3c0 tp->next=ffff9436e718d3c0
[  308.538571] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]


Thanks
Shahar
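
The log lines where tp and tp->next print the same address describe a
one-element cycle. The following user-space sketch shows one way such a
cycle can arise with plain head insertion; it is only a toy model with
invented names, and it does not claim to identify the actual root cause
in tc_ctl_tfilter().

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a classifier entry; not the kernel's struct tcf_proto. */
struct toy_tp {
	struct toy_tp *next;
};

/* Head insertion, mirroring the two pointer assignments in the hunk above. */
void toy_link(struct toy_tp **back, struct toy_tp *tp)
{
	assert(back != NULL && tp != NULL);
	tp->next = *back;
	*back = tp;
}

/*
 * Bounded walk: returns the number of entries visited, or -1 once more
 * than 'limit' steps were taken, which points to a cycle in a list that
 * is known to hold at most 'limit' entries.
 */
int toy_walk(const struct toy_tp *head, int limit)
{
	int n = 0;

	for (; head; head = head->next)
		if (++n > limit)
			return -1;
	return n;
}
```

Linking an entry that is already at *back makes it its own successor, so
an unbounded walk over the list (as a classify loop would do) never
reaches NULL.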




[-- Attachment #2: tp_p_debug.log --]
[-- Type: text/plain, Size: 18431 bytes --]

[  283.290271] Mirror/redirect action on
[  283.305031] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=ffff9432d704df60 tp->next=          (null)
[  283.322563] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=ffff9436e718d240 tp->next=          (null)
[  283.359997] GACT probability on
[  283.365923] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=ffff9436e718d3c0 tp->next=ffff9436e718d240
[  283.378725] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=ffff9436e718d3c0 tp->next=ffff9436e718d3c0
[  283.391310] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=ffff9436e718d3c0 tp->next=ffff9436e718d3c0
[  283.403923] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=ffff9436e718d3c0 tp->next=ffff9436e718d3c0
[  283.416542] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=ffff9436e718d3c0 tp->next=ffff9436e718d3c0
[  308.538571] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
[  308.547322] Modules linked in: act_gact act_mirred openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_defrag_ipv6 vfio_pci vfio_virqfd vfio_iommu_type1 vfio cls_flower mlx5_ib mlx5_core devlink sch_ingress nfsv3 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat libcrc32c nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun ebtable_filter ebtables ip6table_filter ip6_tables netconsole rpcrdma bridge ib_isert stp iscsi_target_mod llc ib_iser libiscsi scsi_transport_iscsi ib_srpt ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core intel_rapl sb_edac edac_core x86_pkg_temp_thermal coretemp kvm_intel kvm igb irqbypass joydev ipmi_ssif crct10dif_pclmul crc32_pclmul iTCO_wdt crc32c_intel ptp ipmi_si iTCO_vendor_support pcspkr ghash_clmulni_intel wmi pps_core i2c_algo_bit ipmi_msghandler mei_me i2c_i801 ioatdma tpm_tis mei shpchp i2c_smbus dca tpm_tis_core lpc_ich tpm nfsd target_core_mod auth_rpcgss nfs_acl lockd grace sunrpc isci libsas serio_raw scsi_transport_sas [last unloaded: devlink]
[  308.668291] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0+ #31
[  308.675337] Hardware name: Supermicro X9DRW/X9DRW, BIOS 3.0a 08/08/2013
[  308.683060] task: ffffffff94e0e500 task.stack: ffffffff94e00000
[  308.690012] RIP: 0010:fl_classify+0xb/0x2b0 [cls_flower]
[  308.696275] RSP: 0018:ffff9432efa03c20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[  308.705396] RAX: 0000000000000008 RBX: ffff9432b59c4100 RCX: 0000000000000000
[  308.713704] RDX: ffff9432efa03c98 RSI: ffff9436e718d3c0 RDI: ffff9432b59c4100
[  308.722099] RBP: ffff9432efa03c28 R08: 000000000000270f R09: 0000000000000000
[  308.730409] R10: 0000000000000000 R11: 0000000000000004 R12: ffff9432efa03c98
[  308.738713] R13: 0000000000000008 R14: ffff9436e718d3c0 R15: 0000000000000001
[  308.747013] FS:  0000000000000000(0000) GS:ffff9432efa00000(0000) knlGS:0000000000000000
[  308.756625] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  308.763378] CR2: 00007f5415f67914 CR3: 00000005fde07000 CR4: 00000000000426f0
[  308.771684] Call Trace:
[  308.774739]  <IRQ>
[  308.777311]  tc_classify+0x78/0x120
[  308.781549]  __netif_receive_skb_core+0x623/0xa00
[  308.787141]  ? udp4_gro_receive+0x10b/0x2d0
[  308.792143]  __netif_receive_skb+0x18/0x60
[  308.797048]  netif_receive_skb_internal+0x40/0xb0
[  308.802637]  napi_gro_receive+0xcd/0x120
[  308.807462]  mlx5e_handle_rx_cqe_rep+0x61b/0x890 [mlx5_core]
[  308.814123]  mlx5e_poll_rx_cq+0x83/0x840 [mlx5_core]
[  308.820015]  mlx5e_napi_poll+0x89/0x480 [mlx5_core]
[  308.825818]  net_rx_action+0x260/0x3c0
[  308.830334]  __do_softirq+0xc9/0x28c
[  308.834658]  irq_exit+0xd7/0xe0
[  308.838492]  do_IRQ+0x51/0xd0
[  308.842132]  common_interrupt+0x93/0x93
[  308.846747] RIP: 0010:cpuidle_enter_state+0xe1/0x260
[  308.852624] RSP: 0018:ffffffff94e03dc8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffa2
[  308.861766] RAX: ffff9432efa19600 RBX: ffff9432efa23600 RCX: 000000000000001f
[  308.870077] RDX: 0000000000000000 RSI: ffff9432efa16cd8 RDI: 0000000000000000
[  308.878379] RBP: ffffffff94e03e00 R08: 0000000000000001 R09: cccccccccccccccd
[  308.886690] R10: 0000000000000000 R11: 0000000000000008 R12: 0000000000000001
[  308.895000] R13: 0000000000000000 R14: ffffffff94ec79a0 R15: 00000041fab01c8d
[  308.903306]  </IRQ>
[  308.905978]  ? cpuidle_enter_state+0xc0/0x260
[  308.911173]  cpuidle_enter+0x17/0x20
[  308.915498]  call_cpuidle+0x23/0x40
[  308.919721]  do_idle+0x172/0x200
[  308.923656]  cpu_startup_entry+0x71/0x80
[  308.928370]  rest_init+0x77/0x80
[  308.932304]  start_kernel+0x4a6/0x4c7
[  308.936723]  ? set_init_arg+0x55/0x55
[  308.941141]  ? early_idt_handler_array+0x120/0x120
[  308.946823]  x86_64_start_reservations+0x24/0x26
[  308.952314]  x86_64_start_kernel+0x14c/0x16f
[  308.957418]  start_cpu+0x5/0x14
[  308.961242] Code: a8 4c 89 fe 48 8b 4d c8 48 8d 14 07 4c 89 e7 e8 2c fe ff ff e9 14 ff ff ff 0f 1f 80 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 57 <41> 56 41 55 41 54 53 48 81 ec 28 01 00 00 65 48 8b 04 25 28 00 
[  308.989075] Kernel panic - not syncing: softlockup: hung tasks
[  308.995924] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L  4.9.0+ #31
[  309.010442] Hardware name: Supermicro X9DRW/X9DRW, BIOS 3.0a 08/08/2013
[  309.018160] Call Trace:
[  309.021211]  <IRQ>
[  309.023776]  dump_stack+0x63/0x8c
[  309.027807]  panic+0xeb/0x239
[  309.031449]  watchdog_timer_fn+0x1e5/0x1f0
[  309.036354]  ? watchdog+0x40/0x40
[  309.040386]  __hrtimer_run_queues+0xee/0x270
[  309.045486]  hrtimer_interrupt+0xa8/0x190
[  309.050293]  local_apic_timer_interrupt+0x35/0x60
[  309.055880]  smp_apic_timer_interrupt+0x38/0x50
[  309.061272]  apic_timer_interrupt+0x93/0xa0
[  309.066272] RIP: 0010:fl_classify+0xb/0x2b0 [cls_flower]
[  309.072538] RSP: 0018:ffff9432efa03c20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[  309.081686] RAX: 0000000000000008 RBX: ffff9432b59c4100 RCX: 0000000000000000
[  309.089994] RDX: ffff9432efa03c98 RSI: ffff9436e718d3c0 RDI: ffff9432b59c4100
[  309.098297] RBP: ffff9432efa03c28 R08: 000000000000270f R09: 0000000000000000
[  309.106603] R10: 0000000000000000 R11: 0000000000000004 R12: ffff9432efa03c98
[  309.114914] R13: 0000000000000008 R14: ffff9436e718d3c0 R15: 0000000000000001
[  309.123229]  tc_classify+0x78/0x120
[  309.127452]  __netif_receive_skb_core+0x623/0xa00
[  309.133031]  ? udp4_gro_receive+0x10b/0x2d0
[  309.138033]  __netif_receive_skb+0x18/0x60
[  309.142949]  netif_receive_skb_internal+0x40/0xb0
[  309.148534]  napi_gro_receive+0xcd/0x120
[  309.153259]  mlx5e_handle_rx_cqe_rep+0x61b/0x890 [mlx5_core]
[  309.159918]  mlx5e_poll_rx_cq+0x83/0x840 [mlx5_core]
[  309.165823]  mlx5e_napi_poll+0x89/0x480 [mlx5_core]
[  309.171608]  net_rx_action+0x260/0x3c0
[  309.176238]  __do_softirq+0xc9/0x28c
[  309.180563]  irq_exit+0xd7/0xe0
[  309.184395]  do_IRQ+0x51/0xd0
[  309.188035]  common_interrupt+0x93/0x93
[  309.192651] RIP: 0010:cpuidle_enter_state+0xe1/0x260
[  309.198527] RSP: 0018:ffffffff94e03dc8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffa2
[  309.207651] RAX: ffff9432efa19600 RBX: ffff9432efa23600 RCX: 000000000000001f
[  309.215959] RDX: 0000000000000000 RSI: ffff9432efa16cd8 RDI: 0000000000000000
[  309.224268] RBP: ffffffff94e03e00 R08: 0000000000000001 R09: cccccccccccccccd
[  309.232573] R10: 0000000000000000 R11: 0000000000000008 R12: 0000000000000001
[  309.240881] R13: 0000000000000000 R14: ffffffff94ec79a0 R15: 00000041fab01c8d
[  309.249187]  </IRQ>
[  309.251858]  ? cpuidle_enter_state+0xc0/0x260
[  309.257057]  cpuidle_enter+0x17/0x20
[  309.261382]  call_cpuidle+0x23/0x40
[  309.265635]  do_idle+0x172/0x200
[  309.269604]  cpu_startup_entry+0x71/0x80
[  309.274314]  rest_init+0x77/0x80
[  309.278247]  start_kernel+0x4a6/0x4c7
[  309.282668]  ? set_init_arg+0x55/0x55
[  309.287089]  ? early_idt_handler_array+0x120/0x120
[  309.292771]  x86_64_start_reservations+0x24/0x26
[  309.298262]  x86_64_start_kernel+0x14c/0x16f
[  309.303361]  start_cpu+0x5/0x14
[  309.307245] Kernel Offset: 0x13000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  310.573997] ---[ end Kernel panic - not syncing: softlockup: hung tasks
[  310.581734] ------------[ cut here ]------------
[  310.587236] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff94065c14 (native_write_msr+0x4/0x30)
[  310.602404] Call Trace:
[  310.605472]  <IRQ>
[  310.608066]  ? native_apic_msr_write+0x30/0x40
[  310.613371]  x2apic_send_IPI_self+0x1d/0x20
[  310.618390]  arch_irq_work_raise+0x28/0x40
[  310.623309]  irq_work_queue+0x6e/0x80
[  310.627724]  wake_up_klogd+0x34/0x40
[  310.632045]  console_unlock+0x4dc/0x540
[  310.636659]  vprintk_emit+0x2eb/0x4b0
[  310.641091]  ? native_smp_send_reschedule+0x3f/0x50
[  310.646871]  vprintk_default+0x29/0x40
[  310.651393]  printk+0x5d/0x74
[  310.655034]  ? native_smp_send_reschedule+0x3f/0x50
[  310.660807]  __warn+0x3b/0xf0
[  310.664450]  warn_slowpath_null+0x1d/0x20
[  310.669262]  native_smp_send_reschedule+0x3f/0x50
[  310.674849]  try_to_wake_up+0x312/0x390
[  310.679456]  default_wake_function+0x12/0x20
[  310.684560]  __wake_up_common+0x55/0x90
[  310.689170]  __wake_up_locked+0x13/0x20
[  310.693788]  ep_poll_callback+0xbb/0x240
[  310.698493]  __wake_up_common+0x55/0x90
[  310.703101]  __wake_up+0x39/0x50
[  310.707028]  wake_up_klogd_work_func+0x40/0x60
[  310.712316]  irq_work_run_list+0x4d/0x70
[  310.717022]  irq_work_run+0x2c/0x40
[  310.721243]  smp_irq_work_interrupt+0x2e/0x40
[  310.726443]  irq_work_interrupt+0x93/0xa0
[  310.731253] RIP: 0010:panic+0x1f5/0x239
[  310.735876] RSP: 0018:ffff9432efa039e8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff09
[  310.744995] RAX: 000000000000003b RBX: 0000000000000000 RCX: 0000000000000006
[  310.753294] RDX: 0000000000000000 RSI: 0000000000000046 RDI: ffff9432efa0e060
[  310.761594] RBP: ffff9432efa03a58 R08: 0000000000000674 R09: ffff942e800bb3e0
[  310.769900] R10: 00000000000000ef R11: 0000000000000198 R12: ffffffff94c4a4a9
[  310.778199] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9432efa03b78
[  310.786505]  ? panic+0x1f1/0x239
[  310.790444]  watchdog_timer_fn+0x1e5/0x1f0
[  310.795353]  ? watchdog+0x40/0x40
[  310.799401]  __hrtimer_run_queues+0xee/0x270
[  310.804501]  hrtimer_interrupt+0xa8/0x190
[  310.809318]  local_apic_timer_interrupt+0x35/0x60
[  310.814895]  smp_apic_timer_interrupt+0x38/0x50
[  310.820282]  apic_timer_interrupt+0x93/0xa0
[  310.825287] RIP: 0010:fl_classify+0xb/0x2b0 [cls_flower]
[  310.831554] RSP: 0018:ffff9432efa03c20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[  310.840693] RAX: 0000000000000008 RBX: ffff9432b59c4100 RCX: 0000000000000000
[  310.849007] RDX: ffff9432efa03c98 RSI: ffff9436e718d3c0 RDI: ffff9432b59c4100
[  310.857402] RBP: ffff9432efa03c28 R08: 000000000000270f R09: 0000000000000000
[  310.865712] R10: 0000000000000000 R11: 0000000000000004 R12: ffff9432efa03c98
[  310.874020] R13: 0000000000000008 R14: ffff9436e718d3c0 R15: 0000000000000001
[  310.882337]  tc_classify+0x78/0x120
[  310.886568]  __netif_receive_skb_core+0x623/0xa00
[  310.892157]  ? udp4_gro_receive+0x10b/0x2d0
[  310.897151]  __netif_receive_skb+0x18/0x60
[  310.902057]  netif_receive_skb_internal+0x40/0xb0
[  310.907643]  napi_gro_receive+0xcd/0x120
[  310.912370]  mlx5e_handle_rx_cqe_rep+0x61b/0x890 [mlx5_core]
[  310.919031]  mlx5e_poll_rx_cq+0x83/0x840 [mlx5_core]
[  310.924924]  mlx5e_napi_poll+0x89/0x480 [mlx5_core]
[  310.930808]  net_rx_action+0x260/0x3c0
[  310.935319]  __do_softirq+0xc9/0x28c
[  310.939658]  irq_exit+0xd7/0xe0
[  310.943485]  do_IRQ+0x51/0xd0
[  310.947124]  common_interrupt+0x93/0x93
[  310.951748] RIP: 0010:cpuidle_enter_state+0xe1/0x260
[  310.957616] RSP: 0018:ffffffff94e03dc8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffa2
[  310.966743] RAX: ffff9432efa19600 RBX: ffff9432efa23600 RCX: 000000000000001f
[  310.975044] RDX: 0000000000000000 RSI: ffff9432efa16cd8 RDI: 0000000000000000
[  310.983349] RBP: ffffffff94e03e00 R08: 0000000000000001 R09: cccccccccccccccd
[  310.991654] R10: 0000000000000000 R11: 0000000000000008 R12: 0000000000000001
[  310.999952] R13: 0000000000000000 R14: ffffffff94ec79a0 R15: 00000041fab01c8d
[  311.008254]  </IRQ>
[  311.010926]  ? cpuidle_enter_state+0xc0/0x260
[  311.016122]  cpuidle_enter+0x17/0x20
[  311.020430]  call_cpuidle+0x23/0x40
[  311.024658]  do_idle+0x172/0x200
[  311.028583]  cpu_startup_entry+0x71/0x80
[  311.033295]  rest_init+0x77/0x80
[  311.037233]  start_kernel+0x4a6/0x4c7
[  311.041646]  ? set_init_arg+0x55/0x55
[  311.046068]  ? early_idt_handler_array+0x120/0x120
[  311.051752]  x86_64_start_reservations+0x24/0x26
[  311.057238]  x86_64_start_kernel+0x14c/0x16f
[  311.062339]  start_cpu+0x5/0x14
[  311.066180] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/smp.c:127 native_smp_send_reschedule+0x3f/0x50
[  311.076956] Modules linked in: act_gact act_mirred openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_defrag_ipv6 vfio_pci vfio_virqfd vfio_iommu_type1 vfio cls_flower mlx5_ib mlx5_core devlink sch_ingress nfsv3 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat libcrc32c nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun ebtable_filter ebtables ip6table_filter ip6_tables netconsole rpcrdma bridge ib_isert stp iscsi_target_mod llc ib_iser libiscsi scsi_transport_iscsi ib_srpt ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core intel_rapl sb_edac edac_core x86_pkg_temp_thermal coretemp kvm_intel kvm igb irqbypass joydev ipmi_ssif crct10dif_pclmul crc32_pclmul iTCO_wdt crc32c_intel ptp ipmi_si iTCO_vendor_support pcspkr ghash_clmulni_intel wmi pps_core i2c_algo_bit ipmi_msghandler mei_me i2c_i801 ioatdma tpm_tis mei shpchp i2c_smbus dca tpm_tis_core lpc_ich tpm nfsd target_core_mod auth_rpcgss nfs_acl lockd grace sunrpc isci libsas serio_raw scsi_transport_sas [last unloaded: devlink]
[  311.198587] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L  4.9.0+ #31
[  311.207253] Hardware name: Supermicro X9DRW/X9DRW, BIOS 3.0a 08/08/2013
[  311.214983] Call Trace:
[  311.218051]  <IRQ>
[  311.220626]  dump_stack+0x63/0x8c
[  311.224657]  __warn+0xd1/0xf0
[  311.228298]  warn_slowpath_null+0x1d/0x20
[  311.233116]  native_smp_send_reschedule+0x3f/0x50
[  311.238702]  try_to_wake_up+0x312/0x390
[  311.243318]  default_wake_function+0x12/0x20
[  311.248418]  __wake_up_common+0x55/0x90
[  311.253034]  __wake_up_locked+0x13/0x20
[  311.257641]  ep_poll_callback+0xbb/0x240
[  311.262346]  __wake_up_common+0x55/0x90
[  311.272771]  __wake_up+0x39/0x50
[  311.276697]  wake_up_klogd_work_func+0x40/0x60
[  311.281986]  irq_work_run_list+0x4d/0x70
[  311.286681]  irq_work_run+0x2c/0x40
[  311.290899]  smp_irq_work_interrupt+0x2e/0x40
[  311.296090]  irq_work_interrupt+0x93/0xa0
[  311.300900] RIP: 0010:panic+0x1f5/0x239
[  311.305508] RSP: 0018:ffff9432efa039e8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff09
[  311.314630] RAX: 000000000000003b RBX: 0000000000000000 RCX: 0000000000000006
[  311.322936] RDX: 0000000000000000 RSI: 0000000000000046 RDI: ffff9432efa0e060
[  311.331245] RBP: ffff9432efa03a58 R08: 0000000000000674 R09: ffff942e800bb3e0
[  311.339543] R10: 00000000000000ef R11: 0000000000000198 R12: ffffffff94c4a4a9
[  311.347855] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9432efa03b78
[  311.356167]  ? panic+0x1f1/0x239
[  311.360106]  watchdog_timer_fn+0x1e5/0x1f0
[  311.365004]  ? watchdog+0x40/0x40
[  311.369035]  __hrtimer_run_queues+0xee/0x270
[  311.374132]  hrtimer_interrupt+0xa8/0x190
[  311.378935]  local_apic_timer_interrupt+0x35/0x60
[  311.384511]  smp_apic_timer_interrupt+0x38/0x50
[  311.389897]  apic_timer_interrupt+0x93/0xa0
[  311.394892] RIP: 0010:fl_classify+0xb/0x2b0 [cls_flower]
[  311.401151] RSP: 0018:ffff9432efa03c20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[  311.410270] RAX: 0000000000000008 RBX: ffff9432b59c4100 RCX: 0000000000000000
[  311.418580] RDX: ffff9432efa03c98 RSI: ffff9436e718d3c0 RDI: ffff9432b59c4100
[  311.426967] RBP: ffff9432efa03c28 R08: 000000000000270f R09: 0000000000000000
[  311.435278] R10: 0000000000000000 R11: 0000000000000004 R12: ffff9432efa03c98
[  311.443584] R13: 0000000000000008 R14: ffff9436e718d3c0 R15: 0000000000000001
[  311.451889]  tc_classify+0x78/0x120
[  311.456105]  __netif_receive_skb_core+0x623/0xa00
[  311.461683]  ? udp4_gro_receive+0x10b/0x2d0
[  311.466687]  __netif_receive_skb+0x18/0x60
[  311.471593]  netif_receive_skb_internal+0x40/0xb0
[  311.477186]  napi_gro_receive+0xcd/0x120
[  311.481900]  mlx5e_handle_rx_cqe_rep+0x61b/0x890 [mlx5_core]
[  311.488555]  mlx5e_poll_rx_cq+0x83/0x840 [mlx5_core]
[  311.494451]  mlx5e_napi_poll+0x89/0x480 [mlx5_core]
[  311.500233]  net_rx_action+0x260/0x3c0
[  311.504751]  __do_softirq+0xc9/0x28c
[  311.509075]  irq_exit+0xd7/0xe0
[  311.512901]  do_IRQ+0x51/0xd0
[  311.516529]  common_interrupt+0x93/0x93
[  311.521143] RIP: 0010:cpuidle_enter_state+0xe1/0x260
[  311.527011] RSP: 0018:ffffffff94e03dc8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffa2
[  311.536123] RAX: ffff9432efa19600 RBX: ffff9432efa23600 RCX: 000000000000001f
[  311.544430] RDX: 0000000000000000 RSI: ffff9432efa16cd8 RDI: 0000000000000000
[  311.552760] RBP: ffffffff94e03e00 R08: 0000000000000001 R09: cccccccccccccccd
[  311.561087] R10: 0000000000000000 R11: 0000000000000008 R12: 0000000000000001
[  311.569396] R13: 0000000000000000 R14: ffffffff94ec79a0 R15: 00000041fab01c8d
[  311.577714]  </IRQ>
[  311.580393]  ? cpuidle_enter_state+0xc0/0x260
[  311.585591]  cpuidle_enter+0x17/0x20
[  311.589913]  call_cpuidle+0x23/0x40
[  311.594136]  do_idle+0x172/0x200
[  311.598069]  cpu_startup_entry+0x71/0x80
[  311.602782]  rest_init+0x77/0x80
[  311.606713]  start_kernel+0x4a6/0x4c7
[  311.611134]  ? set_init_arg+0x55/0x55
[  311.615547]  ? early_idt_handler_array+0x120/0x120
[  311.621231]  x86_64_start_reservations+0x24/0x26
[  311.626717]  x86_64_start_kernel+0x14c/0x16f
[  311.631810]  start_cpu+0x5/0x14
[  311.635648] ---[ end trace c2fd08dd3d93dab3 ]---
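
Note that the DEBUGG lines at the top of this log print the same
address for tp and tp->next, i.e. the entry points at itself. If the
classifier chain is walked by following ->next until NULL, such a walk
never terminates, which would be consistent with the soft lockup
reported in fl_classify() under tc_classify(). Below is a minimal
userspace sketch of that failure mode, not kernel code: struct node,
walk_chain() and the demo helpers are hypothetical names, and the step
cap exists only so the demonstration itself terminates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a chain entry; not the kernel's struct tcf_proto. */
struct node {
	struct node *next;
};

/*
 * Follow ->next until NULL, giving up after max_steps hops.
 * Returns the number of nodes visited, or -1 if the cap was hit,
 * which is what a self-referencing entry (n->next == n) produces.
 */
long walk_chain(const struct node *head, long max_steps)
{
	long steps = 0;

	assert(max_steps > 0);
	for (const struct node *n = head; n; n = n->next) {
		if (++steps > max_steps)
			return -1;	/* an uncapped loop would spin here forever */
	}
	return steps;
}

/* Two entries, properly terminated by NULL. */
long demo_terminated_chain(void)
{
	struct node b = { .next = NULL };
	struct node a = { .next = &b };

	return walk_chain(&a, 1000);
}

/* The shape seen in the log: an entry whose next is itself. */
long demo_self_loop(void)
{
	struct node a;

	a.next = &a;
	return walk_chain(&a, 1000);
}
```

With the cap in place the terminated chain reports its length, while
the self-looping one reports -1 instead of spinning.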



^ permalink raw reply

* Re: [PATCH 3/3] nfc: trf7970a: Prevent repeated polling from crashing the kernel
From: Mark Greer @ 2016-12-20 18:59 UTC (permalink / raw)
  To: Geoff Lansberry
  Cc: linux-wireless, lauro.venancio, aloisio.almeida, sameo, robh+dt,
	mark.rutland, netdev, devicetree, linux-kernel, justin,
	Jaret Cantu
In-Reply-To: <1482250592-4268-3-git-send-email-glansberry@gmail.com>

On Tue, Dec 20, 2016 at 11:16:32AM -0500, Geoff Lansberry wrote:
> From: Jaret Cantu <jaret.cantu@timesys.com>
> 
> Repeated polling attempts cause a NULL dereference error to occur.
> This is because the state of the trf7970a is currently reading but
> another request has been made to send a command before it has finished.

How is this happening?  Was trf7970a_abort_cmd() called and it didn't
work right?  Was it not called at all and there is a bug in the digital
layer?  More details please.

> The solution is to properly kill the waiting reading (workqueue)
> before failing on the send.

If the bug is in the calling code, then that is what should get fixed.
This seems to be a hack to work around a digital layer bug.

Mark
--

^ permalink raw reply

* Re: [PATCH net v4 0/4] fsl/fman: fixes for ARM
From: David Miller @ 2016-12-20 19:00 UTC (permalink / raw)
  To: madalin.bucur; +Cc: netdev, linuxppc-dev, linux-kernel, scott.wood
In-Reply-To: <1482180166-10677-1-git-send-email-madalin.bucur@nxp.com>

From: Madalin Bucur <madalin.bucur@nxp.com>
Date: Mon, 19 Dec 2016 22:42:42 +0200

> The patch set fixes advertised speeds for QSGMII interfaces, disables
> A007273 erratum workaround on non-PowerPC platforms where it does not
> apply, enables compilation on ARM64 and addresses a probing issue on
> non-PPC platforms.
> 
> Changes from v3: removed redundant comment, added ack by Scott
> Changes from v2: merged fsl/fman changes to avoid a point of failure
> Changes from v1: unifying probing on all supported platforms

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH net 1/2] net: netcp: ethss: fix errors in ethtool ops
From: David Miller @ 2016-12-20 19:08 UTC (permalink / raw)
  To: m-karicheri2; +Cc: netdev, linux-kernel
In-Reply-To: <1482188157-24490-1-git-send-email-m-karicheri2@ti.com>

From: Murali Karicheri <m-karicheri2@ti.com>
Date: Mon, 19 Dec 2016 17:55:56 -0500

> From: WingMan Kwok <w-kwok2@ti.com>
> 
> In ethtool ops, it needs to retrieve the corresponding
> ethss module (gbe or xgbe) from the net_device structure.
> Prior to this patch, the retrieving procedure only
> checks for the gbe module.  This patch fixes the issue
> by checking the xgbe module if the net_device structure
> does not correspond to the gbe module.
> 
> Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
> Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
> Signed-off-by: Sekhar Nori <nsekhar@ti.com>

Applied.
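
For reference, the lookup order described in the quoted commit message
(try the gbe module first, then fall back to xgbe) can be sketched as
below. This is a userspace illustration under stated assumptions only:
struct ethss_module, struct net_dev_sketch, lookup_module() and
find_ethss_module() are hypothetical names and do not reflect the netcp
driver's actual types or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MODULES 4

/* Hypothetical module descriptor; the real driver's types differ. */
struct ethss_module {
	const char *name;
};

/* Hypothetical stand-in for a net_device and the modules attached to it. */
struct net_dev_sketch {
	const struct ethss_module *modules[MAX_MODULES];
	size_t nr_modules;
};

/* Return the attached module with the given name, or NULL if absent. */
const struct ethss_module *lookup_module(const struct net_dev_sketch *ndev,
					 const char *name)
{
	assert(ndev->nr_modules <= MAX_MODULES);
	for (size_t i = 0; i < ndev->nr_modules; i++) {
		if (strcmp(ndev->modules[i]->name, name) == 0)
			return ndev->modules[i];
	}
	return NULL;
}

/*
 * Try gbe first; if the device does not correspond to a gbe module,
 * check xgbe as well instead of giving up.
 */
const struct ethss_module *find_ethss_module(const struct net_dev_sketch *ndev)
{
	const struct ethss_module *mod = lookup_module(ndev, "gbe");

	if (!mod)
		mod = lookup_module(ndev, "xgbe");
	return mod;
}
```

A caller would still have to handle a NULL return for a device that
has neither module attached.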

^ permalink raw reply
