Netdev List
* Re: [PATCH v2 net] phy: fix error case of phy_led_triggers_(un)register
From: David Miller @ 2016-11-26  0:59 UTC (permalink / raw)
  To: Woojung.Huh; +Cc: zach.brown, netdev, f.fainelli, andrew
In-Reply-To: <9235D6609DB808459E95D78E17F2E43D40969DE2@CHN-SV-EXMX02.mchp-main.com>

From: <Woojung.Huh@microchip.com>
Date: Wed, 23 Nov 2016 23:10:33 +0000

> From: Woojung Huh <woojung.huh@microchip.com>
> 
> When phy_init_hw() fails at phy_attach_direct();
> - phy_detach() calls phy_led_triggers_unregister() without
>   previous call of phy_led_triggers_register().
> - still call phy_led_triggers_register() and cause memory leak.
> 
> Fixes: 2e0bc452f472 ("net: phy: leds: add support for led triggers on phy link state change")
> Signed-off-by: Woojung Huh <woojung.huh@microchip.com>

Applied to net-next.


* Re: [PATCH net-next 1/2] sfc: separate out SFC4000 ("Falcon") support into new sfc-falcon driver
From: kbuild test robot @ 2016-11-26  0:58 UTC (permalink / raw)
  To: Edward Cree; +Cc: kbuild-all, linux-net-drivers, davem, bkenward, netdev
In-Reply-To: <fe387b90-1fd5-2d12-ca46-ff04ed8a28b7@solarflare.com>

[-- Attachment #1: Type: text/plain, Size: 3792 bytes --]

Hi Edward,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Edward-Cree/sfc-split-out-Falcon-driver/20161126-033439
config: i386-randconfig-h1-11260702 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

Note: the linux-review/Edward-Cree/sfc-split-out-Falcon-driver/20161126-033439 HEAD 738d215da7cf33cb4f2916dfba4fdb1558829e5a builds fine.
      It only hurts bisectability.

All error/warnings (new ones prefixed by >>):

   drivers/net/ethernet/sfc/falcon/built-in.o: In function `tenxpress_set_id_led':
>> (.text+0x28db4): multiple definition of `tenxpress_set_id_led'
   drivers/net/ethernet/sfc/built-in.o:(.text+0x3bac7): first defined here
   drivers/net/ethernet/sfc/falcon/built-in.o: In function `falcon_stop_nic_stats':
>> (.text+0x1458c): multiple definition of `falcon_stop_nic_stats'
   drivers/net/ethernet/sfc/built-in.o:(.text+0x15a09): first defined here
   drivers/net/ethernet/sfc/falcon/built-in.o: In function `falcon_qt202x_set_led':
>> (.text+0x26915): multiple definition of `falcon_qt202x_set_led'
   drivers/net/ethernet/sfc/built-in.o:(.text+0x39628): first defined here
>> drivers/net/ethernet/sfc/falcon/built-in.o:(.rodata+0x2ba0): multiple definition of `falcon_sfx7101_phy_ops'
   drivers/net/ethernet/sfc/built-in.o:(.rodata+0x48a0): first defined here
>> drivers/net/ethernet/sfc/falcon/built-in.o:(.rodata+0x2ce0): multiple definition of `falcon_txc_phy_ops'
   drivers/net/ethernet/sfc/built-in.o:(.rodata+0x49e0): first defined here
   drivers/net/ethernet/sfc/falcon/built-in.o: In function `falcon_txc_set_gpio_dir':
>> (.text+0x2a104): multiple definition of `falcon_txc_set_gpio_dir'
   drivers/net/ethernet/sfc/built-in.o:(.text+0x3ce17): first defined here
   drivers/net/ethernet/sfc/falcon/built-in.o: In function `falcon_txc_set_gpio_val':
>> (.text+0x2a0b7): multiple definition of `falcon_txc_set_gpio_val'
   drivers/net/ethernet/sfc/built-in.o:(.text+0x3cdca): first defined here
>> drivers/net/ethernet/sfc/falcon/built-in.o:(.rodata+0x2960): multiple definition of `falcon_qt202x_phy_ops'
   drivers/net/ethernet/sfc/built-in.o:(.rodata+0x4660): first defined here
   drivers/net/ethernet/sfc/falcon/built-in.o: In function `falcon_probe_board':
>> (.text+0x2bb13): multiple definition of `falcon_probe_board'
   drivers/net/ethernet/sfc/built-in.o:(.text+0x3e826): first defined here
>> drivers/net/ethernet/sfc/falcon/built-in.o:(.rodata+0x1a00): multiple definition of `falcon_a1_nic_type'
   drivers/net/ethernet/sfc/built-in.o:(.rodata+0x1b60): first defined here
>> ld: Warning: size of symbol `falcon_a1_nic_type' changed from 472 in drivers/net/ethernet/sfc/built-in.o to 356 in drivers/net/ethernet/sfc/falcon/built-in.o
>> drivers/net/ethernet/sfc/falcon/built-in.o:(.rodata+0x1880): multiple definition of `falcon_b0_nic_type'
   drivers/net/ethernet/sfc/built-in.o:(.rodata+0x1980): first defined here
>> ld: Warning: size of symbol `falcon_b0_nic_type' changed from 472 in drivers/net/ethernet/sfc/built-in.o to 356 in drivers/net/ethernet/sfc/falcon/built-in.o
   drivers/net/ethernet/sfc/falcon/built-in.o: In function `falcon_reset_xaui':
>> (.text+0x1835c): multiple definition of `falcon_reset_xaui'
   drivers/net/ethernet/sfc/built-in.o:(.text+0x197d9): first defined here
   drivers/net/ethernet/sfc/falcon/built-in.o: In function `falcon_start_nic_stats':
>> (.text+0x14327): multiple definition of `falcon_start_nic_stats'
   drivers/net/ethernet/sfc/built-in.o:(.text+0x157a4): first defined here

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 30036 bytes --]


* Re: [PATCH net-next v2 0/2] Add support for the MV88e6097
From: David Miller @ 2016-11-26  0:54 UTC (permalink / raw)
  To: eichest; +Cc: andrew, vivien.didelot, netdev, stefan.eichenberger
In-Reply-To: <20161123205952.6231-1-stefan.eichenberger@netmodule.com>

From: Stefan Eichenberger <eichest@gmail.com>
Date: Wed, 23 Nov 2016 21:59:50 +0100

> This patchset will add support for the MV88E6097 DSA switch and enable
> EDSA on MV88E6097 family devices.
> 
> Changes since v1:
> - Add missing g1_irqs = 8
> - Add missing comment after mv88e6097_ops
> - Change patch order

This doesn't apply cleanly to net-next, please respin.

Thanks.


* Re: [PATCH net-next] net: properly flush delay-freed skbs
From: David Miller @ 2016-11-26  0:49 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, brouer, alexander.h.duyck
In-Reply-To: <1479919496.8455.509.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 23 Nov 2016 08:44:56 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Typical NAPI drivers use napi_consume_skb(skb) at TX completion time.
> This put skb in a percpu special queue, napi_alloc_cache, to get bulk
> frees.
> 
> It turns out the queue is not flushed and hits the NAPI_SKB_CACHE_SIZE
> limit quite often, with skbs that were queued hundreds of usec earlier.
> I measured this can take ~6000 nsec to perform one flush.
> 
> __kfree_skb_flush() can be called from two points right now :
> 
> 1) From net_tx_action(), but only for skbs that were queued to
> sd->completion_queue.
> 
>  -> Irrelevant for NAPI drivers in normal operation.
> 
> 2) From net_rx_action(), but only under high stress or if RPS/RFS has a
> pending action.
> 
> This patch changes net_rx_action() to perform the flush in all cases and
> after more urgent operations happened (like kicking remote CPUS for
> RPS/RFS).
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>

Applied.


* [PATCH net-next 6/6] bpf: fix multiple issues in selftest suite and samples
From: Daniel Borkmann @ 2016-11-26  0:28 UTC (permalink / raw)
  To: davem; +Cc: alexei.starovoitov, netdev, Daniel Borkmann, William Tu
In-Reply-To: <cover.1480119395.git.daniel@iogearbox.net>

1) test_lru_map and test_lru_dist fail to build on my machine since
   the sys/resource.h header is not included.

2) test_verifier fails in one test case where we try to call an invalid
   function, since the verifier log output changed wrt printing function
   names.

3) Current selftest suite code relies on sysconf(_SC_NPROCESSORS_CONF) for
   retrieving the number of possible CPUs. This is broken at least in our
   scenario, since the result need not match what the kernel uses.

   glibc tries a number of things for retrieving _SC_NPROCESSORS_CONF.
   First it tries the equivalent of
   ls -d /sys/devices/system/cpu/cpu[0-9]* | wc -l; if that fails, then
   depending on the config it either counts the CPUs in /proc/cpuinfo or
   returns the _SC_NPROCESSORS_ONLN value instead. If /proc/cpuinfo has
   some issue, it returns just 1 in the worst case. This oddity is nothing
   new [1], but the semantics/behaviour seem to be settled.
   _SC_NPROCESSORS_ONLN parses /sys/devices/system/cpu/online; if that
   fails, it looks into /proc/stat for cpuX entries, and if that also
   fails for some reason, /proc/cpuinfo is consulted (returning 1 in the
   unlikely case that everything breaks down).

   While that might match num_possible_cpus() from the kernel in some
   cases, it's really not guaranteed with CPU hotplugging, and can result
   in a buffer overflow: the array in user space could have too few
   slots, and on percpu map lookup the kernel will write beyond the end
   of the value buffer.

   William Tu reported such mismatches:

     [...] The fact that sysconf(_SC_NPROCESSORS_CONF) != num_possible_cpu()
     happens when CPU hotadd is enabled. For example, in Fusion when
     setting vcpu.hotadd = "TRUE" or in KVM, setting ./qemu-system-x86_64
     -smp 2, maxcpus=4 ... the num_possible_cpu() will be 4 and sysconf()
     will be 2 [2]. [...]

   Documentation/cputopology.txt says /sys/devices/system/cpu/possible
   outputs cpu_possible_mask. That is the same mask num_possible_cpus()
   counts, so a first step is to replace the _SC_NPROCESSORS_CONF calls
   with our own implementation. Later, we could add support to bpf(2) for
   passing a mask via CPU_SET(3), for example, to just select a subset of
   CPUs.

   BPF samples code needs this fix as well (at least so that people stop
   copying this). Thus, define bpf_num_possible_cpus() once in selftests
   and import it from there for the sample code to avoid duplicating it.
   The remaining sysconf(_SC_NPROCESSORS_CONF) in samples are unrelated.

After all three issues are fixed, the test suite runs fine again:

  # make run_tests | grep self
  selftests: test_verifier [PASS]
  selftests: test_maps [PASS]
  selftests: test_lru_map [PASS]
  selftests: test_kmod.sh [PASS]

  [1] https://www.sourceware.org/ml/libc-alpha/2011-06/msg00079.html
  [2] https://www.mail-archive.com/netdev@vger.kernel.org/msg121183.html

Fixes: 3059303f59cf ("samples/bpf: update tracex[23] examples to use per-cpu maps")
Fixes: 86af8b4191d2 ("Add sample for adding simple drop program to link")
Fixes: df570f577231 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY")
Fixes: e15596717948 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_HASH")
Fixes: ebb676daa1a3 ("bpf: Print function name in addition to function id")
Fixes: 5db58faf989f ("bpf: Add tests for the LRU bpf_htab")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: William Tu <u9012063@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 samples/bpf/Makefile                        |  1 +
 samples/bpf/test_lru_dist.c                 |  5 +++-
 samples/bpf/tracex2_user.c                  |  4 ++-
 samples/bpf/tracex3_user.c                  |  6 +++--
 samples/bpf/xdp1_user.c                     |  4 ++-
 tools/testing/selftests/bpf/bpf_util.h      | 38 +++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/test_lru_map.c  |  8 ++++--
 tools/testing/selftests/bpf/test_maps.c     |  7 +++---
 tools/testing/selftests/bpf/test_verifier.c |  2 +-
 9 files changed, 64 insertions(+), 11 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/bpf_util.h

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index fb17206..22b6407e 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -91,6 +91,7 @@ always += trace_event_kern.o
 always += sampleip_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
+HOSTCFLAGS += -I$(objtree)/tools/testing/selftests/bpf/
 
 HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
 HOSTLOADLIBES_fds_example += -lelf
diff --git a/samples/bpf/test_lru_dist.c b/samples/bpf/test_lru_dist.c
index 2859977..316230a 100644
--- a/samples/bpf/test_lru_dist.c
+++ b/samples/bpf/test_lru_dist.c
@@ -16,10 +16,13 @@
 #include <sched.h>
 #include <sys/wait.h>
 #include <sys/stat.h>
+#include <sys/resource.h>
 #include <fcntl.h>
 #include <stdlib.h>
 #include <time.h>
+
 #include "libbpf.h"
+#include "bpf_util.h"
 
 #define min(a, b) ((a) < (b) ? (a) : (b))
 #define offsetof(TYPE, MEMBER)	((size_t)&((TYPE *)0)->MEMBER)
@@ -510,7 +513,7 @@ int main(int argc, char **argv)
 
 	srand(time(NULL));
 
-	nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
+	nr_cpus = bpf_num_possible_cpus();
 	assert(nr_cpus != -1);
 	printf("nr_cpus:%d\n\n", nr_cpus);
 
diff --git a/samples/bpf/tracex2_user.c b/samples/bpf/tracex2_user.c
index ab5b19e..3e225e3 100644
--- a/samples/bpf/tracex2_user.c
+++ b/samples/bpf/tracex2_user.c
@@ -4,8 +4,10 @@
 #include <signal.h>
 #include <linux/bpf.h>
 #include <string.h>
+
 #include "libbpf.h"
 #include "bpf_load.h"
+#include "bpf_util.h"
 
 #define MAX_INDEX	64
 #define MAX_STARS	38
@@ -36,8 +38,8 @@ struct hist_key {
 
 static void print_hist_for_pid(int fd, void *task)
 {
+	unsigned int nr_cpus = bpf_num_possible_cpus();
 	struct hist_key key = {}, next_key;
-	unsigned int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
 	long values[nr_cpus];
 	char starstr[MAX_STARS];
 	long value;
diff --git a/samples/bpf/tracex3_user.c b/samples/bpf/tracex3_user.c
index 48716f7..d0851cb 100644
--- a/samples/bpf/tracex3_user.c
+++ b/samples/bpf/tracex3_user.c
@@ -11,8 +11,10 @@
 #include <stdbool.h>
 #include <string.h>
 #include <linux/bpf.h>
+
 #include "libbpf.h"
 #include "bpf_load.h"
+#include "bpf_util.h"
 
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
 
@@ -20,7 +22,7 @@
 
 static void clear_stats(int fd)
 {
-	unsigned int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
+	unsigned int nr_cpus = bpf_num_possible_cpus();
 	__u64 values[nr_cpus];
 	__u32 key;
 
@@ -77,7 +79,7 @@ static void print_banner(void)
 
 static void print_hist(int fd)
 {
-	unsigned int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
+	unsigned int nr_cpus = bpf_num_possible_cpus();
 	__u64 total_events = 0;
 	long values[nr_cpus];
 	__u64 max_cnt = 0;
diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index a5e109e..2b2150d 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -15,7 +15,9 @@
 #include <string.h>
 #include <sys/socket.h>
 #include <unistd.h>
+
 #include "bpf_load.h"
+#include "bpf_util.h"
 #include "libbpf.h"
 
 static int set_link_xdp_fd(int ifindex, int fd)
@@ -120,7 +122,7 @@ static void int_exit(int sig)
  */
 static void poll_stats(int interval)
 {
-	unsigned int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
+	unsigned int nr_cpus = bpf_num_possible_cpus();
 	const unsigned int nr_keys = 256;
 	__u64 values[nr_cpus], prev[nr_keys][nr_cpus];
 	__u32 key;
diff --git a/tools/testing/selftests/bpf/bpf_util.h b/tools/testing/selftests/bpf/bpf_util.h
new file mode 100644
index 0000000..84a5d18
--- /dev/null
+++ b/tools/testing/selftests/bpf/bpf_util.h
@@ -0,0 +1,38 @@
+#ifndef __BPF_UTIL__
+#define __BPF_UTIL__
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+
+static inline unsigned int bpf_num_possible_cpus(void)
+{
+	static const char *fcpu = "/sys/devices/system/cpu/possible";
+	unsigned int start, end, possible_cpus = 0;
+	char buff[128];
+	FILE *fp;
+
+	fp = fopen(fcpu, "r");
+	if (!fp) {
+		printf("Failed to open %s: '%s'!\n", fcpu, strerror(errno));
+		exit(1);
+	}
+
+	while (fgets(buff, sizeof(buff), fp)) {
+		if (sscanf(buff, "%u-%u", &start, &end) == 2) {
+			possible_cpus = start == 0 ? end + 1 : 0;
+			break;
+		}
+	}
+
+	fclose(fp);
+	if (!possible_cpus) {
+		printf("Failed to retrieve # possible CPUs!\n");
+		exit(1);
+	}
+
+	return possible_cpus;
+}
+
+#endif /* __BPF_UTIL__ */
diff --git a/tools/testing/selftests/bpf/test_lru_map.c b/tools/testing/selftests/bpf/test_lru_map.c
index 627757e..b13fed5 100644
--- a/tools/testing/selftests/bpf/test_lru_map.c
+++ b/tools/testing/selftests/bpf/test_lru_map.c
@@ -12,10 +12,14 @@
 #include <string.h>
 #include <assert.h>
 #include <sched.h>
-#include <sys/wait.h>
 #include <stdlib.h>
 #include <time.h>
+
+#include <sys/wait.h>
+#include <sys/resource.h>
+
 #include "bpf_sys.h"
+#include "bpf_util.h"
 
 #define LOCAL_FREE_TARGET	(128)
 #define PERCPU_FREE_TARGET	(16)
@@ -559,7 +563,7 @@ int main(int argc, char **argv)
 
 	assert(!setrlimit(RLIMIT_MEMLOCK, &r));
 
-	nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
+	nr_cpus = bpf_num_possible_cpus();
 	assert(nr_cpus != -1);
 	printf("nr_cpus:%d\n\n", nr_cpus);
 
diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index ee384f0..eedfef8 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -22,6 +22,7 @@
 #include <linux/bpf.h>
 
 #include "bpf_sys.h"
+#include "bpf_util.h"
 
 static int map_flags;
 
@@ -110,7 +111,7 @@ static void test_hashmap(int task, void *data)
 
 static void test_hashmap_percpu(int task, void *data)
 {
-	unsigned int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
+	unsigned int nr_cpus = bpf_num_possible_cpus();
 	long long value[nr_cpus];
 	long long key, next_key;
 	int expected_key_mask = 0;
@@ -258,7 +259,7 @@ static void test_arraymap(int task, void *data)
 
 static void test_arraymap_percpu(int task, void *data)
 {
-	unsigned int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
+	unsigned int nr_cpus = bpf_num_possible_cpus();
 	int key, next_key, fd, i;
 	long values[nr_cpus];
 
@@ -313,7 +314,7 @@ static void test_arraymap_percpu(int task, void *data)
 
 static void test_arraymap_percpu_many_keys(void)
 {
-	unsigned int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
+	unsigned int nr_cpus = bpf_num_possible_cpus();
 	unsigned int nr_keys = 20000;
 	long values[nr_cpus];
 	int key, fd, i;
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 0ef8eaf..3c4a1fb 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -285,7 +285,7 @@ struct test_val {
 			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, 1234567),
 			BPF_EXIT_INSN(),
 		},
-		.errstr = "invalid func 1234567",
+		.errstr = "invalid func unknown#1234567",
 		.result = REJECT,
 	},
 	{
-- 
1.9.3
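
A side note on the format of /sys/devices/system/cpu/possible: the helper
in this patch only matches a single "0-N" range. As a minimal sketch,
assuming the file can also hold a lone CPU id such as "0" or a
comma-separated list of ranges, the hypothetical count_cpus_in_list()
below (not part of the patch) counts the CPUs named in such a string.
Reading the file itself is left out so that the parsing can be tried on
plain strings.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the patch: count the CPUs named in a
 * list such as "0", "0-3" or "0-3,8-11" (an optional trailing newline is
 * accepted). Returns -1 on malformed or empty input.
 */
static int count_cpus_in_list(const char *list)
{
	const char *p = list;
	int total = 0;

	while (*p && *p != '\n') {
		unsigned long start, end;
		char *next;

		/* Every entry has to begin with a decimal CPU id. */
		if (!isdigit((unsigned char)*p))
			return -1;
		start = strtoul(p, &next, 10);
		end = start;
		p = next;

		/* An optional "-M" turns the entry into a range. */
		if (*p == '-') {
			p++;
			if (!isdigit((unsigned char)*p))
				return -1;
			end = strtoul(p, &next, 10);
			p = next;
		}
		if (end < start)
			return -1;
		total += (int)(end - start + 1);

		/* Entries are separated by commas. */
		if (*p == ',')
			p++;
		else if (*p && *p != '\n')
			return -1;
	}

	return total > 0 ? total : -1;
}
```

A caller would fgets() one line from the sysfs file and pass the buffer
in; sizing a percpu value array from the result is the same idea as with
bpf_num_possible_cpus().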


* [PATCH net-next 4/6] bpf: add owner_prog_type and accounted mem to array map's fdinfo
From: Daniel Borkmann @ 2016-11-26  0:28 UTC (permalink / raw)
  To: davem; +Cc: alexei.starovoitov, netdev, Daniel Borkmann
In-Reply-To: <cover.1480119395.git.daniel@iogearbox.net>

Allow for checking the owner_prog_type of a program array map. In some
cases bpf(2) can return -EINVAL /after/ the verifier passed and did all
the rewrites of the bpf program.

The reason we can fail at this late stage is that the program array
maps are incompatible. Allow users to inspect this earlier, after they
got the map fd through the BPF_OBJ_GET command. tc will get support for
this.

Also, display how much we charged the map with regard to RLIMIT_MEMLOCK.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/syscall.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1090d16..4caa18e 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -138,18 +138,31 @@ static int bpf_map_release(struct inode *inode, struct file *filp)
 static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
 {
 	const struct bpf_map *map = filp->private_data;
+	const struct bpf_array *array;
+	u32 owner_prog_type = 0;
+
+	if (map->map_type == BPF_MAP_TYPE_PROG_ARRAY) {
+		array = container_of(map, struct bpf_array, map);
+		owner_prog_type = array->owner_prog_type;
+	}
 
 	seq_printf(m,
 		   "map_type:\t%u\n"
 		   "key_size:\t%u\n"
 		   "value_size:\t%u\n"
 		   "max_entries:\t%u\n"
-		   "map_flags:\t%#x\n",
+		   "map_flags:\t%#x\n"
+		   "memlock:\t%llu\n",
 		   map->map_type,
 		   map->key_size,
 		   map->value_size,
 		   map->max_entries,
-		   map->map_flags);
+		   map->map_flags,
+		   map->pages * 1ULL << PAGE_SHIFT);
+
+	if (owner_prog_type)
+		seq_printf(m, "owner_prog_type:\t%u\n",
+			   owner_prog_type);
 }
 #endif
 
-- 
1.9.3
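
A minimal sketch of how user space might pick the new fields out of the
fdinfo text, assuming the contents of /proc/<pid>/fdinfo/<fd> have
already been read into a string. fdinfo_lookup() is a hypothetical
helper, not part of this patch or of any library; the field names are the
ones printed by bpf_map_show_fdinfo() above.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: find "key:<whitespace>value" at the start of a
 * line in fdinfo-style text and return the value, or -1 if the key is
 * absent or has no number after it. Base 0 lets strtoull() accept both
 * the decimal fields and the "%#x" style map_flags field.
 */
static long long fdinfo_lookup(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			const char *val = line + klen + 1;
			unsigned long long v;
			char *end;

			errno = 0;
			v = strtoull(val, &end, 0);
			if (end == val || errno)
				return -1;
			return (long long)v;
		}

		/* Advance to the character after the next newline. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}

	return -1;
}
```

Since owner_prog_type is only printed when it is nonzero, a failed lookup
for that key does not have to mean an error.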


* [PATCH net-next 2/6] bpf: drop useless bpf_fd member from cls/act
From: Daniel Borkmann @ 2016-11-26  0:28 UTC (permalink / raw)
  To: davem; +Cc: alexei.starovoitov, netdev, Daniel Borkmann
In-Reply-To: <cover.1480119395.git.daniel@iogearbox.net>

After setup we don't need to keep the user space fd number around
anymore. It has no useful meaning for anyone, so just remove it.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 net/sched/act_bpf.c | 7 -------
 net/sched/cls_bpf.c | 9 +--------
 2 files changed, 1 insertion(+), 15 deletions(-)

diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index 1aa4ecf..84c1d2d 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -28,7 +28,6 @@ struct tcf_bpf_cfg {
 	struct bpf_prog *filter;
 	struct sock_filter *bpf_ops;
 	const char *bpf_name;
-	u32 bpf_fd;
 	u16 bpf_num_ops;
 	bool is_ebpf;
 };
@@ -118,9 +117,6 @@ static int tcf_bpf_dump_bpf_info(const struct tcf_bpf *prog,
 static int tcf_bpf_dump_ebpf_info(const struct tcf_bpf *prog,
 				  struct sk_buff *skb)
 {
-	if (nla_put_u32(skb, TCA_ACT_BPF_FD, prog->bpf_fd))
-		return -EMSGSIZE;
-
 	if (prog->bpf_name &&
 	    nla_put_string(skb, TCA_ACT_BPF_NAME, prog->bpf_name))
 		return -EMSGSIZE;
@@ -233,7 +229,6 @@ static int tcf_bpf_init_from_efd(struct nlattr **tb, struct tcf_bpf_cfg *cfg)
 		}
 	}
 
-	cfg->bpf_fd = bpf_fd;
 	cfg->bpf_name = name;
 	cfg->filter = fp;
 	cfg->is_ebpf = true;
@@ -332,8 +327,6 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
 
 	if (cfg.bpf_num_ops)
 		prog->bpf_num_ops = cfg.bpf_num_ops;
-	if (cfg.bpf_fd)
-		prog->bpf_fd = cfg.bpf_fd;
 
 	prog->tcf_action = parm->action;
 	rcu_assign_pointer(prog->filter, cfg.filter);
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 52dc85a..28cb5fa 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -45,10 +45,7 @@ struct cls_bpf_prog {
 	u32 gen_flags;
 	struct tcf_exts exts;
 	u32 handle;
-	union {
-		u32 bpf_fd;
-		u16 bpf_num_ops;
-	};
+	u16 bpf_num_ops;
 	struct sock_filter *bpf_ops;
 	const char *bpf_name;
 	struct tcf_proto *tp;
@@ -377,7 +374,6 @@ static int cls_bpf_prog_from_efd(struct nlattr **tb, struct cls_bpf_prog *prog,
 	}
 
 	prog->bpf_ops = NULL;
-	prog->bpf_fd = bpf_fd;
 	prog->bpf_name = name;
 	prog->filter = fp;
 
@@ -561,9 +557,6 @@ static int cls_bpf_dump_bpf_info(const struct cls_bpf_prog *prog,
 static int cls_bpf_dump_ebpf_info(const struct cls_bpf_prog *prog,
 				  struct sk_buff *skb)
 {
-	if (nla_put_u32(skb, TCA_BPF_FD, prog->bpf_fd))
-		return -EMSGSIZE;
-
 	if (prog->bpf_name &&
 	    nla_put_string(skb, TCA_BPF_NAME, prog->bpf_name))
 		return -EMSGSIZE;
-- 
1.9.3


* [PATCH net-next 1/6] bpf: drop unnecessary context cast from BPF_PROG_RUN
From: Daniel Borkmann @ 2016-11-26  0:28 UTC (permalink / raw)
  To: davem; +Cc: alexei.starovoitov, netdev, Daniel Borkmann
In-Reply-To: <cover.1480119395.git.daniel@iogearbox.net>

For a long time now, bpf_func has not been only about struct sk_buff *
as input. Make it generic as void *, so that callers don't need to cast
each time they call BPF_PROG_RUN().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 2 +-
 include/linux/filter.h                              | 6 +++---
 kernel/events/core.c                                | 2 +-
 kernel/seccomp.c                                    | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index eb37157..876ab3a 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1518,7 +1518,7 @@ static int nfp_net_run_xdp(struct bpf_prog *prog, void *data, unsigned int len)
 	xdp.data = data;
 	xdp.data_end = data + len;
 
-	return BPF_PROG_RUN(prog, (void *)&xdp);
+	return BPF_PROG_RUN(prog, &xdp);
 }
 
 /**
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1f09c52..7f246a2 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -408,8 +408,8 @@ struct bpf_prog {
 	enum bpf_prog_type	type;		/* Type of BPF program */
 	struct bpf_prog_aux	*aux;		/* Auxiliary fields */
 	struct sock_fprog_kern	*orig_prog;	/* Original BPF program */
-	unsigned int		(*bpf_func)(const struct sk_buff *skb,
-					    const struct bpf_insn *filter);
+	unsigned int		(*bpf_func)(const void *ctx,
+					    const struct bpf_insn *insn);
 	/* Instructions for interpreter */
 	union {
 		struct sock_filter	insns[0];
@@ -504,7 +504,7 @@ static inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 	u32 ret;
 
 	rcu_read_lock();
-	ret = BPF_PROG_RUN(prog, (void *)xdp);
+	ret = BPF_PROG_RUN(prog, xdp);
 	rcu_read_unlock();
 
 	return ret;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 0e29213..19237c2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7715,7 +7715,7 @@ static void bpf_overflow_handler(struct perf_event *event,
 	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
 		goto out;
 	rcu_read_lock();
-	ret = BPF_PROG_RUN(event->prog, (void *)&ctx);
+	ret = BPF_PROG_RUN(event->prog, &ctx);
 	rcu_read_unlock();
 out:
 	__this_cpu_dec(bpf_prog_active);
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 0db7c8a..bff9c77 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -195,7 +195,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd)
 	 * value always takes priority (ignoring the DATA).
 	 */
 	for (; f; f = f->prev) {
-		u32 cur_ret = BPF_PROG_RUN(f->prog, (void *)sd);
+		u32 cur_ret = BPF_PROG_RUN(f->prog, sd);
 
 		if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
 			ret = cur_ret;
-- 
1.9.3
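
To illustrate why the casts can go, here is a minimal user space sketch.
In C, a pointer to any object type converts implicitly to const void *,
so a callback that takes its context as const void * can be handed any
context struct without a cast. All demo_* names are hypothetical and only
loosely modelled on BPF_PROG_RUN(); this is not the kernel macro.

```c
#include <stddef.h>

/* Hypothetical stand-ins for two unrelated context types. */
struct demo_xdp {
	const char *data;
	const char *data_end;
};

struct demo_seccomp {
	int nr;
};

struct demo_prog {
	/* Generic context pointer, as in the reworked bpf_func. */
	unsigned int (*func)(const void *ctx);
};

#define DEMO_PROG_RUN(prog, ctx)	((prog)->func(ctx))

static unsigned int demo_xdp_len(const void *ctx)
{
	const struct demo_xdp *xdp = ctx;	/* implicit from void * in C */

	return (unsigned int)(xdp->data_end - xdp->data);
}

static unsigned int demo_seccomp_nr(const void *ctx)
{
	const struct demo_seccomp *sd = ctx;

	return (unsigned int)sd->nr;
}

static unsigned int demo_run_xdp(const char *buf, size_t len)
{
	struct demo_prog prog = { .func = demo_xdp_len };
	struct demo_xdp xdp = { .data = buf, .data_end = buf + len };

	/* No (void *) cast at the call site. */
	return DEMO_PROG_RUN(&prog, &xdp);
}

static unsigned int demo_run_seccomp(int nr)
{
	struct demo_prog prog = { .func = demo_seccomp_nr };
	struct demo_seccomp sd = { .nr = nr };

	return DEMO_PROG_RUN(&prog, &sd);
}
```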


* [PATCH net-next 5/6] bpf: allow for mount options to specify permissions
From: Daniel Borkmann @ 2016-11-26  0:28 UTC (permalink / raw)
  To: davem; +Cc: alexei.starovoitov, netdev, Daniel Borkmann
In-Reply-To: <cover.1480119395.git.daniel@iogearbox.net>

Since we recently converted the BPF filesystem over to use mount_nodev(),
we now have the possibility to also hold mount options in sb's s_fs_info.
This work implements mount options support for specifying permissions on
the sb's inode, which will be used by tc when it manually needs to mount
the fs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/inode.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 2565809..0b030c9 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -18,6 +18,7 @@
 #include <linux/namei.h>
 #include <linux/fs.h>
 #include <linux/kdev_t.h>
+#include <linux/parser.h>
 #include <linux/filter.h>
 #include <linux/bpf.h>
 
@@ -364,15 +365,66 @@ static void bpf_evict_inode(struct inode *inode)
 static const struct super_operations bpf_super_ops = {
 	.statfs		= simple_statfs,
 	.drop_inode	= generic_delete_inode,
+	.show_options	= generic_show_options,
 	.evict_inode	= bpf_evict_inode,
 };
 
+enum {
+	OPT_MODE,
+	OPT_ERR,
+};
+
+static const match_table_t bpf_mount_tokens = {
+	{ OPT_MODE, "mode=%o" },
+	{ OPT_ERR, NULL },
+};
+
+struct bpf_mount_opts {
+	umode_t mode;
+};
+
+static int bpf_parse_options(char *data, struct bpf_mount_opts *opts)
+{
+	substring_t args[MAX_OPT_ARGS];
+	int option, token;
+	char *ptr;
+
+	opts->mode = S_IRWXUGO;
+
+	while ((ptr = strsep(&data, ",")) != NULL) {
+		if (!*ptr)
+			continue;
+
+		token = match_token(ptr, bpf_mount_tokens, args);
+		switch (token) {
+		case OPT_MODE:
+			if (match_octal(&args[0], &option))
+				return -EINVAL;
+			opts->mode = option & S_IALLUGO;
+			break;
+		/* We might like to report bad mount options here, but
+		 * traditionally we've ignored all mount options, so we'd
+		 * better continue to ignore non-existing options for bpf.
+		 */
+		}
+	}
+
+	return 0;
+}
+
 static int bpf_fill_super(struct super_block *sb, void *data, int silent)
 {
 	static struct tree_descr bpf_rfiles[] = { { "" } };
+	struct bpf_mount_opts opts;
 	struct inode *inode;
 	int ret;
 
+	save_mount_options(sb, data);
+
+	ret = bpf_parse_options(data, &opts);
+	if (ret)
+		return ret;
+
 	ret = simple_fill_super(sb, BPF_FS_MAGIC, bpf_rfiles);
 	if (ret)
 		return ret;
@@ -382,7 +434,7 @@ static int bpf_fill_super(struct super_block *sb, void *data, int silent)
 	inode = sb->s_root->d_inode;
 	inode->i_op = &bpf_dir_iops;
 	inode->i_mode &= ~S_IALLUGO;
-	inode->i_mode |= S_ISVTX | S_IRWXUGO;
+	inode->i_mode |= S_ISVTX | opts.mode;
 
 	return 0;
 }
-- 
1.9.3
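
For readers who want to experiment outside the kernel, the option
handling can be approximated in user space. This is a simplified sketch,
not the kernel code: demo_parse_mode() is a hypothetical name, it uses
strtoul() in place of match_token()/match_octal(), and rejecting a
malformed mode= value with -1 is a choice made for this example. As in
the patch, unknown options are ignored, the default is 0777 (S_IRWXUGO)
and the result is masked with 07777 (S_IALLUGO).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical user space approximation of bpf_parse_options(): scan a
 * comma-separated option string and return the resulting mode bits, or
 * -1 if a mode= value is not a valid octal number.
 */
static int demo_parse_mode(const char *opts)
{
	int mode = 0777;	/* default, S_IRWXUGO */

	while (*opts) {
		size_t len = strcspn(opts, ",");

		/* Only "mode=<octal>" is known; everything else is skipped. */
		if (len > 5 && strncmp(opts, "mode=", 5) == 0) {
			const char *val = opts + 5;
			unsigned long v;
			char *end;

			if (*val < '0' || *val > '7')
				return -1;
			v = strtoul(val, &end, 8);
			if (end != opts + len)
				return -1;
			mode = (int)(v & 07777);	/* S_IALLUGO */
		}

		opts += len;
		if (*opts == ',')
			opts++;
	}

	return mode;
}
```

If several mode= options are given, the last one wins, which matches the
loop in the patch. The sticky bit that bpf_fill_super() always ORs in is
not modelled here.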


* [PATCH net-next 3/6] bpf: reuse dev_is_mac_header_xmit for redirect
From: Daniel Borkmann @ 2016-11-26  0:28 UTC (permalink / raw)
  To: davem; +Cc: alexei.starovoitov, netdev, Daniel Borkmann
In-Reply-To: <cover.1480119395.git.daniel@iogearbox.net>

Commit dcf800344a91 ("net/sched: act_mirred: Refactor detection whether
dev needs xmit at mac header") added dev_is_mac_header_xmit(); since it's
also useful elsewhere, move it to if_arp.h and reuse it for BPF.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/if_arp.h | 16 ++++++++++++++++
 net/core/filter.c      | 14 ++++----------
 net/sched/act_mirred.c | 15 +--------------
 3 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/include/linux/if_arp.h b/include/linux/if_arp.h
index f563907..3355efc 100644
--- a/include/linux/if_arp.h
+++ b/include/linux/if_arp.h
@@ -44,4 +44,20 @@ static inline int arp_hdr_len(struct net_device *dev)
 		return sizeof(struct arphdr) + (dev->addr_len + sizeof(u32)) * 2;
 	}
 }
+
+static inline bool dev_is_mac_header_xmit(const struct net_device *dev)
+{
+	switch (dev->type) {
+	case ARPHRD_TUNNEL:
+	case ARPHRD_TUNNEL6:
+	case ARPHRD_SIT:
+	case ARPHRD_IPGRE:
+	case ARPHRD_VOID:
+	case ARPHRD_NONE:
+		return false;
+	default:
+		return true;
+	}
+}
+
 #endif	/* _LINUX_IF_ARP_H */
diff --git a/net/core/filter.c b/net/core/filter.c
index ea315af..698a262 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -30,6 +30,7 @@
 #include <linux/inet.h>
 #include <linux/netdevice.h>
 #include <linux/if_packet.h>
+#include <linux/if_arp.h>
 #include <linux/gfp.h>
 #include <net/ip.h>
 #include <net/protocol.h>
@@ -1696,17 +1697,10 @@ static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev,
 static int __bpf_redirect(struct sk_buff *skb, struct net_device *dev,
 			  u32 flags)
 {
-	switch (dev->type) {
-	case ARPHRD_TUNNEL:
-	case ARPHRD_TUNNEL6:
-	case ARPHRD_SIT:
-	case ARPHRD_IPGRE:
-	case ARPHRD_VOID:
-	case ARPHRD_NONE:
-		return __bpf_redirect_no_mac(skb, dev, flags);
-	default:
+	if (dev_is_mac_header_xmit(dev))
 		return __bpf_redirect_common(skb, dev, flags);
-	}
+	else
+		return __bpf_redirect_no_mac(skb, dev, flags);
 }
 
 BPF_CALL_3(bpf_clone_redirect, struct sk_buff *, skb, u32, ifindex, u64, flags)
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index b2d417b..1af7baa 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -21,6 +21,7 @@
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/gfp.h>
+#include <linux/if_arp.h>
 #include <net/net_namespace.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
@@ -73,20 +74,6 @@ static void tcf_mirred_release(struct tc_action *a, int bind)
 static unsigned int mirred_net_id;
 static struct tc_action_ops act_mirred_ops;
 
-static bool dev_is_mac_header_xmit(const struct net_device *dev)
-{
-	switch (dev->type) {
-	case ARPHRD_TUNNEL:
-	case ARPHRD_TUNNEL6:
-	case ARPHRD_SIT:
-	case ARPHRD_IPGRE:
-	case ARPHRD_VOID:
-	case ARPHRD_NONE:
-		return false;
-	}
-	return true;
-}
-
 static int tcf_mirred_init(struct net *net, struct nlattr *nla,
 			   struct nlattr *est, struct tc_action **a, int ovr,
 			   int bind)
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next 0/6] BPF cleanups and misc updates
From: Daniel Borkmann @ 2016-11-26  0:28 UTC (permalink / raw)
  To: davem; +Cc: alexei.starovoitov, netdev, Daniel Borkmann

This patch set adds a couple of cleanups in the first few patches,
exposes owner_prog_type for array maps as well as mlocked mem
for maps in fdinfo, allows for mount permissions in fs and
fixes various outstanding issues in selftests and samples.

Thanks!

Daniel Borkmann (6):
  bpf: drop unnecessary context cast from BPF_PROG_RUN
  bpf: drop useless bpf_fd member from cls/act
  bpf: reuse dev_is_mac_header_xmit for redirect
  bpf: add owner_prog_type and accounted mem to array map's fdinfo
  bpf: allow for mount options to specify permissions
  bpf: fix multiple issues in selftest suite and samples

 .../net/ethernet/netronome/nfp/nfp_net_common.c    |  2 +-
 include/linux/filter.h                             |  6 +--
 include/linux/if_arp.h                             | 16 +++++++
 kernel/bpf/inode.c                                 | 54 +++++++++++++++++++++-
 kernel/bpf/syscall.c                               | 17 ++++++-
 kernel/events/core.c                               |  2 +-
 kernel/seccomp.c                                   |  2 +-
 net/core/filter.c                                  | 14 ++----
 net/sched/act_bpf.c                                |  7 ---
 net/sched/act_mirred.c                             | 15 +-----
 net/sched/cls_bpf.c                                |  9 +---
 samples/bpf/Makefile                               |  1 +
 samples/bpf/test_lru_dist.c                        |  5 +-
 samples/bpf/tracex2_user.c                         |  4 +-
 samples/bpf/tracex3_user.c                         |  6 ++-
 samples/bpf/xdp1_user.c                            |  4 +-
 tools/testing/selftests/bpf/bpf_util.h             | 38 +++++++++++++++
 tools/testing/selftests/bpf/test_lru_map.c         |  8 +++-
 tools/testing/selftests/bpf/test_maps.c            |  7 +--
 tools/testing/selftests/bpf/test_verifier.c        |  2 +-
 20 files changed, 160 insertions(+), 59 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/bpf_util.h

-- 
1.9.3

^ permalink raw reply

* Re: [PATCH v9 0/6] Add eBPF hooks for cgroups
From: David Miller @ 2016-11-25 23:15 UTC (permalink / raw)
  To: daniel-cYrQPVfZoowdnm+yROfE0A
  Cc: htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ,
	ast-b10kYP2dOMg, kafai-b10kYP2dOMg, fw-HFFVJYpyMKqzQB+pC5nmwQ,
	pablo-Cap9r6Oaw4JrovVCs/uTlw, harald-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, sargun-GaZTRHToo+CzQB+pC5nmwQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1479916350-28462-1-git-send-email-daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>

From: Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
Date: Wed, 23 Nov 2016 16:52:24 +0100

> This is v9 of the patch set to allow eBPF programs for network
> filtering and accounting to be attached to cgroups, so that they apply
> to all sockets of all tasks placed in that cgroup. The logic can also
> be extended for other cgroup-based eBPF logic.
> 
> Again, only minor details are updated in this version.

Series applied, thanks for working so hard to see this through
to the very end.

^ permalink raw reply

* Re: [PATCH net-next 1/5] net: mvneta: Use cacheable memory to store the rx buffer virtual address
From: kbuild test robot @ 2016-11-25 23:04 UTC (permalink / raw)
  To: Gregory CLEMENT
  Cc: kbuild-all, David S. Miller, linux-kernel, netdev, Jisheng Zhang,
	Arnd Bergmann, Jason Cooper, Andrew Lunn, Sebastian Hesselbarth,
	Gregory CLEMENT, Thomas Petazzoni, linux-arm-kernel, Nadav Haklai,
	Marcin Wojtas, Dmitri Epshtein, Yelena Krivosheev
In-Reply-To: <7e6004f918d3fcde9ae71e7893d26b19086236a3.1480087510.git-series.gregory.clement@free-electrons.com>

[-- Attachment #1: Type: text/plain, Size: 1350 bytes --]

Hi Gregory,

[auto build test ERROR on ]

url:    https://github.com/0day-ci/linux/commits/Gregory-CLEMENT/Support-Armada-37xx-SoC-ARMv8-64-bits-in-mvneta-driver/20161126-050621
base:    
config: parisc-allmodconfig (attached as .config)
compiler: hppa-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=parisc 

Note: the linux-review/Gregory-CLEMENT/Support-Armada-37xx-SoC-ARMv8-64-bits-in-mvneta-driver/20161126-050621 HEAD 5f44108a5c983ae4477f811485fdc4ee12294e72 builds fine.
      It only hurts bisectability.

All errors (new ones prefixed by >>):


vim +2745 drivers/net/ethernet/marvell/mvneta.c

  2739					   DMA_FROM_DEVICE);
  2740		if (unlikely(dma_mapping_error(pp->dev->dev.parent, phys_addr))) {
  2741			mvneta_frag_free(pp->frag_size, data);
  2742			return -ENOMEM;
  2743		}
  2744	
> 2745		phys_addr += pp->rx_offset_correction;
  2746		rx_desc->buf_phys_addr = phys_addr;
  2747		rx_desc->buf_cookie = (uintptr_t)data;
  2748	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 47218 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 4/5] net/socket: add helpers for recvmmsg
From: Eric Dumazet @ 2016-11-25 22:30 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, David S. Miller, Eric Dumazet, Jesper Dangaard Brouer,
	Hannes Frederic Sowa, Sabrina Dubroca
In-Reply-To: <56ac598a3f8b677d58cc9fec5470df230c6d1f70.1480086321.git.pabeni@redhat.com>

On Fri, 2016-11-25 at 16:39 +0100, Paolo Abeni wrote:
> _skb_try_recv_datagram_batch dequeues multiple skb's from the
> socket's receive queue, and runs the bulk_destructor callback under
> the receive queue lock.

...

> +	last = (struct sk_buff *)queue;
> +	first = (struct sk_buff *)queue->next;
> +	skb_queue_walk(queue, skb) {
> +		last = skb;
> +		totalsize += skb->truesize;
> +		if (++datagrams == batch)
> +			break;
> +	}

This is absolutely not good.

Walking through a list, bringing 2 cache lines per skb, is not the
proper way to deal with bulking.

And I do not see where the 'batch' value coming from user space is capped.

Is it really the vlen argument coming from the recvmmsg() system call?

This code runs with BH masked, so you do not want to give the user a way
to make you loop there 1000 times.

Bulking is nice only if you do not compromise system stability and the
latency requirements of other users/applications.
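
A minimal sketch of the capping point, assuming a hypothetical
RECV_BATCH_MAX limit and a simplified list node standing in for
struct sk_buff (neither is taken from the patch). It only bounds the
walk; it does nothing about the cache line cost:

```c
#include <stddef.h>

#define RECV_BATCH_MAX 64U	/* hypothetical cap, not from the patch */

/* Simplified stand-in for an skb on a receive queue (assumption). */
struct node {
	struct node *next;
	unsigned int truesize;
};

/* Clamp a user-supplied batch size to the fixed upper bound. */
static unsigned int clamp_batch(unsigned int requested)
{
	return requested < RECV_BATCH_MAX ? requested : RECV_BATCH_MAX;
}

/*
 * Walk at most clamp_batch(requested) nodes, summing truesize.
 * Returns the number of nodes visited.
 */
unsigned int walk_batch(const struct node *head, unsigned int requested,
			unsigned int *totalsize)
{
	unsigned int limit = clamp_batch(requested);
	unsigned int n = 0;

	*totalsize = 0;
	for (; head && n < limit; head = head->next) {
		*totalsize += head->truesize;
		n++;
	}
	return n;
}
```

With this in place, a request for 1000 entries still stops after
RECV_BATCH_MAX nodes, whatever user space passes in.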

^ permalink raw reply

* Re: [PATCH 0/3] virtio/vringh: kill off ACCESS_ONCE()
From: Christian Borntraeger @ 2016-11-25 21:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Mark Rutland, Davidlohr Bueso, KVM list, dbueso, Peter Zijlstra,
	netdev, Boqun Feng, LKML, virtualization, Paul McKenney,
	Linus Torvalds, Dmitry Vyukov
In-Reply-To: <20161125230735-mutt-send-email-mst@kernel.org>

On 11/25/2016 10:08 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 25, 2016 at 05:49:45PM +0100, Christian Borntraeger wrote:
>> On 11/25/2016 05:17 PM, Peter Zijlstra wrote:
>>> On Fri, Nov 25, 2016 at 04:10:04PM +0000, Mark Rutland wrote:
>>>> On Fri, Nov 25, 2016 at 04:21:39PM +0100, Dmitry Vyukov wrote:
>>>
>>>>> What are use cases for such primitive that won't be OK with "read once
>>>>> _and_ atomically"?
>>>>
>>>> I have none to hand.
>>>
>>> Whatever triggers the __builtin_memcpy() paths, and even the size==8
>>> paths on 32bit.
>>>
>>> You could put a WARN in there to easily find them.
>>
>> There were several cases that I found during writing the *ONCE stuff.
>> For example there are some 32bit ppc variants with 64bit PTEs. Same for
>> others (I think sparc). And the mm/ code is perfectly fine with these
>> PTE accesses being done NOT atomically.
> 
> In that case do we even need _ONCE at all?

Yes. For example, look at gup_pmd_range. Several checks are made on the pmd there.
It is important that the check for pmd_none is made on the same value as
the check for pmd_trans_huge, but it is not important that the value is still up
to date.
And there are really cases where we cannot read the thing atomically, e.g. on
m68k and sparc (32bit) pmd_t is defined as an array of longs.

Another problem is that a compiler can implement the following code as 2 memory
reads (e.g. if you have compare instructions that work on memory) instead of a
memory read and 2 compares:

int check(unsigned long *value_p) {
	unsigned long value = *value_p;
	if (condition_a(value))
		return 1;
	if (condition_b(value))
		return 2;
	return 3;
}

With READ_ONCE you forbid that. In past times you would have used barrier() after 
the assignment to achieve the same goal.
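
A minimal sketch of the READ_ONCE variant, assuming a simplified
READ_ONCE_SKETCH() macro (a hypothetical stand-in that only covers
scalar types and relies on the GCC/Clang __typeof__ extension; it is
not the kernel's real macro) and two made-up predicates:

```c
#include <stdbool.h>

/* Hypothetical stand-in for READ_ONCE(): one volatile load of a scalar. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Made-up predicates, only here so the example has something to test. */
static bool condition_a(unsigned long value)
{
	return value == 0;
}

static bool condition_b(unsigned long value)
{
	return (value & 1UL) != 0;
}

int check_once(unsigned long *value_p)
{
	/*
	 * The volatile access is performed exactly once, so both
	 * tests below are made against the same snapshot.
	 */
	unsigned long value = READ_ONCE_SKETCH(*value_p);

	if (condition_a(value))
		return 1;
	if (condition_b(value))
		return 2;
	return 3;
}
```

This says nothing about whether the single load is atomic; it only
stops the compiler from reading *value_p a second time.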


> Are there assumptions these are two 32 bit reads?

It depends on the code. Some places (e.g. in gup) assume that the access via
READ_ONCE is atomic (which it is for sane compilers as long as the pointer
is <= word size). In some other places just one bit is tested.
> 
> 
>>
>>>
>>> The advantage of introducing the SINGLE_{LOAD,STORE}() helpers is that
>>> they compiletime validate that the size is 'right' and can runtime check
>>> alignment constraints.
>>>
>>> IE, they are strictly stronger than {READ,WRITE}_ONCE().
>>>
> 

^ permalink raw reply

* Re: [PATCH] cxgb4: fix memory leak on txq_info
From: Colin Ian King @ 2016-11-25 21:28 UTC (permalink / raw)
  To: David Miller; +Cc: hariprasad, netdev, linux-kernel
In-Reply-To: <20161125.161058.2283539123524395654.davem@davemloft.net>

On 25/11/16 21:10, David Miller wrote:
> From: Colin King <colin.king@canonical.com>
> Date: Wed, 23 Nov 2016 11:02:44 +0000
> 
>> From: Colin Ian King <colin.king@canonical.com>
>>
>> Currently if txq_info->uldtxq cannot be allocated then
>> txq_info->txq is being kfree'd (which is redundant because it
>> is NULL) instead of txq_info. Fix this by instead kfree'ing
>> txq_info.
>>
>> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> 
> Applied, but Colin you _really_ need to start properly marking your
> networking patch submissions by indicating in the subject which
> tree your change is for.  In this case I figured out it was
> net-next, but you must say this explicitly in the Subject line
> via "Subject: [PATCH net-next] ..."
> 
> Thanks.
> 
Understood, will do next time, apologies for that.

Colin

^ permalink raw reply

* Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support
From: John Fastabend @ 2016-11-25 21:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: daniel, eric.dumazet, kubakici, shm, davem, alexei.starovoitov,
	netdev, bblanco, john.r.fastabend, brouer, tgraf
In-Reply-To: <20161122165400-mutt-send-email-mst@kernel.org>

On 16-11-22 06:58 AM, Michael S. Tsirkin wrote:
> On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote:
>> On 16-11-21 03:20 PM, Michael S. Tsirkin wrote:
>>> On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
>>>> From: Shrijeet Mukherjee <shrijeet@gmail.com>
>>>>
>>>> This adds XDP support to virtio_net. Some requirements must be
>>>> met for XDP to be enabled depending on the mode. First it will
>>>> only be supported with LRO disabled so that data is not pushed
>>>> across multiple buffers. The MTU must be less than a page size
>>>> to avoid having to handle XDP across multiple pages.
>>>>
>>>> If mergeable receive is enabled this first series only supports
>>>> the case where header and data are in the same buf which we can
>>>> check when a packet is received by looking at num_buf. If the
>>>> num_buf is greater than 1 and a XDP program is loaded the packet
>>>> is dropped and a warning is thrown. When any_header_sg is set this
>>>> does not happen and both header and data is put in a single buffer
>>>> as expected so we check this when XDP programs are loaded. Note I
>>>> have only tested this with Linux vhost backend.
>>>>
>>>> If big packets mode is enabled and MTU/LRO conditions above are
>>>> met then XDP is allowed.
>>>>
>>>> A follow on patch can be generated to solve the mergeable receive
>>>> case with num_bufs equal to 2. Buffers greater than two may not
>>>> be handled as easily.
>>>
>>>
>>> I would very much prefer support for other layouts without drops
>>> before merging this.
>>> header by itself can certainly be handled by skipping it.
>>> People wanted to use that e.g. for zero copy.
>>
>> OK fair enough I'll do this now rather than push it out.
>>

Hi Michael,

The header skip logic however complicates the xmit handling a fair
amount. Specifically when we release the buffers after xmit then
both the hdr and data portions need to be released which requires
some tracking.

Is the header split logic actually in use somewhere today? It looks
like it's not being used in the Linux case. And zero copy RX is, as
best I can tell, not currently supported anywhere, so I would prefer not
to complicate the XDP path at the moment with a possible future feature.

>>>
>>> Anything else can be handled by copying the packet.

Any idea how to test this? At the moment I have some code to linearize
the data in all cases with more than a single buffer. But it wasn't clear
to me which features I could negotiate with vhost/qemu to get more than
a single buffer in the receive path.

Thanks,
John

^ permalink raw reply

* Re: [PATCH] drivers: net: davinci_mdio: use builtin_platform_driver
From: David Miller @ 2016-11-25 21:23 UTC (permalink / raw)
  To: geliangtang
  Cc: mugunthanvnm, grygorii.strashko, linux-omap, netdev, linux-kernel
In-Reply-To: <055763562f90fd7e2d311308e1d731ba93c3eea9.1479912302.git.geliangtang@gmail.com>

From: Geliang Tang <geliangtang@gmail.com>
Date: Wed, 23 Nov 2016 22:45:43 +0800

> @@ -536,11 +536,7 @@ static struct platform_driver davinci_mdio_driver = {
>  	.remove = davinci_mdio_remove,
>  };
>  
> -static int __init davinci_mdio_init(void)
> -{
> -	return platform_driver_register(&davinci_mdio_driver);
> -}
> -device_initcall(davinci_mdio_init);
> +builtin_platform_driver(davinci_mdio_driver);
>  

As noted by others, this is not a correct transformation. The existing
code works properly when modular, but it will not with this change.

device_initcall() is rerouted to module_init() inside of a module
build, whereas the thing builtin_platform_driver() expands to is
not.

^ permalink raw reply

* Re: pull-request: can 2016-11-23
From: David Miller @ 2016-11-25 21:18 UTC (permalink / raw)
  To: mkl; +Cc: netdev, linux-can, kernel
In-Reply-To: <20161123143430.24985-1-mkl@pengutronix.de>

From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Wed, 23 Nov 2016 15:34:29 +0100

> this is a pull request for net/master.
> 
> The patch by Oliver Hartkopp for the broadcast manager (bcm) fixes the CAN-FD
> support, which may cause an out-of-bounds access otherwise.

Pulled, thanks.

^ permalink raw reply

* Re: [PATCH net-next 0/5] net: add protocol level recvmmsg support
From: Eric Dumazet @ 2016-11-25 21:16 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, David S. Miller, Eric Dumazet, Jesper Dangaard Brouer,
	Hannes Frederic Sowa, Sabrina Dubroca
In-Reply-To: <cover.1480086321.git.pabeni@redhat.com>

On Fri, 2016-11-25 at 16:39 +0100, Paolo Abeni wrote:
> The goal of recvmmsg() is to amortize the syscall overhead on a possible
> long messages batch, but for most networking protocols, e.g. udp the
> syscall overhead is negligible compared to the protocol specific operations
> like dequeuing.

The problem with recvmmsg() is that it blows up the L1/L2 cache of the cpu.
It gives false 'good results' until other threads sharing the same cache
hierarchy are competing with you. Then performance is actually lower
than regular recvmsg().

And I presume your tests did not really use the data once copied to user
space, like doing the typical operations a UDP server does on incoming
packets?

I would rather try to optimize normal recvmsg(), instead of adding so
much code in the kernel for this horrible recvmmsg() super system call.

Looking at how buggy sendmmsg() was until commit 3023898b7d4aac6
("sock: fix sendmmsg for partial sendmsg"), I fear that these 'super'
system calls are way too complex.

How could we improve UDP?

For example, we could easily have 2 queues to reduce false sharing and
lock contention.

1) One queue accessed by softirq to append packets.

2) One queue accessed by recvmsg(). Make sure these two queues do not
share a cache line.

When the 2nd queue is empty, transfer the whole first queue in one operation.

Look in net/core/dev.c, process_backlog(), for an example of this
strategy.
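
A rough user-space sketch of the two-queue idea, assuming a 64-byte
cache line and hypothetical names (struct pkt, struct two_queue_sock,
sock_enqueue(), sock_dequeue()). Locking is left out; the comments
mark where the input queue lock would be held:

```c
#include <stddef.h>

#define CACHE_LINE 64	/* assumed cache line size */

/* Simplified stand-in for a queued packet (assumption). */
struct pkt {
	struct pkt *next;
	int id;
};

struct pkt_queue {
	struct pkt *head;
	struct pkt *tail;
};

struct two_queue_sock {
	/* Appended to by the producer (softirq in the kernel case). */
	_Alignas(CACHE_LINE) struct pkt_queue input;
	/* Drained by the consumer (recvmsg() in the kernel case). */
	_Alignas(CACHE_LINE) struct pkt_queue reader;
};

/* Producer side: would run under the input queue lock. */
void sock_enqueue(struct two_queue_sock *sk, struct pkt *p)
{
	p->next = NULL;
	if (sk->input.tail)
		sk->input.tail->next = p;
	else
		sk->input.head = p;
	sk->input.tail = p;
}

/* Consumer side: touches the input queue only when its own queue is empty. */
struct pkt *sock_dequeue(struct two_queue_sock *sk)
{
	struct pkt *p;

	if (!sk->reader.head) {
		/* Splice the whole input queue in one step (lock held here). */
		sk->reader = sk->input;
		sk->input.head = NULL;
		sk->input.tail = NULL;
	}

	p = sk->reader.head;
	if (p) {
		sk->reader.head = p->next;
		if (!sk->reader.head)
			sk->reader.tail = NULL;
		p->next = NULL;
	}
	return p;
}
```

The consumer takes the shared lock once per splice rather than once
per packet, and the alignment keeps the two queue heads on separate
cache lines.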

An alternative would be to use a ring buffer, although the forward_alloc
stuff might be complex.

^ permalink raw reply

* Re: [PATCH] dwc_eth_qos: drop duplicate headers
From: David Miller @ 2016-11-25 21:14 UTC (permalink / raw)
  To: geliangtang; +Cc: lars.persson, netdev, linux-kernel
In-Reply-To: <906a07a7db3f3d6454236dd10d4082c1d9c78fa6.1479905884.git.geliangtang@gmail.com>

From: Geliang Tang <geliangtang@gmail.com>
Date: Wed, 23 Nov 2016 22:24:35 +0800

> Drop duplicate headers types.h and delay.h from dwc_eth_qos.c.
> 
> Signed-off-by: Geliang Tang <geliangtang@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] cxgb4: fix memory leak on txq_info
From: David Miller @ 2016-11-25 21:10 UTC (permalink / raw)
  To: colin.king; +Cc: hariprasad, netdev, linux-kernel
In-Reply-To: <20161123110244.16111-1-colin.king@canonical.com>

From: Colin King <colin.king@canonical.com>
Date: Wed, 23 Nov 2016 11:02:44 +0000

> From: Colin Ian King <colin.king@canonical.com>
> 
> Currently if txq_info->uldtxq cannot be allocated then
> txq_info->txq is being kfree'd (which is redundant because it
> is NULL) instead of txq_info. Fix this by instead kfree'ing
> txq_info.
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Applied, but Colin you _really_ need to start properly marking your
networking patch submissions by indicating in the subject which
tree your change is for.  In this case I figured out it was
net-next, but you must say this explicitly in the Subject line
via "Subject: [PATCH net-next] ..."

Thanks.

^ permalink raw reply

* Re: [PATCH 0/3] virtio/vringh: kill off ACCESS_ONCE()
From: Michael S. Tsirkin @ 2016-11-25 21:08 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Mark Rutland, Davidlohr Bueso, KVM list, dbueso, Peter Zijlstra,
	netdev, Boqun Feng, LKML, virtualization, Paul McKenney,
	Linus Torvalds, Dmitry Vyukov
In-Reply-To: <d7f3740b-e343-68fc-4996-f712dd8c07f3@de.ibm.com>

On Fri, Nov 25, 2016 at 05:49:45PM +0100, Christian Borntraeger wrote:
> On 11/25/2016 05:17 PM, Peter Zijlstra wrote:
> > On Fri, Nov 25, 2016 at 04:10:04PM +0000, Mark Rutland wrote:
> >> On Fri, Nov 25, 2016 at 04:21:39PM +0100, Dmitry Vyukov wrote:
> > 
> >>> What are use cases for such primitive that won't be OK with "read once
> >>> _and_ atomically"?
> >>
> >> I have none to hand.
> > 
> > Whatever triggers the __builtin_memcpy() paths, and even the size==8
> > paths on 32bit.
> > 
> > You could put a WARN in there to easily find them.
> 
> There were several cases that I found during writing the *ONCE stuff.
> For example there are some 32bit ppc variants with 64bit PTEs. Same for
> others (I think sparc). And the mm/ code is perfectly fine with these
> PTE accesses being done NOT atomically.

In that case do we even need _ONCE at all?
Are there assumptions these are two 32 bit reads?


> 
> > 
> > The advantage of introducing the SINGLE_{LOAD,STORE}() helpers is that
> > they compiletime validate that the size is 'right' and can runtime check
> > alignment constraints.
> > 
> > IE, they are strictly stronger than {READ,WRITE}_ONCE().
> > 

^ permalink raw reply

* Re: [PATCH RFC v1] ethtool: implement helper to get flow_type value
From: David Miller @ 2016-11-25 21:06 UTC (permalink / raw)
  To: jacob.e.keller; +Cc: netdev, intel-wired-lan
In-Reply-To: <20161122234453.31611-1-jacob.e.keller@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>
Date: Tue, 22 Nov 2016 15:44:53 -0800

> @@ -880,6 +880,14 @@ struct ethtool_rx_flow_spec {
>  	__u32		location;
>  };
>  
> +/* Flag to enable additional fields in struct ethtool_rx_flow_spec */
> +#define	FLOW_EXT	0x80000000
> +#define	FLOW_MAC_EXT	0x40000000
> +static inline __u32 ethtool_get_flow_spec_type(__u32 flow_type)
> +{
> +	return flow_type & (FLOW_EXT | FLOW_MAC_EXT);
> +}
> +
>  /* How rings are layed out when accessing virtual functions or
>   * offloaded queues is device specific. To allow users to do flow
>   * steering and specify these queues the ring cookie is partitioned
> @@ -1579,9 +1587,6 @@ static inline int ethtool_validate_duplex(__u8 duplex)
>  #define	IPV4_FLOW	0x10	/* hash only */
>  #define	IPV6_FLOW	0x11	/* hash only */
>  #define	ETHER_FLOW	0x12	/* spec only (ether_spec) */
> -/* Flag to enable additional fields in struct ethtool_rx_flow_spec */
> -#define	FLOW_EXT	0x80000000
> -#define	FLOW_MAC_EXT	0x40000000
>  
>  /* L3-L4 network traffic flow hash options */
>  #define	RXH_L2DA	(1 << 1)

Please put the helper after the FLOW_* definitions rather than moving
them earlier in the file.

^ permalink raw reply

