Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] Fix ss Netid column and Local/Peer_Address
From: Yoann P. @ 2018-10-29 20:07 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: netdev, Stephen Hemminger
In-Reply-To: <20181029194938.7df26333@redhat.com>




> On Mon, 29 Oct 2018 19:20:36 +0100
> 
> Stefano Brivio <sbrivio@redhat.com> wrote:
> > The actual issue seems to be that in some cases the left delimiter for
> > the State column is not printed
> 
> Much worse, we always print the left delimiter of the last buffered
> column, which is usually empty. My bad.
> 
> The issue is not so visible in general as we almost always have spaces
> to distribute around, but not if you start going below 70/75 columns.
> Can you try this?
> 
> diff --git a/misc/ss.c b/misc/ss.c
> index f99b6874c228..90986b1dc15f 100644
> --- a/misc/ss.c
> +++ b/misc/ss.c
> @@ -1260,7 +1260,7 @@ static void render(void)
>         while (token) {
>                 /* Print left delimiter only if we already started a line */
> if (line_started++)
> -                       printed = printf("%s", current_field->ldelim);
> +                       printed = printf("%s", f->ldelim);
>                 else
>                         printed = 0;


I can't reproduce the issue with this modification. :).

^ permalink raw reply

* Re: [PATCH] bpf: tcp_bpf_recvmsg should return EAGAIN when nonblocking and no data
From: John Fastabend @ 2018-10-29 20:30 UTC (permalink / raw)
  To: daniel, ast; +Cc: netdev
In-Reply-To: <1540841488-22023-1-git-send-email-john.fastabend@gmail.com>

On 10/29/2018 12:31 PM, John Fastabend wrote:
> We return 0 in the case of a nonblocking socket that has no data
> available. However, this is incorrect and may confuse applications.
> After this patch we do the correct thing and return the error
> EAGAIN.
> 
> Quoting return codes from recvmsg manpage,
> 
> EAGAIN or EWOULDBLOCK
>  The socket is marked nonblocking and the receive operation would
>  block, or a receive timeout had been set and the timeout expired
>  before data was received.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---

Add fixes tag.

Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")

^ permalink raw reply

* [PATCH net v2] rtnetlink: Disallow FDB configuration for non-Ethernet device
From: Ido Schimmel @ 2018-10-29 20:36 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Ido Schimmel, Vlad Yasevich, David Ahern

When an FDB entry is configured, the address is validated to have the
length of an Ethernet address, but the device for which the address is
configured can be of any type.

The above can result in the use of uninitialized memory when the address
is later compared against existing addresses since 'dev->addr_len' is
used and it may be greater than ETH_ALEN, as with ip6tnl devices.

Fix this by making sure that FDB entries are only configured for
Ethernet devices.

BUG: KMSAN: uninit-value in memcmp+0x11d/0x180 lib/string.c:863
CPU: 1 PID: 4318 Comm: syz-executor998 Not tainted 4.19.0-rc3+ #49
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x14b/0x190 lib/dump_stack.c:113
  kmsan_report+0x183/0x2b0 mm/kmsan/kmsan.c:956
  __msan_warning+0x70/0xc0 mm/kmsan/kmsan_instr.c:645
  memcmp+0x11d/0x180 lib/string.c:863
  dev_uc_add_excl+0x165/0x7b0 net/core/dev_addr_lists.c:464
  ndo_dflt_fdb_add net/core/rtnetlink.c:3463 [inline]
  rtnl_fdb_add+0x1081/0x1270 net/core/rtnetlink.c:3558
  rtnetlink_rcv_msg+0xa0b/0x1530 net/core/rtnetlink.c:4715
  netlink_rcv_skb+0x36e/0x5f0 net/netlink/af_netlink.c:2454
  rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4733
  netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
  netlink_unicast+0x1638/0x1720 net/netlink/af_netlink.c:1343
  netlink_sendmsg+0x1205/0x1290 net/netlink/af_netlink.c:1908
  sock_sendmsg_nosec net/socket.c:621 [inline]
  sock_sendmsg net/socket.c:631 [inline]
  ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
  __sys_sendmsg net/socket.c:2152 [inline]
  __do_sys_sendmsg net/socket.c:2161 [inline]
  __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
  __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
  do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
  entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x440ee9
Code: e8 cc ab 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fff6a93b518 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440ee9
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000003
RBP: 0000000000000000 R08: 00000000004002c8 R09: 00000000004002c8
R10: 00000000004002c8 R11: 0000000000000213 R12: 000000000000b4b0
R13: 0000000000401ec0 R14: 0000000000000000 R15: 0000000000000000

Uninit was created at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:256 [inline]
  kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:181
  kmsan_kmalloc+0x98/0x100 mm/kmsan/kmsan_hooks.c:91
  kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan_hooks.c:100
  slab_post_alloc_hook mm/slab.h:446 [inline]
  slab_alloc_node mm/slub.c:2718 [inline]
  __kmalloc_node_track_caller+0x9e7/0x1160 mm/slub.c:4351
  __kmalloc_reserve net/core/skbuff.c:138 [inline]
  __alloc_skb+0x2f5/0x9e0 net/core/skbuff.c:206
  alloc_skb include/linux/skbuff.h:996 [inline]
  netlink_alloc_large_skb net/netlink/af_netlink.c:1189 [inline]
  netlink_sendmsg+0xb49/0x1290 net/netlink/af_netlink.c:1883
  sock_sendmsg_nosec net/socket.c:621 [inline]
  sock_sendmsg net/socket.c:631 [inline]
  ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
  __sys_sendmsg net/socket.c:2152 [inline]
  __do_sys_sendmsg net/socket.c:2161 [inline]
  __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
  __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
  do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
  entry_SYSCALL_64_after_hwframe+0x63/0xe7

v2:
* Make error message more specific (David)

Fixes: 090096bf3db1 ("net: generic fdb support for drivers without ndo_fdb_<op>")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-and-tested-by: syzbot+3a288d5f5530b901310e@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+d53ab4e92a1db04110ff@syzkaller.appspotmail.com
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
---
 net/core/rtnetlink.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index f679c7a7d761..e01274bd5e3e 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3600,6 +3600,11 @@ static int rtnl_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh,
 		return -EINVAL;
 	}
 
+	if (dev->type != ARPHRD_ETHER) {
+		NL_SET_ERR_MSG(extack, "FDB add only supported for Ethernet devices");
+		return -EINVAL;
+	}
+
 	addr = nla_data(tb[NDA_LLADDR]);
 
 	err = fdb_vid_parse(tb[NDA_VLAN], &vid, extack);
@@ -3704,6 +3709,11 @@ static int rtnl_fdb_del(struct sk_buff *skb, struct nlmsghdr *nlh,
 		return -EINVAL;
 	}
 
+	if (dev->type != ARPHRD_ETHER) {
+		NL_SET_ERR_MSG(extack, "FDB delete only supported for Ethernet devices");
+		return -EINVAL;
+	}
+
 	addr = nla_data(tb[NDA_LLADDR]);
 
 	err = fdb_vid_parse(tb[NDA_VLAN], &vid, extack);
-- 
2.17.2

^ permalink raw reply related

* Re: [PATCH net v2] rtnetlink: Disallow FDB configuration for non-Ethernet device
From: David Ahern @ 2018-10-29 20:39 UTC (permalink / raw)
  To: Ido Schimmel, netdev@vger.kernel.org; +Cc: davem@davemloft.net, Vlad Yasevich
In-Reply-To: <20181029203622.20608-1-idosch@mellanox.com>

On 10/29/18 2:36 PM, Ido Schimmel wrote:
> When an FDB entry is configured, the address is validated to have the
> length of an Ethernet address, but the device for which the address is
> configured can be of any type.
> 
> The above can result in the use of uninitialized memory when the address
> is later compared against existing addresses since 'dev->addr_len' is
> used and it may be greater than ETH_ALEN, as with ip6tnl devices.
> 
> Fix this by making sure that FDB entries are only configured for
> Ethernet devices.

...

> 
> Fixes: 090096bf3db1 ("net: generic fdb support for drivers without ndo_fdb_<op>")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Reported-and-tested-by: syzbot+3a288d5f5530b901310e@syzkaller.appspotmail.com
> Reported-and-tested-by: syzbot+d53ab4e92a1db04110ff@syzkaller.appspotmail.com
> Cc: Vlad Yasevich <vyasevich@gmail.com>
> Cc: David Ahern <dsahern@gmail.com>
> ---
>  net/core/rtnetlink.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)

Reviewed-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply

* [PATCH bpf-next] bpf_load: add map name to load_maps error message
From: Shannon Nelson @ 2018-10-29 21:14 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev, shannon.lee.nelson

To help when debugging bpf/xdp load issues, have the load_map()
error message include the number and name of the map that
failed.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 samples/bpf/bpf_load.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 89161c9..5de0357 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -282,8 +282,8 @@ static int load_maps(struct bpf_map_data *maps, int nr_maps,
 							numa_node);
 		}
 		if (map_fd[i] < 0) {
-			printf("failed to create a map: %d %s\n",
-			       errno, strerror(errno));
+			printf("failed to create map %d (%s): %d %s\n",
+			       i, maps[i].name, errno, strerror(errno));
 			return 1;
 		}
 		maps[i].fd = map_fd[i];
-- 
2.7.4

^ permalink raw reply related

* [PATCH bpf-next] xdp: sample code for redirecting vlan packets to specific cpus
From: Shannon Nelson @ 2018-10-29 21:19 UTC (permalink / raw)
  To: ast, daniel
  Cc: netdev, eric.dumazet, john.fastabend, silviu.smarandache,
	shannon.lee.nelson

This is an example of using XDP to redirect the processing of
particular vlan packets to specific CPUs.  This is in response
to comments received on a kernel patch put forth previously
to do something similar using RPS.
     https://www.spinics.net/lists/netdev/msg528210.html
     [PATCH net-next] net: enable RPS on vlan devices

This XDP application watches for inbound vlan-tagged packets
and redirects those packets to be processed on a specific CPU
as configured in a BPF map.  The BPF map can be modified by
this user program, which can also load and unload the kernel
XDP code.

One example use is for supporting VMs where we can't control the
OS being used: we'd like to separate the VM CPU processing from
the host's CPUs as a way to help mitigate L1TF related issues.
When running the VM's traffic on a vlan we can stick the host's
Rx processing on one set of CPUs separate from the VM's CPUs.

This example currently uses a vlan key and cpu value in the
BPF map, so only can do one CPU per vlan.  This could easily
be modified to use a bitpattern of CPUs rather than a CPU id
to allow multiple CPUs per vlan.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 samples/bpf/Makefile                 |   3 +
 samples/bpf/xdp_vlan_redirect_kern.c | 118 +++++++++++
 samples/bpf/xdp_vlan_redirect_user.c | 393 +++++++++++++++++++++++++++++++++++
 3 files changed, 514 insertions(+)
 create mode 100644 samples/bpf/xdp_vlan_redirect_kern.c
 create mode 100644 samples/bpf/xdp_vlan_redirect_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 1303af1..a9746ce 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -44,6 +44,7 @@ hostprogs-y += load_sock_ops
 hostprogs-y += xdp_redirect
 hostprogs-y += xdp_redirect_map
 hostprogs-y += xdp_redirect_cpu
+hostprogs-y += xdp_vlan_redirect
 hostprogs-y += xdp_monitor
 hostprogs-y += xdp_rxq_info
 hostprogs-y += syscall_tp
@@ -99,6 +100,7 @@ per_socket_stats_example-objs := cookie_uid_helper_example.o
 xdp_redirect-objs := bpf_load.o xdp_redirect_user.o
 xdp_redirect_map-objs := bpf_load.o xdp_redirect_map_user.o
 xdp_redirect_cpu-objs := bpf_load.o xdp_redirect_cpu_user.o
+xdp_vlan_redirect-objs := bpf_load.o xdp_vlan_redirect_user.o
 xdp_monitor-objs := bpf_load.o xdp_monitor_user.o
 xdp_rxq_info-objs := xdp_rxq_info_user.o
 syscall_tp-objs := bpf_load.o syscall_tp_user.o
@@ -154,6 +156,7 @@ always += tcp_basertt_kern.o
 always += xdp_redirect_kern.o
 always += xdp_redirect_map_kern.o
 always += xdp_redirect_cpu_kern.o
+always += xdp_vlan_redirect_kern.o
 always += xdp_monitor_kern.o
 always += xdp_rxq_info_kern.o
 always += xdp2skb_meta_kern.o
diff --git a/samples/bpf/xdp_vlan_redirect_kern.c b/samples/bpf/xdp_vlan_redirect_kern.c
new file mode 100644
index 0000000..3d561a0
--- /dev/null
+++ b/samples/bpf/xdp_vlan_redirect_kern.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  XDP redirect vlans to CPUs - kernel code
+ *
+ *  Copyright(c) 2018 Oracle Corp.
+ */
+
+#include <linux/if_ether.h>
+#include <linux/if_vlan.h>
+
+#include <uapi/linux/bpf.h>
+#include "bpf_helpers.h"
+
+char _license[] SEC("license") = "GPL";
+
+#define MAX_CPUS 64           /* WARNING - sync with _user.c */
+#define UNDEF_CPU 0xff000000  /* WARNING - sync with _user.c */
+
+/* The vlan index finds cpu(s) for processing a packet */
+struct bpf_map_def SEC("maps") vlan_redirect_map = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(u32),		/* vlan tag */
+	.value_size = sizeof(u64),		/* cpu bitpattern */
+	.max_entries = 4096,
+};
+
+/* List of cpus that can participate in the vlan redirect */
+struct bpf_map_def SEC("maps") vlan_redirect_cpus_map = {
+	.type		= BPF_MAP_TYPE_CPUMAP,
+	.key_size	= sizeof(u32),		/* cpu id */
+	.value_size	= sizeof(u32),		/* queue size */
+	.max_entries	= MAX_CPUS,
+};
+
+/* Counters for debug */
+#define VRC_CALLS  0    /* number of calls to this program */
+#define VRC_VLANS  1    /* number of vlan packets seen */
+#define VRC_HITS   2    /* number of redirects attempted */
+#define CPU_COUNT  3    /* number of CPUs found */
+struct bpf_map_def SEC("maps") vlan_redirect_counters_map = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(u32),		/* vlan tag */
+	.value_size = sizeof(u64),		/* cpu bitpattern */
+	.max_entries = 4,
+};
+
+SEC("xdp_vlan_redirect")
+int xdp_vlan_redirect(struct xdp_md *ctx)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data     = (void *)(long)ctx->data;
+	struct ethhdr *eth = data;
+	uint64_t h_proto, nh_off;
+	struct vlan_hdr *vhdr;
+	uint64_t *cpu_value;
+	uint32_t num_cpus;
+	uint32_t minlen;
+	uint64_t *vp;
+	uint64_t v, k;
+	uint64_t vlan;
+
+	/* count packets processed */
+	k = VRC_CALLS;
+	vp = bpf_map_lookup_elem(&vlan_redirect_counters_map, &k);
+	if (vp) {
+		v = (*vp) + 1;
+		bpf_map_update_elem(&vlan_redirect_counters_map, &k, &v, 0);
+	}
+
+	/* is there enough packet? */
+	minlen = sizeof(struct ethhdr) + sizeof(struct vlan_hdr);
+	if (data + minlen > data_end)
+		return XDP_PASS;
+
+	/* is there a vlan tag? */
+	h_proto = be16_to_cpu(eth->h_proto);
+	if (h_proto != ETH_P_8021Q && h_proto != ETH_P_8021AD)
+		return XDP_PASS;
+
+	vhdr = data + sizeof(struct ethhdr);
+	vlan = be16_to_cpu(vhdr->h_vlan_TCI) & VLAN_VID_MASK;
+	if (!vlan)
+		return XDP_PASS;
+
+	/* count vlan packets seen */
+	k = VRC_VLANS;
+	vp = bpf_map_lookup_elem(&vlan_redirect_counters_map, &k);
+	if (vp) {
+		v = (*vp) + 1;
+		bpf_map_update_elem(&vlan_redirect_counters_map, &k, &v, 0);
+	}
+
+	/* what cpu(s) for this vlanid? */
+	cpu_value = bpf_map_lookup_elem(&vlan_redirect_map, &vlan);
+	if (!cpu_value || *cpu_value == UNDEF_CPU)
+		return XDP_PASS;
+
+	/* TODO: cpu_value could be a bit-pattern of possible cpus
+	 *       from which to choose, and here we would do a hash
+	 *       to select the target cpu.
+	 */
+
+	/* TODO: scale the cpu found to be sure it is less than the
+	 *       actual number of CPUs running.
+	 */
+
+	/* set up the redirect */
+	bpf_redirect_map(&vlan_redirect_cpus_map, *cpu_value, 0);
+
+	/* count redirects attempted */
+	k = VRC_HITS;
+	vp = bpf_map_lookup_elem(&vlan_redirect_counters_map, &k);
+	if (vp) {
+		v = (*vp) + 1;
+		bpf_map_update_elem(&vlan_redirect_counters_map, &k, &v, 0);
+	}
+
+	return XDP_REDIRECT;
+}
diff --git a/samples/bpf/xdp_vlan_redirect_user.c b/samples/bpf/xdp_vlan_redirect_user.c
new file mode 100644
index 0000000..c4079f4
--- /dev/null
+++ b/samples/bpf/xdp_vlan_redirect_user.c
@@ -0,0 +1,393 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  XDP redirect vlans to CPUs - user code
+ *
+ *  Copyright(c) 2018 Oracle Corp.
+ *
+ *
+ * This _user.c code, along with the accompanying _kern.c, is
+ * intended as an example of using XDP to redirect processing of
+ * particular vlan packets to specific CPUs.  This is in response
+ * to comments received on a kernel patch put forth previously
+ * to do something similar using RPS.
+ *     https://www.spinics.net/lists/netdev/msg528210.html
+ *     [PATCH net-next] net: enable RPS on vlan devices
+ *
+ * This XDP application watches for inbound vlan-tagged packets
+ * and redirects those packets to be processed on a specific CPU
+ * as configured in a BPF map.  The BPF map can be modified by
+ * this user program, which can also load and unload the kernel
+ * XDP code.
+ *
+ * In supporting VMs where we can't control the OS being used,
+ * we'd like to separate the VM CPU processing from the host's
+ * CPUs as a way to help mitigate the impact of the L1TF issue.
+ * When running the VM's traffic on a vlan we can stick the Rx
+ * processing on one set of CPUs separate from the VM's CPUs.
+ * Yes, choosing to use this may cause a bit of throughput pain
+ * when the packets are actually passed into the VM and have to
+ * move from one cache to another.
+ *
+ * This example currently uses a vlan key and cpu value in the
+ * BPF map, so only can do one CPU per vlan.  This could easily
+ * be modified to use a bitpattern of CPUs rather than a CPU id
+ * to allow multiple CPUs per vlan.
+ *
+ * Before using, please be sure to mount the bpf pseudo-fs
+ *	mount -t bpf bpf /sys/fs/bpf
+ *
+ * Also, be sure that the device is not stripping vlan tags
+ * so that the XDP program has a chance to inspect them
+ *	ethtool -K eth0 rxvlan off
+ *
+ * To load the feature, use a command line something like this:
+ *	xdp_vlan_redirect --dev eth0 --install
+ *
+ * Once installed, you can see the pinned files in userspace:
+ *	# ls /sys/fs/bpf
+ *	xdp_vlan_redirect  xdp_vlan_redirect_map
+ *
+ * These commands add vlan:cpu mappings:
+ *	xdp_vlan_redirect --dev eth0 --vlan 1 --cpu 5
+ *	xdp_vlan_redirect -d eth0 -v 3 -c 4
+ *
+ * You can use the bpftool to print the contents of the vlan map:
+ *	# bpftool map dump pinned /sys/fs/bpf/xdp_vlan_redirect_vlan_map
+ *	key: 00 00 00 00  value: 00 00 00 ff 00 00 00 00
+ *	key: 01 00 00 00  value: 05 00 00 00 00 00 00 00
+ *	key: 02 00 00 00  value: 00 00 00 ff 00 00 00 00
+ *	key: 03 00 00 00  value: 04 00 00 00 00 00 00 00
+ *	key: 04 00 00 00  value: 00 00 00 ff 00 00 00 00
+ *	    :
+ *
+ * Use negative numbers to remove vlans from the map:
+ *	xdp_vlan_redirect -d eth0 -v -3
+ *
+ * It is possible to do map editing with bpftool, but note that all
+ * the bytes of both the key and the value must be specified:
+ *	# bpftool map update pinned /sys/fs/bpf/xdp_vlan_redirect_vlan_map \
+ *		  key 3 0 0 0 value 0 7 0 0 0 0 0 0
+ *
+ * Removing the feature is similar to install:
+ *	xdp_vlan_redirect --dev eth0 --remove
+ *
+ */
+
+#include <errno.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <unistd.h>
+#include <locale.h>
+#include <sys/resource.h>
+#include <getopt.h>
+#include <net/if.h>
+
+#include <linux/if_link.h>
+
+#include <bpf/bpf.h>
+#include "bpf_load.h"
+#include "bpf_util.h"
+
+#define MAX_CPUS  64          /* WARNING - sync with _kern.c */
+#define UNDEF_CPU 0xff000000  /* WARNING - sync with _kern.c */
+
+/* counters in the counter_map */
+#define VRC_CALLS  0    /* number of calls to this program */
+#define VRC_VLANS  1    /* number of vlan packets seen */
+#define VRC_HITS   2    /* number of redirects attempted */
+#define CPU_COUNT  3    /* number of CPUs found */
+
+static const struct option long_options[] = {
+	{"help",	no_argument,		NULL, 'h' },
+	{"dev",		required_argument,	NULL, 'd' },
+	{"cpu",		required_argument,	NULL, 'c' },
+	{"vlan",	required_argument,	NULL, 'v' },
+	{"install",	no_argument,		NULL, 'i' },
+	{"remove",	no_argument,		NULL, 'r' },
+	{0, 0, NULL,  0 }
+};
+
+static void usage(char *argv[])
+{
+	int i;
+
+	printf("%s - CPU targeting for vlan processing\n", argv[0]);
+	printf("\n");
+	printf(" Usage: %s (options-see-below)\n", argv[0]);
+	printf(" Listing options:\n");
+	for (i = 0; long_options[i].name != 0; i++) {
+		printf(" --%-12s", long_options[i].name);
+		if (long_options[i].flag != NULL)
+			printf(" flag (internal value:%d)",
+				*long_options[i].flag);
+		else
+			printf(" short-option: -%c",
+				long_options[i].val);
+		printf("\n");
+	}
+	printf("\n");
+}
+
+int main(int argc, char **argv)
+{
+	struct rlimit r = {10 * 1024 * 1024, RLIM_INFINITY};
+	const char *pinpath = "/sys/fs/bpf/";
+	char filename[64] = { 0 };
+	char pin_prog_name[sizeof(filename) + sizeof(pinpath)] = { 0 };
+	char pin_vlanmap_name[sizeof(filename) + sizeof(pinpath)] = { 0 };
+	char pin_countermap_name[sizeof(filename) + sizeof(pinpath)] = { 0 };
+	const char *kern_suffix = "_kern.o";
+	char ifname[IF_NAMESIZE] = { 0 };
+	bool install = false;
+	bool remove = false;
+	int cpu = MAX_CPUS;
+	int longindex = 0;
+	int ifindex = -1;
+	int vfd, cfd;
+	int vlan = 0;
+	__u64 v64;
+	__u64 c64;
+	int ret;
+	int opt;
+
+	if (setrlimit(RLIMIT_MEMLOCK, &r)) {
+		perror("setrlimit(RLIMIT_MEMLOCK)");
+		return 1;
+	}
+
+	/* Parse command line args */
+	while ((opt = getopt_long(argc, argv, "d:c:v:ri",
+				  long_options, &longindex)) != -1) {
+		switch (opt) {
+		case 'd':
+			strncpy(ifname, optarg, sizeof(ifname)-1);
+			ifindex = if_nametoindex(ifname);
+			if (ifindex == 0) {
+				fprintf(stderr, "ERR: device name '%s' : %s\n",
+					ifname, strerror(errno));
+				goto error;
+			}
+			break;
+		case 'c':
+			cpu = strtoul(optarg, NULL, 0);
+			if (cpu >= MAX_CPUS) {
+				fprintf(stderr, "ERR: invalid cpu id '%d'\n", cpu);
+				goto error;
+			}
+			break;
+		case 'v':
+			vlan = atoi(optarg);
+			if (!vlan || vlan >= 4096) {
+				fprintf(stderr, "ERR: invalid vlan id '%d'\n", vlan);
+				goto error;
+			}
+			break;
+		case 'i':
+			install = true;
+			break;
+		case 'r':
+			remove = true;
+			break;
+		case 'h':
+		default:
+			goto error;
+		}
+	}
+
+	/* Required options */
+	if (ifindex == -1) {
+		fprintf(stderr, "ERR: required option --dev missing\n");
+		goto error;
+	}
+
+	if (install && remove) {
+		fprintf(stderr, "ERR: pick only one of install or remove\n");
+		goto error;
+	}
+
+	if ((install || remove) && (vlan || cpu != MAX_CPUS)) {
+		fprintf(stderr, "ERR: pick either (install or remove) or vlan and cpu\n");
+		goto error;
+	}
+
+	if (strlen(argv[0]) > (sizeof(filename) - sizeof(kern_suffix))) {
+		fprintf(stderr, "filename %s too long\n", argv[0]);
+		return -1;
+	}
+
+	snprintf(filename, sizeof(filename), "%s%s", argv[0], kern_suffix);
+	snprintf(pin_prog_name, sizeof(pin_prog_name), "%s%s", pinpath, argv[0]);
+	snprintf(pin_vlanmap_name, sizeof(pin_vlanmap_name), "%s%s_vlan_map", pinpath, argv[0]);
+	snprintf(pin_countermap_name, sizeof(pin_countermap_name), "%s%s_counter_map", pinpath, argv[0]);
+
+	if (install) {
+
+		/* check to see if already installed */
+		errno = 0;
+		access(pin_prog_name, R_OK);
+		if (errno != ENOENT) {
+			fprintf(stderr, "ERR: %s is already installed\n", argv[0]);
+			return -1;
+		}
+
+		/* load the XDP program and maps with the convenient library */
+		if (load_bpf_file(filename)) {
+			fprintf(stderr, "ERR: load_bpf_file(%s): \n%s",
+				filename, bpf_log_buf);
+			return -1;
+		}
+		if (!prog_fd[0]) {
+			fprintf(stderr, "ERR: load_bpf_file(%s): %d %s\n",
+				filename, errno, strerror(errno));
+			return -1;
+		}
+
+		/* pin the XDP program and maps */
+		if (bpf_obj_pin(prog_fd[0], pin_prog_name) < 0) {
+			fprintf(stderr, "ERR: bpf_obj_pin(%s): %d %s\n",
+				pin_prog_name, errno, strerror(errno));
+			if (errno == 2)
+				fprintf(stderr, "     (is the BPF fs mounted on /sys/fs/bpf?)\n");
+			return -1;
+		}
+		if (bpf_obj_pin(map_fd[0], pin_vlanmap_name) < 0) {
+			fprintf(stderr, "ERR: bpf_obj_pin(%s): %d %s\n",
+				pin_vlanmap_name, errno, strerror(errno));
+			return -1;
+		}
+		if (bpf_obj_pin(map_fd[2], pin_countermap_name) < 0) {
+			fprintf(stderr, "ERR: bpf_obj_pin(%s): %d %s\n",
+				pin_countermap_name, errno, strerror(errno));
+			return -1;
+		}
+
+		/* prep the vlan map with "not used" values */
+		c64 = UNDEF_CPU;
+		for (v64 = 0; v64 < 4096; v64++) {
+			if (bpf_map_update_elem(map_fd[0], &v64, &c64, 0)) {
+				fprintf(stderr, "ERR: preping vlan map failed on v=%llu: %d %s\n",
+					v64, errno, strerror(errno));
+				return -1;
+			}
+		}
+
+		/* prep the cpumap with queue sizes */
+		c64 = 128+64;  /* see note in xdp_redirect_cpu_user.c */
+		for (v64 = 0; v64 < MAX_CPUS; v64++) {
+			if (bpf_map_update_elem(map_fd[1], &v64, &c64, 0)) {
+				if (errno == ENODEV) {
+					/* Save the last CPU number attempted
+					 * into the counters map
+					 */
+					c64 = CPU_COUNT;
+					ret = bpf_map_update_elem(map_fd[2], &c64, &v64, 0);
+					break;
+				}
+
+				fprintf(stderr, "ERR: preping cpu map failed on v=%llu: %d %s\n",
+					v64, errno, strerror(errno));
+				return -1;
+			}
+		}
+
+		/* wire the XDP program to the device */
+		if (bpf_set_link_xdp_fd(ifindex, prog_fd[0], 0) < 0) {
+			fprintf(stderr, "ERR: bpf_set_link_xdp_fd(): %d %s\n",
+				errno, strerror(errno));
+			return -1;
+		}
+
+		return 0;
+	}
+
+	if (remove) {
+
+		/* unlink the program from the device */
+		if (bpf_set_link_xdp_fd(ifindex, -1, 0) < 0)
+			fprintf(stderr, "ERR: bpf_set_link_xdp_fd(): %d %s\n",
+				errno, strerror(errno));
+
+		/* unlink pinned files */
+		if (unlink(pin_prog_name))
+			fprintf(stderr, "ERR: unlink(%s): %d %s\n",
+				pin_prog_name, errno, strerror(errno));
+		if (unlink(pin_vlanmap_name))
+			fprintf(stderr, "ERR: unlink(%s): %d %s\n",
+				pin_vlanmap_name, errno, strerror(errno));
+		if (unlink(pin_countermap_name))
+			fprintf(stderr, "ERR: unlink(%s): %d %s\n",
+				pin_countermap_name, errno, strerror(errno));
+
+		return 0;
+	}
+
+	if (vlan == 0) {
+		fprintf(stderr, "ERR: required option --vlan missing\n");
+		goto error;
+	}
+
+	if (cpu == MAX_CPUS && vlan > 0) {
+		fprintf(stderr, "ERR: required option --cpu missing\n");
+		goto error;
+	}
+
+	vfd = bpf_obj_get(pin_vlanmap_name);
+	if (vfd < 0) {
+		fprintf(stderr, "ERR: can't find pinned map %s: %d %s\n",
+			pin_vlanmap_name, errno, strerror(errno));
+		if (errno == ENOENT)
+			fprintf(stderr, "   (has %s been installed yet?)\n", argv[0]);
+		return -1;
+	}
+
+	/* decode the requested action */
+	if (vlan > 0) {
+		/* check cpu against the max value found */
+		cfd = bpf_obj_get(pin_countermap_name);
+		if (cfd < 0) {
+			fprintf(stderr, "ERR: can't find pinned map %s: %d %s\n",
+				pin_countermap_name, errno, strerror(errno));
+			return -1;
+		}
+		c64 = CPU_COUNT;
+		ret = bpf_map_lookup_elem(cfd, &c64, &v64);
+		if (cpu >= v64) {
+			fprintf(stderr, "ERR: cpu %d greater than max %llu\n", cpu, v64);
+			return -1;
+		}
+
+		/* Note that the value and key pointers really do need to be
+		 * pointers to 64-bit values, else things get a bit muddled.
+		 */
+		v64 = vlan;
+		c64 = cpu;
+		ret = bpf_map_update_elem(vfd, &v64, &c64, 0);
+		if (ret) {
+			fprintf(stderr, "Adding vlan %d CPU %d failed: %d %s\n",
+				vlan, cpu, errno, strerror(errno));
+			return -1;
+		}
+
+	} else {
+		v64 = -vlan;
+		c64 = UNDEF_CPU;
+
+		/* We can't actually delete from a TYPE_ARRAY map, so we
+		 * simply set it to an undefined value.
+		 */
+		ret = bpf_map_update_elem(vfd, &v64, &c64, 0);
+		if (ret) {
+			fprintf(stderr, "Delete of vlan %llu failed: %d %s\n",
+				v64, errno, strerror(errno));
+			return -1;
+		}
+	}
+
+	return 0;
+
+error:
+	usage(argv);
+	return -1;
+}
-- 
2.7.4

^ permalink raw reply related

* [PATCH net] vhost: Fix Spectre V1 vulnerability
From: Jason Wang @ 2018-10-30  6:10 UTC (permalink / raw)
  To: mst, jasowang
  Cc: kvm, virtualization, netdev, linux-kernel, Josh Poimboeuf,
	Andrea Arcangeli

The idx in vhost_vring_ioctl() was controlled by userspace, hence a
potential exploitation of the Spectre variant 1 vulnerability.

Fixing this by sanitizing idx before using it to index d->vqs.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index f52008bb8df7..3a5f81a66d34 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -30,6 +30,7 @@
 #include <linux/sched/mm.h>
 #include <linux/sched/signal.h>
 #include <linux/interval_tree_generic.h>
+#include <linux/nospec.h>
 
 #include "vhost.h"
 
@@ -1387,6 +1388,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
 	if (idx >= d->nvqs)
 		return -ENOBUFS;
 
+	idx = array_index_nospec(idx, d->nvqs);
 	vq = d->vqs[idx];
 
 	mutex_lock(&vq->mutex);
-- 
2.17.1

^ permalink raw reply related

* RE: [PATCH net] net: ethernet: cadence: fix socket buffer corruption problem
From: Tristram.Ha @ 2018-10-29 21:40 UTC (permalink / raw)
  To: Claudiu.Beznea; +Cc: UNGLinuxDriver, netdev, davem, Nicolas.Ferre
In-Reply-To: <aeaa7533-f8b7-8175-cdd0-d0a6e44a6a77@microchip.com>

> Could you, please, tell me if the above variable was false in your case?
> 
> bool cloned = skb_cloned(*skb) || skb_header_cloned(*skb);
> 
> If yes, then, the proper fix would be as follows:
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c
> b/drivers/net/ethernet/cadence/macb_main.c
> index 8f5bf9166c11..492a8e1a34cd 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -1690,7 +1690,7 @@ static int macb_pad_and_fcs(struct sk_buff **skb,
> struct net_device *ndev)
>                 padlen += ETH_FCS_LEN;
>         }
> 
> -       if (!cloned && headroom + tailroom >= padlen) {
> +       if (!cloned && headroom + tailroom >= ETH_FCS_LEN) {
>                 (*skb)->data = memmove((*skb)->head, (*skb)->data, (*skb)->len);
>                 skb_set_tail_pointer(*skb, (*skb)->len);
>         } else {
> 
> Could you please check if it works in your case (and without your patch)?
> 

Actually doing that reveals another bug:

	if (padlen) {
		if (padlen >= ETH_FCS_LEN)
			skb_put_zero(*skb, padlen - ETH_FCS_LEN);
		else
			skb_trim(*skb, ETH_FCS_LEN - padlen);
	}

My fix calls skb_put_zero with zero length.  Your change calls skb_trim which
actually sets the socket buffer length to 1!

When this problem happens headroom is 2, tailroom is 1, skb->len is 61, and
padlen is 3.

DSA driver is being used.  That is why the length is already padded to 60 bytes
and 1-byte tail tag is added.

BTW, I am not sure while this macb_pad_and_fcs function was added.  Is it to
workaround some hardware bugs?  The code is executed only when
NETIF_IF_HW_CSUM is used.  But if hardware tx checksumming is enabled why
not also use the hardware to calculate CRC?

NETIF_F_SG is not enabled in the MAC I used, so enabling NETIF_IF_HW_CSUM
is rather pointless.  With the padding code the transmit throughput cannot get
higher than 100Mbps in a gigabit connection.

I would recommend to add this option to disable manual padding in one of those
macb_config structures.


^ permalink raw reply

* RE: bnx2: rx_fw_discards: BCM5716 sporadically drops packets when update to driver version 2.2.6
From: Mody, Rasesh @ 2018-10-30  6:47 UTC (permalink / raw)
  To: maowenan, netdev@vger.kernel.org, f.fainelli@gmail.com,
	andrew@lunn.ch, linux-kernel@vger.kernel.org
  Cc: Rahman, Ameen
In-Reply-To: <1a94796f-1760-f332-46f0-6ab0b5a0aab7@huawei.com>

>From: maowenan <maowenan@huawei.com>
>Sent: Thursday, October 25, 2018 8:16 PM
>
>Hi,
>
>After I update version of bnx2 driver from 2.2.1 to 2.2.6, I find BCM5716
>sporadically drops packets, which shows in rx_fw_discards.
>C36-141-5:~ #  ethtool -S NIC0
>
>NIC statistics:
>     rx_ucast_packets: 11902
>
>     rx_mcast_packets: 217
>
>     rx_bcast_packets: 4954320
>
>     rx_filtered_packets: 328793
>
>     rx_fw_discards: 2742
>
>C36-141-5:~ #
>
>5s later:
>
>C36-141-5:~ #  ethtool -S NIC0
>
>NIC statistics:
>     rx_ucast_packets: 11910
>
>     rx_mcast_packets: 217
>
>     rx_bcast_packets: 4958117
>
>     rx_filtered_packets: 328897
>
>     rx_fw_discards: 2750
>
>C36-141-5:~ #
>
>so rx_fw_discards: 2742-----> rx_fw_discards: 2750, lost 8 packets.
>
>the information of bnx2
>C36-141-5:~ # modinfo bnx2
>kernel/drivers/net/ethernet/broadcom/bnx2.ko
>
>firmware:       bnx2/bnx2-rv2p-09ax-6.0.17.fw
>
>firmware:       bnx2/bnx2-rv2p-09-6.0.17.fw
>
>firmware:       bnx2/bnx2-mips-09-6.2.1b.fw
>
>firmware:       bnx2/bnx2-rv2p-06-6.0.15.fw
>
>firmware:       bnx2/bnx2-mips-06-6.2.3.fw
>version:        2.2.6
>
>
>1) Firstly, I check the patches from 2.2.1 to 2.2.6, below patch is interesting.
>https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id
>=0021850d0417a4dc38ed871d929b651b87e2ead9
>Do not enable filter SORT MODE in chip init routine. This patch addresses an
>issue where BCM5716 sporadically drops packets when changing multicast list.
>
>diff --git a/drivers/net/ethernet/broadcom/bnx2.c
>b/drivers/net/ethernet/broadcom/bnx2.c
>index 8957eb5f4478..8c9a8b7787d2 100644
>--- a/drivers/net/ethernet/broadcom/bnx2.c
>+++ b/drivers/net/ethernet/broadcom/bnx2.c
>@@ -4984,8 +4984,6 @@ bnx2_init_chip(struct bnx2 *bp)
>
>        bp->idle_chk_status_idx = 0xffff;
>
>-       bp->rx_mode = BNX2_EMAC_RX_MODE_SORT_MODE;
>-
>        /* Set up how to generate a link change interrupt. */
>        BNX2_WR(bp, BNX2_EMAC_ATTENTION_ENA,
>BNX2_EMAC_ATTENTION_ENA_LINK);
>
>
>2) Secondly, I revert this patch, after verify it, I find rx_fw_discards does not
>increasing.
>so I think this patch can fix current issue. But I'm not sure the issue of this
>patch to fix will be reproduced?
>I'm not convinced that what factor will trigger rx_fw_discards increasing?
>And how to fix this?

Can you please reword your point above? i.e. what is working and what is not. I am not sure if I understand it completely.

Is the rx_fw_disacard count incrementing with 2.2.6 upstream driver on BCM5716? What is the kernel version?
Which test is being run?

>
>Thanks a lot.
>
>
>
>
>
>
>
>
>
>
>


^ permalink raw reply

* [PATCH bpf] tools/bpf: add unlimited rlimit for flow_dissector_load
From: Yonghong Song @ 2018-10-29 21:56 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team

On our test machine, bpf selftest test_flow_dissector.sh failed
with the following error:
  # ./test_flow_dissector.sh
  bpffs not mounted. Mounting...
  libbpf: failed to create map (name: 'jmp_table'): Operation not permitted
  libbpf: failed to load object 'bpf_flow.o'
  ./flow_dissector_load: bpf_prog_load bpf_flow.o
  selftests: test_flow_dissector [FAILED]

Let us increase the rlimit to remove the above map
creation failure.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/testing/selftests/bpf/flow_dissector_load.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/bpf/flow_dissector_load.c b/tools/testing/selftests/bpf/flow_dissector_load.c
index d3273b5b3173..ae8180b11d5f 100644
--- a/tools/testing/selftests/bpf/flow_dissector_load.c
+++ b/tools/testing/selftests/bpf/flow_dissector_load.c
@@ -11,6 +11,8 @@
 #include <bpf/bpf.h>
 #include <bpf/libbpf.h>
 
+#include "bpf_rlimit.h"
+
 const char *cfg_pin_path = "/sys/fs/bpf/flow_dissector";
 const char *cfg_map_name = "jmp_table";
 bool cfg_attach = true;
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH] Fix ss Netid column and Local/Peer_Address
From: Stefano Brivio @ 2018-10-29 22:03 UTC (permalink / raw)
  To: Yoann P.; +Cc: netdev, Stephen Hemminger
In-Reply-To: <2356588.cRoqrM3dbm@yo-gs>

On Mon, 29 Oct 2018 21:06:35 +0100
"Yoann P." <yoann.p.public@gmail.com> wrote:

> > By the way, why do you use column(1), when ss already prints output in
> > columns? Any other issue you are working around?  
>
> column can hide columns with "-H -" and is a bit faster than awk to output a 
> single column according to time, it's the only reason I mentioned it.

Okay, but why do you need to hide some columns in the first place? I'm
wondering if your use case would justify adding options to print
selected columns only, in a generic way (right now, you can only
disable some).

Another possibility would be to rename "Local Address:" to "Local:" and
"Peer Address:" to "Peer:" -- in some cases (UNIX sockets) it's already
not so much of an address, more of a path, and "Address" doesn't really
add value when the field contains an address.

I don't like too much "Local_Address:" and "Peer_Address:" as the
output is supposed to be human-readable by default, and that underscore
just doesn't fit.

-- 
Stefano

^ permalink raw reply

* Re: [PATCH] Fix ss Netid column and Local/Peer_Address
From: Stefano Brivio @ 2018-10-29 22:03 UTC (permalink / raw)
  To: Yoann P.; +Cc: netdev, Stephen Hemminger
In-Reply-To: <1657380.VKiAY23QHg@yo-gs>

On Mon, 29 Oct 2018 21:07:47 +0100
"Yoann P." <yoann.p.public@gmail.com> wrote:

> > -                       printed = printf("%s", current_field->ldelim);
> > +                       printed = printf("%s", f->ldelim);
> >                 else
> >                         printed = 0;  
> 
> I can't reproduce the issue with this modification. :).

Thanks a lot for testing, and especially for reporting this! I'll
submit this as a patch in a moment.

-- 
Stefano

^ permalink raw reply

* [PATCH iproute] ss: Actually print left delimiter for columns
From: Stefano Brivio @ 2018-10-29 22:04 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Yoann P., netdev

While rendering columns, we use a local variable to keep track of the
field currently being printed, without touching current_field, which is
used for buffering.

Use the right pointer to access the left delimiter for the current column,
instead of always printing the left delimiter for the last buffered field,
which is usually an empty string.

This fixes an issue especially visible on narrow terminals, where some
columns might be displayed without separation.

Reported-by: YoyPa <yoann.p.public@gmail.com>
Fixes: 691bd854bf4a ("ss: Buffer raw fields first, then render them as a table")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: YoyPa <yoann.p.public@gmail.com>
---
 misc/ss.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/misc/ss.c b/misc/ss.c
index c8970438ce73..4d12fb5d19df 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -1260,7 +1260,7 @@ static void render(void)
 	while (token) {
 		/* Print left delimiter only if we already started a line */
 		if (line_started++)
-			printed = printf("%s", current_field->ldelim);
+			printed = printf("%s", f->ldelim);
 		else
 			printed = 0;
 
-- 
2.19.1

^ permalink raw reply related

* Re: [PATCH bpf] tools/bpf: add unlimited rlimit for flow_dissector_load
From: Song Liu @ 2018-10-29 22:09 UTC (permalink / raw)
  To: yhs; +Cc: ast, Daniel Borkmann, Networking, kernel-team
In-Reply-To: <20181029215648.3043831-1-yhs@fb.com>

On Mon, Oct 29, 2018 at 2:58 PM Yonghong Song <yhs@fb.com> wrote:
>
> On our test machine, bpf selftest test_flow_dissector.sh failed
> with the following error:
>   # ./test_flow_dissector.sh
>   bpffs not mounted. Mounting...
>   libbpf: failed to create map (name: 'jmp_table'): Operation not permitted
>   libbpf: failed to load object 'bpf_flow.o'
>   ./flow_dissector_load: bpf_prog_load bpf_flow.o
>   selftests: test_flow_dissector [FAILED]
>
> Let us increase the rlimit to remove the above map
> creation failure.
>
> Signed-off-by: Yonghong Song <yhs@fb.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  tools/testing/selftests/bpf/flow_dissector_load.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/tools/testing/selftests/bpf/flow_dissector_load.c b/tools/testing/selftests/bpf/flow_dissector_load.c
> index d3273b5b3173..ae8180b11d5f 100644
> --- a/tools/testing/selftests/bpf/flow_dissector_load.c
> +++ b/tools/testing/selftests/bpf/flow_dissector_load.c
> @@ -11,6 +11,8 @@
>  #include <bpf/bpf.h>
>  #include <bpf/libbpf.h>
>
> +#include "bpf_rlimit.h"
> +
>  const char *cfg_pin_path = "/sys/fs/bpf/flow_dissector";
>  const char *cfg_map_name = "jmp_table";
>  bool cfg_attach = true;
> --
> 2.17.1
>

^ permalink raw reply

* Re: [PATCH bpf-next] xdp: sample code for redirecting vlan packets to specific cpus
From: John Fastabend @ 2018-10-29 22:11 UTC (permalink / raw)
  To: Shannon Nelson, ast, daniel
  Cc: netdev, eric.dumazet, silviu.smarandache, shannon.lee.nelson
In-Reply-To: <1540847973-3540-1-git-send-email-shannon.nelson@oracle.com>

On 10/29/2018 02:19 PM, Shannon Nelson wrote:
> This is an example of using XDP to redirect the processing of
> particular vlan packets to specific CPUs.  This is in response
> to comments received on a kernel patch put forth previously
> to do something similar using RPS.
>      https://www.spinics.net/lists/netdev/msg528210.html
>      [PATCH net-next] net: enable RPS on vlan devices
> 
> This XDP application watches for inbound vlan-tagged packets
> and redirects those packets to be processed on a specific CPU
> as configured in a BPF map.  The BPF map can be modified by
> this user program, which can also load and unload the kernel
> XDP code.
> 
> One example use is for supporting VMs where we can't control the
> OS being used: we'd like to separate the VM CPU processing from
> the host's CPUs as a way to help mitigate L1TF related issues.
> When running the VM's traffic on a vlan we can stick the host's
> Rx processing on one set of CPUs separate from the VM's CPUs.
> 
> This example currently uses a vlan key and cpu value in the
> BPF map, so only can do one CPU per vlan.  This could easily
> be modified to use a bitpattern of CPUs rather than a CPU id
> to allow multiple CPUs per vlan.

Great, so does this solve your use case then? At least on drivers
with XDP support?

> 
> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
> ---

Some really small and trivial nits below.

Acked-by: John Fastabend <john.fastabend@gmail.com>

[...]

> +	if (install) {
> +

new line probably not needed. 

> +		/* check to see if already installed */
> +		errno = 0;
> +		access(pin_prog_name, R_OK);
> +		if (errno != ENOENT) {
> +			fprintf(stderr, "ERR: %s is already installed\n", argv[0]);
> +			return -1;
> +		}
> +
> +		/* load the XDP program and maps with the convenient library */
> +		if (load_bpf_file(filename)) {
> +			fprintf(stderr, "ERR: load_bpf_file(%s): \n%s",
> +				filename, bpf_log_buf);
> +			return -1;
> +		}
> +		if (!prog_fd[0]) {
> +			fprintf(stderr, "ERR: load_bpf_file(%s): %d %s\n",
> +				filename, errno, strerror(errno));
> +			return -1;
> +		}
> +
> +		/* pin the XDP program and maps */
> +		if (bpf_obj_pin(prog_fd[0], pin_prog_name) < 0) {
> +			fprintf(stderr, "ERR: bpf_obj_pin(%s): %d %s\n",
> +				pin_prog_name, errno, strerror(errno));
> +			if (errno == 2)
> +				fprintf(stderr, "     (is the BPF fs mounted on /sys/fs/bpf?)\n");
> +			return -1;
> +		}
> +		if (bpf_obj_pin(map_fd[0], pin_vlanmap_name) < 0) {
> +			fprintf(stderr, "ERR: bpf_obj_pin(%s): %d %s\n",
> +				pin_vlanmap_name, errno, strerror(errno));
> +			return -1;
> +		}
> +		if (bpf_obj_pin(map_fd[2], pin_countermap_name) < 0) {
> +			fprintf(stderr, "ERR: bpf_obj_pin(%s): %d %s\n",
> +				pin_countermap_name, errno, strerror(errno));
> +			return -1;
> +		}
> +
> +		/* prep the vlan map with "not used" values */
> +		c64 = UNDEF_CPU;
> +		for (v64 = 0; v64 < 4096; v64++) {

maybe #define MAX_VLANS 4096 just to avoid constants.

> +			if (bpf_map_update_elem(map_fd[0], &v64, &c64, 0)) {
> +				fprintf(stderr, "ERR: preping vlan map failed on v=%llu: %d %s\n",
> +					v64, errno, strerror(errno));
> +				return -1;
> +			}
> +		}
> +
> +		/* prep the cpumap with queue sizes */
> +		c64 = 128+64;  /* see note in xdp_redirect_cpu_user.c */
> +		for (v64 = 0; v64 < MAX_CPUS; v64++) {
> +			if (bpf_map_update_elem(map_fd[1], &v64, &c64, 0)) {
> +				if (errno == ENODEV) {
> +					/* Save the last CPU number attempted
> +					 * into the counters map
> +					 */
> +					c64 = CPU_COUNT;
> +					ret = bpf_map_update_elem(map_fd[2], &c64, &v64, 0);
> +					break;
> +				}
> +
> +				fprintf(stderr, "ERR: preping cpu map failed on v=%llu: %d %s\n",
> +					v64, errno, strerror(errno));
> +				return -1;
> +			}
> +		}
> +
> +		/* wire the XDP program to the device */
> +		if (bpf_set_link_xdp_fd(ifindex, prog_fd[0], 0) < 0) {
> +			fprintf(stderr, "ERR: bpf_set_link_xdp_fd(): %d %s\n",
> +				errno, strerror(errno));
> +			return -1;
> +		}
> +
> +		return 0;
> +	}
> +
> +	if (remove) {
> +
> +		/* unlink the program from the device */
> +		if (bpf_set_link_xdp_fd(ifindex, -1, 0) < 0)
> +			fprintf(stderr, "ERR: bpf_set_link_xdp_fd(): %d %s\n",
> +				errno, strerror(errno));
> +
> +		/* unlink pinned files */
> +		if (unlink(pin_prog_name))
> +			fprintf(stderr, "ERR: unlink(%s): %d %s\n",
> +				pin_prog_name, errno, strerror(errno));
> +		if (unlink(pin_vlanmap_name))
> +			fprintf(stderr, "ERR: unlink(%s): %d %s\n",
> +				pin_vlanmap_name, errno, strerror(errno));
> +		if (unlink(pin_countermap_name))
> +			fprintf(stderr, "ERR: unlink(%s): %d %s\n",
> +				pin_countermap_name, errno, strerror(errno));
> +
> +		return 0;
> +	}
> +
> +	if (vlan == 0) {
> +		fprintf(stderr, "ERR: required option --vlan missing\n");
> +		goto error;
> +	}
> +
> +	if (cpu == MAX_CPUS && vlan > 0) {
> +		fprintf(stderr, "ERR: required option --cpu missing\n");
> +		goto error;
> +	}
> +
> +	vfd = bpf_obj_get(pin_vlanmap_name);
> +	if (vfd < 0) {
> +		fprintf(stderr, "ERR: can't find pinned map %s: %d %s\n",
> +			pin_vlanmap_name, errno, strerror(errno));
> +		if (errno == ENOENT)
> +			fprintf(stderr, "   (has %s been installed yet?)\n", argv[0]);
> +		return -1;
> +	}
> +
> +	/* decode the requested action */
> +	if (vlan > 0) {
> +		/* check cpu against the max value found */
> +		cfd = bpf_obj_get(pin_countermap_name);
> +		if (cfd < 0) {
> +			fprintf(stderr, "ERR: can't find pinned map %s: %d %s\n",
> +				pin_countermap_name, errno, strerror(errno));
> +			return -1;
> +		}
> +		c64 = CPU_COUNT;
> +		ret = bpf_map_lookup_elem(cfd, &c64, &v64);
> +		if (cpu >= v64) {

Need to check ret code?

> +			fprintf(stderr, "ERR: cpu %d greater than max %llu\n", cpu, v64);
> +			return -1;
> +		}
> +
> +		/* Note that the value and key pointers really do need to be
> +		 * pointers to 64-bit values, else things get a bit muddled.
> +		 */
> +		v64 = vlan;
> +		c64 = cpu;
> +		ret = bpf_map_update_elem(vfd, &v64, &c64, 0);
> +		if (ret) {
> +			fprintf(stderr, "Adding vlan %d CPU %d failed: %d %s\n",
> +				vlan, cpu, errno, strerror(errno));
> +			return -1;
> +		}
> +
> +	} else {
> +		v64 = -vlan;
> +		c64 = UNDEF_CPU;
> +
> +		/* We can't actually delete from a TYPE_ARRAY map, so we
> +		 * simply set it to an undefined value.
> +		 */
> +		ret = bpf_map_update_elem(vfd, &v64, &c64, 0);
> +		if (ret) {
> +			fprintf(stderr, "Delete of vlan %llu failed: %d %s\n",
> +				v64, errno, strerror(errno));
> +			return -1;
> +		}
> +	}
> +
> +	return 0;
> +
> +error:
> +	usage(argv);
> +	return -1;
> +}
> 

^ permalink raw reply

* Re: [PATCH bpf-next] bpf_load: add map name to load_maps error message
From: John Fastabend @ 2018-10-29 22:11 UTC (permalink / raw)
  To: Shannon Nelson, ast, daniel; +Cc: netdev, shannon.lee.nelson
In-Reply-To: <1540847681-3273-1-git-send-email-shannon.nelson@oracle.com>

On 10/29/2018 02:14 PM, Shannon Nelson wrote:
> To help when debugging bpf/xdp load issues, have the load_map()
> error message include the number and name of the map that
> failed.
> 
> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
> ---
>  samples/bpf/bpf_load.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
> index 89161c9..5de0357 100644
> --- a/samples/bpf/bpf_load.c
> +++ b/samples/bpf/bpf_load.c
> @@ -282,8 +282,8 @@ static int load_maps(struct bpf_map_data *maps, int nr_maps,
>  							numa_node);
>  		}
>  		if (map_fd[i] < 0) {
> -			printf("failed to create a map: %d %s\n",
> -			       errno, strerror(errno));
> +			printf("failed to create map %d (%s): %d %s\n",
> +			       i, maps[i].name, errno, strerror(errno));
>  			return 1;
>  		}
>  		maps[i].fd = map_fd[i];
> 

LGTM

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply

* Re: [PATCH] bpf: tcp_bpf_recvmsg should return EAGAIN when nonblocking and no data
From: Song Liu @ 2018-10-29 22:13 UTC (permalink / raw)
  To: John Fastabend; +Cc: Daniel Borkmann, Alexei Starovoitov, Networking
In-Reply-To: <789e7f30-f567-37d3-6cd2-6d9baaa662e8@gmail.com>

On Mon, Oct 29, 2018 at 1:32 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> On 10/29/2018 12:31 PM, John Fastabend wrote:
> > We return 0 in the case of a nonblocking socket that has no data
> > available. However, this is incorrect and may confuse applications.
> > After this patch we do the correct thing and return the error
> > EAGAIN.
> >
> > Quoting return codes from recvmsg manpage,
> >
> > EAGAIN or EWOULDBLOCK
> >  The socket is marked nonblocking and the receive operation would
> >  block, or a receive timeout had been set and the timeout expired
> >  before data was received.
> >
> > Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Song Liu <songliubraving@fb.com>

> > ---
>
> Add fixes tag.
>
> Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
>
>
>
>

^ permalink raw reply

* Re: [PATCH] Fix ss Netid column and Local/Peer_Address
From: Yoann P. @ 2018-10-29 22:20 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: netdev, Stephen Hemminger
In-Reply-To: <20181029230307.12919fb6@redhat.com>

Le lundi 29 octobre 2018, 23:03:07 CET Stefano Brivio a écrit :
> On Mon, 29 Oct 2018 21:06:35 +0100
> 
> "Yoann P." <yoann.p.public@gmail.com> wrote:
> > > By the way, why do you use column(1), when ss already prints output in
> > > columns? Any other issue you are working around?
> > 
> > column can hide columns with "-H -" and is a bit faster than awk to output
> > a single column according to time, it's the only reason I mentioned it.
> Okay, but why do you need to hide some columns in the first place? I'm
> wondering if your use case would justify adding options to print
> selected columns only, in a generic way (right now, you can only
> disable some).
> 
> Another possibility would be to rename "Local Address:" to "Local:" and
> "Peer Address:" to "Peer:" -- in some cases (UNIX sockets) it's already
> not so much of an address, more of a path, and "Address" doesn't really
> add value when the field contains an address.
> 
> I don't like too much "Local_Address:" and "Peer_Address:" as the
> output is supposed to be human-readable by default, and that underscore
> just doesn't fit.
I send the peer address column to geoiplookup (currently changing to 
mmdblookup as geoip database is replaced by geolite2) to recover Country, 
Asnum and ASname of peers.

^ permalink raw reply

* Re: [PATCH 1/2] net:stmmac: dwmac-mediatek: add support for mt2712
From: biao huang @ 2018-10-30  7:11 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: davem, robh+dt, honghui.zhang, yt.shen, liguo.zhang, mark.rutland,
	sean.wang, nelson.chang, matthias.bgg, netdev, devicetree,
	linux-kernel, linux-arm-kernel, linux-mediatek, joabreu
In-Reply-To: <20181029122731.GA9174@lunn.ch>

Thanks for your kindly comments.
On Mon, 2018-10-29 at 13:27 +0100, Andrew Lunn wrote:
> > +static int mt2712_config_dt(struct mediatek_dwmac_plat_data *plat)
> > +{
> > +	u32 mac_timings[4];
> > +
> > +	plat->peri_regmap = syscon_regmap_lookup_by_compatible("mediatek,mt2712-pericfg");
> > +	if (IS_ERR(plat->peri_regmap)) {
> > +		dev_err(plat->dev, "Failed to get pericfg syscon\n");
> > +		return PTR_ERR(plat->peri_regmap);
> > +	}
> > +
> > +	if (!of_property_read_u32_array(plat->np, "mac-delay", mac_timings,
> > +					ARRAY_SIZE(mac_timings))) {
> > +		plat->mac_delay.tx_delay = mac_timings[0];
> > +		plat->mac_delay.rx_delay = mac_timings[1];
> > +		plat->mac_delay.tx_inv = mac_timings[2];
> > +		plat->mac_delay.rx_inv = mac_timings[3];
> > +	}
> > +
> > +	plat->fine_tune = of_property_read_bool(plat->np, "fine-tune");
> > +
> > +	plat->rmii_rxc = of_property_read_bool(plat->np, "rmii-rxc");
> 
> Please add document for these properties in the binding.
ok, I forgot these properties.
> 
> Ideally, you should reuse the binding that some of the other stmmac
> glue layer uses. e.g. there is already allwinner,tx-delay-ps,
> allwinner,rx-delay-ps.
take it, seems that tx-delay/rx-delay will be more readable than
mac-delay in dts. will be changed in next patch.
> 
>        Thanks
>               Andrew

^ permalink raw reply

* Re: [PATCH 1/2] net:stmmac: dwmac-mediatek: add support for mt2712
From: biao huang @ 2018-10-30  7:16 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: davem, robh+dt, mark.rutland, devicetree, nelson.chang, andrew,
	netdev, sean.wang, liguo.zhang, linux-kernel, matthias.bgg,
	joabreu, linux-mediatek, honghui.zhang, yt.shen, linux-arm-kernel
In-Reply-To: <20181029100828.GA19103@Red>

Thanks for your nice comments.
On Mon, 2018-10-29 at 11:08 +0100, Corentin Labbe wrote:
> Hello
> I have some minor comments below
> 
> On Mon, Oct 29, 2018 at 11:04:53AM +0800, Biao Huang wrote:
> > Add Ethernet support for MediaTek SoCs from the mt2712 family
> > 
> > Signed-off-by: Biao Huang <biao.huang@mediatek.com>
> > ---
> >  drivers/net/ethernet/stmicro/stmmac/Kconfig        |    8 +
> >  drivers/net/ethernet/stmicro/stmmac/Makefile       |    1 +
> >  .../net/ethernet/stmicro/stmmac/dwmac-mediatek.c   |  364 ++++++++++++++++++++
> >  3 files changed, 373 insertions(+)
> >  create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c
> > 
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/Kconfig b/drivers/net/ethernet/stmicro/stmmac/Kconfig
> > index edf2036..76d779e 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/Kconfig
> > +++ b/drivers/net/ethernet/stmicro/stmmac/Kconfig
> > @@ -75,6 +75,14 @@ config DWMAC_LPC18XX
> >  	---help---
> >  	  Support for NXP LPC18xx/43xx DWMAC Ethernet.
> >  
> > +config DWMAC_MEDIATEK
> > +	tristate "MediaTek MT27xx GMAC support"
> > +	depends on OF
> 
> You should add something like && (COMPILE_TEST || ARCH_MEDIATEK)
ok, I'll add it in next patch.
> 
> [...]
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c
> > new file mode 100644
> > index 0000000..9ccf3a5
> > --- /dev/null
> > +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c
> > @@ -0,0 +1,364 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +//
> > +// Copyright (c) 2018 MediaTek Inc.
> 
> Only SPDX can use the // comment style, the rest should use /**/
will fix it.
> 
> 
> [...]
> > +static int mt2712_set_interface(struct mediatek_dwmac_plat_data *plat)
> > +{
> > +	int rmii_rxc = plat->rmii_rxc ? RMII_CLK_SRC_RXC : 0;
> > +	u32 intf_val = 0;
> > +
> > +	/* select phy interface in top control domain */
> > +	switch (plat->phy_mode) {
> > +	case PHY_INTERFACE_MODE_MII:
> > +		intf_val |= PHY_INTF_MII_GMII;
> > +		break;
> > +	case PHY_INTERFACE_MODE_RMII:
> > +		intf_val |= PHY_INTF_RMII;
> > +		intf_val |= rmii_rxc;
> > +		break;
> > +	case PHY_INTERFACE_MODE_RGMII:
> > +	case PHY_INTERFACE_MODE_RGMII_TXID:
> > +	case PHY_INTERFACE_MODE_RGMII_RXID:
> > +	case PHY_INTERFACE_MODE_RGMII_ID:
> > +		intf_val |= PHY_INTF_RGMII;
> > +		break;
> > +	default:
> > +		pr_err("phy interface not support\n");
> 
> I think you could use dev_err() instead.
> And I think it is better spelled "not supported".
yes, take it.
> 
> 
> [...]
> > +static int mediatek_dwmac_probe(struct platform_device *pdev)
> > +{
> > +	int ret = 0;
> > +	struct plat_stmmacenet_data *plat_dat;
> > +	struct stmmac_resources stmmac_res;
> > +	struct mediatek_dwmac_plat_data *priv_plat;
> > +
> > +	priv_plat = devm_kzalloc(&pdev->dev, sizeof(*priv_plat), GFP_KERNEL);
> > +	if (!priv_plat)
> > +		return -ENOMEM;
> > +
> > +	priv_plat->variant = of_device_get_match_data(&pdev->dev);
> > +	if (!priv_plat->variant) {
> > +		dev_err(&pdev->dev, "Missing dwmac-mediatek variant\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	priv_plat->dev = &pdev->dev;
> > +	priv_plat->np = pdev->dev.of_node;
> > +	priv_plat->phy_mode = of_get_phy_mode(priv_plat->np);
> > +
> > +	ret = mediatek_dwmac_config_dt(priv_plat);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = stmmac_get_platform_resources(pdev, &stmmac_res);
> > +	if (ret)
> > +		return ret;
> > +
> > +	plat_dat = stmmac_probe_config_dt(pdev, &stmmac_res.mac);
> > +	if (IS_ERR(plat_dat))
> > +		return PTR_ERR(plat_dat);
> > +
> > +	plat_dat->interface = priv_plat->phy_mode;
> > +	/* clk_csr_i = 250-300MHz & MDC = clk_csr_i/124 */
> > +	plat_dat->clk_csr = 5;
> > +	plat_dat->has_gmac4 = 1;
> > +	plat_dat->has_gmac = 0;
> > +	plat_dat->pmt = 0;
> > +	plat_dat->maxmtu = 1500;
> 
> ETH_DATA_LEN ?
how about getting maxmtu from device tree rather than assignment here?
> 
> Regards

^ permalink raw reply

* [PATCH iproute2] bpf: check map symbol type properly with newer llvm compiler
From: Yonghong Song @ 2018-10-29 22:32 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team

With llvm 7.0 or earlier, the map symbol type is STT_NOTYPE.
  -bash-4.4$ cat t.c
  __attribute__((section("maps"))) int g;
  -bash-4.4$ clang -target bpf -O2 -c t.c
  -bash-4.4$ readelf -s t.o

  Symbol table '.symtab' contains 2 entries:
     Num:    Value          Size Type    Bind   Vis      Ndx Name
       0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
       1: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    3 g
  -bash-4.4$

The following llvm commit enables BPF target to generate
proper symbol type and size.
  commit bf6ec206615b9718869d48b4e5400d0c6e3638dd
  Author: Yonghong Song <yhs@fb.com>
  Date:   Wed Sep 19 16:04:13 2018 +0000

      [bpf] Symbol sizes and types in object file

      Clang-compiled object files currently don't include the symbol sizes and
      types.  Some tools however need that information.  For example, ctfconvert
      uses that information to generate FreeBSD's CTF representation from ELF
      files.
      With this patch, symbol sizes and types are included in object files.

      Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
      Reported-by: Yutaro Hayakawa <yhayakawa3720@gmail.com>

Hence, for llvm 8.0.0 (currently trunk), symbol type will be not NOTYPE, but OBJECT.
  -bash-4.4$ clang -target bpf -O2 -c t.c
  -bash-4.4$ readelf -s t.o

  Symbol table '.symtab' contains 3 entries:
     Num:    Value          Size Type    Bind   Vis      Ndx Name
       0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
       1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS t.c
       2: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    3 g
-bash-4.4$

This patch makes sure bpf library accepts both NOTYPE and OBJECT types
of global map symbols.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 lib/bpf.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/lib/bpf.c b/lib/bpf.c
index d093d0bd..45f279fa 100644
--- a/lib/bpf.c
+++ b/lib/bpf.c
@@ -1758,11 +1758,13 @@ static const char *bpf_map_fetch_name(struct bpf_elf_ctx *ctx, int which)
 	int i;
 
 	for (i = 0; i < ctx->sym_num; i++) {
+		int type = GELF_ST_TYPE(sym.st_info);
+
 		if (gelf_getsym(ctx->sym_tab, i, &sym) != &sym)
 			continue;
 
 		if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL ||
-		    GELF_ST_TYPE(sym.st_info) != STT_NOTYPE ||
+		    (type != STT_NOTYPE && type != STT_OBJECT) ||
 		    sym.st_shndx != ctx->sec_maps ||
 		    sym.st_value / ctx->map_len != which)
 			continue;
@@ -1849,11 +1851,13 @@ static int bpf_map_num_sym(struct bpf_elf_ctx *ctx)
 	GElf_Sym sym;
 
 	for (i = 0; i < ctx->sym_num; i++) {
+		int type = GELF_ST_TYPE(sym.st_info);
+
 		if (gelf_getsym(ctx->sym_tab, i, &sym) != &sym)
 			continue;
 
 		if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL ||
-		    GELF_ST_TYPE(sym.st_info) != STT_NOTYPE ||
+		    (type != STT_NOTYPE && type != STT_OBJECT) ||
 		    sym.st_shndx != ctx->sec_maps)
 			continue;
 		num++;
@@ -1927,10 +1931,12 @@ static int bpf_map_verify_all_offs(struct bpf_elf_ctx *ctx, int end)
 		 * the table again.
 		 */
 		for (i = 0; i < ctx->sym_num; i++) {
+			int type = GELF_ST_TYPE(sym.st_info);
+
 			if (gelf_getsym(ctx->sym_tab, i, &sym) != &sym)
 				continue;
 			if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL ||
-			    GELF_ST_TYPE(sym.st_info) != STT_NOTYPE ||
+			    (type != STT_NOTYPE && type != STT_OBJECT) ||
 			    sym.st_shndx != ctx->sec_maps)
 				continue;
 			if (sym.st_value == off)
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH bpf-next] xdp: sample code for redirecting vlan packets to specific cpus
From: Shannon Nelson @ 2018-10-29 23:11 UTC (permalink / raw)
  To: John Fastabend, ast, daniel
  Cc: netdev, eric.dumazet, silviu.smarandache, shannon.lee.nelson
In-Reply-To: <499cc6e8-f533-3b90-b053-b3a1c7fa3a15@gmail.com>

On 10/29/2018 3:11 PM, John Fastabend wrote:
> On 10/29/2018 02:19 PM, Shannon Nelson wrote:
>> This is an example of using XDP to redirect the processing of
>> particular vlan packets to specific CPUs.  This is in response
>> to comments received on a kernel patch put forth previously
>> to do something similar using RPS.
>>       https://www.spinics.net/lists/netdev/msg528210.html
>>       [PATCH net-next] net: enable RPS on vlan devices
>>
>> This XDP application watches for inbound vlan-tagged packets
>> and redirects those packets to be processed on a specific CPU
>> as configured in a BPF map.  The BPF map can be modified by
>> this user program, which can also load and unload the kernel
>> XDP code.
>>
>> One example use is for supporting VMs where we can't control the
>> OS being used: we'd like to separate the VM CPU processing from
>> the host's CPUs as a way to help mitigate L1TF related issues.
>> When running the VM's traffic on a vlan we can stick the host's
>> Rx processing on one set of CPUs separate from the VM's CPUs.
>>
>> This example currently uses a vlan key and cpu value in the
>> BPF map, so only can do one CPU per vlan.  This could easily
>> be modified to use a bitpattern of CPUs rather than a CPU id
>> to allow multiple CPUs per vlan.
> 
> Great, so does this solve your use case then? At least on drivers
> with XDP support?

Well, more or less... the actual issue was a request for our UEK5 
distribution, based on v4.14, which doesn't have support for the CPU 
redirect.  Internal discussion continues.

> 
>>
>> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
>> ---
> 
> Some really small and trivial nits below.
> 
> Acked-by: John Fastabend <john.fastabend@gmail.com>
> 

Thanks,
sln

^ permalink raw reply

* [PATCH net 0/3] net: bql: better deal with GSO
From: Eric Dumazet @ 2018-10-29 23:25 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Willem de Bruijn, Tariq Toukan,
	Eric Dumazet

While BQL bulk dequeue works well for TSO packets, it is
not very efficient as soon as GSO is involved.

On a GSO only workload (UDP or TCP), this patch series
can save about 8 % of cpu cycles on a 40Gbit mlx4 NIC,
by keeping optimal batching, and avoiding expensive
qdisc requeues and reschedules.

This patch series :

- Add netdev_tx_sent_queue_more() so that drivers
  can implement efficient BQL and xmit_more support.

- Implement a work around in dev_hard_start_xmit()
  for drivers not using netdev_tx_sent_queue_more()

- changes mlx4 to use netdev_tx_sent_queue_more()

Eric Dumazet (3):
  net: bql: add netdev_tx_sent_queue_more() helper
  net: do not abort bulk send on BQL status
  net/mlx4_en: use netdev_tx_sent_queue_more()

 drivers/net/ethernet/mellanox/mlx4/en_tx.c | 10 ++++++++--
 include/linux/netdevice.h                  | 12 ++++++++++++
 net/core/dev.c                             |  2 +-
 3 files changed, 21 insertions(+), 3 deletions(-)

-- 
2.19.1.568.g152ad8e336-goog

^ permalink raw reply

* [PATCH net 1/3] net: bql: add netdev_tx_sent_queue_more() helper
From: Eric Dumazet @ 2018-10-29 23:25 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Willem de Bruijn, Tariq Toukan,
	Eric Dumazet
In-Reply-To: <20181029232539.217268-1-edumazet@google.com>

When qdisc_run() tries to use BQL budget to bulk-dequeue a batch
of packets, GSO can later transform this list in another list
of skbs, and each skb is sent to device ndo_start_xmit(),
one at a time, with skb->xmit_more being set to one but
for last skb.

Problem is that very often, BQL limit is hit in the middle of
the packet train, forcing dev_hard_start_xmit() to stop the
bulk send and requeue the end of the list.

BQL role is to avoid head of line blocking, making sure
a qdisc can deliver high priority packets before low priority ones.

But there is no way requeued packets can be bypassed by fresh
packets in the qdisc.

Aborting the bulk send increases TX softirqs, and hot cache
lines (after skb_segment()) are wasted.

Note that for TSO packets, we never split a packet in the middle
because of BQL limit being hit.

Drivers should be able to update BQL counters without
flipping/caring about BQL status, if the current skb
has xmit_more set.

Upper layers are ultimately responsible to stop sending another
packet train when BQL limit is hit.

Code template in a driver might look like the following :

	if (skb->xmit_more) {
		netdev_tx_sent_queue_more(tx_queue, nr_bytes);
		send_doorbell = netif_tx_queue_stopped(tx_queue);
	} else {
		netdev_tx_sent_queue(tx_queue, nr_bytes);
		send_doorbell = true;
	}

Note that netdev_tx_sent_queue_more() use is not mandatory,
since following patch will change dev_hard_start_xmit()
to not care about BQL status.

But it is higly recommended so that xmit_more full benefits
can be reached (less doorbells sent, and less atomic operations as well)

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index dc1d9ed33b3192e9406b17c3107b3235b28ff1b9..beb37232688f7e4a71c932e472454e94df18b865 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3166,6 +3166,18 @@ static inline void netdev_txq_bql_complete_prefetchw(struct netdev_queue *dev_qu
 #endif
 }
 
+/* Variant of netdev_tx_sent_queue() for packets with xmit_more.
+ * We do want to change __QUEUE_STATE_STACK_XOFF only for the last
+ * skb of a batch.
+ */
+static inline void netdev_tx_sent_queue_more(struct netdev_queue *dev_queue,
+					     unsigned int bytes)
+{
+#ifdef CONFIG_BQL
+	dql_queued(&dev_queue->dql, bytes);
+#endif
+}
+
 static inline void netdev_tx_sent_queue(struct netdev_queue *dev_queue,
 					unsigned int bytes)
 {
-- 
2.19.1.568.g152ad8e336-goog

^ permalink raw reply related

* [PATCH net 2/3] net: do not abort bulk send on BQL status
From: Eric Dumazet @ 2018-10-29 23:25 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Willem de Bruijn, Tariq Toukan,
	Eric Dumazet
In-Reply-To: <20181029232539.217268-1-edumazet@google.com>

Before calling dev_hard_start_xmit(), upper layers tried
to cook optimal skb list based on BQL budget.

Problem is that GSO packets can end up comsuming more than
the BQL budget.

Breaking the loop is not useful, since requeued packets
are ahead of any packets still in the qdisc.

It is also more expensive, since next TX completion will
push these packets later, while skbs are not in cpu caches.

It is also a behavior difference with TSO packets, that can
break the BQL limit by a large amount.

Note that drivers should use netdev_tx_sent_queue_more()
in order to have optimal xmit_more support, and avoid
useless atomic operations as shown in the following patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 77d43ae2a7bbe1267f8430d5c35637d1984f463c..0ffcbdd55fa9ee545c807f2ed3fc178830e3075a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3272,7 +3272,7 @@ struct sk_buff *dev_hard_start_xmit(struct sk_buff *first, struct net_device *de
 		}
 
 		skb = next;
-		if (netif_xmit_stopped(txq) && skb) {
+		if (netif_tx_queue_stopped(txq) && skb) {
 			rc = NETDEV_TX_BUSY;
 			break;
 		}
-- 
2.19.1.568.g152ad8e336-goog

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox