Netdev List
 help / color / mirror / Atom feed
* [README] Reiteration of statement regarding NetDev conference and NetDev Society
From: David Miller @ 2018-04-25 14:57 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel


As the Linux networking maintainer, I feel the need to reiterate my
statement of a month ago, and to provide some supporting facts on my
end so that there is no confusion in the community on these issues.

I am no longer associated with either the Netdev Society or their
event, the NetDev conference.

Neither is endorsed nor supported by me.

I am writing to provide clarity on this situation to the subscribers
of the 'netdev' mailing list and the community at large.

Specifically, I am writing to make sure there isn't any confusion
through the use of the name "NetDev" that I either continue to plan,
support or participate in this event.  I do not.

I have in the past participated in the NetDev events as a member of
technical committee and as the committee's chair.  I took on this role
because I felt that I could improve the technical content.

I spent many hours helping to plan and make preparations for Netdev
2.2 in Seoul.  In connection with this work, the Executive Director at
the time (Soyoung Park) and I were reimbursed for our airfare
($7321.44 USD) and lodging ($717.55 USD) as we prepared for the
conference during the weeks leading up to the event.

I was also a member of the NetDev Society's board but have formally
left the board and the technical committee, am no longer associated
with planning the NetDev conference and do not intend to participate
in future NetDev conference events.

Having said all of this, I look forward to continued great work on the
netdev project and sharing cool advances in networking at the
networking track of Linux Plumbers Conference.

Thank you.

^ permalink raw reply

* Re: [PATCH iproute2 v2] json_print: Fix hidden 64-bit type promotion
From: Toke Høiland-Jørgensen @ 2018-04-25 14:57 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Kevin Darbyshire-Bryant
In-Reply-To: <20180425075527.680f34fc@xeon-e3>

Stephen Hemminger <stephen@networkplumber.org> writes:

> On Wed, 25 Apr 2018 16:30:22 +0200
> Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>
>> print_uint() will silently promote its variable type to uint64_t, but there
>> is nothing that ensures that the format string specifier passed along with
>> it fits (and the function name suggest to pass "%u").
>> 
>> Fix this by changing print_uint() to use a native 'unsigned int' type, and
>> introduce a separate print_u64() function for printing 64-bit values. All
>> call sites that were actually printing 64-bit values using print_uint() are
>> converted to use print_u64() instead.
>> 
>> Since print_int() was already using native int types, just add a
>> print_s64() to match, but don't convert any call sites.
>> 
>> Cc: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>
> Yes, this makes sense. Maybe there should be a print_luint for
> consistency.

I just realised I missed a few call sites, so I'll resend. Should I
call the new function print_luint() instead of print_u64()?

> Also, I tried (in vain) to make a version that allows GCC to check the
> format string.  But it was a struggle and just gave up.

Yeah, no idea how to do this either...

-Toke

^ permalink raw reply

* Re: [PATCH v4 ipsec-next] xfrm: remove VLA usage in __xfrm6_sort()
From: Stefano Brivio @ 2018-04-25 14:58 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andreas Christoforou, kernel-hardening, Steffen Klassert,
	Herbert Xu, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	netdev, linux-kernel
In-Reply-To: <20180425144639.GA38350@beast>

On Wed, 25 Apr 2018 07:46:39 -0700
Kees Cook <keescook@chromium.org> wrote:

> In the quest to remove all stack VLA usage removed from the kernel[1],
> just use XFRM_MAX_DEPTH as already done for the "class" array. In one
> case, it'll do this loop up to 5, the other caller up to 6.
> 
> [1] https://lkml.org/lkml/2018/3/7/621
> 
> Co-developed-by: Andreas Christoforou <andreaschristofo@gmail.com>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
> v4:
> - actually remove memset(). :)
> v3:
> - adjust Subject and commit log (Steffen)
> - use "= { }" instead of memset() (Stefano)
> v2:
> - use XFRM_MAX_DEPTH for "count" array (Steffen and Mathias).
> ---

Acked-by: Stefano Brivio <sbrivio@redhat.com>

^ permalink raw reply

* Re: [PATCH] net: net_cls: remove a NULL check for css_cls_state
From: David Miller @ 2018-04-25 15:00 UTC (permalink / raw)
  To: roy.qing.li; +Cc: lirongqing, netdev
In-Reply-To: <CAJFZqHxzNFzg8hBMWc3DS3Nn1hdVxQ9cBhPSy4uL2b6ckCmbtg@mail.gmail.com>

From: Li RongQing <roy.qing.li@gmail.com>
Date: Wed, 25 Apr 2018 16:35:02 +0800

> On 4/20/18, David Miller <davem@davemloft.net> wrote:
>> From: Li RongQing <lirongqing@baidu.com>
>> Date: Thu, 19 Apr 2018 12:59:21 +0800
>>
>>> The input of css_cls_state() is impossible to NULL except
>>> cgrp_css_online, so simplify it
>>>
>>> Signed-off-by: Li RongQing <lirongqing@baidu.com>
>>
>> I don't view this as an improvement.  Just let the helper always check
>> NULL and that way there are less situations to audit.
>>
> css_cls_state maybe return NULL, but nearly no places check the return
> value with NULL, this seems unreadable.

I saw this email of your's the first few times, you don't need to post
it again.

I still disagree with your analysis, and I do not see your change as
an overall improvement.

Thank you.

^ permalink raw reply

* [PATCH bpf-next] selftests/bpf: bpf tunnel test.
From: William Tu @ 2018-04-25 15:01 UTC (permalink / raw)
  To: netdev

The patch migrates the original tests at samples/bpf/tcbpf2_kern.c
and samples/bpf/test_tunnel_bpf.sh to selftests.  There are a couple
changes from the original:
    1) add ipv6 vxlan, ipv6 geneve, ipv6 ipip tests
    2) simplify the original ipip tests (remove iperf tests)
    3) improve documentation
    4) use bpf_ntoh* and bpf_hton* api

In summary, 'test_tunnel_kern.o' contains the following bpf program:
  GRE: gre_set_tunnel, gre_get_tunnel
  IP6GRE: ip6gretap_set_tunnel, ip6gretap_get_tunnel
  ERSPAN: erspan_set_tunnel, erspan_get_tunnel
  IP6ERSPAN: ip4ip6erspan_set_tunnel, ip4ip6erspan_get_tunnel
  VXLAN: vxlan_set_tunnel, vxlan_get_tunnel
  IP6VXLAN: ip6vxlan_set_tunnel, ip6vxlan_get_tunnel
  GENEVE: geneve_set_tunnel, geneve_get_tunnel
  IP6GENEVE: ip6geneve_set_tunnel, ip6geneve_get_tunnel
  IPIP: ipip_set_tunnel, ipip_get_tunnel
  IP6IP: ipip6_set_tunnel, ipip6_get_tunnel,
         ip6ip6_set_tunnel, ip6ip6_get_tunnel

Signed-off-by: William Tu <u9012063@gmail.com>
---
 samples/bpf/Makefile                           |   1 -
 samples/bpf/tcbpf2_kern.c                      | 612 ----------------------
 samples/bpf/test_tunnel_bpf.sh                 | 390 --------------
 tools/testing/selftests/bpf/Makefile           |   5 +-
 tools/testing/selftests/bpf/test_tunnel.sh     | 651 +++++++++++++++++++++++
 tools/testing/selftests/bpf/test_tunnel_kern.c | 691 +++++++++++++++++++++++++
 6 files changed, 1345 insertions(+), 1005 deletions(-)
 delete mode 100644 samples/bpf/tcbpf2_kern.c
 delete mode 100755 samples/bpf/test_tunnel_bpf.sh
 create mode 100755 tools/testing/selftests/bpf/test_tunnel.sh
 create mode 100644 tools/testing/selftests/bpf/test_tunnel_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index aa8c392e2e52..b853581592fd 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -114,7 +114,6 @@ always += sock_flags_kern.o
 always += test_probe_write_user_kern.o
 always += trace_output_kern.o
 always += tcbpf1_kern.o
-always += tcbpf2_kern.o
 always += tc_l2_redirect_kern.o
 always += lathist_kern.o
 always += offwaketime_kern.o
diff --git a/samples/bpf/tcbpf2_kern.c b/samples/bpf/tcbpf2_kern.c
deleted file mode 100644
index fa260c750fb1..000000000000
--- a/samples/bpf/tcbpf2_kern.c
+++ /dev/null
@@ -1,612 +0,0 @@
-/* Copyright (c) 2016 VMware
- * Copyright (c) 2016 Facebook
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- */
-#define KBUILD_MODNAME "foo"
-#include <uapi/linux/bpf.h>
-#include <uapi/linux/if_ether.h>
-#include <uapi/linux/if_packet.h>
-#include <uapi/linux/ip.h>
-#include <uapi/linux/ipv6.h>
-#include <uapi/linux/in.h>
-#include <uapi/linux/tcp.h>
-#include <uapi/linux/filter.h>
-#include <uapi/linux/pkt_cls.h>
-#include <uapi/linux/erspan.h>
-#include <net/ipv6.h>
-#include "bpf_helpers.h"
-#include "bpf_endian.h"
-
-#define _htonl __builtin_bswap32
-#define ERROR(ret) do {\
-		char fmt[] = "ERROR line:%d ret:%d\n";\
-		bpf_trace_printk(fmt, sizeof(fmt), __LINE__, ret); \
-	} while(0)
-
-struct geneve_opt {
-	__be16	opt_class;
-	u8	type;
-	u8	length:5;
-	u8	r3:1;
-	u8	r2:1;
-	u8	r1:1;
-	u8	opt_data[8]; /* hard-coded to 8 byte */
-};
-
-struct vxlan_metadata {
-	u32     gbp;
-};
-
-SEC("gre_set_tunnel")
-int _gre_set_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-
-	__builtin_memset(&key, 0x0, sizeof(key));
-	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
-	key.tunnel_id = 2;
-	key.tunnel_tos = 0;
-	key.tunnel_ttl = 64;
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
-				     BPF_F_ZERO_CSUM_TX | BPF_F_SEQ_NUMBER);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("gre_get_tunnel")
-int _gre_get_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-	char fmt[] = "key %d remote ip 0x%x\n";
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt), key.tunnel_id, key.remote_ipv4);
-	return TC_ACT_OK;
-}
-
-SEC("ip6gretap_set_tunnel")
-int _ip6gretap_set_tunnel(struct __sk_buff *skb)
-{
-	struct bpf_tunnel_key key;
-	int ret;
-
-	__builtin_memset(&key, 0x0, sizeof(key));
-	key.remote_ipv6[3] = _htonl(0x11); /* ::11 */
-	key.tunnel_id = 2;
-	key.tunnel_tos = 0;
-	key.tunnel_ttl = 64;
-	key.tunnel_label = 0xabcde;
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
-				     BPF_F_TUNINFO_IPV6 | BPF_F_ZERO_CSUM_TX |
-				     BPF_F_SEQ_NUMBER);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("ip6gretap_get_tunnel")
-int _ip6gretap_get_tunnel(struct __sk_buff *skb)
-{
-	char fmt[] = "key %d remote ip6 ::%x label %x\n";
-	struct bpf_tunnel_key key;
-	int ret;
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
-				     BPF_F_TUNINFO_IPV6);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt),
-			 key.tunnel_id, key.remote_ipv6[3], key.tunnel_label);
-
-	return TC_ACT_OK;
-}
-
-SEC("erspan_set_tunnel")
-int _erspan_set_tunnel(struct __sk_buff *skb)
-{
-	struct bpf_tunnel_key key;
-	struct erspan_metadata md;
-	int ret;
-
-	__builtin_memset(&key, 0x0, sizeof(key));
-	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
-	key.tunnel_id = 2;
-	key.tunnel_tos = 0;
-	key.tunnel_ttl = 64;
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_ZERO_CSUM_TX);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	__builtin_memset(&md, 0, sizeof(md));
-#ifdef ERSPAN_V1
-	md.version = 1;
-	md.u.index = bpf_htonl(123);
-#else
-	u8 direction = 1;
-	u8 hwid = 7;
-
-	md.version = 2;
-	md.u.md2.dir = direction;
-	md.u.md2.hwid = hwid & 0xf;
-	md.u.md2.hwid_upper = (hwid >> 4) & 0x3;
-#endif
-
-	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("erspan_get_tunnel")
-int _erspan_get_tunnel(struct __sk_buff *skb)
-{
-	char fmt[] = "key %d remote ip 0x%x erspan version %d\n";
-	struct bpf_tunnel_key key;
-	struct erspan_metadata md;
-	u32 index;
-	int ret;
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt),
-			key.tunnel_id, key.remote_ipv4, md.version);
-
-#ifdef ERSPAN_V1
-	char fmt2[] = "\tindex %x\n";
-
-	index = bpf_ntohl(md.u.index);
-	bpf_trace_printk(fmt2, sizeof(fmt2), index);
-#else
-	char fmt2[] = "\tdirection %d hwid %x timestamp %u\n";
-
-	bpf_trace_printk(fmt2, sizeof(fmt2),
-			 md.u.md2.dir,
-			 (md.u.md2.hwid_upper << 4) + md.u.md2.hwid,
-			 bpf_ntohl(md.u.md2.timestamp));
-#endif
-
-	return TC_ACT_OK;
-}
-
-SEC("ip4ip6erspan_set_tunnel")
-int _ip4ip6erspan_set_tunnel(struct __sk_buff *skb)
-{
-	struct bpf_tunnel_key key;
-	struct erspan_metadata md;
-	int ret;
-
-	__builtin_memset(&key, 0x0, sizeof(key));
-	key.remote_ipv6[3] = _htonl(0x11);
-	key.tunnel_id = 2;
-	key.tunnel_tos = 0;
-	key.tunnel_ttl = 64;
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
-				     BPF_F_TUNINFO_IPV6);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	__builtin_memset(&md, 0, sizeof(md));
-
-#ifdef ERSPAN_V1
-	md.u.index = htonl(123);
-	md.version = 1;
-#else
-	u8 direction = 0;
-	u8 hwid = 17;
-
-	md.version = 2;
-	md.u.md2.dir = direction;
-	md.u.md2.hwid = hwid & 0xf;
-	md.u.md2.hwid_upper = (hwid >> 4) & 0x3;
-#endif
-
-	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("ip4ip6erspan_get_tunnel")
-int _ip4ip6erspan_get_tunnel(struct __sk_buff *skb)
-{
-	char fmt[] = "ip6erspan get key %d remote ip6 ::%x erspan version %d\n";
-	struct bpf_tunnel_key key;
-	struct erspan_metadata md;
-	u32 index;
-	int ret;
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt),
-			key.tunnel_id, key.remote_ipv4, md.version);
-
-#ifdef ERSPAN_V1
-	char fmt2[] = "\tindex %x\n";
-
-	index = bpf_ntohl(md.u.index);
-	bpf_trace_printk(fmt2, sizeof(fmt2), index);
-#else
-	char fmt2[] = "\tdirection %d hwid %x timestamp %u\n";
-
-	bpf_trace_printk(fmt2, sizeof(fmt2),
-			 md.u.md2.dir,
-			 (md.u.md2.hwid_upper << 4) + md.u.md2.hwid,
-			 bpf_ntohl(md.u.md2.timestamp));
-#endif
-
-	return TC_ACT_OK;
-}
-
-SEC("vxlan_set_tunnel")
-int _vxlan_set_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-	struct vxlan_metadata md;
-
-	__builtin_memset(&key, 0x0, sizeof(key));
-	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
-	key.tunnel_id = 2;
-	key.tunnel_tos = 0;
-	key.tunnel_ttl = 64;
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_ZERO_CSUM_TX);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	md.gbp = 0x800FF; /* Set VXLAN Group Policy extension */
-	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("vxlan_get_tunnel")
-int _vxlan_get_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-	struct vxlan_metadata md;
-	char fmt[] = "key %d remote ip 0x%x vxlan gbp 0x%x\n";
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt),
-			key.tunnel_id, key.remote_ipv4, md.gbp);
-
-	return TC_ACT_OK;
-}
-
-SEC("geneve_set_tunnel")
-int _geneve_set_tunnel(struct __sk_buff *skb)
-{
-	int ret, ret2;
-	struct bpf_tunnel_key key;
-	struct geneve_opt gopt;
-
-	__builtin_memset(&key, 0x0, sizeof(key));
-	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
-	key.tunnel_id = 2;
-	key.tunnel_tos = 0;
-	key.tunnel_ttl = 64;
-
-	__builtin_memset(&gopt, 0x0, sizeof(gopt));
-	gopt.opt_class = 0x102; /* Open Virtual Networking (OVN) */
-	gopt.type = 0x08;
-	gopt.r1 = 0;
-	gopt.r2 = 0;
-	gopt.r3 = 0;
-	gopt.length = 2; /* 4-byte multiple */
-	*(int *) &gopt.opt_data = 0xdeadbeef;
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_ZERO_CSUM_TX);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	ret = bpf_skb_set_tunnel_opt(skb, &gopt, sizeof(gopt));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("geneve_get_tunnel")
-int _geneve_get_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-	struct geneve_opt gopt;
-	char fmt[] = "key %d remote ip 0x%x geneve class 0x%x\n";
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	ret = bpf_skb_get_tunnel_opt(skb, &gopt, sizeof(gopt));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt),
-			key.tunnel_id, key.remote_ipv4, gopt.opt_class);
-	return TC_ACT_OK;
-}
-
-SEC("ipip_set_tunnel")
-int _ipip_set_tunnel(struct __sk_buff *skb)
-{
-	struct bpf_tunnel_key key = {};
-	void *data = (void *)(long)skb->data;
-	struct iphdr *iph = data;
-	struct tcphdr *tcp = data + sizeof(*iph);
-	void *data_end = (void *)(long)skb->data_end;
-	int ret;
-
-	/* single length check */
-	if (data + sizeof(*iph) + sizeof(*tcp) > data_end) {
-		ERROR(1);
-		return TC_ACT_SHOT;
-	}
-
-	key.tunnel_ttl = 64;
-	if (iph->protocol == IPPROTO_ICMP) {
-		key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
-	} else {
-		if (iph->protocol != IPPROTO_TCP || iph->ihl != 5)
-			return TC_ACT_SHOT;
-
-		if (tcp->dest == htons(5200))
-			key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
-		else if (tcp->dest == htons(5201))
-			key.remote_ipv4 = 0xac100165; /* 172.16.1.101 */
-		else
-			return TC_ACT_SHOT;
-	}
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("ipip_get_tunnel")
-int _ipip_get_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-	char fmt[] = "remote ip 0x%x\n";
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt), key.remote_ipv4);
-	return TC_ACT_OK;
-}
-
-SEC("ipip6_set_tunnel")
-int _ipip6_set_tunnel(struct __sk_buff *skb)
-{
-	struct bpf_tunnel_key key = {};
-	void *data = (void *)(long)skb->data;
-	struct iphdr *iph = data;
-	struct tcphdr *tcp = data + sizeof(*iph);
-	void *data_end = (void *)(long)skb->data_end;
-	int ret;
-
-	/* single length check */
-	if (data + sizeof(*iph) + sizeof(*tcp) > data_end) {
-		ERROR(1);
-		return TC_ACT_SHOT;
-	}
-
-	key.remote_ipv6[0] = _htonl(0x2401db00);
-	key.tunnel_ttl = 64;
-
-	if (iph->protocol == IPPROTO_ICMP) {
-		key.remote_ipv6[3] = _htonl(1);
-	} else {
-		if (iph->protocol != IPPROTO_TCP || iph->ihl != 5) {
-			ERROR(iph->protocol);
-			return TC_ACT_SHOT;
-		}
-
-		if (tcp->dest == htons(5200)) {
-			key.remote_ipv6[3] = _htonl(1);
-		} else if (tcp->dest == htons(5201)) {
-			key.remote_ipv6[3] = _htonl(2);
-		} else {
-			ERROR(tcp->dest);
-			return TC_ACT_SHOT;
-		}
-	}
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("ipip6_get_tunnel")
-int _ipip6_get_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-	char fmt[] = "remote ip6 %x::%x\n";
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt), _htonl(key.remote_ipv6[0]),
-			 _htonl(key.remote_ipv6[3]));
-	return TC_ACT_OK;
-}
-
-SEC("ip6ip6_set_tunnel")
-int _ip6ip6_set_tunnel(struct __sk_buff *skb)
-{
-	struct bpf_tunnel_key key = {};
-	void *data = (void *)(long)skb->data;
-	struct ipv6hdr *iph = data;
-	struct tcphdr *tcp = data + sizeof(*iph);
-	void *data_end = (void *)(long)skb->data_end;
-	int ret;
-
-	/* single length check */
-	if (data + sizeof(*iph) + sizeof(*tcp) > data_end) {
-		ERROR(1);
-		return TC_ACT_SHOT;
-	}
-
-	key.remote_ipv6[0] = _htonl(0x2401db00);
-	key.tunnel_ttl = 64;
-
-	if (iph->nexthdr == NEXTHDR_ICMP) {
-		key.remote_ipv6[3] = _htonl(1);
-	} else {
-		if (iph->nexthdr != NEXTHDR_TCP) {
-			ERROR(iph->nexthdr);
-			return TC_ACT_SHOT;
-		}
-
-		if (tcp->dest == htons(5200)) {
-			key.remote_ipv6[3] = _htonl(1);
-		} else if (tcp->dest == htons(5201)) {
-			key.remote_ipv6[3] = _htonl(2);
-		} else {
-			ERROR(tcp->dest);
-			return TC_ACT_SHOT;
-		}
-	}
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("ip6ip6_get_tunnel")
-int _ip6ip6_get_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-	char fmt[] = "remote ip6 %x::%x\n";
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt), _htonl(key.remote_ipv6[0]),
-			 _htonl(key.remote_ipv6[3]));
-	return TC_ACT_OK;
-}
-
-SEC("xfrm_get_state")
-int _xfrm_get_state(struct __sk_buff *skb)
-{
-	struct bpf_xfrm_state x;
-	char fmt[] = "reqid %d spi 0x%x remote ip 0x%x\n";
-	int ret;
-
-	ret = bpf_skb_get_xfrm_state(skb, 0, &x, sizeof(x), 0);
-	if (ret < 0)
-		return TC_ACT_OK;
-
-	bpf_trace_printk(fmt, sizeof(fmt), x.reqid, bpf_ntohl(x.spi),
-			 bpf_ntohl(x.remote_ipv4));
-	return TC_ACT_OK;
-}
-
-char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/test_tunnel_bpf.sh b/samples/bpf/test_tunnel_bpf.sh
deleted file mode 100755
index 9c534dc07b36..000000000000
--- a/samples/bpf/test_tunnel_bpf.sh
+++ /dev/null
@@ -1,390 +0,0 @@
-#!/bin/bash
-# SPDX-License-Identifier: GPL-2.0
-# In Namespace 0 (at_ns0) using native tunnel
-# Overlay IP: 10.1.1.100
-# local 192.16.1.100 remote 192.16.1.200
-# veth0 IP: 172.16.1.100, tunnel dev <type>00
-
-# Out of Namespace using BPF set/get on lwtunnel
-# Overlay IP: 10.1.1.200
-# local 172.16.1.200 remote 172.16.1.100
-# veth1 IP: 172.16.1.200, tunnel dev <type>11
-
-function config_device {
-	ip netns add at_ns0
-	ip link add veth0 type veth peer name veth1
-	ip link set veth0 netns at_ns0
-	ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
-	ip netns exec at_ns0 ip link set dev veth0 up
-	ip link set dev veth1 up mtu 1500
-	ip addr add dev veth1 172.16.1.200/24
-}
-
-function add_gre_tunnel {
-	# in namespace
-	ip netns exec at_ns0 \
-        ip link add dev $DEV_NS type $TYPE seq key 2 \
-		local 172.16.1.100 remote 172.16.1.200
-	ip netns exec at_ns0 ip link set dev $DEV_NS up
-	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-
-	# out of namespace
-	ip link add dev $DEV type $TYPE key 2 external
-	ip link set dev $DEV up
-	ip addr add dev $DEV 10.1.1.200/24
-}
-
-function add_ip6gretap_tunnel {
-
-	# assign ipv6 address
-	ip netns exec at_ns0 ip addr add ::11/96 dev veth0
-	ip netns exec at_ns0 ip link set dev veth0 up
-	ip addr add dev veth1 ::22/96
-	ip link set dev veth1 up
-
-	# in namespace
-	ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE seq flowlabel 0xbcdef key 2 \
-		local ::11 remote ::22
-
-	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-	ip netns exec at_ns0 ip addr add dev $DEV_NS fc80::100/96
-	ip netns exec at_ns0 ip link set dev $DEV_NS up
-
-	# out of namespace
-	ip link add dev $DEV type $TYPE external
-	ip addr add dev $DEV 10.1.1.200/24
-	ip addr add dev $DEV fc80::200/24
-	ip link set dev $DEV up
-}
-
-function add_erspan_tunnel {
-	# in namespace
-	if [ "$1" == "v1" ]; then
-		ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE seq key 2 \
-		local 172.16.1.100 remote 172.16.1.200 \
-		erspan_ver 1 erspan 123
-	else
-		ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE seq key 2 \
-		local 172.16.1.100 remote 172.16.1.200 \
-		erspan_ver 2 erspan_dir egress erspan_hwid 3
-	fi
-	ip netns exec at_ns0 ip link set dev $DEV_NS up
-	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-
-	# out of namespace
-	ip link add dev $DEV type $TYPE external
-	ip link set dev $DEV up
-	ip addr add dev $DEV 10.1.1.200/24
-}
-
-function add_ip6erspan_tunnel {
-
-	# assign ipv6 address
-	ip netns exec at_ns0 ip addr add ::11/96 dev veth0
-	ip netns exec at_ns0 ip link set dev veth0 up
-	ip addr add dev veth1 ::22/96
-	ip link set dev veth1 up
-
-	# in namespace
-	if [ "$1" == "v1" ]; then
-		ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE seq key 2 \
-		local ::11 remote ::22 \
-		erspan_ver 1 erspan 123
-	else
-		ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE seq key 2 \
-		local ::11 remote ::22 \
-		erspan_ver 2 erspan_dir egress erspan_hwid 7
-	fi
-	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-	ip netns exec at_ns0 ip link set dev $DEV_NS up
-
-	# out of namespace
-	ip link add dev $DEV type $TYPE external
-	ip addr add dev $DEV 10.1.1.200/24
-	ip link set dev $DEV up
-}
-
-function add_vxlan_tunnel {
-	# Set static ARP entry here because iptables set-mark works
-	# on L3 packet, as a result not applying to ARP packets,
-	# causing errors at get_tunnel_{key/opt}.
-
-	# in namespace
-	ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE id 2 dstport 4789 gbp remote 172.16.1.200
-	ip netns exec at_ns0 ip link set dev $DEV_NS address 52:54:00:d9:01:00 up
-	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-	ip netns exec at_ns0 arp -s 10.1.1.200 52:54:00:d9:02:00
-	ip netns exec at_ns0 iptables -A OUTPUT -j MARK --set-mark 0x800FF
-
-	# out of namespace
-	ip link add dev $DEV type $TYPE external gbp dstport 4789
-	ip link set dev $DEV address 52:54:00:d9:02:00 up
-	ip addr add dev $DEV 10.1.1.200/24
-	arp -s 10.1.1.100 52:54:00:d9:01:00
-}
-
-function add_geneve_tunnel {
-	# in namespace
-	ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE id 2 dstport 6081 remote 172.16.1.200
-	ip netns exec at_ns0 ip link set dev $DEV_NS up
-	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-
-	# out of namespace
-	ip link add dev $DEV type $TYPE dstport 6081 external
-	ip link set dev $DEV up
-	ip addr add dev $DEV 10.1.1.200/24
-}
-
-function add_ipip_tunnel {
-	# in namespace
-	ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE local 172.16.1.100 remote 172.16.1.200
-	ip netns exec at_ns0 ip link set dev $DEV_NS up
-	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-
-	# out of namespace
-	ip link add dev $DEV type $TYPE external
-	ip link set dev $DEV up
-	ip addr add dev $DEV 10.1.1.200/24
-}
-
-function setup_xfrm_tunnel {
-	auth=0x$(printf '1%.0s' {1..40})
-	enc=0x$(printf '2%.0s' {1..32})
-	spi_in_to_out=0x1
-	spi_out_to_in=0x2
-	# in namespace
-	# in -> out
-	ip netns exec at_ns0 \
-		ip xfrm state add src 172.16.1.100 dst 172.16.1.200 proto esp \
-			spi $spi_in_to_out reqid 1 mode tunnel \
-			auth-trunc 'hmac(sha1)' $auth 96 enc 'cbc(aes)' $enc
-	ip netns exec at_ns0 \
-		ip xfrm policy add src 10.1.1.100/32 dst 10.1.1.200/32 dir out \
-		tmpl src 172.16.1.100 dst 172.16.1.200 proto esp reqid 1 \
-		mode tunnel
-	# out -> in
-	ip netns exec at_ns0 \
-		ip xfrm state add src 172.16.1.200 dst 172.16.1.100 proto esp \
-			spi $spi_out_to_in reqid 2 mode tunnel \
-			auth-trunc 'hmac(sha1)' $auth 96 enc 'cbc(aes)' $enc
-	ip netns exec at_ns0 \
-		ip xfrm policy add src 10.1.1.200/32 dst 10.1.1.100/32 dir in \
-		tmpl src 172.16.1.200 dst 172.16.1.100 proto esp reqid 2 \
-		mode tunnel
-	# address & route
-	ip netns exec at_ns0 \
-		ip addr add dev veth0 10.1.1.100/32
-	ip netns exec at_ns0 \
-		ip route add 10.1.1.200 dev veth0 via 172.16.1.200 \
-			src 10.1.1.100
-
-	# out of namespace
-	# in -> out
-	ip xfrm state add src 172.16.1.100 dst 172.16.1.200 proto esp \
-		spi $spi_in_to_out reqid 1 mode tunnel \
-		auth-trunc 'hmac(sha1)' $auth 96  enc 'cbc(aes)' $enc
-	ip xfrm policy add src 10.1.1.100/32 dst 10.1.1.200/32 dir in \
-		tmpl src 172.16.1.100 dst 172.16.1.200 proto esp reqid 1 \
-		mode tunnel
-	# out -> in
-	ip xfrm state add src 172.16.1.200 dst 172.16.1.100 proto esp \
-		spi $spi_out_to_in reqid 2 mode tunnel \
-		auth-trunc 'hmac(sha1)' $auth 96  enc 'cbc(aes)' $enc
-	ip xfrm policy add src 10.1.1.200/32 dst 10.1.1.100/32 dir out \
-		tmpl src 172.16.1.200 dst 172.16.1.100 proto esp reqid 2 \
-		mode tunnel
-	# address & route
-	ip addr add dev veth1 10.1.1.200/32
-	ip route add 10.1.1.100 dev veth1 via 172.16.1.100 src 10.1.1.200
-}
-
-function attach_bpf {
-	DEV=$1
-	SET_TUNNEL=$2
-	GET_TUNNEL=$3
-	tc qdisc add dev $DEV clsact
-	tc filter add dev $DEV egress bpf da obj tcbpf2_kern.o sec $SET_TUNNEL
-	tc filter add dev $DEV ingress bpf da obj tcbpf2_kern.o sec $GET_TUNNEL
-}
-
-function test_gre {
-	TYPE=gretap
-	DEV_NS=gretap00
-	DEV=gretap11
-	config_device
-	add_gre_tunnel
-	attach_bpf $DEV gre_set_tunnel gre_get_tunnel
-	ping -c 1 10.1.1.100
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	cleanup
-}
-
-function test_ip6gre {
-	TYPE=ip6gre
-	DEV_NS=ip6gre00
-	DEV=ip6gre11
-	config_device
-	# reuse the ip6gretap function
-	add_ip6gretap_tunnel
-	attach_bpf $DEV ip6gretap_set_tunnel ip6gretap_get_tunnel
-	# underlay
-	ping6 -c 4 ::11
-	# overlay: ipv4 over ipv6
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	ping -c 1 10.1.1.100
-	# overlay: ipv6 over ipv6
-	ip netns exec at_ns0 ping6 -c 1 fc80::200
-	cleanup
-}
-
-function test_ip6gretap {
-	TYPE=ip6gretap
-	DEV_NS=ip6gretap00
-	DEV=ip6gretap11
-	config_device
-	add_ip6gretap_tunnel
-	attach_bpf $DEV ip6gretap_set_tunnel ip6gretap_get_tunnel
-	# underlay
-	ping6 -c 4 ::11
-	# overlay: ipv4 over ipv6
-	ip netns exec at_ns0 ping -i .2 -c 1 10.1.1.200
-	ping -c 1 10.1.1.100
-	# overlay: ipv6 over ipv6
-	ip netns exec at_ns0 ping6 -c 1 fc80::200
-	cleanup
-}
-
-function test_erspan {
-	TYPE=erspan
-	DEV_NS=erspan00
-	DEV=erspan11
-	config_device
-	add_erspan_tunnel $1
-	attach_bpf $DEV erspan_set_tunnel erspan_get_tunnel
-	ping -c 1 10.1.1.100
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	cleanup
-}
-
-function test_ip6erspan {
-	TYPE=ip6erspan
-	DEV_NS=ip6erspan00
-	DEV=ip6erspan11
-	config_device
-	add_ip6erspan_tunnel $1
-	attach_bpf $DEV ip4ip6erspan_set_tunnel ip4ip6erspan_get_tunnel
-	ping6 -c 3 ::11
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	cleanup
-}
-
-function test_vxlan {
-	TYPE=vxlan
-	DEV_NS=vxlan00
-	DEV=vxlan11
-	config_device
-	add_vxlan_tunnel
-	attach_bpf $DEV vxlan_set_tunnel vxlan_get_tunnel
-	ping -c 1 10.1.1.100
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	cleanup
-}
-
-function test_geneve {
-	TYPE=geneve
-	DEV_NS=geneve00
-	DEV=geneve11
-	config_device
-	add_geneve_tunnel
-	attach_bpf $DEV geneve_set_tunnel geneve_get_tunnel
-	ping -c 1 10.1.1.100
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	cleanup
-}
-
-function test_ipip {
-	TYPE=ipip
-	DEV_NS=ipip00
-	DEV=ipip11
-	config_device
-	tcpdump -nei veth1 &
-	cat /sys/kernel/debug/tracing/trace_pipe &
-	add_ipip_tunnel
-	ethtool -K veth1 gso off gro off rx off tx off
-	ip link set dev veth1 mtu 1500
-	attach_bpf $DEV ipip_set_tunnel ipip_get_tunnel
-	ping -c 1 10.1.1.100
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	ip netns exec at_ns0 iperf -sD -p 5200 > /dev/null
-	sleep 0.2
-	iperf -c 10.1.1.100 -n 5k -p 5200
-	cleanup
-}
-
-function test_xfrm_tunnel {
-	config_device
-        tcpdump -nei veth1 ip &
-	output=$(mktemp)
-	cat /sys/kernel/debug/tracing/trace_pipe | tee $output &
-        setup_xfrm_tunnel
-	tc qdisc add dev veth1 clsact
-	tc filter add dev veth1 proto ip ingress bpf da obj tcbpf2_kern.o \
-		sec xfrm_get_state
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	grep "reqid 1" $output
-	grep "spi 0x1" $output
-	grep "remote ip 0xac100164" $output
-	cleanup
-}
-
-function cleanup {
-	set +ex
-	pkill iperf
-	ip netns delete at_ns0
-	ip link del veth1
-	ip link del ipip11
-	ip link del gretap11
-	ip link del ip6gre11
-	ip link del ip6gretap11
-	ip link del vxlan11
-	ip link del geneve11
-	ip link del erspan11
-	ip link del ip6erspan11
-	ip x s flush
-	ip x p flush
-	pkill tcpdump
-	pkill cat
-	set -ex
-}
-
-trap cleanup 0 2 3 6 9
-cleanup
-echo "Testing GRE tunnel..."
-test_gre
-echo "Testing IP6GRE tunnel..."
-test_ip6gre
-echo "Testing IP6GRETAP tunnel..."
-test_ip6gretap
-echo "Testing ERSPAN tunnel..."
-test_erspan v1
-test_erspan v2
-echo "Testing IP6ERSPAN tunnel..."
-test_ip6erspan v1
-test_ip6erspan v2
-echo "Testing VXLAN tunnel..."
-test_vxlan
-echo "Testing GENEVE tunnel..."
-test_geneve
-echo "Testing IPIP tunnel..."
-test_ipip
-echo "Testing IPSec tunnel..."
-test_xfrm_tunnel
-echo "*** PASS ***"
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 0c19d5e08f08..b64a7a39cbc8 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -32,7 +32,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
 	test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o \
 	sample_map_ret0.o test_tcpbpf_kern.o test_stacktrace_build_id.o \
 	sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o test_adjust_tail.o \
-	test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o
+	test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o test_tunnel_kern.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
@@ -40,7 +40,8 @@ TEST_PROGS := test_kmod.sh \
 	test_xdp_redirect.sh \
 	test_xdp_meta.sh \
 	test_offload.py \
-	test_sock_addr.sh
+	test_sock_addr.sh \
+	test_tunnel.sh
 
 # Compile but not part of 'make run_tests'
 TEST_GEN_PROGS_EXTENDED = test_libbpf_open test_sock_addr
diff --git a/tools/testing/selftests/bpf/test_tunnel.sh b/tools/testing/selftests/bpf/test_tunnel.sh
new file mode 100755
index 000000000000..7fce054eaebd
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_tunnel.sh
@@ -0,0 +1,651 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# End-to-end eBPF tunnel test suite
+#   The script tests BPF network tunnel implementation.
+#
+# Topology:
+# ---------
+#     root namespace   |     at_ns0 namespace
+#                      |
+#      -----------     |     -----------
+#      | tnl dev |     |     | tnl dev |  (overlay network)
+#      -----------     |     -----------
+#      metadata-mode   |     native-mode
+#       with bpf       |
+#                      |
+#      ----------      |     ----------
+#      |  veth1  | --------- |  veth0  |   (underlay network)
+#      ----------    peer    ----------
+#
+#
+# Device Configuration
+# --------------------
+# Root namespace with metadata-mode tunnel + BPF
+# Device names and addresses:
+# 	veth1 IP: 172.16.1.200, IPv6: 00::22 (underlay)
+# 	tunnel dev <type>11, ex: gre11, IPv4: 10.1.1.200 (overlay)
+#
+# Namespace at_ns0 with native tunnel
+# Device names and addresses:
+# 	veth0 IPv4: 172.16.1.100, IPv6: 00::11 (underlay)
+# 	tunnel dev <type>00, ex: gre00, IPv4: 10.1.1.100 (overlay)
+#
+#
+# End-to-end ping packet flow
+# ---------------------------
+# Most of the tests start by namespace creation, device configuration,
+# then ping the underlay and overlay network.  When doing 'ping 10.1.1.100'
+# from root namespace, the following operations happen:
+# 1) Route lookup shows 10.1.1.100/24 belongs to tnl dev, fwd to tnl dev.
+# 2) Tnl device's egress BPF program is triggered and set the tunnel metadata,
+#    with remote_ip=172.16.1.200 and others.
+# 3) Outer tunnel header is prepended and route the packet to veth1's egress
+# 4) veth0's ingress queue receive the tunneled packet at namespace at_ns0
+# 5) Tunnel protocol handler, ex: vxlan_rcv, decap the packet
+# 6) Forward the packet to the overlay tnl dev
+
+PING_ARG="-c 3 -w 10 -q"
+ret=0
+GREEN='\033[0;92m'
+RED='\033[0;31m'
+NC='\033[0m' # No Color
+
+config_device()
+{
+	ip netns add at_ns0
+	ip link add veth0 type veth peer name veth1
+	ip link set veth0 netns at_ns0
+	ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
+	ip netns exec at_ns0 ip link set dev veth0 up
+	ip link set dev veth1 up mtu 1500
+	ip addr add dev veth1 172.16.1.200/24
+}
+
+add_gre_tunnel()
+{
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+        ip link add dev $DEV_NS type $TYPE seq key 2 \
+		local 172.16.1.100 remote 172.16.1.200
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+
+	# root namespace
+	ip link add dev $DEV type $TYPE key 2 external
+	ip link set dev $DEV up
+	ip addr add dev $DEV 10.1.1.200/24
+}
+
+add_ip6gretap_tunnel()
+{
+
+	# assign ipv6 address
+	ip netns exec at_ns0 ip addr add ::11/96 dev veth0
+	ip netns exec at_ns0 ip link set dev veth0 up
+	ip addr add dev veth1 ::22/96
+	ip link set dev veth1 up
+
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE seq flowlabel 0xbcdef key 2 \
+		local ::11 remote ::22
+
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+	ip netns exec at_ns0 ip addr add dev $DEV_NS fc80::100/96
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external
+	ip addr add dev $DEV 10.1.1.200/24
+	ip addr add dev $DEV fc80::200/24
+	ip link set dev $DEV up
+}
+
+add_erspan_tunnel()
+{
+	# at_ns0 namespace
+	if [ "$1" == "v1" ]; then
+		ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE seq key 2 \
+		local 172.16.1.100 remote 172.16.1.200 \
+		erspan_ver 1 erspan 123
+	else
+		ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE seq key 2 \
+		local 172.16.1.100 remote 172.16.1.200 \
+		erspan_ver 2 erspan_dir egress erspan_hwid 3
+	fi
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external
+	ip link set dev $DEV up
+	ip addr add dev $DEV 10.1.1.200/24
+}
+
+add_ip6erspan_tunnel()
+{
+
+	# assign ipv6 address
+	ip netns exec at_ns0 ip addr add ::11/96 dev veth0
+	ip netns exec at_ns0 ip link set dev veth0 up
+	ip addr add dev veth1 ::22/96
+	ip link set dev veth1 up
+
+	# at_ns0 namespace
+	if [ "$1" == "v1" ]; then
+		ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE seq key 2 \
+		local ::11 remote ::22 \
+		erspan_ver 1 erspan 123
+	else
+		ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE seq key 2 \
+		local ::11 remote ::22 \
+		erspan_ver 2 erspan_dir egress erspan_hwid 7
+	fi
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external
+	ip addr add dev $DEV 10.1.1.200/24
+	ip link set dev $DEV up
+}
+
+add_vxlan_tunnel()
+{
+	# Set static ARP entry here because iptables set-mark works
+	# on L3 packet, as a result not applying to ARP packets,
+	# causing errors at get_tunnel_{key/opt}.
+
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE id 2 dstport 4789 gbp remote 172.16.1.200
+	ip netns exec at_ns0 ip link set dev $DEV_NS address 52:54:00:d9:01:00 up
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+	ip netns exec at_ns0 arp -s 10.1.1.200 52:54:00:d9:02:00
+	ip netns exec at_ns0 iptables -A OUTPUT -j MARK --set-mark 0x800FF
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external gbp dstport 4789
+	ip link set dev $DEV address 52:54:00:d9:02:00 up
+	ip addr add dev $DEV 10.1.1.200/24
+	arp -s 10.1.1.100 52:54:00:d9:01:00
+}
+
+add_ip6vxlan_tunnel()
+{
+	#ip netns exec at_ns0 ip -4 addr del 172.16.1.100 dev veth0
+	ip netns exec at_ns0 ip -6 addr add ::11/96 dev veth0
+	ip netns exec at_ns0 ip link set dev veth0 up
+	#ip -4 addr del 172.16.1.200 dev veth1
+	ip -6 addr add dev veth1 ::22/96
+	ip link set dev veth1 up
+
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE id 22 dstport 4789 \
+		local ::11 remote ::22
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+
+	echo "set dev done" > /dev/kmsg
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external dstport 4789
+	ip addr add dev $DEV 10.1.1.200/24
+	ip link set dev $DEV up
+}
+
+add_geneve_tunnel()
+{
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE \
+		id 2 dstport 6081 remote 172.16.1.200
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+
+	# root namespace
+	ip link add dev $DEV type $TYPE dstport 6081 external
+	ip link set dev $DEV up
+	ip addr add dev $DEV 10.1.1.200/24
+}
+
+add_ip6geneve_tunnel()
+{
+	ip netns exec at_ns0 ip addr add ::11/96 dev veth0
+	ip netns exec at_ns0 ip link set dev veth0 up
+	ip addr add dev veth1 ::22/96
+	ip link set dev veth1 up
+
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE id 22 \
+		remote ::22     # geneve has no local option
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external
+	ip addr add dev $DEV 10.1.1.200/24
+	ip link set dev $DEV up
+}
+
+add_ipip_tunnel()
+{
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE \
+		local 172.16.1.100 remote 172.16.1.200
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external
+	ip link set dev $DEV up
+	ip addr add dev $DEV 10.1.1.200/24
+}
+
+add_ipip6tnl_tunnel()
+{
+	ip netns exec at_ns0 ip addr add ::11/96 dev veth0
+	ip netns exec at_ns0 ip link set dev veth0 up
+	ip addr add dev veth1 ::22/96
+	ip link set dev veth1 up
+
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE \
+		local ::11 remote ::22
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external
+	ip addr add dev $DEV 10.1.1.200/24
+	ip link set dev $DEV up
+}
+
+test_gre()
+{
+	TYPE=gretap
+	DEV_NS=gretap00
+	DEV=gretap11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_gre_tunnel
+	attach_bpf $DEV gre_set_tunnel gre_get_tunnel
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+        if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ip6gre()
+{
+	TYPE=ip6gre
+	DEV_NS=ip6gre00
+	DEV=ip6gre11
+	ret=0
+
+	check $TYPE
+	config_device
+	# reuse the ip6gretap function
+	add_ip6gretap_tunnel
+	attach_bpf $DEV ip6gretap_set_tunnel ip6gretap_get_tunnel
+	# underlay
+	ping6 $PING_ARG ::11
+	# overlay: ipv4 over ipv6
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	# overlay: ipv6 over ipv6
+	ip netns exec at_ns0 ping6 $PING_ARG fc80::200
+	check_err $?
+	cleanup
+
+        if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ip6gretap()
+{
+	TYPE=ip6gretap
+	DEV_NS=ip6gretap00
+	DEV=ip6gretap11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_ip6gretap_tunnel
+	attach_bpf $DEV ip6gretap_set_tunnel ip6gretap_get_tunnel
+	# underlay
+	ping6 $PING_ARG ::11
+	# overlay: ipv4 over ipv6
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	# overlay: ipv6 over ipv6
+	ip netns exec at_ns0 ping6 $PING_ARG fc80::200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_erspan()
+{
+	TYPE=erspan
+	DEV_NS=erspan00
+	DEV=erspan11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_erspan_tunnel $1
+	attach_bpf $DEV erspan_set_tunnel erspan_get_tunnel
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ip6erspan()
+{
+	TYPE=ip6erspan
+	DEV_NS=ip6erspan00
+	DEV=ip6erspan11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_ip6erspan_tunnel $1
+	attach_bpf $DEV ip4ip6erspan_set_tunnel ip4ip6erspan_get_tunnel
+	ping6 $PING_ARG ::11
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_vxlan()
+{
+	TYPE=vxlan
+	DEV_NS=vxlan00
+	DEV=vxlan11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_vxlan_tunnel
+	attach_bpf $DEV vxlan_set_tunnel vxlan_get_tunnel
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ip6vxlan()
+{
+	TYPE=vxlan
+	DEV_NS=ip6vxlan00
+	DEV=ip6vxlan11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_ip6vxlan_tunnel
+	ip link set dev veth1 mtu 1500
+	attach_bpf $DEV ip6vxlan_set_tunnel ip6vxlan_get_tunnel
+	echo "set bpf done" > /dev/kmsg
+	# underlay
+	ping6 $PING_ARG ::11
+	echo "ping underlay" > /dev/kmsg
+	# ip4 over ip6
+	ping $PING_ARG 10.1.1.100
+	echo "ping overlay" > /dev/kmsg
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: ip6$TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: ip6$TYPE"${NC}
+}
+
+test_geneve()
+{
+	TYPE=geneve
+	DEV_NS=geneve00
+	DEV=geneve11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_geneve_tunnel
+	attach_bpf $DEV geneve_set_tunnel geneve_get_tunnel
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ip6geneve()
+{
+	TYPE=geneve
+	DEV_NS=ip6geneve00
+	DEV=ip6geneve11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_ip6geneve_tunnel
+	attach_bpf $DEV ip6geneve_set_tunnel ip6geneve_get_tunnel
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: ip6$TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: ip6$TYPE"${NC}
+}
+
+test_ipip()
+{
+	TYPE=ipip
+	DEV_NS=ipip00
+	DEV=ipip11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_ipip_tunnel
+	ip link set dev veth1 mtu 1500
+	attach_bpf $DEV ipip_set_tunnel ipip_get_tunnel
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ipip6()
+{
+	TYPE=ip6tnl
+	DEV_NS=ipip6tnl00
+	DEV=ipip6tnl11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_ipip6tnl_tunnel
+	ip link set dev veth1 mtu 1500
+	attach_bpf $DEV ipip6_set_tunnel ipip6_get_tunnel
+	# underlay
+	ping6 $PING_ARG ::11
+	# ip4 over ip6
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+attach_bpf()
+{
+	DEV=$1
+	SET=$2
+	GET=$3
+	tc qdisc add dev $DEV clsact
+	tc filter add dev $DEV egress bpf da obj test_tunnel_kern.o sec $SET
+	tc filter add dev $DEV ingress bpf da obj test_tunnel_kern.o sec $GET
+}
+
+cleanup()
+{
+	ip netns delete at_ns0 2> /dev/null
+	ip link del veth1 2> /dev/null
+	ip link del ipip11 2> /dev/null
+	ip link del ipip6tnl11 2> /dev/null
+	ip link del gretap11 2> /dev/null
+	ip link del ip6gre11 2> /dev/null
+	ip link del ip6gretap11 2> /dev/null
+	ip link del vxlan11 2> /dev/null
+	ip link del ip6vxlan11 2> /dev/null
+	ip link del geneve11 2> /dev/null
+	ip link del ip6geneve11 2> /dev/null
+	ip link del erspan11 2> /dev/null
+	ip link del ip6erspan11 2> /dev/null
+}
+
+cleanup_exit()
+{
+	echo "CATCH SIGKILL or SIGINT, cleanup and exit"
+	cleanup
+	exit 0
+}
+
+check()
+{
+   ip link help $1 2>&1 | grep -q "^Usage:"
+    if [ $? -ne 0 ];then
+        echo "SKIP $1: iproute2 not support"
+        cleanup
+        return 1
+    fi
+}
+
+enable_debug()
+{
+    echo 'file ip_gre.c +p' > /sys/kernel/debug/dynamic_debug/control
+    echo 'file ip6_gre.c +p' > /sys/kernel/debug/dynamic_debug/control
+    echo 'file vxlan.c +p' > /sys/kernel/debug/dynamic_debug/control
+    echo 'file geneve.c +p' > /sys/kernel/debug/dynamic_debug/control
+    echo 'file ipip.c +p' > /sys/kernel/debug/dynamic_debug/control
+}
+
+check_err()
+{
+    if [ $ret -eq 0 ]; then
+        ret=$1
+    fi
+}
+
+bpf_tunnel_test()
+{
+	echo "Testing GRE tunnel..."
+	test_gre
+	echo "Testing IP6GRE tunnel..."
+	test_ip6gre
+	echo "Testing IP6GRETAP tunnel..."
+	test_ip6gretap
+	echo "Testing ERSPAN tunnel..."
+	test_erspan v2
+	echo "Testing IP6ERSPAN tunnel..."
+	test_ip6erspan v2
+	echo "Testing VXLAN tunnel..."
+	test_vxlan
+	echo "Testing IP6VXLAN tunnel..."
+	test_ip6vxlan
+	echo "Testing GENEVE tunnel..."
+	test_geneve
+	echo "Testing IP6GENEVE tunnel..."
+	test_ip6geneve
+	echo "Testing IPIP tunnel..."
+	test_ipip
+	echo "Testing IPIP6 tunnel..."
+	test_ipip6
+}
+
+trap cleanup 0 3 6
+trap cleanup_exit 2 9
+
+cleanup
+bpf_tunnel_test
+
+exit 0
diff --git a/tools/testing/selftests/bpf/test_tunnel_kern.c b/tools/testing/selftests/bpf/test_tunnel_kern.c
new file mode 100644
index 000000000000..0bf642c530fd
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_tunnel_kern.c
@@ -0,0 +1,691 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2016 VMware
+ * Copyright (c) 2016 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <stddef.h>
+#include <string.h>
+#include <arpa/inet.h>
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/types.h>
+#include <linux/tcp.h>
+#include <linux/socket.h>
+#include <linux/pkt_cls.h>
+#include <linux/erspan.h>
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+#define ERROR(ret) do {\
+		char fmt[] = "ERROR line:%d ret:%d\n";\
+		bpf_trace_printk(fmt, sizeof(fmt), __LINE__, ret); \
+	} while(0)
+
+int _version SEC("version") = 1;
+
+struct geneve_opt {
+	__be16	opt_class;
+	__u8	type;
+	__u8	length:5;
+	__u8	r3:1;
+	__u8	r2:1;
+	__u8	r1:1;
+	__u8	opt_data[8]; /* hard-coded to 8 byte */
+};
+
+struct vxlan_metadata {
+	__u32     gbp;
+};
+
+SEC("gre_set_tunnel")
+int _gre_set_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_ZERO_CSUM_TX | BPF_F_SEQ_NUMBER);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("gre_get_tunnel")
+int _gre_get_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	char fmt[] = "key %d remote ip 0x%x\n";
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt), key.tunnel_id, key.remote_ipv4);
+	return TC_ACT_OK;
+}
+
+SEC("ip6gretap_set_tunnel")
+int _ip6gretap_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key;
+	int ret;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv6[3] = bpf_htonl(0x11); /* ::11 */
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+	key.tunnel_label = 0xabcde;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6 | BPF_F_ZERO_CSUM_TX |
+				     BPF_F_SEQ_NUMBER);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("ip6gretap_get_tunnel")
+int _ip6gretap_get_tunnel(struct __sk_buff *skb)
+{
+	char fmt[] = "key %d remote ip6 ::%x label %x\n";
+	struct bpf_tunnel_key key;
+	int ret;
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt),
+			 key.tunnel_id, key.remote_ipv6[3], key.tunnel_label);
+
+	return TC_ACT_OK;
+}
+
+SEC("erspan_set_tunnel")
+int _erspan_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key;
+	struct erspan_metadata md;
+	int ret;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_ZERO_CSUM_TX);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	__builtin_memset(&md, 0, sizeof(md));
+#ifdef ERSPAN_V1
+	md.version = 1;
+	md.u.index = bpf_htonl(123);
+#else
+	__u8 direction = 1;
+	__u8 hwid = 7;
+
+	md.version = 2;
+	md.u.md2.dir = direction;
+	md.u.md2.hwid = hwid & 0xf;
+	md.u.md2.hwid_upper = (hwid >> 4) & 0x3;
+#endif
+
+	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("erspan_get_tunnel")
+int _erspan_get_tunnel(struct __sk_buff *skb)
+{
+	char fmt[] = "key %d remote ip 0x%x erspan version %d\n";
+	struct bpf_tunnel_key key;
+	struct erspan_metadata md;
+	__u32 index;
+	int ret;
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt),
+			key.tunnel_id, key.remote_ipv4, md.version);
+
+#ifdef ERSPAN_V1
+	char fmt2[] = "\tindex %x\n";
+
+	index = bpf_ntohl(md.u.index);
+	bpf_trace_printk(fmt2, sizeof(fmt2), index);
+#else
+	char fmt2[] = "\tdirection %d hwid %x timestamp %u\n";
+
+	bpf_trace_printk(fmt2, sizeof(fmt2),
+			 md.u.md2.dir,
+			 (md.u.md2.hwid_upper << 4) + md.u.md2.hwid,
+			 bpf_ntohl(md.u.md2.timestamp));
+#endif
+
+	return TC_ACT_OK;
+}
+
+SEC("ip4ip6erspan_set_tunnel")
+int _ip4ip6erspan_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key;
+	struct erspan_metadata md;
+	int ret;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv6[3] = bpf_htonl(0x11);
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	__builtin_memset(&md, 0, sizeof(md));
+
+#ifdef ERSPAN_V1
+	md.u.index = bpf_htonl(123);
+	md.version = 1;
+#else
+	__u8 direction = 0;
+	__u8 hwid = 17;
+
+	md.version = 2;
+	md.u.md2.dir = direction;
+	md.u.md2.hwid = hwid & 0xf;
+	md.u.md2.hwid_upper = (hwid >> 4) & 0x3;
+#endif
+
+	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("ip4ip6erspan_get_tunnel")
+int _ip4ip6erspan_get_tunnel(struct __sk_buff *skb)
+{
+	char fmt[] = "ip6erspan get key %d remote ip6 ::%x erspan version %d\n";
+	struct bpf_tunnel_key key;
+	struct erspan_metadata md;
+	__u32 index;
+	int ret;
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt),
+			key.tunnel_id, key.remote_ipv4, md.version);
+
+#ifdef ERSPAN_V1
+	char fmt2[] = "\tindex %x\n";
+
+	index = bpf_ntohl(md.u.index);
+	bpf_trace_printk(fmt2, sizeof(fmt2), index);
+#else
+	char fmt2[] = "\tdirection %d hwid %x timestamp %u\n";
+
+	bpf_trace_printk(fmt2, sizeof(fmt2),
+			 md.u.md2.dir,
+			 (md.u.md2.hwid_upper << 4) + md.u.md2.hwid,
+			 bpf_ntohl(md.u.md2.timestamp));
+#endif
+
+	return TC_ACT_OK;
+}
+
+SEC("vxlan_set_tunnel")
+int _vxlan_set_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	struct vxlan_metadata md;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_ZERO_CSUM_TX);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	md.gbp = 0x800FF; /* Set VXLAN Group Policy extension */
+	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("vxlan_get_tunnel")
+int _vxlan_get_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	struct vxlan_metadata md;
+	char fmt[] = "key %d remote ip 0x%x vxlan gbp 0x%x\n";
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt),
+			key.tunnel_id, key.remote_ipv4, md.gbp);
+
+	return TC_ACT_OK;
+}
+
+SEC("ip6vxlan_set_tunnel")
+int _ip6vxlan_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key;
+	int ret;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv6[3] = bpf_htonl(0x11); /* ::11 */
+	key.tunnel_id = 22;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("ip6vxlan_get_tunnel")
+int _ip6vxlan_get_tunnel(struct __sk_buff *skb)
+{
+	char fmt[] = "key %d remote ip6 ::%x label %x\n";
+	struct bpf_tunnel_key key;
+	int ret;
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt),
+			 key.tunnel_id, key.remote_ipv6[3], key.tunnel_label);
+
+	return TC_ACT_OK;
+}
+
+SEC("geneve_set_tunnel")
+int _geneve_set_tunnel(struct __sk_buff *skb)
+{
+	int ret, ret2;
+	struct bpf_tunnel_key key;
+	struct geneve_opt gopt;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	__builtin_memset(&gopt, 0x0, sizeof(gopt));
+	gopt.opt_class = bpf_htons(0x102); /* Open Virtual Networking (OVN) */
+	gopt.type = 0x08;
+	gopt.r1 = 0;
+	gopt.r2 = 0;
+	gopt.r3 = 0;
+	gopt.length = 2; /* 4-byte multiple */
+	*(int *) &gopt.opt_data = bpf_htonl(0xdeadbeef);
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_ZERO_CSUM_TX);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_set_tunnel_opt(skb, &gopt, sizeof(gopt));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("geneve_get_tunnel")
+int _geneve_get_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	struct geneve_opt gopt;
+	char fmt[] = "key %d remote ip 0x%x geneve class 0x%x\n";
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_tunnel_opt(skb, &gopt, sizeof(gopt));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt),
+			key.tunnel_id, key.remote_ipv4, gopt.opt_class);
+	return TC_ACT_OK;
+}
+
+SEC("ip6geneve_set_tunnel")
+int _ip6geneve_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key;
+	struct geneve_opt gopt;
+	int ret;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv6[3] = bpf_htonl(0x11); /* ::11 */
+	key.tunnel_id = 22;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	__builtin_memset(&gopt, 0x0, sizeof(gopt));
+	gopt.opt_class = bpf_htons(0x102); /* Open Virtual Networking (OVN) */
+	gopt.type = 0x08;
+	gopt.r1 = 0;
+	gopt.r2 = 0;
+	gopt.r3 = 0;
+	gopt.length = 2; /* 4-byte multiple */
+	*(int *) &gopt.opt_data = bpf_htonl(0xfeedbeef);
+
+	ret = bpf_skb_set_tunnel_opt(skb, &gopt, sizeof(gopt));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("ip6geneve_get_tunnel")
+int _ip6geneve_get_tunnel(struct __sk_buff *skb)
+{
+	char fmt[] = "key %d remote ip 0x%x geneve class 0x%x\n";
+	struct bpf_tunnel_key key;
+	struct geneve_opt gopt;
+	int ret;
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_tunnel_opt(skb, &gopt, sizeof(gopt));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt),
+			key.tunnel_id, key.remote_ipv4, gopt.opt_class);
+
+	return TC_ACT_OK;
+}
+
+SEC("ipip_set_tunnel")
+int _ipip_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key = {};
+	void *data = (void *)(long)skb->data;
+	struct iphdr *iph = data;
+	struct tcphdr *tcp = data + sizeof(*iph);
+	void *data_end = (void *)(long)skb->data_end;
+	int ret;
+
+	/* single length check */
+	if (data + sizeof(*iph) + sizeof(*tcp) > data_end) {
+		ERROR(1);
+		return TC_ACT_SHOT;
+	}
+
+	key.tunnel_ttl = 64;
+	if (iph->protocol == IPPROTO_ICMP) {
+		key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+	} else {
+		if (iph->protocol != IPPROTO_TCP || iph->ihl != 5)
+			return TC_ACT_SHOT;
+
+		if (tcp->dest == bpf_htons(5200))
+			key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+		else if (tcp->dest == bpf_htons(5201))
+			key.remote_ipv4 = 0xac100165; /* 172.16.1.101 */
+		else
+			return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("ipip_get_tunnel")
+int _ipip_get_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	char fmt[] = "remote ip 0x%x\n";
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt), key.remote_ipv4);
+	return TC_ACT_OK;
+}
+
+SEC("ipip6_set_tunnel")
+int _ipip6_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key = {};
+	void *data = (void *)(long)skb->data;
+	struct iphdr *iph = data;
+	struct tcphdr *tcp = data + sizeof(*iph);
+	void *data_end = (void *)(long)skb->data_end;
+	int ret;
+
+	/* single length check */
+	if (data + sizeof(*iph) + sizeof(*tcp) > data_end) {
+		ERROR(1);
+		return TC_ACT_SHOT;
+	}
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv6[3] = bpf_htonl(0x11); /* ::11 */
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("ipip6_get_tunnel")
+int _ipip6_get_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	char fmt[] = "remote ip6 %x::%x\n";
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt), bpf_htonl(key.remote_ipv6[0]),
+			 bpf_htonl(key.remote_ipv6[3]));
+	return TC_ACT_OK;
+}
+
+SEC("ip6ip6_set_tunnel")
+int _ip6ip6_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key = {};
+	void *data = (void *)(long)skb->data;
+	struct ipv6hdr *iph = data;
+	struct tcphdr *tcp = data + sizeof(*iph);
+	void *data_end = (void *)(long)skb->data_end;
+	int ret;
+
+	/* single length check */
+	if (data + sizeof(*iph) + sizeof(*tcp) > data_end) {
+		ERROR(1);
+		return TC_ACT_SHOT;
+	}
+
+	key.remote_ipv6[0] = bpf_htonl(0x2401db00);
+	key.tunnel_ttl = 64;
+
+	if (iph->nexthdr == 58 /* NEXTHDR_ICMP */) {
+		key.remote_ipv6[3] = bpf_htonl(1);
+	} else {
+		if (iph->nexthdr != 6 /* NEXTHDR_TCP */) {
+			ERROR(iph->nexthdr);
+			return TC_ACT_SHOT;
+		}
+
+		if (tcp->dest == bpf_htons(5200)) {
+			key.remote_ipv6[3] = bpf_htonl(1);
+		} else if (tcp->dest == bpf_htons(5201)) {
+			key.remote_ipv6[3] = bpf_htonl(2);
+		} else {
+			ERROR(tcp->dest);
+			return TC_ACT_SHOT;
+		}
+	}
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("ip6ip6_get_tunnel")
+int _ip6ip6_get_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	char fmt[] = "remote ip6 %x::%x\n";
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt), bpf_htonl(key.remote_ipv6[0]),
+			 bpf_htonl(key.remote_ipv6[3]));
+	return TC_ACT_OK;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH 3/8] ARM: dts: stm32: add ethernet pins to stm32mp157c
From: Rob Herring @ 2018-04-25 15:09 UTC (permalink / raw)
  To: Christophe Roullier
  Cc: Mark Rutland, Maxime Coquelin, Alexandre Torgue,
	Giuseppe CAVALLARO, devicetree,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE, netdev
In-Reply-To: <1524582120-4451-4-git-send-email-christophe.roullier@st.com>

On Tue, Apr 24, 2018 at 10:01 AM, Christophe Roullier
<christophe.roullier@st.com> wrote:
> Add ethernet pins on stm32mp157c.
>
> Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
> ---
>  arch/arm/boot/dts/stm32mp157-pinctrl.dtsi | 46 +++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
>
> diff --git a/arch/arm/boot/dts/stm32mp157-pinctrl.dtsi b/arch/arm/boot/dts/stm32mp157-pinctrl.dtsi
> index 6f044100..86720a5 100644
> --- a/arch/arm/boot/dts/stm32mp157-pinctrl.dtsi
> +++ b/arch/arm/boot/dts/stm32mp157-pinctrl.dtsi
> @@ -158,6 +158,52 @@
>                                         bias-disable;
>                                 };
>                         };
> +
> +                       ethernet0_rgmii_pins_a: rgmii@0 {

A unit-address without 'reg' property is not valid, so drop the '@0'.

Please build your dtb with W=1 or W=12 which will tell you this and
other errors.

Rob

^ permalink raw reply

* Re: [PATCH iproute2 v2] json_print: Fix hidden 64-bit type promotion
From: Stephen Hemminger @ 2018-04-25 15:12 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: netdev, Kevin Darbyshire-Bryant
In-Reply-To: <87bme75o5r.fsf@toke.dk>

On Wed, 25 Apr 2018 16:57:52 +0200
Toke Høiland-Jørgensen <toke@toke.dk> wrote:

> Stephen Hemminger <stephen@networkplumber.org> writes:
> 
> > On Wed, 25 Apr 2018 16:30:22 +0200
> > Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> >  
> >> print_uint() will silently promote its variable type to uint64_t, but there
> >> is nothing that ensures that the format string specifier passed along with
> >> it fits (and the function name suggest to pass "%u").
> >> 
> >> Fix this by changing print_uint() to use a native 'unsigned int' type, and
> >> introduce a separate print_u64() function for printing 64-bit values. All
> >> call sites that were actually printing 64-bit values using print_uint() are
> >> converted to use print_u64() instead.
> >> 
> >> Since print_int() was already using native int types, just add a
> >> print_s64() to match, but don't convert any call sites.
> >> 
> >> Cc: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
> >> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>  
> >
> > Yes, this makes sense. Maybe there should be a print_luint for
> > consistency.  
> 
> I just realised I missed a few call sites, so I'll resend. Should I
> call the new function print_luint() instead of print_u64()?

Ideally, there would be both functions, and use based on what is being printed.

> > Also, I tried (in vain) to make a version that allows GCC to check the
> > format string.  But it was a struggle and just gave up.  
> 
> Yeah, no idea how to do this either...

Maybe some magic smatch or multi-line regex it would be possible to
find all instances of print_uint, then look at format string of each and see if there is
a single %u.  Some added complexity since some places only print json and don't care
and pass NULL for format.

^ permalink raw reply

* Re: [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc
From: Toke Høiland-Jørgensen @ 2018-04-25 15:22 UTC (permalink / raw)
  To: Eric Dumazet, netdev; +Cc: cake, Dave Taht
In-Reply-To: <e27a6618-3a68-fa60-53a0-109b4df70482@gmail.com>

Eric Dumazet <eric.dumazet@gmail.com> writes:

> On 04/25/2018 06:42 AM, Toke Høiland-Jørgensen wrote:
>> sch_cake targets the home router use case and is intended to squeeze the
>> most bandwidth and latency out of even the slowest ISP links and routers,
>> while presenting an API simple enough that even an ISP can configure it.
>> 
>
> * Support for ack filtering.
>
> Oh my god. Cake became a monster.

Haha, you either die a hero or live long enough to see yourself become
the villain, I guess? ;)

We do realise there are lots of good reasons not to do ACK filtering;
but it does make a difference on highly asymmetrical links (see
http://blog.cerowrt.org/post/ack_filtering/), which sadly some people
are still behind.

> syzkaller will be very happy to trigger all kind of bugs in it.

Heh, right.

> Lack of any pskb_may_pull() is really concerning.

By this you mean "check that the packet is long enough to contain the
header we are looking for before trying to do ACK filtering", right?

> How ack filter deals with reorders ?

I think this will do the right thing?

		/* new ack sequence must be greater
		 */
		if (thisconn &&
		    (ntohl(tcph_check->ack_seq) > ntohl(tcph->ack_seq)))
			continue;

> Also the forced GSO segmentation looks wrong to me.

Wrong as in it's done wrong, or wrong as in you object to the whole
concept?

> This kills xmit_more gain we have when GSO is performed after
> qdisc dequeue before hitting device.
>
> This should be driven by a parameter really, some threshold on the skb size.
>
> What performance number do you get on a 10Gbit NIC for example ?

Single-flow throughput through 2 hops on a 40Gbit connection (with CAKE
in unlimited mode vs pfifo_fast on the router):

MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to testbed-40g-2 () port 0 AF_INET : demo
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.00    18840.40   

MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to testbed-40g-2 () port 0 AF_INET : demo
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.00    24804.77   


Note, however, that CAKE is explicitly targeted at home gateways, which
generally run at speed a few orders of magnitude slower than this, and
we have heavily optimised for lowest possible latency. So this is a
conscious design choice, I guess you could say.

> Also, how ack filter can suppress packets after skb_gso_segment() ?

Hmm, because pure ACKs are not generally aggregated (sorry, I'm not
quite clear on when exactly GSO will kick in)?

-Toke

^ permalink raw reply

* Re: [PATCH net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive
From: Christoph Hellwig @ 2018-04-25 15:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Christoph Hellwig, Eric Dumazet, David S . Miller, netdev,
	Andy Lutomirski, linux-kernel, linux-mm, Soheil Hassas Yeganeh
In-Reply-To: <5cd31eba-63b5-9160-0a2e-f441340df0d3@gmail.com>

On Wed, Apr 25, 2018 at 06:01:02AM -0700, Eric Dumazet wrote:
> Thanks Christoph
> 
> Note the high cost of zap_page_range(), needed to avoid -EBUSY being returned
> from vm_insert_page() the second time TCP_ZEROCOPY_RECEIVE is used on one VMA.
> 
> Ideally a vm_replace_page() would avoid this cost ?

No my area of expertise, I'll defer to the MM folks..

^ permalink raw reply

* [PATCH iproute2 v3] json_print: Fix hidden 64-bit type promotion
From: Toke Høiland-Jørgensen @ 2018-04-25 15:28 UTC (permalink / raw)
  To: netdev; +Cc: Toke Høiland-Jørgensen
In-Reply-To: <20180425143022.6986-1-toke@toke.dk>

print_uint() will silently promote its variable type to uint64_t, but there
is nothing that ensures that the format string specifier passed along with
it fits (and the function name suggest to pass "%u").

Fix this by changing print_uint() to use a native 'unsigned int' type, and
introduce a separate print_u64() function for printing 64-bit values. All
call sites that were actually printing 64-bit values using print_uint() are
converted to use print_u64() instead.

Since print_int() was already using native int types, just add a
print_s64() to match, but don't convert any call sites. For symmetry,
also add a print_luint() method (with no users).

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
v3:
  - Fix a few more call sites.
  - Also add a print_luint() function for symmetry with the existing
    ones.
  
v2 (v1 was submitted by Kevin Darbyshire-Bryant):
  - Add print_u64() and convert call sites that were actually passing
    64-bit values to print_uint().

 include/json_print.h  |  5 +++-
 include/json_writer.h | 15 ++++++++---
 ip/ipaddress.c        | 62 +++++++++++++++++++++----------------------
 ip/ipmacsec.c         |  8 +++---
 ip/ipmroute.c         |  6 ++---
 ip/ipntable.c         | 38 +++++++++++++-------------
 ip/iproute_lwtunnel.c |  6 ++---
 ip/iptuntap.c         |  4 +--
 ip/tcp_metrics.c      |  6 ++---
 lib/json_print.c      |  5 +++-
 lib/json_writer.c     | 43 +++++++++++++++++++++++++++---
 11 files changed, 123 insertions(+), 75 deletions(-)

diff --git a/include/json_print.h b/include/json_print.h
index 2ca7830a..218fedc5 100644
--- a/include/json_print.h
+++ b/include/json_print.h
@@ -56,13 +56,16 @@ void close_json_array(enum output_type type, const char *delim);
 		print_color_##type_name(t, COLOR_NONE, key, fmt, value);	\
 	}
 _PRINT_FUNC(int, int);
+_PRINT_FUNC(s64, int64_t);
 _PRINT_FUNC(bool, bool);
 _PRINT_FUNC(null, const char*);
 _PRINT_FUNC(string, const char*);
-_PRINT_FUNC(uint, uint64_t);
+_PRINT_FUNC(uint, unsigned int);
+_PRINT_FUNC(u64, uint64_t);
 _PRINT_FUNC(hu, unsigned short);
 _PRINT_FUNC(hex, unsigned int);
 _PRINT_FUNC(0xhex, unsigned int);
+_PRINT_FUNC(luint, unsigned long int);
 _PRINT_FUNC(lluint, unsigned long long int);
 _PRINT_FUNC(float, double);
 #undef _PRINT_FUNC
diff --git a/include/json_writer.h b/include/json_writer.h
index 4b4dec28..9ab88e1d 100644
--- a/include/json_writer.h
+++ b/include/json_writer.h
@@ -34,22 +34,29 @@ void jsonw_string(json_writer_t *self, const char *value);
 void jsonw_bool(json_writer_t *self, bool value);
 void jsonw_float(json_writer_t *self, double number);
 void jsonw_float_fmt(json_writer_t *self, const char *fmt, double num);
-void jsonw_uint(json_writer_t *self, uint64_t number);
+void jsonw_uint(json_writer_t *self, unsigned int number);
+void jsonw_u64(json_writer_t *self, uint64_t number);
 void jsonw_xint(json_writer_t *self, uint64_t number);
 void jsonw_hu(json_writer_t *self, unsigned short number);
-void jsonw_int(json_writer_t *self, int64_t number);
+void jsonw_int(json_writer_t *self, int number);
+void jsonw_s64(json_writer_t *self, int64_t number);
 void jsonw_null(json_writer_t *self);
+void jsonw_luint(json_writer_t *self, unsigned long int num);
 void jsonw_lluint(json_writer_t *self, unsigned long long int num);
 
 /* Useful Combinations of name and value */
 void jsonw_string_field(json_writer_t *self, const char *prop, const char *val);
 void jsonw_bool_field(json_writer_t *self, const char *prop, bool value);
 void jsonw_float_field(json_writer_t *self, const char *prop, double num);
-void jsonw_uint_field(json_writer_t *self, const char *prop, uint64_t num);
+void jsonw_uint_field(json_writer_t *self, const char *prop, unsigned int num);
+void jsonw_u64_field(json_writer_t *self, const char *prop, uint64_t num);
 void jsonw_xint_field(json_writer_t *self, const char *prop, uint64_t num);
 void jsonw_hu_field(json_writer_t *self, const char *prop, unsigned short num);
-void jsonw_int_field(json_writer_t *self, const char *prop, int64_t num);
+void jsonw_int_field(json_writer_t *self, const char *prop, int num);
+void jsonw_s64_field(json_writer_t *self, const char *prop, int64_t num);
 void jsonw_null_field(json_writer_t *self, const char *prop);
+void jsonw_luint_field(json_writer_t *self, const char *prop,
+			unsigned long int num);
 void jsonw_lluint_field(json_writer_t *self, const char *prop,
 			unsigned long long int num);
 void jsonw_float_field_fmt(json_writer_t *self, const char *prop,
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index aecc9a1d..75539e05 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -554,21 +554,21 @@ static void print_vf_stats64(FILE *fp, struct rtattr *vfstats)
 
 		/* RX stats */
 		open_json_object("rx");
-		print_uint(PRINT_JSON, "bytes", NULL,
+		print_u64(PRINT_JSON, "bytes", NULL,
 			   rta_getattr_u64(vf[IFLA_VF_STATS_RX_BYTES]));
-		print_uint(PRINT_JSON, "packets", NULL,
+		print_u64(PRINT_JSON, "packets", NULL,
 			   rta_getattr_u64(vf[IFLA_VF_STATS_RX_PACKETS]));
-		print_uint(PRINT_JSON, "multicast", NULL,
+		print_u64(PRINT_JSON, "multicast", NULL,
 			   rta_getattr_u64(vf[IFLA_VF_STATS_MULTICAST]));
-		print_uint(PRINT_JSON, "broadcast", NULL,
+		print_u64(PRINT_JSON, "broadcast", NULL,
 			   rta_getattr_u64(vf[IFLA_VF_STATS_BROADCAST]));
 		close_json_object();
 
 		/* TX stats */
 		open_json_object("tx");
-		print_uint(PRINT_JSON, "tx_bytes", NULL,
+		print_u64(PRINT_JSON, "tx_bytes", NULL,
 			   rta_getattr_u64(vf[IFLA_VF_STATS_TX_BYTES]));
-		print_uint(PRINT_JSON, "tx_packets", NULL,
+		print_u64(PRINT_JSON, "tx_packets", NULL,
 			   rta_getattr_u64(vf[IFLA_VF_STATS_TX_PACKETS]));
 		close_json_object();
 		close_json_object();
@@ -608,69 +608,69 @@ static void __print_link_stats(FILE *fp, struct rtattr *tb[])
 
 		/* RX stats */
 		open_json_object("rx");
-		print_uint(PRINT_JSON, "bytes", NULL, s->rx_bytes);
-		print_uint(PRINT_JSON, "packets", NULL, s->rx_packets);
-		print_uint(PRINT_JSON, "errors", NULL, s->rx_errors);
-		print_uint(PRINT_JSON, "dropped", NULL, s->rx_dropped);
-		print_uint(PRINT_JSON, "over_errors", NULL, s->rx_over_errors);
-		print_uint(PRINT_JSON, "multicast", NULL, s->multicast);
+		print_u64(PRINT_JSON, "bytes", NULL, s->rx_bytes);
+		print_u64(PRINT_JSON, "packets", NULL, s->rx_packets);
+		print_u64(PRINT_JSON, "errors", NULL, s->rx_errors);
+		print_u64(PRINT_JSON, "dropped", NULL, s->rx_dropped);
+		print_u64(PRINT_JSON, "over_errors", NULL, s->rx_over_errors);
+		print_u64(PRINT_JSON, "multicast", NULL, s->multicast);
 		if (s->rx_compressed)
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "compressed", NULL, s->rx_compressed);
 
 		/* RX error stats */
 		if (show_stats > 1) {
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "length_errors",
 				   NULL, s->rx_length_errors);
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "crc_errors",
 				   NULL, s->rx_crc_errors);
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "frame_errors",
 				   NULL, s->rx_frame_errors);
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "fifo_errors",
 				   NULL, s->rx_fifo_errors);
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "missed_errors",
 				   NULL, s->rx_missed_errors);
 			if (s->rx_nohandler)
-				print_uint(PRINT_JSON,
+				print_u64(PRINT_JSON,
 					   "nohandler", NULL, s->rx_nohandler);
 		}
 		close_json_object();
 
 		/* TX stats */
 		open_json_object("tx");
-		print_uint(PRINT_JSON, "bytes", NULL, s->tx_bytes);
-		print_uint(PRINT_JSON, "packets", NULL, s->tx_packets);
-		print_uint(PRINT_JSON, "errors", NULL, s->tx_errors);
-		print_uint(PRINT_JSON, "dropped", NULL, s->tx_dropped);
-		print_uint(PRINT_JSON,
+		print_u64(PRINT_JSON, "bytes", NULL, s->tx_bytes);
+		print_u64(PRINT_JSON, "packets", NULL, s->tx_packets);
+		print_u64(PRINT_JSON, "errors", NULL, s->tx_errors);
+		print_u64(PRINT_JSON, "dropped", NULL, s->tx_dropped);
+		print_u64(PRINT_JSON,
 			   "carrier_errors",
 			   NULL, s->tx_carrier_errors);
-		print_uint(PRINT_JSON, "collisions", NULL, s->collisions);
+		print_u64(PRINT_JSON, "collisions", NULL, s->collisions);
 		if (s->tx_compressed)
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "compressed", NULL, s->tx_compressed);
 
 		/* TX error stats */
 		if (show_stats > 1) {
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "aborted_errors",
 				   NULL, s->tx_aborted_errors);
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "fifo_errors",
 				   NULL, s->tx_fifo_errors);
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "window_errors",
 				   NULL, s->tx_window_errors);
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "heartbeat_errors",
 				   NULL, s->tx_heartbeat_errors);
 			if (carrier_changes)
-				print_uint(PRINT_JSON, "carrier_changes", NULL,
+				print_u64(PRINT_JSON, "carrier_changes", NULL,
 					   rta_getattr_u32(carrier_changes));
 		}
 
diff --git a/ip/ipmacsec.c b/ip/ipmacsec.c
index 38ec7136..4e4e158e 100644
--- a/ip/ipmacsec.c
+++ b/ip/ipmacsec.c
@@ -640,7 +640,7 @@ static void print_attrs(struct rtattr *attrs[])
 	}
 }
 
-static __u64 getattr_uint(struct rtattr *stat)
+static __u64 getattr_u64(struct rtattr *stat)
 {
 	switch (RTA_PAYLOAD(stat)) {
 	case sizeof(__u64):
@@ -681,7 +681,7 @@ static void print_fp_stats(const char *prefix,
 
 		pad = strlen(names[i]) + 1;
 		if (stats[i])
-			printf("%*llu", pad, getattr_uint(stats[i]));
+			printf("%*llu", pad, getattr_u64(stats[i]));
 		else
 			printf("%*c", pad, '-');
 	}
@@ -697,8 +697,8 @@ static void print_json_stats(const char *names[], unsigned int num,
 		if (!names[i] || !stats[i])
 			continue;
 
-		print_uint(PRINT_JSON, names[i],
-			   NULL, getattr_uint(stats[i]));
+		print_u64(PRINT_JSON, names[i],
+			   NULL, getattr_u64(stats[i]));
 	}
 }
 
diff --git a/ip/ipmroute.c b/ip/ipmroute.c
index 59c5b771..cdb4d898 100644
--- a/ip/ipmroute.c
+++ b/ip/ipmroute.c
@@ -182,12 +182,12 @@ int print_mroute(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 		struct rta_mfc_stats *mfcs = RTA_DATA(tb[RTA_MFC_STATS]);
 
 		print_string(PRINT_FP, NULL, "%s", _SL_);
-		print_uint(PRINT_ANY, "packets", "  %"PRIu64" packets,",
+		print_u64(PRINT_ANY, "packets", "  %"PRIu64" packets,",
 			   mfcs->mfcs_packets);
-		print_uint(PRINT_ANY, "bytes", " %"PRIu64" bytes", mfcs->mfcs_bytes);
+		print_u64(PRINT_ANY, "bytes", " %"PRIu64" bytes", mfcs->mfcs_bytes);
 
 		if (mfcs->mfcs_wrong_if)
-			print_uint(PRINT_ANY, "wrong_if",
+			print_u64(PRINT_ANY, "wrong_if",
 				   ", %"PRIu64" arrived on wrong iif.",
 				   mfcs->mfcs_wrong_if);
 	}
diff --git a/ip/ipntable.c b/ip/ipntable.c
index 82f40f87..4fae181d 100644
--- a/ip/ipntable.c
+++ b/ip/ipntable.c
@@ -392,7 +392,7 @@ static void print_ndtparams(struct rtattr *tpb[])
 	if (tpb[NDTPA_REACHABLE_TIME]) {
 		__u64 reachable = rta_getattr_u64(tpb[NDTPA_REACHABLE_TIME]);
 
-		print_uint(PRINT_ANY, "reachable",
+		print_u64(PRINT_ANY, "reachable",
 			   "reachable %llu ", reachable);
 	}
 
@@ -400,14 +400,14 @@ static void print_ndtparams(struct rtattr *tpb[])
 		__u64 breachable
 			= rta_getattr_u64(tpb[NDTPA_BASE_REACHABLE_TIME]);
 
-		print_uint(PRINT_ANY, "base_reachable",
+		print_u64(PRINT_ANY, "base_reachable",
 			   "base_reachable %llu ", breachable);
 	}
 
 	if (tpb[NDTPA_RETRANS_TIME]) {
 		__u64 retrans = rta_getattr_u64(tpb[NDTPA_RETRANS_TIME]);
 
-		print_uint(PRINT_ANY, "retrans", "retrans %llu ", retrans);
+		print_u64(PRINT_ANY, "retrans", "retrans %llu ", retrans);
 	}
 
 	print_string(PRINT_FP, NULL, "%s    ", _SL_);
@@ -415,14 +415,14 @@ static void print_ndtparams(struct rtattr *tpb[])
 	if (tpb[NDTPA_GC_STALETIME]) {
 		__u64 gc_stale = rta_getattr_u64(tpb[NDTPA_GC_STALETIME]);
 
-		print_uint(PRINT_ANY, "gc_stale", "gc_stale %llu ", gc_stale);
+		print_u64(PRINT_ANY, "gc_stale", "gc_stale %llu ", gc_stale);
 	}
 
 	if (tpb[NDTPA_DELAY_PROBE_TIME]) {
 		__u64 delay_probe
 			= rta_getattr_u64(tpb[NDTPA_DELAY_PROBE_TIME]);
 
-		print_uint(PRINT_ANY, "delay_probe",
+		print_u64(PRINT_ANY, "delay_probe",
 			   "delay_probe %llu ", delay_probe);
 	}
 
@@ -459,14 +459,14 @@ static void print_ndtparams(struct rtattr *tpb[])
 	if (tpb[NDTPA_ANYCAST_DELAY]) {
 		__u64 anycast_delay = rta_getattr_u64(tpb[NDTPA_ANYCAST_DELAY]);
 
-		print_uint(PRINT_ANY, "anycast_delay",
+		print_u64(PRINT_ANY, "anycast_delay",
 			   "anycast_delay %llu ", anycast_delay);
 	}
 
 	if (tpb[NDTPA_PROXY_DELAY]) {
 		__u64 proxy_delay = rta_getattr_u64(tpb[NDTPA_PROXY_DELAY]);
 
-		print_uint(PRINT_ANY, "proxy_delay",
+		print_u64(PRINT_ANY, "proxy_delay",
 			   "proxy_delay %llu ", proxy_delay);
 	}
 
@@ -479,7 +479,7 @@ static void print_ndtparams(struct rtattr *tpb[])
 	if (tpb[NDTPA_LOCKTIME]) {
 		__u64 locktime = rta_getattr_u64(tpb[NDTPA_LOCKTIME]);
 
-		print_uint(PRINT_ANY, "locktime", "locktime %llu ", locktime);
+		print_u64(PRINT_ANY, "locktime", "locktime %llu ", locktime);
 	}
 
 	print_string(PRINT_FP, NULL, "%s", _SL_);
@@ -490,31 +490,31 @@ static void print_ndtstats(const struct ndt_stats *ndts)
 
 	print_string(PRINT_FP, NULL, "    stats ", NULL);
 
-	print_uint(PRINT_ANY, "allocs", "allocs %llu ", ndts->ndts_allocs);
-	print_uint(PRINT_ANY, "destroys", "destroys %llu ",
+	print_u64(PRINT_ANY, "allocs", "allocs %llu ", ndts->ndts_allocs);
+	print_u64(PRINT_ANY, "destroys", "destroys %llu ",
 		   ndts->ndts_destroys);
-	print_uint(PRINT_ANY, "hash_grows", "hash_grows %llu ",
+	print_u64(PRINT_ANY, "hash_grows", "hash_grows %llu ",
 		   ndts->ndts_hash_grows);
 
 	print_string(PRINT_FP, NULL, "%s    ", _SL_);
 
-	print_uint(PRINT_ANY, "res_failed", "res_failed %llu ",
+	print_u64(PRINT_ANY, "res_failed", "res_failed %llu ",
 		   ndts->ndts_res_failed);
-	print_uint(PRINT_ANY, "lookups", "lookups %llu ", ndts->ndts_lookups);
-	print_uint(PRINT_ANY, "hits", "hits %llu ", ndts->ndts_hits);
+	print_u64(PRINT_ANY, "lookups", "lookups %llu ", ndts->ndts_lookups);
+	print_u64(PRINT_ANY, "hits", "hits %llu ", ndts->ndts_hits);
 
 	print_string(PRINT_FP, NULL, "%s    ", _SL_);
 
-	print_uint(PRINT_ANY, "rcv_probes_mcast", "rcv_probes_mcast %llu ",
+	print_u64(PRINT_ANY, "rcv_probes_mcast", "rcv_probes_mcast %llu ",
 		   ndts->ndts_rcv_probes_mcast);
-	print_uint(PRINT_ANY, "rcv_probes_ucast", "rcv_probes_ucast %llu ",
+	print_u64(PRINT_ANY, "rcv_probes_ucast", "rcv_probes_ucast %llu ",
 		   ndts->ndts_rcv_probes_ucast);
 
 	print_string(PRINT_FP, NULL, "%s    ", _SL_);
 
-	print_uint(PRINT_ANY, "periodic_gc_runs", "periodic_gc_runs %llu ",
+	print_u64(PRINT_ANY, "periodic_gc_runs", "periodic_gc_runs %llu ",
 		   ndts->ndts_periodic_gc_runs);
-	print_uint(PRINT_ANY, "forced_gc_runs", "forced_gc_runs %llu ",
+	print_u64(PRINT_ANY, "forced_gc_runs", "forced_gc_runs %llu ",
 		   ndts->ndts_forced_gc_runs);
 
 	print_string(PRINT_FP, NULL, "%s", _SL_);
@@ -607,7 +607,7 @@ static int print_ntable(const struct sockaddr_nl *who,
 	if (tb[NDTA_GC_INTERVAL]) {
 		__u64 gc_int = rta_getattr_u64(tb[NDTA_GC_INTERVAL]);
 
-		print_uint(PRINT_ANY, "gc_interval", "gc_int %llu ", gc_int);
+		print_u64(PRINT_ANY, "gc_interval", "gc_int %llu ", gc_int);
 	}
 
 	if (ret)
diff --git a/ip/iproute_lwtunnel.c b/ip/iproute_lwtunnel.c
index cde9b3d2..46a212c8 100644
--- a/ip/iproute_lwtunnel.c
+++ b/ip/iproute_lwtunnel.c
@@ -273,7 +273,7 @@ static void print_encap_ip(FILE *fp, struct rtattr *encap)
 	parse_rtattr_nested(tb, LWTUNNEL_IP_MAX, encap);
 
 	if (tb[LWTUNNEL_IP_ID])
-		print_uint(PRINT_ANY, "id", "id %llu ",
+		print_u64(PRINT_ANY, "id", "id %llu ",
 			   ntohll(rta_getattr_u64(tb[LWTUNNEL_IP_ID])));
 
 	if (tb[LWTUNNEL_IP_SRC])
@@ -333,7 +333,7 @@ static void print_encap_ip6(FILE *fp, struct rtattr *encap)
 	parse_rtattr_nested(tb, LWTUNNEL_IP6_MAX, encap);
 
 	if (tb[LWTUNNEL_IP6_ID])
-		print_uint(PRINT_ANY, "id", "id %llu ",
+		print_u64(PRINT_ANY, "id", "id %llu ",
 			    ntohll(rta_getattr_u64(tb[LWTUNNEL_IP6_ID])));
 
 	if (tb[LWTUNNEL_IP6_SRC])
@@ -347,7 +347,7 @@ static void print_encap_ip6(FILE *fp, struct rtattr *encap)
 				   rt_addr_n2a_rta(AF_INET6, tb[LWTUNNEL_IP6_DST]));
 
 	if (tb[LWTUNNEL_IP6_HOPLIMIT])
-		print_uint(PRINT_ANY, "hoplimit",
+		print_u64(PRINT_ANY, "hoplimit",
 			   "hoplimit %u ",
 			   rta_getattr_u8(tb[LWTUNNEL_IP6_HOPLIMIT]));
 
diff --git a/ip/iptuntap.c b/ip/iptuntap.c
index 6c5a7259..58996e6c 100644
--- a/ip/iptuntap.c
+++ b/ip/iptuntap.c
@@ -440,10 +440,10 @@ static int print_tuntap(const struct sockaddr_nl *who,
 			   "ifname", "%s:", name);
 	print_flags(flags);
 	if (owner != -1)
-		print_uint(PRINT_ANY, "user",
+		print_u64(PRINT_ANY, "user",
 			   " user %ld", owner);
 	if (group != -1)
-		print_uint(PRINT_ANY, "group",
+		print_u64(PRINT_ANY, "group",
 			   " group %ld", group);
 
 	if (show_details) {
diff --git a/ip/tcp_metrics.c b/ip/tcp_metrics.c
index 72dc980c..ad3d6f36 100644
--- a/ip/tcp_metrics.c
+++ b/ip/tcp_metrics.c
@@ -139,19 +139,19 @@ static void print_tcp_metrics(struct rtattr *a)
 
 		print_uint(PRINT_JSON, name, NULL, val);
 		print_string(PRINT_FP, NULL, " %s ", name);
-		print_uint(PRINT_FP, NULL, "%lu", val);
+		print_uint(PRINT_FP, NULL, "%u", val);
 	}
 
 	if (rtt) {
 		print_float(PRINT_JSON, "rtt", NULL,
 			    (double)rtt / usec_per_sec);
-		print_uint(PRINT_FP, NULL,
+		print_u64(PRINT_FP, NULL,
 			   " rtt %luus", rtt);
 	}
 	if (rttvar) {
 		print_float(PRINT_JSON, "rttvar", NULL,
 			    (double) rttvar / usec_per_sec);
-		print_uint(PRINT_FP, NULL,
+		print_u64(PRINT_FP, NULL,
 			   " rttvar %luus", rttvar);
 	}
 }
diff --git a/lib/json_print.c b/lib/json_print.c
index bda72933..5dc41bfa 100644
--- a/lib/json_print.c
+++ b/lib/json_print.c
@@ -116,8 +116,11 @@ void close_json_array(enum output_type type, const char *str)
 		}							\
 	}
 _PRINT_FUNC(int, int);
+_PRINT_FUNC(s64, int64_t);
 _PRINT_FUNC(hu, unsigned short);
-_PRINT_FUNC(uint, uint64_t);
+_PRINT_FUNC(uint, unsigned int);
+_PRINT_FUNC(u64, uint64_t);
+_PRINT_FUNC(luint, unsigned long int);
 _PRINT_FUNC(lluint, unsigned long long int);
 _PRINT_FUNC(float, double);
 #undef _PRINT_FUNC
diff --git a/lib/json_writer.c b/lib/json_writer.c
index 0ad04218..aa9ce1c6 100644
--- a/lib/json_writer.c
+++ b/lib/json_writer.c
@@ -220,7 +220,12 @@ void jsonw_hu(json_writer_t *self, unsigned short num)
 	jsonw_printf(self, "%hu", num);
 }
 
-void jsonw_uint(json_writer_t *self, uint64_t num)
+void jsonw_uint(json_writer_t *self, unsigned int num)
+{
+	jsonw_printf(self, "%u", num);
+}
+
+void jsonw_u64(json_writer_t *self, uint64_t num)
 {
 	jsonw_printf(self, "%"PRIu64, num);
 }
@@ -230,12 +235,22 @@ void jsonw_xint(json_writer_t *self, uint64_t num)
 	jsonw_printf(self, "%"PRIx64, num);
 }
 
+void jsonw_luint(json_writer_t *self, unsigned long int num)
+{
+	jsonw_printf(self, "%lu", num);
+}
+
 void jsonw_lluint(json_writer_t *self, unsigned long long int num)
 {
 	jsonw_printf(self, "%llu", num);
 }
 
-void jsonw_int(json_writer_t *self, int64_t num)
+void jsonw_int(json_writer_t *self, int num)
+{
+	jsonw_printf(self, "%d", num);
+}
+
+void jsonw_s64(json_writer_t *self, int64_t num)
 {
 	jsonw_printf(self, "%"PRId64, num);
 }
@@ -268,12 +283,18 @@ void jsonw_float_field_fmt(json_writer_t *self,
 	jsonw_float_fmt(self, fmt, val);
 }
 
-void jsonw_uint_field(json_writer_t *self, const char *prop, uint64_t num)
+void jsonw_uint_field(json_writer_t *self, const char *prop, unsigned int num)
 {
 	jsonw_name(self, prop);
 	jsonw_uint(self, num);
 }
 
+void jsonw_u64_field(json_writer_t *self, const char *prop, uint64_t num)
+{
+	jsonw_name(self, prop);
+	jsonw_u64(self, num);
+}
+
 void jsonw_xint_field(json_writer_t *self, const char *prop, uint64_t num)
 {
 	jsonw_name(self, prop);
@@ -286,6 +307,14 @@ void jsonw_hu_field(json_writer_t *self, const char *prop, unsigned short num)
 	jsonw_hu(self, num);
 }
 
+void jsonw_luint_field(json_writer_t *self,
+			const char *prop,
+			unsigned long int num)
+{
+	jsonw_name(self, prop);
+	jsonw_luint(self, num);
+}
+
 void jsonw_lluint_field(json_writer_t *self,
 			const char *prop,
 			unsigned long long int num)
@@ -294,12 +323,18 @@ void jsonw_lluint_field(json_writer_t *self,
 	jsonw_lluint(self, num);
 }
 
-void jsonw_int_field(json_writer_t *self, const char *prop, int64_t num)
+void jsonw_int_field(json_writer_t *self, const char *prop, int num)
 {
 	jsonw_name(self, prop);
 	jsonw_int(self, num);
 }
 
+void jsonw_s64_field(json_writer_t *self, const char *prop, int64_t num)
+{
+	jsonw_name(self, prop);
+	jsonw_s64(self, num);
+}
+
 void jsonw_null_field(json_writer_t *self, const char *prop)
 {
 	jsonw_name(self, prop);
-- 
2.17.0

^ permalink raw reply related

* [net-next 00/10][pull request] 10GbE Intel Wired LAN Driver Updates 2018-04-25
From: Jeff Kirsher @ 2018-04-25 15:32 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This series represents yet another phase of the macvlan cleanup Alex has
been working on.

The main goal of these changes is to make it so that we only support
offloading what we can actually offload and we don't break any existing
functionality. So for example we were claiming to advertise source mode
macvlan and we were doing nothing of the sort, so support for that has been
dropped.

The biggest change with this set is that broadcast/multicast replication is
no longer being supported in software. Alex dropped it as it leads to
scaling issues when a broadcast frame has to be replicated up to 64 times.

Beyond that this set goes through and optimized the time needed to bring up
and tear down the macvlan interfaces on ixgbe and provides a clean way for
us to disable the macvlan offload when needed.

The following are changes since commit c749fa181bd5848be78691d23168ec61ce691b95:
  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 10GbE

Alexander Duyck (10):
  ixgbe: Drop support for macvlan specific unicast lists
  macvlan: Rename fwd_priv to accel_priv and add accessor function
  macvlan: Use software path for offloaded local, broadcast, and
    multicast traffic
  ixgbe/fm10k: Drop tracking stats for macvlan broadcast/multicast
  macvlan: macvlan_count_rx shouldn't be static inline AND extern
  macvlan: Add function to test for destination filtering support
  macvlan: Provide function for interfaces to release HW offload
  ixgbe/fm10k: Only support macvlan offload for types that support
    destination filtering
  ixgbe: Drop real_adapter from l2 fwd acceleration structure
  ixgbe: Avoid performing unnecessary resets for macvlan offload

 drivers/net/ethernet/intel/fm10k/fm10k_main.c   |   7 +-
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c |  12 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe.h        |   1 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   | 306 +++++++++++++-----------
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c  |   5 +-
 drivers/net/macvlan.c                           |  68 +++---
 include/linux/if_macvlan.h                      |  29 ++-
 7 files changed, 243 insertions(+), 185 deletions(-)

-- 
2.14.3

^ permalink raw reply

* [net-next 02/10] macvlan: Rename fwd_priv to accel_priv and add accessor function
From: Jeff Kirsher @ 2018-04-25 15:32 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180425153305.25878-1-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change renames the fwd_priv member to accel_priv as this more
accurately reflects the actual purpose of this value. In addition I am
adding an accessor which will allow us to further abstract this in the
future if needed.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 10 ++++-----
 drivers/net/macvlan.c                         | 31 ++++++++++++++++-----------
 include/linux/if_macvlan.h                    |  8 ++++++-
 3 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index c8f59430a5a1..0fbeb2f48711 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -5398,10 +5398,9 @@ static int ixgbe_fwd_ring_up(struct net_device *vdev,
 static int ixgbe_upper_dev_walk(struct net_device *upper, void *data)
 {
 	if (netif_is_macvlan(upper)) {
-		struct macvlan_dev *dfwd = netdev_priv(upper);
-		struct ixgbe_fwd_adapter *vadapter = dfwd->fwd_priv;
+		struct ixgbe_fwd_adapter *vadapter = macvlan_accel_priv(upper);
 
-		if (dfwd->fwd_priv)
+		if (vadapter)
 			ixgbe_fwd_ring_up(upper, vadapter);
 	}
 
@@ -8983,13 +8982,12 @@ struct upper_walk_data {
 static int get_macvlan_queue(struct net_device *upper, void *_data)
 {
 	if (netif_is_macvlan(upper)) {
-		struct macvlan_dev *dfwd = netdev_priv(upper);
-		struct ixgbe_fwd_adapter *vadapter = dfwd->fwd_priv;
+		struct ixgbe_fwd_adapter *vadapter = macvlan_accel_priv(upper);
 		struct upper_walk_data *data = _data;
 		struct ixgbe_adapter *adapter = data->adapter;
 		int ifindex = data->ifindex;
 
-		if (vadapter && vadapter->netdev->ifindex == ifindex) {
+		if (vadapter && upper->ifindex == ifindex) {
 			data->queue = adapter->rx_ring[vadapter->rx_base_queue]->reg_idx;
 			data->action = data->queue;
 			return 1;
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 725f4b4afc6d..7ddc94ff4109 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -559,9 +559,9 @@ static netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
 	if (unlikely(netpoll_tx_running(dev)))
 		return macvlan_netpoll_send_skb(vlan, skb);
 
-	if (vlan->fwd_priv) {
+	if (vlan->accel_priv) {
 		skb->dev = vlan->lowerdev;
-		ret = dev_queue_xmit_accel(skb, vlan->fwd_priv);
+		ret = dev_queue_xmit_accel(skb, vlan->accel_priv);
 	} else {
 		ret = macvlan_queue_xmit(skb, dev);
 	}
@@ -613,16 +613,23 @@ static int macvlan_open(struct net_device *dev)
 		goto hash_add;
 	}
 
+	err = -EBUSY;
+	if (macvlan_addr_busy(vlan->port, dev->dev_addr))
+		goto out;
+
+	/* Attempt to populate accel_priv which is used to offload the L2
+	 * forwarding requests for unicast packets.
+	 */
 	if (lowerdev->features & NETIF_F_HW_L2FW_DOFFLOAD) {
-		vlan->fwd_priv =
+		vlan->accel_priv =
 		      lowerdev->netdev_ops->ndo_dfwd_add_station(lowerdev, dev);
 
 		/* If we get a NULL pointer back, or if we get an error
 		 * then we should just fall through to the non accelerated path
 		 */
-		if (IS_ERR_OR_NULL(vlan->fwd_priv)) {
-			vlan->fwd_priv = NULL;
-		} else
+		if (IS_ERR_OR_NULL(vlan->accel_priv))
+			vlan->accel_priv = NULL;
+		else
 			return 0;
 	}
 
@@ -655,10 +662,10 @@ static int macvlan_open(struct net_device *dev)
 del_unicast:
 	dev_uc_del(lowerdev, dev->dev_addr);
 out:
-	if (vlan->fwd_priv) {
+	if (vlan->accel_priv) {
 		lowerdev->netdev_ops->ndo_dfwd_del_station(lowerdev,
-							   vlan->fwd_priv);
-		vlan->fwd_priv = NULL;
+							   vlan->accel_priv);
+		vlan->accel_priv = NULL;
 	}
 	return err;
 }
@@ -668,10 +675,10 @@ static int macvlan_stop(struct net_device *dev)
 	struct macvlan_dev *vlan = netdev_priv(dev);
 	struct net_device *lowerdev = vlan->lowerdev;
 
-	if (vlan->fwd_priv) {
+	if (vlan->accel_priv) {
 		lowerdev->netdev_ops->ndo_dfwd_del_station(lowerdev,
-							   vlan->fwd_priv);
-		vlan->fwd_priv = NULL;
+							   vlan->accel_priv);
+		vlan->accel_priv = NULL;
 		return 0;
 	}
 
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 4cb7aeeafce0..c5106ce8273a 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -21,7 +21,7 @@ struct macvlan_dev {
 	struct hlist_node	hlist;
 	struct macvlan_port	*port;
 	struct net_device	*lowerdev;
-	void			*fwd_priv;
+	void			*accel_priv;
 	struct vlan_pcpu_stats __percpu *pcpu_stats;
 
 	DECLARE_BITMAP(mc_filter, MACVLAN_MC_FILTER_SZ);
@@ -86,4 +86,10 @@ macvlan_dev_real_dev(const struct net_device *dev)
 }
 #endif
 
+static inline void *macvlan_accel_priv(struct net_device *dev)
+{
+	struct macvlan_dev *macvlan = netdev_priv(dev);
+
+	return macvlan->accel_priv;
+}
 #endif /* _LINUX_IF_MACVLAN_H */
-- 
2.14.3

^ permalink raw reply related

* [net-next 06/10] macvlan: Add function to test for destination filtering support
From: Jeff Kirsher @ 2018-04-25 15:33 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180425153305.25878-1-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch adds a function indicating if a given macvlan can fully supports
destination filtering, especially as it relates to unicast traffic. For
those macvlan interfaces that do not support destination filtering such
passthru or source mode filtering we should not be enabling offload
support.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 include/linux/if_macvlan.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 80089d6c5a0a..0221390668a0 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -88,4 +88,13 @@ static inline void *macvlan_accel_priv(struct net_device *dev)
 
 	return macvlan->accel_priv;
 }
+
+static inline bool macvlan_supports_dest_filter(struct net_device *dev)
+{
+	struct macvlan_dev *macvlan = netdev_priv(dev);
+
+	return macvlan->mode == MACVLAN_MODE_PRIVATE ||
+	       macvlan->mode == MACVLAN_MODE_VEPA ||
+	       macvlan->mode == MACVLAN_MODE_BRIDGE;
+}
 #endif /* _LINUX_IF_MACVLAN_H */
-- 
2.14.3

^ permalink raw reply related

* [net-next 01/10] ixgbe: Drop support for macvlan specific unicast lists
From: Jeff Kirsher @ 2018-04-25 15:32 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180425153305.25878-1-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

Drop the code for handling macvlan specific unicast lists. It isn't needed
since we don't take any efforts to maintain it when we bring the interface
up and it takes the slow path anyway.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 31 ---------------------------
 1 file changed, 31 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 51e7d82a5860..c8f59430a5a1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -4900,36 +4900,6 @@ int ixgbe_del_mac_filter(struct ixgbe_adapter *adapter,
 	return -ENOMEM;
 }
 
-/**
- * ixgbe_write_uc_addr_list - write unicast addresses to RAR table
- * @netdev: network interface device structure
- * @vfn: pool to associate with unicast addresses
- *
- * Writes unicast address list to the RAR table.
- * Returns: -ENOMEM on failure/insufficient address space
- *                0 on no addresses written
- *                X on writing X addresses to the RAR table
- **/
-static int ixgbe_write_uc_addr_list(struct net_device *netdev, int vfn)
-{
-	struct ixgbe_adapter *adapter = netdev_priv(netdev);
-	int count = 0;
-
-	/* return ENOMEM indicating insufficient memory for addresses */
-	if (netdev_uc_count(netdev) > ixgbe_available_rars(adapter, vfn))
-		return -ENOMEM;
-
-	if (!netdev_uc_empty(netdev)) {
-		struct netdev_hw_addr *ha;
-		netdev_for_each_uc_addr(ha, netdev) {
-			ixgbe_del_mac_filter(adapter, ha->addr, vfn);
-			ixgbe_add_mac_filter(adapter, ha->addr, vfn);
-			count++;
-		}
-	}
-	return count;
-}
-
 static int ixgbe_uc_sync(struct net_device *netdev, const unsigned char *addr)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
@@ -5328,7 +5298,6 @@ static void ixgbe_macvlan_set_rx_mode(struct net_device *dev, unsigned int pool,
 		vmolr |= IXGBE_VMOLR_ROMPE;
 		hw->mac.ops.update_mc_addr_list(hw, dev);
 	}
-	ixgbe_write_uc_addr_list(adapter->netdev, pool);
 	IXGBE_WRITE_REG(hw, IXGBE_VMOLR(pool), vmolr);
 }
 
-- 
2.14.3

^ permalink raw reply related

* [net-next 04/10] ixgbe/fm10k: Drop tracking stats for macvlan broadcast/multicast
From: Jeff Kirsher @ 2018-04-25 15:32 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180425153305.25878-1-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

Drop dead code now that we shouldn't be receiving broadcast or multicast
frames on the queues associated to the macvlan netdev.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 7 +++----
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 7 +++----
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index df8607097e4a..c51d61f5f715 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -445,15 +445,14 @@ static void fm10k_type_trans(struct fm10k_ring *rx_ring,
 			l2_accel = NULL;
 	}
 
-	skb->protocol = eth_type_trans(skb, dev);
-
 	/* Record Rx queue, or update macvlan statistics */
 	if (!l2_accel)
 		skb_record_rx_queue(skb, rx_ring->queue_index);
 	else
 		macvlan_count_rx(netdev_priv(dev), skb->len + ETH_HLEN, true,
-				 (skb->pkt_type == PACKET_BROADCAST) ||
-				 (skb->pkt_type == PACKET_MULTICAST));
+				 false);
+
+	skb->protocol = eth_type_trans(skb, dev);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 4d4fa46a1e7c..2c648214148e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1768,15 +1768,14 @@ static void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
 	if (ixgbe_test_staterr(rx_desc, IXGBE_RXDADV_STAT_SECP))
 		ixgbe_ipsec_rx(rx_ring, rx_desc, skb);
 
-	skb->protocol = eth_type_trans(skb, dev);
-
 	/* record Rx queue, or update MACVLAN statistics */
 	if (netif_is_ixgbe(dev))
 		skb_record_rx_queue(skb, rx_ring->queue_index);
 	else
 		macvlan_count_rx(netdev_priv(dev), skb->len + ETH_HLEN, true,
-				 (skb->pkt_type == PACKET_BROADCAST) ||
-				 (skb->pkt_type == PACKET_MULTICAST));
+				 false);
+
+	skb->protocol = eth_type_trans(skb, dev);
 }
 
 static void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
-- 
2.14.3

^ permalink raw reply related

* [net-next 09/10] ixgbe: Drop real_adapter from l2 fwd acceleration structure
From: Jeff Kirsher @ 2018-04-25 15:33 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180425153305.25878-1-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch drops the real_adapter member from the fwd_adapter structure.
The general idea behind the change is that the real_adapter is carrying
unnecessary data since we could always just grab the adapter structure
from netdev_priv(macvlan->lowerdev) if we really needed to get at it.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |  1 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 28 +++++++++++++++------------
 2 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 7dd5038cfcc4..8fccca57cd6a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -305,7 +305,6 @@ enum ixgbe_ring_state_t {
 struct ixgbe_fwd_adapter {
 	unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
 	struct net_device *netdev;
-	struct ixgbe_adapter *real_adapter;
 	unsigned int tx_base_queue;
 	unsigned int rx_base_queue;
 	int pool;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 76f54771fdb2..f00720582e52 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -5338,10 +5338,10 @@ static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
 	rx_ring->next_to_use = 0;
 }
 
-static int ixgbe_fwd_ring_up(struct net_device *vdev,
+static int ixgbe_fwd_ring_up(struct ixgbe_adapter *adapter,
 			     struct ixgbe_fwd_adapter *accel)
 {
-	struct ixgbe_adapter *adapter = accel->real_adapter;
+	struct net_device *vdev = accel->netdev;
 	int i, baseq, err;
 
 	if (!test_bit(accel->pool, adapter->fwd_bitmask))
@@ -5378,14 +5378,19 @@ static int ixgbe_fwd_ring_up(struct net_device *vdev,
 	return err;
 }
 
-static int ixgbe_upper_dev_walk(struct net_device *upper, void *data)
+static int ixgbe_macvlan_up(struct net_device *vdev, void *data)
 {
-	if (netif_is_macvlan(upper)) {
-		struct ixgbe_fwd_adapter *vadapter = macvlan_accel_priv(upper);
+	struct ixgbe_adapter *adapter = data;
+	struct ixgbe_fwd_adapter *accel;
 
-		if (vadapter)
-			ixgbe_fwd_ring_up(upper, vadapter);
-	}
+	if (!netif_is_macvlan(vdev))
+		return 0;
+
+	accel = macvlan_accel_priv(vdev);
+	if (!accel)
+		return 0;
+
+	ixgbe_fwd_ring_up(adapter, accel);
 
 	return 0;
 }
@@ -5393,7 +5398,7 @@ static int ixgbe_upper_dev_walk(struct net_device *upper, void *data)
 static void ixgbe_configure_dfwd(struct ixgbe_adapter *adapter)
 {
 	netdev_walk_all_upper_dev_rcu(adapter->netdev,
-				      ixgbe_upper_dev_walk, NULL);
+				      ixgbe_macvlan_up, adapter);
 }
 
 static void ixgbe_configure(struct ixgbe_adapter *adapter)
@@ -9792,13 +9797,12 @@ static void *ixgbe_fwd_add(struct net_device *pdev, struct net_device *vdev)
 	adapter->ring_feature[RING_F_VMDQ].limit = limit + 1;
 
 	fwd_adapter->pool = pool;
-	fwd_adapter->real_adapter = adapter;
 
 	/* Force reinit of ring allocation with VMDQ enabled */
 	err = ixgbe_setup_tc(pdev, adapter->hw_tcs);
 
 	if (!err && netif_running(pdev))
-		err = ixgbe_fwd_ring_up(vdev, fwd_adapter);
+		err = ixgbe_fwd_ring_up(adapter, fwd_adapter);
 
 	if (!err)
 		return fwd_adapter;
@@ -9814,7 +9818,7 @@ static void *ixgbe_fwd_add(struct net_device *pdev, struct net_device *vdev)
 static void ixgbe_fwd_del(struct net_device *pdev, void *priv)
 {
 	struct ixgbe_fwd_adapter *accel = priv;
-	struct ixgbe_adapter *adapter = accel->real_adapter;
+	struct ixgbe_adapter *adapter = netdev_priv(pdev);
 	unsigned int rxbase = accel->rx_base_queue;
 	unsigned int limit, i;
 
-- 
2.14.3

^ permalink raw reply related

* [net-next 08/10] ixgbe/fm10k: Only support macvlan offload for types that support destination filtering
From: Jeff Kirsher @ 2018-04-25 15:33 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180425153305.25878-1-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

Both the ixgbe and fm10k drivers support destination filtering.

Instead of adding a ton of complexity to support either source or passthru
mode we can instead just avoid offloading them for now. Doing this we avoid
leaking packets into interfaces that aren't meant to receive them.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 8 ++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   | 7 +++++++
 2 files changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index ee645bacfe93..26e749766337 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -22,6 +22,7 @@
 #include "fm10k.h"
 #include <linux/vmalloc.h>
 #include <net/udp_tunnel.h>
+#include <linux/if_macvlan.h>
 
 /**
  * fm10k_setup_tx_resources - allocate Tx resources (Descriptors)
@@ -1449,6 +1450,13 @@ static void *fm10k_dfwd_add_station(struct net_device *dev,
 	int size = 0, i;
 	u16 glort;
 
+	/* The hardware supported by fm10k only filters on the destination MAC
+	 * address. In order to avoid issues we only support offloading modes
+	 * where the hardware can actually provide the functionality.
+	 */
+	if (!macvlan_supports_dest_filter(sdev))
+		return ERR_PTR(-EMEDIUMTYPE);
+
 	/* allocate l2 accel structure if it is not available */
 	if (!l2_accel) {
 		/* verify there is enough free GLORTs to support l2_accel */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 2c648214148e..76f54771fdb2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9760,6 +9760,13 @@ static void *ixgbe_fwd_add(struct net_device *pdev, struct net_device *vdev)
 	unsigned int limit;
 	int pool, err;
 
+	/* The hardware supported by ixgbe only filters on the destination MAC
+	 * address. In order to avoid issues we only support offloading modes
+	 * where the hardware can actually provide the functionality.
+	 */
+	if (!macvlan_supports_dest_filter(vdev))
+		return ERR_PTR(-EMEDIUMTYPE);
+
 	/* Hardware has a limited number of available pools. Each VF, and the
 	 * PF require a pool. Check to ensure we don't attempt to use more
 	 * then the available number of pools.
-- 
2.14.3

^ permalink raw reply related

* [net-next 03/10] macvlan: Use software path for offloaded local, broadcast, and multicast traffic
From: Jeff Kirsher @ 2018-04-25 15:32 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180425153305.25878-1-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change makes it so that we use a software path for packets that are
going to be locally switched between two macvlan interfaces on the same
device. In addition we resort to software replication of broadcast and
multicast packets instead of offloading that to hardware.

The general idea is that using the device for east/west traffic local to
the system is extremely inefficient. We can only support up to whatever the
PCIe limit is for any given device so this caps us at somewhere around 20G
for devices supported by ixgbe. This is compounded even further when you
take broadcast and multicast into account as a single 10G port can come to
a crawl as a packet is replicated up to 60+ times in some cases. In order
to get away from that I am implementing changes so that we handle
broadcast/multicast replication and east/west local traffic all in
software.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c |  4 +--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   | 39 ++++++--------------
 drivers/net/macvlan.c                           | 47 +++++++++++--------------
 3 files changed, 33 insertions(+), 57 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 45793491d4ba..ee645bacfe93 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -1254,7 +1254,7 @@ void fm10k_restore_rx_state(struct fm10k_intfc *interface)
 			glort = l2_accel->dglort + 1 + i;
 
 			hw->mac.ops.update_xcast_mode(hw, glort,
-						      FM10K_XCAST_MODE_MULTI);
+						      FM10K_XCAST_MODE_NONE);
 			fm10k_queue_mac_request(interface, glort,
 						sdev->dev_addr,
 						hw->mac.default_vid, true);
@@ -1515,7 +1515,7 @@ static void *fm10k_dfwd_add_station(struct net_device *dev,
 
 	if (fm10k_host_mbx_ready(interface)) {
 		hw->mac.ops.update_xcast_mode(hw, glort,
-					      FM10K_XCAST_MODE_MULTI);
+					      FM10K_XCAST_MODE_NONE);
 		fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
 					hw->mac.default_vid, true);
 	}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 0fbeb2f48711..4d4fa46a1e7c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -4219,7 +4219,8 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
 static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
-	u32 reg_offset, vf_shift;
+	u16 pool = adapter->num_rx_pools;
+	u32 reg_offset, vf_shift, vmolr;
 	u32 gcr_ext, vmdctl;
 	int i;
 
@@ -4233,6 +4234,13 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 	vmdctl |= IXGBE_VT_CTL_REPLEN;
 	IXGBE_WRITE_REG(hw, IXGBE_VT_CTL, vmdctl);
 
+	/* accept untagged packets until a vlan tag is
+	 * specifically set for the VMDQ queue/pool
+	 */
+	vmolr = IXGBE_VMOLR_AUPE;
+	while (pool--)
+		IXGBE_WRITE_REG(hw, IXGBE_VMOLR(VMDQ_P(pool)), vmolr);
+
 	vf_shift = VMDQ_P(0) % 32;
 	reg_offset = (VMDQ_P(0) >= 32) ? 1 : 0;
 
@@ -5279,28 +5287,6 @@ static void ixgbe_fdir_filter_restore(struct ixgbe_adapter *adapter)
 	spin_unlock(&adapter->fdir_perfect_lock);
 }
 
-static void ixgbe_macvlan_set_rx_mode(struct net_device *dev, unsigned int pool,
-				      struct ixgbe_adapter *adapter)
-{
-	struct ixgbe_hw *hw = &adapter->hw;
-	u32 vmolr;
-
-	/* No unicast promiscuous support for VMDQ devices. */
-	vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(pool));
-	vmolr |= (IXGBE_VMOLR_ROMPE | IXGBE_VMOLR_BAM | IXGBE_VMOLR_AUPE);
-
-	/* clear the affected bit */
-	vmolr &= ~IXGBE_VMOLR_MPE;
-
-	if (dev->flags & IFF_ALLMULTI) {
-		vmolr |= IXGBE_VMOLR_MPE;
-	} else {
-		vmolr |= IXGBE_VMOLR_ROMPE;
-		hw->mac.ops.update_mc_addr_list(hw, dev);
-	}
-	IXGBE_WRITE_REG(hw, IXGBE_VMOLR(pool), vmolr);
-}
-
 /**
  * ixgbe_clean_rx_ring - Free Rx Buffers per Queue
  * @rx_ring: ring to free buffers from
@@ -5384,10 +5370,8 @@ static int ixgbe_fwd_ring_up(struct net_device *vdev,
 	 */
 	err = ixgbe_add_mac_filter(adapter, vdev->dev_addr,
 				   VMDQ_P(accel->pool));
-	if (err >= 0) {
-		ixgbe_macvlan_set_rx_mode(vdev, accel->pool, adapter);
+	if (err >= 0)
 		return 0;
-	}
 
 	for (i = 0; i < adapter->num_rx_queues_per_pool; i++)
 		adapter->rx_ring[baseq + i]->netdev = NULL;
@@ -9832,9 +9816,6 @@ static void ixgbe_fwd_del(struct net_device *pdev, void *priv)
 	ixgbe_del_mac_filter(adapter, accel->netdev->dev_addr,
 			     VMDQ_P(accel->pool));
 
-	/* disable ability to receive packets for this pool */
-	IXGBE_WRITE_REG(&adapter->hw, IXGBE_VMOLR(accel->pool), 0);
-
 	/* Allow remaining Rx packets to get flushed out of the
 	 * Rx FIFO before we drop the netdev for the ring.
 	 */
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 7ddc94ff4109..adde8fc45588 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -514,6 +514,7 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
 	const struct macvlan_dev *vlan = netdev_priv(dev);
 	const struct macvlan_port *port = vlan->port;
 	const struct macvlan_dev *dest;
+	void *accel_priv = NULL;
 
 	if (vlan->mode == MACVLAN_MODE_BRIDGE) {
 		const struct ethhdr *eth = (void *)skb->data;
@@ -533,9 +534,14 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 	}
 
+	/* For packets that are non-multicast and not bridged we will pass
+	 * the necessary information so that the lowerdev can distinguish
+	 * the source of the packets via the accel_priv value.
+	 */
+	accel_priv = vlan->accel_priv;
 xmit_world:
 	skb->dev = vlan->lowerdev;
-	return dev_queue_xmit(skb);
+	return dev_queue_xmit_accel(skb, accel_priv);
 }
 
 static inline netdev_tx_t macvlan_netpoll_send_skb(struct macvlan_dev *vlan, struct sk_buff *skb)
@@ -552,19 +558,14 @@ static inline netdev_tx_t macvlan_netpoll_send_skb(struct macvlan_dev *vlan, str
 static netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
 				      struct net_device *dev)
 {
+	struct macvlan_dev *vlan = netdev_priv(dev);
 	unsigned int len = skb->len;
 	int ret;
-	struct macvlan_dev *vlan = netdev_priv(dev);
 
 	if (unlikely(netpoll_tx_running(dev)))
 		return macvlan_netpoll_send_skb(vlan, skb);
 
-	if (vlan->accel_priv) {
-		skb->dev = vlan->lowerdev;
-		ret = dev_queue_xmit_accel(skb, vlan->accel_priv);
-	} else {
-		ret = macvlan_queue_xmit(skb, dev);
-	}
+	ret = macvlan_queue_xmit(skb, dev);
 
 	if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
 		struct vlan_pcpu_stats *pcpu_stats;
@@ -620,26 +621,20 @@ static int macvlan_open(struct net_device *dev)
 	/* Attempt to populate accel_priv which is used to offload the L2
 	 * forwarding requests for unicast packets.
 	 */
-	if (lowerdev->features & NETIF_F_HW_L2FW_DOFFLOAD) {
+	if (lowerdev->features & NETIF_F_HW_L2FW_DOFFLOAD)
 		vlan->accel_priv =
 		      lowerdev->netdev_ops->ndo_dfwd_add_station(lowerdev, dev);
 
-		/* If we get a NULL pointer back, or if we get an error
-		 * then we should just fall through to the non accelerated path
-		 */
-		if (IS_ERR_OR_NULL(vlan->accel_priv))
-			vlan->accel_priv = NULL;
-		else
-			return 0;
+	/* If earlier attempt to offload failed, or accel_priv is not
+	 * populated we must add the unicast address to the lower device.
+	 */
+	if (IS_ERR_OR_NULL(vlan->accel_priv)) {
+		vlan->accel_priv = NULL;
+		err = dev_uc_add(lowerdev, dev->dev_addr);
+		if (err < 0)
+			goto out;
 	}
 
-	err = -EBUSY;
-	if (macvlan_addr_busy(vlan->port, dev->dev_addr))
-		goto out;
-
-	err = dev_uc_add(lowerdev, dev->dev_addr);
-	if (err < 0)
-		goto out;
 	if (dev->flags & IFF_ALLMULTI) {
 		err = dev_set_allmulti(lowerdev, 1);
 		if (err < 0)
@@ -660,13 +655,14 @@ static int macvlan_open(struct net_device *dev)
 	if (dev->flags & IFF_ALLMULTI)
 		dev_set_allmulti(lowerdev, -1);
 del_unicast:
-	dev_uc_del(lowerdev, dev->dev_addr);
-out:
 	if (vlan->accel_priv) {
 		lowerdev->netdev_ops->ndo_dfwd_del_station(lowerdev,
 							   vlan->accel_priv);
 		vlan->accel_priv = NULL;
+	} else {
+		dev_uc_del(lowerdev, dev->dev_addr);
 	}
+out:
 	return err;
 }
 
@@ -679,7 +675,6 @@ static int macvlan_stop(struct net_device *dev)
 		lowerdev->netdev_ops->ndo_dfwd_del_station(lowerdev,
 							   vlan->accel_priv);
 		vlan->accel_priv = NULL;
-		return 0;
 	}
 
 	dev_uc_unsync(lowerdev, dev);
-- 
2.14.3

^ permalink raw reply related

* [net-next 05/10] macvlan: macvlan_count_rx shouldn't be static inline AND extern
From: Jeff Kirsher @ 2018-04-25 15:33 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180425153305.25878-1-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

It doesn't make sense to define macvlan_count_rx as a static inline and
then add a forward declaration after that as an extern. I am dropping the
extern declaration since it seems like it is something that likely got
missed when the function was made an inline.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 include/linux/if_macvlan.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index c5106ce8273a..80089d6c5a0a 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -61,10 +61,6 @@ extern int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 				  struct nlattr *tb[], struct nlattr *data[],
 				  struct netlink_ext_ack *extack);
 
-extern void macvlan_count_rx(const struct macvlan_dev *vlan,
-			     unsigned int len, bool success,
-			     bool multicast);
-
 extern void macvlan_dellink(struct net_device *dev, struct list_head *head);
 
 extern int macvlan_link_register(struct rtnl_link_ops *ops);
-- 
2.14.3

^ permalink raw reply related

* [net-next 07/10] macvlan: Provide function for interfaces to release HW offload
From: Jeff Kirsher @ 2018-04-25 15:33 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180425153305.25878-1-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch provides a basic function to allow a lower device to disable
macvlan offload if it was previously enabled on a given macvlan. The idea
here is to allow for recovery from failure should the lowerdev run out of
resources.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 include/linux/if_macvlan.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 0221390668a0..2e55e4cdbd8a 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -97,4 +97,12 @@ static inline bool macvlan_supports_dest_filter(struct net_device *dev)
 	       macvlan->mode == MACVLAN_MODE_VEPA ||
 	       macvlan->mode == MACVLAN_MODE_BRIDGE;
 }
+
+static inline int macvlan_release_l2fw_offload(struct net_device *dev)
+{
+	struct macvlan_dev *macvlan = netdev_priv(dev);
+
+	macvlan->accel_priv = NULL;
+	return dev_uc_add(macvlan->lowerdev, dev->dev_addr);
+}
 #endif /* _LINUX_IF_MACVLAN_H */
-- 
2.14.3

^ permalink raw reply related

* [net-next 10/10] ixgbe: Avoid performing unnecessary resets for macvlan offload
From: Jeff Kirsher @ 2018-04-25 15:33 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180425153305.25878-1-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

The original implementation for macvlan offload has us performing a full
port reset every time we added a new macvlan. This shouldn't be necessary
and can be avoided with a few behavior changes.

This patches updates the logic for the queues so that we have essentially 3
possible configurations for macvlan offload. They consist of 15 macvlans
with 4 queues per macvlan, 31 macvlans with 2 queues per macvlan, and 63
macvlans with 1 queue per macvlan. As macvlans are added you will encounter
up to 3 total resets if you add all the way up to 63, and after that the
device will stay in the mode supporting up to 63 macvlans until the L2FW
flag is cleared.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  | 192 +++++++++++++++++--------
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |   5 +-
 2 files changed, 135 insertions(+), 62 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index f00720582e52..b6e5cea84949 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -5344,15 +5344,11 @@ static int ixgbe_fwd_ring_up(struct ixgbe_adapter *adapter,
 	struct net_device *vdev = accel->netdev;
 	int i, baseq, err;
 
-	if (!test_bit(accel->pool, adapter->fwd_bitmask))
-		return 0;
-
 	baseq = accel->pool * adapter->num_rx_queues_per_pool;
 	netdev_dbg(vdev, "pool %i:%i queues %i:%i\n",
 		   accel->pool, adapter->num_rx_pools,
 		   baseq, baseq + adapter->num_rx_queues_per_pool);
 
-	accel->netdev = vdev;
 	accel->rx_base_queue = baseq;
 	accel->tx_base_queue = baseq;
 
@@ -5372,9 +5368,17 @@ static int ixgbe_fwd_ring_up(struct ixgbe_adapter *adapter,
 	if (err >= 0)
 		return 0;
 
+	/* if we cannot add the MAC rule then disable the offload */
+	macvlan_release_l2fw_offload(vdev);
+
 	for (i = 0; i < adapter->num_rx_queues_per_pool; i++)
 		adapter->rx_ring[baseq + i]->netdev = NULL;
 
+	netdev_err(vdev, "L2FW offload disabled due to L2 filter error\n");
+
+	clear_bit(accel->pool, adapter->fwd_bitmask);
+	kfree(accel);
+
 	return err;
 }
 
@@ -8799,6 +8803,49 @@ static void ixgbe_set_prio_tc_map(struct ixgbe_adapter *adapter)
 }
 
 #endif /* CONFIG_IXGBE_DCB */
+static int ixgbe_reassign_macvlan_pool(struct net_device *vdev, void *data)
+{
+	struct ixgbe_adapter *adapter = data;
+	struct ixgbe_fwd_adapter *accel;
+	int pool;
+
+	/* we only care about macvlans... */
+	if (!netif_is_macvlan(vdev))
+		return 0;
+
+	/* that have hardware offload enabled... */
+	accel = macvlan_accel_priv(vdev);
+	if (!accel)
+		return 0;
+
+	/* If we can relocate to a different bit do so */
+	pool = find_first_zero_bit(adapter->fwd_bitmask, adapter->num_rx_pools);
+	if (pool < adapter->num_rx_pools) {
+		set_bit(pool, adapter->fwd_bitmask);
+		accel->pool = pool;
+		return 0;
+	}
+
+	/* if we cannot find a free pool then disable the offload */
+	netdev_err(vdev, "L2FW offload disabled due to lack of queue resources\n");
+	macvlan_release_l2fw_offload(vdev);
+	kfree(accel);
+
+	return 0;
+}
+
+static void ixgbe_defrag_macvlan_pools(struct net_device *dev)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+
+	/* flush any stale bits out of the fwd bitmask */
+	bitmap_clear(adapter->fwd_bitmask, 1, 63);
+
+	/* walk through upper devices reassigning pools */
+	netdev_walk_all_upper_dev_rcu(dev, ixgbe_reassign_macvlan_pool,
+				      adapter);
+}
+
 /**
  * ixgbe_setup_tc - configure net_device for multiple traffic classes
  *
@@ -8866,6 +8913,8 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
 #endif /* CONFIG_IXGBE_DCB */
 	ixgbe_init_interrupt_scheme(adapter);
 
+	ixgbe_defrag_macvlan_pools(dev);
+
 	if (netif_running(dev))
 		return ixgbe_open(dev);
 
@@ -9415,6 +9464,22 @@ static netdev_features_t ixgbe_fix_features(struct net_device *netdev,
 	return features;
 }
 
+static void ixgbe_reset_l2fw_offload(struct ixgbe_adapter *adapter)
+{
+	int rss = min_t(int, ixgbe_max_rss_indices(adapter),
+			num_online_cpus());
+
+	/* go back to full RSS if we're not running SR-IOV */
+	if (!adapter->ring_feature[RING_F_VMDQ].offset)
+		adapter->flags &= ~(IXGBE_FLAG_VMDQ_ENABLED |
+				    IXGBE_FLAG_SRIOV_ENABLED);
+
+	adapter->ring_feature[RING_F_RSS].limit = rss;
+	adapter->ring_feature[RING_F_VMDQ].limit = 1;
+
+	ixgbe_setup_tc(adapter->netdev, adapter->hw_tcs);
+}
+
 static int ixgbe_set_features(struct net_device *netdev,
 			      netdev_features_t features)
 {
@@ -9495,7 +9560,9 @@ static int ixgbe_set_features(struct net_device *netdev,
 		}
 	}
 
-	if (need_reset)
+	if ((changed & NETIF_F_HW_L2FW_DOFFLOAD) && adapter->num_rx_pools > 1)
+		ixgbe_reset_l2fw_offload(adapter);
+	else if (need_reset)
 		ixgbe_do_reset(netdev);
 	else if (changed & (NETIF_F_HW_VLAN_CTAG_RX |
 			    NETIF_F_HW_VLAN_CTAG_FILTER))
@@ -9758,11 +9825,9 @@ static int ixgbe_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
 
 static void *ixgbe_fwd_add(struct net_device *pdev, struct net_device *vdev)
 {
-	struct ixgbe_fwd_adapter *fwd_adapter = NULL;
 	struct ixgbe_adapter *adapter = netdev_priv(pdev);
-	int used_pools = adapter->num_vfs + adapter->num_rx_pools;
+	struct ixgbe_fwd_adapter *accel;
 	int tcs = adapter->hw_tcs ? : 1;
-	unsigned int limit;
 	int pool, err;
 
 	/* The hardware supported by ixgbe only filters on the destination MAC
@@ -9772,47 +9837,73 @@ static void *ixgbe_fwd_add(struct net_device *pdev, struct net_device *vdev)
 	if (!macvlan_supports_dest_filter(vdev))
 		return ERR_PTR(-EMEDIUMTYPE);
 
-	/* Hardware has a limited number of available pools. Each VF, and the
-	 * PF require a pool. Check to ensure we don't attempt to use more
-	 * then the available number of pools.
-	 */
-	if (used_pools >= IXGBE_MAX_VF_FUNCTIONS)
-		return ERR_PTR(-EINVAL);
+	pool = find_first_zero_bit(adapter->fwd_bitmask, adapter->num_rx_pools);
+	if (pool == adapter->num_rx_pools) {
+		u16 used_pools = adapter->num_vfs + adapter->num_rx_pools;
+		u16 reserved_pools;
+
+		if (((adapter->flags & IXGBE_FLAG_DCB_ENABLED) &&
+		     adapter->num_rx_pools >= (MAX_TX_QUEUES / tcs)) ||
+		    adapter->num_rx_pools > IXGBE_MAX_MACVLANS)
+			return ERR_PTR(-EBUSY);
+
+		/* Hardware has a limited number of available pools. Each VF,
+		 * and the PF require a pool. Check to ensure we don't
+		 * attempt to use more then the available number of pools.
+		 */
+		if (used_pools >= IXGBE_MAX_VF_FUNCTIONS)
+			return ERR_PTR(-EBUSY);
 
-	if (((adapter->flags & IXGBE_FLAG_DCB_ENABLED) &&
-	      adapter->num_rx_pools >= (MAX_TX_QUEUES / tcs)) ||
-	    (adapter->num_rx_pools > IXGBE_MAX_MACVLANS))
-		return ERR_PTR(-EBUSY);
+		/* Enable VMDq flag so device will be set in VM mode */
+		adapter->flags |= IXGBE_FLAG_VMDQ_ENABLED |
+				  IXGBE_FLAG_SRIOV_ENABLED;
 
-	fwd_adapter = kzalloc(sizeof(*fwd_adapter), GFP_KERNEL);
-	if (!fwd_adapter)
-		return ERR_PTR(-ENOMEM);
+		/* Try to reserve as many queues per pool as possible,
+		 * we start with the configurations that support 4 queues
+		 * per pools, followed by 2, and then by just 1 per pool.
+		 */
+		if (used_pools < 32 && adapter->num_rx_pools < 16)
+			reserved_pools = min_t(u16,
+					       32 - used_pools,
+					       16 - adapter->num_rx_pools);
+		else if (adapter->num_rx_pools < 32)
+			reserved_pools = min_t(u16,
+					       64 - used_pools,
+					       32 - adapter->num_rx_pools);
+		else
+			reserved_pools = 64 - used_pools;
 
-	pool = find_first_zero_bit(adapter->fwd_bitmask, adapter->num_rx_pools);
-	set_bit(pool, adapter->fwd_bitmask);
-	limit = find_last_bit(adapter->fwd_bitmask, adapter->num_rx_pools + 1);
 
-	/* Enable VMDq flag so device will be set in VM mode */
-	adapter->flags |= IXGBE_FLAG_VMDQ_ENABLED | IXGBE_FLAG_SRIOV_ENABLED;
-	adapter->ring_feature[RING_F_VMDQ].limit = limit + 1;
+		if (!reserved_pools)
+			return ERR_PTR(-EBUSY);
 
-	fwd_adapter->pool = pool;
+		adapter->ring_feature[RING_F_VMDQ].limit += reserved_pools;
 
-	/* Force reinit of ring allocation with VMDQ enabled */
-	err = ixgbe_setup_tc(pdev, adapter->hw_tcs);
+		/* Force reinit of ring allocation with VMDQ enabled */
+		err = ixgbe_setup_tc(pdev, adapter->hw_tcs);
+		if (err)
+			return ERR_PTR(err);
 
-	if (!err && netif_running(pdev))
-		err = ixgbe_fwd_ring_up(adapter, fwd_adapter);
+		if (pool >= adapter->num_rx_pools)
+			return ERR_PTR(-ENOMEM);
+	}
 
-	if (!err)
-		return fwd_adapter;
+	accel = kzalloc(sizeof(*accel), GFP_KERNEL);
+	if (!accel)
+		return ERR_PTR(-ENOMEM);
+
+	set_bit(pool, adapter->fwd_bitmask);
+	accel->pool = pool;
+	accel->netdev = vdev;
 
-	/* unwind counter and free adapter struct */
-	netdev_info(pdev,
-		    "%s: dfwd hardware acceleration failed\n", vdev->name);
-	clear_bit(pool, adapter->fwd_bitmask);
-	kfree(fwd_adapter);
-	return ERR_PTR(err);
+	if (!netif_running(pdev))
+		return accel;
+
+	err = ixgbe_fwd_ring_up(adapter, accel);
+	if (err)
+		return ERR_PTR(err);
+
+	return accel;
 }
 
 static void ixgbe_fwd_del(struct net_device *pdev, void *priv)
@@ -9820,7 +9911,7 @@ static void ixgbe_fwd_del(struct net_device *pdev, void *priv)
 	struct ixgbe_fwd_adapter *accel = priv;
 	struct ixgbe_adapter *adapter = netdev_priv(pdev);
 	unsigned int rxbase = accel->rx_base_queue;
-	unsigned int limit, i;
+	unsigned int i;
 
 	/* delete unicast filter associated with offloaded interface */
 	ixgbe_del_mac_filter(adapter, accel->netdev->dev_addr,
@@ -9844,25 +9935,6 @@ static void ixgbe_fwd_del(struct net_device *pdev, void *priv)
 	}
 
 	clear_bit(accel->pool, adapter->fwd_bitmask);
-	limit = find_last_bit(adapter->fwd_bitmask, adapter->num_rx_pools);
-	adapter->ring_feature[RING_F_VMDQ].limit = limit + 1;
-
-	/* go back to full RSS if we're done with our VMQs */
-	if (adapter->ring_feature[RING_F_VMDQ].limit == 1) {
-		int rss = min_t(int, ixgbe_max_rss_indices(adapter),
-				num_online_cpus());
-
-		adapter->flags &= ~IXGBE_FLAG_VMDQ_ENABLED;
-		adapter->flags &= ~IXGBE_FLAG_SRIOV_ENABLED;
-		adapter->ring_feature[RING_F_RSS].limit = rss;
-	}
-
-	ixgbe_setup_tc(pdev, adapter->hw_tcs);
-	netdev_dbg(pdev, "pool %i:%i queues %i:%i\n",
-		   accel->pool, adapter->num_rx_pools,
-		   accel->rx_base_queue,
-		   accel->rx_base_queue +
-		   adapter->num_rx_queues_per_pool);
 	kfree(accel);
 }
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 008aa073a679..bfc4171cd3f9 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -266,7 +266,7 @@ int ixgbe_disable_sriov(struct ixgbe_adapter *adapter)
 #endif
 
 	/* Disable VMDq flag so device will be set in VM mode */
-	if (adapter->ring_feature[RING_F_VMDQ].limit == 1) {
+	if (bitmap_weight(adapter->fwd_bitmask, adapter->num_rx_pools) == 1) {
 		adapter->flags &= ~IXGBE_FLAG_VMDQ_ENABLED;
 		adapter->flags &= ~IXGBE_FLAG_SRIOV_ENABLED;
 		rss = min_t(int, ixgbe_max_rss_indices(adapter),
@@ -312,7 +312,8 @@ static int ixgbe_pci_sriov_enable(struct pci_dev *dev, int num_vfs)
 	 * other values out of range.
 	 */
 	num_tc = adapter->hw_tcs;
-	num_rx_pools = adapter->num_rx_pools;
+	num_rx_pools = bitmap_weight(adapter->fwd_bitmask,
+				     adapter->num_rx_pools);
 	limit = (num_tc > 4) ? IXGBE_MAX_VFS_8TC :
 		(num_tc > 1) ? IXGBE_MAX_VFS_4TC : IXGBE_MAX_VFS_1TC;
 
-- 
2.14.3

^ permalink raw reply related

* Re: [RFC bpf] bpf, x64: fix JIT emission for dead code
From: Daniel Borkmann @ 2018-04-25 15:34 UTC (permalink / raw)
  To: Gianluca Borello, netdev; +Cc: ast
In-Reply-To: <20180425054216.48961-1-g.borello@gmail.com>

On 04/25/2018 07:42 AM, Gianluca Borello wrote:
> Commit 2a5418a13fcf ("bpf: improve dead code sanitizing") replaced dead
> code with a series of ja-1 instructions, for safety. That made JIT
> compilation much more complex for some BPF programs. One instance of such
> programs is, for example:
> 
> bool flag = false
> ...
> /* A bunch of other code */
> ...
> if (flag)
>         do_something()
> 
> In some cases llvm is not able to remove at compile time the code for
> do_something(), so the generated BPF program ends up with a large amount
> of dead instructions. In one specific real life example, there are two
> series of ~500 and ~1000 dead instructions in the program. When the
> verifier replaces them with a series of ja-1 instructions, it causes an
> interesting behavior at JIT time.
[...]
> 
> Fixes: 2a5418a13fcf ("bpf: improve dead code sanitizing")
> Signed-off-by: Gianluca Borello <g.borello@gmail.com>

I've applied this fix to bpf tree, thanks Gianluca!

^ permalink raw reply

* Re: [PATCH bpf-next] bpf: Allow bpf_jit_enable = 2 with BPF_JIT_ALWAYS_ON config
From: Daniel Borkmann @ 2018-04-25 15:37 UTC (permalink / raw)
  To: Alexei Starovoitov, Leo Yan
  Cc: David S. Miller, Alexei Starovoitov, Kirill Tkhai, netdev,
	linux-kernel
In-Reply-To: <20180425141435.bvolztivms25y2vt@ast-mbp>

On 04/25/2018 04:14 PM, Alexei Starovoitov wrote:
> On Wed, Apr 25, 2018 at 05:25:47PM +0800, Leo Yan wrote:
>>
>> If we have concern for security issue, should we remove support for
>> 'bpf_jit_enable = 2' and modify the doc to reflect this change?
> 
> I suggest to fix the doc.

Agree, lets do that instead. Leo, could you cook a patch for that?

Thanks,
Daniel

^ permalink raw reply

* NOTIFICATION
From: ZIMBRA EMAIL ADMIN @ 2018-04-25 15:46 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 96 bytes --]


Open the attachment to read your Account Update message. 

Sincerely: 
Your Email Administrator

[-- Attachment #2: ZIMBRA ADMIN.pdf --]
[-- Type: application/pdf, Size: 62125 bytes --]

^ permalink raw reply

* simplify procfs code for seq_file instances V2
From: Christoph Hellwig @ 2018-04-25 15:47 UTC (permalink / raw)
  To: Andrew Morton, Alexander Viro
  Cc: linux-rtc, Alessandro Zummo, Alexandre Belloni, devel,
	linux-kernel, linux-scsi, linux-ide, Greg Kroah-Hartman,
	jfs-discussion, linux-afs, linux-acpi, netdev, netfilter-devel,
	Jiri Slaby, linux-ext4, Alexey Dobriyan, megaraidlinux.pdl,
	drbd-dev

We currently have hundreds of proc files that implement plain, read-only
seq_file based interfaces.  This series consolidates them using new
procfs helpers that take the seq_operations or simple show callback
directly.

A git tree is available at:

    git://git.infradead.org/users/hch/misc.git proc_create.2

Gitweb:

    http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/proc_create.2

Changes since V1:
 - open code proc_create_data to avoid setting not fully initialized
   entries live
 - use unsigned int for state_size
 - dropped the s390/cio/blacklist hunk as it has a write method
 - dropped the IPMI patch given that IPMI proc support is scheduled for
   removal.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox