Netdev List
* [PATCH v3,net-next 1/2] ip_gre: fix error path when erspan_rcv failed
From: Haishuang Yan @ 2017-12-20  2:21 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI
  Cc: netdev, linux-kernel, Haishuang Yan, William Tu
In-Reply-To: <1513736507-22968-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

When erspan_rcv() returns PACKET_REJECT, we shouldn't call ipgre_rcv() to
process the packet again; instead, send an ICMP unreachable message in the
error path.

Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN")
Acked-by: William Tu <u9012063@gmail.com>
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>

---
Changes since v3:
  * Rebase on latest master branch.
  * Fix wrong commit information.
---
 net/ipv4/ip_gre.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 3029e3e..90c9123 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -436,11 +436,13 @@ static int gre_rcv(struct sk_buff *skb)
 		     tpi.proto == htons(ETH_P_ERSPAN2))) {
 		if (erspan_rcv(skb, &tpi, hdr_len) == PACKET_RCVD)
 			return 0;
+		goto out;
 	}
 
 	if (ipgre_rcv(skb, &tpi, hdr_len) == PACKET_RCVD)
 		return 0;
 
+out:
 	icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
 drop:
 	kfree_skb(skb);
-- 
1.8.3.1
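A minimal user-space model of the control flow this fix gives gre_rcv(). All names here (`model_gre_rcv()`, `fake_ipgre_rcv()`, the `PKT_*` and `ACT_*` enums, the `ipgre_calls` counter) are hypothetical stand-ins for `gre_rcv()`, `ipgre_rcv()` and `PACKET_RCVD`/`PACKET_REJECT`; no real skb is involved. It only illustrates that a rejected ERSPAN packet now skips the generic GRE handler and goes straight to the ICMP-then-drop path.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's PACKET_RCVD / PACKET_REJECT. */
enum pkt_result { PKT_RCVD, PKT_REJECT };
/* What the model decides to do with the packet. */
enum rcv_action { ACT_CONSUMED, ACT_ICMP_THEN_DROP };

static int ipgre_calls; /* how many times the generic GRE handler ran */

/* Stand-in for ipgre_rcv(): records the call, returns the given result. */
static enum pkt_result fake_ipgre_rcv(enum pkt_result r)
{
	ipgre_calls++;
	return r;
}

/* Mirrors the branch structure of gre_rcv() after the fix. */
static enum rcv_action model_gre_rcv(bool is_erspan,
				     enum pkt_result erspan_res,
				     enum pkt_result ipgre_res)
{
	if (is_erspan) {
		if (erspan_res == PKT_RCVD)
			return ACT_CONSUMED;
		goto out;	/* the fix: never fall through to ipgre_rcv() */
	}

	if (fake_ipgre_rcv(ipgre_res) == PKT_RCVD)
		return ACT_CONSUMED;

out:
	/* kernel: icmp_send(..., ICMP_PORT_UNREACH, 0), then kfree_skb() */
	return ACT_ICMP_THEN_DROP;
}
```

Without the `goto out`, the first test case below would also run `fake_ipgre_rcv()`, which is the double processing the commit message describes.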


* [PATCH v3,net-next 0/2] net: erspan: fix erspan_rcv/ip6erspan_rcv error path
From: Haishuang Yan @ 2017-12-20  2:21 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI
  Cc: netdev, linux-kernel, Haishuang Yan

This patch series fixes a potential issue in the error path.

Haishuang Yan (2):
  ip_gre: fix error path when erspan_rcv failed
  ip6_gre: fix error path when ip6erspan_rcv failed

 net/ipv4/ip_gre.c  | 2 ++
 net/ipv6/ip6_gre.c | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

-- 
1.8.3.1


* Re: [PATCH v10 1/5] add infrastructure for tagging functions as error injectable
From: Alexei Starovoitov @ 2017-12-20  2:14 UTC (permalink / raw)
  To: Masami Hiramatsu, Josef Bacik
  Cc: rostedt, mingo, davem, netdev, linux-kernel, ast, kernel-team,
	daniel, linux-btrfs, darrick.wong, Josef Bacik
In-Reply-To: <20171219152925.5789309c6c4d27807d42f11c@kernel.org>

On 12/18/17 10:29 PM, Masami Hiramatsu wrote:
>>
>> +#if defined(__KERNEL__) && !defined(__ASSEMBLY__)
>> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
>
> BTW, CONFIG_BPF_KPROBE_OVERRIDE is also a confusing name.
> Since this feature overrides a function to just return with
> some return value (as far as I understand, or do you
> also plan to modify the execution path inside a function?),
> I think CONFIG_BPF_FUNCTION_OVERRIDE or
> CONFIG_BPF_EXECUTION_OVERRIDE would be better.

I don't think such renaming makes sense.
The feature is overriding kprobe by changing how kprobe returns.
It doesn't override BPF_FUNCTION or BPF_EXECUTION.
The kernel enters and exits the bpf program as normal.

> Indeed, BPF is based on kprobes, but it seems you are limiting it
> with ftrace (function-call trace) (I'm not sure of the reason why),
> so using "kprobes" for this feature seems strange to me.

Do you have an idea how kprobe override could happen when the kprobe
is placed in the middle of the function?

Please send your suggestions as patches on top of bpf-next.

Thanks


* [PATCH v3,net-next 2/2] ip6_gre: fix potential memory leak in ip6erspan_rcv
From: Haishuang Yan @ 2017-12-20  2:07 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI
  Cc: netdev, linux-kernel, Haishuang Yan, William Tu
In-Reply-To: <1513735621-21913-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

If md is NULL, tun_dst must be freed; otherwise it will cause a memory
leak.

Fixes: ef7baf5e083c ("ip6_gre: add ip6 erspan collect_md mode")
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>

---
Changes since v3:
  * Rebase on latest master branch.
  * Fix wrong commit information.
---
 net/ipv6/ip6_gre.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 9bd1103..45038a9 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -550,8 +550,10 @@ static int ip6erspan_rcv(struct sk_buff *skb, int gre_hdr_len,
 
 			info = &tun_dst->u.tun_info;
 			md = ip_tunnel_info_opts(info);
-			if (!md)
+			if (!md) {
+				dst_release((struct dst_entry *)tun_dst);
 				return PACKET_REJECT;
+			}
 
 			memcpy(md, pkt_md, sizeof(*md));
 			md->version = ver;
-- 
1.8.3.1
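Both leak fixes (this one and the matching erspan_rcv() change for IPv4) follow the rule that every early return after taking a reference must drop it. A sketch of that rule, assuming a simple refcounted object: `struct fake_dst`, `fake_dst_alloc()`, `fake_dst_release()` and `model_rcv()` are hypothetical and only stand in for the metadata dst and dst_release().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted metadata dst. */
struct fake_dst {
	int refcnt;
};

static int live_dsts; /* objects allocated and not yet freed */

static struct fake_dst *fake_dst_alloc(void)
{
	struct fake_dst *d = malloc(sizeof(*d));

	if (d) {
		d->refcnt = 1;
		live_dsts++;
	}
	return d;
}

/* Drop one reference; free the object when the last one goes away. */
static void fake_dst_release(struct fake_dst *d)
{
	assert(d->refcnt > 0);
	if (--d->refcnt == 0) {
		free(d);
		live_dsts--;
	}
}

enum { RCVD, REJECT };

/* Models the collect_md branch of the receive path. */
static int model_rcv(bool md_ok)
{
	struct fake_dst *tun_dst = fake_dst_alloc();

	if (!tun_dst)
		return REJECT;
	if (!md_ok) {
		fake_dst_release(tun_dst);	/* the release the patch adds */
		return REJECT;
	}
	/* In the kernel the reference would be handed to the skb; the model
	 * releases it here only so the leak counter can be checked. */
	fake_dst_release(tun_dst);
	return RCVD;
}
```

Dropping the `fake_dst_release()` call in the `!md_ok` branch leaves `live_dsts` at 1 after a rejected packet, which is the leak being fixed.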


* [PATCH v3,net-next 1/2] ip_gre: fix potential memory leak in erspan_rcv
From: Haishuang Yan @ 2017-12-20  2:07 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI
  Cc: netdev, linux-kernel, Haishuang Yan, William Tu
In-Reply-To: <1513735621-21913-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

If md is NULL, tun_dst must be freed; otherwise it will cause a memory
leak.

Fixes: 1a66a836da6 ("gre: add collect_md mode to ERSPAN tunnel")
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>

---
Changes since v3:
  * Rebase on latest master branch.
  * Fix wrong commit information.
---
 net/ipv4/ip_gre.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index fd4d6e9..3029e3e 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -313,8 +313,10 @@ static int erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi,
 				return PACKET_REJECT;
 
 			md = ip_tunnel_info_opts(&tun_dst->u.tun_info);
-			if (!md)
+			if (!md) {
+				dst_release((struct dst_entry *)tun_dst);
 				return PACKET_REJECT;
+			}
 
 			memcpy(md, pkt_md, sizeof(*md));
 			md->version = ver;
-- 
1.8.3.1


* [PATCH v3,net-next 0/2] net: erspan: fix potential memory leak
From: Haishuang Yan @ 2017-12-20  2:06 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI
  Cc: netdev, linux-kernel, Haishuang Yan

This patch series fixes a potential memory leak.

Haishuang Yan (2):
  ip_gre: fix potential memory leak in erspan_rcv
  ip6_gre: fix potential memory leak in ip6erspan_rcv

 net/ipv4/ip_gre.c  | 4 +++-
 net/ipv6/ip6_gre.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

--
1.8.3.1


* [PATCH v4 iproute2 net-next] erspan: add erspan version II support
From: William Tu @ 2017-12-20  2:01 UTC (permalink / raw)
  To: netdev; +Cc: dsahern

The patch adds support for configuring ERSPAN v2, for both the
IPv4 and IPv6 ERSPAN implementations.  Three additional fields
are added: 'erspan_ver' for distinguishing v1 from v2, 'erspan_dir'
for specifying the direction of the mirrored traffic, and 'erspan_hwid'
for users to set the ERSPAN engine ID within a system.

As for the manpage, the ERSPAN descriptions used to be under the GRE,
IPIP, SIT Type paragraph.  Since IP6GRE/IP6GRETAP also support ERSPAN,
the patch removes the old description, creates a separate ERSPAN
paragraph, and adds an example.

Signed-off-by: William Tu <u9012063@gmail.com>
---
change in v4:
  - use matches instead of strcmp on ingress/egress
change in v3:
  - change erspan_dir 0/1 to "in[gress]/e[gress]"
  - update manpage
change in v2:
  - fix typo ETH_P_ERSPAN2
  - fix space and indent
---
 include/uapi/linux/if_ether.h  |  1 +
 include/uapi/linux/if_tunnel.h |  3 ++
 ip/link_gre.c                  | 66 ++++++++++++++++++++++++++++--
 ip/link_gre6.c                 | 67 ++++++++++++++++++++++++++++--
 man/man8/ip-link.8.in          | 92 ++++++++++++++++++++++++++++++++++++------
 5 files changed, 210 insertions(+), 19 deletions(-)

diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h
index 2eb529a90250..133567bf2e04 100644
--- a/include/uapi/linux/if_ether.h
+++ b/include/uapi/linux/if_ether.h
@@ -47,6 +47,7 @@
 #define ETH_P_PUP	0x0200		/* Xerox PUP packet		*/
 #define ETH_P_PUPAT	0x0201		/* Xerox PUP Addr Trans packet	*/
 #define ETH_P_TSN	0x22F0		/* TSN (IEEE 1722) packet	*/
+#define ETH_P_ERSPAN2	0x22EB		/* ERSPAN version 2 (type III)	*/
 #define ETH_P_IP	0x0800		/* Internet Protocol packet	*/
 #define ETH_P_X25	0x0805		/* CCITT X.25			*/
 #define ETH_P_ARP	0x0806		/* Address Resolution packet	*/
diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index 38cdf90692f8..ecdc76669cfd 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -137,6 +137,9 @@ enum {
 	IFLA_GRE_IGNORE_DF,
 	IFLA_GRE_FWMARK,
 	IFLA_GRE_ERSPAN_INDEX,
+	IFLA_GRE_ERSPAN_VER,
+	IFLA_GRE_ERSPAN_DIR,
+	IFLA_GRE_ERSPAN_HWID,
 	__IFLA_GRE_MAX,
 };
 
diff --git a/ip/link_gre.c b/ip/link_gre.c
index 43cb1af6196a..0b9c71baebaf 100644
--- a/ip/link_gre.c
+++ b/ip/link_gre.c
@@ -98,6 +98,9 @@ static int gre_parse_opt(struct link_util *lu, int argc, char **argv,
 	__u8 ignore_df = 0;
 	__u32 fwmark = 0;
 	__u32 erspan_idx = 0;
+	__u8 erspan_ver = 0;
+	__u8 erspan_dir = 0;
+	__u16 erspan_hwid = 0;
 
 	if (!(n->nlmsg_flags & NLM_F_CREATE)) {
 		if (rtnl_talk(&rth, &req.n, &answer) < 0) {
@@ -179,6 +182,15 @@ get_failed:
 		if (greinfo[IFLA_GRE_ERSPAN_INDEX])
 			erspan_idx = rta_getattr_u32(greinfo[IFLA_GRE_ERSPAN_INDEX]);
 
+		if (greinfo[IFLA_GRE_ERSPAN_VER])
+			erspan_ver = rta_getattr_u8(greinfo[IFLA_GRE_ERSPAN_VER]);
+
+		if (greinfo[IFLA_GRE_ERSPAN_DIR])
+			erspan_dir = rta_getattr_u8(greinfo[IFLA_GRE_ERSPAN_DIR]);
+
+		if (greinfo[IFLA_GRE_ERSPAN_HWID])
+			erspan_hwid = rta_getattr_u16(greinfo[IFLA_GRE_ERSPAN_HWID]);
+
 		free(answer);
 	}
 
@@ -343,6 +355,24 @@ get_failed:
 				invarg("invalid erspan index\n", *argv);
 			if (erspan_idx & ~((1<<20) - 1) || erspan_idx == 0)
 				invarg("erspan index must be > 0 and <= 20-bit\n", *argv);
+		} else if (strcmp(*argv, "erspan_ver") == 0) {
+			NEXT_ARG();
+			if (get_u8(&erspan_ver, *argv, 0))
+				invarg("invalid erspan version\n", *argv);
+			if (erspan_ver != 1 && erspan_ver != 2)
+				invarg("erspan version must be 1 or 2\n", *argv);
+		} else if (strcmp(*argv, "erspan_dir") == 0) {
+			NEXT_ARG();
+			if (matches(*argv, "ingress") == 0)
+				erspan_dir = 0;
+			else if (matches(*argv, "egress") == 0)
+				erspan_dir = 1;
+			else
+				invarg("invalid erspan direction\n", *argv);
+		} else if (strcmp(*argv, "erspan_hwid") == 0) {
+			NEXT_ARG();
+			if (get_u16(&erspan_hwid, *argv, 0))
+				invarg("invalid erspan hwid\n", *argv);
 		} else
 			usage();
 		argc--; argv++;
@@ -374,8 +404,15 @@ get_failed:
 		addattr_l(n, 1024, IFLA_GRE_TTL, &ttl, 1);
 		addattr_l(n, 1024, IFLA_GRE_TOS, &tos, 1);
 		addattr32(n, 1024, IFLA_GRE_FWMARK, fwmark);
-		if (erspan_idx != 0)
-			addattr32(n, 1024, IFLA_GRE_ERSPAN_INDEX, erspan_idx);
+		if (erspan_ver) {
+			addattr8(n, 1024, IFLA_GRE_ERSPAN_VER, erspan_ver);
+			if (erspan_ver == 1 && erspan_idx != 0) {
+				addattr32(n, 1024, IFLA_GRE_ERSPAN_INDEX, erspan_idx);
+			} else if (erspan_ver == 2) {
+				addattr8(n, 1024, IFLA_GRE_ERSPAN_DIR, erspan_dir);
+				addattr16(n, 1024, IFLA_GRE_ERSPAN_HWID, erspan_hwid);
+			}
+		}
 	} else {
 		addattr_l(n, 1024, IFLA_GRE_COLLECT_METADATA, NULL, 0);
 	}
@@ -514,7 +551,30 @@ static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 	if (tb[IFLA_GRE_ERSPAN_INDEX]) {
 		__u32 erspan_idx = rta_getattr_u32(tb[IFLA_GRE_ERSPAN_INDEX]);
 
-		fprintf(f, "erspan_index %u ", erspan_idx);
+		print_uint(PRINT_ANY, "erspan_index", "erspan_index %u ", erspan_idx);
+	}
+
+	if (tb[IFLA_GRE_ERSPAN_VER]) {
+		__u8 erspan_ver = rta_getattr_u8(tb[IFLA_GRE_ERSPAN_VER]);
+
+		print_uint(PRINT_ANY, "erspan_ver", "erspan_ver %u ", erspan_ver);
+	}
+
+	if (tb[IFLA_GRE_ERSPAN_DIR]) {
+		__u8 erspan_dir = rta_getattr_u8(tb[IFLA_GRE_ERSPAN_DIR]);
+
+		if (erspan_dir == 0)
+			print_string(PRINT_ANY, "erspan_dir",
+				     "erspan_dir ingress ", NULL);
+		else
+			print_string(PRINT_ANY, "erspan_dir",
+				     "erspan_dir egress ", NULL);
+	}
+
+	if (tb[IFLA_GRE_ERSPAN_HWID]) {
+		__u16 erspan_hwid = rta_getattr_u16(tb[IFLA_GRE_ERSPAN_HWID]);
+
+		print_hex(PRINT_ANY, "erspan_hwid", "erspan_hwid 0x%x ", erspan_hwid);
 	}
 
 	if (tb[IFLA_GRE_ENCAP_TYPE] &&
diff --git a/ip/link_gre6.c b/ip/link_gre6.c
index 2cb46ca116d0..e4a8e1f5ee41 100644
--- a/ip/link_gre6.c
+++ b/ip/link_gre6.c
@@ -109,6 +109,9 @@ static int gre_parse_opt(struct link_util *lu, int argc, char **argv,
 	int len;
 	__u32 fwmark = 0;
 	__u32 erspan_idx = 0;
+	__u8 erspan_ver = 0;
+	__u8 erspan_dir = 0;
+	__u16 erspan_hwid = 0;
 
 	if (!(n->nlmsg_flags & NLM_F_CREATE)) {
 		if (rtnl_talk(&rth, &req.n, &answer) < 0) {
@@ -191,6 +194,15 @@ get_failed:
 		if (greinfo[IFLA_GRE_ERSPAN_INDEX])
 			erspan_idx = rta_getattr_u32(greinfo[IFLA_GRE_ERSPAN_INDEX]);
 
+		if (greinfo[IFLA_GRE_ERSPAN_VER])
+			erspan_ver = rta_getattr_u8(greinfo[IFLA_GRE_ERSPAN_VER]);
+
+		if (greinfo[IFLA_GRE_ERSPAN_DIR])
+			erspan_dir = rta_getattr_u8(greinfo[IFLA_GRE_ERSPAN_DIR]);
+
+		if (greinfo[IFLA_GRE_ERSPAN_HWID])
+			erspan_hwid = rta_getattr_u16(greinfo[IFLA_GRE_ERSPAN_HWID]);
+
 		free(answer);
 	}
 
@@ -389,6 +401,24 @@ get_failed:
 				invarg("invalid erspan index\n", *argv);
 			if (erspan_idx & ~((1<<20) - 1) || erspan_idx == 0)
 				invarg("erspan index must be > 0 and <= 20-bit\n", *argv);
+		} else if (strcmp(*argv, "erspan_ver") == 0) {
+			NEXT_ARG();
+			if (get_u8(&erspan_ver, *argv, 0))
+				invarg("invalid erspan version\n", *argv);
+			if (erspan_ver != 1 && erspan_ver != 2)
+				invarg("erspan version must be 1 or 2\n", *argv);
+		} else if (strcmp(*argv, "erspan_dir") == 0) {
+			NEXT_ARG();
+			if (matches(*argv, "ingress") == 0)
+				erspan_dir = 0;
+			else if (matches(*argv, "egress") == 0)
+				erspan_dir = 1;
+			else
+				invarg("invalid erspan direction\n", *argv);
+		} else if (strcmp(*argv, "erspan_hwid") == 0) {
+			NEXT_ARG();
+			if (get_u16(&erspan_hwid, *argv, 0))
+				invarg("invalid erspan hwid\n", *argv);
 		} else
 			usage();
 		argc--; argv++;
@@ -408,9 +438,15 @@ get_failed:
 		addattr_l(n, 1024, IFLA_GRE_FLOWINFO, &flowinfo, 4);
 		addattr32(n, 1024, IFLA_GRE_FLAGS, flags);
 		addattr32(n, 1024, IFLA_GRE_FWMARK, fwmark);
-		if (erspan_idx != 0)
-			addattr32(n, 1024, IFLA_GRE_ERSPAN_INDEX, erspan_idx);
-
+		if (erspan_ver) {
+			addattr8(n, 1024, IFLA_GRE_ERSPAN_VER, erspan_ver);
+			if (erspan_ver == 1 && erspan_idx != 0) {
+				addattr32(n, 1024, IFLA_GRE_ERSPAN_INDEX, erspan_idx);
+			} else if (erspan_ver == 2) {
+				addattr8(n, 1024, IFLA_GRE_ERSPAN_DIR, erspan_dir);
+				addattr16(n, 1024, IFLA_GRE_ERSPAN_HWID, erspan_hwid);
+			}
+		}
 		addattr16(n, 1024, IFLA_GRE_ENCAP_TYPE, encaptype);
 		addattr16(n, 1024, IFLA_GRE_ENCAP_FLAGS, encapflags);
 		addattr16(n, 1024, IFLA_GRE_ENCAP_SPORT, htons(encapsport));
@@ -587,7 +623,30 @@ static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 
 	if (tb[IFLA_GRE_ERSPAN_INDEX]) {
 		__u32 erspan_idx = rta_getattr_u32(tb[IFLA_GRE_ERSPAN_INDEX]);
-		fprintf(f, "erspan_index %u ", erspan_idx);
+		print_uint(PRINT_ANY, "erspan_index", "erspan_index %u ", erspan_idx);
+	}
+
+	if (tb[IFLA_GRE_ERSPAN_VER]) {
+		__u8 erspan_ver = rta_getattr_u8(tb[IFLA_GRE_ERSPAN_VER]);
+
+		print_uint(PRINT_ANY, "erspan_ver", "erspan_ver %u ", erspan_ver);
+	}
+
+	if (tb[IFLA_GRE_ERSPAN_DIR]) {
+		__u8 erspan_dir = rta_getattr_u8(tb[IFLA_GRE_ERSPAN_DIR]);
+
+		if (erspan_dir == 0)
+			print_string(PRINT_ANY, "erspan_dir",
+				     "erspan_dir ingress ", NULL);
+		else
+			print_string(PRINT_ANY, "erspan_dir",
+				     "erspan_dir egress ", NULL);
+	}
+
+	if (tb[IFLA_GRE_ERSPAN_HWID]) {
+		__u16 erspan_hwid = rta_getattr_u16(tb[IFLA_GRE_ERSPAN_HWID]);
+
+		print_hex(PRINT_ANY, "erspan_hwid", "erspan_hwid 0x%x ", erspan_hwid);
 	}
 
 	if (tb[IFLA_GRE_ENCAP_TYPE] &&
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 9e9a5f0d2cef..0086b3dfa09d 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -665,13 +665,13 @@ keyword.
 .in -8
 
 .TP
-GRE, IPIP, SIT, ERSPAN Type Support
+GRE, IPIP, SIT Type Support
 For a link of types
-.I GRE/IPIP/SIT/ERSPAN
+.I GRE/IPIP/SIT
 the following additional arguments are supported:
 
 .BI "ip link add " DEVICE
-.BR type " { " gre " | " ipip " | " sit " | " erspan " }"
+.BR type " { " gre " | " ipip " | " sit " }"
 .BI " remote " ADDR " local " ADDR
 [
 .BR encap " { " fou " | " gue " | " none " }"
@@ -685,8 +685,6 @@ the following additional arguments are supported:
 .I " [no]encap-remcsum "
 ] [
 .I " mode " { ip6ip | ipip | mplsip | any } "
-] [
-.BR erspan " \fIIDX "
 ]
 
 .in +8
@@ -731,13 +729,6 @@ MPLS-Over-IPv4, "any" indicates IPv6, IPv4 or MPLS Over IPv4. Supported for
 SIT where the default is "ip6ip" and IPIP where the default is "ipip".
 IPv6-Over-IPv4 is not supported for IPIP.
 
-.sp
-.BR erspan " \fIIDX "
-- specifies the ERSPAN index field.
-.IR IDX
-indicates a 20 bit index/port number associated with the ERSPAN
-traffic's source port and direction.
-
 .in -8
 
 .TP
@@ -883,6 +874,76 @@ the following additional arguments are supported:
 - specifies the mode (datagram or connected) to use.
 
 .TP
+ERSPAN Type Support
+For a link of type
+.I ERSPAN/IP6ERSPAN
+the following additional arguments are supported:
+
+.BI "ip link add " DEVICE
+.BR type " { " erspan " | " ip6erspan " }"
+.BI remote " ADDR " local " ADDR " seq
+.RB key
+.I KEY
+.BR erspan_ver " \fIversion "
+[
+.BR erspan " \fIIDX "
+] [
+.BR erspan_dir " { " \fIingress " | " \fIegress " }"
+] [
+.BR erspan_hwid " \fIhwid "
+] [
+.RB external
+]
+
+.in +8
+.sp
+.BI  remote " ADDR "
+- specifies the remote address of the tunnel.
+
+.sp
+.BI  local " ADDR "
+- specifies the fixed local address for tunneled packets.
+It must be an address on another interface on this host.
+
+.sp
+.BR erspan_ver " \fIversion "
+- specifies the ERSPAN version number.
+.IR version
+indicates the ERSPAN version to be created: 1 for version 1 (type II)
+or 2 for version 2 (type III).
+
+.sp
+.BR erspan " \fIIDX "
+- specifies the ERSPAN v1 index field.
+.IR IDX
+indicates a 20 bit index/port number associated with the ERSPAN
+traffic's source port and direction.
+
+.sp
+.BR erspan_dir " { " \fIingress " | " \fIegress " }"
+- specifies the ERSPAN v2 mirrored traffic's direction.
+
+.sp
+.BR erspan_hwid " \fIhwid "
+- a unique identifier of an ERSPAN v2 engine within a system.
+.IR hwid
+is a 6-bit value for users to configure.
+
+.sp
+.BR external
+- make this tunnel externally controlled (or not, which is the default).
+In the kernel, this is referred to as collect metadata mode.  This flag is
+mutually exclusive with the
+.BR remote ,
+.BR local ,
+.BR erspan_ver ,
+.BR erspan ,
+.BR erspan_dir " and " erspan_hwid
+options.
+
+.in -8
+
+.TP
 GENEVE Type Support
 For a link of type
 .I GENEVE
@@ -2062,6 +2123,13 @@ ip link add link wpan0 lowpan0 type lowpan
 Creates a 6LoWPAN interface named lowpan0 on the underlying
 IEEE 802.15.4 device wpan0.
 .RE
+.PP
+ip link add dev ip6erspan11 type ip6erspan seq key 102
+local fc00:100::2 remote fc00:100::1
+erspan_ver 2 erspan_dir ingress erspan_hwid 17
+.RS 4
+Creates an IP6ERSPAN version 2 interface named ip6erspan11.
+.RE
 
 .SH SEE ALSO
 .br
-- 
2.7.4
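The attribute selection this patch adds to gre_parse_opt() can be summarized as a small pure function. This is a sketch under assumptions: the `EMIT_*` bitmask, `erspan_attrs()` and `erspan_opts_valid()` are hypothetical names; the ranges (version 1 or 2, 20-bit index, 6-bit hwid) come from the patch and its man page text, and the hwid range check is an addition, since the patch itself only parses the value with get_u16(). The model follows the `else if (erspan_ver == 2)` form used in link_gre.c, and treats an index of 0 as "not given".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bitmask of the IFLA_GRE_ERSPAN_* attributes to emit. */
enum {
	EMIT_VER   = 1u << 0,
	EMIT_INDEX = 1u << 1,
	EMIT_DIR   = 1u << 2,
	EMIT_HWID  = 1u << 3,
};

/* Range checks: version 1 or 2, index within 20 bits, hwid within 6 bits. */
static bool erspan_opts_valid(uint8_t ver, uint32_t idx, uint16_t hwid)
{
	if (ver != 1 && ver != 2)
		return false;
	if (idx > (1u << 20) - 1)
		return false;
	if (ver == 2 && hwid > 0x3f)
		return false;
	return true;
}

/* Which attributes would be added to the netlink request. */
static unsigned erspan_attrs(uint8_t ver, uint32_t idx)
{
	unsigned mask = 0;

	if (!ver)	/* erspan_ver not given: no ERSPAN attributes at all */
		return 0;
	mask |= EMIT_VER;
	if (ver == 1 && idx != 0)
		mask |= EMIT_INDEX;
	else if (ver == 2)
		mask |= EMIT_DIR | EMIT_HWID;
	return mask;
}
```

Note that with this logic an index given without `erspan_ver` is not sent, which matches the man page listing `erspan_ver` as a required argument.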


* RCU callback crashes
From: Jakub Kicinski @ 2017-12-20  1:59 UTC (permalink / raw)
  To: netdev@vger.kernel.org, Jiri Pirko, Cong Wang

Hi!

If I run the netdevsim test long enough on a kernel with no debugging
options enabled, I get this:

[ 1400.450124] BUG: unable to handle kernel paging request at 000000046474e552
[ 1400.458005] IP: 0x46474e552
[ 1400.461231] PGD 0 P4D 0 
[ 1400.464150] Oops: 0010 [#1] PREEMPT SMP
[ 1400.468525] Modules linked in: cls_bpf sch_ingress algif_hash af_alg netdevsim rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace f3
[ 1400.516951] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.15.0-rc3-perf-00918-g129c9981a55f #918
[ 1400.526678] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
[ 1400.535150] RIP: 0010:0x46474e552
[ 1400.538941] RSP: 0018:ffff9f736f083f08 EFLAGS: 00010216
[ 1400.544870] RAX: ffff9f736b4771b8 RBX: ffff9f736f09b880 RCX: ffff9f736b4771b8
[ 1400.552935] RDX: 000000046474e552 RSI: ffff9f736f083f18 RDI: ffff9f736b4771b8
[ 1400.561001] RBP: ffffffff8bc4a740 R08: ffff9f736b4771b8 R09: 0000000000000000
[ 1400.569066] R10: ffff9f736f083d90 R11: 0000000000000000 R12: ffff9f736f09b8b8
[ 1400.577132] R13: 000000000000000a R14: 7fffffffffffffff R15: 0000000000000202
[ 1400.585197] FS:  0000000000000000(0000) GS:ffff9f736f080000(0000) knlGS:0000000000000000
[ 1400.594349] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1400.600859] CR2: 000000046474e552 CR3: 0000000839c09001 CR4: 00000000003606e0
[ 1400.608917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1400.616982] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1400.625048] Call Trace:
[ 1400.627868]  <IRQ>
[ 1400.630207]  ? rcu_process_callbacks+0x1a0/0x4d0
[ 1400.635458]  ? __do_softirq+0xd1/0x30a
[ 1400.639739]  ? irq_exit+0xae/0xb0
[ 1400.643532]  ? smp_apic_timer_interrupt+0x60/0x140
[ 1400.648977]  ? apic_timer_interrupt+0x8c/0xa0
[ 1400.653934]  </IRQ>
[ 1400.656370]  ? cpuidle_enter_state+0xb0/0x2f0
[ 1400.661328]  ? cpuidle_enter_state+0x8d/0x2f0
[ 1400.666287]  ? do_idle+0x17b/0x1d0
[ 1400.670167]  ? cpu_startup_entry+0x5f/0x70
[ 1400.674836]  ? start_secondary+0x169/0x190
[ 1400.679504]  ? secondary_startup_64+0xa5/0xb0
[ 1400.684466] Code:  Bad RIP value.
[ 1400.688259] RIP: 0x46474e552 RSP: ffff9f736f083f08
[ 1400.693703] CR2: 000000046474e552
[ 1400.697501] ---[ end trace fab2c0fb826644df ]---
[ 1400.708442] Kernel panic - not syncing: Fatal exception in interrupt
[ 1400.715693] Kernel Offset: 0xa000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1400.732994] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

Unfortunately, reproducing the crash on an instrumented kernel seems to
be difficult.

I managed to gather this:

[   26.157415] ------------[ cut here ]------------
[   26.162670] ODEBUG: free active (active state 1) object type: rcu_head hint:           (null)
[   26.172361] WARNING: CPU: 19 PID: 1352 at ../lib/debugobjects.c:291 debug_print_object+0x64/0x80
[   26.182288] Modules linked in: cls_bpf sch_ingress algif_hash af_alg netdevsim rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace f3
[   26.230728] CPU: 19 PID: 1352 Comm: tc Not tainted 4.15.0-rc3-perf-00918-g129c9981a55f #4
[   26.239977] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
[   26.248453] RIP: 0010:debug_print_object+0x64/0x80
[   26.253896] RSP: 0018:ffffb7340410fa00 EFLAGS: 00010086
[   26.259825] RAX: 0000000000000051 RBX: ffff8f1f6b7cc5a0 RCX: 0000000000000006
[   26.267892] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff8f1f6f48cdd0
[   26.275959] RBP: ffffffffb3c48600 R08: 0000000000000000 R09: 00000000000005f2
[   26.284042] R10: 000000000000001e R11: ffffffffb41c35ad R12: ffffffffb3a1d101
[   26.292125] R13: ffff8f1f6b7cc5a0 R14: ffffffffb423a8b8 R15: 0000000000000001
[   26.300194] FS:  00007f64d4956700(0000) GS:ffff8f1f6f480000(0000) knlGS:0000000000000000
[   26.309346] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   26.315859] CR2: 0000000001cbc498 CR3: 000000086a8a2004 CR4: 00000000003606e0
[   26.323925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   26.331994] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   26.331994] Call Trace:
[   26.331998]  debug_check_no_obj_freed+0x1e6/0x220
[   26.332020]  ? qdisc_graft+0x14f/0x450
[   26.332025]  kfree+0x14d/0x1b0
[   26.332027]  qdisc_graft+0x14f/0x450
[   26.332029]  tc_get_qdisc+0x12f/0x200
[   26.332035]  rtnetlink_rcv_msg+0x122/0x310
[   26.332039]  ? __skb_try_recv_datagram+0xef/0x150
[   26.332040]  ? __kmalloc_node_track_caller+0x205/0x2b0
[   26.332042]  ? rtnl_calcit.isra.12+0x100/0x100
[   26.332044]  netlink_rcv_skb+0x8d/0x130
[   26.332046]  netlink_unicast+0x16a/0x210
[   26.332048]  netlink_sendmsg+0x32a/0x370
[   26.332054]  sock_sendmsg+0x2d/0x40
[   26.332056]  ___sys_sendmsg+0x298/0x2e0
[   26.332061]  ? mem_cgroup_commit_charge+0x7a/0x540
[   26.332062]  ? mem_cgroup_try_charge+0x8e/0x1d0
[   26.332066]  ? __handle_mm_fault+0x3a1/0x1190
[   26.332068]  ? __sys_sendmsg+0x41/0x70
[   26.332069]  __sys_sendmsg+0x41/0x70
[   26.332074]  entry_SYSCALL_64_fastpath+0x1e/0x81
[   26.332076] RIP: 0033:0x7f64d3b53450
[   26.332076] RSP: 002b:00007fffb5ea4388 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[   26.332077] RAX: ffffffffffffffda RBX: 00007f64d3e0fb20 RCX: 00007f64d3b53450
[   26.332078] RDX: 0000000000000000 RSI: 00007fffb5ea43e0 RDI: 0000000000000003
[   26.332078] RBP: 0000000000000a11 R08: 0000000000000000 R09: 000000000000000f
[   26.332079] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007f64d3e0fb78
[   26.332079] R13: 00007f64d3e0fb78 R14: 000000000000270f R15: 00007f64d3e0fb78
[   26.332081] Code: c1 83 c2 01 8b 4b 14 4c 8b 45 00 89 15 f6 d0 e5 00 8b 53 10 4c 89 e6 48 c7 c7 38 7c a3 b3 48 8b 14 d5 80 3d 85 b 
[   26.332097] ---[ end trace bd33b199ae76ad43 ]---


* [PATCH v3,net-next] ip6_gre: fix a potential issue in ip6erspan_rcv
From: Haishuang Yan @ 2017-12-20  1:53 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI
  Cc: netdev, linux-kernel, Haishuang Yan, William Tu

pskb_may_pull() can change skb->data, so ipv6h and ershdr must be loaded
only after the pull has succeeded.

Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Cc: William Tu <u9012063@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>

---
Changes since v3:
  * Rebase on latest master branch.
  * Fix wrong commit information.
---
 net/ipv6/ip6_gre.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 87b9892..9bd1103 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -507,12 +507,11 @@ static int ip6erspan_rcv(struct sk_buff *skb, int gre_hdr_len,
 	struct ip6_tnl *tunnel;
 	u8 ver;
 
-	ipv6h = ipv6_hdr(skb);
-	ershdr = (struct erspan_base_hdr *)skb->data;
-
 	if (unlikely(!pskb_may_pull(skb, sizeof(*ershdr))))
 		return PACKET_REJECT;
 
+	ipv6h = ipv6_hdr(skb);
+	ershdr = (struct erspan_base_hdr *)skb->data;
 	ver = (ntohs(ershdr->ver_vlan) & VER_MASK) >> VER_OFFSET;
 	tpi->key = cpu_to_be32(ntohs(ershdr->session_id) & ID_MASK);
 
-- 
1.8.3.1
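The hazard behind this fix is that any pointer into skb->data taken before pskb_may_pull() may be stale afterwards. A user-space sketch, assuming a toy buffer type: `struct fake_skb`, `fake_may_pull()` and `parse_erspan_base()` are hypothetical, the 4-byte pull assumes the base header is two 16-bit fields, and the masks (version in the top 4 bits of the first word, 10-bit session ID in the second) are our reading of the `VER_MASK`/`ID_MASK` layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an sk_buff: `data` holds the bytes that are currently
 * linear, `src` holds the whole packet. */
struct fake_skb {
	unsigned char *data;
	size_t linear;
	const unsigned char *src;
	size_t total;
};

/* Like pskb_may_pull(): make `need` bytes linear. It may replace
 * skb->data, so pointers taken before the call can become stale. */
static bool fake_may_pull(struct fake_skb *skb, size_t need)
{
	unsigned char *buf;

	if (need <= skb->linear)
		return true;
	if (need > skb->total)
		return false;
	buf = malloc(need);
	if (!buf)
		return false;
	memcpy(buf, skb->src, need);
	free(skb->data);
	skb->data = buf;	/* the old buffer is gone */
	skb->linear = need;
	return true;
}

/* Pull first, then take the header pointer, as the fix does.
 * Returns the session ID, or -1 if the packet is too short. */
static int parse_erspan_base(struct fake_skb *skb, unsigned *ver)
{
	const unsigned char *hdr;

	if (!fake_may_pull(skb, 4))
		return -1;
	hdr = skb->data;	/* loaded only after a successful pull */
	*ver = ((unsigned)(hdr[0] << 8 | hdr[1]) & 0xf000) >> 12;
	return (hdr[2] << 8 | hdr[3]) & 0x3ff;
}
```

Taking `hdr = skb->data` before `fake_may_pull()` would leave `hdr` pointing at freed memory whenever the pull has to rebuild the linear area.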


* Re: [PATCH v3 iproute2 net-next] erspan: add erspan version II support
From: William Tu @ 2017-12-20  1:51 UTC (permalink / raw)
  To: David Ahern; +Cc: Linux Kernel Network Developers
In-Reply-To: <8eb4e84f-2218-0c96-ece6-2b1008f2da2f@gmail.com>

On Tue, Dec 19, 2017 at 5:28 PM, David Ahern <dsahern@gmail.com> wrote:
> Hi William:
>
> On 12/19/17 6:08 PM, William Tu wrote:
>> @@ -343,6 +355,26 @@ get_failed:
>>                               invarg("invalid erspan index\n", *argv);
>>                       if (erspan_idx & ~((1<<20) - 1) || erspan_idx == 0)
>>                               invarg("erspan index must be > 0 and <= 20-bit\n", *argv);
>> +             } else if (strcmp(*argv, "erspan_ver") == 0) {
>> +                     NEXT_ARG();
>> +                     if (get_u8(&erspan_ver, *argv, 0))
>> +                             invarg("invalid erspan version\n", *argv);
>> +                     if (erspan_ver != 1 && erspan_ver != 2)
>> +                             invarg("erspan version must be 1 or 2\n", *argv);
>> +             } else if (strcmp(*argv, "erspan_dir") == 0) {
>> +                     NEXT_ARG();
>> +                     if (strcmp(*argv, "ingress") == 0 ||
>> +                         strcmp(*argv, "in") == 0)
>> +                             erspan_dir = 0;
>> +                     else if (strcmp(*argv, "egress") == 0 ||
>> +                              strcmp(*argv, "e") == 0)
>
> iproute2 has a matches() function that should be used -- it basically
> allows whatever shorthand notation matches -- in this case e, eg, egr,
> egre, egres, egress all match. Check out ip/iplink.c and search for matches.
>
Hi David,
Thanks, will fix it in next version.
William
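The prefix matching David describes can be sketched as follows. `prefix_matches()` and `parse_erspan_dir()` are hypothetical names written for illustration, not iproute2's own code; they follow the same argument order and return convention as the `matches(*argv, "ingress") == 0` calls in the v4 patch (0 means the first argument is a non-empty prefix of the second).

```c
#include <string.h>

/* Returns 0 when `prefix` is a non-empty prefix of `string`,
 * non-zero otherwise. */
static int prefix_matches(const char *prefix, const char *string)
{
	size_t n = strlen(prefix);

	if (n == 0)
		return 1;	/* an empty argument matches nothing */
	return strncmp(prefix, string, n) != 0;
}

/* Parser for the erspan_dir argument: 0 = ingress, 1 = egress,
 * -1 = invalid. */
static int parse_erspan_dir(const char *arg)
{
	if (prefix_matches(arg, "ingress") == 0)
		return 0;
	if (prefix_matches(arg, "egress") == 0)
		return 1;
	return -1;
}
```

With this convention `erspan_dir e`, `erspan_dir eg` and `erspan_dir egress` all select egress, while an empty or over-long argument such as `egresss` is rejected.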


* Re: [v2 PATCH -tip 1/6] net: tcp: Add trace events for TCP congestion window tracing
From: kbuild test robot @ 2017-12-20  1:44 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: kbuild-all, Ingo Molnar, Ian McDonald, Vlad Yasevich,
	Stephen Hemminger, Steven Rostedt, Peter Zijlstra,
	Thomas Gleixner, LKML, H . Peter Anvin, Gerrit Renker,
	David S . Miller, Neil Horman, dccp, netdev, linux-sctp,
	Stephen Rothwell, mhiramat
In-Reply-To: <151358467535.28850.8937168919346099524.stgit@devbox>

[-- Attachment #1: Type: text/plain, Size: 11009 bytes --]

Hi Masami,

I love your patch! Yet something to improve:

[auto build test ERROR on net/master]
[also build test ERROR on v4.15-rc4 next-20171219]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Masami-Hiramatsu/net-tcp-sctp-dccp-Replace-jprobe-usage-with-trace-events/20171220-081035
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sh 

All error/warnings (new ones prefixed by >>):

   In file included from include/trace/events/udp.h:9:0,
                    from net//core/net-traces.c:35:
>> include/trace/events/tcp.h:37:11: error: expected ')' before 'const'
     TP_PROTO(const struct sock *sk, const struct sk_buff *skb),
              ^
   include/linux/tracepoint.h:105:27: note: in definition of macro 'TP_PROTO'
    #define TP_PROTO(args...) args
                              ^~~~
>> include/linux/tracepoint.h:237:20: error: redefinition of '__tpstrtab_tcp_retransmit_skb'
     static const char __tpstrtab_##name[]     \
                       ^
>> include/linux/tracepoint.h:247:2: note: in expansion of macro 'DEFINE_TRACE_FN'
     DEFINE_TRACE_FN(name, NULL, NULL);
     ^~~~~~~~~~~~~~~
>> include/trace/define_trace.h:51:2: note: in expansion of macro 'DEFINE_TRACE'
     DEFINE_TRACE(name)
     ^~~~~~~~~~~~
>> include/trace/events/tcp.h:90:1: note: in expansion of macro 'DEFINE_EVENT'
    DEFINE_EVENT(tcp_event_sk_skb, tcp_retransmit_skb,
    ^~~~~~~~~~~~
   In file included from include/trace/events/tcp.h:10:0,
                    from net//core/net-traces.c:34:
   include/linux/tracepoint.h:237:20: note: previous definition of '__tpstrtab_tcp_retransmit_skb' was here
     static const char __tpstrtab_##name[]     \
                       ^
>> include/linux/tracepoint.h:247:2: note: in expansion of macro 'DEFINE_TRACE_FN'
     DEFINE_TRACE_FN(name, NULL, NULL);
     ^~~~~~~~~~~~~~~
>> include/trace/define_trace.h:51:2: note: in expansion of macro 'DEFINE_TRACE'
     DEFINE_TRACE(name)
     ^~~~~~~~~~~~
>> include/trace/events/tcp.h:90:1: note: in expansion of macro 'DEFINE_EVENT'
    DEFINE_EVENT(tcp_event_sk_skb, tcp_retransmit_skb,
    ^~~~~~~~~~~~
   In file included from include/trace/events/udp.h:9:0,
                    from net//core/net-traces.c:35:
   include/linux/tracepoint.h:239:20: error: redefinition of '__tracepoint_tcp_retransmit_skb'
     struct tracepoint __tracepoint_##name     \
                       ^
>> include/linux/tracepoint.h:247:2: note: in expansion of macro 'DEFINE_TRACE_FN'
     DEFINE_TRACE_FN(name, NULL, NULL);
     ^~~~~~~~~~~~~~~
>> include/trace/define_trace.h:51:2: note: in expansion of macro 'DEFINE_TRACE'
     DEFINE_TRACE(name)
     ^~~~~~~~~~~~
>> include/trace/events/tcp.h:90:1: note: in expansion of macro 'DEFINE_EVENT'
    DEFINE_EVENT(tcp_event_sk_skb, tcp_retransmit_skb,
    ^~~~~~~~~~~~
   In file included from include/trace/events/tcp.h:10:0,
                    from net//core/net-traces.c:34:
   include/linux/tracepoint.h:239:20: note: previous definition of '__tracepoint_tcp_retransmit_skb' was here
     struct tracepoint __tracepoint_##name     \
                       ^
>> include/linux/tracepoint.h:247:2: note: in expansion of macro 'DEFINE_TRACE_FN'
     DEFINE_TRACE_FN(name, NULL, NULL);
     ^~~~~~~~~~~~~~~
>> include/trace/define_trace.h:51:2: note: in expansion of macro 'DEFINE_TRACE'
     DEFINE_TRACE(name)
     ^~~~~~~~~~~~
>> include/trace/events/tcp.h:90:1: note: in expansion of macro 'DEFINE_EVENT'
    DEFINE_EVENT(tcp_event_sk_skb, tcp_retransmit_skb,
    ^~~~~~~~~~~~
   In file included from include/trace/events/udp.h:9:0,
                    from net//core/net-traces.c:35:
>> include/linux/tracepoint.h:242:35: error: redefinition of '__tracepoint_ptr_tcp_retransmit_skb'
     static struct tracepoint * const __tracepoint_ptr_##name __used  \
                                      ^
>> include/linux/tracepoint.h:247:2: note: in expansion of macro 'DEFINE_TRACE_FN'
     DEFINE_TRACE_FN(name, NULL, NULL);
     ^~~~~~~~~~~~~~~
>> include/trace/define_trace.h:51:2: note: in expansion of macro 'DEFINE_TRACE'
     DEFINE_TRACE(name)
     ^~~~~~~~~~~~
>> include/trace/events/tcp.h:90:1: note: in expansion of macro 'DEFINE_EVENT'
    DEFINE_EVENT(tcp_event_sk_skb, tcp_retransmit_skb,
    ^~~~~~~~~~~~
   In file included from include/trace/events/tcp.h:10:0,
                    from net//core/net-traces.c:34:
   include/linux/tracepoint.h:242:35: note: previous definition of '__tracepoint_ptr_tcp_retransmit_skb' was here
     static struct tracepoint * const __tracepoint_ptr_##name __used  \
                                      ^
>> include/linux/tracepoint.h:247:2: note: in expansion of macro 'DEFINE_TRACE_FN'
     DEFINE_TRACE_FN(name, NULL, NULL);
     ^~~~~~~~~~~~~~~
>> include/trace/define_trace.h:51:2: note: in expansion of macro 'DEFINE_TRACE'
     DEFINE_TRACE(name)
     ^~~~~~~~~~~~

vim +37 include/trace/events/tcp.h

e086101b Cong Wang   2017-10-13  12  
e8fce239 Song Liu    2017-10-23  13  #define tcp_state_name(state)	{ state, #state }
e8fce239 Song Liu    2017-10-23  14  #define show_tcp_state_name(val)			\
e8fce239 Song Liu    2017-10-23  15  	__print_symbolic(val,				\
e8fce239 Song Liu    2017-10-23  16  		tcp_state_name(TCP_ESTABLISHED),	\
e8fce239 Song Liu    2017-10-23  17  		tcp_state_name(TCP_SYN_SENT),		\
e8fce239 Song Liu    2017-10-23  18  		tcp_state_name(TCP_SYN_RECV),		\
e8fce239 Song Liu    2017-10-23  19  		tcp_state_name(TCP_FIN_WAIT1),		\
e8fce239 Song Liu    2017-10-23  20  		tcp_state_name(TCP_FIN_WAIT2),		\
e8fce239 Song Liu    2017-10-23  21  		tcp_state_name(TCP_TIME_WAIT),		\
e8fce239 Song Liu    2017-10-23  22  		tcp_state_name(TCP_CLOSE),		\
e8fce239 Song Liu    2017-10-23  23  		tcp_state_name(TCP_CLOSE_WAIT),		\
e8fce239 Song Liu    2017-10-23  24  		tcp_state_name(TCP_LAST_ACK),		\
e8fce239 Song Liu    2017-10-23  25  		tcp_state_name(TCP_LISTEN),		\
e8fce239 Song Liu    2017-10-23  26  		tcp_state_name(TCP_CLOSING),		\
e8fce239 Song Liu    2017-10-23  27  		tcp_state_name(TCP_NEW_SYN_RECV))
e8fce239 Song Liu    2017-10-23  28  
f6e37b25 Song Liu    2017-10-23  29  /*
f6e37b25 Song Liu    2017-10-23  30   * tcp event with arguments sk and skb
f6e37b25 Song Liu    2017-10-23  31   *
f6e37b25 Song Liu    2017-10-23  32   * Note: this class requires a valid sk pointer; while skb pointer could
f6e37b25 Song Liu    2017-10-23  33   *       be NULL.
f6e37b25 Song Liu    2017-10-23  34   */
f6e37b25 Song Liu    2017-10-23 @35  DECLARE_EVENT_CLASS(tcp_event_sk_skb,
e086101b Cong Wang   2017-10-13  36  
7344e29f Song Liu    2017-10-23 @37  	TP_PROTO(const struct sock *sk, const struct sk_buff *skb),
e086101b Cong Wang   2017-10-13  38  
e086101b Cong Wang   2017-10-13  39  	TP_ARGS(sk, skb),
e086101b Cong Wang   2017-10-13  40  
e086101b Cong Wang   2017-10-13  41  	TP_STRUCT__entry(
7344e29f Song Liu    2017-10-23  42  		__field(const void *, skbaddr)
7344e29f Song Liu    2017-10-23  43  		__field(const void *, skaddr)
e086101b Cong Wang   2017-10-13  44  		__field(__u16, sport)
e086101b Cong Wang   2017-10-13  45  		__field(__u16, dport)
e086101b Cong Wang   2017-10-13  46  		__array(__u8, saddr, 4)
e086101b Cong Wang   2017-10-13  47  		__array(__u8, daddr, 4)
e086101b Cong Wang   2017-10-13  48  		__array(__u8, saddr_v6, 16)
e086101b Cong Wang   2017-10-13  49  		__array(__u8, daddr_v6, 16)
e086101b Cong Wang   2017-10-13  50  	),
e086101b Cong Wang   2017-10-13  51  
e086101b Cong Wang   2017-10-13  52  	TP_fast_assign(
e086101b Cong Wang   2017-10-13  53  		struct inet_sock *inet = inet_sk(sk);
e086101b Cong Wang   2017-10-13  54  		struct in6_addr *pin6;
e086101b Cong Wang   2017-10-13  55  		__be32 *p32;
e086101b Cong Wang   2017-10-13  56  
e086101b Cong Wang   2017-10-13  57  		__entry->skbaddr = skb;
e086101b Cong Wang   2017-10-13  58  		__entry->skaddr = sk;
e086101b Cong Wang   2017-10-13  59  
e086101b Cong Wang   2017-10-13  60  		__entry->sport = ntohs(inet->inet_sport);
e086101b Cong Wang   2017-10-13  61  		__entry->dport = ntohs(inet->inet_dport);
e086101b Cong Wang   2017-10-13  62  
e086101b Cong Wang   2017-10-13  63  		p32 = (__be32 *) __entry->saddr;
e086101b Cong Wang   2017-10-13  64  		*p32 = inet->inet_saddr;
e086101b Cong Wang   2017-10-13  65  
e086101b Cong Wang   2017-10-13  66  		p32 = (__be32 *) __entry->daddr;
e086101b Cong Wang   2017-10-13  67  		*p32 =  inet->inet_daddr;
e086101b Cong Wang   2017-10-13  68  
89005678 David Ahern 2017-10-18  69  #if IS_ENABLED(CONFIG_IPV6)
89005678 David Ahern 2017-10-18  70  		if (sk->sk_family == AF_INET6) {
e086101b Cong Wang   2017-10-13  71  			pin6 = (struct in6_addr *)__entry->saddr_v6;
386fd5da David Ahern 2017-10-16  72  			*pin6 = sk->sk_v6_rcv_saddr;
e086101b Cong Wang   2017-10-13  73  			pin6 = (struct in6_addr *)__entry->daddr_v6;
386fd5da David Ahern 2017-10-16  74  			*pin6 = sk->sk_v6_daddr;
89005678 David Ahern 2017-10-18  75  		} else
89005678 David Ahern 2017-10-18  76  #endif
89005678 David Ahern 2017-10-18  77  		{
e086101b Cong Wang   2017-10-13  78  			pin6 = (struct in6_addr *)__entry->saddr_v6;
e086101b Cong Wang   2017-10-13  79  			ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
e086101b Cong Wang   2017-10-13  80  			pin6 = (struct in6_addr *)__entry->daddr_v6;
e086101b Cong Wang   2017-10-13  81  			ipv6_addr_set_v4mapped(inet->inet_daddr, pin6);
e086101b Cong Wang   2017-10-13  82  		}
e086101b Cong Wang   2017-10-13  83  	),
e086101b Cong Wang   2017-10-13  84  
fb6ff75e David Ahern 2017-10-16  85  	TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c",
e086101b Cong Wang   2017-10-13  86  		  __entry->sport, __entry->dport, __entry->saddr, __entry->daddr,
e086101b Cong Wang   2017-10-13  87  		  __entry->saddr_v6, __entry->daddr_v6)
e086101b Cong Wang   2017-10-13  88  );
e086101b Cong Wang   2017-10-13  89  
f6e37b25 Song Liu    2017-10-23 @90  DEFINE_EVENT(tcp_event_sk_skb, tcp_retransmit_skb,
f6e37b25 Song Liu    2017-10-23  91  
7344e29f Song Liu    2017-10-23  92  	TP_PROTO(const struct sock *sk, const struct sk_buff *skb),
f6e37b25 Song Liu    2017-10-23  93  
f6e37b25 Song Liu    2017-10-23  94  	TP_ARGS(sk, skb)
f6e37b25 Song Liu    2017-10-23  95  );
f6e37b25 Song Liu    2017-10-23  96  

:::::: The code at line 37 was first introduced by commit
:::::: 7344e29f285a94b965075599731811c352f3ab40 tcp: mark trace event arguments sk and skb as const

:::::: TO: Song Liu <songliubraving@fb.com>
:::::: CC: David S. Miller <davem@davemloft.net>
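For reference, the non-AF_INET6 branch of TP_fast_assign in the listing above fills saddr_v6/daddr_v6 with IPv4-mapped addresses (::ffff:a.b.c.d) via ipv6_addr_set_v4mapped(). What follows is a userspace sketch of that byte layout, not kernel code. set_v4mapped() is a hypothetical stand-in for the kernel helper, and struct in6_addr comes from the libc headers.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's ipv6_addr_set_v4mapped():
 * builds ::ffff:a.b.c.d from an IPv4 address that is already in network
 * byte order (as inet->inet_saddr is). */
static void set_v4mapped(uint32_t be_addr, struct in6_addr *out)
{
	assert(out != NULL);
	memset(out->s6_addr, 0, 10);            /* bytes 0..9: zero     */
	out->s6_addr[10] = 0xff;                /* bytes 10..11: 0xffff */
	out->s6_addr[11] = 0xff;
	memcpy(&out->s6_addr[12], &be_addr, 4); /* bytes 12..15: IPv4   */
}
```

For example, set_v4mapped(htonl(0xC0000201), &a6) fills a6 with ten zero bytes, then ff ff, then c0 00 02 01 (192.0.2.1).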

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 47602 bytes --]

^ permalink raw reply

* RE: [PATCH net-next] netdevsim: correctly check return value of debugfs_create_dir
From: Prashant Bhole @ 2017-12-20  1:40 UTC (permalink / raw)
  To: 'Jakub Kicinski'; +Cc: 'David Miller', netdev
In-Reply-To: <20171219171806.15fe3765@cakuba.netronome.com>

> From: Jakub Kicinski [mailto:jakub.kicinski@netronome.com]
> 
> On Wed, 20 Dec 2017 09:54:59 +0900, Prashant Bhole wrote:
> > > Ah, I would just error out in case we can't create any of the
> > > sub-directories as well.
> >
> > Does that mean fatal error if we can't create any of the subdirectories?
> 
> Yes.

OK. In that case there is no need for a condition before creating files. I will
submit v2.
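The agreed approach, treating a failure to create any debugfs sub-directory as fatal, can be sketched in userspace as follows. This is a model, not netdevsim code: it assumes debugfs_create_dir() of this period returns NULL on failure and an ERR_PTR() value when debugfs is disabled. ERR_PTR(), IS_ERR_OR_NULL(), fake_create_dir() and sim_debugfs_init() are local stand-ins, and cleanup of already-created directories (e.g. with debugfs_remove_recursive()) is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline int IS_ERR_OR_NULL(const void *p)
{
	return p == NULL || (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical directory objects and creator; fail_at picks the call that fails. */
struct dentry { int unused; };
static struct dentry dirs[3];   /* [0] = device dir, [1], [2] = sub-directories */
static int fail_at = -1;        /* -1: nothing fails */
static int fail_errptr;         /* 1: fail with ERR_PTR(-ENODEV), 0: fail with NULL */

static struct dentry *fake_create_dir(int idx)
{
	if (idx == fail_at)
		return fail_errptr ? ERR_PTR(-ENODEV) : NULL;
	return &dirs[idx];
}

/* Create the device dir and its sub-directories; any failure is fatal. */
static int sim_debugfs_init(void)
{
	for (int i = 0; i < 3; i++) {
		if (IS_ERR_OR_NULL(fake_create_dir(i)))
			return -ENOMEM; /* errno chosen for illustration */
	}
	return 0;
}
```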

-Prashant

^ permalink raw reply

* Re: [PATCH bpf 11/11] bpf: add selftest for tcpbpf
From: Alexei Starovoitov @ 2017-12-20  1:34 UTC (permalink / raw)
  To: Lawrence Brakmo, netdev; +Cc: Kernel Team, Blake Matheny, Daniel Borkmann
In-Reply-To: <20171219062200.372711-12-brakmo@fb.com>

On 12/18/17 10:22 PM, Lawrence Brakmo wrote:
> -	sockmap_verdict_prog.o dev_cgroup.o sample_ret0.o test_tracepoint.o
> +	sockmap_verdict_prog.o dev_cgroup.o sample_ret0.o test_tracepoint.o \
> +	test_tcpbpf_kern.o

It won't apply. Please base patches on the bpf-next tree.

> +#!/usr/local/bin/python
> +#
> +# Copyright (c) 2017 Facebook
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of version 2 of the GNU General Public
> +# License as published by the Free Software Foundation.

the license should be in SPDX format.

> +++ b/tools/testing/selftests/bpf/test_tcpbpf_kern.c
> @@ -0,0 +1,133 @@
> +/* Copyright (c) 2017 Facebook
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */

same here.

> +		case BPF_SOCK_OPS_STATE_CB:
> +			if (skops->args[1] == 7) {
> +				__u32 key = 0;
> +				struct globals g, *gp;
> +
> +				gp = bpf_map_lookup_elem(&global_map, &key);
> +				if (gp == NULL) {
> +				} else {
> +					g = *gp;
> +					g.total_retrans = skops->total_retrans;
> +					g.data_segs_in = skops->data_segs_in;

you can reduce indent by doing
if (!gp)
   break;
g = *gp;
g.total_retrans = skops->total_retrans;
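The early-exit suggestion can be illustrated with a plain C model of the switch case, applying the same early break to the state check. The structs, OP_STATE_CB and handle() are hypothetical stand-ins for the selftest's types and BPF context; gp plays the role of the bpf_map_lookup_elem() result, and 7 is assumed to be TCP_CLOSE in the kernel's TCP state numbering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the selftest's map value and sock_ops context. */
struct globals {
	uint32_t total_retrans;
	uint32_t data_segs_in;
};

struct ops_ctx {
	int op;                 /* models skops->op */
	uint32_t state;         /* models skops->args[1] (new TCP state) */
	uint32_t total_retrans;
	uint32_t data_segs_in;
};

#define OP_STATE_CB 1   /* hypothetical; not the real BPF_SOCK_OPS_STATE_CB value */
#define STATE_CLOSE 7   /* TCP_CLOSE in the kernel's TCP state numbering */

/* gp models the bpf_map_lookup_elem() result, which may be NULL. */
static void handle(const struct ops_ctx *ctx, struct globals *gp)
{
	struct globals g;

	switch (ctx->op) {
	case OP_STATE_CB:
		if (ctx->state != STATE_CLOSE)
			break;
		if (!gp)        /* early exit instead of an empty if-branch */
			break;
		g = *gp;
		g.total_retrans = ctx->total_retrans;
		g.data_segs_in = ctx->data_segs_in;
		*gp = g;        /* this model copies the result straight back */
		break;
	default:
		break;
	}
}
```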

> +++ b/tools/testing/selftests/bpf/test_tcpbpf_user.c
> @@ -0,0 +1,119 @@
> +/* Copyright (c) 2017 Facebook
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <stdio.h>
> +#include <unistd.h>
> +#include <errno.h>
> +#include <signal.h>
> +#include <string.h>
> +#include <assert.h>
> +#include <linux/perf_event.h>
> +#include <linux/ptrace.h>
> +#include <linux/bpf.h>
> +#include <sys/ioctl.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <fcntl.h>
> +#include <bpf/bpf.h>
> +#include <bpf/libbpf.h>
> +//#include "bpf_load.h"

Please remove leftover comments.

^ permalink raw reply

* Re: [PATCH v2,net-next 1/2] ip_gre: fix potential memory leak in erspan_rcv
From: Haishuang Yan @ 2017-12-20  1:33 UTC (permalink / raw)
  To: David Miller; +Cc: kuznet, yoshfuji, netdev, linux-kernel, u9012063
In-Reply-To: <20171219.103633.721139612524381957.davem@davemloft.net>



> On Dec 19, 2017, at 11:36 PM, David Miller <davem@davemloft.net> wrote:
> 
> From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
> Date: Sat, 16 Dec 2017 10:48:38 +0800
> 
>> If md is NULL, tun_dst must be freed, otherwise it will cause memory
>> leak.
>> 
>> Fixes: 1a66a836da6 ("gre: add collect_md mode to ERSPAN tunnel")
>> Cc: William Tu <u9012063@gmail.com>
>> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
>> 
>> Change since v2:
>>  * Rebase on latest master branch.
>>  * Correct wrong fix information.
> 
> Please do not put a changelog after the fixes and signoff tags, those tags must
> appear last in the commit message.
> 
> Thank you.
> 

Okay, I will resubmit another commit, thanks.

^ permalink raw reply

* Re: [PATCH v3 iproute2 net-next] erspan: add erspan version II support
From: David Ahern @ 2017-12-20  1:28 UTC (permalink / raw)
  To: William Tu, netdev
In-Reply-To: <1513732121-115044-1-git-send-email-u9012063@gmail.com>

Hi William:

On 12/19/17 6:08 PM, William Tu wrote:
> @@ -343,6 +355,26 @@ get_failed:
>  				invarg("invalid erspan index\n", *argv);
>  			if (erspan_idx & ~((1<<20) - 1) || erspan_idx == 0)
>  				invarg("erspan index must be > 0 and <= 20-bit\n", *argv);
> +		} else if (strcmp(*argv, "erspan_ver") == 0) {
> +			NEXT_ARG();
> +			if (get_u8(&erspan_ver, *argv, 0))
> +				invarg("invalid erspan version\n", *argv);
> +			if (erspan_ver != 1 && erspan_ver != 2)
> +				invarg("erspan version must be 1 or 2\n", *argv);
> +		} else if (strcmp(*argv, "erspan_dir") == 0) {
> +			NEXT_ARG();
> +			if (strcmp(*argv, "ingress") == 0 ||
> +			    strcmp(*argv, "in") == 0)
> +				erspan_dir = 0;
> +			else if (strcmp(*argv, "egress") == 0 ||
> +				 strcmp(*argv, "e") == 0)

iproute2 has a matches() function that should be used -- it basically
allows any prefix of the keyword as shorthand -- in this case e, eg, egr,
egre, egres, egress all match. Check out ip/iplink.c and search for matches.
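A sketch of the prefix matching described above, assuming the semantics of iproute2's matches() helper in lib/utils.c at the time (it returns 0 when the typed word is a prefix of the keyword). prefix_matches() and parse_erspan_dir() are local names for this illustration; the real parser would call matches() directly and report errors with invarg().

```c
#include <assert.h>
#include <string.h>

/* Assumed to mirror iproute2's matches(): returns 0 when cmd is a
 * (possibly full-length) prefix of pattern, nonzero otherwise. */
static int prefix_matches(const char *cmd, const char *pattern)
{
	size_t len = strlen(cmd);

	if (len > strlen(pattern))
		return -1;
	return memcmp(pattern, cmd, len);
}

/* Hypothetical helper: 0 = ingress, 1 = egress, -1 = invalid input. */
static int parse_erspan_dir(const char *arg)
{
	if (*arg == '\0')       /* an empty word would prefix-match anything */
		return -1;
	if (prefix_matches(arg, "ingress") == 0)
		return 0;
	if (prefix_matches(arg, "egress") == 0)
		return 1;
	return -1;
}
```

Because "ingress" and "egress" start with different letters, even a single character is unambiguous here.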

^ permalink raw reply

* Re: [PATCH v2,net-next] ip6_gre: fix a potential issue in ip6erspan_rcv
From: Haishuang Yan @ 2017-12-20  1:27 UTC (permalink / raw)
  To: David Miller; +Cc: kuznet, yoshfuji, netdev, linux-kernel, u9012063
In-Reply-To: <20171219.103459.1958757813714459905.davem@davemloft.net>



> On Dec 19, 2017, at 11:34 PM, David Miller <davem@davemloft.net> wrote:
> 
> From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
> Date: Sat, 16 Dec 2017 10:25:25 +0800
> 
>> pskb_may_pull() can change skb->data, so we need to load ipv6h/ershdr at
>> the right place.
>> 
>> Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
>> Acked-by: William Tu <u9012063@gmail.com>
>> Cc: William Tu <u9012063@gmail.com>
>> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
> 
> This patch does not apply:
> 
>> +	ipv6h = ipv6_hdr(skb);
>> +	ershdr = (struct erspan_base_hdr *)skb->data;
>> 	ver = (ntohs(ershdr->ver_vlan) & VER_MASK) >> VER_OFFSET;
>> 	tpi->key = cpu_to_be32(ntohs(ershdr->session_id) & ID_MASK);
>> 	pkt_md = (struct erspan_metadata *)(ershdr + 1);
> 
> There is no "pkt_md = ..." assignment in net-next at this line.
> 

Okay, I will fix it and resubmit another commit, thanks.

^ permalink raw reply

* Re: [PATCH net-next] netdevsim: correctly check return value of debugfs_create_dir
From: Jakub Kicinski @ 2017-12-20  1:18 UTC (permalink / raw)
  To: Prashant Bhole; +Cc: 'David Miller', netdev
In-Reply-To: <024e01d3792d$291ef420$7b5cdc60$@lab.ntt.co.jp>

On Wed, 20 Dec 2017 09:54:59 +0900, Prashant Bhole wrote:
> > Ah, I would just error out in case we can't create any of the  
> > sub-directories as well.  
> 
> Does that mean fatal error if we can't create any of the subdirectories?

Yes.

^ permalink raw reply

* Re: [PATCH bpf 03/11] bpf: Add write access to tcp_sock and sock fields
From: Alexei Starovoitov @ 2017-12-20  1:10 UTC (permalink / raw)
  To: Lawrence Brakmo, netdev; +Cc: Kernel Team, Blake Matheny, Daniel Borkmann
In-Reply-To: <20171219062200.372711-4-brakmo@fb.com>

On 12/18/17 10:21 PM, Lawrence Brakmo wrote:
> +#define SOCK_OPS_SET_FIELD(FIELD_NAME, OBJ)				      \
> +	do {								      \
> +		int reg = BPF_REG_9;					      \
> +		BUILD_BUG_ON(FIELD_SIZEOF(OBJ, FIELD_NAME) >		      \
> +			     FIELD_SIZEOF(struct bpf_sock_ops, FIELD_NAME));  \
> +		while (si->dst_reg == reg || si->src_reg == reg)	      \
> +			reg--;						      \
> +		*insn++ = BPF_STX_MEM(BPF_DW, si->dst_reg, reg,		      \
> +				      offsetof(struct bpf_sock_ops_kern,      \
> +					       temp));			      \
> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
> +						struct bpf_sock_ops_kern,     \
> +						is_fullsock),		      \
> +				      reg, si->dst_reg,			      \
> +				      offsetof(struct bpf_sock_ops_kern,      \
> +					       is_fullsock));		      \
> +		*insn++ = BPF_JMP_IMM(BPF_JEQ, reg, 0, 2);		      \
> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
> +						struct bpf_sock_ops_kern, sk),\
> +				      reg, si->dst_reg,			      \
> +				      offsetof(struct bpf_sock_ops_kern, sk));\
> +		*insn++ = BPF_STX_MEM(BPF_FIELD_SIZEOF(OBJ, FIELD_NAME),      \
> +				      reg, si->src_reg,			      \
> +				      offsetof(OBJ, FIELD_NAME));	      \
> +		*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->dst_reg,		      \
> +				      offsetof(struct bpf_sock_ops_kern,      \
> +					       temp));			      \
> +	} while (0)

that's neat. I like it.
I guess the prog can check is_fullsock on its own to see whether writes
will fail or not, so JEQ above is ok.
Only the while() loop looks a bit scary.
Maybe replace it with two 'if's?
if (si->dst_reg == reg || si->src_reg == reg)
   reg --;
if (si->dst_reg == reg || si->src_reg == reg)
   reg --;
so it's clear that tmp reg will be reg_7, 8 or 9.
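The claim that two 'if' steps suffice can be checked with a small model: dst_reg and src_reg occupy at most two registers, so stepping down from BPF_REG_9 at most twice always lands on a free one among 7, 8 and 9. pick_tmp_reg() and TMP_REG_START are hypothetical names, not code from the patch.

```c
#include <assert.h>

#define TMP_REG_START 9 /* stands in for BPF_REG_9 */

/* Hypothetical helper: step down from R9 past registers used by the insn.
 * Two operands can collide at most twice, so two checks suffice. */
static int pick_tmp_reg(int dst_reg, int src_reg)
{
	int reg = TMP_REG_START;

	if (dst_reg == reg || src_reg == reg)
		reg--;
	if (dst_reg == reg || src_reg == reg)
		reg--;
	return reg;
}
```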

^ permalink raw reply

* [PATCH v3 iproute2 net-next] erspan: add erspan version II support
From: William Tu @ 2017-12-20  1:08 UTC (permalink / raw)
  To: netdev; +Cc: dsahern

The patch adds support for configuring ERSPAN v2 for both the
ipv4 and ipv6 erspan implementations.  Three additional fields
are added: 'erspan_ver' for distinguishing v1 or v2, 'erspan_dir'
for specifying the direction of the mirrored traffic, and 'erspan_hwid'
for users to set the ERSPAN engine ID within a system.

As for the manpage, the ERSPAN description used to be under the GRE, IPIP,
SIT Type paragraph.  Since IP6GRE/IP6GRETAP also supports ERSPAN,
the patch removes the old one, creates a separate ERSPAN paragraph,
and adds an example.

Signed-off-by: William Tu <u9012063@gmail.com>
---
change in v3:
  - change erspan_dir 0/1 to "in[gress]/e[gress]"
  - update manpage
change in v2:
  - fix typo ETH_P_ERSPAN2
  - fix space and indent
---
 include/uapi/linux/if_ether.h  |  1 +
 include/uapi/linux/if_tunnel.h |  3 ++
 ip/link_gre.c                  | 68 +++++++++++++++++++++++++++++--
 ip/link_gre6.c                 | 69 +++++++++++++++++++++++++++++--
 man/man8/ip-link.8.in          | 92 ++++++++++++++++++++++++++++++++++++------
 5 files changed, 214 insertions(+), 19 deletions(-)
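The ERSPAN attribute selection that the ip/link_gre.c hunk adds to gre_parse_opt() can be modeled as a pure function. The EMIT_* bit flags and erspan_attrs() are invented for this sketch and are not part of iproute2; they only record which IFLA_GRE_ERSPAN_* attributes the branch would add to the netlink request.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit flags naming the IFLA_GRE_ERSPAN_* attributes emitted. */
enum {
	EMIT_VER  = 1 << 0,
	EMIT_IDX  = 1 << 1,
	EMIT_DIR  = 1 << 2,
	EMIT_HWID = 1 << 3,
};

/* Mirrors the branch structure added to ip/link_gre.c. */
static unsigned int erspan_attrs(uint8_t ver, uint32_t idx)
{
	unsigned int mask = 0;

	if (ver) {
		mask |= EMIT_VER;
		if (ver == 1 && idx != 0)
			mask |= EMIT_IDX;
		else if (ver == 2)
			mask |= EMIT_DIR | EMIT_HWID;
	}
	return mask;
}
```

Under this model, no ERSPAN attribute (not even the v1 index) is sent unless erspan_ver is non-zero, either from the command line or read back from an existing link.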

diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h
index 2eb529a90250..133567bf2e04 100644
--- a/include/uapi/linux/if_ether.h
+++ b/include/uapi/linux/if_ether.h
@@ -47,6 +47,7 @@
 #define ETH_P_PUP	0x0200		/* Xerox PUP packet		*/
 #define ETH_P_PUPAT	0x0201		/* Xerox PUP Addr Trans packet	*/
 #define ETH_P_TSN	0x22F0		/* TSN (IEEE 1722) packet	*/
+#define ETH_P_ERSPAN2	0x22EB		/* ERSPAN version 2 (type III)	*/
 #define ETH_P_IP	0x0800		/* Internet Protocol packet	*/
 #define ETH_P_X25	0x0805		/* CCITT X.25			*/
 #define ETH_P_ARP	0x0806		/* Address Resolution packet	*/
diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index 38cdf90692f8..ecdc76669cfd 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -137,6 +137,9 @@ enum {
 	IFLA_GRE_IGNORE_DF,
 	IFLA_GRE_FWMARK,
 	IFLA_GRE_ERSPAN_INDEX,
+	IFLA_GRE_ERSPAN_VER,
+	IFLA_GRE_ERSPAN_DIR,
+	IFLA_GRE_ERSPAN_HWID,
 	__IFLA_GRE_MAX,
 };
 
diff --git a/ip/link_gre.c b/ip/link_gre.c
index 43cb1af6196a..27c03121f7e3 100644
--- a/ip/link_gre.c
+++ b/ip/link_gre.c
@@ -98,6 +98,9 @@ static int gre_parse_opt(struct link_util *lu, int argc, char **argv,
 	__u8 ignore_df = 0;
 	__u32 fwmark = 0;
 	__u32 erspan_idx = 0;
+	__u8 erspan_ver = 0;
+	__u8 erspan_dir = 0;
+	__u16 erspan_hwid = 0;
 
 	if (!(n->nlmsg_flags & NLM_F_CREATE)) {
 		if (rtnl_talk(&rth, &req.n, &answer) < 0) {
@@ -179,6 +182,15 @@ get_failed:
 		if (greinfo[IFLA_GRE_ERSPAN_INDEX])
 			erspan_idx = rta_getattr_u32(greinfo[IFLA_GRE_ERSPAN_INDEX]);
 
+		if (greinfo[IFLA_GRE_ERSPAN_VER])
+			erspan_ver = rta_getattr_u8(greinfo[IFLA_GRE_ERSPAN_VER]);
+
+		if (greinfo[IFLA_GRE_ERSPAN_DIR])
+			erspan_dir = rta_getattr_u8(greinfo[IFLA_GRE_ERSPAN_DIR]);
+
+		if (greinfo[IFLA_GRE_ERSPAN_HWID])
+			erspan_hwid = rta_getattr_u16(greinfo[IFLA_GRE_ERSPAN_HWID]);
+
 		free(answer);
 	}
 
@@ -343,6 +355,26 @@ get_failed:
 				invarg("invalid erspan index\n", *argv);
 			if (erspan_idx & ~((1<<20) - 1) || erspan_idx == 0)
 				invarg("erspan index must be > 0 and <= 20-bit\n", *argv);
+		} else if (strcmp(*argv, "erspan_ver") == 0) {
+			NEXT_ARG();
+			if (get_u8(&erspan_ver, *argv, 0))
+				invarg("invalid erspan version\n", *argv);
+			if (erspan_ver != 1 && erspan_ver != 2)
+				invarg("erspan version must be 1 or 2\n", *argv);
+		} else if (strcmp(*argv, "erspan_dir") == 0) {
+			NEXT_ARG();
+			if (strcmp(*argv, "ingress") == 0 ||
+			    strcmp(*argv, "in") == 0)
+				erspan_dir = 0;
+			else if (strcmp(*argv, "egress") == 0 ||
+				 strcmp(*argv, "e") == 0)
+				erspan_dir = 1;
+			else
+				invarg("Invalid erspan direction.", *argv);
+		} else if (strcmp(*argv, "erspan_hwid") == 0) {
+			NEXT_ARG();
+			if (get_u16(&erspan_hwid, *argv, 0))
+				invarg("invalid erspan hwid\n", *argv);
 		} else
 			usage();
 		argc--; argv++;
@@ -374,8 +406,15 @@ get_failed:
 		addattr_l(n, 1024, IFLA_GRE_TTL, &ttl, 1);
 		addattr_l(n, 1024, IFLA_GRE_TOS, &tos, 1);
 		addattr32(n, 1024, IFLA_GRE_FWMARK, fwmark);
-		if (erspan_idx != 0)
-			addattr32(n, 1024, IFLA_GRE_ERSPAN_INDEX, erspan_idx);
+		if (erspan_ver) {
+			addattr8(n, 1024, IFLA_GRE_ERSPAN_VER, erspan_ver);
+			if (erspan_ver == 1 && erspan_idx != 0) {
+				addattr32(n, 1024, IFLA_GRE_ERSPAN_INDEX, erspan_idx);
+			} else if (erspan_ver == 2) {
+				addattr8(n, 1024, IFLA_GRE_ERSPAN_DIR, erspan_dir);
+				addattr16(n, 1024, IFLA_GRE_ERSPAN_HWID, erspan_hwid);
+			}
+		}
 	} else {
 		addattr_l(n, 1024, IFLA_GRE_COLLECT_METADATA, NULL, 0);
 	}
@@ -514,7 +553,30 @@ static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 	if (tb[IFLA_GRE_ERSPAN_INDEX]) {
 		__u32 erspan_idx = rta_getattr_u32(tb[IFLA_GRE_ERSPAN_INDEX]);
 
-		fprintf(f, "erspan_index %u ", erspan_idx);
+		print_uint(PRINT_ANY, "erspan_index", "erspan_index %u ", erspan_idx);
+	}
+
+	if (tb[IFLA_GRE_ERSPAN_VER]) {
+		__u8 erspan_ver = rta_getattr_u8(tb[IFLA_GRE_ERSPAN_VER]);
+
+		print_uint(PRINT_ANY, "erspan_ver", "erspan_ver %u ", erspan_ver);
+	}
+
+	if (tb[IFLA_GRE_ERSPAN_DIR]) {
+		__u8 erspan_dir = rta_getattr_u8(tb[IFLA_GRE_ERSPAN_DIR]);
+
+		if (erspan_dir == 0)
+			print_string(PRINT_ANY, "erspan_dir",
+				     "erspan_dir ingress ", NULL);
+		else
+			print_string(PRINT_ANY, "erspan_dir",
+				     "erspan_dir egress ", NULL);
+	}
+
+	if (tb[IFLA_GRE_ERSPAN_HWID]) {
+		__u16 erspan_hwid = rta_getattr_u16(tb[IFLA_GRE_ERSPAN_HWID]);
+
+		print_hex(PRINT_ANY, "erspan_hwid", "erspan_hwid 0x%x ", erspan_hwid);
 	}
 
 	if (tb[IFLA_GRE_ENCAP_TYPE] &&
diff --git a/ip/link_gre6.c b/ip/link_gre6.c
index 2cb46ca116d0..de6a38d50cd1 100644
--- a/ip/link_gre6.c
+++ b/ip/link_gre6.c
@@ -109,6 +109,9 @@ static int gre_parse_opt(struct link_util *lu, int argc, char **argv,
 	int len;
 	__u32 fwmark = 0;
 	__u32 erspan_idx = 0;
+	__u8 erspan_ver = 0;
+	__u8 erspan_dir = 0;
+	__u16 erspan_hwid = 0;
 
 	if (!(n->nlmsg_flags & NLM_F_CREATE)) {
 		if (rtnl_talk(&rth, &req.n, &answer) < 0) {
@@ -191,6 +194,15 @@ get_failed:
 		if (greinfo[IFLA_GRE_ERSPAN_INDEX])
 			erspan_idx = rta_getattr_u32(greinfo[IFLA_GRE_ERSPAN_INDEX]);
 
+		if (greinfo[IFLA_GRE_ERSPAN_VER])
+			erspan_ver = rta_getattr_u8(greinfo[IFLA_GRE_ERSPAN_VER]);
+
+		if (greinfo[IFLA_GRE_ERSPAN_DIR])
+			erspan_dir = rta_getattr_u8(greinfo[IFLA_GRE_ERSPAN_DIR]);
+
+		if (greinfo[IFLA_GRE_ERSPAN_HWID])
+			erspan_hwid = rta_getattr_u16(greinfo[IFLA_GRE_ERSPAN_HWID]);
+
 		free(answer);
 	}
 
@@ -389,6 +401,26 @@ get_failed:
 				invarg("invalid erspan index\n", *argv);
 			if (erspan_idx & ~((1<<20) - 1) || erspan_idx == 0)
 				invarg("erspan index must be > 0 and <= 20-bit\n", *argv);
+		} else if (strcmp(*argv, "erspan_ver") == 0) {
+			NEXT_ARG();
+			if (get_u8(&erspan_ver, *argv, 0))
+				invarg("invalid erspan version\n", *argv);
+			if (erspan_ver != 1 && erspan_ver != 2)
+				invarg("erspan version must be 1 or 2\n", *argv);
+		} else if (strcmp(*argv, "erspan_dir") == 0) {
+			NEXT_ARG();
+			if (strcmp(*argv, "ingress") == 0 ||
+			    strcmp(*argv, "in") == 0)
+				erspan_dir = 0;
+			else if (strcmp(*argv, "egress") == 0 ||
+				 strcmp(*argv, "e") == 0)
+				erspan_dir = 1;
+			else
+				invarg("Invalid erspan direction.", *argv);
+		} else if (strcmp(*argv, "erspan_hwid") == 0) {
+			NEXT_ARG();
+			if (get_u16(&erspan_hwid, *argv, 0))
+				invarg("invalid erspan hwid\n", *argv);
 		} else
 			usage();
 		argc--; argv++;
@@ -408,9 +440,15 @@ get_failed:
 		addattr_l(n, 1024, IFLA_GRE_FLOWINFO, &flowinfo, 4);
 		addattr32(n, 1024, IFLA_GRE_FLAGS, flags);
 		addattr32(n, 1024, IFLA_GRE_FWMARK, fwmark);
-		if (erspan_idx != 0)
-			addattr32(n, 1024, IFLA_GRE_ERSPAN_INDEX, erspan_idx);
-
+		if (erspan_ver) {
+			addattr8(n, 1024, IFLA_GRE_ERSPAN_VER, erspan_ver);
+			if (erspan_ver == 1 && erspan_idx != 0) {
+				addattr32(n, 1024, IFLA_GRE_ERSPAN_INDEX, erspan_idx);
+			} else if (erspan_ver == 2) {
+				addattr8(n, 1024, IFLA_GRE_ERSPAN_DIR, erspan_dir);
+				addattr16(n, 1024, IFLA_GRE_ERSPAN_HWID, erspan_hwid);
+			}
+		}
 		addattr16(n, 1024, IFLA_GRE_ENCAP_TYPE, encaptype);
 		addattr16(n, 1024, IFLA_GRE_ENCAP_FLAGS, encapflags);
 		addattr16(n, 1024, IFLA_GRE_ENCAP_SPORT, htons(encapsport));
@@ -587,7 +625,30 @@ static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 
 	if (tb[IFLA_GRE_ERSPAN_INDEX]) {
 		__u32 erspan_idx = rta_getattr_u32(tb[IFLA_GRE_ERSPAN_INDEX]);
-		fprintf(f, "erspan_index %u ", erspan_idx);
+		print_uint(PRINT_ANY, "erspan_index", "erspan_index %u ", erspan_idx);
+	}
+
+	if (tb[IFLA_GRE_ERSPAN_VER]) {
+		__u8 erspan_ver = rta_getattr_u8(tb[IFLA_GRE_ERSPAN_VER]);
+
+		print_uint(PRINT_ANY, "erspan_ver", "erspan_ver %u ", erspan_ver);
+	}
+
+	if (tb[IFLA_GRE_ERSPAN_DIR]) {
+		__u8 erspan_dir = rta_getattr_u8(tb[IFLA_GRE_ERSPAN_DIR]);
+
+		if (erspan_dir == 0)
+			print_string(PRINT_ANY, "erspan_dir",
+				     "erspan_dir ingress ", NULL);
+		else
+			print_string(PRINT_ANY, "erspan_dir",
+				     "erspan_dir egress ", NULL);
+	}
+
+	if (tb[IFLA_GRE_ERSPAN_HWID]) {
+		__u16 erspan_hwid = rta_getattr_u16(tb[IFLA_GRE_ERSPAN_HWID]);
+
+		print_hex(PRINT_ANY, "erspan_hwid", "erspan_hwid 0x%x ", erspan_hwid);
 	}
 
 	if (tb[IFLA_GRE_ENCAP_TYPE] &&
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 9e9a5f0d2cef..2b051ed7b5a0 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -665,13 +665,13 @@ keyword.
 .in -8
 
 .TP
-GRE, IPIP, SIT, ERSPAN Type Support
+GRE, IPIP, SIT Type Support
 For a link of types
-.I GRE/IPIP/SIT/ERSPAN
+.I GRE/IPIP/SIT
 the following additional arguments are supported:
 
 .BI "ip link add " DEVICE
-.BR type " { " gre " | " ipip " | " sit " | " erspan " }"
+.BR type " { " gre " | " ipip " | " sit " }"
 .BI " remote " ADDR " local " ADDR
 [
 .BR encap " { " fou " | " gue " | " none " }"
@@ -685,8 +685,6 @@ the following additional arguments are supported:
 .I " [no]encap-remcsum "
 ] [
 .I " mode " { ip6ip | ipip | mplsip | any } "
-] [
-.BR erspan " \fIIDX "
 ]
 
 .in +8
@@ -731,13 +729,6 @@ MPLS-Over-IPv4, "any" indicates IPv6, IPv4 or MPLS Over IPv4. Supported for
 SIT where the default is "ip6ip" and IPIP where the default is "ipip".
 IPv6-Over-IPv4 is not supported for IPIP.
 
-.sp
-.BR erspan " \fIIDX "
-- specifies the ERSPAN index field.
-.IR IDX
-indicates a 20 bit index/port number associated with the ERSPAN
-traffic's source port and direction.
-
 .in -8
 
 .TP
@@ -883,6 +874,76 @@ the following additional arguments are supported:
 - specifies the mode (datagram or connected) to use.
 
 .TP
+ERSPAN Type Support
+For a link of type
+.I ERSPAN/IP6ERSPAN
+the following additional arguments are supported:
+
+.BI "ip link add " DEVICE
+.BR type " { " erspan " | " ip6erspan " }"
+.BI remote " ADDR " local " ADDR " seq
+.RB key
+.I KEY
+.BR erspan_ver " \fIversion "
+[
+.BR erspan " \fIIDX "
+] [
+.BR erspan_dir " { " \fIin[gress] " | " \fIe[gress] " }"
+] [
+.BR erspan_hwid " \fIhwid "
+] [
+.RB external
+]
+
+.in +8
+.sp
+.BI  remote " ADDR "
+- specifies the remote address of the tunnel.
+
+.sp
+.BI  local " ADDR "
+- specifies the fixed local address for tunneled packets.
+It must be an address on another interface on this host.
+
+.sp
+.BR erspan_ver " \fIversion "
+- specifies the ERSPAN version number.
+.IR version
+indicates the ERSPAN version to be created: 1 for version 1 (type II)
+or 2 for version 2 (type III).
+
+.sp
+.BR erspan " \fIIDX "
+- specifies the ERSPAN v1 index field.
+.IR IDX
+indicates a 20 bit index/port number associated with the ERSPAN
+traffic's source port and direction.
+
+.sp
+.BR erspan_dir " { " \fIin[gress] " | " \fIe[gress] " }"
+- specifies the ERSPAN v2 mirrored traffic's direction.
+
+.sp
+.BR erspan_hwid " \fIhwid "
+- a unique identifier of an ERSPAN v2 engine within a system.
+.IR hwid
+is a 6-bit value for users to configure.
+
+.sp
+.BR external
+- make this tunnel externally controlled (or not, which is the default).
+In the kernel, this is referred to as collect metadata mode.  This flag is
+mutually exclusive with the
+.BR remote ,
+.BR local ,
+.BR erspan_ver ,
+.BR erspan ,
+.BR erspan_dir " and " erspan_hwid
+options.
+
+.in -8
+
+.TP
 GENEVE Type Support
 For a link of type
 .I GENEVE
@@ -2062,6 +2123,13 @@ ip link add link wpan0 lowpan0 type lowpan
 Creates a 6LoWPAN interface named lowpan0 on the underlying
 IEEE 802.15.4 device wpan0.
 .RE
+.PP
+ip link add dev ip6erspan11 type ip6erspan seq key 102
+local fc00:100::2 remote fc00:100::1
+erspan_ver 2 erspan_dir ingress erspan_hwid 17
+.RS 4
+Creates an IP6ERSPAN version 2 interface named ip6erspan11.
+.RE
 
 .SH SEE ALSO
 .br
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH -tip v3 3/6] net: sctp: Add SCTP ACK tracking trace event
From: Masami Hiramatsu @ 2017-12-20  1:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Ian McDonald, Vlad Yasevich, Stephen Hemminger,
	Peter Zijlstra, Thomas Gleixner, LKML, H . Peter Anvin,
	Gerrit Renker, David S . Miller, Neil Horman, dccp, netdev,
	linux-sctp, Stephen Rothwell
In-Reply-To: <20171219102024.09a92c75@gandalf.local.home>

On Tue, 19 Dec 2017 10:20:24 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 19 Dec 2017 17:58:25 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > +TRACE_EVENT(sctp_probe,
> > +
> > +	TP_PROTO(const struct sctp_endpoint *ep,
> > +		 const struct sctp_association *asoc,
> > +		 struct sctp_chunk *chunk),
> > +
> > +	TP_ARGS(ep, asoc, chunk),
> > +
> > +	TP_STRUCT__entry(
> > +		__field(__u64, asoc)
> > +		__field(__u32, mark)
> > +		__field(__u16, bind_port)
> > +		__field(__u16, peer_port)
> > +		__field(__u32, pathmtu)
> > +		__field(__u32, rwnd)
> > +		__field(__u16, unack_data)
> > +	),
> > +
> > +	TP_fast_assign(
> > +		struct sctp_transport *sp;
> > +		struct sk_buff *skb = chunk->skb;
> > +
> > +		__entry->asoc = (__u64)asoc;
> > +		__entry->mark = skb->mark;
> > +		__entry->bind_port = ep->base.bind_addr.port;
> > +		__entry->peer_port = asoc->peer.port;
> > +		__entry->pathmtu = asoc->pathmtu;
> > +		__entry->rwnd = asoc->peer.rwnd;
> > +		__entry->unack_data = asoc->unack_data;
> > +
> > +		if (trace_sctp_probe_path_enabled()) {
> > +			list_for_each_entry(sp, &asoc->peer.transport_addr_list,
> > +					    transports) {
> > +				trace_sctp_probe_path(sp, asoc);
> > +			}
> > +		}
> 
> I thought you were going to move this into the code, like I suggested?

Ah, I forgot to define sp in the block...

Thanks,

> 
> -- Steve
> 
> > +	),
> > +
> > +	TP_printk("asoc=%#llx mark=%#x bind_port=%d peer_port=%d pathmtu=%d "
> > +		  "rwnd=%u unack_data=%d",
> > +		  __entry->asoc, __entry->mark, __entry->bind_port,
> > +		  __entry->peer_port, __entry->pathmtu, __entry->rwnd,
> > +		  __entry->unack_data)
> > +);
> > +


-- 
Masami Hiramatsu <mhiramat@kernel.org>


* RE: [PATCH net-next] netdevsim: correctly check return value of debugfs_create_dir
From: Prashant Bhole @ 2017-12-20  0:54 UTC (permalink / raw)
  To: 'Jakub Kicinski'; +Cc: 'David Miller', netdev
In-Reply-To: <20171219164523.60ac1308@cakuba.netronome.com>


> From: Jakub Kicinski [mailto:jakub.kicinski@netronome.com]
> 
> On Wed, 20 Dec 2017 09:38:52 +0900, Prashant Bhole wrote:
> > > > 2) In case sim0 or bpf_bound_progs fail to be created, we need to
> > > > add checks before creating any file in them.
> > >
> > > What do you mean by "check before"?  Checking if creation of each
> > > file fails or not, or something different?
> >
> > For example:
> > I will check if state->ddir is not NULL before creating files in it.
> >
> > if (state->ddir) {
> > 	debugfs_create_u32("id", 0400, state->ddir, &prog->aux->id);
> > 	debugfs_create_file("state", 0400, state->ddir,
> > 			    &state->state, &nsim_bpf_string_fops);
> > 	debugfs_create_bool("loaded", 0400, state->ddir, &state->is_loaded);
> > }
> 
> Ah, I would just error out in case we can't create any of the
> sub-directories as well.

Does that mean it should be a fatal error if we can't create any of the
subdirectories? Or should we do a similar check, as mentioned above, before
creating the subdirectories? (I was about to do the latter.)

-Prashant


* Re: [PATCH bpf 03/11] bpf: Add write access to tcp_sock and sock fields
From: Daniel Borkmann @ 2017-12-20  0:51 UTC (permalink / raw)
  To: Lawrence Brakmo, netdev; +Cc: Kernel Team, Blake Matheny, Alexei Starovoitov
In-Reply-To: <20171219062200.372711-4-brakmo@fb.com>

On 12/19/2017 07:21 AM, Lawrence Brakmo wrote:
> This patch adds a macro, SOCK_OPS_SET_FIELD, for writing to
> struct tcp_sock or struct sock fields. This required adding a new
> field "temp" to struct bpf_sock_ops_kern for temporary storage that
> is used by sock_ops_convert_ctx_access. It is used to store and recover
> the contents of a register, so the register can be used to store the
> address of the sk. Since we cannot overwrite the dst_reg because it
> contains the pointer to ctx, nor the src_reg since it contains the value
> we want to store, we need an extra register to contain the address
> of the sk.
> 
> Also adds the macro SOCK_OPS_GET_OR_SET_FIELD that calls one of the
> GET or SET macros depending on the value of the TYPE field.
> 
> Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
> ---
>  include/linux/filter.h |  3 +++
>  include/net/tcp.h      |  2 +-
>  net/core/filter.c      | 46 ++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 5feb441..8929162 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -987,6 +987,9 @@ struct bpf_sock_ops_kern {
>  		u32 replylong[4];
>  	};
>  	u32	is_fullsock;
> +	u64	temp;			/* Used by sock_ops_convert_ctx_access
> +					 * as temporary storaage of a register
> +					 */
>  };
>  
>  #endif /* __LINUX_FILTER_H__ */
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 6cc205c..e0213f1 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -2011,7 +2011,7 @@ static inline int tcp_call_bpf(struct sock *sk, int op)
>  	struct bpf_sock_ops_kern sock_ops;
>  	int ret;
>  
> -	memset(&sock_ops, 0, sizeof(sock_ops));
> +	memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, is_fullsock));

I don't think this is correct. sock_ops is on stack, so above you only
zero up to the offset of is_fullsock, but not including it, so when
you have !sk_fullsock(sk), then your BPF prog will still act as if the
sock_ops.is_fullsock was set in case prior stack garbage said so.

>  	if (sk_fullsock(sk)) {
>  		sock_ops.is_fullsock = 1;
>  		sock_owned_by_me(sk);

Thanks,
Daniel


* Re: [v2 PATCH -tip 3/6] net: sctp: Add SCTP ACK tracking trace event
From: kbuild test robot @ 2017-12-20  0:48 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: kbuild-all, Ingo Molnar, Ian McDonald, Vlad Yasevich,
	Stephen Hemminger, Steven Rostedt, Peter Zijlstra,
	Thomas Gleixner, LKML, H . Peter Anvin, Gerrit Renker,
	David S . Miller, Neil Horman, dccp, netdev, linux-sctp,
	Stephen Rothwell, mhiramat
In-Reply-To: <151358473510.28850.10475072993963389604.stgit@devbox>

[-- Attachment #1: Type: text/plain, Size: 7154 bytes --]

Hi Masami,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net/master]
[also build test WARNING on v4.15-rc4 next-20171219]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Masami-Hiramatsu/net-tcp-sctp-dccp-Replace-jprobe-usage-with-trace-events/20171220-081035
config: i386-randconfig-x011-201751 (attached as .config)
compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   In file included from include/trace/define_trace.h:96:0,
                    from include/trace/events/sctp.h:96,
                    from net//sctp/sm_statefuns.c:63:
   include/trace/events/sctp.h: In function 'trace_event_raw_event_sctp_probe_path':
>> include/trace/events/sctp.h:31:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      __entry->asoc = (__u64)asoc;
                      ^
   include/trace/trace_events.h:719:4: note: in definition of macro 'DECLARE_EVENT_CLASS'
     { assign; }       \
       ^~~~~~
   include/trace/trace_events.h:78:9: note: in expansion of macro 'PARAMS'
            PARAMS(assign),         \
            ^~~~~~
>> include/trace/events/sctp.h:11:1: note: in expansion of macro 'TRACE_EVENT'
    TRACE_EVENT(sctp_probe_path,
    ^~~~~~~~~~~
>> include/trace/events/sctp.h:30:2: note: in expansion of macro 'TP_fast_assign'
     TP_fast_assign(
     ^~~~~~~~~~~~~~
   include/trace/events/sctp.h: In function 'trace_event_raw_event_sctp_probe':
   include/trace/events/sctp.h:72:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      __entry->asoc = (__u64)asoc;
                      ^
   include/trace/trace_events.h:719:4: note: in definition of macro 'DECLARE_EVENT_CLASS'
     { assign; }       \
       ^~~~~~
   include/trace/trace_events.h:78:9: note: in expansion of macro 'PARAMS'
            PARAMS(assign),         \
            ^~~~~~
   include/trace/events/sctp.h:50:1: note: in expansion of macro 'TRACE_EVENT'
    TRACE_EVENT(sctp_probe,
    ^~~~~~~~~~~
   include/trace/events/sctp.h:68:2: note: in expansion of macro 'TP_fast_assign'
     TP_fast_assign(
     ^~~~~~~~~~~~~~
   In file included from include/trace/define_trace.h:97:0,
                    from include/trace/events/sctp.h:96,
                    from net//sctp/sm_statefuns.c:63:
   include/trace/events/sctp.h: In function 'perf_trace_sctp_probe_path':
>> include/trace/events/sctp.h:31:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      __entry->asoc = (__u64)asoc;
                      ^
   include/trace/perf.h:66:4: note: in definition of macro 'DECLARE_EVENT_CLASS'
     { assign; }       \
       ^~~~~~
   include/trace/trace_events.h:78:9: note: in expansion of macro 'PARAMS'
            PARAMS(assign),         \
            ^~~~~~
>> include/trace/events/sctp.h:11:1: note: in expansion of macro 'TRACE_EVENT'
    TRACE_EVENT(sctp_probe_path,
    ^~~~~~~~~~~
>> include/trace/events/sctp.h:30:2: note: in expansion of macro 'TP_fast_assign'
     TP_fast_assign(
     ^~~~~~~~~~~~~~
   include/trace/events/sctp.h: In function 'perf_trace_sctp_probe':
   include/trace/events/sctp.h:72:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      __entry->asoc = (__u64)asoc;
                      ^
   include/trace/perf.h:66:4: note: in definition of macro 'DECLARE_EVENT_CLASS'
     { assign; }       \
       ^~~~~~
   include/trace/trace_events.h:78:9: note: in expansion of macro 'PARAMS'
            PARAMS(assign),         \
            ^~~~~~
   include/trace/events/sctp.h:50:1: note: in expansion of macro 'TRACE_EVENT'
    TRACE_EVENT(sctp_probe,
    ^~~~~~~~~~~
   include/trace/events/sctp.h:68:2: note: in expansion of macro 'TP_fast_assign'
     TP_fast_assign(
     ^~~~~~~~~~~~~~

vim +31 include/trace/events/sctp.h

    10	
  > 11	TRACE_EVENT(sctp_probe_path,
    12	
    13		TP_PROTO(struct sctp_transport *sp,
    14			 const struct sctp_association *asoc),
    15	
    16		TP_ARGS(sp, asoc),
    17	
    18		TP_STRUCT__entry(
    19			__field(__u64, asoc)
    20			__field(__u32, primary)
    21			__array(__u8, ipaddr, sizeof(union sctp_addr))
    22			__field(__u32, state)
    23			__field(__u32, cwnd)
    24			__field(__u32, ssthresh)
    25			__field(__u32, flight_size)
    26			__field(__u32, partial_bytes_acked)
    27			__field(__u32, pathmtu)
    28		),
    29	
  > 30		TP_fast_assign(
  > 31			__entry->asoc = (__u64)asoc;
    32			__entry->primary = (sp == asoc->peer.primary_path);
    33			memcpy(__entry->ipaddr, &sp->ipaddr, sizeof(union sctp_addr));
    34			__entry->state = sp->state;
    35			__entry->cwnd = sp->cwnd;
    36			__entry->ssthresh = sp->ssthresh;
    37			__entry->flight_size = sp->flight_size;
    38			__entry->partial_bytes_acked = sp->partial_bytes_acked;
    39			__entry->pathmtu = sp->pathmtu;
    40		),
    41	
    42		TP_printk("asoc=%#llx%s ipaddr=%pISpc state=%u cwnd=%u ssthresh=%u "
    43			  "flight_size=%u partial_bytes_acked=%u pathmtu=%u",
    44			  __entry->asoc, __entry->primary ? "(*)" : "",
    45			  __entry->ipaddr, __entry->state, __entry->cwnd,
    46			  __entry->ssthresh, __entry->flight_size,
    47			  __entry->partial_bytes_acked, __entry->pathmtu)
    48	);
    49	
    50	TRACE_EVENT(sctp_probe,
    51	
    52		TP_PROTO(const struct sctp_endpoint *ep,
    53			 const struct sctp_association *asoc,
    54			 struct sctp_chunk *chunk),
    55	
    56		TP_ARGS(ep, asoc, chunk),
    57	
    58		TP_STRUCT__entry(
    59			__field(__u64, asoc)
    60			__field(__u32, mark)
    61			__field(__u16, bind_port)
    62			__field(__u16, peer_port)
    63			__field(__u32, pathmtu)
    64			__field(__u32, rwnd)
    65			__field(__u16, unack_data)
    66		),
    67	
    68		TP_fast_assign(
    69			struct sctp_transport *sp;
    70			struct sk_buff *skb = chunk->skb;
    71	
    72			__entry->asoc = (__u64)asoc;
    73			__entry->mark = skb->mark;
    74			__entry->bind_port = ep->base.bind_addr.port;
    75			__entry->peer_port = asoc->peer.port;
    76			__entry->pathmtu = asoc->pathmtu;
    77			__entry->rwnd = asoc->peer.rwnd;
    78			__entry->unack_data = asoc->unack_data;
    79	
    80			list_for_each_entry(sp, &asoc->peer.transport_addr_list,
    81					    transports) {
    82				trace_sctp_probe_path(sp, asoc);
    83			}
    84		),
    85	
    86		TP_printk("asoc=%#llx mark=%#x bind_port=%d peer_port=%d pathmtu=%d "
    87			  "rwnd=%u unack_data=%d",
    88			  __entry->asoc, __entry->mark, __entry->bind_port,
    89			  __entry->peer_port, __entry->pathmtu, __entry->rwnd,
    90			  __entry->unack_data)
    91	);
    92	
    93	#endif /* _TRACE_SCTP_H */
    94	
    95	/* This part must be outside protection */
  > 96	#include <trace/define_trace.h>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26730 bytes --]


* Re: [PATCH net] enic: add wq clean up budget
From: Govindarajulu Varadarajan @ 2017-12-20  0:37 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, govindarajulu90, benve
In-Reply-To: <alpine.LNX.2.20.1712081528060.28747@cae-iprp-alln-lb.cisco.com>

On Fri, 8 Dec 2017, Govindarajulu Varadarajan wrote:

> On Wed, 6 Dec 2017, David Miller wrote:
>
>> From: Govindarajulu Varadarajan <gvaradar@cisco.com>
>> Date: Tue,  5 Dec 2017 11:14:41 -0800
>> 
>>> In case of tx clean up, we set '-1' as the budget. This means clean up
>>> until the wq is empty or until (1 << 32) - 1 pkts are cleaned. Under heavy
>>> load this will run for a long time and cause a
>>> "watchdog: BUG: soft lockup - CPU#25 stuck for 21s!" warning.
>>> 
>>> This patch sets wq clean up budget to 256.
>>> 
>>> Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
>> 
>> This driver, with all of its indirection and layers upon layers of
>> macros for queue processing, is so difficult to read, and this can't
>> be generating nice optimal code either...
>> 
>> Anyways, I was walking over the driver to see if the logic is
>> contributing to this.
>> 
>> The limit you are proposing sounds unnecessary, nobody else I can
>> see needs this, and that includes all of the most heavily used
>> drivers under load.
>
> I used 256 as the limit because most of the other drivers use it.
>
> * mlx4 uses MLX4_EN_DEFAULT_TX_WORK as the tx budget in
>   mlx4_en_process_tx_cq(). Added in commit fbc6daf19745
>   ("net/mlx4_en: Ignore budget on TX napi polling")
>
> * i40e/i40evf use vsi->work_limit as the tx budget in i40e_clean_tx_irq(),
>   which is set to I40E_DEFAULT_IRQ_WORK. Added in commit
>   a619afe814453 ("i40e/i40evf: Add support for bulk free in Tx cleanup")
>
> * ixgbe uses q_vector->tx.work_limit as the tx budget in ixgbe_clean_tx_irq(),
>   which is set to IXGBE_DEFAULT_TX_WORK. Added in commit
>   592245559e900 ("ixgbe: Change default Tx work limit size to 256 buffers")
>
>> 
>> If I had to guess I'd say that the problem is that the queue loop
>> keeps sampling the head and tail pointers, whereas it should just
>> do that _once_ and only process the TX entries found in that
>> snapshot and return to the poll() routine immediately afterwards.
>
> The only way to know the tail pointer at the time napi is scheduled is to
> read the hw fetch_index register. This is discouraged by the hw engineers.
>
> We work around this by using a color bit. Every cq entry has a color bit,
> either 0 or 1. Hw flips the bit when it creates a new cq entry, so every new
> cq entry has a different color bit than the previous one. We reach the end
> of the queue when the previous color bit is the same as the current cq
> entry's color, i.e. hw did not flip the bit, so it's not a new cq entry.
>
> So the enic driver cannot know the tail pointer at the time napi is
> scheduled until we reach the tail pointer.
>

David,

How would you want us to fix this issue? Is doing an ioread on fetch_index
for every poll our only option (to get the head and tail pointers once)?

If 256 is not reasonable, would a wq_budget equal to the wq ring size be
acceptable? At any point, the number of wq entries to be cleaned cannot be
more than the ring size.

Thanks
Govind


* Re: [PATCH net-next] netdevsim: correctly check return value of debugfs_create_dir
From: Jakub Kicinski @ 2017-12-20  0:45 UTC (permalink / raw)
  To: Prashant Bhole; +Cc: 'David Miller', netdev
In-Reply-To: <024a01d3792a$e92c7ab0$bb857010$@lab.ntt.co.jp>

On Wed, 20 Dec 2017 09:38:52 +0900, Prashant Bhole wrote:
> > > 2) In case sim0 or bpf_bound_progs fail to be created, we need to add
> > > checks before creating any file in them.
> > 
> > What do you mean by "check before"?  Checking if creation of each file  
> > fails or not, or something different?  
> 
> For example:
> I will check if state->ddir is not NULL before creating files in it.
> 
> if (state->ddir) {
> 	debugfs_create_u32("id", 0400, state->ddir, &prog->aux->id);
> 	debugfs_create_file("state", 0400, state->ddir,
> 			    &state->state, &nsim_bpf_string_fops);
> 	debugfs_create_bool("loaded", 0400, state->ddir, &state->is_loaded);
> }

Ah, I would just error out in case we can't create any of the
sub-directories as well.


