Netdev List


* [RFC PATCH v3 03/10] udp: add support for UDP_GRO cmsg
From: Paolo Abeni @ 2018-10-30 17:24 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <cover.1540920083.git.pabeni@redhat.com>

When UDP GRO is enabled, the UDP_GRO cmsg carries the size of the
ingress datagrams. User space can use this information to reconstruct
the layout of the original packets.
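
For illustration only, a minimal user-space sketch of how a receiver
might opt in and then read the segment size back from the control data
filled in by recvmsg(). The helper names are hypothetical, SOL_UDP = 17
is an assumption for headers that do not define it, and UDP_GRO = 104 is
the value proposed in patch 02/10 of this series:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_UDP
#define SOL_UDP 17	/* assumption: same value as IPPROTO_UDP */
#endif
#ifndef UDP_GRO
#define UDP_GRO 104	/* value proposed in patch 02/10 of this series */
#endif

/* Opt the socket in to UDP GRO; returns 0, or -1 with errno set. */
int udp_gro_enable(int fd)
{
	int on = 1;

	return setsockopt(fd, SOL_UDP, UDP_GRO, &on, sizeof(on));
}

/* Return the segment size found in a UDP_GRO cmsg, or -1 if there is none. */
int udp_gro_cmsg_size(struct msghdr *msg)
{
	struct cmsghdr *cm;
	int gso_size;

	assert(msg != NULL);
	for (cm = CMSG_FIRSTHDR(msg); cm; cm = CMSG_NXTHDR(msg, cm)) {
		if (cm->cmsg_level != SOL_UDP || cm->cmsg_type != UDP_GRO)
			continue;
		if (cm->cmsg_len < CMSG_LEN(sizeof(gso_size)))
			continue;
		/* the kernel stores a plain int, copy it out unaligned-safe */
		memcpy(&gso_size, CMSG_DATA(cm), sizeof(gso_size));
		return gso_size;
	}
	return -1;
}
```

A receiver would call udp_gro_cmsg_size() on the msghdr right after
recvmsg() returns; a result of -1 simply means the buffer holds a single,
non-aggregated datagram.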

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Note: I avoided setting a bit in cmsg_flags for UDP_GRO, as that
attempt made the code uglier, especially on the IPv6 side, with no
measurable performance benefit.
---
 include/linux/udp.h | 11 +++++++++++
 net/ipv4/udp.c      |  4 ++++
 net/ipv6/udp.c      |  3 +++
 3 files changed, 18 insertions(+)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index f613b329852e..e23d5024f42f 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -121,6 +121,17 @@ static inline bool udp_get_no_check6_rx(struct sock *sk)
 	return udp_sk(sk)->no_check6_rx;
 }
 
+static inline void udp_cmsg_recv(struct msghdr *msg, struct sock *sk,
+				 struct sk_buff *skb)
+{
+	int gso_size;
+
+	if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) {
+		gso_size = skb_shinfo(skb)->gso_size;
+		put_cmsg(msg, SOL_UDP, UDP_GRO, sizeof(gso_size), &gso_size);
+	}
+}
+
 #define udp_portaddr_for_each_entry(__sk, list) \
 	hlist_for_each_entry(__sk, list, __sk_common.skc_portaddr_node)
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 4d4f4d044c28..b345f71b1cbb 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1714,6 +1714,10 @@ int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int noblock,
 		memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
 		*addr_len = sizeof(*sin);
 	}
+
+	if (udp_sk(sk)->gro_enabled)
+		udp_cmsg_recv(msg, sk, skb);
+
 	if (inet->cmsg_flags)
 		ip_cmsg_recv_offset(msg, sk, skb, sizeof(struct udphdr), off);
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index fc0ce6c59ebb..8e76e719305c 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -421,6 +421,9 @@ int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		*addr_len = sizeof(*sin6);
 	}
 
+	if (udp_sk(sk)->gro_enabled)
+		udp_cmsg_recv(msg, sk, skb);
+
 	if (np->rxopt.all)
 		ip6_datagram_recv_common_ctl(sk, msg, skb);
 
-- 
2.17.2


* [RFC PATCH v3 02/10] udp: implement GRO for plain UDP sockets.
From: Paolo Abeni @ 2018-10-30 17:24 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <cover.1540920083.git.pabeni@redhat.com>

This is the RX counterpart of commit bec1f6f69736 ("udp: generate gso
with UDP_SEGMENT"). When UDP_GRO is enabled on a socket, that socket
becomes eligible for GRO in the RX path: UDP segments directed to it
are assembled into a larger GSO_UDP_L4 packet.

The core UDP GRO support is enabled with setsockopt(UDP_GRO).

Initial benchmark numbers:

Before:
udp rx:   1079 MB/s   769065 calls/s

After:
udp rx:   1466 MB/s    24877 calls/s
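
One way to read these figures is as payload delivered per receive call.
A small sketch of that arithmetic, with hypothetical helper names; it
uses only the ratio of the two runs, so it does not depend on whether
MB means 10^6 or 2^20 bytes:

```c
#include <assert.h>

/* Average payload handed to user space per receive call (in MB). */
double mb_per_call(double mb_per_sec, double calls_per_sec)
{
	assert(calls_per_sec > 0.0);
	return mb_per_sec / calls_per_sec;
}

/* How many times more payload each call returns after the change. */
double per_call_gain(double mb_before, double calls_before,
		     double mb_after, double calls_after)
{
	return mb_per_call(mb_after, calls_after) /
	       mb_per_call(mb_before, calls_before);
}
```

per_call_gain(1079, 769065, 1466, 24877) comes out at roughly 42: each
call returns about 42 times more payload, while the call rate drops by a
factor of about 31.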

This change has a side effect on UDP tunnels: once a UDP tunnel has
been created, the kernel now performs a socket lookup for every ingress
UDP packet, while previously the lookup happened only if the ingress
packet carried a valid internal header csum.

rfc v2 -> rfc v3:
 - fixed typos in macro name and comments
 - really enforce UDP_GRO_CNT_MAX, instead of UDP_GRO_CNT_MAX + 1
 - acquire socket lock in UDP_GRO setsockopt
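
For readers following the UDP_GRO_CNT_MAX item, a simplified user-space
model of the flush rule in this revision: a datagram is merged into the
held packet, and the held packet is then flushed if it reached the cap
or if the new datagram's length differs from the first one. This is only
a sketch under stated assumptions: a single flow, no 64KB size limit, no
checksum checks, and a hypothetical function name.

```c
#include <assert.h>
#include <stddef.h>

#define UDP_GRO_CNT_MAX 64	/* same cap as in the patch */

/*
 * Feed n datagram lengths of one flow through the model. The number of
 * segments in each aggregated packet is stored in counts[]; the return
 * value is the number of aggregated packets (at most cap are stored).
 */
size_t udp_gro_model(const unsigned int *len, size_t n,
		     size_t *counts, size_t cap)
{
	size_t held = 0, packets = 0, i;
	unsigned int first_len = 0;

	assert(len != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (held == 0) {		/* start a new aggregate */
			held = 1;
			first_len = len[i];
			continue;
		}
		held++;				/* merge into the held packet */
		if (held >= UDP_GRO_CNT_MAX || len[i] != first_len) {
			if (packets < cap)
				counts[packets] = held;
			packets++;
			held = 0;
		}
	}
	if (held) {				/* end of burst: flush the rest */
		if (packets < cap)
			counts[packets] = held;
		packets++;
	}
	return packets;
}
```

With 130 equal-sized datagrams the model yields aggregates of 64, 64 and
2 segments; a shorter datagram is still merged and then closes the
aggregate it joined.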

rfc v1 -> rfc v2:
 - use a new option to enable UDP GRO
 - use static keys to protect the UDP GRO socket lookup

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Note: I opted to acquire the socket lock only for the newly introduced
setsockopt option rather than for every option, despite the previous
conversation on this topic, to avoid introducing somewhat larger and
unrelated changes.
---
 include/linux/udp.h      |   3 +-
 include/uapi/linux/udp.h |   1 +
 net/ipv4/udp.c           |   8 +++
 net/ipv4/udp_offload.c   | 109 +++++++++++++++++++++++++++++++--------
 net/ipv6/udp_offload.c   |   6 +--
 5 files changed, 99 insertions(+), 28 deletions(-)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index a4dafff407fb..f613b329852e 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -50,11 +50,12 @@ struct udp_sock {
 	__u8		 encap_type;	/* Is this an Encapsulation socket? */
 	unsigned char	 no_check6_tx:1,/* Send zero UDP6 checksums on TX? */
 			 no_check6_rx:1,/* Allow zero UDP6 checksums on RX? */
-			 encap_enabled:1; /* This socket enabled encap
+			 encap_enabled:1, /* This socket enabled encap
 					   * processing; UDP tunnels and
 					   * different encapsulation layer set
 					   * this
 					   */
+			 gro_enabled:1;	/* Can accept GRO packets */
 	/*
 	 * Following member retains the information to create a UDP header
 	 * when the socket is uncorked.
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 09502de447f5..30baccb6c9c4 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -33,6 +33,7 @@ struct udphdr {
 #define UDP_NO_CHECK6_TX 101	/* Disable sending checksum for UDP6X */
 #define UDP_NO_CHECK6_RX 102	/* Disable accpeting checksum for UDP6 */
 #define UDP_SEGMENT	103	/* Set GSO segmentation size */
+#define UDP_GRO		104	/* This socket can receive UDP GRO packets */
 
 /* UDP encapsulation types */
 #define UDP_ENCAP_ESPINUDP_NON_IKE	1 /* draft-ietf-ipsec-nat-t-ike-00/01 */
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c51721fb293a..4d4f4d044c28 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2474,6 +2474,14 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
 		up->gso_size = val;
 		break;
 
+	case UDP_GRO:
+		lock_sock(sk);
+		if (valbool)
+			udp_tunnel_encap_enable(sk->sk_socket);
+		up->gro_enabled = valbool;
+		release_sock(sk);
+		break;
+
 	/*
 	 * 	UDP-Lite's partial checksum coverage (RFC 3828).
 	 */
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 802f2bc00d69..0646d61f4fa8 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -343,6 +343,54 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
 	return segs;
 }
 
+#define UDP_GRO_CNT_MAX 64
+static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
+					       struct sk_buff *skb)
+{
+	struct udphdr *uh = udp_hdr(skb);
+	struct sk_buff *pp = NULL;
+	struct udphdr *uh2;
+	struct sk_buff *p;
+
+	/* requires non zero csum, for symmetry with GSO */
+	if (!uh->check) {
+		NAPI_GRO_CB(skb)->flush = 1;
+		return NULL;
+	}
+
+	/* pull encapsulating udp header */
+	skb_gro_pull(skb, sizeof(struct udphdr));
+	skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr));
+
+	list_for_each_entry(p, head, list) {
+		if (!NAPI_GRO_CB(p)->same_flow)
+			continue;
+
+		uh2 = udp_hdr(p);
+
+		/* Match ports only, as csum is always non zero */
+		if ((*(u32 *)&uh->source != *(u32 *)&uh2->source)) {
+			NAPI_GRO_CB(p)->same_flow = 0;
+			continue;
+		}
+
+		/* Terminate the flow on len mismatch or if it grows "too much".
+		 * Under small packet flood the GRO count could otherwise grow a
+		 * lot, leading to excessive truesize values
+		 */
+		if (!skb_gro_receive(p, skb) &&
+		    NAPI_GRO_CB(p)->count >= UDP_GRO_CNT_MAX)
+			pp = p;
+		else if (uh->len != uh2->len)
+			pp = p;
+
+		return pp;
+	}
+
+	/* mismatch, but we never need to flush */
+	return NULL;
+}
+
 struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
 				struct udphdr *uh, udp_lookup_t lookup)
 {
@@ -353,23 +401,27 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
 	int flush = 1;
 	struct sock *sk;
 
+	rcu_read_lock();
+	sk = (*lookup)(skb, uh->source, uh->dest);
+	if (!sk)
+		goto out_unlock;
+
+	if (udp_sk(sk)->gro_enabled) {
+		pp = call_gro_receive(udp_gro_receive_segment, head, skb);
+		rcu_read_unlock();
+		return pp;
+	}
+
 	if (NAPI_GRO_CB(skb)->encap_mark ||
 	    (skb->ip_summed != CHECKSUM_PARTIAL &&
 	     NAPI_GRO_CB(skb)->csum_cnt == 0 &&
-	     !NAPI_GRO_CB(skb)->csum_valid))
-		goto out;
+	     !NAPI_GRO_CB(skb)->csum_valid) ||
+	    !udp_sk(sk)->gro_receive)
+		goto out_unlock;
 
 	/* mark that this skb passed once through the tunnel gro layer */
 	NAPI_GRO_CB(skb)->encap_mark = 1;
 
-	rcu_read_lock();
-	sk = (*lookup)(skb, uh->source, uh->dest);
-
-	if (sk && udp_sk(sk)->gro_receive)
-		goto unflush;
-	goto out_unlock;
-
-unflush:
 	flush = 0;
 
 	list_for_each_entry(p, head, list) {
@@ -394,7 +446,6 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
 
 out_unlock:
 	rcu_read_unlock();
-out:
 	skb_gro_flush_final(skb, pp, flush);
 	return pp;
 }
@@ -427,6 +478,19 @@ static struct sk_buff *udp4_gro_receive(struct list_head *head,
 	return NULL;
 }
 
+static int udp_gro_complete_segment(struct sk_buff *skb)
+{
+	struct udphdr *uh = udp_hdr(skb);
+
+	skb->csum_start = (unsigned char *)uh - skb->head;
+	skb->csum_offset = offsetof(struct udphdr, check);
+	skb->ip_summed = CHECKSUM_PARTIAL;
+
+	skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count;
+	skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_L4;
+	return 0;
+}
+
 int udp_gro_complete(struct sk_buff *skb, int nhoff,
 		     udp_lookup_t lookup)
 {
@@ -437,16 +501,21 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
 
 	uh->len = newlen;
 
-	/* Set encapsulation before calling into inner gro_complete() functions
-	 * to make them set up the inner offsets.
-	 */
-	skb->encapsulation = 1;
-
 	rcu_read_lock();
 	sk = (*lookup)(skb, uh->source, uh->dest);
-	if (sk && udp_sk(sk)->gro_complete)
+	if (sk && udp_sk(sk)->gro_enabled) {
+		err = udp_gro_complete_segment(skb);
+	} else if (sk && udp_sk(sk)->gro_complete) {
+		skb_shinfo(skb)->gso_type = uh->check ? SKB_GSO_UDP_TUNNEL_CSUM
+					: SKB_GSO_UDP_TUNNEL;
+
+		/* Set encapsulation before calling into inner gro_complete()
+		 * functions to make them set up the inner offsets.
+		 */
+		skb->encapsulation = 1;
 		err = udp_sk(sk)->gro_complete(sk, skb,
 				nhoff + sizeof(struct udphdr));
+	}
 	rcu_read_unlock();
 
 	if (skb->remcsum_offload)
@@ -461,13 +530,9 @@ static int udp4_gro_complete(struct sk_buff *skb, int nhoff)
 	const struct iphdr *iph = ip_hdr(skb);
 	struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
 
-	if (uh->check) {
-		skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM;
+	if (uh->check)
 		uh->check = ~udp_v4_check(skb->len - nhoff, iph->saddr,
 					  iph->daddr, 0);
-	} else {
-		skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL;
-	}
 
 	return udp_gro_complete(skb, nhoff, udp4_lib_lookup_skb);
 }
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 1b8e161ac527..828b2457f97b 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -147,13 +147,9 @@ static int udp6_gro_complete(struct sk_buff *skb, int nhoff)
 	const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
 	struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
 
-	if (uh->check) {
-		skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM;
+	if (uh->check)
 		uh->check = ~udp_v6_check(skb->len - nhoff, &ipv6h->saddr,
 					  &ipv6h->daddr, 0);
-	} else {
-		skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL;
-	}
 
 	return udp_gro_complete(skb, nhoff, udp6_lib_lookup_skb);
 }
-- 
2.17.2


* [RFC PATCH v3 01/10] udp: implement complete book-keeping for encap_needed
From: Paolo Abeni @ 2018-10-30 17:24 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <cover.1540920083.git.pabeni@redhat.com>

The *encap_needed static keys are enabled by UDP tunnels
and several UDP encapsulation types, but they are never
turned off. This can cause needless overall performance
degradation on systems where such features are used
transiently.

This patch introduces complete book-keeping for such keys,
decreasing the usage count at socket destruction time, if needed,
and preventing the same socket from increasing the key
usage multiple times.
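
The book-keeping idea can be modelled in user space with a plain
counter. This is a sketch of the intended accounting only, not the
kernel implementation: the struct and function names are hypothetical,
and an int stands in for the static key.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: a plain counter instead of a static key. */
struct model_key { int users; };
struct model_sock { bool encap_enabled; };

/* Count the socket as a user of the key at most once. */
void model_encap_enable(struct model_key *key, struct model_sock *sk)
{
	if (sk->encap_enabled)
		return;
	sk->encap_enabled = true;
	key->users++;
}

/* Drop the socket's reference, if it ever took one. */
void model_destroy_sock(struct model_key *key, struct model_sock *sk)
{
	if (!sk->encap_enabled)
		return;
	sk->encap_enabled = false;
	assert(key->users > 0);
	key->users--;
}

/* The feature stays active while at least one socket still needs it. */
bool model_key_active(const struct model_key *key)
{
	return key->users > 0;
}
```

The per-socket flag is what makes a repeated enable on the same socket a
no-op, so the count drops back to zero once every user is gone.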

rfc v2 - rfc v3:
 - use udp_tunnel_encap_enable() in setsockopt()

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/udp.h      |  7 ++++++-
 include/net/udp_tunnel.h |  6 ++++++
 net/ipv4/udp.c           | 17 +++++++++++------
 net/ipv6/udp.c           | 14 +++++++++-----
 4 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index 320d49d85484..a4dafff407fb 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -49,7 +49,12 @@ struct udp_sock {
 	unsigned int	 corkflag;	/* Cork is required */
 	__u8		 encap_type;	/* Is this an Encapsulation socket? */
 	unsigned char	 no_check6_tx:1,/* Send zero UDP6 checksums on TX? */
-			 no_check6_rx:1;/* Allow zero UDP6 checksums on RX? */
+			 no_check6_rx:1,/* Allow zero UDP6 checksums on RX? */
+			 encap_enabled:1; /* This socket enabled encap
+					   * processing; UDP tunnels and
+					   * different encapsulation layer set
+					   * this
+					   */
 	/*
 	 * Following member retains the information to create a UDP header
 	 * when the socket is uncorked.
diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
index fe680ab6b15a..3fbe56430e3b 100644
--- a/include/net/udp_tunnel.h
+++ b/include/net/udp_tunnel.h
@@ -165,6 +165,12 @@ static inline int udp_tunnel_handle_offloads(struct sk_buff *skb, bool udp_csum)
 
 static inline void udp_tunnel_encap_enable(struct socket *sock)
 {
+	struct udp_sock *up = udp_sk(sock->sk);
+
+	if (up->encap_enabled)
+		return;
+
+	up->encap_enabled = 1;
 #if IS_ENABLED(CONFIG_IPV6)
 	if (sock->sk->sk_family == PF_INET6)
 		ipv6_stub->udpv6_encap_enable();
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ca3ed931f2a9..c51721fb293a 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -115,6 +115,7 @@
 #include "udp_impl.h"
 #include <net/sock_reuseport.h>
 #include <net/addrconf.h>
+#include <net/udp_tunnel.h>
 
 struct udp_table udp_table __read_mostly;
 EXPORT_SYMBOL(udp_table);
@@ -2398,11 +2399,15 @@ void udp_destroy_sock(struct sock *sk)
 	bool slow = lock_sock_fast(sk);
 	udp_flush_pending_frames(sk);
 	unlock_sock_fast(sk, slow);
-	if (static_branch_unlikely(&udp_encap_needed_key) && up->encap_type) {
-		void (*encap_destroy)(struct sock *sk);
-		encap_destroy = READ_ONCE(up->encap_destroy);
-		if (encap_destroy)
-			encap_destroy(sk);
+	if (static_branch_unlikely(&udp_encap_needed_key)) {
+		if (up->encap_type) {
+			void (*encap_destroy)(struct sock *sk);
+			encap_destroy = READ_ONCE(up->encap_destroy);
+			if (encap_destroy)
+				encap_destroy(sk);
+		}
+		if (up->encap_enabled)
+			static_branch_disable(&udp_encap_needed_key);
 	}
 }
 
@@ -2447,7 +2452,7 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
 			/* FALLTHROUGH */
 		case UDP_ENCAP_L2TPINUDP:
 			up->encap_type = val;
-			udp_encap_enable();
+			udp_tunnel_encap_enable(sk->sk_socket);
 			break;
 		default:
 			err = -ENOPROTOOPT;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index d2d97d07ef27..fc0ce6c59ebb 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1458,11 +1458,15 @@ void udpv6_destroy_sock(struct sock *sk)
 	udp_v6_flush_pending_frames(sk);
 	release_sock(sk);
 
-	if (static_branch_unlikely(&udpv6_encap_needed_key) && up->encap_type) {
-		void (*encap_destroy)(struct sock *sk);
-		encap_destroy = READ_ONCE(up->encap_destroy);
-		if (encap_destroy)
-			encap_destroy(sk);
+	if (static_branch_unlikely(&udpv6_encap_needed_key)) {
+		if (up->encap_type) {
+			void (*encap_destroy)(struct sock *sk);
+			encap_destroy = READ_ONCE(up->encap_destroy);
+			if (encap_destroy)
+				encap_destroy(sk);
+		}
+		if (up->encap_enabled)
+			static_branch_disable(&udpv6_encap_needed_key);
 	}
 
 	inet6_destroy_sock(sk);
-- 
2.17.2


* [RFC PATCH v3 00/10] udp: implement GRO support
From: Paolo Abeni @ 2018-10-30 17:24 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan

This series implements GRO support for UDP sockets, as the RX counterpart
of commit bec1f6f69736 ("udp: generate gso with UDP_SEGMENT").
The core functionality is implemented by the second patch, which introduces a
new sockopt to enable UDP_GRO, while patch 3 implements support for passing the
segment size to user space via a new cmsg.
UDP GRO performs a socket lookup for each ingress packet and aggregates
datagrams directed to UDP GRO enabled sockets with a constant L4 tuple.

UDP GRO packets can land on non GRO-enabled sockets, e.g. due to iptables NAT
rules, and that could potentially confuse existing applications.

The solution adopted here is to de-segment the GRO packet before enqueuing
as needed. Since we must cope with packet reinsertion after de-segmentation,
the relevant code is factored-out in ipv4 and ipv6 specific helpers and exposed
to UDP usage.
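
De-segmenting as described above amounts to cutting the aggregated
payload back into gso_size-sized pieces. A minimal user-space sketch of
that layout computation follows; struct seg and udp_split_layout() are
hypothetical names, and this is not the kernel helper added by the
series:

```c
#include <assert.h>
#include <stddef.h>

struct seg {
	size_t off;	/* offset of the segment in the aggregated payload */
	size_t len;	/* segment length in bytes */
};

/*
 * Split an aggregated payload of total bytes into gso_size-sized
 * pieces; only the last piece may be shorter. Returns the number of
 * segments, storing at most cap of them in out[].
 */
size_t udp_split_layout(size_t total, size_t gso_size,
			struct seg *out, size_t cap)
{
	size_t off = 0, n = 0;

	assert(gso_size > 0);
	while (off < total) {
		size_t len = total - off < gso_size ? total - off : gso_size;

		if (n < cap) {
			out[n].off = off;
			out[n].len = len;
		}
		off += len;
		n++;
	}
	return n;
}
```

The same computation is what a GRO-aware receiver would run on its
buffer, using the segment size reported by the UDP_GRO cmsg.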

While the current code can probably be improved, this safeguard, implemented in
patches 4-6, allows future enhancements to enable UDP GSO offload on more
virtual devices, eventually even on forwarded packets.

The last 4 patches implement some performance and functional self-tests,
re-using the existing udpgso infrastructure. The problematic scenario described
above is explicitly tested.

This revision of the series tries to address the feedback provided by Willem,
Steffen and Subash, fixing several bugs along the way.

rfc v2 - rfc v3:
 - cope better with exceptional conditions
 - test cases cleanup

rfc v1 - rfc v2:
 - use a new option to enable UDP GRO
 - use static keys to protect the UDP GRO socket lookup
 - cope with UDP GRO misdirection
 - add self-tests

Paolo Abeni (10):
  udp: implement complete book-keeping for encap_needed
  udp: implement GRO for plain UDP sockets.
  udp: add support for UDP_GRO cmsg
  ip: factor out protocol delivery helper
  ipv6: factor out protocol delivery helper
  udp: cope with UDP GRO packet misdirection
  selftests: add GRO support to udp bench rx program
  selftests: conditionally enable XDP support in udpgso_bench_rx
  selftests: add some benchmark for UDP GRO
  selftests: add functionals test for UDP GRO

 include/linux/udp.h                           |  25 ++-
 include/net/udp.h                             |  51 ++++-
 include/net/udp_tunnel.h                      |   6 +
 include/uapi/linux/udp.h                      |   1 +
 net/ipv4/ip_input.c                           |  73 ++++---
 net/ipv4/udp.c                                |  54 ++++-
 net/ipv4/udp_offload.c                        | 109 ++++++++--
 net/ipv6/ip6_input.c                          |  28 +--
 net/ipv6/udp.c                                |  44 +++-
 net/ipv6/udp_offload.c                        |   6 +-
 tools/testing/selftests/net/Makefile          |  70 +++++++
 tools/testing/selftests/net/udpgro.sh         | 147 +++++++++++++
 tools/testing/selftests/net/udpgro_bench.sh   |  94 +++++++++
 tools/testing/selftests/net/udpgso_bench.sh   |   2 +-
 tools/testing/selftests/net/udpgso_bench_rx.c | 193 ++++++++++++++++--
 tools/testing/selftests/net/udpgso_bench_tx.c |  22 +-
 tools/testing/selftests/net/xdp_dummy.c       |  13 ++
 17 files changed, 816 insertions(+), 122 deletions(-)
 create mode 100755 tools/testing/selftests/net/udpgro.sh
 create mode 100755 tools/testing/selftests/net/udpgro_bench.sh
 create mode 100644 tools/testing/selftests/net/xdp_dummy.c

-- 
2.17.2


* Re: [PATCH iproute2 net-next 0/3] ss: Allow selection of columns to be displayed
From: David Ahern @ 2018-10-30 16:45 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Stefano Brivio, Yoann P., netdev
In-Reply-To: <20181030093842.0e174ea6@xeon-e3>

On 10/30/18 10:38 AM, Stephen Hemminger wrote:
> On Tue, 30 Oct 2018 10:34:45 -0600
> David Ahern <dsahern@gmail.com> wrote:
> 
>> On 10/30/18 9:05 AM, Stefano Brivio wrote:
>>> Now that we have an abstraction for columns, it's relatively easy to
>>> selectively display only some of them, and Yoann has a use case for it.
>>>
>>> Patch 1/3 fixes a rendering issue that shows up only when display of
>>> arbitrary columns is disabled. Patch 2/3 implements the relevant option,
>>> and patch 3/3 makes the output more readable when some columns are
>>> disabled.
>>>
>>>  
>>
>> I like the intent, and I have prototyped something similar for 'ip'.
>>
>> A more flexible approach is to use format strings to allow users to
>> customize the output order and whitespace as well. So for ss and your
>> column list (winging it here):
>>
>>     netid          = %N
>>     state          = %S
>>     recv Q         = %Qr
>>     send Q         = %Qs
>>     local address  = %Al
>>     local port     = %Pl
>>     remote address = %Ar
>>     remote port    = %Pr
>>     process data   = %p
>>     ...
>>
>> then a format string could be: "%S  %Qr %Qs  %Al:%Pl %Ar:%Pr  %p\n"
>>
>> or for csv output: "%S,%Qr,%Qs,%Al,%Pl,%Ar,%Pr,%p\n"
>>
>> I have not had time to look into an implementation for ip. Conceptually
>> - and scanning the kernel's vsprintf code - it does not look that
>> difficult, just time consuming on the frontend with the initial setup.
> 
> The problem with custom formats is that you lose all ability for GCC
> to check format strings.
> 

Sure, trade-offs. A custom print string is powerful.

While selecting columns is an improvement, column ordering is also
important - even handling other output formats (csv).


* Re: [PATCH iproute2 net-next 0/3] ss: Allow selection of columns to be displayed
From: Stephen Hemminger @ 2018-10-30 16:38 UTC (permalink / raw)
  To: David Ahern; +Cc: Stefano Brivio, Yoann P., netdev
In-Reply-To: <7ffc00c8-bdf6-5c75-564e-2663494bda5d@gmail.com>

On Tue, 30 Oct 2018 10:34:45 -0600
David Ahern <dsahern@gmail.com> wrote:

> On 10/30/18 9:05 AM, Stefano Brivio wrote:
> > Now that we have an abstraction for columns, it's relatively easy to
> > selectively display only some of them, and Yoann has a use case for it.
> > 
> > Patch 1/3 fixes a rendering issue that shows up only when display of
> > arbitrary columns is disabled. Patch 2/3 implements the relevant option,
> > and patch 3/3 makes the output more readable when some columns are
> > disabled.
> > 
> >  
> 
> I like the intent, and I have prototyped something similar for 'ip'.
> 
> A more flexible approach is to use format strings to allow users to
> customize the output order and whitespace as well. So for ss and your
> column list (winging it here):
> 
>     netid          = %N
>     state          = %S
>     recv Q         = %Qr
>     send Q         = %Qs
>     local address  = %Al
>     local port     = %Pl
>     remote address = %Ar
>     remote port    = %Pr
>     process data   = %p
>     ...
> 
> then a format string could be: "%S  %Qr %Qs  %Al:%Pl %Ar:%Pr  %p\n"
> 
> or for csv output: "%S,%Qr,%Qs,%Al,%Pl,%Ar,%Pr,%p\n"
> 
> I have not had time to look into an implementation for ip. Conceptually
> - and scanning the kernel's vsprintf code - it does not look that
> difficult, just time consuming on the frontend with the initial setup.

The problem with custom formats is that you lose all ability for GCC
to check format strings.
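
For context, the compile-time check in question is the one enabled by
the printf format attribute. A minimal sketch, with checked_out() as a
hypothetical wrapper name:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * The attribute tells GCC (and Clang) that argument 3 is a printf-style
 * format and that the variadic arguments start at position 4, so
 * mismatched arguments are reported by -Wformat at build time.
 */
__attribute__((format(printf, 3, 4)))
int checked_out(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL && len > 0);
	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}
```

A call such as checked_out(buf, len, "%d", "text") draws a warning from
the compiler, whereas specifiers that are interpreted at run time, like
"%Qr", are invisible to this check.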


* Re: [PATCH iproute2 net-next 0/3] ss: Allow selection of columns to be displayed
From: David Ahern @ 2018-10-30 16:34 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Yoann P., Stephen Hemminger, netdev
In-Reply-To: <cover.1540910943.git.sbrivio@redhat.com>

On 10/30/18 9:05 AM, Stefano Brivio wrote:
> Now that we have an abstraction for columns, it's relatively easy to
> selectively display only some of them, and Yoann has a use case for it.
> 
> Patch 1/3 fixes a rendering issue that shows up only when display of
> arbitrary columns is disabled. Patch 2/3 implements the relevant option,
> and patch 3/3 makes the output more readable when some columns are
> disabled.
> 
>

I like the intent, and I have prototyped something similar for 'ip'.

A more flexible approach is to use format strings to allow users to
customize the output order and whitespace as well. So for ss and your
column list (winging it here):

    netid          = %N
    state          = %S
    recv Q         = %Qr
    send Q         = %Qs
    local address  = %Al
    local port     = %Pl
    remote address = %Ar
    remote port    = %Pr
    process data   = %p
    ...

then a format string could be: "%S  %Qr %Qs  %Al:%Pl %Ar:%Pr  %p\n"

or for csv output: "%S,%Qr,%Qs,%Al,%Pl,%Ar,%Pr,%p\n"

I have not had time to look into an implementation for ip. Conceptually
- and scanning the kernel's vsprintf code - it does not look that
difficult, just time consuming on the frontend with the initial setup.
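
A minimal sketch of how such a format string could be expanded, assuming
the specifier letters listed above. Nothing here comes from ss or ip:
struct sock_row, lookup() and render_row() are hypothetical names, and
every field is treated as an already-formatted string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-socket record; the field set follows the list above. */
struct sock_row {
	const char *netid, *state, *recvq, *sendq;
	const char *laddr, *lport, *raddr, *rport, *proc;
};

/* Map a specifier to a field; *used is the length consumed after '%'. */
static const char *lookup(const struct sock_row *r, const char *s,
			  size_t *used)
{
	*used = 2;
	if (!strncmp(s, "Qr", 2))
		return r->recvq;
	if (!strncmp(s, "Qs", 2))
		return r->sendq;
	if (!strncmp(s, "Al", 2))
		return r->laddr;
	if (!strncmp(s, "Pl", 2))
		return r->lport;
	if (!strncmp(s, "Ar", 2))
		return r->raddr;
	if (!strncmp(s, "Pr", 2))
		return r->rport;

	*used = 1;
	if (*s == 'N')
		return r->netid;
	if (*s == 'S')
		return r->state;
	if (*s == 'p')
		return r->proc;
	return NULL;
}

/*
 * Expand fmt into buf. Unknown specifiers are copied through unchanged.
 * Returns 0, or -1 if buf is too small.
 */
int render_row(char *buf, size_t len, const char *fmt,
	       const struct sock_row *r)
{
	size_t pos = 0;

	assert(buf != NULL && len > 0 && fmt != NULL && r != NULL);
	while (*fmt) {
		const char *val = NULL;
		size_t used = 0, n;

		if (*fmt == '%')
			val = lookup(r, fmt + 1, &used);
		if (val) {		/* known specifier: emit the field */
			n = strlen(val);
			fmt += 1 + used;
		} else {		/* literal character */
			val = fmt;
			n = 1;
			fmt++;
		}
		if (pos + n >= len)
			return -1;
		memcpy(buf + pos, val, n);
		pos += n;
	}
	buf[pos] = '\0';
	return 0;
}
```

Because the output order is whatever the string says, both column
ordering and the csv case fall out of the same code path.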


* [RFC bpf-next] libbpf: increase rlimit before trying to create BPF maps
From: Quentin Monnet @ 2018-10-30 15:23 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann; +Cc: netdev, oss-drivers, Quentin Monnet

The limit for memory locked in the kernel by a process is usually set to
64 kbytes by default. This can be an issue when creating large BPF maps.
A workaround is to raise this limit for the current process before
trying to create a new BPF map. Changing the hard limit requires the
CAP_SYS_RESOURCE capability and can usually only be done by the root user
(but then only root can create BPF maps).

As far as I know there is no API to get the current amount of memory
locked for a user, therefore we cannot raise the limit only when
required. One solution, used by bcc, is to try to create the map and, on
getting an EPERM error, raise the limit to infinity before giving it
another try. Another approach, used in iproute2, is to raise the limit in
all cases, before trying to create the map.

Here we do the same as in iproute2: the rlimit is raised to infinity
before trying to load the map.

I am sending this patch as an RFC to see if people would prefer the bcc
approach instead, or the rlimit change to be in bpftool rather than in
libbpf.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 tools/lib/bpf/bpf.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 03f9bcc4ef50..456a5a7b112c 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -26,6 +26,8 @@
 #include <unistd.h>
 #include <asm/unistd.h>
 #include <linux/bpf.h>
+#include <sys/resource.h>
+#include <sys/types.h>
 #include "bpf.h"
 #include "libbpf.h"
 #include <errno.h>
@@ -68,8 +70,11 @@ static inline int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
 int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
 {
 	__u32 name_len = create_attr->name ? strlen(create_attr->name) : 0;
+	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
 	union bpf_attr attr;

+	setrlimit(RLIMIT_MEMLOCK, &rinf);
+
 	memset(&attr, '\0', sizeof(attr));

 	attr.map_type = create_attr->map_type;
-- 
2.7.4


* Re: [BUG] MVPP2 driver exploding in presence of a tap interface
From: Marc Zyngier @ 2018-10-30 15:22 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Marcin Wojtas, Antoine Tenart, Maxime Chevallier,
	linux-arm-kernel, netdev, Grzegorz Jaszczyk, Tomasz Nowicki
In-Reply-To: <20181030161007.360d5a53@windsurf>

On 30/10/18 15:10, Thomas Petazzoni wrote:
> Hello,
> 
> On Tue, 30 Oct 2018 14:55:01 +0000, Marc Zyngier wrote:
> 
>>> I.e, isn't the firmware fix papering over a bug that should be fixed in
>>> Linux mvpp2 driver anyway ?  
>>
>> Absolutely. Leaving this unpatched in the kernel, with a 100% chance of
>> memory corruption is just mad.
>>
>> I'm pretty sure there should be a way to sanely reset the interface
>> before it starts repainting the memory.
> 
> I agree here. Do you still have an image of that old firmware version,
> so that we can try to reproduce, and see if we can come up with a way
> to reset the BM on boot up that would avoid this issue ?

Yup. I still have both the original build tree as well as the sdcard, so
you should be able to trigger on demand.

I'll email you the stuff separately, unless you want another delivery
method.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [BUG] MVPP2 driver exploding in presence of a tap interface
From: Thomas Petazzoni @ 2018-10-30 15:10 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Marcin Wojtas, Antoine Tenart, Maxime Chevallier,
	linux-arm-kernel, netdev, Grzegorz Jaszczyk, Tomasz Nowicki
In-Reply-To: <6bf82e04-5463-aa7d-bbac-f09519ff9815@arm.com>

Hello,

On Tue, 30 Oct 2018 14:55:01 +0000, Marc Zyngier wrote:

> > I.e, isn't the firmware fix papering over a bug that should be fixed in
> > Linux mvpp2 driver anyway ?  
> 
> Absolutely. Leaving this unpatched in the kernel, with a 100% chance of
> memory corruption is just mad.
> 
> I'm pretty sure there should be a way to sanely reset the interface
> before it starts repainting the memory.

I agree here. Do you still have an image of that old firmware version,
so that we can try to reproduce, and see if we can come up with a way
to reset the BM on boot up that would avoid this issue ?

Thanks,

Thomas
-- 
Thomas Petazzoni, CTO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


* [PATCH iproute2 net-next 3/3] ss: Beautify output when arbitrary columns are hidden
From: Stefano Brivio @ 2018-10-30 15:05 UTC (permalink / raw)
  To: David Ahern; +Cc: Yoann P., Stephen Hemminger, netdev
In-Reply-To: <cover.1540910943.git.sbrivio@redhat.com>

Define a secondary alignment for columns in case the next column is
hidden. This avoids awkward output if, e.g., the local address is shown
but not the local port.

Omit embedded delimiter in socket specifiers if the port or service field
is hidden.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 misc/ss.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 91be3c6db151..d489233681e9 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -131,7 +131,8 @@ enum col_align {
 };
 
 struct column {
-	const enum col_align align;
+	enum col_align align;
+	const enum col_align align_without_next;
 	const char *optname;
 	const char *header;
 	const char *ldelim;
@@ -141,15 +142,15 @@ struct column {
 };
 
 static struct column columns[] = {
-	{ ALIGN_LEFT,	"netid",	"Netid",		"",  0, 0, 0 },
-	{ ALIGN_LEFT,	"state",	"State",		" ", 0, 0, 0 },
-	{ ALIGN_LEFT,	"recvq",	"Recv-Q",		" ", 0, 0, 0 },
-	{ ALIGN_LEFT,	"sendq",	"Send-Q",		" ", 0, 0, 0 },
-	{ ALIGN_RIGHT,	"local",	"Local Address:",	" ", 0, 0, 0 },
-	{ ALIGN_LEFT,	"lport",	"Port",			"",  0, 0, 0 },
-	{ ALIGN_RIGHT,	"peer",		"Peer Address:",	" ", 0, 0, 0 },
-	{ ALIGN_LEFT,	"pport",	"Port",			"",  0, 0, 0 },
-	{ ALIGN_LEFT,	"ext",		"",			"",  0, 0, 0 },
+	{ ALIGN_LEFT,  ALIGN_LEFT, "netid", "Netid",          "",  0, 0, 0 },
+	{ ALIGN_LEFT,  ALIGN_LEFT, "state", "State",          " ", 0, 0, 0 },
+	{ ALIGN_LEFT,  ALIGN_LEFT, "recvq", "Recv-Q",         " ", 0, 0, 0 },
+	{ ALIGN_LEFT,  ALIGN_LEFT, "sendq", "Send-Q",         " ", 0, 0, 0 },
+	{ ALIGN_RIGHT, ALIGN_LEFT, "local", "Local Address:", " ", 0, 0, 0 },
+	{ ALIGN_LEFT,  ALIGN_LEFT, "lport", "Port",           "",  0, 0, 0 },
+	{ ALIGN_RIGHT, ALIGN_LEFT, "peer",  "Peer Address:",  " ", 0, 0, 0 },
+	{ ALIGN_LEFT,  ALIGN_LEFT, "pport", "Port",           "",  0, 0, 0 },
+	{ ALIGN_LEFT,  ALIGN_LEFT, "ext",   "",               "",  0, 0, 0 },
 };
 
 static struct column *current_field = columns;
@@ -1374,6 +1375,9 @@ static void sock_details_print(struct sockstat *s)
 static void sock_addr_print(const char *addr, char *delim, const char *port,
 		const char *ifname)
 {
+	if ((current_field + 1)->disabled)
+		delim = "";
+
 	if (ifname)
 		out("%s" "%%" "%s%s", addr, ifname, delim);
 	else
@@ -5006,6 +5010,12 @@ int main(int argc, char *argv[])
 				}
 				p = p1 + 1;
 			} while (p1);
+
+			for (f = columns; field_is_valid(f + 1); f++) {
+				if ((f + 1)->disabled)
+					f->align = f->align_without_next;
+			}
+
 			break;
 		}
 		case 'h':
-- 
2.19.1

^ permalink raw reply related

* [PATCH iproute2 net-next 2/3] ss: Introduce option to display selected columns only
From: Stefano Brivio @ 2018-10-30 15:05 UTC (permalink / raw)
  To: David Ahern; +Cc: Yoann P., Stephen Hemminger, netdev
In-Reply-To: <cover.1540910943.git.sbrivio@redhat.com>

The new option --columns (short: -c) selects the columns to be
displayed. Note that it doesn't affect the order in which columns are
displayed.
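
As an illustration only, the selection logic can be sketched outside of ss
as follows. The struct col type, the cols[] table and select_columns() are
hypothetical, reduced stand-ins and not the code in misc/ss.c; the sketch
only assumes, as the patch does, that every column has an option name and
a disabled flag.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct column in misc/ss.c. */
struct col {
	const char *optname;
	int disabled;
};

static struct col cols[] = {
	{ "netid", 0 }, { "state", 0 }, { "recvq", 0 }, { "sendq", 0 },
	{ "local", 0 }, { "lport", 0 }, { "peer",  0 }, { "pport", 0 },
	{ "ext",   0 },
};

#define NCOLS (sizeof(cols) / sizeof(cols[0]))

/* Enable only the columns named in the comma-separated list 'spec'.
 * 'spec' is modified in place (commas become NUL bytes), as happens to
 * optarg in the patch. Returns 0 on success, -1 on an unknown name.
 */
static int select_columns(char *spec)
{
	char *p = spec, *comma;
	size_t i;

	/* Start with everything hidden, then re-enable what was asked for. */
	for (i = 0; i < NCOLS; i++)
		cols[i].disabled = 1;

	while (p) {
		comma = strchr(p, ',');
		if (comma)
			*comma = '\0';

		for (i = 0; i < NCOLS; i++) {
			if (!strcmp(cols[i].optname, p)) {
				cols[i].disabled = 0;
				break;
			}
		}
		if (i == NCOLS)
			return -1;	/* no column with that name */

		p = comma ? comma + 1 : NULL;
	}

	return 0;
}
```

Because the list only flips per-column flags, "lport,state" and
"state,lport" give the same result, which is why the option cannot
reorder columns.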

Reported-by: Yoann P. <yoann.p.public@gmail.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 man/man8/ss.8 |  5 +++++
 misc/ss.c     | 62 ++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 57 insertions(+), 10 deletions(-)

diff --git a/man/man8/ss.8 b/man/man8/ss.8
index 7a6572b17364..c987dec6bcd7 100644
--- a/man/man8/ss.8
+++ b/man/man8/ss.8
@@ -24,6 +24,11 @@ Output version information.
 .B \-H, \-\-no-header
 Suppress header line.
 .TP
+.B \-c COLS, \-\-columns=COLS
+Only display selected columns, separated by commas. The following column names
+are understood: netid, state, recvq, sendq, local, lport, peer, pport, ext.
+This does not define the order of columns.
+.TP
 .B \-n, \-\-numeric
 Do not try to resolve service names.
 .TP
diff --git a/misc/ss.c b/misc/ss.c
index c3f61ef66258..91be3c6db151 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -132,6 +132,7 @@ enum col_align {
 
 struct column {
 	const enum col_align align;
+	const char *optname;
 	const char *header;
 	const char *ldelim;
 	int disabled;
@@ -140,15 +141,15 @@ struct column {
 };
 
 static struct column columns[] = {
-	{ ALIGN_LEFT,	"Netid",		"",	0, 0, 0 },
-	{ ALIGN_LEFT,	"State",		" ",	0, 0, 0 },
-	{ ALIGN_LEFT,	"Recv-Q",		" ",	0, 0, 0 },
-	{ ALIGN_LEFT,	"Send-Q",		" ",	0, 0, 0 },
-	{ ALIGN_RIGHT,	"Local Address:",	" ",	0, 0, 0 },
-	{ ALIGN_LEFT,	"Port",			"",	0, 0, 0 },
-	{ ALIGN_RIGHT,	"Peer Address:",	" ",	0, 0, 0 },
-	{ ALIGN_LEFT,	"Port",			"",	0, 0, 0 },
-	{ ALIGN_LEFT,	"",			"",	0, 0, 0 },
+	{ ALIGN_LEFT,	"netid",	"Netid",		"",  0, 0, 0 },
+	{ ALIGN_LEFT,	"state",	"State",		" ", 0, 0, 0 },
+	{ ALIGN_LEFT,	"recvq",	"Recv-Q",		" ", 0, 0, 0 },
+	{ ALIGN_LEFT,	"sendq",	"Send-Q",		" ", 0, 0, 0 },
+	{ ALIGN_RIGHT,	"local",	"Local Address:",	" ", 0, 0, 0 },
+	{ ALIGN_LEFT,	"lport",	"Port",			"",  0, 0, 0 },
+	{ ALIGN_RIGHT,	"peer",		"Peer Address:",	" ", 0, 0, 0 },
+	{ ALIGN_LEFT,	"pport",	"Port",			"",  0, 0, 0 },
+	{ ALIGN_LEFT,	"ext",		"",			"",  0, 0, 0 },
 };
 
 static struct column *current_field = columns;
@@ -1073,6 +1074,11 @@ static int field_is_last(struct column *f)
 	return f - columns == COL_MAX - 1;
 }
 
+static int field_is_valid(struct column *f)
+{
+	return f >= columns && f - columns < COL_MAX;
+}
+
 static void field_next(void)
 {
 	field_flush(current_field);
@@ -4666,6 +4672,8 @@ static void _usage(FILE *dest)
 "\n"
 "   -K, --kill          forcibly close sockets, display what was closed\n"
 "   -H, --no-header     Suppress header line\n"
+"   -c, --columns=COLS  display only COLS columns\n"
+"       COLS := {netid|state|recvq|sendq|local|lport|peer|pport|ext}[,COLS]\n"
 "\n"
 "   -A, --query=QUERY, --socket=QUERY\n"
 "       QUERY := {all|inet|tcp|udp|raw|unix|unix_dgram|unix_stream|unix_seqpacket|packet|netlink|vsock_stream|vsock_dgram|tipc}[,QUERY]\n"
@@ -4785,6 +4793,7 @@ static const struct option long_opts[] = {
 	{ "tipcinfo", 0, 0, OPT_TIPCINFO},
 	{ "kill", 0, 0, 'K' },
 	{ "no-header", 0, 0, 'H' },
+	{ "columns", 1, 0, 'c' },
 	{ 0 }
 
 };
@@ -4800,7 +4809,7 @@ int main(int argc, char *argv[])
 	int state_filter = 0;
 
 	while ((ch = getopt_long(argc, argv,
-				 "dhaletuwxnro460spbEf:miA:D:F:vVzZN:KHS",
+				 "dhaletuwxnro460spbEf:miA:D:F:vVzZN:KHc:S",
 				 long_opts, NULL)) != EOF) {
 		switch (ch) {
 		case 'n':
@@ -4966,6 +4975,39 @@ int main(int argc, char *argv[])
 		case 'H':
 			show_header = 0;
 			break;
+		case 'c':
+		{
+			struct column *f;
+			char *p, *p1;
+
+			if (!optarg) {
+				fprintf(stderr, "ss: No columns given.\n");
+				usage();
+			}
+
+			for (f = columns; field_is_valid(f); f++)
+				f->disabled = 1;
+
+			p = optarg;
+			do {
+				p1 = strchr(p, ',');
+				if (p1)
+					*p1 = 0;
+				for (f = columns; field_is_valid(f); f++) {
+					if (!strcmp(f->optname, p)) {
+						f->disabled = 0;
+						break;
+					}
+				}
+				if (!field_is_valid(f)) {
+					fprintf(stderr, "ss: No column %s\n",
+						p);
+					usage();
+				}
+				p = p1 + 1;
+			} while (p1);
+			break;
+		}
 		case 'h':
 			help();
 		case '?':
-- 
2.19.1

^ permalink raw reply related

* [PATCH iproute2 net-next 1/3] ss: Discard empty descriptor at the end of buffer, if any, before rendering
From: Stefano Brivio @ 2018-10-30 15:05 UTC (permalink / raw)
  To: David Ahern; +Cc: Yoann P., Stephen Hemminger, netdev
In-Reply-To: <cover.1540910943.git.sbrivio@redhat.com>

This will allow us to disable display of any given column.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 misc/ss.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index c8970438ce73..c3f61ef66258 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -1245,8 +1245,15 @@ static void render(void)
 
 	token = (struct buf_token *)buffer.head->data;
 
-	/* Ensure end alignment of last token, it wasn't necessarily flushed */
-	buffer.tail->end += buffer.cur->len % 2;
+	if (!buffer.cur->len) {
+		/* Last token was flushed, a new empty descriptor was appended:
+		 * discard it
+		 */
+		buffer.tail->end -= sizeof(buffer.cur->len);
+	} else {
+		/* Last token wasn't flushed: ensure end alignment */
+		buffer.tail->end += buffer.cur->len % 2;
+	}
 
 	render_calc_width();
 
-- 
2.19.1

^ permalink raw reply related

* [PATCH iproute2 net-next 0/3] ss: Allow selection of columns to be displayed
From: Stefano Brivio @ 2018-10-30 15:05 UTC (permalink / raw)
  To: David Ahern; +Cc: Yoann P., Stephen Hemminger, netdev

Now that we have an abstraction for columns, it's relatively easy to
selectively display only some of them, and Yoann has a use case for it.

Patch 1/3 fixes a rendering issue that shows up only when display of
arbitrary columns is disabled. Patch 2/3 implements the relevant option,
and patch 3/3 makes the output more readable when some columns are
disabled.
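
For readers who want the idea behind patch 3/3 without reading the diff,
here is a minimal sketch. The struct col type and fix_alignment() are
hypothetical, reduced stand-ins for the code in misc/ss.c; the only
assumption taken from the series is that each column carries a fallback
alignment to use when the column to its right is hidden.

```c
#include <stddef.h>

/* Reduced for the sketch: only the two values used below. */
enum col_align { ALIGN_LEFT, ALIGN_RIGHT };

/* Hypothetical, reduced stand-in for struct column in misc/ss.c. */
struct col {
	enum col_align align;
	enum col_align align_without_next;
	int disabled;
};

/* For every column whose right-hand neighbour is hidden, switch to the
 * fallback alignment. The last column has no neighbour and is left alone.
 */
static void fix_alignment(struct col *c, size_t n)
{
	size_t i;

	for (i = 0; i + 1 < n; i++) {
		if (c[i + 1].disabled)
			c[i].align = c[i].align_without_next;
	}
}
```

A right-aligned address column followed by a hidden port column would
thus fall back to left alignment, instead of leaving a ragged left edge.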

Stefano Brivio (3):
  ss: Discard empty descriptor at the end of buffer, if any, before
    rendering
  ss: Introduce option to display selected columns only
  ss: Beautify output when arbitrary columns are hidden

 man/man8/ss.8 |  5 +++
 misc/ss.c     | 85 +++++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 77 insertions(+), 13 deletions(-)

-- 
2.19.1

^ permalink raw reply

* KASAN: slab-out-of-bounds Read in _decode_session6 (2)
From: syzbot @ 2018-10-30 15:00 UTC (permalink / raw)
  To: davem, herbert, kuznet, linux-kernel, netdev, steffen.klassert,
	syzkaller-bugs, yoshfuji

Hello,

syzbot found the following crash on:

HEAD commit:    d8fd9e106fbc bpf: fix wrong helper enablement in cgroup lo..
git tree:       bpf
console output: https://syzkaller.appspot.com/x/log.txt?x=174157cb400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=9cb981ee01463dea
dashboard link: https://syzkaller.appspot.com/bug?extid=240f9766d6be3d69431e
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=10828e4d400000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15fdf2eb400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+240f9766d6be3d69431e@syzkaller.appspotmail.com

IPv6: ADDRCONF(NETDEV_UP): veth1: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
8021q: adding VLAN 0 to HW filter on device team0
==================================================================
BUG: KASAN: slab-out-of-bounds in _decode_session6+0x134a/0x1500 net/ipv6/xfrm6_policy.c:161
Read of size 1 at addr ffff8801ca971707 by task syz-executor081/5431

CPU: 0 PID: 5431 Comm: syz-executor081 Not tainted 4.19.0+ #72
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x244/0x39d lib/dump_stack.c:113
  print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
  __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
  _decode_session6+0x134a/0x1500 net/ipv6/xfrm6_policy.c:161
  __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:2299
  xfrm_decode_session include/net/xfrm.h:1232 [inline]
  vti6_tnl_xmit+0x3fc/0x1c10 net/ipv6/ip6_vti.c:542
  __netdev_start_xmit include/linux/netdevice.h:4336 [inline]
  netdev_start_xmit include/linux/netdevice.h:4345 [inline]
  xmit_one net/core/dev.c:3252 [inline]
  dev_hard_start_xmit+0x295/0xc90 net/core/dev.c:3268
  __dev_queue_xmit+0x2f71/0x3ad0 net/core/dev.c:3838
  dev_queue_xmit+0x17/0x20 net/core/dev.c:3871
  __bpf_tx_skb net/core/filter.c:2017 [inline]
  __bpf_redirect_common net/core/filter.c:2055 [inline]
  __bpf_redirect+0x5cf/0xb20 net/core/filter.c:2062
  ____bpf_clone_redirect net/core/filter.c:2095 [inline]
  bpf_clone_redirect+0x2f6/0x490 net/core/filter.c:2067
  bpf_prog_c39d1ba309a769f7+0x800/0x1000

Allocated by task 5431:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
  __do_kmalloc_node mm/slab.c:3682 [inline]
  __kmalloc_node_track_caller+0x47/0x70 mm/slab.c:3696
  __kmalloc_reserve.isra.40+0x41/0xe0 net/core/skbuff.c:137
  pskb_expand_head+0x230/0x10f0 net/core/skbuff.c:1460
  skb_ensure_writable+0x3dd/0x640 net/core/skbuff.c:5071
  __bpf_try_make_writable net/core/filter.c:1638 [inline]
  bpf_try_make_writable net/core/filter.c:1644 [inline]
  bpf_try_make_head_writable net/core/filter.c:1652 [inline]
  ____bpf_clone_redirect net/core/filter.c:2089 [inline]
  bpf_clone_redirect+0x14a/0x490 net/core/filter.c:2067
  bpf_prog_c39d1ba309a769f7+0x800/0x1000

Freed by task 4022:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
  __cache_free mm/slab.c:3498 [inline]
  kfree+0xcf/0x230 mm/slab.c:3813
  load_elf_binary+0x25b4/0x5620 fs/binfmt_elf.c:1118
  search_binary_handler+0x17d/0x570 fs/exec.c:1653
  exec_binprm fs/exec.c:1695 [inline]
  __do_execve_file.isra.33+0x162f/0x2540 fs/exec.c:1819
  do_execveat_common fs/exec.c:1866 [inline]
  do_execve fs/exec.c:1883 [inline]
  __do_sys_execve fs/exec.c:1964 [inline]
  __se_sys_execve fs/exec.c:1959 [inline]
  __x64_sys_execve+0x8f/0xc0 fs/exec.c:1959
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8801ca971500
  which belongs to the cache kmalloc-512 of size 512
The buggy address is located 7 bytes to the right of
  512-byte region [ffff8801ca971500, ffff8801ca971700)
The buggy address belongs to the page:
page:ffffea00072a5c40 count:1 mapcount:0 mapping:ffff8801da800940 index:0x0
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffffea00072a6088 ffffea00072a7808 ffff8801da800940
raw: 0000000000000000 ffff8801ca971000 0000000100000006 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8801ca971600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff8801ca971680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ffff8801ca971700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                    ^
  ffff8801ca971780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801ca971800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
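
The "7 bytes to the right" figure and the position of the caret in the
shadow dump can be checked by hand. The sketch below is not part of the
syzbot report; the helper names are hypothetical, and it assumes generic
KASAN, where one shadow byte describes 8 bytes of memory.

```c
#include <stdint.h>

/* Assumption: generic KASAN, one shadow byte per 8 bytes of memory. */
#define SHADOW_SCALE_SHIFT 3

/* Distance of 'addr' past the end of the object [base, base + size).
 * Returns 0 when addr is not beyond the end of the object.
 */
static uint64_t bytes_past_end(uint64_t addr, uint64_t base, uint64_t size)
{
	uint64_t end = base + size;	/* first byte after the object */

	return addr > end ? addr - end : 0;
}

/* Index of the shadow byte describing 'addr' within a dump row that
 * starts at 'row_base'.
 */
static unsigned int shadow_col(uint64_t addr, uint64_t row_base)
{
	return (unsigned int)((addr - row_base) >> SHADOW_SCALE_SHIFT);
}
```

With the values from the report, bytes_past_end(0xffff8801ca971707,
0xffff8801ca971500, 512) gives 7, and shadow_col(0xffff8801ca971707,
0xffff8801ca971700) gives 0, i.e. the first shadow byte of the marked
row, where the caret points.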


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [BUG] MVPP2 driver exploding in presence of a tap interface
From: Marc Zyngier @ 2018-10-30 14:55 UTC (permalink / raw)
  To: Thomas Petazzoni, Marcin Wojtas
  Cc: Antoine Tenart, Maxime Chevallier, linux-arm-kernel, netdev,
	Grzegorz Jaszczyk, Tomasz Nowicki
In-Reply-To: <20181030140056.2fc69efc@windsurf>

On 30/10/18 13:00, Thomas Petazzoni wrote:
> Hello Marcin,
> 
> Thanks for the feedback.
> 
> On Tue, 30 Oct 2018 13:37:37 +0100, Marcin Wojtas wrote:
> 
>> You use _really_ archaic firmware, the bug you see is 99% caused by a
>> bug already fixed long time ago (cleanup all PP2 BM pools correctly
>> during exit boot services). Please grab the latest release:
>> https://github.com/MarvellEmbeddedProcessors/edk2-open-platform/wiki/files/flash-image-18.09.4.bin
>> and let know if you observe any further issues with vanilla kernel.
> 
> Even if this was a bug in the UEFI firmware, shouldn't the kernel be
> independent from that, by doing a proper reset/reinit of the HW ?
> 
> I.e, isn't the firmware fix papering over a bug that should be fixed in
> Linux mvpp2 driver anyway ?

Absolutely. Leaving this unpatched in the kernel, with a 100% chance of
memory corruption, is just mad.

I'm pretty sure there should be a way to sanely reset the interface
before it starts repainting the memory. And if there is none, we must
find a way to tell the user that the machine is a death trap. Really.

	M.

PS: updating the FW to the version provided by Marcin indeed makes
things much more reliable. Thanks for that.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply

* Re: [PATCH v1 0/5] can: add SAE J1939 protocol
From: Marc Kleine-Budde @ 2018-10-30 14:54 UTC (permalink / raw)
  To: Oleksij Rempel, dev.kurt, wg; +Cc: netdev, kernel, linux-can
In-Reply-To: <20181008094836.14080-1-o.rempel@pengutronix.de>


On 10/08/2018 11:48 AM, Oleksij Rempel wrote:
> This series adds SAE J1939 support to the current kernel v4.19-rc6.
> 
> This stack has long history, starting back in 27 Apr 2011, if not
> earlier:
> https://lists.openwall.net/netdev/2011/04/27/45
> 
> After major rework and testing it is a time to send it mainline.

I've removed some trailing newlines and added the stack to
linux-can-next/j1939.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


^ permalink raw reply

* Re: Latest net-next kernel 4.19.0+
From: Eric Dumazet @ 2018-10-30 14:16 UTC (permalink / raw)
  To: Paweł Staszewski, Dimitris Michailidis
  Cc: Cong Wang, Linux Kernel Network Developers
In-Reply-To: <b0a36d1f-5518-392c-d731-1ec66ec48f92@itcare.pl>



On 10/30/2018 01:09 AM, Paweł Staszewski wrote:
> 
> 
> On 30.10.2018 at 08:29, Eric Dumazet wrote:
>>
>> On 10/29/2018 11:09 PM, Dimitris Michailidis wrote:
>>
>>> Indeed this is a bug. I would expect it to produce frequent errors
>>> though as many odd-length
>>> packets would trigger it. Do you have RXFCS? Regardless, how
>>> frequently do you see the problem?
>>>
>> Old kernels (before 88078d98d1bb) were simply resetting ip_summed to CHECKSUM_NONE
>>
>> And before your fix (commit d55bef5059dd057bd), mlx5 bug was canceling the bug you fixed.
>>
>> So we now need to also fix mlx5.
>>
>> And of course use skb_header_pointer() in mlx5e_get_fcs() as I mentioned earlier,
>> plus __get_unaligned_cpu32() as you hinted.
>>
>>
>>
>>
> 
> No RXFCS
> 
> And this trace is rly frequently like once per 3/4 seconds
> like below:
> [28965.776864] vlan1490: hw csum failure

Might be vlan related.

Can you first check this:

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 94224c22ecc310a87b6715051e335446f29bec03..6f4bfebf0d9a3ae7567062abb3ea6532b3aaf3d6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -789,13 +789,8 @@ static inline void mlx5e_handle_csum(struct net_device *netdev,
                skb->ip_summed = CHECKSUM_COMPLETE;
                skb->csum = csum_unfold((__force __sum16)cqe->check_sum);
                if (network_depth > ETH_HLEN)
-                       /* CQE csum is calculated from the IP header and does
-                        * not cover VLAN headers (if present). This will add
-                        * the checksum manually.
-                        */
-                       skb->csum = csum_partial(skb->data + ETH_HLEN,
-                                                network_depth - ETH_HLEN,
-                                                skb->csum);
+                       /* Temporary debugging */
+                       skb->ip_summed = CHECKSUM_NONE;
                if (unlikely(netdev->features & NETIF_F_RXFCS))
                        skb->csum = csum_add(skb->csum,
                                             (__force __wsum)mlx5e_get_fcs(skb));
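
The comment removed by the debugging diff above says that the CQE
checksum starts at the IP header, so the bytes of any VLAN header have to
be added to it by software. The following user-space sketch illustrates
why adding them afterwards gives the same result as summing the whole
range at once. It is not the kernel's csum_partial(); sum_add() and
sum_fold() are hypothetical helpers written for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Add 'len' bytes to a running ones'-complement sum, treating the data
 * as big-endian 16-bit words. A trailing odd byte is padded with zero.
 */
static uint64_t sum_add(const uint8_t *p, size_t len, uint64_t sum)
{
	while (len > 1) {
		sum += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}
	if (len)
		sum += (uint32_t)p[0] << 8;

	return sum;
}

/* Fold the accumulator down to 16 bits with end-around carry. */
static uint16_t sum_fold(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}
```

Summing the payload first and then calling sum_add() on a 4-byte VLAN
header with the previous result as the starting value folds to the same
16-bit value as one pass over header plus payload, because the 16-bit
words can be added in any order as long as the prefix has even length.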

^ permalink raw reply related

* [Patch V5 net 05/11] net: hns3: remove unnecessary queue reset in the hns3_uninit_all_ring()
From: Huazhong Tan @ 2018-10-30 13:50 UTC (permalink / raw)
  To: davem, sergei.shtylyov, joe
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng
In-Reply-To: <1540907453-42276-1-git-send-email-tanhuazhong@huawei.com>

It is not necessary to reset the queue in hns3_uninit_all_ring(),
since the queue is stopped in the down operation and will be reset
in the up operation. Also, the check of the HCLGE_STATE_RST_HANDLING
flag in hclge_reset_tqp() is not correct: the tqp must be reset
during a PF reset, otherwise the queue may not be reset to a
working state.

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
V4: Fixes comments from Sergei Shtylyov
V3: Fixes comments from Sergei Shtylyov
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c         | 3 ---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 3 ---
 2 files changed, 6 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index b767ff9..bf71c23 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -3250,9 +3250,6 @@ int hns3_uninit_all_ring(struct hns3_nic_priv *priv)
 	int i;
 
 	for (i = 0; i < h->kinfo.num_tqps; i++) {
-		if (h->ae_algo->ops->reset_queue)
-			h->ae_algo->ops->reset_queue(h, i);
-
 		hns3_fini_ring(priv->ring_data[i].ring);
 		hns3_fini_ring(priv->ring_data[i + h->kinfo.num_tqps].ring);
 	}
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 2a63147..4dd0506 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -6116,9 +6116,6 @@ void hclge_reset_tqp(struct hnae3_handle *handle, u16 queue_id)
 	u16 queue_gid;
 	int ret;
 
-	if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state))
-		return;
-
 	queue_gid = hclge_covert_handle_qid_global(handle, queue_id);
 
 	ret = hclge_tqp_enable(hdev, queue_id, 0, false);
-- 
2.7.4

^ permalink raw reply related

* [Patch V5 net 08/11] net: hns3: fix incorrect return value/type of some functions
From: Huazhong Tan @ 2018-10-30 13:50 UTC (permalink / raw)
  To: davem, sergei.shtylyov, joe
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng
In-Reply-To: <1540907453-42276-1-git-send-email-tanhuazhong@huawei.com>

Some functions need to return the corresponding error value to their
caller when they fail to send the command.
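
The general pattern, reduced to a stand-alone sketch: a helper that used
to return void and only log failures now returns int and stops at the
first error. The names restore_fn, restore_all() and demo_restore() are
hypothetical and do not exist in the driver; they only illustrate the
shape of the change.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: 0 on success, negative errno on failure. */
typedef int (*restore_fn)(unsigned int id);

static unsigned int calls;	/* counts invocations of the demo callback */

/* Hypothetical callback: pretends that restoring id 3 fails with -EIO. */
static int demo_restore(unsigned int id)
{
	calls++;
	return id == 3 ? -EIO : 0;
}

/* Apply 'fn' to each id and stop at the first failure, handing the
 * error code back to the caller instead of only logging it.
 */
static int restore_all(const unsigned int *ids, size_t n, restore_fn fn)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < n; i++) {
		ret = fn(ids[i]);
		if (ret)
			return ret;
	}

	return ret;
}
```

The caller can then abort its own sequence on a non-zero return value
rather than continuing with partially restored state.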

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Fixes: 681ec3999b3d ("net: hns3: fix for vlan table lost problem when resetting")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
V2: Fixes the compilation error reported by kbuild test robot
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.h        |  6 +-
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c    | 80 +++++++++++++++-------
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.h    |  2 +-
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 34 ++++-----
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |  2 +-
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 14 ++--
 6 files changed, 85 insertions(+), 53 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index e82e4ca..055b406 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -316,8 +316,8 @@ struct hnae3_ae_ops {
 	int (*set_loopback)(struct hnae3_handle *handle,
 			    enum hnae3_loop loop_mode, bool en);
 
-	void (*set_promisc_mode)(struct hnae3_handle *handle, bool en_uc_pmc,
-				 bool en_mc_pmc);
+	int (*set_promisc_mode)(struct hnae3_handle *handle, bool en_uc_pmc,
+				bool en_mc_pmc);
 	int (*set_mtu)(struct hnae3_handle *handle, int new_mtu);
 
 	void (*get_pauseparam)(struct hnae3_handle *handle,
@@ -391,7 +391,7 @@ struct hnae3_ae_ops {
 				      int vector_num,
 				      struct hnae3_ring_chain_node *vr_chain);
 
-	void (*reset_queue)(struct hnae3_handle *handle, u16 queue_id);
+	int (*reset_queue)(struct hnae3_handle *handle, u16 queue_id);
 	u32 (*get_fw_version)(struct hnae3_handle *handle);
 	void (*get_mdix_mode)(struct hnae3_handle *handle,
 			      u8 *tp_mdix_ctrl, u8 *tp_mdix);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index bf71c23..3f96aa3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -509,16 +509,18 @@ static void hns3_nic_set_rx_mode(struct net_device *netdev)
 	h->netdev_flags = new_flags;
 }
 
-void hns3_update_promisc_mode(struct net_device *netdev, u8 promisc_flags)
+int hns3_update_promisc_mode(struct net_device *netdev, u8 promisc_flags)
 {
 	struct hns3_nic_priv *priv = netdev_priv(netdev);
 	struct hnae3_handle *h = priv->ae_handle;
 
 	if (h->ae_algo->ops->set_promisc_mode) {
-		h->ae_algo->ops->set_promisc_mode(h,
-						  promisc_flags & HNAE3_UPE,
-						  promisc_flags & HNAE3_MPE);
+		return h->ae_algo->ops->set_promisc_mode(h,
+						promisc_flags & HNAE3_UPE,
+						promisc_flags & HNAE3_MPE);
 	}
+
+	return 0;
 }
 
 void hns3_enable_vlan_filter(struct net_device *netdev, bool enable)
@@ -1494,18 +1496,22 @@ static int hns3_vlan_rx_kill_vid(struct net_device *netdev,
 	return ret;
 }
 
-static void hns3_restore_vlan(struct net_device *netdev)
+static int hns3_restore_vlan(struct net_device *netdev)
 {
 	struct hns3_nic_priv *priv = netdev_priv(netdev);
+	int ret = 0;
 	u16 vid;
-	int ret;
 
 	for_each_set_bit(vid, priv->active_vlans, VLAN_N_VID) {
 		ret = hns3_vlan_rx_add_vid(netdev, htons(ETH_P_8021Q), vid);
-		if (ret)
-			netdev_warn(netdev, "Restore vlan: %d filter, ret:%d\n",
-				    vid, ret);
+		if (ret) {
+			netdev_err(netdev, "Restore vlan: %d filter, ret:%d\n",
+				   vid, ret);
+			return ret;
+		}
 	}
+
+	return ret;
 }
 
 static int hns3_ndo_set_vf_vlan(struct net_device *netdev, int vf, u16 vlan,
@@ -3257,11 +3263,12 @@ int hns3_uninit_all_ring(struct hns3_nic_priv *priv)
 }
 
 /* Set mac addr if it is configured. or leave it to the AE driver */
-static void hns3_init_mac_addr(struct net_device *netdev, bool init)
+static int hns3_init_mac_addr(struct net_device *netdev, bool init)
 {
 	struct hns3_nic_priv *priv = netdev_priv(netdev);
 	struct hnae3_handle *h = priv->ae_handle;
 	u8 mac_addr_temp[ETH_ALEN];
+	int ret = 0;
 
 	if (h->ae_algo->ops->get_mac_addr && init) {
 		h->ae_algo->ops->get_mac_addr(h, mac_addr_temp);
@@ -3276,8 +3283,9 @@ static void hns3_init_mac_addr(struct net_device *netdev, bool init)
 	}
 
 	if (h->ae_algo->ops->set_mac_addr)
-		h->ae_algo->ops->set_mac_addr(h, netdev->dev_addr, true);
+		ret = h->ae_algo->ops->set_mac_addr(h, netdev->dev_addr, true);
 
+	return ret;
 }
 
 static int hns3_restore_fd_rules(struct net_device *netdev)
@@ -3490,20 +3498,29 @@ static int hns3_client_setup_tc(struct hnae3_handle *handle, u8 tc)
 	return ret;
 }
 
-static void hns3_recover_hw_addr(struct net_device *ndev)
+static int hns3_recover_hw_addr(struct net_device *ndev)
 {
 	struct netdev_hw_addr_list *list;
 	struct netdev_hw_addr *ha, *tmp;
+	int ret = 0;
 
 	/* go through and sync uc_addr entries to the device */
 	list = &ndev->uc;
-	list_for_each_entry_safe(ha, tmp, &list->list, list)
-		hns3_nic_uc_sync(ndev, ha->addr);
+	list_for_each_entry_safe(ha, tmp, &list->list, list) {
+		ret = hns3_nic_uc_sync(ndev, ha->addr);
+		if (ret)
+			return ret;
+	}
 
 	/* go through and sync mc_addr entries to the device */
 	list = &ndev->mc;
-	list_for_each_entry_safe(ha, tmp, &list->list, list)
-		hns3_nic_mc_sync(ndev, ha->addr);
+	list_for_each_entry_safe(ha, tmp, &list->list, list) {
+		ret = hns3_nic_mc_sync(ndev, ha->addr);
+		if (ret)
+			return ret;
+	}
+
+	return ret;
 }
 
 static void hns3_remove_hw_addr(struct net_device *netdev)
@@ -3630,7 +3647,10 @@ int hns3_nic_reset_all_ring(struct hnae3_handle *h)
 	int ret;
 
 	for (i = 0; i < h->kinfo.num_tqps; i++) {
-		h->ae_algo->ops->reset_queue(h, i);
+		ret = h->ae_algo->ops->reset_queue(h, i);
+		if (ret)
+			return ret;
+
 		hns3_init_ring_hw(priv->ring_data[i].ring);
 
 		/* We need to clear tx ring here because self test will
@@ -3722,18 +3742,30 @@ static int hns3_reset_notify_init_enet(struct hnae3_handle *handle)
 	bool vlan_filter_enable;
 	int ret;
 
-	hns3_init_mac_addr(netdev, false);
-	hns3_recover_hw_addr(netdev);
-	hns3_update_promisc_mode(netdev, handle->netdev_flags);
+	ret = hns3_init_mac_addr(netdev, false);
+	if (ret)
+		return ret;
+
+	ret = hns3_recover_hw_addr(netdev);
+	if (ret)
+		return ret;
+
+	ret = hns3_update_promisc_mode(netdev, handle->netdev_flags);
+	if (ret)
+		return ret;
+
 	vlan_filter_enable = netdev->flags & IFF_PROMISC ? false : true;
 	hns3_enable_vlan_filter(netdev, vlan_filter_enable);
 
-
 	/* Hardware table is only clear when pf resets */
-	if (!(handle->flags & HNAE3_SUPPORT_VF))
-		hns3_restore_vlan(netdev);
+	if (!(handle->flags & HNAE3_SUPPORT_VF)) {
+		ret = hns3_restore_vlan(netdev);
+		return ret;
+	}
 
-	hns3_restore_fd_rules(netdev);
+	ret = hns3_restore_fd_rules(netdev);
+	if (ret)
+		return ret;
 
 	/* Carrier off reporting is important to ethtool even BEFORE open */
 	netif_carrier_off(netdev);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
index 71cfca1..d3636d0 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
@@ -640,7 +640,7 @@ void hns3_set_vector_coalesce_rl(struct hns3_enet_tqp_vector *tqp_vector,
 				 u32 rl_value);
 
 void hns3_enable_vlan_filter(struct net_device *netdev, bool enable);
-void hns3_update_promisc_mode(struct net_device *netdev, u8 promisc_flags);
+int hns3_update_promisc_mode(struct net_device *netdev, u8 promisc_flags);
 
 #ifdef CONFIG_HNS3_DCB
 void hns3_dcbnl_setup(struct hnae3_handle *handle);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 4dd0506..f3212c9 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3314,8 +3314,8 @@ void hclge_promisc_param_init(struct hclge_promisc_param *param, bool en_uc,
 	param->vf_id = vport_id;
 }
 
-static void hclge_set_promisc_mode(struct hnae3_handle *handle, bool en_uc_pmc,
-				   bool en_mc_pmc)
+static int hclge_set_promisc_mode(struct hnae3_handle *handle, bool en_uc_pmc,
+				  bool en_mc_pmc)
 {
 	struct hclge_vport *vport = hclge_get_vport(handle);
 	struct hclge_dev *hdev = vport->back;
@@ -3323,7 +3323,7 @@ static void hclge_set_promisc_mode(struct hnae3_handle *handle, bool en_uc_pmc,
 
 	hclge_promisc_param_init(&param, en_uc_pmc, en_mc_pmc, true,
 				 vport->vport_id);
-	hclge_cmd_set_promisc_mode(hdev, &param);
+	return hclge_cmd_set_promisc_mode(hdev, &param);
 }
 
 static int hclge_get_fd_mode(struct hclge_dev *hdev, u8 *fd_mode)
@@ -6107,28 +6107,28 @@ static u16 hclge_covert_handle_qid_global(struct hnae3_handle *handle,
 	return tqp->index;
 }
 
-void hclge_reset_tqp(struct hnae3_handle *handle, u16 queue_id)
+int hclge_reset_tqp(struct hnae3_handle *handle, u16 queue_id)
 {
 	struct hclge_vport *vport = hclge_get_vport(handle);
 	struct hclge_dev *hdev = vport->back;
 	int reset_try_times = 0;
 	int reset_status;
 	u16 queue_gid;
-	int ret;
+	int ret = 0;
 
 	queue_gid = hclge_covert_handle_qid_global(handle, queue_id);
 
 	ret = hclge_tqp_enable(hdev, queue_id, 0, false);
 	if (ret) {
-		dev_warn(&hdev->pdev->dev, "Disable tqp fail, ret = %d\n", ret);
-		return;
+		dev_err(&hdev->pdev->dev, "Disable tqp fail, ret = %d\n", ret);
+		return ret;
 	}
 
 	ret = hclge_send_reset_tqp_cmd(hdev, queue_gid, true);
 	if (ret) {
-		dev_warn(&hdev->pdev->dev,
-			 "Send reset tqp cmd fail, ret = %d\n", ret);
-		return;
+		dev_err(&hdev->pdev->dev,
+			"Send reset tqp cmd fail, ret = %d\n", ret);
+		return ret;
 	}
 
 	reset_try_times = 0;
@@ -6141,16 +6141,16 @@ void hclge_reset_tqp(struct hnae3_handle *handle, u16 queue_id)
 	}
 
 	if (reset_try_times >= HCLGE_TQP_RESET_TRY_TIMES) {
-		dev_warn(&hdev->pdev->dev, "Reset TQP fail\n");
-		return;
+		dev_err(&hdev->pdev->dev, "Reset TQP fail\n");
+		return ret;
 	}
 
 	ret = hclge_send_reset_tqp_cmd(hdev, queue_gid, false);
-	if (ret) {
-		dev_warn(&hdev->pdev->dev,
-			 "Deassert the soft reset fail, ret = %d\n", ret);
-		return;
-	}
+	if (ret)
+		dev_err(&hdev->pdev->dev,
+			"Deassert the soft reset fail, ret = %d\n", ret);
+
+	return ret;
 }
 
 void hclge_reset_vf_queue(struct hclge_vport *vport, u16 queue_id)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
index e3dfd65..0d92154 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
@@ -778,7 +778,7 @@ int hclge_rss_init_hw(struct hclge_dev *hdev);
 void hclge_rss_indir_init_cfg(struct hclge_dev *hdev);
 
 void hclge_mbx_handler(struct hclge_dev *hdev);
-void hclge_reset_tqp(struct hnae3_handle *handle, u16 queue_id);
+int hclge_reset_tqp(struct hnae3_handle *handle, u16 queue_id);
 void hclge_reset_vf_queue(struct hclge_vport *vport, u16 queue_id);
 int hclge_cfg_flowctrl(struct hclge_dev *hdev);
 int hclge_func_reset_cmd(struct hclge_dev *hdev, int func_id);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index e0a86a5..b224f6a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -925,12 +925,12 @@ static int hclgevf_cmd_set_promisc_mode(struct hclgevf_dev *hdev,
 	return status;
 }
 
-static void hclgevf_set_promisc_mode(struct hnae3_handle *handle,
-				     bool en_uc_pmc, bool en_mc_pmc)
+static int hclgevf_set_promisc_mode(struct hnae3_handle *handle,
+				    bool en_uc_pmc, bool en_mc_pmc)
 {
 	struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
 
-	hclgevf_cmd_set_promisc_mode(hdev, en_uc_pmc, en_mc_pmc);
+	return hclgevf_cmd_set_promisc_mode(hdev, en_uc_pmc, en_mc_pmc);
 }
 
 static int hclgevf_tqp_enable(struct hclgevf_dev *hdev, int tqp_id,
@@ -1080,7 +1080,7 @@ static int hclgevf_en_hw_strip_rxvtag(struct hnae3_handle *handle, bool enable)
 				    1, false, NULL, 0);
 }
 
-static void hclgevf_reset_tqp(struct hnae3_handle *handle, u16 queue_id)
+static int hclgevf_reset_tqp(struct hnae3_handle *handle, u16 queue_id)
 {
 	struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
 	u8 msg_data[2];
@@ -1091,10 +1091,10 @@ static void hclgevf_reset_tqp(struct hnae3_handle *handle, u16 queue_id)
 	/* disable vf queue before send queue reset msg to PF */
 	ret = hclgevf_tqp_enable(hdev, queue_id, 0, false);
 	if (ret)
-		return;
+		return ret;
 
-	hclgevf_send_mbx_msg(hdev, HCLGE_MBX_QUEUE_RESET, 0, msg_data,
-			     2, true, NULL, 0);
+	return hclgevf_send_mbx_msg(hdev, HCLGE_MBX_QUEUE_RESET, 0, msg_data,
+				    2, true, NULL, 0);
 }
 
 static int hclgevf_notify_client(struct hclgevf_dev *hdev,
-- 
2.7.4

* [Patch V5 net 07/11] net: hns3: bugfix for hclge_mdio_write and hclge_mdio_read
From: Huazhong Tan @ 2018-10-30 13:50 UTC (permalink / raw)
  To: davem, sergei.shtylyov, joe
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng
In-Reply-To: <1540907453-42276-1-git-send-email-tanhuazhong@huawei.com>

When there is a PHY, the driver needs to complete some operations through
MDIO during reset reinitialization, so HCLGE_STATE_CMD_DISABLE is more
suitable than HCLGE_STATE_RST_HANDLING to prevent the MDIO operation from
being sent during the hardware reset.

Fixes: b50ae26c57cb ("net: hns3: never send command queue message to IMP when reset")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
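Editorial sketch (not part of the patch): a minimal userspace model of
the two guards discussed above. The enum and function names are
hypothetical, and the model assumes that HCLGE_STATE_RST_HANDLING stays
set for the whole reset procedure while HCLGE_STATE_CMD_DISABLE is
cleared once the command queue is usable again during reinitialization.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two driver state bits named above. */
enum model_state {
	MODEL_RST_HANDLING = 1u << 0,	/* assumed set for the whole reset */
	MODEL_CMD_DISABLE  = 1u << 1,	/* assumed set only while commands cannot be sent */
};

/* Old guard: MDIO is blocked for the entire reset, reinit included. */
static bool mdio_allowed_old(unsigned int state)
{
	return !(state & MODEL_RST_HANDLING);
}

/* New guard: MDIO is blocked only while the command queue is disabled. */
static bool mdio_allowed_new(unsigned int state)
{
	return !(state & MODEL_CMD_DISABLE);
}
```

Under these assumptions, a state with only MODEL_RST_HANDLING set (the
reinitialization phase) is rejected by the old guard but accepted by the
new one, which is the behaviour the commit message asks for.
---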
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
index 24b1f2a..0301863 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
@@ -52,7 +52,7 @@ static int hclge_mdio_write(struct mii_bus *bus, int phyid, int regnum,
 	struct hclge_desc desc;
 	int ret;
 
-	if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state))
+	if (test_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state))
 		return 0;
 
 	hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_MDIO_CONFIG, false);
@@ -90,7 +90,7 @@ static int hclge_mdio_read(struct mii_bus *bus, int phyid, int regnum)
 	struct hclge_desc desc;
 	int ret;
 
-	if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state))
+	if (test_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state))
 		return 0;
 
 	hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_MDIO_CONFIG, true);
-- 
2.7.4

* [Patch V5 net 04/11] net: hns3: bugfix for the initialization of command queue's spin lock
From: Huazhong Tan @ 2018-10-30 13:50 UTC (permalink / raw)
  To: davem, sergei.shtylyov, joe
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng
In-Reply-To: <1540907453-42276-1-git-send-email-tanhuazhong@huawei.com>

The spin locks of the command queue only need to be initialized once,
when the driver initializes the command queue. It is not necessary to
re-initialize them on every reset. In addition, the queue members
should be modified only while holding the locks.

Fixes: 3efb960f056d ("net: hns3: Refactor the initialization of command queue")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
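Editorial sketch (not part of the patch): a minimal userspace model of
the rule above, using a C11 atomic_flag as a stand-in for the kernel
spinlock. All names are hypothetical; the point is only that the lock is
set up once and that the reset path takes it instead of re-initializing
it.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace model of one command ring; not the driver's struct. */
struct model_ring {
	atomic_flag lock;	/* stand-in for the kernel spinlock */
	int next_to_use;
	int next_to_clean;
};

static void model_lock(struct model_ring *r)
{
	while (atomic_flag_test_and_set(&r->lock))
		;	/* spin until the current holder releases the lock */
}

static void model_unlock(struct model_ring *r)
{
	atomic_flag_clear(&r->lock);
}

/* Called once at setup time: the only place the lock is initialized. */
static void model_queue_init(struct model_ring *r)
{
	atomic_flag_clear(&r->lock);
	r->next_to_use = 0;
	r->next_to_clean = 0;
}

/* Called on every reset: takes the existing lock, never re-initializes it. */
static void model_cmd_init(struct model_ring *r)
{
	model_lock(r);
	r->next_to_use = 0;
	r->next_to_clean = 0;
	model_unlock(r);
}
```

Re-initializing a lock that another context may currently hold would
silently drop that ownership, which is why the model keeps the
initialization out of model_cmd_init().
---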
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
index ac13cb2..68026a5 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
@@ -304,6 +304,10 @@ int hclge_cmd_queue_init(struct hclge_dev *hdev)
 {
 	int ret;
 
+	/* Setup the lock for command queue */
+	spin_lock_init(&hdev->hw.cmq.csq.lock);
+	spin_lock_init(&hdev->hw.cmq.crq.lock);
+
 	/* Setup the queue entries for use cmd queue */
 	hdev->hw.cmq.csq.desc_num = HCLGE_NIC_CMQ_DESC_NUM;
 	hdev->hw.cmq.crq.desc_num = HCLGE_NIC_CMQ_DESC_NUM;
@@ -337,18 +341,20 @@ int hclge_cmd_init(struct hclge_dev *hdev)
 	u32 version;
 	int ret;
 
+	spin_lock_bh(&hdev->hw.cmq.csq.lock);
+	spin_lock_bh(&hdev->hw.cmq.crq.lock);
+
 	hdev->hw.cmq.csq.next_to_clean = 0;
 	hdev->hw.cmq.csq.next_to_use = 0;
 	hdev->hw.cmq.crq.next_to_clean = 0;
 	hdev->hw.cmq.crq.next_to_use = 0;
 
-	/* Setup the lock for command queue */
-	spin_lock_init(&hdev->hw.cmq.csq.lock);
-	spin_lock_init(&hdev->hw.cmq.crq.lock);
-
 	hclge_cmd_init_regs(&hdev->hw);
 	clear_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state);
 
+	spin_unlock_bh(&hdev->hw.cmq.crq.lock);
+	spin_unlock_bh(&hdev->hw.cmq.csq.lock);
+
 	ret = hclge_cmd_query_firmware_version(&hdev->hw, &version);
 	if (ret) {
 		dev_err(&hdev->pdev->dev,
-- 
2.7.4

* [Patch V5 net 03/11] net: hns3: bugfix for reporting unknown vector0 interrupt repeatedly problem
From: Huazhong Tan @ 2018-10-30 13:50 UTC (permalink / raw)
  To: davem, sergei.shtylyov, joe
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng
In-Reply-To: <1540907453-42276-1-git-send-email-tanhuazhong@huawei.com>

The driver currently handles two kinds of vector0 interrupt: reset and
mailbox. If the hardware reports an interrupt from any other source and
the driver re-enables the vector without processing that interrupt, the
hardware reports the same unknown interrupt again and again.

Therefore, re-enable the vector0 interrupt only after clearing a known
interrupt source. In every other case the vector is not re-enabled.

Fixes: cd8c5c269b1d ("net: hns3: Fix for hclge_reset running repeatly problem")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
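Editorial sketch (not part of the patch): a minimal userspace model of
the old and new re-enable policies. The types and function names are
hypothetical, and the model assumes that the driver can only clear a
source it recognizes, so an unknown source stays pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event causes, mirroring the three cases described above. */
enum model_event { MODEL_EVENT_RST, MODEL_EVENT_MBX, MODEL_EVENT_OTHER };

struct model_vector {
	bool enabled;		/* is vector0 currently unmasked? */
	bool source_pending;	/* is the interrupt source still asserted? */
};

/* Old policy: re-enable for everything except reset. */
static void model_irq_old(struct model_vector *v, enum model_event ev)
{
	v->enabled = false;
	if (ev != MODEL_EVENT_RST) {
		/* assumption: only a known source can actually be cleared */
		if (ev == MODEL_EVENT_MBX)
			v->source_pending = false;
		v->enabled = true;
	}
}

/* New policy: re-enable only after clearing a known (mailbox) source. */
static void model_irq_new(struct model_vector *v, enum model_event ev)
{
	v->enabled = false;
	if (ev == MODEL_EVENT_MBX) {
		v->source_pending = false;
		v->enabled = true;
	}
}

/* The interrupt fires again only if the vector is enabled with a source pending. */
static bool model_refires(const struct model_vector *v)
{
	return v->enabled && v->source_pending;
}
```

With MODEL_EVENT_OTHER, the old policy leaves the vector enabled while
the source is still pending, so model_refires() is true; the new policy
leaves the vector disabled and the storm does not occur.
---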
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 5234b53..2a63147 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -2236,7 +2236,7 @@ static irqreturn_t hclge_misc_irq_handle(int irq, void *data)
 	}
 
 	/* clear the source of interrupt if it is not cause by reset */
-	if (event_cause != HCLGE_VECTOR0_EVENT_RST) {
+	if (event_cause == HCLGE_VECTOR0_EVENT_MBX) {
 		hclge_clear_event_cause(hdev, event_cause, clearval);
 		hclge_enable_vector(&hdev->misc_vector, true);
 	}
-- 
2.7.4

* [Patch V5 net 02/11] net: hns3: bugfix for buffer not free problem during resetting
From: Huazhong Tan @ 2018-10-30 13:50 UTC (permalink / raw)
  To: davem, sergei.shtylyov, joe
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng
In-Reply-To: <1540907453-42276-1-git-send-email-tanhuazhong@huawei.com>

When hns3_get_ring_config(), hns3_queue_to_ring() or
hns3_get_vector_ring_chain() fails during a reset, the memory already
allocated is not freed before the function returns. This patch adds
error handling to these three functions so that the memory is freed.

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
V5: Address comments from Sergei Shtylyov:
    add an error handler for hns3_get_vector_ring_chain()
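
Editorial sketch (not part of the patch): a minimal userspace model of
unwinding a partially built chain on allocation failure. It uses
calloc()/free() instead of the devm_* helpers, and every name in it is
hypothetical. The head node is assumed to be owned by the caller, so
only the nodes linked after it are freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model of a ring chain node. */
struct model_node {
	struct model_node *next;
	int tqp_index;
};

/*
 * Append `count` nodes after `head`. `fail_at` simulates an allocation
 * failure at that node index (use -1 for "never fail").
 * Returns 0 on success, -1 after freeing every node allocated so far.
 */
static int model_build_chain(struct model_node *head, int count, int fail_at)
{
	struct model_node *cur = head, *node, *next;
	int i;

	head->next = NULL;
	for (i = 0; i < count; i++) {
		node = (i == fail_at) ? NULL : calloc(1, sizeof(*node));
		if (!node)
			goto err_free_chain;
		node->tqp_index = i;
		cur->next = node;
		cur = node;
	}
	return 0;

err_free_chain:
	for (node = head->next; node; node = next) {
		next = node->next;	/* read the link before freeing the node */
		free(node);
	}
	head->next = NULL;
	return -1;
}
```

The unwind loop saves the successor before releasing each node, and it
resets head->next so the caller never sees a dangling pointer.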
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 0b4323b..b767ff9 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2727,7 +2727,7 @@ static int hns3_get_vector_ring_chain(struct hns3_enet_tqp_vector *tqp_vector,
 			chain = devm_kzalloc(&pdev->dev, sizeof(*chain),
 					     GFP_KERNEL);
 			if (!chain)
-				return -ENOMEM;
+				goto err_free_chain;
 
 			cur_chain->next = chain;
 			chain->tqp_index = tx_ring->tqp->tqp_index;
@@ -2757,7 +2757,7 @@ static int hns3_get_vector_ring_chain(struct hns3_enet_tqp_vector *tqp_vector,
 	while (rx_ring) {
 		chain = devm_kzalloc(&pdev->dev, sizeof(*chain), GFP_KERNEL);
 		if (!chain)
-			return -ENOMEM;
+			goto err_free_chain;
 
 		cur_chain->next = chain;
 		chain->tqp_index = rx_ring->tqp->tqp_index;
@@ -2772,6 +2772,16 @@ static int hns3_get_vector_ring_chain(struct hns3_enet_tqp_vector *tqp_vector,
 	}
 
 	return 0;
+
+err_free_chain:
+	cur_chain = head->next;
+	while (cur_chain) {
+		chain = cur_chain->next;
+		devm_kfree(&pdev->dev, cur_chain);
+		cur_chain = chain;
+	}
+
+	return -ENOMEM;
 }
 
 static void hns3_free_vector_ring_chain(struct hns3_enet_tqp_vector *tqp_vector,
@@ -3037,8 +3047,10 @@ static int hns3_queue_to_ring(struct hnae3_queue *tqp,
 		return ret;
 
 	ret = hns3_ring_get_cfg(tqp, priv, HNAE3_RING_TYPE_RX);
-	if (ret)
+	if (ret) {
+		devm_kfree(priv->dev, priv->ring_data[tqp->tqp_index].ring);
 		return ret;
+	}
 
 	return 0;
 }
@@ -3065,6 +3077,12 @@ static int hns3_get_ring_config(struct hns3_nic_priv *priv)
 
 	return 0;
 err:
+	while (i--) {
+		devm_kfree(priv->dev, priv->ring_data[i].ring);
+		devm_kfree(priv->dev,
+			   priv->ring_data[i + h->kinfo.num_tqps].ring);
+	}
+
 	devm_kfree(&pdev->dev, priv->ring_data);
 	return ret;
 }
-- 
2.7.4

* [Patch V5 net 11/11] net: hns3: bugfix for rtnl_lock's range in the hclgevf_reset()
From: Huazhong Tan @ 2018-10-30 13:50 UTC (permalink / raw)
  To: davem, sergei.shtylyov, joe
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng
In-Reply-To: <1540907453-42276-1-git-send-email-tanhuazhong@huawei.com>

Since hclgevf_reset_wait() is used to wait for the hardware to complete
the reset, it is not necessary to hold the rtnl_lock during
hclgevf_reset_wait(). So this patch releases the lock for the duration
of hclgevf_reset_wait().

Fixes: 6988eb2a9b77 ("net: hns3: Add support to reset the enet/ring mgmt layer")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
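Editorial sketch (not part of the patch): a minimal userspace model of
the lock range after this change. The lock is modelled as a plain flag
and all names are hypothetical. The model assumes that the reset path
takes the lock before bringing the client down, which lies outside the
hunks shown below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the rtnl lock: only tracks whether it is held. */
static bool model_rtnl_held;

static void model_rtnl_lock(void)
{
	assert(!model_rtnl_held);
	model_rtnl_held = true;
}

static void model_rtnl_unlock(void)
{
	assert(model_rtnl_held);
	model_rtnl_held = false;
}

/* Long hardware wait; it must run without the lock held. */
static int model_reset_wait(bool hw_ok)
{
	assert(!model_rtnl_held);
	return hw_ok ? 0 : -1;
}

/* Mirrors the lock/unlock ordering of the patched reset path. */
static int model_reset(bool hw_ok)
{
	int ret;

	model_rtnl_lock();
	/* ... bring the client down (needs the lock) ... */
	model_rtnl_unlock();

	ret = model_reset_wait(hw_ok);

	model_rtnl_lock();
	/* ... uninit the client on failure, or re-init the stack on success ... */
	model_rtnl_unlock();
	return ret;
}
```

Both the success and the failure path re-acquire the lock after the
wait, so the final unlock is balanced in either case.
---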
 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index b224f6a..085edb9 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -1170,6 +1170,8 @@ static int hclgevf_reset(struct hclgevf_dev *hdev)
 	/* bring down the nic to stop any ongoing TX/RX */
 	hclgevf_notify_client(hdev, HNAE3_DOWN_CLIENT);
 
+	rtnl_unlock();
+
 	/* check if VF could successfully fetch the hardware reset completion
 	 * status from the hardware
 	 */
@@ -1181,12 +1183,15 @@ static int hclgevf_reset(struct hclgevf_dev *hdev)
 			ret);
 
 		dev_warn(&hdev->pdev->dev, "VF reset failed, disabling VF!\n");
+		rtnl_lock();
 		hclgevf_notify_client(hdev, HNAE3_UNINIT_CLIENT);
 
 		rtnl_unlock();
 		return ret;
 	}
 
+	rtnl_lock();
+
 	/* now, re-initialize the nic client and ae device*/
 	ret = hclgevf_reset_stack(hdev);
 	if (ret)
-- 
2.7.4
