Netdev List


* [PATCH 2/5 net-next] inet: kill smallest_size and smallest_port
From: Josef Bacik @ 2016-12-20 20:07 UTC
  To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev,
	kernel-team
In-Reply-To: <1482264424-15439-1-git-send-email-jbacik@fb.com>

In inet_csk_get_port we seem to be using smallest_port to figure out the best
place to look for a SO_REUSEPORT sk that matches an existing set of
SO_REUSEPORT sockets.  However, if we get to the logic

if (smallest_size != -1) {
	port = smallest_port;
	goto have_port;
}

we will do a useless search, because we would have already done the
inet_csk_bind_conflict for that port and it would have returned 1; otherwise we
would have gone to tb_found and succeeded.  Since this logic makes us do yet
another trip through inet_csk_bind_conflict for a port we know won't work, just
delete this code and save us the time.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 net/ipv4/inet_connection_sock.c | 26 ++++----------------------
 1 file changed, 4 insertions(+), 22 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 74f6a57..1a1a94bd 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -93,7 +93,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 	bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
 	struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
 	int ret = 1, attempts = 5, port = snum;
-	int smallest_size = -1, smallest_port;
 	struct inet_bind_hashbucket *head;
 	struct net *net = sock_net(sk);
 	int i, low, high, attempt_half;
@@ -103,7 +102,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 	bool reuseport_ok = !!snum;
 
 	if (port) {
-have_port:
 		head = &hinfo->bhash[inet_bhashfn(net, port,
 						  hinfo->bhash_size)];
 		spin_lock_bh(&head->lock);
@@ -137,8 +135,6 @@ other_half_scan:
 	 * We do the opposite to not pollute connect() users.
 	 */
 	offset |= 1U;
-	smallest_size = -1;
-	smallest_port = low; /* avoid compiler warning */
 
 other_parity_scan:
 	port = low + offset;
@@ -152,15 +148,6 @@ other_parity_scan:
 		spin_lock_bh(&head->lock);
 		inet_bind_bucket_for_each(tb, &head->chain)
 			if (net_eq(ib_net(tb), net) && tb->port == port) {
-				if (((tb->fastreuse > 0 && reuse) ||
-				     (tb->fastreuseport > 0 &&
-				      sk->sk_reuseport &&
-				      !rcu_access_pointer(sk->sk_reuseport_cb) &&
-				      uid_eq(tb->fastuid, uid))) &&
-				    (tb->num_owners < smallest_size || smallest_size == -1)) {
-					smallest_size = tb->num_owners;
-					smallest_port = port;
-				}
 				if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok))
 					goto tb_found;
 				goto next_port;
@@ -171,10 +158,6 @@ next_port:
 		cond_resched();
 	}
 
-	if (smallest_size != -1) {
-		port = smallest_port;
-		goto have_port;
-	}
 	offset--;
 	if (!(offset & 1))
 		goto other_parity_scan;
@@ -196,19 +179,18 @@ tb_found:
 		if (sk->sk_reuse == SK_FORCE_REUSE)
 			goto success;
 
-		if (((tb->fastreuse > 0 && reuse) ||
+		if ((tb->fastreuse > 0 && reuse) ||
 		     (tb->fastreuseport > 0 &&
 		      !rcu_access_pointer(sk->sk_reuseport_cb) &&
-		      sk->sk_reuseport && uid_eq(tb->fastuid, uid))) &&
-		    smallest_size == -1)
+		      sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
 			goto success;
 		if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) {
 			if ((reuse ||
 			     (tb->fastreuseport > 0 &&
 			      sk->sk_reuseport &&
 			      !rcu_access_pointer(sk->sk_reuseport_cb) &&
-			      uid_eq(tb->fastuid, uid))) &&
-			    !snum && smallest_size != -1 && --attempts >= 0) {
+			      uid_eq(tb->fastuid, uid))) && !snum &&
+			    --attempts >= 0) {
 				spin_unlock_bh(&head->lock);
 				goto again;
 			}
-- 
2.9.3


* [PATCH 1/5 net-next] inet: replace ->bind_conflict with ->rcv_saddr_equal
From: Josef Bacik @ 2016-12-20 20:07 UTC
  To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev,
	kernel-team
In-Reply-To: <1482264424-15439-1-git-send-email-jbacik@fb.com>

The only difference between inet6_csk_bind_conflict and inet_csk_bind_conflict
is how they check the rcv_saddr.  Since we want to be able to check the saddr in
other places just drop the protocol specific ->bind_conflict and replace it with
->rcv_saddr_equal, then make inet_csk_bind_conflict the one true bind conflict
function.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 include/net/inet6_connection_sock.h |  5 -----
 include/net/inet_connection_sock.h  |  9 +++------
 net/dccp/ipv4.c                     |  3 ++-
 net/dccp/ipv6.c                     |  2 +-
 net/ipv4/inet_connection_sock.c     | 22 +++++++-------------
 net/ipv4/tcp_ipv4.c                 |  3 ++-
 net/ipv4/udp.c                      |  1 +
 net/ipv6/inet6_connection_sock.c    | 40 -------------------------------------
 net/ipv6/tcp_ipv6.c                 |  4 ++--
 9 files changed, 18 insertions(+), 71 deletions(-)

diff --git a/include/net/inet6_connection_sock.h b/include/net/inet6_connection_sock.h
index 3212b39..8ec87b6 100644
--- a/include/net/inet6_connection_sock.h
+++ b/include/net/inet6_connection_sock.h
@@ -15,16 +15,11 @@
 
 #include <linux/types.h>
 
-struct inet_bind_bucket;
 struct request_sock;
 struct sk_buff;
 struct sock;
 struct sockaddr;
 
-int inet6_csk_bind_conflict(const struct sock *sk,
-			    const struct inet_bind_bucket *tb, bool relax,
-			    bool soreuseport_ok);
-
 struct dst_entry *inet6_csk_route_req(const struct sock *sk, struct flowi6 *fl6,
 				      const struct request_sock *req, u8 proto);
 
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index ec0479a..9cd43c5 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -62,9 +62,9 @@ struct inet_connection_sock_af_ops {
 				char __user *optval, int __user *optlen);
 #endif
 	void	    (*addr2sockaddr)(struct sock *sk, struct sockaddr *);
-	int	    (*bind_conflict)(const struct sock *sk,
-				     const struct inet_bind_bucket *tb,
-				     bool relax, bool soreuseport_ok);
+	int         (*rcv_saddr_equal)(const struct sock *sk1,
+				       const struct sock *sk2,
+				       bool match_wildcard);
 	void	    (*mtu_reduced)(struct sock *sk);
 };
 
@@ -261,9 +261,6 @@ inet_csk_rto_backoff(const struct inet_connection_sock *icsk,
 
 struct sock *inet_csk_accept(struct sock *sk, int flags, int *err);
 
-int inet_csk_bind_conflict(const struct sock *sk,
-			   const struct inet_bind_bucket *tb, bool relax,
-			   bool soreuseport_ok);
 int inet_csk_get_port(struct sock *sk, unsigned short snum);
 
 struct dst_entry *inet_csk_route_req(const struct sock *sk, struct flowi4 *fl4,
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 9c67a96..1931324 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -17,6 +17,7 @@
 #include <linux/skbuff.h>
 #include <linux/random.h>
 
+#include <net/addrconf.h>
 #include <net/icmp.h>
 #include <net/inet_common.h>
 #include <net/inet_hashtables.h>
@@ -901,7 +902,7 @@ static const struct inet_connection_sock_af_ops dccp_ipv4_af_ops = {
 	.getsockopt	   = ip_getsockopt,
 	.addr2sockaddr	   = inet_csk_addr2sockaddr,
 	.sockaddr_len	   = sizeof(struct sockaddr_in),
-	.bind_conflict	   = inet_csk_bind_conflict,
+	.rcv_saddr_equal   = ipv4_rcv_saddr_equal,
 #ifdef CONFIG_COMPAT
 	.compat_setsockopt = compat_ip_setsockopt,
 	.compat_getsockopt = compat_ip_getsockopt,
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 4663a01..45242b8 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -926,7 +926,7 @@ static const struct inet_connection_sock_af_ops dccp_ipv6_af_ops = {
 	.getsockopt	   = ipv6_getsockopt,
 	.addr2sockaddr	   = inet6_csk_addr2sockaddr,
 	.sockaddr_len	   = sizeof(struct sockaddr_in6),
-	.bind_conflict	   = inet6_csk_bind_conflict,
+	.rcv_saddr_equal   = ipv6_rcv_saddr_equal,
 #ifdef CONFIG_COMPAT
 	.compat_setsockopt = compat_ipv6_setsockopt,
 	.compat_getsockopt = compat_ipv6_getsockopt,
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 5f44fa1..74f6a57 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -44,9 +44,9 @@ void inet_get_local_port_range(struct net *net, int *low, int *high)
 }
 EXPORT_SYMBOL(inet_get_local_port_range);
 
-int inet_csk_bind_conflict(const struct sock *sk,
-			   const struct inet_bind_bucket *tb, bool relax,
-			   bool reuseport_ok)
+static int inet_csk_bind_conflict(const struct sock *sk,
+				  const struct inet_bind_bucket *tb,
+				  bool relax, bool reuseport_ok)
 {
 	struct sock *sk2;
 	bool reuse = sk->sk_reuse;
@@ -62,7 +62,6 @@ int inet_csk_bind_conflict(const struct sock *sk,
 
 	sk_for_each_bound(sk2, &tb->owners) {
 		if (sk != sk2 &&
-		    !inet_v6_ipv6only(sk2) &&
 		    (!sk->sk_bound_dev_if ||
 		     !sk2->sk_bound_dev_if ||
 		     sk->sk_bound_dev_if == sk2->sk_bound_dev_if)) {
@@ -72,23 +71,18 @@ int inet_csk_bind_conflict(const struct sock *sk,
 			     rcu_access_pointer(sk->sk_reuseport_cb) ||
 			     (sk2->sk_state != TCP_TIME_WAIT &&
 			     !uid_eq(uid, sock_i_uid(sk2))))) {
-
-				if (!sk2->sk_rcv_saddr || !sk->sk_rcv_saddr ||
-				    sk2->sk_rcv_saddr == sk->sk_rcv_saddr)
+				if (inet_csk(sk)->icsk_af_ops->rcv_saddr_equal(sk, sk2, true))
 					break;
 			}
 			if (!relax && reuse && sk2->sk_reuse &&
 			    sk2->sk_state != TCP_LISTEN) {
-
-				if (!sk2->sk_rcv_saddr || !sk->sk_rcv_saddr ||
-				    sk2->sk_rcv_saddr == sk->sk_rcv_saddr)
+				if (inet_csk(sk)->icsk_af_ops->rcv_saddr_equal(sk, sk2, true))
 					break;
 			}
 		}
 	}
 	return sk2 != NULL;
 }
-EXPORT_SYMBOL_GPL(inet_csk_bind_conflict);
 
 /* Obtain a reference to a local port for the given sock,
  * if snum is zero it means select any available local port.
@@ -167,8 +161,7 @@ other_parity_scan:
 					smallest_size = tb->num_owners;
 					smallest_port = port;
 				}
-				if (!inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, false,
-									      reuseport_ok))
+				if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok))
 					goto tb_found;
 				goto next_port;
 			}
@@ -209,8 +202,7 @@ tb_found:
 		      sk->sk_reuseport && uid_eq(tb->fastuid, uid))) &&
 		    smallest_size == -1)
 			goto success;
-		if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, true,
-							     reuseport_ok)) {
+		if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) {
 			if ((reuse ||
 			     (tb->fastreuseport > 0 &&
 			      sk->sk_reuseport &&
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 029708f..7608012 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -63,6 +63,7 @@
 #include <linux/times.h>
 #include <linux/slab.h>
 
+#include <net/addrconf.h>
 #include <net/net_namespace.h>
 #include <net/icmp.h>
 #include <net/inet_hashtables.h>
@@ -1781,7 +1782,7 @@ const struct inet_connection_sock_af_ops ipv4_specific = {
 	.getsockopt	   = ip_getsockopt,
 	.addr2sockaddr	   = inet_csk_addr2sockaddr,
 	.sockaddr_len	   = sizeof(struct sockaddr_in),
-	.bind_conflict	   = inet_csk_bind_conflict,
+	.rcv_saddr_equal   = ipv4_rcv_saddr_equal,
 #ifdef CONFIG_COMPAT
 	.compat_setsockopt = compat_ip_setsockopt,
 	.compat_getsockopt = compat_ip_getsockopt,
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 2a70c05..6089ea8 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -374,6 +374,7 @@ int ipv4_rcv_saddr_equal(const struct sock *sk1, const struct sock *sk2,
 	}
 	return 0;
 }
+EXPORT_SYMBOL(ipv4_rcv_saddr_equal);
 
 static u32 udp4_portaddr_hash(const struct net *net, __be32 saddr,
 			      unsigned int port)
diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index 71939a2..7538715 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -28,46 +28,6 @@
 #include <net/inet6_connection_sock.h>
 #include <net/sock_reuseport.h>
 
-int inet6_csk_bind_conflict(const struct sock *sk,
-			    const struct inet_bind_bucket *tb, bool relax,
-			    bool reuseport_ok)
-{
-	const struct sock *sk2;
-	bool reuse = !!sk->sk_reuse;
-	bool reuseport = !!sk->sk_reuseport && reuseport_ok;
-	kuid_t uid = sock_i_uid((struct sock *)sk);
-
-	/* We must walk the whole port owner list in this case. -DaveM */
-	/*
-	 * See comment in inet_csk_bind_conflict about sock lookup
-	 * vs net namespaces issues.
-	 */
-	sk_for_each_bound(sk2, &tb->owners) {
-		if (sk != sk2 &&
-		    (!sk->sk_bound_dev_if ||
-		     !sk2->sk_bound_dev_if ||
-		     sk->sk_bound_dev_if == sk2->sk_bound_dev_if)) {
-			if ((!reuse || !sk2->sk_reuse ||
-			     sk2->sk_state == TCP_LISTEN) &&
-			    (!reuseport || !sk2->sk_reuseport ||
-			     rcu_access_pointer(sk->sk_reuseport_cb) ||
-			     (sk2->sk_state != TCP_TIME_WAIT &&
-			      !uid_eq(uid,
-				      sock_i_uid((struct sock *)sk2))))) {
-				if (ipv6_rcv_saddr_equal(sk, sk2, true))
-					break;
-			}
-			if (!relax && reuse && sk2->sk_reuse &&
-			    sk2->sk_state != TCP_LISTEN &&
-			    ipv6_rcv_saddr_equal(sk, sk2, true))
-				break;
-		}
-	}
-
-	return sk2 != NULL;
-}
-EXPORT_SYMBOL_GPL(inet6_csk_bind_conflict);
-
 struct dst_entry *inet6_csk_route_req(const struct sock *sk,
 				      struct flowi6 *fl6,
 				      const struct request_sock *req,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index bee59a6..2f40b98 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1603,7 +1603,7 @@ static const struct inet_connection_sock_af_ops ipv6_specific = {
 	.getsockopt	   = ipv6_getsockopt,
 	.addr2sockaddr	   = inet6_csk_addr2sockaddr,
 	.sockaddr_len	   = sizeof(struct sockaddr_in6),
-	.bind_conflict	   = inet6_csk_bind_conflict,
+	.rcv_saddr_equal   = ipv6_rcv_saddr_equal,
 #ifdef CONFIG_COMPAT
 	.compat_setsockopt = compat_ipv6_setsockopt,
 	.compat_getsockopt = compat_ipv6_getsockopt,
@@ -1634,7 +1634,7 @@ static const struct inet_connection_sock_af_ops ipv6_mapped = {
 	.getsockopt	   = ipv6_getsockopt,
 	.addr2sockaddr	   = inet6_csk_addr2sockaddr,
 	.sockaddr_len	   = sizeof(struct sockaddr_in6),
-	.bind_conflict	   = inet6_csk_bind_conflict,
+	.rcv_saddr_equal   = ipv6_rcv_saddr_equal,
 #ifdef CONFIG_COMPAT
 	.compat_setsockopt = compat_ipv6_setsockopt,
 	.compat_getsockopt = compat_ipv6_getsockopt,
-- 
2.9.3
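
The refactoring in the patch above can be modeled outside the kernel as one
shared walk over a port's owners that defers only the address comparison to a
per-family callback. What follows is a simplified userspace sketch, not kernel
code: struct msock, struct af_ops, v4_saddr_equal and bind_conflict are
hypothetical names, and the reuse, reuseport, uid and bound-device checks of
the real inet_csk_bind_conflict are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct msock;

/* Per-address-family operations; only the address comparison differs. */
struct af_ops {
	bool (*rcv_saddr_equal)(const struct msock *sk1,
				const struct msock *sk2,
				bool match_wildcard);
};

/* Hypothetical, heavily reduced stand-in for struct sock. */
struct msock {
	uint32_t rcv_saddr;		/* 0 models the wildcard address */
	const struct af_ops *ops;
};

/* IPv4-style comparison: equal addresses match, and a wildcard on
 * either side matches when match_wildcard is set.
 */
static bool v4_saddr_equal(const struct msock *sk1, const struct msock *sk2,
			   bool match_wildcard)
{
	if (sk1->rcv_saddr == sk2->rcv_saddr)
		return true;
	if (match_wildcard && (!sk1->rcv_saddr || !sk2->rcv_saddr))
		return true;
	return false;
}

static const struct af_ops v4_ops = { .rcv_saddr_equal = v4_saddr_equal };

/* One family-independent walk over the sockets that own a port. */
static bool bind_conflict(const struct msock *sk,
			  const struct msock *const *owners, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (owners[i] != sk &&
		    sk->ops->rcv_saddr_equal(sk, owners[i], true))
			return true;
	}
	return false;
}
```

An IPv6 variant would only need its own comparison function and ops table;
the walk itself stays untouched, which is the point of the change.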


* [PATCH 3/5 net-next] inet: don't check for bind conflicts twice when searching for a port
From: Josef Bacik @ 2016-12-20 20:07 UTC
  To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev,
	kernel-team
In-Reply-To: <1482264424-15439-1-git-send-email-jbacik@fb.com>

This is just wasted time: we've already found a tb that doesn't have a bind
conflict, and we don't drop the head lock, so scanning again isn't going to give
us a different answer.  Instead move the tb->fastreuse and tb->fastreuseport
setting logic out of the tb_found path and put it in the success: path.  Then
make it so that we don't goto again if we find a bind conflict in the tb_found
path, as we won't reach this anymore when we are scanning for an ephemeral port.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 net/ipv4/inet_connection_sock.c | 39 ++++++++++++++++++---------------------
 1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 1a1a94bd..fc9bfe1 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -92,7 +92,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 {
 	bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
 	struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
-	int ret = 1, attempts = 5, port = snum;
+	int ret = 1, port = snum;
 	struct inet_bind_hashbucket *head;
 	struct net *net = sock_net(sk);
 	int i, low, high, attempt_half;
@@ -100,6 +100,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 	kuid_t uid = sock_i_uid(sk);
 	u32 remaining, offset;
 	bool reuseport_ok = !!snum;
+	bool empty_tb = true;
 
 	if (port) {
 		head = &hinfo->bhash[inet_bhashfn(net, port,
@@ -111,7 +112,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 
 		goto tb_not_found;
 	}
-again:
 	attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0;
 other_half_scan:
 	inet_get_local_port_range(net, &low, &high);
@@ -148,8 +148,12 @@ other_parity_scan:
 		spin_lock_bh(&head->lock);
 		inet_bind_bucket_for_each(tb, &head->chain)
 			if (net_eq(ib_net(tb), net) && tb->port == port) {
-				if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok))
-					goto tb_found;
+				if (hlist_empty(&tb->owners))
+					goto success;
+				if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok)) {
+					empty_tb = false;
+					goto success;
+				}
 				goto next_port;
 			}
 		goto tb_not_found;
@@ -184,23 +188,12 @@ tb_found:
 		      !rcu_access_pointer(sk->sk_reuseport_cb) &&
 		      sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
 			goto success;
-		if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) {
-			if ((reuse ||
-			     (tb->fastreuseport > 0 &&
-			      sk->sk_reuseport &&
-			      !rcu_access_pointer(sk->sk_reuseport_cb) &&
-			      uid_eq(tb->fastuid, uid))) && !snum &&
-			    --attempts >= 0) {
-				spin_unlock_bh(&head->lock);
-				goto again;
-			}
+		if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok))
 			goto fail_unlock;
-		}
-		if (!reuse)
-			tb->fastreuse = 0;
-		if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid))
-			tb->fastreuseport = 0;
-	} else {
+		empty_tb = false;
+	}
+success:
+	if (empty_tb) {
 		tb->fastreuse = reuse;
 		if (sk->sk_reuseport) {
 			tb->fastreuseport = 1;
@@ -208,8 +201,12 @@ tb_found:
 		} else {
 			tb->fastreuseport = 0;
 		}
+	} else {
+		if (!reuse)
+			tb->fastreuse = 0;
+		if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid))
+			tb->fastreuseport = 0;
 	}
-success:
 	if (!inet_csk(sk)->icsk_bind_hash)
 		inet_bind_hash(sk, tb, port);
 	WARN_ON(inet_csk(sk)->icsk_bind_hash != tb);
-- 
2.9.3
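
The flag handling that the patch above moves into the success: path can be
restated as a small standalone function. This is a userspace sketch under
simplifying assumptions: struct mbucket and update_fast_flags are hypothetical
names, and a plain unsigned int stands in for kuid_t.

```c
#include <stdbool.h>

/* Hypothetical reduced stand-in for struct inet_bind_bucket. */
struct mbucket {
	signed char fastreuse;
	signed char fastreuseport;
	unsigned int fastuid;
};

/* Update the fast-path hints after a socket has been accepted on tb.
 * empty_tb is true when the bucket had no owners before this socket.
 */
static void update_fast_flags(struct mbucket *tb, bool empty_tb,
			      bool reuse, bool reuseport, unsigned int uid)
{
	if (empty_tb) {
		/* First owner: the hints simply mirror this socket. */
		tb->fastreuse = reuse;
		if (reuseport) {
			tb->fastreuseport = 1;
			tb->fastuid = uid;
		} else {
			tb->fastreuseport = 0;
		}
	} else {
		/* Later owners can only clear a hint, never set it. */
		if (!reuse)
			tb->fastreuse = 0;
		if (!reuseport || tb->fastuid != uid)
			tb->fastreuseport = 0;
	}
}
```

Once a hint has been cleared on a non-empty bucket it stays cleared in this
version of the logic.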


* [PATCH 4/5 net-next] inet: split inet_csk_get_port into two functions
From: Josef Bacik @ 2016-12-20 20:07 UTC
  To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev,
	kernel-team
In-Reply-To: <1482264424-15439-1-git-send-email-jbacik@fb.com>

inet_csk_get_port does two different things: it either scans for an open port,
or it tries to see if the specified port is available for use.  Since these two
operations have different rules and are basically independent, let's split them
into two different functions to make them both more readable.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 net/ipv4/inet_connection_sock.c | 72 +++++++++++++++++++++++++++--------------
 1 file changed, 47 insertions(+), 25 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index fc9bfe1..d3ccf62 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -84,34 +84,21 @@ static int inet_csk_bind_conflict(const struct sock *sk,
 	return sk2 != NULL;
 }
 
-/* Obtain a reference to a local port for the given sock,
- * if snum is zero it means select any available local port.
- * We try to allocate an odd port (and leave even ports for connect())
+/*
+ * Find an open port number for the socket.  Returns with the
+ * inet_bind_hashbucket lock held.
  */
-int inet_csk_get_port(struct sock *sk, unsigned short snum)
+static struct inet_bind_hashbucket *
+inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *port_ret)
 {
-	bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
 	struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
-	int ret = 1, port = snum;
+	int port = 0;
 	struct inet_bind_hashbucket *head;
 	struct net *net = sock_net(sk);
 	int i, low, high, attempt_half;
 	struct inet_bind_bucket *tb;
-	kuid_t uid = sock_i_uid(sk);
 	u32 remaining, offset;
-	bool reuseport_ok = !!snum;
-	bool empty_tb = true;
 
-	if (port) {
-		head = &hinfo->bhash[inet_bhashfn(net, port,
-						  hinfo->bhash_size)];
-		spin_lock_bh(&head->lock);
-		inet_bind_bucket_for_each(tb, &head->chain)
-			if (net_eq(ib_net(tb), net) && tb->port == port)
-				goto tb_found;
-
-		goto tb_not_found;
-	}
 	attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0;
 other_half_scan:
 	inet_get_local_port_range(net, &low, &high);
@@ -150,13 +137,12 @@ other_parity_scan:
 			if (net_eq(ib_net(tb), net) && tb->port == port) {
 				if (hlist_empty(&tb->owners))
 					goto success;
-				if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok)) {
-					empty_tb = false;
+				if (!inet_csk_bind_conflict(sk, tb, false, false))
 					goto success;
-				}
 				goto next_port;
 			}
-		goto tb_not_found;
+		tb = NULL;
+		goto success;
 next_port:
 		spin_unlock_bh(&head->lock);
 		cond_resched();
@@ -171,8 +157,44 @@ next_port:
 		attempt_half = 2;
 		goto other_half_scan;
 	}
-	return ret;
+	return NULL;
+success:
+	*port_ret = port;
+	*tb_ret = tb;
+	return head;
+}
 
+/* Obtain a reference to a local port for the given sock,
+ * if snum is zero it means select any available local port.
+ * We try to allocate an odd port (and leave even ports for connect())
+ */
+int inet_csk_get_port(struct sock *sk, unsigned short snum)
+{
+	bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
+	struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
+	int ret = 1, port = snum;
+	struct inet_bind_hashbucket *head;
+	struct net *net = sock_net(sk);
+	struct inet_bind_bucket *tb = NULL;
+	kuid_t uid = sock_i_uid(sk);
+	bool empty_tb = true;
+
+	if (!port) {
+		head = inet_csk_find_open_port(sk, &tb, &port);
+		if (!head)
+			return 1;
+		if (!tb)
+			goto tb_not_found;
+		if (!hlist_empty(&tb->owners))
+			empty_tb = false;
+		goto success;
+	}
+	head = &hinfo->bhash[inet_bhashfn(net, port,
+					  hinfo->bhash_size)];
+	spin_lock_bh(&head->lock);
+	inet_bind_bucket_for_each(tb, &head->chain)
+		if (net_eq(ib_net(tb), net) && tb->port == port)
+			goto tb_found;
 tb_not_found:
 	tb = inet_bind_bucket_create(hinfo->bind_bucket_cachep,
 				     net, head, port);
@@ -188,7 +210,7 @@ tb_found:
 		      !rcu_access_pointer(sk->sk_reuseport_cb) &&
 		      sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
 			goto success;
-		if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok))
+		if (inet_csk_bind_conflict(sk, tb, true, true))
 			goto fail_unlock;
 		empty_tb = false;
 	}
-- 
2.9.3
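
The shape of the split in the patch above (one helper that scans, one entry
point that dispatches on snum) can be illustrated with a toy port table. This
is a sketch under loud assumptions: struct port_table, find_open_port,
port_available and get_port are hypothetical names, a boolean in_use array
replaces the bind-conflict check, the scan starts at a fixed offset instead of
a random one, and explicit ports are limited to the table's range.

```c
#include <stdbool.h>

/* Toy model of a port range [low, high) with a per-port "in use" flag. */
struct port_table {
	int low;
	int high;
	const bool *in_use;	/* indexed by port - low */
};

/* Scan for a free port. Ports whose parity differs from low are tried
 * first, then the remaining ones. Returns -1 when nothing is free.
 */
static int find_open_port(const struct port_table *pt)
{
	for (int start = 1; start >= 0; start--) {
		for (int off = start; off < pt->high - pt->low; off += 2) {
			if (!pt->in_use[off])
				return pt->low + off;
		}
	}
	return -1;
}

/* Check one caller-specified port. */
static bool port_available(const struct port_table *pt, int port)
{
	if (port < pt->low || port >= pt->high)
		return false;
	return !pt->in_use[port - pt->low];
}

/* Entry point: snum == 0 asks for any port, otherwise for exactly snum.
 * Returns the chosen port, or -1 on failure.
 */
static int get_port(const struct port_table *pt, int snum)
{
	if (!snum)
		return find_open_port(pt);
	return port_available(pt, snum) ? snum : -1;
}
```

Keeping the two paths in separate functions lets each apply its own rules,
which in the patch shows up as the fixed reuseport_ok arguments (false when
scanning, true for an explicit port).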


* [PATCH 5/5 net-next] inet: reset tb->fastreuseport when adding a reuseport sk
From: Josef Bacik @ 2016-12-20 20:07 UTC
  To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev,
	kernel-team
In-Reply-To: <1482264424-15439-1-git-send-email-jbacik@fb.com>

If we have non-reuseport sockets on a tb we will set tb->fastreuseport to 0 and
never set it again, which means that in the future if we end up adding a bunch
of reuseport sk's to that tb we'll have to do the expensive scan every time.
Instead add a sock_common to the tb so we know what reuseport sk succeeded last.
Once one sk has made it onto the list we know that there are no potential bind
conflicts on the owners list that match that sk's rcv_saddr.  So copy the sk's
common into our tb->fastsock and set tb->fastreuseport to FASTREUSEPORT_STRICT
so we know we have to do an extra check for subsequent reuseport sockets and
skip the expensive bind conflict check.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 include/net/inet_hashtables.h   |  4 ++++
 net/ipv4/inet_connection_sock.c | 53 +++++++++++++++++++++++++++++++++++++----
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 50f635c..b776401 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -74,12 +74,16 @@ struct inet_ehash_bucket {
  * users logged onto your box, isn't it nice to know that new data
  * ports are created in O(1) time?  I thought so. ;-)	-DaveM
  */
+#define FASTREUSEPORT_ANY	1
+#define FASTREUSEPORT_STRICT	2
+
 struct inet_bind_bucket {
 	possible_net_t		ib_net;
 	unsigned short		port;
 	signed char		fastreuse;
 	signed char		fastreuseport;
 	kuid_t			fastuid;
+	struct sock_common	fastsock;
 	int			num_owners;
 	struct hlist_node	node;
 	struct hlist_head	owners;
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index d3ccf62..9e29fad 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -164,6 +164,32 @@ success:
 	return head;
 }
 
+static inline int sk_reuseport_match(struct inet_bind_bucket *tb,
+				     struct sock *sk)
+{
+	struct sock *sk2 = (struct sock *)&tb->fastsock;
+	kuid_t uid = sock_i_uid(sk);
+
+	if (tb->fastreuseport <= 0)
+		return 0;
+	if (!sk->sk_reuseport)
+		return 0;
+	if (rcu_access_pointer(sk->sk_reuseport_cb))
+		return 0;
+	if (!uid_eq(tb->fastuid, uid))
+		return 0;
+	/* We only need to check the rcv_saddr if this tb was once marked
+	 * without fastreuseport and then was reset, as we can only know that
+	 * the fastsock has no potential bind conflicts with the rest of the
+	 * possible socks on the owners list.
+	 */
+	if (tb->fastreuseport == FASTREUSEPORT_ANY)
+		return 1;
+	if (!inet_csk(sk)->icsk_af_ops->rcv_saddr_equal(sk, sk2, true))
+		return 0;
+	return 1;
+}
+
 /* Obtain a reference to a local port for the given sock,
  * if snum is zero it means select any available local port.
  * We try to allocate an odd port (and leave even ports for connect())
@@ -206,9 +232,7 @@ tb_found:
 			goto success;
 
 		if ((tb->fastreuse > 0 && reuse) ||
-		     (tb->fastreuseport > 0 &&
-		      !rcu_access_pointer(sk->sk_reuseport_cb) &&
-		      sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
+		    sk_reuseport_match(tb, sk))
 			goto success;
 		if (inet_csk_bind_conflict(sk, tb, true, true))
 			goto fail_unlock;
@@ -220,14 +244,35 @@ success:
 		if (sk->sk_reuseport) {
 			tb->fastreuseport = 1;
 			tb->fastuid = uid;
+			memcpy(&tb->fastsock, &sk->__sk_common,
+			       sizeof(struct sock_common));
 		} else {
 			tb->fastreuseport = 0;
 		}
 	} else {
 		if (!reuse)
 			tb->fastreuse = 0;
-		if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid))
+		if (sk->sk_reuseport) {
+			/* We didn't match or we don't have fastreuseport set on
+			 * the tb, but we have sk_reuseport set on this socket
+			 * and we know that there are no bind conflicts with
+			 * this socket in this tb, so reset our tb's reuseport
+			 * settings so that any subsequent sockets that match
+			 * our current socket will be put on the fast path.
+			 *
+			 * If we reset we need to set FASTREUSEPORT_STRICT so we
+			 * do extra checking for all subsequent sk_reuseport
+			 * socks.
+			 */
+			if (!sk_reuseport_match(tb, sk)) {
+				tb->fastreuseport = FASTREUSEPORT_STRICT;
+				tb->fastuid = uid;
+				memcpy(&tb->fastsock, &sk->__sk_common,
+				       sizeof(struct sock_common));
+			}
+		} else {
 			tb->fastreuseport = 0;
+		}
 	}
 	if (!inet_csk(sk)->icsk_bind_hash)
 		inet_bind_hash(sk, tb, port);
-- 
2.9.3
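
The two fastreuseport modes introduced by the patch above can be exercised in
a small userspace model. This is only a sketch: struct msk, struct mtb,
saddr_equal, reuseport_match and record_reuseport_sk are hypothetical names, a
single 32-bit address stands in for the copied sock_common, and an unsigned
int stands in for kuid_t.

```c
#include <stdbool.h>
#include <stdint.h>

#define FASTREUSEPORT_ANY	1
#define FASTREUSEPORT_STRICT	2

/* Hypothetical reduced socket: just the fields the match looks at. */
struct msk {
	bool reuseport;
	bool has_reuseport_cb;
	unsigned int uid;
	uint32_t rcv_saddr;	/* 0 models the wildcard address */
};

/* Hypothetical reduced bind bucket with a cached "last good" socket. */
struct mtb {
	signed char fastreuseport;
	unsigned int fastuid;
	struct msk fastsock;
};

static bool saddr_equal(const struct msk *a, const struct msk *b)
{
	return a->rcv_saddr == b->rcv_saddr ||
	       !a->rcv_saddr || !b->rcv_saddr;
}

/* Can sk skip the full bind-conflict walk on this bucket? */
static bool reuseport_match(const struct mtb *tb, const struct msk *sk)
{
	if (tb->fastreuseport <= 0)
		return false;
	if (!sk->reuseport || sk->has_reuseport_cb)
		return false;
	if (tb->fastuid != sk->uid)
		return false;
	/* ANY: every owner is compatible, no address check needed. */
	if (tb->fastreuseport == FASTREUSEPORT_ANY)
		return true;
	/* STRICT: only sockets matching the cached address qualify. */
	return saddr_equal(sk, &tb->fastsock);
}

/* Success path for a non-empty bucket, after sk passed the slow check. */
static void record_reuseport_sk(struct mtb *tb, const struct msk *sk)
{
	if (!sk->reuseport) {
		tb->fastreuseport = 0;
		return;
	}
	if (!reuseport_match(tb, sk)) {
		tb->fastreuseport = FASTREUSEPORT_STRICT;
		tb->fastuid = sk->uid;
		tb->fastsock = *sk;
	}
}
```

In this model a bucket that lost its hint regains a fast path as soon as one
reuseport socket passes the slow check, but only for later sockets whose
address matches the cached one.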


* [ANNOUNCE] nftables 0.7 release
From: Pablo Neira Ayuso @ 2016-12-20 20:46 UTC
  To: netfilter-devel; +Cc: netdev, netfilter, netfilter-announce, lwn


Hi!

The Netfilter project proudly presents:

        nftables 0.7

This release contains many accumulated bug fixes and new features
available up to the (upcoming) Linux 4.10-rc1 kernel release.

* Facilitate migration from iptables to nftables:

  At compilation time, you have to pass this option.

  # ./configure --with-xtables

  And libxtables needs to be installed on your system. This allows you
  to list a ruleset containing xt extensions loaded through the
  iptables-compat-restore tool. The nft tool provides a native
  translation for iptables extensions (if available).

* Add new fib expression, which can be used to obtain the output
  interface from the route table based on either the source or the
  destination address of a packet. This can be used to add reverse path
  filtering, eg. drop the packet if the route to its source address
  does not use the interface the packet arrived on:

  # nft add rule x prerouting fib saddr . iif oif eq 0 drop

  Accept only if from eth0:

  # nft add rule x prerouting fib saddr . iif oif eq "eth0" accept

  Accept if from any valid interface:

  # nft add rule x prerouting fib saddr oif accept

  Querying the address type is also supported. This can be used to
  only accept packets to addresses configured on the same interface,
  eg.

  # nft add rule x prerouting fib daddr . iif type local accept

  It's also possible to use mark and verdict map, eg.

  # nft add rule x prerouting \
        meta mark set 0xdead fib daddr . mark type vmap { \
                blackhole : drop, \
                prohibit : drop, \
                unicast : accept \
        }

* Support hashing of any arbitrary key combination, eg.

  # nft add rule x y \
        dnat to jhash ip saddr . tcp dport mod 2 map { \
                0 : 192.168.20.100, \
                1 : 192.168.30.100 \
        }

  Another use case: set packet marks based on arbitrary hashing.

* Add number generation support. Useful for round-robin packet mark
  setting, eg.

  # nft add rule filter prerouting meta mark set numgen inc mod 2

  You can also specify an offset to indicate the value you want to
  start from.

  The modulus provides the scale of the counting sequence. You can
  also use this from maps, eg.

  # nft add rule nat prerouting \
        dnat to numgen inc mod 2 map { 0 : 192.168.10.100, 1 : 192.168.20.200 }

  So this distributes new connections in a round-robin fashion between
  192.168.10.100 and 192.168.20.200. Don't forget the special NAT
  chain semantics: only the first packet evaluates the rule, follow-up
  packets rely on conntrack to apply the NAT information.

  You can also emulate flow distribution with different backend weights
  using intervals, eg.

  # nft add rule nat prerouting \
        dnat to numgen inc mod 10 map { 0-5 : 192.168.10.100, 6-9 : 192.168.20.200 }
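
The weighted distribution in the numgen example above comes down to a wrapping
counter plus an interval lookup. The following C sketch models that idea in
userspace; it is not how nftables implements it. struct numgen, numgen_inc,
struct interval_entry and map_lookup are hypothetical names, and the counter's
starting value of 0 is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Incremental number generator: yields 0, 1, ..., mod - 1, 0, ... */
struct numgen {
	uint32_t counter;
	uint32_t mod;
};

static uint32_t numgen_inc(struct numgen *g)
{
	uint32_t value = g->counter;

	g->counter = (g->counter + 1) % g->mod;
	return value;
}

/* One map element covering the closed interval [lo, hi]. */
struct interval_entry {
	uint32_t lo;
	uint32_t hi;
	const char *backend;
};

/* Return the backend whose interval contains key, or NULL if none. */
static const char *map_lookup(const struct interval_entry *map, size_t n,
			      uint32_t key)
{
	for (size_t i = 0; i < n; i++) {
		if (key >= map[i].lo && key <= map[i].hi)
			return map[i].backend;
	}
	return NULL;
}
```

With a modulus of 10 and the intervals 0-5 and 6-9, every ten consecutive
values send six lookups to the first backend and four to the second, whatever
value the counter starts from.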

* Add quota support, eg.

  # nft add rule filter input \
            flow table http { ip saddr timeout 60s quota over 50 mbytes } drop

  This creates a flow table, where every flow gets a quota of 50
  mbytes. You can also use simple rules to enforce quotas, of course.

* Introduce routing expression, for routing related data with support
  for nexthop (i.e. the directly connected IP address that an outgoing
  packet is sent to), which can be used either for matching or accounting, eg.

     # nft add rule filter postrouting \
          ip daddr 192.168.1.0/24 rt nexthop != 192.168.0.1 drop

  This will drop any traffic to 192.168.1.0/24 that is not routed via
  192.168.0.1.

     # nft add rule filter postrouting \
          flow table acct { rt nexthop timeout 600s counter }

     # nft add rule ip6 filter postrouting \
          flow table acct { rt nexthop timeout 600s counter }

  These rules count outgoing traffic per nexthop. Note that the timeout
  releases an entry if no traffic is seen for this nexthop within 10
  minutes.

* Notrack support, to explicitly skip connection tracking for matching
  packets, eg.

     # nft add rule ip raw prerouting tcp dport { 80, 443 } notrack

  So you can skip tracking for http and https traffic.

* Support to set non-byte bound packet header fields, including
  checksum adjustment, eg. ip6 ecn set 1.

* Add 'create set' and 'create element' commands, eg.

     # nft add set x y { type ipv4_addr\; }
     # nft create set x y { type ipv4_addr\; }
     <cmdline>:1:1-35: Error: Could not process rule: File exists
     create set x y { type ipv4_addr; }
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     # nft add set x y { type ipv4_addr\; }
     #

  So 'create' bails out if the set already exists, while 'add'
  doesn't, for more ergonomic usage as several users requested on
  the mailing list.

* Allow variable references in set element definitions, eg.

  # cat ruleset.nft
    define s-ext-2-int = { 10.10.10.10 . 25, 10.10.10.10 . 143 }

    table inet forward {
            set s-ext-2-int {
                 type ipv4_addr . inet_service
                 elements = $s-ext-2-int
            }
    }
  # nft -f ruleset.nft

  Useful to improve ruleset maintainability, as you can split out
  variable and set definitions from the filtering policy itself.

* Allow variable definitions to be used from element commands, eg.

     define whitelist_v4 = { 1.1.1.1 }

     table inet filter {
        set whitelist_v4 { type ipv4_addr; }
     }

     add element inet filter whitelist_v4 $whitelist_v4

* Add support to flush set. You can use this new command to remove all
  existing elements in a set, eg.

  # nft flush set filter xyz

  Note that this requires (upcoming) Linux kernel 4.10-rc versions.

* Inverted set lookups, eg. tcp dport != { 80, 443 }.

* Honor absolute and relative paths via include file, where:

    include "./ruleset.nft"

  refers to a file in the working directory.

    include "ruleset.nft"

  refers to a file in the nftables root path (via sysconfdir), and:

    include "/etc/nftables/ruleset.nft"

  provides an absolute reference to the file that needs to be included.
  This also solves an ambiguity if the same file name is used both under
  sysconfdir and the current working directory.
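
  The three cases above can be summarised in a few lines of Python.
  This is a sketch of the rules as described here, not the scanner
  code; resolve_include() is a hypothetical helper, and the cwd and
  sysconfdir values used below are made up for illustration.

```python
import posixpath


def resolve_include(name: str, cwd: str, sysconfdir: str) -> str:
    """Return the path an include statement refers to under the rules above."""
    if name.startswith("/"):
        # Absolute reference: used as given.
        return posixpath.normpath(name)
    if name.startswith("./"):
        # Explicitly relative: looked up in the working directory.
        return posixpath.normpath(posixpath.join(cwd, name))
    # Bare name: looked up under the nftables root path (sysconfdir).
    return posixpath.normpath(posixpath.join(sysconfdir, name))


# Hypothetical directories, for illustration only.
relative = resolve_include("./ruleset.nft", "/home/user", "/etc")
bare = resolve_include("ruleset.nft", "/home/user", "/etc")
absolute = resolve_include("/etc/nftables/ruleset.nft", "/home/user", "/etc")
```

  With these rules a bare file name never silently picks up a file of
  the same name from the working directory.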

* Support log flags, to enable logging TCP sequence and options:

     # nft add rule x y log flags tcp sequence,options

  ... IP options, eg:

     # nft add rule x y log flags ip options

  ... socket UID, eg.

     # nft add rule x y log flags skuid

  ... decode ethernet link layer address, eg.

     # nft add rule x y log flags ether

  ... or simply turn on all flags:

     # nft add rule x y log flags all

* tc classid parser support, eg.

    nft add rule filter forward meta priority abcd:1234

* Allow numeric connlabels, so connlabel still works with undefined
  labels, eg. ct label set 2.

* Document log, reject, counter, meta, limit, nat, ct, payload and
  queue statements from nft(8) manpage.

Bugfixes
========

Not strictly limited to this list below, but some highlights:

* Allow split table definitions, eg.

  # cat ruleset.nft
  table inet filter {
       chain ssh {
               type filter hook input priority 0; policy accept;
               tcp dport ssh accept;
       }
  }
  table inet filter {
       chain input {
               type filter hook input priority 1; policy drop;
       }
  }
  # nft -f ruleset.nft

* Use new range expression to represent inverted intervals, eg.
  ip saddr != 1.1.1.1-2.2.2.2, since previously generated bytecode was
  not correct.
* Solve endianness problems with link layer address.
* Fix parser to keep map flag around on definition.
* Skip timeout attribute in dynamic set updates, otherwise the kernel
  bails out with EINVAL.
* Restore parsing of dynamic set element updates.
* The time datatype now uses milliseconds, as the kernel expects.
* Allow numeric interface index numbers, eg. in meta iif, oif.
* Fix monitor trace crash with netdev family.
* Flow table with concatenation fixes.
* Keep element comments around when using set intervals.
* Fixed memory corruption in userspace when deleting lots of elements
  in one go via nft -f.
* Several nft internal cache fixes, including cache reset on 'flush
  ruleset'.
* Restore parens on right-hand side of relational expression.
* Replace getnameinfo() with an internal lookup table, so we no longer
  rely on /etc/services for service names. They are now restricted to
  a well-known set that is supported by our scanner. You can list
  service names via 'nft describe tcp dport'.
* Display symbol table values in the right host byte order and
  decimal/hexadecimal representation.
* Fix a nasty bug in the set interval code triggering huge memory
  consumption in userspace for set and map intervals with runtime
  updates.

We also got lots more tests added to our infrastructure to catch
regressions.

Syntax updates
==============

Several minor syntax updates. The previous syntax has been preserved
for now to facilitate the transition, but the new one is preferred:

* Consistency grammar fixes: 'snat' and 'dnat' now require 'to', eg.
  snat to 1.2.3.4, for consistency with existing statements such as
  redirect, masquerade, dup and fwd. Moreover, add colon after 'to' in
  'redirect' for consistency with nat and masq statements.

* Allow ct l3proto/protocol without direction since they are unrelated
  to the direction.

* Explicit ruleset exportation, eg. nft export ruleset json, for
  consistency with other existing ruleset commands.

* Always quote user-defined strings from rules when listing them.

* Support for RFC2732 IPv6 address format with brackets, eg.

  dnat to [2001:838:35f:1::]:80

* Allow strings starting with underscores and dots in user-defined
  strings, conforming with POSIX.1-2008 (which is simultaneously IEEE
  Std 1003.1-2008).

Resources
=========

The nftables code can be obtained from:

* http://netfilter.org/projects/nftables/downloads.html
* ftp://ftp.netfilter.org/pub/nftables
* git://git.netfilter.org/nftables

To build the code, libnftnl 1.0.7 and libmnl >= 1.0.2 are required:

* http://netfilter.org/projects/libnftnl/index.html
* http://netfilter.org/projects/libmnl/index.html

Visit our wikipage for user documentation at:

* http://wiki.nftables.org

For the manpage reference, check nft(8).

In case of bugs and feature requests, file them via:

* https://bugzilla.netfilter.org

Make sure you create no duplicates, thanks!

Happy holidays!

[-- Attachment #2: changes-nftables-0.7.txt --]
[-- Type: text/plain, Size: 9551 bytes --]

Anatole Denis (7):
      evaluate: Add set to cache only when well-formed
      tests: Add regression test for malformed sets
      Revert "evaluate: check for NULL datatype in rhs in lookup expr"
      src: Interpret OP_NEQ against a set as OP_LOOKUP
      tests/py: Unmask negative set lookup
      rule: Introduce helper function cache_flush
      evaluate: Update cache on flush ruleset

Anders K. Pedersen (4):
      rt: introduce routing expression
      Replace tests/files/expr-rt with Python based tests, and replace ether type with meta nfproto, which generates a bit fewer instructions.
      evaluate: Allow concatenation of rt nexthop etc.
      doc: fix synopsis for ct expression

Arturo Borrero (3):
      tests: shell: delete unused variable in run-tests.sh
      tests: shell: cleanup tempfile handling in testcases/sets/cache_handling_0
      tests: shell: run-tests.sh: use src/nft binary by default

Arturo Borrero Gonzalez (12):
      tests: shell: update kernel modules to clean
      xt: update Arturo Borrero Gonzalez email address
      tests: shell: delete useless stderr output in testcase
      tests: shell: introduce the cache testcases directory
      tests: shell: add a new testcase for ruleset loading bug
      tests: shell: add testcases for comments in set elements
      tests: shell: allow to execute a single testcase
      tests: shell: testcase for adding many set elements
      tests: shell: testcase for deleting many set elements
      tests: shell: another testcase for deleting many set elements
      tests: shell: add a testcase for many defines
      tests: shell: add testcase for different defines usage

Carlos Falgueras García (1):
      src: Simplify parser rule_spec tree

Elise Lennion (4):
      datatype: Replace getnameinfo() by internal lookup table
      datatype: Display pre-defined inet_service values in host byte order
      datatype: Display pre-defined inet_service values in decimal base
      expression: Show the base which pre-defined constants are displayed

Florian Westphal (30):
      payload: don't update protocol context if we can't find a description
      meta: add random support
      meta: add tests for meta random
      ct: use nftables sysconf location for connlabel configuration
      tests: add basic payload tests
      tests: add ether payload set test
      netlink: add __binop_adjust helper
      payload: print base and raw values for unknown payloads
      evaluate: add small helper to check if payload expr needs binop adjustment
      evaluate: add support to set IPv6 non-byte header fields
      netlink: decode payload statment
      tests: ip6 dscp, flowlabel and ecn test cases
      netlink: make checksum fixup work with odd-sized header fields
      tests: ip payload set support for ecn and dscp
      ct: allow numeric conntrack labels
      ct: display bit number instead of raw value
      doc: update meta expression
      doc: payload and conntrack statement
      datatype: ll: use big endian byte ordering
      tests: catch ordering issue w. ether set
      payload: remove byteorder conversion
      meta: permit numeric interface type
      netlink: fix monitor trace crash with netdev family
      meta: fix pkttype name and add 'other' symbol
      utils: provide snprintf helper macro
      ct: allow resolving ct keys at run time
      meta: allow resolving meta keys at run time
      src: add fib expression
      Revert "tests: py: nft-tests.py: Add function for loading and removing kernel modules"
      bison: remove old log level tokens

Jon Jensen (1):
      Correct description of -n/--numeric option

Laura Garcia Liebana (5):
      doc: Update datatypes
      src: add offset attribute for numgen expression
      netlink: fix linearize numgen type
      src: make hash seed attribute optional
      src: add offset attribute for hash expression

Liping Zhang (14):
      tests: shell: make testcases which using tcp/udp port more rubost
      tests: shell: add endless jump loop tests
      parser_bison: keep snat/dnat existing syntax unchanged
      tests: shell: add testcase for reject expr
      meta: fix memory leak in tc classid parser
      tests: py: replace "eth0" with "lo" in dup expr tests
      src: fix compile error due to _UNTIL renamed to _MODULUS in libnftnl
      tests: py: add more test cases for queue expr
      tests: py: fix numgen case failed due to changes in libnftnl
      src: support ct l3proto/protocol without direction syntax
      ct: fix "ct l3proto/protocol" syntax broken
      log: rename the log level "warning" to "warn"
      src: add log flags syntax support
      tests: shell: add test case for inserting element into verdict map

Manuel Johannes Messner (3):
      tests: py: nft-tests.py: Add function for loading and removing kernel modules
      tests: py: any: Make tests more generic by using other interfaces
      tests: py: any: Remove duplicate tests

Nicholas Vinson (1):
      nft: configure.ac: Replace magic dblatex dep.

Pablo Neira (2):
      src: expose delinearize/linearize structures and stmt_error()
      src: trigger layer 4 checksum when pseudoheader fields are modified

Pablo Neira Ayuso (71):
      src: use new definitions from libnftnl
      segtree: don't check for overlaps if set definition is empty
      tests: shell: cover transactions via nft -f using flat syntax
      datatype: time_type should send milliseconds to userspace
      parser_bison: restore parsing of dynamic set element updates
      netlink_linearize: skip NFTNL_EXPR_DYNSET_TIMEOUT attribute if timeout is unset
      include: cache ip_tables.h, ip6_tables.h, arp_tables.h and ebtables.h
      src: add xt compat support
      parser_bison: fix typo in symbol redefinition error reporting
      tests: shell: make sure split table definition works via nft -f
      xt: use struct xt_xlate_{mt,tg}_params
      parser_bison: keep map flag around when flags are specified
      scanner: honor absolute and relative paths via include file
      scanner: don't fall back on current directory if include is not found
      scanner: don't break line on include error message
      tests: tests to include files
      ct: add missing slash to connlabel path
      ct: release ct_label table on exit
      src: quote user-defined strings when used from rule selectors
      src: add 'to' for snat and dnat
      src: support for RFC2732 IPv6 address format with brackets
      parser_bison: missing token string in QUOTED_ASTERISK and ASTERISK_STRING
      scanner: allow strings starting by underscores and dots
      scanner: remove range expression
      src: rename datatype name from tc_handle to classid
      src: simplify classid printing using %x instead of %04x
      src: meta priority support using tc classid
      parser_bison: redirect to :port for consistency with nat/masq statement
      parser_bison: explicit indication on export ruleset
      src: add create set command
      tests: shell: cover add and create set command
      src: create element command
      tests: shell: cover add and create set command
      include: refresh uapi/linux/netfilter/nf_tables.h copy
      tests: py: adapt it to new add element command semantics
      src: add quota statement
      src: add numgen expression
      src: add hash expression
      evaluate: add expr_evaluate_integer()
      evaluate: validate maximum hash and numgen value
      parser_bison: add variable_expr rule
      parser_bison: allow variable references in set elements definition
      tests: py: adapt netlink bytecode output of numgen and hash
      evaluate: display expression, statement and command name on debug
      netlink_delinearize: Avoid potential null pointer deref
      doc: nft: add my copyright statement to the manpage
      doc: nft: document log, reject, counter, meta, limit, nat and queue statements
      src: use new range expression for != [a,b] intervals
      parser_bison: allow to use variable to add/create/delete elements
      src: don't need keyword for log level
      parser: add offset keyword and parser rule
      tests/py: add missing payload test for numgen offset
      netlink_linearize: skip set element expression in flow table key
      segtree: keep element comments in set intervals
      tests: py: add some testcases for log flags
      tests: py: missing range conversion in icmpv6
      src: add notrack support
      mnl: use nftnl_set_elems_nlmsg_build_payload_iter() when deleting elements
      include: refresh nf_tables.h header
      datatype: honor -nn option from inet_service_type_print()
      evaluate: return ctx->table from table_lookup_global()
      src: add support to flush sets
      segtree: wrong prefix expression length on interval_map_decompose()
      segtree: don't trigger error on exact overlaps
      mnl: don't send empty set elements netlink message to kernel
      tests: py: update quota and payload
      netlink_linearize: fix IPv6 layer 4 checksum mangling
      mnl: add mnl_nft_setelem_batch_flush() and use it from netlink_flush_setelems()
      xt: use NFTNL_* definitions
      configure: Bump version to v0.7
      include: Missing noinst_HEADERS updates

Phil Sutter (5):
      evaluate: Fix datalen checks in expr_evaluate_string()
      evaluate: reject: Have a generic fix for missing network context
      evaluate: Avoid undefined behaviour in concat_subtype_id()
      parser_bison: Allow parens on RHS of relational_expr
      tests: py: Test TCP flags match with parentheses

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2016-12-20 21:02 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) Use rb_entry() instead of hardcoded container_of(), from Geliang Tang.

2) Use correct memory barriers in stmmac driver, from Pavel Machek.

3) Fix assoc bind address handling in SCTP, from Xin Long.

4) Make the length check for UFO handling consistent between
   __ip_append_data() and ip_finish_output(), from Zheng Li.

5) HSI driver compatible strings were busted for hix5hd2, from Dongpo
   Li.

6) Handle devm_ioremap() errors properly in cavium driver, from Arvind
   Yadav.

Please pull, thanks a lot!

The following changes since commit 52f40e9d657cc126b766304a5dd58ad73b02ff46:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2016-12-17 20:17:04 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to a763f78cea845c91b8d91f93dabf70c407635dc5:

  RDS: use rb_entry() (2016-12-20 14:22:49 -0500)

----------------------------------------------------------------
Arvind Yadav (1):
      net: ethernet: cavium: octeon: octeon_mgmt: Handle return NULL error from devm_ioremap

David S. Miller (4):
      Merge branch 'phy-broken-modes'
      Merge branch 'fsl-fixes'
      Merge branch 'hix5hd2_gmac-compatible-string'
      Merge branch 'sctp-fixes'

Dongpo Li (2):
      net: hix5hd2_gmac: fix compatible strings name
      ARM: dts: hix5hd2: don't change the existing compatible string

Geliang Tang (4):
      net/mlx5: use rb_entry()
      net_sched: sch_fq: use rb_entry()
      net_sched: sch_netem: use rb_entry()
      RDS: use rb_entry()

Jarno Rajahalme (1):
      openvswitch: Add a missing break statement.

Madalin Bucur (4):
      fsl/fman: fix 1G support for QSGMII interfaces
      powerpc: fsl/fman: remove fsl,fman from of_device_ids[]
      fsl/fman: A007273 only applies to PPC SoCs
      fsl/fman: enable compilation on ARM64

Pavel Machek (1):
      stmmac: fix memory barriers

Tobias Klauser (1):
      ethernet: sfc: Add Kconfig entry for vendor Solarflare

WingMan Kwok (2):
      net: netcp: ethss: fix errors in ethtool ops
      net: netcp: ethss: fix 10gbe host port tx pri map configuration

Xin Long (2):
      sctp: reduce indent level in sctp_copy_local_addr_list
      sctp: not copying duplicate addrs to the assoc's bind address list

jbrunet (3):
      net: phy: fix sign type error in genphy_config_eee_advert
      net: phy: use boolean dt properties for eee broken modes
      dt: bindings: net: use boolean dt properties for eee broken modes

zheng li (1):
      ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output

 Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt | 13 ++++++++-----
 Documentation/devicetree/bindings/net/phy.txt                    | 10 ++++++++--
 arch/arm/boot/dts/hisi-x5hd2.dtsi                                |  4 ++--
 arch/powerpc/platforms/85xx/corenet_generic.c                    |  3 ---
 drivers/net/ethernet/Kconfig                                     |  1 -
 drivers/net/ethernet/cavium/octeon/octeon_mgmt.c                 |  6 ++++++
 drivers/net/ethernet/freescale/fman/Kconfig                      |  2 +-
 drivers/net/ethernet/freescale/fman/fman.c                       | 15 +++++++++++++++
 drivers/net/ethernet/freescale/fman/mac.c                        |  1 +
 drivers/net/ethernet/hisilicon/hix5hd2_gmac.c                    | 13 +++++++------
 drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c            |  2 +-
 drivers/net/ethernet/sfc/Kconfig                                 | 21 +++++++++++++++++++++
 drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c               |  4 ++--
 drivers/net/ethernet/stmicro/stmmac/enh_desc.c                   |  2 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c                |  8 ++++----
 drivers/net/ethernet/ti/netcp_ethss.c                            | 24 ++++++++++++++++++------
 drivers/net/phy/phy_device.c                                     | 22 +++++++++++++++++-----
 include/dt-bindings/net/mdio.h                                   | 19 -------------------
 net/ipv4/ip_output.c                                             |  2 +-
 net/openvswitch/flow_netlink.c                                   |  1 +
 net/rds/rdma.c                                                   |  2 +-
 net/sched/sch_fq.c                                               | 14 +++++++-------
 net/sched/sch_netem.c                                            |  2 +-
 net/sctp/bind_addr.c                                             |  3 +++
 net/sctp/protocol.c                                              | 40 ++++++++++++++++++++++------------------
 25 files changed, 148 insertions(+), 86 deletions(-)
 delete mode 100644 include/dt-bindings/net/mdio.h

^ permalink raw reply

* Re: HalfSipHash Acceptable Usage
From: Theodore Ts'o @ 2016-12-20 21:36 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Jean-Philippe Aumasson, Hannes Frederic Sowa, LKML, Eric Biggers,
	Daniel J . Bernstein, David Laight, David Miller, Andi Kleen,
	George Spelvin, kernel-hardening, Andy Lutomirski,
	Linux Crypto Mailing List, Tom Herbert, Vegard Nossum, Netdev,
	Linus Torvalds
In-Reply-To: <CAHmME9rPmH=wP_eHYopt8ZPG9TSN7bos3fGOuqKL2HjQW-2SWA@mail.gmail.com>

On Mon, Dec 19, 2016 at 06:32:44PM +0100, Jason A. Donenfeld wrote:
> 1) Anything that requires actual long-term security will use
> SipHash2-4, with the 64-bit output and the 128-bit key. This includes
> things like TCP sequence numbers. This seems pretty uncontroversial to
> me. Seem okay to you?

Um, why do TCP sequence numbers need long-term security?  So long as
you rekey every 5 minutes or so, TCP sequence numbers don't need any
more security than that, since even if you break the key used to
generate initial sequence numbers even a minute or two later, any
pending TCP connections will have timed out long before.

See the security analysis done in RFC 6528[1], where among other
things, it points out why MD5 is acceptable with periodic rekeying,
although there is the concern that this could break certain heuristics
used when establishing new connections during the TIME-WAIT state.

[1] https://tools.ietf.org/html/rfc6528
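
As a rough illustration of the rekeying argument, here is a small
Python sketch in the spirit of the RFC 6528 formula
ISN = M + F(localip, localport, remoteip, remoteport, secretkey),
with M a 4 microsecond timer and F built on MD5. It is not the
kernel's code: IsnGenerator, REKEY_INTERVAL, the way the tuple is
serialised and the injectable clock are all assumptions made for the
example.

```python
import hashlib
import os
import struct
import time

REKEY_INTERVAL = 300  # seconds; the "every 5 minutes or so" from the text


class IsnGenerator:
    """Toy initial-sequence-number generator with periodic rekeying."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._key = os.urandom(16)
        self._key_born = clock()

    def _current_key(self) -> bytes:
        # Throw the secret away once it is older than the rekey interval.
        now = self._clock()
        if now - self._key_born >= REKEY_INTERVAL:
            self._key = os.urandom(16)
            self._key_born = now
        return self._key

    def isn(self, local_ip, local_port, remote_ip, remote_port) -> int:
        key = self._current_key()
        # Hypothetical serialisation of the connection tuple plus secret.
        material = f"{local_ip}|{local_port}|{remote_ip}|{remote_port}".encode() + key
        digest = hashlib.md5(material).digest()
        f = struct.unpack("!I", digest[:4])[0]  # F(): first 32 bits of the hash
        m = int(self._clock() * 1_000_000 / 4)  # M: 4 microsecond timer
        return (m + f) & 0xFFFFFFFF
```

Since a key only lives for REKEY_INTERVAL seconds, recovering it later
says nothing about the sequence numbers generated under the next key.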

						- Ted

^ permalink raw reply

* [PATCH net-next 00/10] netcp: enhancements and minor fixes
From: Murali Karicheri @ 2016-12-20 22:09 UTC (permalink / raw)
  To: netdev, linux-omap, grygorii.strashko, mugunthanvnm, linux-kernel,
	arnd, davem, devicetree, mark.rutland, robh+dt

This series is for net-next. This propagates enhancements and minor
bug fixes from internal version of the driver to keep the upstream
in sync. Please review and apply if this looks good.

Tested on all of K2HK/E/L boards.

Thanks
Murali Karicheri

Michael Scherban (1):
  net: netcp: store network statistics in 64 bits

Murali Karicheri (7):
  net: netcp: extract eflag from desc for rx_hook handling
  net: netcp: remove the redundant memmov()
  net: netcp: ethss: get phy-handle only if link interface is MAC-to-PHY
  net: netcp: use hw capability to remove FCS word from rx packets
  net: netcp: ale: update to support unknown vlan controls for NU switch
  net: netcp: ale: use ale_status to size the ale table
  net: netcp: ale: add proper ale entry mask bits for netcp switch ALE

WingMan Kwok (2):
  net: netcp: ethss: add support of subsystem register region regmap
  net: netcp: ethss: add support of 10gbe pcsr link status

 .../devicetree/bindings/net/keystone-netcp.txt     |  19 +-
 drivers/net/ethernet/ti/cpsw_ale.c                 | 180 ++++++++++++++++---
 drivers/net/ethernet/ti/cpsw_ale.h                 |  17 +-
 drivers/net/ethernet/ti/netcp.h                    |  21 +++
 drivers/net/ethernet/ti/netcp_core.c               | 102 ++++++++---
 drivers/net/ethernet/ti/netcp_ethss.c              | 200 +++++++++++++++++----
 include/linux/soc/ti/knav_dma.h                    |   2 +
 7 files changed, 456 insertions(+), 85 deletions(-)

-- 
1.9.1


^ permalink raw reply

* [PATCH net-next 01/10] net: netcp: ethss: add support of subsystem register region regmap
From: Murali Karicheri @ 2016-12-20 22:09 UTC (permalink / raw)
  To: netdev, linux-omap, grygorii.strashko, mugunthanvnm, linux-kernel,
	arnd, davem, devicetree, mark.rutland, robh+dt
In-Reply-To: <1482271793-7671-1-git-send-email-m-karicheri2@ti.com>

From: WingMan Kwok <w-kwok2@ti.com>

10gbe phy driver needs to access the 10gbe subsystem control
register during phy initialization. To facilitate the shared
access of the subsystem register region between the 10gbe Ethernet
driver and the phy driver, this patch adds support of the
subsystem register region defined by a syscon node in the dts.

Although there is no shared access to the gbe subsystem register
region, using syscon for that is for the sake of consistency.

This change is backward compatible with previously released gbe
devicetree bindings.

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
---
 .../devicetree/bindings/net/keystone-netcp.txt     |  16 ++-
 drivers/net/ethernet/ti/netcp_ethss.c              | 140 +++++++++++++++++----
 2 files changed, 127 insertions(+), 29 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/keystone-netcp.txt b/Documentation/devicetree/bindings/net/keystone-netcp.txt
index 04ba1dc..0854a73 100644
--- a/Documentation/devicetree/bindings/net/keystone-netcp.txt
+++ b/Documentation/devicetree/bindings/net/keystone-netcp.txt
@@ -72,20 +72,24 @@ Required properties:
 		"ti,netcp-gbe-2" for 1GbE N NetCP 1.5 (N=2)
 		"ti,netcp-xgbe" for 10 GbE
 
+- syscon-subsys:	phandle to syscon node of the switch
+			subsystem registers.
+
 - reg:		register location and the size for the following register
 		regions in the specified order.
 		- switch subsystem registers
+		- sgmii module registers
 		- sgmii port3/4 module registers (only for NetCP 1.4)
 		- switch module registers
 		- serdes registers (only for 10G)
 
 		NetCP 1.4 ethss, here is the order
-			index #0 - switch subsystem registers
+			index #0 - sgmii module registers
 			index #1 - sgmii port3/4 module registers
 			index #2 - switch module registers
 
 		NetCP 1.5 ethss 9 port, 5 port and 2 port
-			index #0 - switch subsystem registers
+			index #0 - sgmii module registers
 			index #1 - switch module registers
 			index #2 - serdes registers
 
@@ -145,6 +149,11 @@ Optional properties:
 
 Example binding:
 
+gbe_subsys: subsys@2090000 {
+	compatible = "syscon";
+	reg = <0x02090000 0x100>;
+};
+
 netcp: netcp@2000000 {
 	reg = <0x2620110 0x8>;
 	reg-names = "efuse";
@@ -163,7 +172,8 @@ netcp: netcp@2000000 {
 		ranges;
 		gbe@90000 {
 			label = "netcp-gbe";
-			reg = <0x90000 0x300>, <0x90400 0x400>, <0x90800 0x700>;
+			syscon-subsys = <&gbe_subsys>;
+			reg = <0x90100 0x200>, <0x90400 0x200>, <0x90800 0x700>;
 			/* enable-ale; */
 			tx-queue = <648>;
 			tx-channel = <8>;
diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c
index c7e547e..473edda1 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -19,9 +19,11 @@
  */
 
 #include <linux/io.h>
+#include <linux/mfd/syscon.h>
 #include <linux/module.h>
 #include <linux/of_mdio.h>
 #include <linux/of_address.h>
+#include <linux/regmap.h>
 #include <linux/if_vlan.h>
 #include <linux/ptp_classify.h>
 #include <linux/net_tstamp.h>
@@ -43,7 +45,10 @@
 #define GBE_MODULE_NAME			"netcp-gbe"
 #define GBE_SS_VERSION_14		0x4ed21104
 
+/* for devicetree backward compatible only */
 #define GBE_SS_REG_INDEX		0
+
+#define GBE_SGMII_REG_INDEX		0
 #define GBE_SGMII34_REG_INDEX		1
 #define GBE_SM_REG_INDEX		2
 /* offset relative to base of GBE_SS_REG_INDEX */
@@ -71,9 +76,11 @@
 #define IS_SS_ID_NU(d) \
 	(GBE_IDENT((d)->ss_version) == GBE_SS_ID_NU)
 
-#define GBENU_SS_REG_INDEX		0
+#define GBENU_SGMII_REG_INDEX		0
 #define GBENU_SM_REG_INDEX		1
+/* offset relative to base of GBE_SS_REG_INDEX */
 #define GBENU_SGMII_MODULE_OFFSET	0x100
+/* offset relative to base of GBENU_SM_REG_INDEX */
 #define GBENU_HOST_PORT_OFFSET		0x1000
 #define GBENU_SLAVE_PORT_OFFSET		0x2000
 #define GBENU_EMAC_OFFSET		0x2330
@@ -82,13 +89,12 @@
 #define GBENU_ALE_OFFSET		0x1e000
 #define GBENU_HOST_PORT_NUM		0
 #define GBENU_NUM_ALE_ENTRIES		1024
-#define GBENU_SGMII_MODULE_SIZE		0x100
 
 /* 10G Ethernet SS defines */
 #define XGBE_MODULE_NAME		"netcp-xgbe"
 #define XGBE_SS_VERSION_10		0x4ee42100
 
-#define XGBE_SS_REG_INDEX		0
+#define XGBE_SGMII_REG_INDEX		0
 #define XGBE_SM_REG_INDEX		1
 #define XGBE_SERDES_REG_INDEX		2
 
@@ -173,6 +179,7 @@
 #define XGBE_SET_REG_OFS(p, rb, rn) p->rb##_ofs.rn = \
 		offsetof(struct xgbe##_##rb, rn)
 #define GBE_REG_ADDR(p, rb, rn) (p->rb + p->rb##_ofs.rn)
+#define GBE_REG_OFS(p, rb, rn) ((p)->rb##_ofs.rn)
 
 #define HOST_TX_PRI_MAP_DEFAULT			0x00000000
 
@@ -225,6 +232,7 @@
 /* The PTP event messages - Sync, Delay_Req, Pdelay_Req, and Pdelay_Resp. */
 #define EVENT_MSG_BITS (BIT(0) | BIT(1) | BIT(2) | BIT(3))
 #endif /* CONFIG_TI_CPTS */
+#define SGMII_MODULE_SIZE			0x100
 
 struct xgbe_ss_regs {
 	u32	id_ver;
@@ -716,7 +724,9 @@ struct gbe_priv {
 	u32				ss_version;
 	u32				stats_en_mask;
 
-	void __iomem			*ss_regs;
+	struct regmap			*ss_regmap;
+	struct regmap			*pcsr_regmap;
+	void __iomem                    *ss_regs;
 	void __iomem			*switch_regs;
 	void __iomem			*host_port_regs;
 	void __iomem			*ale_reg;
@@ -2192,7 +2202,7 @@ static void gbe_port_config(struct gbe_priv *gbe_dev, struct gbe_slave *slave,
 			    int max_rx_len)
 {
 	void __iomem *rx_maxlen_reg;
-	u32 xgmii_mode;
+	int ret;
 
 	if (max_rx_len > NETCP_MAX_FRAME_SIZE)
 		max_rx_len = NETCP_MAX_FRAME_SIZE;
@@ -2200,9 +2210,16 @@ static void gbe_port_config(struct gbe_priv *gbe_dev, struct gbe_slave *slave,
 	/* Enable correct MII mode at SS level */
 	if ((gbe_dev->ss_version == XGBE_SS_VERSION_10) &&
 	    (slave->link_interface >= XGMII_LINK_MAC_PHY)) {
-		xgmii_mode = readl(GBE_REG_ADDR(gbe_dev, ss_regs, control));
-		xgmii_mode |= (1 << slave->slave_num);
-		writel(xgmii_mode, GBE_REG_ADDR(gbe_dev, ss_regs, control));
+		ret = regmap_update_bits(gbe_dev->ss_regmap,
+					 GBE_REG_OFS(gbe_dev, ss_regs, control),
+					 1 << slave->slave_num,
+					 1 << slave->slave_num);
+
+		if (ret) {
+			dev_err(gbe_dev->dev,
+				"regmap update xgmii mode bit Failed\n");
+			return;
+		}
 	}
 
 	if (IS_SS_ID_MU(gbe_dev))
@@ -3127,35 +3144,46 @@ static int set_xgbe_ethss10_priv(struct gbe_priv *gbe_dev,
 	void __iomem *regs;
 	int ret, i;
 
-	ret = of_address_to_resource(node, XGBE_SS_REG_INDEX, &res);
+	gbe_dev->ss_regmap = syscon_regmap_lookup_by_phandle(node,
+							     "syscon-subsys");
+
+	if (IS_ERR(gbe_dev->ss_regmap)) {
+		dev_err(gbe_dev->dev,
+			"subsys regmap lookup failed: %ld\n",
+			PTR_ERR(gbe_dev->ss_regmap));
+		return PTR_ERR(gbe_dev->ss_regmap);
+	}
+
+	ret = of_address_to_resource(node, XGBE_SM_REG_INDEX, &res);
 	if (ret) {
 		dev_err(gbe_dev->dev,
-			"Can't xlate xgbe of node(%s) ss address at %d\n",
-			node->name, XGBE_SS_REG_INDEX);
+			"Can't xlate xgbe of node(%s) sm address at %d\n",
+			node->name, XGBE_SM_REG_INDEX);
 		return ret;
 	}
 
 	regs = devm_ioremap_resource(gbe_dev->dev, &res);
 	if (IS_ERR(regs)) {
-		dev_err(gbe_dev->dev, "Failed to map xgbe ss register base\n");
+		dev_err(gbe_dev->dev, "Failed to map xgbe sm register base\n");
 		return PTR_ERR(regs);
 	}
-	gbe_dev->ss_regs = regs;
+	gbe_dev->switch_regs = regs;
 
-	ret = of_address_to_resource(node, XGBE_SM_REG_INDEX, &res);
+	ret = of_address_to_resource(node, XGBE_SGMII_REG_INDEX, &res);
 	if (ret) {
 		dev_err(gbe_dev->dev,
-			"Can't xlate xgbe of node(%s) sm address at %d\n",
-			node->name, XGBE_SM_REG_INDEX);
+			"Can't xlate xgbe of node(%s) sgmii address at %d\n",
+			node->name, XGBE_SGMII_REG_INDEX);
 		return ret;
 	}
 
 	regs = devm_ioremap_resource(gbe_dev->dev, &res);
 	if (IS_ERR(regs)) {
-		dev_err(gbe_dev->dev, "Failed to map xgbe sm register base\n");
+		dev_err(gbe_dev->dev,
+			"Failed to map xgbe sgmii register base\n");
 		return PTR_ERR(regs);
 	}
-	gbe_dev->switch_regs = regs;
+	gbe_dev->sgmii_port_regs = regs;
 
 	ret = of_address_to_resource(node, XGBE_SERDES_REG_INDEX, &res);
 	if (ret) {
@@ -3171,6 +3199,8 @@ static int set_xgbe_ethss10_priv(struct gbe_priv *gbe_dev,
 		return PTR_ERR(regs);
 	}
 	gbe_dev->xgbe_serdes_regs = regs;
+	gbe_dev->sgmii_port34_regs = gbe_dev->sgmii_port_regs +
+				     (2 * SGMII_MODULE_SIZE);
 
 	gbe_dev->num_stats_mods = gbe_dev->max_num_ports;
 	gbe_dev->et_stats = xgbe10_et_stats;
@@ -3195,9 +3225,9 @@ static int set_xgbe_ethss10_priv(struct gbe_priv *gbe_dev,
 	}
 
 	gbe_dev->ss_version = XGBE_SS_VERSION_10;
-	gbe_dev->sgmii_port_regs = gbe_dev->ss_regs +
-					XGBE10_SGMII_MODULE_OFFSET;
-	gbe_dev->host_port_regs = gbe_dev->ss_regs + XGBE10_HOST_PORT_OFFSET;
+
+	gbe_dev->host_port_regs = gbe_dev->switch_regs +
+					XGBE10_HOST_PORT_OFFSET;
 
 	for (i = 0; i < gbe_dev->max_num_ports; i++)
 		gbe_dev->hw_stats_regs[i] = gbe_dev->switch_regs +
@@ -3228,8 +3258,8 @@ static int set_xgbe_ethss10_priv(struct gbe_priv *gbe_dev,
 	return 0;
 }
 
-static int get_gbe_resource_version(struct gbe_priv *gbe_dev,
-				    struct device_node *node)
+static int get_gbe_resource_version_ss_regs(struct gbe_priv *gbe_dev,
+					    struct device_node *node)
 {
 	struct resource res;
 	void __iomem *regs;
@@ -3248,8 +3278,27 @@ static int get_gbe_resource_version(struct gbe_priv *gbe_dev,
 		dev_err(gbe_dev->dev, "Failed to map gbe register base\n");
 		return PTR_ERR(regs);
 	}
+
 	gbe_dev->ss_regs = regs;
 	gbe_dev->ss_version = readl(gbe_dev->ss_regs);
+	gbe_dev->ss_regmap = NULL;
+	return 0;
+}
+
+static int get_gbe_resource_version(struct gbe_priv *gbe_dev,
+				    struct device_node *node)
+{
+	gbe_dev->ss_regmap = syscon_regmap_lookup_by_phandle(node,
+							     "syscon-subsys");
+	if (IS_ERR(gbe_dev->ss_regmap)) {
+		dev_dbg(gbe_dev->dev,
+			"subsys regmap lookup failed: %ld. try reg property\n",
+			PTR_ERR(gbe_dev->ss_regmap));
+		return get_gbe_resource_version_ss_regs(gbe_dev, node);
+	}
+
+	regmap_read(gbe_dev->ss_regmap, 0, &gbe_dev->ss_version);
+	gbe_dev->ss_regs = NULL;
 	return 0;
 }
 
@@ -3260,6 +3309,27 @@ static int set_gbe_ethss14_priv(struct gbe_priv *gbe_dev,
 	void __iomem *regs;
 	int i, ret;
 
+	if (gbe_dev->ss_regs) {
+		gbe_dev->sgmii_port_regs = gbe_dev->ss_regs +
+					   GBE13_SGMII_MODULE_OFFSET;
+	} else {
+		ret = of_address_to_resource(node, GBE_SGMII_REG_INDEX, &res);
+		if (ret) {
+			dev_err(gbe_dev->dev,
+				"Can't translate of gbe node(%s) address at index %d\n",
+				node->name, GBE_SGMII_REG_INDEX);
+			return ret;
+		}
+
+		regs = devm_ioremap_resource(gbe_dev->dev, &res);
+		if (IS_ERR(regs)) {
+			dev_err(gbe_dev->dev,
+				"Failed to map gbe sgmii port register base\n");
+			return PTR_ERR(regs);
+		}
+		gbe_dev->sgmii_port_regs = regs;
+	}
+
 	ret = of_address_to_resource(node, GBE_SGMII34_REG_INDEX, &res);
 	if (ret) {
 		dev_err(gbe_dev->dev,
@@ -3314,7 +3384,6 @@ static int set_gbe_ethss14_priv(struct gbe_priv *gbe_dev,
 		return -ENOMEM;
 	}
 
-	gbe_dev->sgmii_port_regs = gbe_dev->ss_regs + GBE13_SGMII_MODULE_OFFSET;
 	gbe_dev->host_port_regs = gbe_dev->switch_regs + GBE13_HOST_PORT_OFFSET;
 
 	/* K2HK has only 2 hw stats modules visible at a time, so
@@ -3402,14 +3471,33 @@ static int set_gbenu_ethss_priv(struct gbe_priv *gbe_dev,
 	}
 	gbe_dev->switch_regs = regs;
 
-	gbe_dev->sgmii_port_regs = gbe_dev->ss_regs + GBENU_SGMII_MODULE_OFFSET;
+	if (gbe_dev->ss_regs) {
+		gbe_dev->sgmii_port_regs = gbe_dev->ss_regs +
+					   GBENU_SGMII_MODULE_OFFSET;
+	} else {
+		ret = of_address_to_resource(node, GBENU_SGMII_REG_INDEX, &res);
+		if (ret) {
+			dev_err(gbe_dev->dev,
+				"Can't translate of gbenu node(%s) addr at index %d\n",
+				node->name, GBENU_SGMII_REG_INDEX);
+			return ret;
+		}
+
+		regs = devm_ioremap_resource(gbe_dev->dev, &res);
+		if (IS_ERR(regs)) {
+			dev_err(gbe_dev->dev,
+				"Failed to map gbenu sgmii port register base\n");
+			return PTR_ERR(regs);
+		}
+		gbe_dev->sgmii_port_regs = regs;
+	}
 
 	/* Although sgmii modules are mem mapped to one contiguous
 	 * region on GBENU devices, setting sgmii_port34_regs allows
 	 * consistent code when accessing sgmii api
 	 */
 	gbe_dev->sgmii_port34_regs = gbe_dev->sgmii_port_regs +
-				     (2 * GBENU_SGMII_MODULE_SIZE);
+				     (2 * SGMII_MODULE_SIZE);
 
 	gbe_dev->host_port_regs = gbe_dev->switch_regs + GBENU_HOST_PORT_OFFSET;
 
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 02/10] net: netcp: ethss: add support of 10gbe pcsr link status
From: Murali Karicheri @ 2016-12-20 22:09 UTC (permalink / raw)
  To: netdev, linux-omap, grygorii.strashko, mugunthanvnm, linux-kernel,
	arnd, davem, devicetree, mark.rutland, robh+dt
In-Reply-To: <1482271793-7671-1-git-send-email-m-karicheri2@ti.com>

From: WingMan Kwok <w-kwok2@ti.com>

The 10GBASE-R Physical Coding Sublayer (PCS-R) module provides the
functionality of a physical coding sublayer (PCS) on data being
transferred between a demuxed XGMII and a SerDes supporting a
16- or 32-bit interface.  From the driver's point of view, whether
an Ethernet link is up also depends on the status of the
block-lock bit of the PCSR.  This patch adds a check of that
bit when determining the link status.

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
---
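As a rough illustration of the check described above (not the driver
code itself), the block-lock bit can be isolated from a PCSR RX status
word and ANDed with the PHY link state. The bit position (30) is taken
from the patch; the helper names and the standalone types are
assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 30 of the PCSR RX status word is the block-lock bit (per the patch). */
#define PCSR_BLOCK_LOCK_SHIFT	30
#define PCSR_BLOCK_LOCK_MASK	(UINT32_C(1) << PCSR_BLOCK_LOCK_SHIFT)

/* Hypothetical helper: 1 if the block-lock bit is set, 0 otherwise. */
int pcsr_block_locked(uint32_t pcsr_rx_stat)
{
	return (int)((pcsr_rx_stat & PCSR_BLOCK_LOCK_MASK) >>
		     PCSR_BLOCK_LOCK_SHIFT);
}

/* Hypothetical helper: link is up only if PHY and PCSR both report up. */
int xgmii_link_state(int phy_link_up, uint32_t pcsr_rx_stat)
{
	assert(phy_link_up == 0 || phy_link_up == 1);
	return phy_link_up & pcsr_block_locked(pcsr_rx_stat);
}
```

With these helpers, a status word of 0x40000000 combined with a PHY
state of 1 yields link up, while any status word with bit 30 clear
yields link down regardless of the PHY state.
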
 .../devicetree/bindings/net/keystone-netcp.txt     |  3 ++
 drivers/net/ethernet/ti/netcp_ethss.c              | 37 ++++++++++++++++++++--
 2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/keystone-netcp.txt b/Documentation/devicetree/bindings/net/keystone-netcp.txt
index 0854a73..57fc13f 100644
--- a/Documentation/devicetree/bindings/net/keystone-netcp.txt
+++ b/Documentation/devicetree/bindings/net/keystone-netcp.txt
@@ -75,6 +75,9 @@ Required properties:
 - syscon-subsys:	phandle to syscon node of the switch
 			subsystem registers.
 
+- syscon-pcsr:		(10gbe only) phandle to syscon node of the
+			switch PCSR registers.
+
 - reg:		register location and the size for the following register
 		regions in the specified order.
 		- switch subsystem registers
diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c
index 473edda1..cb48f88 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -63,6 +63,12 @@
 #define GBE13_ALE_OFFSET		0x600
 #define GBE13_HOST_PORT_NUM		0
 #define GBE13_NUM_ALE_ENTRIES		1024
+/* offset relative to PCSR regmap */
+#define XGBE10_PCSR_OFFSET(x)		((x) * 0x80)
+#define XGBE10_PCSR_RX_STATUS(x)	(XGBE10_PCSR_OFFSET(x) + 0x0C)
+
+#define XGBE10_PCSR_BLOCK_LOCK_MASK	BIT(30)
+#define XGBE10_PCSR_BLOCK_LOCK_SHIFT	30
 
 /* 1G Ethernet NU SS defines */
 #define GBENU_MODULE_NAME		"netcp-gbenu"
@@ -2111,6 +2117,10 @@ static void netcp_ethss_link_state_action(struct gbe_priv *gbe_dev,
 
 	if (phy)
 		phy_print_status(phy);
+	else if (slave->link_interface == XGMII_LINK_MAC_MAC_FORCED) {
+		netdev_printk(KERN_INFO, ndev,
+			      "Link is %s\n", (up ? "Up" : "Down"));
+	}
 }
 
 static bool gbe_phy_link_status(struct gbe_slave *slave)
@@ -2123,18 +2133,29 @@ static void netcp_ethss_update_link_state(struct gbe_priv *gbe_dev,
 					  struct net_device *ndev)
 {
 	int sp = slave->slave_num;
-	int phy_link_state, sgmii_link_state = 1, link_state;
+	int phy_link_state, sw_link_state = 1, link_state, ret;
+	u32 pcsr_rx_stat;
 
 	if (!slave->open)
 		return;
 
 	if (!SLAVE_LINK_IS_XGMII(slave)) {
-		sgmii_link_state =
+		sw_link_state =
 			netcp_sgmii_get_port_link(SGMII_BASE(gbe_dev, sp), sp);
+	} else if (slave->link_interface == XGMII_LINK_MAC_MAC_FORCED) {
+		/* read status from pcsr status reg */
+		ret = regmap_read(gbe_dev->pcsr_regmap,
+				  XGBE10_PCSR_RX_STATUS(sp), &pcsr_rx_stat);
+
+		if (ret)
+			return;
+
+		sw_link_state = (pcsr_rx_stat & XGBE10_PCSR_BLOCK_LOCK_MASK) >>
+				 XGBE10_PCSR_BLOCK_LOCK_SHIFT;
 	}
 
 	phy_link_state = gbe_phy_link_status(slave);
-	link_state = phy_link_state & sgmii_link_state;
+	link_state = phy_link_state & sw_link_state;
 
 	if (atomic_xchg(&slave->link_state, link_state) != link_state)
 		netcp_ethss_link_state_action(gbe_dev, ndev, slave,
@@ -3154,6 +3175,16 @@ static int set_xgbe_ethss10_priv(struct gbe_priv *gbe_dev,
 		return PTR_ERR(gbe_dev->ss_regmap);
 	}
 
+	gbe_dev->pcsr_regmap = syscon_regmap_lookup_by_phandle(node,
+							       "syscon-pcsr");
+
+	if (IS_ERR(gbe_dev->pcsr_regmap)) {
+		dev_err(gbe_dev->dev,
+			"pcsr regmap lookup failed: %ld\n",
+			PTR_ERR(gbe_dev->pcsr_regmap));
+		return PTR_ERR(gbe_dev->pcsr_regmap);
+	}
+
 	ret = of_address_to_resource(node, XGBE_SM_REG_INDEX, &res);
 	if (ret) {
 		dev_err(gbe_dev->dev,
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 04/10] net: netcp: remove the redundant memmov()
From: Murali Karicheri @ 2016-12-20 22:09 UTC (permalink / raw)
  To: netdev, linux-omap, grygorii.strashko, mugunthanvnm, linux-kernel,
	arnd, davem, devicetree, mark.rutland, robh+dt
In-Reply-To: <1482271793-7671-1-git-send-email-m-karicheri2@ti.com>

The netcp modules populate psdata with command data at the tail
of the buffer, and set_words() copies that data to the front of
psdata. So remove the redundant memmove() call.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
---
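A minimal sketch of the idea, assuming a plain word buffer: the command
words sit at the tail of a fixed-size scratch area and a single forward
copy places them at the front, so no separate memmove() is needed first.
The function name is hypothetical, and the endianness conversion that
set_words() performs in the driver is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the single copy: move the last `len` words
 * of a `total`-word buffer to its front. The source index is never
 * below the destination index, so a forward copy is safe even when the
 * two ranges overlap.
 */
void move_tail_to_front(uint32_t *buf, size_t total, size_t len)
{
	const uint32_t *src;
	size_t i;

	assert(len <= total);
	src = buf + (total - len);	/* command data sits at the tail */
	for (i = 0; i < len; i++)
		buf[i] = src[i];
}
```

In the patch, the same effect comes from passing the tail address
(buffer start plus KNAV_DMA_NUM_PS_WORDS minus psdata_len) directly to
set_words() as its source.
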
 drivers/net/ethernet/ti/netcp_core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index a136c56..286fd8d 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -1226,9 +1226,9 @@ static int netcp_tx_submit_skb(struct netcp_intf *netcp,
 		/* psdata points to both native-endian and device-endian data */
 		__le32 *psdata = (void __force *)p_info.psdata;
 
-		memmove(p_info.psdata, p_info.psdata + p_info.psdata_len,
-			p_info.psdata_len);
-		set_words(p_info.psdata, p_info.psdata_len, psdata);
+		set_words((u32 *)psdata +
+			  (KNAV_DMA_NUM_PS_WORDS - p_info.psdata_len),
+			  p_info.psdata_len, psdata);
 		tmp |= (p_info.psdata_len & KNAV_DMA_DESC_PSLEN_MASK) <<
 			KNAV_DMA_DESC_PSLEN_SHIFT;
 	}
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 07/10] net: netcp: use hw capability to remove FCS word from rx packets
From: Murali Karicheri @ 2016-12-20 22:09 UTC (permalink / raw)
  To: netdev, linux-omap, grygorii.strashko, mugunthanvnm, linux-kernel,
	arnd, davem, devicetree, mark.rutland, robh+dt
In-Reply-To: <1482271793-7671-1-git-send-email-m-karicheri2@ti.com>

Some of the newer Ethernet switch hw (such as that on k2e/l/g) can
strip the Ethernet FCS from the packet at the port 0 egress of the switch.
So use this capability instead of doing it in software.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
---
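The decision this patch introduces can be sketched as below, assuming a
capability word with one bit meaning "hardware already stripped the
FCS". The flag and function names are made up for this sketch; only the
4-byte FCS length and the "trim only on older h/w" rule come from the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_FCS_LEN		4			/* Ethernet FCS is 4 bytes */
#define HW_CAP_STRIPS_FCS	(UINT32_C(1) << 0)	/* hypothetical flag */

/* Length handed to the stack once the FCS has been accounted for. */
size_t rx_len_after_fcs(uint32_t hw_cap, size_t pkt_len)
{
	assert(pkt_len >= ETH_FCS_LEN);
	if (hw_cap & HW_CAP_STRIPS_FCS)
		return pkt_len;			/* hardware removed the FCS */
	return pkt_len - ETH_FCS_LEN;		/* older h/w: trim in software */
}
```

For a 64-byte frame on older hardware this returns 60; with the
capability bit set, a 60-byte packet is passed through unchanged.
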
 drivers/net/ethernet/ti/netcp.h       |  2 ++
 drivers/net/ethernet/ti/netcp_core.c  |  8 ++++++--
 drivers/net/ethernet/ti/netcp_ethss.c | 10 ++++++++--
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ti/netcp.h b/drivers/net/ethernet/ti/netcp.h
index d243c5d..8900a6f 100644
--- a/drivers/net/ethernet/ti/netcp.h
+++ b/drivers/net/ethernet/ti/netcp.h
@@ -102,6 +102,8 @@ struct netcp_intf {
 	void			*rx_fdq[KNAV_DMA_FDQ_PER_CHAN];
 	struct napi_struct	rx_napi;
 	struct napi_struct	tx_napi;
+#define ETH_SW_CAN_REMOVE_ETH_FCS	BIT(0)
+	u32			hw_cap;
 
 	/* 64-bit netcp stats */
 	struct netcp_stats	stats;
diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index b077ed4..68a75cc 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -739,8 +739,12 @@ static int netcp_process_one_rx_packet(struct netcp_intf *netcp)
 		dev_dbg(netcp->ndev_dev, "mismatch in packet size(%d) & sum of fragments(%d)\n",
 			pkt_sz, accum_sz);
 
-	/* Remove ethernet FCS from the packet */
-	__pskb_trim(skb, skb->len - ETH_FCS_LEN);
+	/* Newer version of the Ethernet switch can trim the Ethernet FCS
+	 * from the packet and is indicated in hw_cap. So trim it only for
+	 * older h/w
+	 */
+	if (!(netcp->hw_cap & ETH_SW_CAN_REMOVE_ETH_FCS))
+		__pskb_trim(skb, skb->len - ETH_FCS_LEN);
 
 	/* Call each of the RX hooks */
 	p_info.skb = skb;
diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c
index 9266961..4b2a911 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -133,6 +133,7 @@
 #define MACSL_FULLDUPLEX			BIT(0)
 
 #define GBE_CTL_P0_ENABLE			BIT(2)
+#define ETH_SW_CTL_P0_TX_CRC_REMOVE		BIT(13)
 #define GBE13_REG_VAL_STAT_ENABLE_ALL		0xff
 #define XGBE_REG_VAL_STAT_ENABLE_ALL		0xf
 #define GBE_STATS_CD_SEL			BIT(28)
@@ -2847,7 +2848,7 @@ static int gbe_open(void *intf_priv, struct net_device *ndev)
 	struct netcp_intf *netcp = netdev_priv(ndev);
 	struct gbe_slave *slave = gbe_intf->slave;
 	int port_num = slave->port_num;
-	u32 reg;
+	u32 reg, val;
 	int ret;
 
 	reg = readl(GBE_REG_ADDR(gbe_dev, switch_regs, id_ver));
@@ -2877,7 +2878,12 @@ static int gbe_open(void *intf_priv, struct net_device *ndev)
 	writel(0, GBE_REG_ADDR(gbe_dev, switch_regs, ptype));
 
 	/* Control register */
-	writel(GBE_CTL_P0_ENABLE, GBE_REG_ADDR(gbe_dev, switch_regs, control));
+	val = GBE_CTL_P0_ENABLE;
+	if (IS_SS_ID_MU(gbe_dev)) {
+		val |= ETH_SW_CTL_P0_TX_CRC_REMOVE;
+		netcp->hw_cap = ETH_SW_CAN_REMOVE_ETH_FCS;
+	}
+	writel(val, GBE_REG_ADDR(gbe_dev, switch_regs, control));
 
 	/* All statistics enabled and STAT AB visible by default */
 	writel(gbe_dev->stats_en_mask, GBE_REG_ADDR(gbe_dev, switch_regs,
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 08/10] net: netcp: ale: update to support unknown vlan controls for NU switch
From: Murali Karicheri @ 2016-12-20 22:09 UTC (permalink / raw)
  To: netdev, linux-omap, grygorii.strashko, mugunthanvnm, linux-kernel,
	arnd, davem, devicetree, mark.rutland, robh+dt
In-Reply-To: <1482271793-7671-1-git-send-email-m-karicheri2@ti.com>

In the NU Ethernet switch used on some of the Keystone SoCs, there are
separate UNKNOWNVLAN registers for the membership, unreg mcast flood, reg
mcast flood and force untag egress bits in the ALE. So control of these
fields requires a different address offset, shift and field size.
As this ALE has the same version number as the ALE in CPSW found on other
SoCs, customization based on version number is not possible. So
use a configuration parameter, nu_switch_ale, to identify the
ALE found in the NU Switch. Different treatment is needed for the NU
Switch ALE due to differences in the ALE table bits, separate unknown
vlan registers etc. The register information available in ale_controls
needs to be updated to support the netcp NU switch h/w, so it is no
longer a constant array since it needs to be updated based
on ALE type. The header of the file is also updated to indicate it
supports N port switch ALE, not just 3 port. The version mask is
3 bits in NU Switch ALE vs 8 bits on other ALE types.

While at it, change the debug print to info print so that ALE
version gets displayed in boot log.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
---
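The version handling described above can be sketched as follows: the
major field is masked with a caller-supplied mask that falls back to
0xff when left at zero. The function name is an assumption for this
sketch, and the register values used in the note below are invented,
not read from hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Returns (major << 8) | minor; a zero mask falls back to 8 bits. */
uint32_t ale_version(uint32_t rev, uint32_t major_mask)
{
	uint32_t major, minor;

	assert(major_mask <= 0xff);
	if (!major_mask)
		major_mask = 0xff;
	major = (rev >> 8) & major_mask;
	minor = rev & 0xff;
	return (major << 8) | minor;
}
```

With a made-up revision word of 0x2904, a 3-bit mask (0x7) gives 0x0104,
whereas the default 8-bit mask gives 0x2904.
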
 drivers/net/ethernet/ti/cpsw_ale.c    | 50 +++++++++++++++++++++++++++++++----
 drivers/net/ethernet/ti/cpsw_ale.h    | 13 ++++++++-
 drivers/net/ethernet/ti/netcp_ethss.c |  5 +++-
 3 files changed, 61 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c
index 43b061b..e15db39 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.c
+++ b/drivers/net/ethernet/ti/cpsw_ale.c
@@ -1,5 +1,5 @@
 /*
- * Texas Instruments 3-Port Ethernet Switch Address Lookup Engine
+ * Texas Instruments N-Port Ethernet Switch Address Lookup Engine
  *
  * Copyright (C) 2012 Texas Instruments
  *
@@ -27,8 +27,9 @@
 
 #define BITMASK(bits)		(BIT(bits) - 1)
 
-#define ALE_VERSION_MAJOR(rev)	((rev >> 8) & 0xff)
+#define ALE_VERSION_MAJOR(rev, mask) (((rev) >> 8) & (mask))
 #define ALE_VERSION_MINOR(rev)	(rev & 0xff)
+#define ALE_VERSION_1R4		0x0104
 
 /* ALE Registers */
 #define ALE_IDVER		0x00
@@ -39,6 +40,12 @@
 #define ALE_TABLE		0x34
 #define ALE_PORTCTL		0x40
 
+/* ALE NetCP NU switch specific Registers */
+#define ALE_UNKNOWNVLAN_MEMBER			0x90
+#define ALE_UNKNOWNVLAN_UNREG_MCAST_FLOOD	0x94
+#define ALE_UNKNOWNVLAN_REG_MCAST_FLOOD		0x98
+#define ALE_UNKNOWNVLAN_FORCE_UNTAG_EGRESS	0x9C
+
 #define ALE_TABLE_WRITE		BIT(31)
 
 #define ALE_TYPE_FREE			0
@@ -464,7 +471,7 @@ struct ale_control_info {
 	int		bits;
 };
 
-static const struct ale_control_info ale_controls[ALE_NUM_CONTROLS] = {
+static struct ale_control_info ale_controls[ALE_NUM_CONTROLS] = {
 	[ALE_ENABLE]		= {
 		.name		= "enable",
 		.offset		= ALE_CONTROL,
@@ -724,8 +731,41 @@ void cpsw_ale_start(struct cpsw_ale *ale)
 	u32 rev;
 
 	rev = __raw_readl(ale->params.ale_regs + ALE_IDVER);
-	dev_dbg(ale->params.dev, "initialized cpsw ale revision %d.%d\n",
-		ALE_VERSION_MAJOR(rev), ALE_VERSION_MINOR(rev));
+	if (!ale->params.major_ver_mask)
+		ale->params.major_ver_mask = 0xff;
+	ale->version =
+		(ALE_VERSION_MAJOR(rev, ale->params.major_ver_mask) << 8) |
+		 ALE_VERSION_MINOR(rev);
+	dev_info(ale->params.dev, "initialized cpsw ale version %d.%d\n",
+		 ALE_VERSION_MAJOR(rev, ale->params.major_ver_mask),
+		 ALE_VERSION_MINOR(rev));
+
+	if (ale->params.nu_switch_ale) {
+		/* Separate registers for unknown vlan configuration.
+		 * Also there are N bits, where N is number of ale
+		 * ports and shift value should be 0
+		 */
+		ale_controls[ALE_PORT_UNKNOWN_VLAN_MEMBER].bits =
+					ale->params.ale_ports;
+		ale_controls[ALE_PORT_UNKNOWN_VLAN_MEMBER].offset =
+					ALE_UNKNOWNVLAN_MEMBER;
+		ale_controls[ALE_PORT_UNKNOWN_MCAST_FLOOD].bits =
+					ale->params.ale_ports;
+		ale_controls[ALE_PORT_UNKNOWN_MCAST_FLOOD].shift = 0;
+		ale_controls[ALE_PORT_UNKNOWN_MCAST_FLOOD].offset =
+					ALE_UNKNOWNVLAN_UNREG_MCAST_FLOOD;
+		ale_controls[ALE_PORT_UNKNOWN_REG_MCAST_FLOOD].bits =
+					ale->params.ale_ports;
+		ale_controls[ALE_PORT_UNKNOWN_REG_MCAST_FLOOD].shift = 0;
+		ale_controls[ALE_PORT_UNKNOWN_REG_MCAST_FLOOD].offset =
+					ALE_UNKNOWNVLAN_REG_MCAST_FLOOD;
+		ale_controls[ALE_PORT_UNTAGGED_EGRESS].bits =
+					ale->params.ale_ports;
+		ale_controls[ALE_PORT_UNTAGGED_EGRESS].shift = 0;
+		ale_controls[ALE_PORT_UNTAGGED_EGRESS].offset =
+					ALE_UNKNOWNVLAN_FORCE_UNTAG_EGRESS;
+	}
+
 	cpsw_ale_control_set(ale, 0, ALE_ENABLE, 1);
 	cpsw_ale_control_set(ale, 0, ALE_CLEAR, 1);
 
diff --git a/drivers/net/ethernet/ti/cpsw_ale.h b/drivers/net/ethernet/ti/cpsw_ale.h
index a700189..b1c7954 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.h
+++ b/drivers/net/ethernet/ti/cpsw_ale.h
@@ -1,5 +1,5 @@
 /*
- * Texas Instruments 3-Port Ethernet Switch Address Lookup Engine APIs
+ * Texas Instruments N-Port Ethernet Switch Address Lookup Engine APIs
  *
  * Copyright (C) 2012 Texas Instruments
  *
@@ -21,6 +21,16 @@ struct cpsw_ale_params {
 	unsigned long		ale_ageout;	/* in secs */
 	unsigned long		ale_entries;
 	unsigned long		ale_ports;
+	/* NU Switch has specific handling as number of bits in ALE entries
+	 * are different than other versions of ALE. Also there are specific
+	 * registers for unknown vlan specific fields. So use nu_switch_ale
+	 * to identify this hardware.
+	 */
+	bool			nu_switch_ale;
+	/* mask bit used in NU Switch ALE is 3 bits instead of 8 bits. So
+	 * pass it from caller.
+	 */
+	u32			major_ver_mask;
 };
 
 struct cpsw_ale {
@@ -28,6 +38,7 @@ struct cpsw_ale {
 	struct timer_list	timer;
 	unsigned long		ageout;
 	int			allmulti;
+	u32			version;
 };
 
 enum cpsw_ale_control {
diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c
index 4b2a911..b37fb73 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -3716,7 +3716,10 @@ static int gbe_probe(struct netcp_device *netcp_device, struct device *dev,
 	ale_params.ale_ageout	= GBE_DEFAULT_ALE_AGEOUT;
 	ale_params.ale_entries	= gbe_dev->ale_entries;
 	ale_params.ale_ports	= gbe_dev->ale_ports;
-
+	if (IS_SS_ID_MU(gbe_dev)) {
+		ale_params.major_ver_mask = 0x7;
+		ale_params.nu_switch_ale = true;
+	}
 	gbe_dev->ale = cpsw_ale_create(&ale_params);
 	if (!gbe_dev->ale) {
 		dev_err(gbe_dev->dev, "error initializing ale engine\n");
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 09/10] net: netcp: ale: use ale_status to size the ale table
From: Murali Karicheri @ 2016-12-20 22:09 UTC (permalink / raw)
  To: netdev, linux-omap, grygorii.strashko, mugunthanvnm, linux-kernel,
	arnd, davem, devicetree, mark.rutland, robh+dt
In-Reply-To: <1482271793-7671-1-git-send-email-m-karicheri2@ti.com>

ALE h/w on newer versions of NetCP (K2E/L/G) provides an ALE_STATUS
register that reports the size of the ALE table implemented in h/w.
Currently, for example, we set the ALE table size to 1024 for NetCP ALE
on K2E even though the ALE Status/Documentation shows it has 8192 entries.
So take advantage of this register to read the size of the ALE table
supported and use that value in the driver for the newer versions of
NetCP ALE. For NetCP lite, the ALE table is much smaller (64 entries) and
is indicated by a size of zero in ALE_STATUS. So use that as a default
for now. While at it, also fix the ALE table size on the 10G switch to
2048 per the user guide:
http://www.ti.com/lit/ug/spruhj5/spruhj5.pdf

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
---
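The sizing rule described above can be sketched as a small pure
function, assuming the caller passes either a fixed entry count or zero
to request the ALE_STATUS-based value. The function name is
hypothetical; the 5-bit mask, the 1024 multiplier and the default of 64
are the constants added by the patch.

```c
#include <assert.h>
#include <stdint.h>

#define ALE_TABLE_SIZE_MULTIPLIER	1024
#define ALE_STATUS_SIZE_MASK		0x1f
#define ALE_TABLE_SIZE_DEFAULT		64

/* configured != 0: caller fixed the size; otherwise derive from status. */
unsigned long ale_table_entries(unsigned long configured, uint32_t status)
{
	unsigned long entries;
	uint32_t size = status & ALE_STATUS_SIZE_MASK;

	if (configured)
		entries = configured;
	else if (!size)
		entries = ALE_TABLE_SIZE_DEFAULT;	/* NetCP lite case */
	else
		entries = (unsigned long)size * ALE_TABLE_SIZE_MULTIPLIER;

	assert(entries > 0);	/* the table size is never zero */
	return entries;
}
```

A status field of 8 therefore maps to 8192 entries, a status field of 0
maps to 64, and a caller-supplied count such as 2048 is returned as is.
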
 drivers/net/ethernet/ti/cpsw_ale.c    | 31 ++++++++++++++++++++++++++++++-
 drivers/net/ethernet/ti/netcp_ethss.c |  4 +---
 2 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c
index e15db39..62a18d6 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.c
+++ b/drivers/net/ethernet/ti/cpsw_ale.c
@@ -33,6 +33,7 @@
 
 /* ALE Registers */
 #define ALE_IDVER		0x00
+#define ALE_STATUS		0x04
 #define ALE_CONTROL		0x08
 #define ALE_PRESCALE		0x10
 #define ALE_UNKNOWNVLAN		0x18
@@ -58,6 +59,10 @@
 #define ALE_UCAST_OUI			2
 #define ALE_UCAST_TOUCHED		3
 
+#define ALE_TABLE_SIZE_MULTIPLIER	1024
+#define ALE_STATUS_SIZE_MASK		0x1f
+#define ALE_TABLE_SIZE_DEFAULT		64
+
 static inline int cpsw_ale_get_field(u32 *ale_entry, u32 start, u32 bits)
 {
 	int idx;
@@ -728,7 +733,7 @@ static void cpsw_ale_timer(unsigned long arg)
 
 void cpsw_ale_start(struct cpsw_ale *ale)
 {
-	u32 rev;
+	u32 rev, ale_entries;
 
 	rev = __raw_readl(ale->params.ale_regs + ALE_IDVER);
 	if (!ale->params.major_ver_mask)
@@ -740,6 +745,30 @@ void cpsw_ale_start(struct cpsw_ale *ale)
 		 ALE_VERSION_MAJOR(rev, ale->params.major_ver_mask),
 		 ALE_VERSION_MINOR(rev));
 
+	if (!ale->params.ale_entries) {
+		ale_entries =
+			__raw_readl(ale->params.ale_regs + ALE_STATUS) &
+				    ALE_STATUS_SIZE_MASK;
+		/* ALE available on newer NetCP switches has introduced
+		 * a register, ALE_STATUS, to indicate the size of ALE
+		 * table which shows the size as a multiple of 1024 entries.
+		 * For these, params.ale_entries will be set to zero. So
+		 * read the register and update the value of ale_entries.
+		 * ALE table on NetCP lite, is much smaller and is indicated
+		 * by a value of zero in ALE_STATUS. So use a default value
+		 * of ALE_TABLE_SIZE_DEFAULT for this. Caller is expected
+		 * to set the value of ale_entries for all other versions
+		 * of ALE.
+		 */
+		if (!ale_entries)
+			ale_entries = ALE_TABLE_SIZE_DEFAULT;
+		else
+			ale_entries *= ALE_TABLE_SIZE_MULTIPLIER;
+		ale->params.ale_entries = ale_entries;
+	}
+	dev_info(ale->params.dev,
+		 "ALE Table size %ld\n", ale->params.ale_entries);
+
 	if (ale->params.nu_switch_ale) {
 		/* Separate registers for unknown vlan configuration.
 		 * Also there are N bits, where N is number of ale
diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c
index b37fb73..80d68cb 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -94,7 +94,6 @@
 #define GBENU_CPTS_OFFSET		0x1d000
 #define GBENU_ALE_OFFSET		0x1e000
 #define GBENU_HOST_PORT_NUM		0
-#define GBENU_NUM_ALE_ENTRIES		1024
 
 /* 10G Ethernet SS defines */
 #define XGBE_MODULE_NAME		"netcp-xgbe"
@@ -114,7 +113,7 @@
 #define XGBE10_ALE_OFFSET		0x700
 #define XGBE10_HW_STATS_OFFSET		0x800
 #define XGBE10_HOST_PORT_NUM		0
-#define XGBE10_NUM_ALE_ENTRIES		1024
+#define XGBE10_NUM_ALE_ENTRIES		2048
 
 #define	GBE_TIMER_INTERVAL			(HZ / 2)
 
@@ -3548,7 +3547,6 @@ static int set_gbenu_ethss_priv(struct gbe_priv *gbe_dev,
 	gbe_dev->ale_reg = gbe_dev->switch_regs + GBENU_ALE_OFFSET;
 	gbe_dev->ale_ports = gbe_dev->max_num_ports;
 	gbe_dev->host_port = GBENU_HOST_PORT_NUM;
-	gbe_dev->ale_entries = GBE13_NUM_ALE_ENTRIES;
 	gbe_dev->stats_en_mask = (1 << (gbe_dev->max_num_ports)) - 1;
 
 	/* Subsystem registers */
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 10/10] net: netcp: ale: add proper ale entry mask bits for netcp switch ALE
From: Murali Karicheri @ 2016-12-20 22:09 UTC (permalink / raw)
  To: netdev, linux-omap, grygorii.strashko, mugunthanvnm, linux-kernel,
	arnd, davem, devicetree, mark.rutland, robh+dt
In-Reply-To: <1482271793-7671-1-git-send-email-m-karicheri2@ti.com>

For the NetCP NU Switch ALE, some of the mask bits differ from the
defaults used in the driver. Add a new macro, DEFINE_ALE_FIELD1, that
takes a configurable number of mask bits, and use it in the driver. The
bit widths are set to the correct values through new variables added to
the cpsw_ale structure and passed to the generated accessors. The
parameter nu_switch_ale is configured by the calling driver to indicate
that the ALE belongs to that switch, and is used in the ALE driver to do
customization as needed.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
---
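To illustrate what a variable-width accessor means in practice, here is
a standalone sketch of get/set helpers over a three-word ALE entry where
the field width is a run-time argument rather than a compile-time
constant. The helper names are hypothetical, and the word-index flip is
modelled on my understanding of the driver's existing field accessors;
treat the layout as an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define ALE_ENTRY_WORDS	3
#define FIELD_MASK(bits)	((UINT32_C(1) << (bits)) - 1)

/* Read `bits` bits starting at bit `start` of a 96-bit entry. */
uint32_t ale_field_get(const uint32_t *entry, uint32_t start, uint32_t bits)
{
	uint32_t idx = start / 32;

	assert(start < 32 * ALE_ENTRY_WORDS);
	assert(bits > 0 && bits < 32 && (start % 32) + bits <= 32);
	start -= idx * 32;
	idx = (ALE_ENTRY_WORDS - 1) - idx;	/* word 0 holds the top bits */
	return (entry[idx] >> start) & FIELD_MASK(bits);
}

/* Write `value` into the same field, leaving neighbouring bits intact. */
void ale_field_set(uint32_t *entry, uint32_t start, uint32_t bits,
		   uint32_t value)
{
	uint32_t idx = start / 32;

	assert(start < 32 * ALE_ENTRY_WORDS);
	assert(bits > 0 && bits < 32 && (start % 32) + bits <= 32);
	value &= FIELD_MASK(bits);
	start -= idx * 32;
	idx = (ALE_ENTRY_WORDS - 1) - idx;
	entry[idx] &= ~(FIELD_MASK(bits) << start);
	entry[idx] |= value << start;
}
```

A port mask written at bit 66 with a width of 3 reads back unchanged
with the same width, and a wider width (9 is used below purely as an
example) lets a larger mask fit in the same starting position.
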
 drivers/net/ethernet/ti/cpsw_ale.c | 99 ++++++++++++++++++++++++++++++--------
 drivers/net/ethernet/ti/cpsw_ale.h |  4 ++
 2 files changed, 84 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c
index 62a18d6..ddd43e0 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.c
+++ b/drivers/net/ethernet/ti/cpsw_ale.c
@@ -29,6 +29,7 @@
 
 #define ALE_VERSION_MAJOR(rev, mask) (((rev) >> 8) & (mask))
 #define ALE_VERSION_MINOR(rev)	(rev & 0xff)
+#define ALE_VERSION_1R3		0x0103
 #define ALE_VERSION_1R4		0x0104
 
 /* ALE Registers */
@@ -46,6 +47,7 @@
 #define ALE_UNKNOWNVLAN_UNREG_MCAST_FLOOD	0x94
 #define ALE_UNKNOWNVLAN_REG_MCAST_FLOOD		0x98
 #define ALE_UNKNOWNVLAN_FORCE_UNTAG_EGRESS	0x9C
+#define ALE_VLAN_MASK_MUX(reg)			(0xc0 + (0x4 * (reg)))
 
 #define ALE_TABLE_WRITE		BIT(31)
 
@@ -96,20 +98,34 @@ static inline void cpsw_ale_set_field(u32 *ale_entry, u32 start, u32 bits,
 	cpsw_ale_set_field(ale_entry, start, bits, value);		\
 }
 
+#define DEFINE_ALE_FIELD1(name, start)					\
+static inline int cpsw_ale_get_##name(u32 *ale_entry, u32 bits)		\
+{									\
+	return cpsw_ale_get_field(ale_entry, start, bits);		\
+}									\
+static inline void cpsw_ale_set_##name(u32 *ale_entry, u32 value,	\
+		u32 bits)						\
+{									\
+	cpsw_ale_set_field(ale_entry, start, bits, value);		\
+}
+
 DEFINE_ALE_FIELD(entry_type,		60,	2)
 DEFINE_ALE_FIELD(vlan_id,		48,	12)
 DEFINE_ALE_FIELD(mcast_state,		62,	2)
-DEFINE_ALE_FIELD(port_mask,		66,     3)
+DEFINE_ALE_FIELD1(port_mask,		66)
 DEFINE_ALE_FIELD(super,			65,	1)
 DEFINE_ALE_FIELD(ucast_type,		62,     2)
-DEFINE_ALE_FIELD(port_num,		66,     2)
+DEFINE_ALE_FIELD1(port_num,		66)
 DEFINE_ALE_FIELD(blocked,		65,     1)
 DEFINE_ALE_FIELD(secure,		64,     1)
-DEFINE_ALE_FIELD(vlan_untag_force,	24,	3)
-DEFINE_ALE_FIELD(vlan_reg_mcast,	16,	3)
-DEFINE_ALE_FIELD(vlan_unreg_mcast,	8,	3)
-DEFINE_ALE_FIELD(vlan_member_list,	0,	3)
+DEFINE_ALE_FIELD1(vlan_untag_force,	24)
+DEFINE_ALE_FIELD1(vlan_reg_mcast,	16)
+DEFINE_ALE_FIELD1(vlan_unreg_mcast,	8)
+DEFINE_ALE_FIELD1(vlan_member_list,	0)
 DEFINE_ALE_FIELD(mcast,			40,	1)
+/* ALE NetCP nu switch specific */
+DEFINE_ALE_FIELD(vlan_unreg_mcast_idx,	20,	3)
+DEFINE_ALE_FIELD(vlan_reg_mcast_idx,	44,	3)
 
 /* The MAC address field in the ALE entry cannot be macroized as above */
 static inline void cpsw_ale_get_addr(u32 *ale_entry, u8 *addr)
@@ -235,14 +251,16 @@ static void cpsw_ale_flush_mcast(struct cpsw_ale *ale, u32 *ale_entry,
 {
 	int mask;
 
-	mask = cpsw_ale_get_port_mask(ale_entry);
+	mask = cpsw_ale_get_port_mask(ale_entry,
+				      ale->port_mask_bits);
 	if ((mask & port_mask) == 0)
 		return; /* ports dont intersect, not interested */
 	mask &= ~port_mask;
 
 	/* free if only remaining port is host port */
 	if (mask)
-		cpsw_ale_set_port_mask(ale_entry, mask);
+		cpsw_ale_set_port_mask(ale_entry, mask,
+				       ale->port_mask_bits);
 	else
 		cpsw_ale_set_entry_type(ale_entry, ALE_TYPE_FREE);
 }
@@ -303,7 +321,7 @@ int cpsw_ale_add_ucast(struct cpsw_ale *ale, u8 *addr, int port,
 	cpsw_ale_set_ucast_type(ale_entry, ALE_UCAST_PERSISTANT);
 	cpsw_ale_set_secure(ale_entry, (flags & ALE_SECURE) ? 1 : 0);
 	cpsw_ale_set_blocked(ale_entry, (flags & ALE_BLOCKED) ? 1 : 0);
-	cpsw_ale_set_port_num(ale_entry, port);
+	cpsw_ale_set_port_num(ale_entry, port, ale->port_num_bits);
 
 	idx = cpsw_ale_match_addr(ale, addr, (flags & ALE_VLAN) ? vid : 0);
 	if (idx < 0)
@@ -350,9 +368,11 @@ int cpsw_ale_add_mcast(struct cpsw_ale *ale, u8 *addr, int port_mask,
 	cpsw_ale_set_super(ale_entry, (flags & ALE_BLOCKED) ? 1 : 0);
 	cpsw_ale_set_mcast_state(ale_entry, mcast_state);
 
-	mask = cpsw_ale_get_port_mask(ale_entry);
+	mask = cpsw_ale_get_port_mask(ale_entry,
+				      ale->port_mask_bits);
 	port_mask |= mask;
-	cpsw_ale_set_port_mask(ale_entry, port_mask);
+	cpsw_ale_set_port_mask(ale_entry, port_mask,
+			       ale->port_mask_bits);
 
 	if (idx < 0)
 		idx = cpsw_ale_match_free(ale);
@@ -379,7 +399,8 @@ int cpsw_ale_del_mcast(struct cpsw_ale *ale, u8 *addr, int port_mask,
 	cpsw_ale_read(ale, idx, ale_entry);
 
 	if (port_mask)
-		cpsw_ale_set_port_mask(ale_entry, port_mask);
+		cpsw_ale_set_port_mask(ale_entry, port_mask,
+				       ale->port_mask_bits);
 	else
 		cpsw_ale_set_entry_type(ale_entry, ALE_TYPE_FREE);
 
@@ -388,6 +409,21 @@ int cpsw_ale_del_mcast(struct cpsw_ale *ale, u8 *addr, int port_mask,
 }
 EXPORT_SYMBOL_GPL(cpsw_ale_del_mcast);
 
+/* ALE NetCP NU switch specific vlan functions */
+static void cpsw_ale_set_vlan_mcast(struct cpsw_ale *ale, u32 *ale_entry,
+				    int reg_mcast, int unreg_mcast)
+{
+	int idx;
+
+	/* Set VLAN registered multicast flood mask */
+	idx = cpsw_ale_get_vlan_reg_mcast_idx(ale_entry);
+	writel(reg_mcast, ale->params.ale_regs + ALE_VLAN_MASK_MUX(idx));
+
+	/* Set VLAN unregistered multicast flood mask */
+	idx = cpsw_ale_get_vlan_unreg_mcast_idx(ale_entry);
+	writel(unreg_mcast, ale->params.ale_regs + ALE_VLAN_MASK_MUX(idx));
+}
+
 int cpsw_ale_add_vlan(struct cpsw_ale *ale, u16 vid, int port, int untag,
 		      int reg_mcast, int unreg_mcast)
 {
@@ -401,10 +437,16 @@ int cpsw_ale_add_vlan(struct cpsw_ale *ale, u16 vid, int port, int untag,
 	cpsw_ale_set_entry_type(ale_entry, ALE_TYPE_VLAN);
 	cpsw_ale_set_vlan_id(ale_entry, vid);
 
-	cpsw_ale_set_vlan_untag_force(ale_entry, untag);
-	cpsw_ale_set_vlan_reg_mcast(ale_entry, reg_mcast);
-	cpsw_ale_set_vlan_unreg_mcast(ale_entry, unreg_mcast);
-	cpsw_ale_set_vlan_member_list(ale_entry, port);
+	cpsw_ale_set_vlan_untag_force(ale_entry, untag, ale->vlan_field_bits);
+	if (!ale->params.nu_switch_ale) {
+		cpsw_ale_set_vlan_reg_mcast(ale_entry, reg_mcast,
+					    ale->vlan_field_bits);
+		cpsw_ale_set_vlan_unreg_mcast(ale_entry, unreg_mcast,
+					      ale->vlan_field_bits);
+	} else {
+		cpsw_ale_set_vlan_mcast(ale, ale_entry, reg_mcast, unreg_mcast);
+	}
+	cpsw_ale_set_vlan_member_list(ale_entry, port, ale->vlan_field_bits);
 
 	if (idx < 0)
 		idx = cpsw_ale_match_free(ale);
@@ -430,7 +472,8 @@ int cpsw_ale_del_vlan(struct cpsw_ale *ale, u16 vid, int port_mask)
 	cpsw_ale_read(ale, idx, ale_entry);
 
 	if (port_mask)
-		cpsw_ale_set_vlan_member_list(ale_entry, port_mask);
+		cpsw_ale_set_vlan_member_list(ale_entry, port_mask,
+					      ale->vlan_field_bits);
 	else
 		cpsw_ale_set_entry_type(ale_entry, ALE_TYPE_FREE);
 
@@ -458,12 +501,15 @@ void cpsw_ale_set_allmulti(struct cpsw_ale *ale, int allmulti)
 		if (type != ALE_TYPE_VLAN)
 			continue;
 
-		unreg_mcast = cpsw_ale_get_vlan_unreg_mcast(ale_entry);
+		unreg_mcast =
+			cpsw_ale_get_vlan_unreg_mcast(ale_entry,
+						      ale->vlan_field_bits);
 		if (allmulti)
 			unreg_mcast |= 1;
 		else
 			unreg_mcast &= ~1;
-		cpsw_ale_set_vlan_unreg_mcast(ale_entry, unreg_mcast);
+		cpsw_ale_set_vlan_unreg_mcast(ale_entry, unreg_mcast,
+					      ale->vlan_field_bits);
 		cpsw_ale_write(ale, idx, ale_entry);
 	}
 }
@@ -769,6 +815,14 @@ void cpsw_ale_start(struct cpsw_ale *ale)
 	dev_info(ale->params.dev,
 		 "ALE Table size %ld\n", ale->params.ale_entries);
 
+	/* set default bits for existing h/w */
+	ale->port_mask_bits = 3;
+	ale->port_num_bits = 2;
+	ale->vlan_field_bits = 3;
+
+	/* Set defaults override for ALE on NetCP NU switch and for version
+	 * 1R3
+	 */
 	if (ale->params.nu_switch_ale) {
 		/* Separate registers for unknown vlan configuration.
 		 * Also there are N bits, where N is number of ale
@@ -793,6 +847,13 @@ void cpsw_ale_start(struct cpsw_ale *ale)
 		ale_controls[ALE_PORT_UNTAGGED_EGRESS].shift = 0;
 		ale_controls[ALE_PORT_UNTAGGED_EGRESS].offset =
 					ALE_UNKNOWNVLAN_FORCE_UNTAG_EGRESS;
+		ale->port_mask_bits = ale->params.ale_ports;
+		ale->port_num_bits = ale->params.ale_ports - 1;
+		ale->vlan_field_bits = ale->params.ale_ports;
+	} else if (ale->version == ALE_VERSION_1R3) {
+		ale->port_mask_bits = ale->params.ale_ports;
+		ale->port_num_bits = 3;
+		ale->vlan_field_bits = ale->params.ale_ports;
 	}
 
 	cpsw_ale_control_set(ale, 0, ALE_ENABLE, 1);
diff --git a/drivers/net/ethernet/ti/cpsw_ale.h b/drivers/net/ethernet/ti/cpsw_ale.h
index b1c7954..25d24e8 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.h
+++ b/drivers/net/ethernet/ti/cpsw_ale.h
@@ -39,6 +39,10 @@ struct cpsw_ale {
 	unsigned long		ageout;
 	int			allmulti;
 	u32			version;
+	/* These bits are different on NetCP NU Switch ALE */
+	u32			port_mask_bits;
+	u32			port_num_bits;
+	u32			vlan_field_bits;
 };
 
 enum cpsw_ale_control {
-- 
1.9.1


* [PATCH net-next 03/10] net: netcp: extract eflag from desc for rx_hook handling
From: Murali Karicheri @ 2016-12-20 22:09 UTC (permalink / raw)
  To: netdev, linux-omap, grygorii.strashko, mugunthanvnm, linux-kernel,
	arnd, davem, devicetree, mark.rutland, robh+dt
In-Reply-To: <1482271793-7671-1-git-send-email-m-karicheri2@ti.com>

Extract the eflag bits from the received descriptor and pass them down
the rx_hook chain so that they are available to netcp modules. The
psdata and epib data also have to be inspected by the netcp modules,
so the descriptor can be freed only after the rx_hook chain returns.
Move knav_pool_desc_put() after the rx_hook processing accordingly.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
---
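Note (not part of the commit): a minimal user-space sketch of the
shift-and-mask step applied to the packet_info word. The shift and the
mask width are taken from the knav_dma.h hunk in this patch; the MASK()
definition is an assumption about the kernel macro, and extract_eflags()
is a hypothetical helper name used only for illustration.

```c
#include <stdint.h>

/* Assumed equivalent of the kernel's MASK(): the low x bits set. */
#define MASK(x)				((1u << (x)) - 1)
#define KNAV_DMA_DESC_EFLAGS_MASK	MASK(4)
#define KNAV_DMA_DESC_EFLAGS_SHIFT	20

/* Hypothetical helper: pull the 4-bit error flags (bits 23:20) out of
 * a packet_info word that is already in CPU endianness.
 */
static uint32_t extract_eflags(uint32_t pkt_info)
{
	return (pkt_info >> KNAV_DMA_DESC_EFLAGS_SHIFT) &
	       KNAV_DMA_DESC_EFLAGS_MASK;
}
```

With these definitions, extract_eflags(0xFFA12345) yields 0xa: the bits
above and below the 4-bit field are discarded.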
 drivers/net/ethernet/ti/netcp.h      |  1 +
 drivers/net/ethernet/ti/netcp_core.c | 20 +++++++++++++++++---
 include/linux/soc/ti/knav_dma.h      |  2 ++
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ti/netcp.h b/drivers/net/ethernet/ti/netcp.h
index 0f58c58..a92abd6 100644
--- a/drivers/net/ethernet/ti/netcp.h
+++ b/drivers/net/ethernet/ti/netcp.h
@@ -115,6 +115,7 @@ struct netcp_packet {
 	struct sk_buff		*skb;
 	__le32			*epib;
 	u32			*psdata;
+	u32			eflags;
 	unsigned int		psdata_len;
 	struct netcp_intf	*netcp;
 	struct netcp_tx_pipe	*tx_pipe;
diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index c243335..a136c56 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -122,6 +122,13 @@ static void get_pkt_info(dma_addr_t *buff, u32 *buff_len, dma_addr_t *ndesc,
 	*ndesc = le32_to_cpu(desc->next_desc);
 }
 
+static void get_desc_info(u32 *desc_info, u32 *pkt_info,
+			  struct knav_dma_desc *desc)
+{
+	*desc_info = le32_to_cpu(desc->desc_info);
+	*pkt_info = le32_to_cpu(desc->packet_info);
+}
+
 static u32 get_sw_data(int index, struct knav_dma_desc *desc)
 {
 	/* No Endian conversion needed as this data is untouched by hw */
@@ -653,6 +660,7 @@ static int netcp_process_one_rx_packet(struct netcp_intf *netcp)
 	struct netcp_packet p_info;
 	struct sk_buff *skb;
 	void *org_buf_ptr;
+	u32 tmp;
 
 	dma_desc = knav_queue_pop(netcp->rx_queue, &dma_sz);
 	if (!dma_desc)
@@ -724,9 +732,6 @@ static int netcp_process_one_rx_packet(struct netcp_intf *netcp)
 		knav_pool_desc_put(netcp->rx_pool, ndesc);
 	}
 
-	/* Free the primary descriptor */
-	knav_pool_desc_put(netcp->rx_pool, desc);
-
 	/* check for packet len and warn */
 	if (unlikely(pkt_sz != accum_sz))
 		dev_dbg(netcp->ndev_dev, "mismatch in packet size(%d) & sum of fragments(%d)\n",
@@ -739,6 +744,11 @@ static int netcp_process_one_rx_packet(struct netcp_intf *netcp)
 	p_info.skb = skb;
 	skb->dev = netcp->ndev;
 	p_info.rxtstamp_complete = false;
+	get_desc_info(&tmp, &p_info.eflags, desc);
+	p_info.epib = desc->epib;
+	p_info.psdata = (u32 __force *)desc->psdata;
+	p_info.eflags = ((p_info.eflags >> KNAV_DMA_DESC_EFLAGS_SHIFT) &
+			 KNAV_DMA_DESC_EFLAGS_MASK);
 	list_for_each_entry(rx_hook, &netcp->rxhook_list_head, list) {
 		int ret;
 
@@ -748,10 +758,14 @@ static int netcp_process_one_rx_packet(struct netcp_intf *netcp)
 			dev_err(netcp->ndev_dev, "RX hook %d failed: %d\n",
 				rx_hook->order, ret);
 			netcp->ndev->stats.rx_errors++;
+			/* Free the primary descriptor */
+			knav_pool_desc_put(netcp->rx_pool, desc);
 			dev_kfree_skb(skb);
 			return 0;
 		}
 	}
+	/* Free the primary descriptor */
+	knav_pool_desc_put(netcp->rx_pool, desc);
 
 	netcp->ndev->stats.rx_packets++;
 	netcp->ndev->stats.rx_bytes += skb->len;
diff --git a/include/linux/soc/ti/knav_dma.h b/include/linux/soc/ti/knav_dma.h
index 35cb926..2b78826 100644
--- a/include/linux/soc/ti/knav_dma.h
+++ b/include/linux/soc/ti/knav_dma.h
@@ -41,6 +41,8 @@
 #define KNAV_DMA_DESC_RETQ_SHIFT		0
 #define KNAV_DMA_DESC_RETQ_MASK			MASK(14)
 #define KNAV_DMA_DESC_BUF_LEN_MASK		MASK(22)
+#define KNAV_DMA_DESC_EFLAGS_MASK		MASK(4)
+#define KNAV_DMA_DESC_EFLAGS_SHIFT		20
 
 #define KNAV_DMA_NUM_EPIB_WORDS			4
 #define KNAV_DMA_NUM_PS_WORDS			16
-- 
1.9.1


* [PATCH net-next 05/10] net: netcp: store network statistics in 64 bits
From: Murali Karicheri @ 2016-12-20 22:09 UTC (permalink / raw)
  To: netdev, linux-omap, grygorii.strashko, mugunthanvnm, linux-kernel,
	arnd, davem, devicetree, mark.rutland, robh+dt
In-Reply-To: <1482271793-7671-1-git-send-email-m-karicheri2@ti.com>

From: Michael Scherban <m-scherban@ti.com>

Previously the network statistics were stored in 32-bit variables,
which can cause some stats to roll over after several minutes of
high traffic. Store the packet and byte counters in 64 bits so that
larger values can be held.

Signed-off-by: Michael Scherban <m-scherban@ti.com>
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
---
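Note (not part of the commit): a rough sketch of the rollover
arithmetic behind the commit message, assuming a counter that advances
at a constant rate. u32_wrap_seconds() is a hypothetical helper, not
something the driver provides.

```c
#include <stdint.h>

/* Hypothetical helper: whole seconds until a 32-bit counter wraps
 * when it advances by a constant `rate` units per second.
 * Returns 0 for a zero rate (the counter never advances).
 */
static uint64_t u32_wrap_seconds(uint64_t rate)
{
	const uint64_t range = (uint64_t)UINT32_MAX + 1;	/* 2^32 */

	return rate ? range / rate : 0;
}
```

For example, u32_wrap_seconds(125000000) gives 34, so a 32-bit byte
counter fed at roughly gigabit line rate wraps in about half a minute;
at 12500000 bytes/s it gives 343, a little under six minutes.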
 drivers/net/ethernet/ti/netcp.h      | 18 ++++++++++
 drivers/net/ethernet/ti/netcp_core.c | 68 +++++++++++++++++++++++++++++-------
 2 files changed, 74 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/ti/netcp.h b/drivers/net/ethernet/ti/netcp.h
index a92abd6..d243c5d 100644
--- a/drivers/net/ethernet/ti/netcp.h
+++ b/drivers/net/ethernet/ti/netcp.h
@@ -23,6 +23,7 @@
 
 #include <linux/netdevice.h>
 #include <linux/soc/ti/knav_dma.h>
+#include <linux/u64_stats_sync.h>
 
 /* Maximum Ethernet frame size supported by Keystone switch */
 #define NETCP_MAX_FRAME_SIZE		9504
@@ -68,6 +69,20 @@ struct netcp_addr {
 	struct list_head	node;
 };
 
+struct netcp_stats {
+	struct u64_stats_sync   syncp_rx ____cacheline_aligned_in_smp;
+	u64                     rx_packets;
+	u64                     rx_bytes;
+	u32                     rx_errors;
+	u32                     rx_dropped;
+
+	struct u64_stats_sync   syncp_tx ____cacheline_aligned_in_smp;
+	u64                     tx_packets;
+	u64                     tx_bytes;
+	u32                     tx_errors;
+	u32                     tx_dropped;
+};
+
 struct netcp_intf {
 	struct device		*dev;
 	struct device		*ndev_dev;
@@ -88,6 +103,9 @@ struct netcp_intf {
 	struct napi_struct	rx_napi;
 	struct napi_struct	tx_napi;
 
+	/* 64-bit netcp stats */
+	struct netcp_stats	stats;
+
 	void			*rx_channel;
 	const char		*dma_chan_name;
 	u32			rx_pool_size;
diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index 286fd8d..b077ed4 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -629,6 +629,7 @@ static void netcp_free_rx_desc_chain(struct netcp_intf *netcp,
 
 static void netcp_empty_rx_queue(struct netcp_intf *netcp)
 {
+	struct netcp_stats *rx_stats = &netcp->stats;
 	struct knav_dma_desc *desc;
 	unsigned int dma_sz;
 	dma_addr_t dma;
@@ -642,16 +643,17 @@ static void netcp_empty_rx_queue(struct netcp_intf *netcp)
 		if (unlikely(!desc)) {
 			dev_err(netcp->ndev_dev, "%s: failed to unmap Rx desc\n",
 				__func__);
-			netcp->ndev->stats.rx_errors++;
+			rx_stats->rx_errors++;
 			continue;
 		}
 		netcp_free_rx_desc_chain(netcp, desc);
-		netcp->ndev->stats.rx_dropped++;
+		rx_stats->rx_dropped++;
 	}
 }
 
 static int netcp_process_one_rx_packet(struct netcp_intf *netcp)
 {
+	struct netcp_stats *rx_stats = &netcp->stats;
 	unsigned int dma_sz, buf_len, org_buf_len;
 	struct knav_dma_desc *desc, *ndesc;
 	unsigned int pkt_sz = 0, accum_sz;
@@ -757,8 +759,8 @@ static int netcp_process_one_rx_packet(struct netcp_intf *netcp)
 		if (unlikely(ret)) {
 			dev_err(netcp->ndev_dev, "RX hook %d failed: %d\n",
 				rx_hook->order, ret);
-			netcp->ndev->stats.rx_errors++;
 			/* Free the primary descriptor */
+			rx_stats->rx_dropped++;
 			knav_pool_desc_put(netcp->rx_pool, desc);
 			dev_kfree_skb(skb);
 			return 0;
@@ -767,8 +769,10 @@ static int netcp_process_one_rx_packet(struct netcp_intf *netcp)
 	/* Free the primary descriptor */
 	knav_pool_desc_put(netcp->rx_pool, desc);
 
-	netcp->ndev->stats.rx_packets++;
-	netcp->ndev->stats.rx_bytes += skb->len;
+	u64_stats_update_begin(&rx_stats->syncp_rx);
+	rx_stats->rx_packets++;
+	rx_stats->rx_bytes += skb->len;
+	u64_stats_update_end(&rx_stats->syncp_rx);
 
 	/* push skb up the stack */
 	skb->protocol = eth_type_trans(skb, netcp->ndev);
@@ -777,7 +781,7 @@ static int netcp_process_one_rx_packet(struct netcp_intf *netcp)
 
 free_desc:
 	netcp_free_rx_desc_chain(netcp, desc);
-	netcp->ndev->stats.rx_errors++;
+	rx_stats->rx_errors++;
 	return 0;
 }
 
@@ -1008,6 +1012,7 @@ static void netcp_free_tx_desc_chain(struct netcp_intf *netcp,
 static int netcp_process_tx_compl_packets(struct netcp_intf *netcp,
 					  unsigned int budget)
 {
+	struct netcp_stats *tx_stats = &netcp->stats;
 	struct knav_dma_desc *desc;
 	struct netcp_tx_cb *tx_cb;
 	struct sk_buff *skb;
@@ -1022,7 +1027,7 @@ static int netcp_process_tx_compl_packets(struct netcp_intf *netcp,
 		desc = knav_pool_desc_unmap(netcp->tx_pool, dma, dma_sz);
 		if (unlikely(!desc)) {
 			dev_err(netcp->ndev_dev, "failed to unmap Tx desc\n");
-			netcp->ndev->stats.tx_errors++;
+			tx_stats->tx_errors++;
 			continue;
 		}
 
@@ -1033,7 +1038,7 @@ static int netcp_process_tx_compl_packets(struct netcp_intf *netcp,
 		netcp_free_tx_desc_chain(netcp, desc, dma_sz);
 		if (!skb) {
 			dev_err(netcp->ndev_dev, "No skb in Tx desc\n");
-			netcp->ndev->stats.tx_errors++;
+			tx_stats->tx_errors++;
 			continue;
 		}
 
@@ -1050,8 +1055,10 @@ static int netcp_process_tx_compl_packets(struct netcp_intf *netcp,
 			netif_wake_subqueue(netcp->ndev, subqueue);
 		}
 
-		netcp->ndev->stats.tx_packets++;
-		netcp->ndev->stats.tx_bytes += skb->len;
+		u64_stats_update_begin(&tx_stats->syncp_tx);
+		tx_stats->tx_packets++;
+		tx_stats->tx_bytes += skb->len;
+		u64_stats_update_end(&tx_stats->syncp_tx);
 		dev_kfree_skb(skb);
 		pkts++;
 	}
@@ -1272,6 +1279,7 @@ static int netcp_tx_submit_skb(struct netcp_intf *netcp,
 static int netcp_ndo_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 {
 	struct netcp_intf *netcp = netdev_priv(ndev);
+	struct netcp_stats *tx_stats = &netcp->stats;
 	int subqueue = skb_get_queue_mapping(skb);
 	struct knav_dma_desc *desc;
 	int desc_count, ret = 0;
@@ -1287,7 +1295,7 @@ static int netcp_ndo_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 			/* If we get here, the skb has already been dropped */
 			dev_warn(netcp->ndev_dev, "padding failed (%d), packet dropped\n",
 				 ret);
-			ndev->stats.tx_dropped++;
+			tx_stats->tx_dropped++;
 			return ret;
 		}
 		skb->len = NETCP_MIN_PACKET_SIZE;
@@ -1315,7 +1323,7 @@ static int netcp_ndo_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 	return NETDEV_TX_OK;
 
 drop:
-	ndev->stats.tx_dropped++;
+	tx_stats->tx_dropped++;
 	if (desc)
 		netcp_free_tx_desc_chain(netcp, desc, sizeof(*desc));
 	dev_kfree_skb(skb);
@@ -1897,12 +1905,46 @@ static int netcp_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
 	return 0;
 }
 
+static struct rtnl_link_stats64 *
+netcp_get_stats(struct net_device *ndev, struct rtnl_link_stats64 *stats)
+{
+	struct netcp_intf *netcp = netdev_priv(ndev);
+	struct netcp_stats *p = &netcp->stats;
+	u64 rxpackets, rxbytes, txpackets, txbytes;
+	unsigned int start;
+
+	do {
+		start = u64_stats_fetch_begin_irq(&p->syncp_rx);
+		rxpackets       = p->rx_packets;
+		rxbytes         = p->rx_bytes;
+	} while (u64_stats_fetch_retry_irq(&p->syncp_rx, start));
+
+	do {
+		start = u64_stats_fetch_begin_irq(&p->syncp_tx);
+		txpackets       = p->tx_packets;
+		txbytes         = p->tx_bytes;
+	} while (u64_stats_fetch_retry_irq(&p->syncp_tx, start));
+
+	stats->rx_packets = rxpackets;
+	stats->rx_bytes = rxbytes;
+	stats->tx_packets = txpackets;
+	stats->tx_bytes = txbytes;
+
+	/* The following are stored as 32 bit */
+	stats->rx_errors = p->rx_errors;
+	stats->rx_dropped = p->rx_dropped;
+	stats->tx_dropped = p->tx_dropped;
+
+	return stats;
+}
+
 static const struct net_device_ops netcp_netdev_ops = {
 	.ndo_open		= netcp_ndo_open,
 	.ndo_stop		= netcp_ndo_stop,
 	.ndo_start_xmit		= netcp_ndo_start_xmit,
 	.ndo_set_rx_mode	= netcp_set_rx_mode,
 	.ndo_do_ioctl           = netcp_ndo_ioctl,
+	.ndo_get_stats64        = netcp_get_stats,
 	.ndo_set_mac_address	= eth_mac_addr,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_vlan_rx_add_vid	= netcp_rx_add_vid,
@@ -1949,6 +1991,8 @@ static int netcp_create_interface(struct netcp_device *netcp_device,
 	INIT_LIST_HEAD(&netcp->txhook_list_head);
 	INIT_LIST_HEAD(&netcp->rxhook_list_head);
 	INIT_LIST_HEAD(&netcp->addr_list);
+	u64_stats_init(&netcp->stats.syncp_rx);
+	u64_stats_init(&netcp->stats.syncp_tx);
 	netcp->netcp_device = netcp_device;
 	netcp->dev = netcp_device->device;
 	netcp->ndev = ndev;
-- 
1.9.1


* [PATCH net-next 06/10] net: netcp: ethss: get phy-handle only if link interface is MAC-to-PHY
From: Murali Karicheri @ 2016-12-20 22:09 UTC (permalink / raw)
  To: netdev, linux-omap, grygorii.strashko, mugunthanvnm, linux-kernel,
	arnd, davem, devicetree, mark.rutland, robh+dt
In-Reply-To: <1482271793-7671-1-git-send-email-m-karicheri2@ti.com>

Currently the driver parses phy-handle without checking whether the
link interface is MAC-to-PHY. Add this check for all MAC-to-PHY
interface types supported by the driver.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
---
 drivers/net/ethernet/ti/netcp_ethss.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c
index cb48f88..9266961 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -2956,7 +2956,9 @@ static int init_slave(struct gbe_priv *gbe_dev, struct gbe_slave *slave,
 	}
 
 	slave->open = false;
-	slave->phy_node = of_parse_phandle(node, "phy-handle", 0);
+	if ((slave->link_interface == SGMII_LINK_MAC_PHY) ||
+	    (slave->link_interface == XGMII_LINK_MAC_PHY))
+		slave->phy_node = of_parse_phandle(node, "phy-handle", 0);
 	slave->port_num = gbe_get_slave_port(gbe_dev, slave->slave_num);
 
 	if (slave->link_interface >= XGMII_LINK_MAC_PHY)
-- 
1.9.1


* Re: ipv6: handle -EFAULT from skb_copy_bits
From: Dave Jones @ 2016-12-20 22:12 UTC (permalink / raw)
  To: Cong Wang; +Cc: David Miller, Linux Kernel Network Developers
In-Reply-To: <CAM_iQpUgqJEG544HqH1iwdQFL9-nV3-hMuuH_eU9OnJ--pX+jg@mail.gmail.com>

On Tue, Dec 20, 2016 at 11:31:38AM -0800, Cong Wang wrote:
 > On Tue, Dec 20, 2016 at 10:17 AM, Dave Jones <davej@codemonkey.org.uk> wrote:
 > > On Mon, Dec 19, 2016 at 08:36:23PM -0500, David Miller wrote:
 > >  > From: Dave Jones <davej@codemonkey.org.uk>
 > >  > Date: Mon, 19 Dec 2016 19:40:13 -0500
 > >  >
 > >  > > On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote:
 > >  > >
 > >  > >  > Unfortunately, this made no difference.  I spent some time today trying
 > >  > >  > to make a better reproducer, but failed. I'll revisit again tomorrow.
 > >  > >  >
 > >  > >  > Maybe I need >1 process/thread to trigger this.  That would explain why
 > >  > >  > I can trigger it with Trinity.
 > >  > >
 > >  > > scratch that last part, I finally just repro'd it with a single process.
 > >  >
 > >  > Thanks for the info, I'll try to think about this some more.
 > >
 > > I threw in some debug printks right before that BUG_ON.
 > > it's always this:
 > >
 > > skb->len=31 skb->data_len=0 offset:30 total_len:9
 > 
 > Clearly we fail because 30 > 31 - 2, seems 'offset' is not correct here,
 > off-by-one?

Ok, I finally made a messy, albeit good enough reproducer.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define LEN 504

int main(int argc, char* argv[])
{
	int fd;
	int zero = 0;
	char buf[LEN];

	memset(buf, 0, LEN);

	fd = socket(AF_INET6, SOCK_RAW, 7);

	setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
	setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);

	sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
}


* [PATCH 1/2] net: mdio: add mdio45_ethtool_ksettings_get
From: Philippe Reynes @ 2016-12-20 22:24 UTC (permalink / raw)
  To: linux-net-drivers, ecree, bkenward, davem, andrew
  Cc: netdev, linux-kernel, Philippe Reynes

The mdio code has a helper for the old ethtool API, gset.
Add a new helper, mdio45_ethtool_ksettings_get, for the
new ethtool API, glinksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
---
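Note (not part of the commit): a small user-space sketch of the
"forced settings" speed arithmetic that the new helper carries over
from the gset version. The two bit values are assumptions that mirror
BMCR_SPEED1000 and BMCR_SPEED100, and forced_speed_mbps() is a
hypothetical name used only for illustration.

```c
#include <stdint.h>

/* Assumed values, mirroring BMCR_SPEED1000 / BMCR_SPEED100. */
#define PMA_CTRL1_SPEED1000	0x0040
#define PMA_CTRL1_SPEED100	0x2000

/* Hypothetical helper mirroring the forced-speed expression: starting
 * from 10 Mb/s, the SPEED100 bit multiplies by 10 and the SPEED1000
 * bit multiplies by 100.
 */
static uint32_t forced_speed_mbps(uint16_t ctrl1)
{
	return ((ctrl1 & PMA_CTRL1_SPEED1000) ? 100 : 1) *
	       ((ctrl1 & PMA_CTRL1_SPEED100) ? 100 : 10);
}
```

Under these assumptions the four bit combinations decode to 10, 100,
1000 and 10000 Mb/s, which is why the expression needs no lookup table.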
 drivers/net/mdio.c   |  178 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/mdio.h |   21 ++++++
 2 files changed, 199 insertions(+), 0 deletions(-)

diff --git a/drivers/net/mdio.c b/drivers/net/mdio.c
index 3e027ed..077364c 100644
--- a/drivers/net/mdio.c
+++ b/drivers/net/mdio.c
@@ -342,6 +342,184 @@ void mdio45_ethtool_gset_npage(const struct mdio_if_info *mdio,
 EXPORT_SYMBOL(mdio45_ethtool_gset_npage);
 
 /**
+ * mdio45_ethtool_ksettings_get_npage - get settings for ETHTOOL_GLINKSETTINGS
+ * @mdio: MDIO interface
+ * @cmd: Ethtool request structure
+ * @npage_adv: Modes currently advertised on next pages
+ * @npage_lpa: Modes advertised by link partner on next pages
+ *
+ * The @cmd parameter is expected to have been cleared before calling
+ * mdio45_ethtool_ksettings_get_npage().
+ *
+ * Since the CSRs for auto-negotiation using next pages are not fully
+ * standardised, this function does not attempt to decode them.  The
+ * caller must pass them in.
+ */
+void mdio45_ethtool_ksettings_get_npage(const struct mdio_if_info *mdio,
+					struct ethtool_link_ksettings *cmd,
+					u32 npage_adv, u32 npage_lpa)
+{
+	int reg;
+	u32 speed, supported = 0, advertising = 0, lp_advertising = 0;
+
+	BUILD_BUG_ON(MDIO_SUPPORTS_C22 != ETH_MDIO_SUPPORTS_C22);
+	BUILD_BUG_ON(MDIO_SUPPORTS_C45 != ETH_MDIO_SUPPORTS_C45);
+
+	cmd->base.phy_address = mdio->prtad;
+	cmd->base.mdio_support =
+		mdio->mode_support & (MDIO_SUPPORTS_C45 | MDIO_SUPPORTS_C22);
+
+	reg = mdio->mdio_read(mdio->dev, mdio->prtad, MDIO_MMD_PMAPMD,
+			      MDIO_CTRL2);
+	switch (reg & MDIO_PMA_CTRL2_TYPE) {
+	case MDIO_PMA_CTRL2_10GBT:
+	case MDIO_PMA_CTRL2_1000BT:
+	case MDIO_PMA_CTRL2_100BTX:
+	case MDIO_PMA_CTRL2_10BT:
+		cmd->base.port = PORT_TP;
+		supported = SUPPORTED_TP;
+		reg = mdio->mdio_read(mdio->dev, mdio->prtad, MDIO_MMD_PMAPMD,
+				      MDIO_SPEED);
+		if (reg & MDIO_SPEED_10G)
+			supported |= SUPPORTED_10000baseT_Full;
+		if (reg & MDIO_PMA_SPEED_1000)
+			supported |= (SUPPORTED_1000baseT_Full |
+					    SUPPORTED_1000baseT_Half);
+		if (reg & MDIO_PMA_SPEED_100)
+			supported |= (SUPPORTED_100baseT_Full |
+					    SUPPORTED_100baseT_Half);
+		if (reg & MDIO_PMA_SPEED_10)
+			supported |= (SUPPORTED_10baseT_Full |
+					    SUPPORTED_10baseT_Half);
+		advertising = ADVERTISED_TP;
+		break;
+
+	case MDIO_PMA_CTRL2_10GBCX4:
+		cmd->base.port = PORT_OTHER;
+		supported = 0;
+		advertising = 0;
+		break;
+
+	case MDIO_PMA_CTRL2_10GBKX4:
+	case MDIO_PMA_CTRL2_10GBKR:
+	case MDIO_PMA_CTRL2_1000BKX:
+		cmd->base.port = PORT_OTHER;
+		supported = SUPPORTED_Backplane;
+		reg = mdio->mdio_read(mdio->dev, mdio->prtad, MDIO_MMD_PMAPMD,
+				      MDIO_PMA_EXTABLE);
+		if (reg & MDIO_PMA_EXTABLE_10GBKX4)
+			supported |= SUPPORTED_10000baseKX4_Full;
+		if (reg & MDIO_PMA_EXTABLE_10GBKR)
+			supported |= SUPPORTED_10000baseKR_Full;
+		if (reg & MDIO_PMA_EXTABLE_1000BKX)
+			supported |= SUPPORTED_1000baseKX_Full;
+		reg = mdio->mdio_read(mdio->dev, mdio->prtad, MDIO_MMD_PMAPMD,
+				      MDIO_PMA_10GBR_FECABLE);
+		if (reg & MDIO_PMA_10GBR_FECABLE_ABLE)
+			supported |= SUPPORTED_10000baseR_FEC;
+		advertising = ADVERTISED_Backplane;
+		break;
+
+	/* All the other defined modes are flavours of optical */
+	default:
+		cmd->base.port = PORT_FIBRE;
+		supported = SUPPORTED_FIBRE;
+		advertising = ADVERTISED_FIBRE;
+		break;
+	}
+
+	if (mdio->mmds & MDIO_DEVS_AN) {
+		supported |= SUPPORTED_Autoneg;
+		reg = mdio->mdio_read(mdio->dev, mdio->prtad, MDIO_MMD_AN,
+				      MDIO_CTRL1);
+		if (reg & MDIO_AN_CTRL1_ENABLE) {
+			cmd->base.autoneg = AUTONEG_ENABLE;
+			advertising |=
+				ADVERTISED_Autoneg |
+				mdio45_get_an(mdio, MDIO_AN_ADVERTISE) |
+				npage_adv;
+		} else {
+			cmd->base.autoneg = AUTONEG_DISABLE;
+		}
+	} else {
+		cmd->base.autoneg = AUTONEG_DISABLE;
+	}
+
+	if (cmd->base.autoneg) {
+		u32 modes = 0;
+		int an_stat = mdio->mdio_read(mdio->dev, mdio->prtad,
+					      MDIO_MMD_AN, MDIO_STAT1);
+
+		/* If AN is complete and successful, report best common
+		 * mode, otherwise report best advertised mode.
+		 */
+		if (an_stat & MDIO_AN_STAT1_COMPLETE) {
+			lp_advertising =
+				mdio45_get_an(mdio, MDIO_AN_LPA) | npage_lpa;
+			if (an_stat & MDIO_AN_STAT1_LPABLE)
+				lp_advertising |= ADVERTISED_Autoneg;
+			modes = advertising & lp_advertising;
+		}
+		if ((modes & ~ADVERTISED_Autoneg) == 0)
+			modes = advertising;
+
+		if (modes & (ADVERTISED_10000baseT_Full |
+			     ADVERTISED_10000baseKX4_Full |
+			     ADVERTISED_10000baseKR_Full)) {
+			speed = SPEED_10000;
+			cmd->base.duplex = DUPLEX_FULL;
+		} else if (modes & (ADVERTISED_1000baseT_Full |
+				    ADVERTISED_1000baseT_Half |
+				    ADVERTISED_1000baseKX_Full)) {
+			speed = SPEED_1000;
+			cmd->base.duplex = !(modes & ADVERTISED_1000baseT_Half);
+		} else if (modes & (ADVERTISED_100baseT_Full |
+				    ADVERTISED_100baseT_Half)) {
+			speed = SPEED_100;
+			cmd->base.duplex = !!(modes & ADVERTISED_100baseT_Full);
+		} else {
+			speed = SPEED_10;
+			cmd->base.duplex = !!(modes & ADVERTISED_10baseT_Full);
+		}
+	} else {
+		/* Report forced settings */
+		reg = mdio->mdio_read(mdio->dev, mdio->prtad, MDIO_MMD_PMAPMD,
+				      MDIO_CTRL1);
+		speed = (((reg & MDIO_PMA_CTRL1_SPEED1000) ? 100 : 1)
+			 * ((reg & MDIO_PMA_CTRL1_SPEED100) ? 100 : 10));
+		cmd->base.duplex = (reg & MDIO_CTRL1_FULLDPLX ||
+				    speed == SPEED_10000);
+	}
+
+	cmd->base.speed = speed;
+
+	ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.supported,
+						supported);
+	ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.advertising,
+						advertising);
+	ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.lp_advertising,
+						lp_advertising);
+
+	/* 10GBASE-T MDI/MDI-X */
+	if (cmd->base.port == PORT_TP && (cmd->base.speed == SPEED_10000)) {
+		switch (mdio->mdio_read(mdio->dev, mdio->prtad, MDIO_MMD_PMAPMD,
+					MDIO_PMA_10GBT_SWAPPOL)) {
+		case MDIO_PMA_10GBT_SWAPPOL_ABNX | MDIO_PMA_10GBT_SWAPPOL_CDNX:
+			cmd->base.eth_tp_mdix = ETH_TP_MDI;
+			break;
+		case 0:
+			cmd->base.eth_tp_mdix = ETH_TP_MDI_X;
+			break;
+		default:
+			/* It's complicated... */
+			cmd->base.eth_tp_mdix = ETH_TP_MDI_INVALID;
+			break;
+		}
+	}
+}
+EXPORT_SYMBOL(mdio45_ethtool_ksettings_get_npage);
+
+/**
  * mdio_mii_ioctl - MII ioctl interface for MDIO (clause 22 or 45) PHYs
  * @mdio: MDIO interface
  * @mii_data: MII ioctl data structure
diff --git a/include/linux/mdio.h b/include/linux/mdio.h
index bf9d1d7..b6587a4 100644
--- a/include/linux/mdio.h
+++ b/include/linux/mdio.h
@@ -130,6 +130,10 @@ extern int mdio_set_flag(const struct mdio_if_info *mdio,
 extern void mdio45_ethtool_gset_npage(const struct mdio_if_info *mdio,
 				      struct ethtool_cmd *ecmd,
 				      u32 npage_adv, u32 npage_lpa);
+extern void
+mdio45_ethtool_ksettings_get_npage(const struct mdio_if_info *mdio,
+				   struct ethtool_link_ksettings *cmd,
+				   u32 npage_adv, u32 npage_lpa);
 
 /**
  * mdio45_ethtool_gset - get settings for ETHTOOL_GSET
@@ -147,6 +151,23 @@ static inline void mdio45_ethtool_gset(const struct mdio_if_info *mdio,
 	mdio45_ethtool_gset_npage(mdio, ecmd, 0, 0);
 }
 
+/**
+ * mdio45_ethtool_ksettings_get - get settings for ETHTOOL_GLINKSETTINGS
+ * @mdio: MDIO interface
+ * @cmd: Ethtool request structure
+ *
+ * Since the CSRs for auto-negotiation using next pages are not fully
+ * standardised, this function does not attempt to decode them.  Use
+ * mdio45_ethtool_ksettings_get_npage() to specify advertisement bits
+ * from next pages.
+ */
+static inline void
+mdio45_ethtool_ksettings_get(const struct mdio_if_info *mdio,
+			     struct ethtool_link_ksettings *cmd)
+{
+	mdio45_ethtool_ksettings_get_npage(mdio, cmd, 0, 0);
+}
+
 extern int mdio_mii_ioctl(const struct mdio_if_info *mdio,
 			  struct mii_ioctl_data *mii_data, int cmd);
 
-- 
1.7.4.4


* [PATCH 2/2] net: sfc: falcon: use new api ethtool_{get|set}_link_ksettings
From: Philippe Reynes @ 2016-12-20 22:24 UTC (permalink / raw)
  To: linux-net-drivers, ecree, bkenward, davem, andrew
  Cc: netdev, linux-kernel, Philippe Reynes
In-Reply-To: <1482272667-1206-1-git-send-email-tremyfr@gmail.com>

The ethtool API {get|set}_settings is deprecated.
Move this driver to the new API {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
---
 drivers/net/ethernet/sfc/falcon/efx.c          |    2 +-
 drivers/net/ethernet/sfc/falcon/ethtool.c      |   35 ++++++++++++-------
 drivers/net/ethernet/sfc/falcon/mdio_10g.c     |   44 +++++++++++++++---------
 drivers/net/ethernet/sfc/falcon/mdio_10g.h     |    3 +-
 drivers/net/ethernet/sfc/falcon/net_driver.h   |   12 +++---
 drivers/net/ethernet/sfc/falcon/qt202x_phy.c   |    9 +++--
 drivers/net/ethernet/sfc/falcon/tenxpress.c    |   22 ++++++------
 drivers/net/ethernet/sfc/falcon/txc43128_phy.c |    9 +++--
 8 files changed, 80 insertions(+), 56 deletions(-)

diff --git a/drivers/net/ethernet/sfc/falcon/efx.c b/drivers/net/ethernet/sfc/falcon/efx.c
index 5c5cb3c..438ef9e 100644
--- a/drivers/net/ethernet/sfc/falcon/efx.c
+++ b/drivers/net/ethernet/sfc/falcon/efx.c
@@ -986,7 +986,7 @@ void ef4_mac_reconfigure(struct ef4_nic *efx)
 
 /* Push loopback/power/transmit disable settings to the PHY, and reconfigure
  * the MAC appropriately. All other PHY configuration changes are pushed
- * through phy_op->set_settings(), and pushed asynchronously to the MAC
+ * through phy_op->set_link_ksettings(), and pushed asynchronously to the MAC
  * through ef4_monitor().
  *
  * Callers must hold the mac_lock
diff --git a/drivers/net/ethernet/sfc/falcon/ethtool.c b/drivers/net/ethernet/sfc/falcon/ethtool.c
index 8e1929b..659ece7 100644
--- a/drivers/net/ethernet/sfc/falcon/ethtool.c
+++ b/drivers/net/ethernet/sfc/falcon/ethtool.c
@@ -115,44 +115,53 @@ static int ef4_ethtool_phys_id(struct net_device *net_dev,
 }
 
 /* This must be called with rtnl_lock held. */
-static int ef4_ethtool_get_settings(struct net_device *net_dev,
-				    struct ethtool_cmd *ecmd)
+static int
+ef4_ethtool_get_link_ksettings(struct net_device *net_dev,
+			       struct ethtool_link_ksettings *cmd)
 {
 	struct ef4_nic *efx = netdev_priv(net_dev);
 	struct ef4_link_state *link_state = &efx->link_state;
+	u32 supported;
 
 	mutex_lock(&efx->mac_lock);
-	efx->phy_op->get_settings(efx, ecmd);
+	efx->phy_op->get_link_ksettings(efx, cmd);
 	mutex_unlock(&efx->mac_lock);
 
+	ethtool_convert_link_mode_to_legacy_u32(&supported,
+						cmd->link_modes.supported);
+
 	/* Both MACs support pause frames (bidirectional and respond-only) */
-	ecmd->supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
+	supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
 
 	if (LOOPBACK_INTERNAL(efx)) {
-		ethtool_cmd_speed_set(ecmd, link_state->speed);
-		ecmd->duplex = link_state->fd ? DUPLEX_FULL : DUPLEX_HALF;
+		cmd->base.speed = link_state->speed;
+		cmd->base.duplex = link_state->fd ? DUPLEX_FULL : DUPLEX_HALF;
 	}
 
+	ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.supported,
+						supported);
+
 	return 0;
 }
 
 /* This must be called with rtnl_lock held. */
-static int ef4_ethtool_set_settings(struct net_device *net_dev,
-				    struct ethtool_cmd *ecmd)
+static int
+ef4_ethtool_set_link_ksettings(struct net_device *net_dev,
+			       const struct ethtool_link_ksettings *cmd)
 {
 	struct ef4_nic *efx = netdev_priv(net_dev);
 	int rc;
 
 	/* GMAC does not support 1000Mbps HD */
-	if ((ethtool_cmd_speed(ecmd) == SPEED_1000) &&
-	    (ecmd->duplex != DUPLEX_FULL)) {
+	if ((cmd->base.speed == SPEED_1000) &&
+	    (cmd->base.duplex != DUPLEX_FULL)) {
 		netif_dbg(efx, drv, efx->net_dev,
 			  "rejecting unsupported 1000Mbps HD setting\n");
 		return -EINVAL;
 	}
 
 	mutex_lock(&efx->mac_lock);
-	rc = efx->phy_op->set_settings(efx, ecmd);
+	rc = efx->phy_op->set_link_ksettings(efx, cmd);
 	mutex_unlock(&efx->mac_lock);
 	return rc;
 }
@@ -1310,8 +1319,6 @@ static int ef4_ethtool_get_module_info(struct net_device *net_dev,
 }
 
 const struct ethtool_ops ef4_ethtool_ops = {
-	.get_settings		= ef4_ethtool_get_settings,
-	.set_settings		= ef4_ethtool_set_settings,
 	.get_drvinfo		= ef4_ethtool_get_drvinfo,
 	.get_regs_len		= ef4_ethtool_get_regs_len,
 	.get_regs		= ef4_ethtool_get_regs,
@@ -1340,4 +1347,6 @@ static int ef4_ethtool_get_module_info(struct net_device *net_dev,
 	.set_rxfh		= ef4_ethtool_set_rxfh,
 	.get_module_info	= ef4_ethtool_get_module_info,
 	.get_module_eeprom	= ef4_ethtool_get_module_eeprom,
+	.get_link_ksettings	= ef4_ethtool_get_link_ksettings,
+	.set_link_ksettings	= ef4_ethtool_set_link_ksettings,
 };
diff --git a/drivers/net/ethernet/sfc/falcon/mdio_10g.c b/drivers/net/ethernet/sfc/falcon/mdio_10g.c
index e7d7c09..ee0713f 100644
--- a/drivers/net/ethernet/sfc/falcon/mdio_10g.c
+++ b/drivers/net/ethernet/sfc/falcon/mdio_10g.c
@@ -226,33 +226,45 @@ void ef4_mdio_set_mmds_lpower(struct ef4_nic *efx,
 }
 
 /**
- * ef4_mdio_set_settings - Set (some of) the PHY settings over MDIO.
+ * ef4_mdio_set_link_ksettings - Set (some of) the PHY settings over MDIO.
  * @efx:		Efx NIC
- * @ecmd:		New settings
+ * @cmd:		New settings
  */
-int ef4_mdio_set_settings(struct ef4_nic *efx, struct ethtool_cmd *ecmd)
+int ef4_mdio_set_link_ksettings(struct ef4_nic *efx,
+				const struct ethtool_link_ksettings *cmd)
 {
-	struct ethtool_cmd prev = { .cmd = ETHTOOL_GSET };
-
-	efx->phy_op->get_settings(efx, &prev);
-
-	if (ecmd->advertising == prev.advertising &&
-	    ethtool_cmd_speed(ecmd) == ethtool_cmd_speed(&prev) &&
-	    ecmd->duplex == prev.duplex &&
-	    ecmd->port == prev.port &&
-	    ecmd->autoneg == prev.autoneg)
+	struct ethtool_link_ksettings prev = {
+		.base.cmd = ETHTOOL_GLINKSETTINGS
+	};
+	u32 prev_advertising, advertising;
+	u32 prev_supported;
+
+	efx->phy_op->get_link_ksettings(efx, &prev);
+
+	ethtool_convert_link_mode_to_legacy_u32(&advertising,
+						cmd->link_modes.advertising);
+	ethtool_convert_link_mode_to_legacy_u32(&prev_advertising,
+						prev.link_modes.advertising);
+	ethtool_convert_link_mode_to_legacy_u32(&prev_supported,
+						prev.link_modes.supported);
+
+	if (advertising == prev_advertising &&
+	    cmd->base.speed == prev.base.speed &&
+	    cmd->base.duplex == prev.base.duplex &&
+	    cmd->base.port == prev.base.port &&
+	    cmd->base.autoneg == prev.base.autoneg)
 		return 0;
 
 	/* We can only change these settings for -T PHYs */
-	if (prev.port != PORT_TP || ecmd->port != PORT_TP)
+	if (prev.base.port != PORT_TP || cmd->base.port != PORT_TP)
 		return -EINVAL;
 
 	/* Check that PHY supports these settings */
-	if (!ecmd->autoneg ||
-	    (ecmd->advertising | SUPPORTED_Autoneg) & ~prev.supported)
+	if (!cmd->base.autoneg ||
+	    (advertising | SUPPORTED_Autoneg) & ~prev_supported)
 		return -EINVAL;
 
-	ef4_link_set_advertising(efx, ecmd->advertising | ADVERTISED_Autoneg);
+	ef4_link_set_advertising(efx, advertising | ADVERTISED_Autoneg);
 	ef4_mdio_an_reconfigure(efx);
 	return 0;
 }
diff --git a/drivers/net/ethernet/sfc/falcon/mdio_10g.h b/drivers/net/ethernet/sfc/falcon/mdio_10g.h
index 885cf7a..53cb5cc 100644
--- a/drivers/net/ethernet/sfc/falcon/mdio_10g.h
+++ b/drivers/net/ethernet/sfc/falcon/mdio_10g.h
@@ -83,7 +83,8 @@ void ef4_mdio_set_mmds_lpower(struct ef4_nic *efx, int low_power,
 			      unsigned int mmd_mask);
 
 /* Set (some of) the PHY settings over MDIO */
-int ef4_mdio_set_settings(struct ef4_nic *efx, struct ethtool_cmd *ecmd);
+int ef4_mdio_set_link_ksettings(struct ef4_nic *efx,
+				const struct ethtool_link_ksettings *cmd);
 
 /* Push advertising flags and restart autonegotiation */
 void ef4_mdio_an_reconfigure(struct ef4_nic *efx);
diff --git a/drivers/net/ethernet/sfc/falcon/net_driver.h b/drivers/net/ethernet/sfc/falcon/net_driver.h
index 210b28f..fe59dd6 100644
--- a/drivers/net/ethernet/sfc/falcon/net_driver.h
+++ b/drivers/net/ethernet/sfc/falcon/net_driver.h
@@ -684,8 +684,8 @@ static inline bool ef4_link_state_equal(const struct ef4_link_state *left,
  * @reconfigure: Reconfigure PHY (e.g. for new link parameters)
  * @poll: Update @link_state and report whether it changed.
  *	Serialised by the mac_lock.
- * @get_settings: Get ethtool settings. Serialised by the mac_lock.
- * @set_settings: Set ethtool settings. Serialised by the mac_lock.
+ * @get_link_ksettings: Get ethtool settings. Serialised by the mac_lock.
+ * @set_link_ksettings: Set ethtool settings. Serialised by the mac_lock.
  * @set_npage_adv: Set abilities advertised in (Extended) Next Page
  *	(only needed where AN bit is set in mmds)
  * @test_alive: Test that PHY is 'alive' (online)
@@ -700,10 +700,10 @@ struct ef4_phy_operations {
 	void (*remove) (struct ef4_nic *efx);
 	int (*reconfigure) (struct ef4_nic *efx);
 	bool (*poll) (struct ef4_nic *efx);
-	void (*get_settings) (struct ef4_nic *efx,
-			      struct ethtool_cmd *ecmd);
-	int (*set_settings) (struct ef4_nic *efx,
-			     struct ethtool_cmd *ecmd);
+	void (*get_link_ksettings)(struct ef4_nic *efx,
+				   struct ethtool_link_ksettings *cmd);
+	int (*set_link_ksettings)(struct ef4_nic *efx,
+				  const struct ethtool_link_ksettings *cmd);
 	void (*set_npage_adv) (struct ef4_nic *efx, u32);
 	int (*test_alive) (struct ef4_nic *efx);
 	const char *(*test_name) (struct ef4_nic *efx, unsigned int index);
diff --git a/drivers/net/ethernet/sfc/falcon/qt202x_phy.c b/drivers/net/ethernet/sfc/falcon/qt202x_phy.c
index d293316..f5e0f18 100644
--- a/drivers/net/ethernet/sfc/falcon/qt202x_phy.c
+++ b/drivers/net/ethernet/sfc/falcon/qt202x_phy.c
@@ -437,9 +437,10 @@ static int qt202x_phy_reconfigure(struct ef4_nic *efx)
 	return 0;
 }
 
-static void qt202x_phy_get_settings(struct ef4_nic *efx, struct ethtool_cmd *ecmd)
+static void qt202x_phy_get_link_ksettings(struct ef4_nic *efx,
+					  struct ethtool_link_ksettings *cmd)
 {
-	mdio45_ethtool_gset(&efx->mdio, ecmd);
+	mdio45_ethtool_ksettings_get(&efx->mdio, cmd);
 }
 
 static void qt202x_phy_remove(struct ef4_nic *efx)
@@ -487,8 +488,8 @@ static int qt202x_phy_get_module_eeprom(struct ef4_nic *efx,
 	.poll		 = qt202x_phy_poll,
 	.fini		 = ef4_port_dummy_op_void,
 	.remove		 = qt202x_phy_remove,
-	.get_settings	 = qt202x_phy_get_settings,
-	.set_settings	 = ef4_mdio_set_settings,
+	.get_link_ksettings = qt202x_phy_get_link_ksettings,
+	.set_link_ksettings = ef4_mdio_set_link_ksettings,
 	.test_alive	 = ef4_mdio_test_alive,
 	.get_module_eeprom = qt202x_phy_get_module_eeprom,
 	.get_module_info = qt202x_phy_get_module_info,
diff --git a/drivers/net/ethernet/sfc/falcon/tenxpress.c b/drivers/net/ethernet/sfc/falcon/tenxpress.c
index acc548a..ff9b4e2 100644
--- a/drivers/net/ethernet/sfc/falcon/tenxpress.c
+++ b/drivers/net/ethernet/sfc/falcon/tenxpress.c
@@ -351,9 +351,6 @@ static int tenxpress_phy_reconfigure(struct ef4_nic *efx)
 	return 0;
 }
 
-static void
-tenxpress_get_settings(struct ef4_nic *efx, struct ethtool_cmd *ecmd);
-
 /* Poll for link state changes */
 static bool tenxpress_phy_poll(struct ef4_nic *efx)
 {
@@ -443,7 +440,8 @@ void tenxpress_set_id_led(struct ef4_nic *efx, enum ef4_led_mode mode)
 }
 
 static void
-tenxpress_get_settings(struct ef4_nic *efx, struct ethtool_cmd *ecmd)
+tenxpress_get_link_ksettings(struct ef4_nic *efx,
+			     struct ethtool_link_ksettings *cmd)
 {
 	u32 adv = 0, lpa = 0;
 	int reg;
@@ -455,20 +453,22 @@ void tenxpress_set_id_led(struct ef4_nic *efx, enum ef4_led_mode mode)
 	if (reg & MDIO_AN_10GBT_STAT_LP10G)
 		lpa |= ADVERTISED_10000baseT_Full;
 
-	mdio45_ethtool_gset_npage(&efx->mdio, ecmd, adv, lpa);
+	mdio45_ethtool_ksettings_get_npage(&efx->mdio, cmd, adv, lpa);
 
 	/* In loopback, the PHY automatically brings up the correct interface,
 	 * but doesn't advertise the correct speed. So override it */
 	if (LOOPBACK_EXTERNAL(efx))
-		ethtool_cmd_speed_set(ecmd, SPEED_10000);
+		cmd->base.speed = SPEED_10000;
 }
 
-static int tenxpress_set_settings(struct ef4_nic *efx, struct ethtool_cmd *ecmd)
+static int
+tenxpress_set_link_ksettings(struct ef4_nic *efx,
+			     const struct ethtool_link_ksettings *cmd)
 {
-	if (!ecmd->autoneg)
+	if (!cmd->base.autoneg)
 		return -EINVAL;
 
-	return ef4_mdio_set_settings(efx, ecmd);
+	return ef4_mdio_set_link_ksettings(efx, cmd);
 }
 
 static void sfx7101_set_npage_adv(struct ef4_nic *efx, u32 advertising)
@@ -485,8 +485,8 @@ static void sfx7101_set_npage_adv(struct ef4_nic *efx, u32 advertising)
 	.poll             = tenxpress_phy_poll,
 	.fini             = sfx7101_phy_fini,
 	.remove		  = tenxpress_phy_remove,
-	.get_settings	  = tenxpress_get_settings,
-	.set_settings	  = tenxpress_set_settings,
+	.get_link_ksettings = tenxpress_get_link_ksettings,
+	.set_link_ksettings = tenxpress_set_link_ksettings,
 	.set_npage_adv    = sfx7101_set_npage_adv,
 	.test_alive	  = ef4_mdio_test_alive,
 	.test_name	  = sfx7101_test_name,
diff --git a/drivers/net/ethernet/sfc/falcon/txc43128_phy.c b/drivers/net/ethernet/sfc/falcon/txc43128_phy.c
index 18421f5..3c55fd2 100644
--- a/drivers/net/ethernet/sfc/falcon/txc43128_phy.c
+++ b/drivers/net/ethernet/sfc/falcon/txc43128_phy.c
@@ -540,9 +540,10 @@ static int txc43128_run_tests(struct ef4_nic *efx, int *results, unsigned flags)
 	return rc;
 }
 
-static void txc43128_get_settings(struct ef4_nic *efx, struct ethtool_cmd *ecmd)
+static void txc43128_get_link_ksettings(struct ef4_nic *efx,
+					struct ethtool_link_ksettings *cmd)
 {
-	mdio45_ethtool_gset(&efx->mdio, ecmd);
+	mdio45_ethtool_ksettings_get(&efx->mdio, cmd);
 }
 
 const struct ef4_phy_operations falcon_txc_phy_ops = {
@@ -552,8 +553,8 @@ static void txc43128_get_settings(struct ef4_nic *efx, struct ethtool_cmd *ecmd)
 	.poll		= txc43128_phy_poll,
 	.fini		= txc43128_phy_fini,
 	.remove		= txc43128_phy_remove,
-	.get_settings	= txc43128_get_settings,
-	.set_settings	= ef4_mdio_set_settings,
+	.get_link_ksettings = txc43128_get_link_ksettings,
+	.set_link_ksettings = ef4_mdio_set_link_ksettings,
 	.test_alive	= ef4_mdio_test_alive,
 	.run_tests	= txc43128_run_tests,
 	.test_name	= txc43128_test_name,
-- 
1.7.4.4

^ permalink raw reply related

* [PATCH v5] net: dummy: Introduce dummy virtual functions
From: Phil Sutter @ 2016-12-20 22:26 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Sabrina Dubroca

The idea for this was born when testing VF support in iproute2 which was
impeded by hardware requirements. In fact, not every VF-capable hardware
driver implements all netdev ops, so testing the interface is still hard
to do even with a well-sorted hardware shelf.

To overcome this and allow for testing the user-kernel interface, this
patch allows turning dummy into a PF with a configurable number of VFs.

Due to the assumption that all PFs are PCI devices, this implementation
is not completely straightforward: to allow rtnl_fill_ifinfo() to see
the dummy VFs, a fake PCI parent device is attached to the dummy netdev.
This has to happen at the right spot so register_netdevice() does not
get confused. This patch abuses the ndo_fix_features callback for that.
In the ndo_uninit callback, the fake parent is removed again for the
same purpose.

Joint work with Sabrina Dubroca.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
Changes since v4:
- Initialize pci_pdev.sriov at runtime - older gcc versions don't allow
  initializing fields of anonymous unions at declaration time.
- Rebased onto current net-next/master.
  
Changes since v3:
- Changed type of vf_mac field from unsigned char to u8.
- Column-aligned structs' field names.

Changes since v2:
- Fixed oops on reboot (need to initialize parent device mutex).
- Got rid of potential mem leak noticed by Eric Dumazet.
- Dropped stray newline insertion.

Changes since v1:
- Fixed issues reported by kbuild test robot:
  - pci_dev->sriov is only present if CONFIG_PCI_ATS is active.
  - pci_bus_type does not exist if CONFIG_PCI is not defined.
---
 drivers/net/dummy.c | 205 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 203 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index 6421835f11b7e..7f8d8598bbbfe 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -34,6 +34,8 @@
 #include <linux/etherdevice.h>
 #include <linux/init.h>
 #include <linux/moduleparam.h>
+#include <linux/pci.h>
+#include "../pci/pci.h"		/* for struct pci_sriov */
 #include <linux/rtnetlink.h>
 #include <net/rtnetlink.h>
 #include <linux/u64_stats_sync.h>
@@ -42,6 +44,34 @@
 #define DRV_VERSION	"1.0"
 
 static int numdummies = 1;
+static int num_vfs;
+
+static struct pci_sriov pdev_sriov;
+
+static struct pci_dev pci_pdev = {
+	.is_physfn = 0,
+#ifdef CONFIG_PCI
+	.dev.bus = &pci_bus_type,
+#endif
+};
+
+struct vf_data_storage {
+	u8	vf_mac[ETH_ALEN];
+	u16	pf_vlan; /* When set, guest VLAN config not allowed. */
+	u16	pf_qos;
+	__be16	vlan_proto;
+	u16	min_tx_rate;
+	u16	max_tx_rate;
+	u8	spoofchk_enabled;
+	bool	rss_query_enabled;
+	u8	trusted;
+	int	link_state;
+};
+
+struct dummy_priv {
+	int			num_vfs;
+	struct vf_data_storage	*vfinfo;
+};
 
 /* fake multicast ability */
 static void set_multicast_list(struct net_device *dev)
@@ -91,15 +121,31 @@ static netdev_tx_t dummy_xmit(struct sk_buff *skb, struct net_device *dev)
 
 static int dummy_dev_init(struct net_device *dev)
 {
+	struct dummy_priv *priv = netdev_priv(dev);
+
 	dev->dstats = netdev_alloc_pcpu_stats(struct pcpu_dstats);
 	if (!dev->dstats)
 		return -ENOMEM;
 
+	priv->num_vfs = num_vfs;
+	priv->vfinfo = NULL;
+
+	if (!num_vfs)
+		return 0;
+
+	priv->vfinfo = kcalloc(num_vfs, sizeof(struct vf_data_storage),
+			       GFP_KERNEL);
+	if (!priv->vfinfo) {
+		free_percpu(dev->dstats);
+		return -ENOMEM;
+	}
+
 	return 0;
 }
 
 static void dummy_dev_uninit(struct net_device *dev)
 {
+	dev->dev.parent = NULL;
 	free_percpu(dev->dstats);
 }
 
@@ -112,6 +158,137 @@ static int dummy_change_carrier(struct net_device *dev, bool new_carrier)
 	return 0;
 }
 
+/* fake, just to set fake PCI parent after netdev_register_kobject() */
+static netdev_features_t dummy_fix_features(struct net_device *dev,
+					    netdev_features_t features)
+{
+	struct dummy_priv *priv = netdev_priv(dev);
+
+	if (priv->num_vfs) {
+#ifdef CONFIG_PCI_ATS
+		pci_pdev.sriov = &pdev_sriov;
+#endif
+		dev->dev.parent = &pci_pdev.dev;
+		if (!pci_pdev.is_physfn) {
+			mutex_init(&pci_pdev.dev.mutex);
+			pci_pdev.is_physfn = 1;
+		}
+	}
+
+	return features;
+}
+
+static int dummy_set_vf_mac(struct net_device *dev, int vf, u8 *mac)
+{
+	struct dummy_priv *priv = netdev_priv(dev);
+
+	if (!is_valid_ether_addr(mac) || (vf >= priv->num_vfs))
+		return -EINVAL;
+
+	memcpy(priv->vfinfo[vf].vf_mac, mac, ETH_ALEN);
+
+	return 0;
+}
+
+static int dummy_set_vf_vlan(struct net_device *dev, int vf,
+			     u16 vlan, u8 qos, __be16 vlan_proto)
+{
+	struct dummy_priv *priv = netdev_priv(dev);
+
+	if ((vf >= priv->num_vfs) || (vlan > 4095) || (qos > 7))
+		return -EINVAL;
+
+	priv->vfinfo[vf].pf_vlan = vlan;
+	priv->vfinfo[vf].pf_qos = qos;
+	priv->vfinfo[vf].vlan_proto = vlan_proto;
+
+	return 0;
+}
+
+static int dummy_set_vf_rate(struct net_device *dev, int vf, int min, int max)
+{
+	struct dummy_priv *priv = netdev_priv(dev);
+
+	if (vf >= priv->num_vfs)
+		return -EINVAL;
+
+	priv->vfinfo[vf].min_tx_rate = min;
+	priv->vfinfo[vf].max_tx_rate = max;
+
+	return 0;
+}
+
+static int dummy_set_vf_spoofchk(struct net_device *dev, int vf, bool val)
+{
+	struct dummy_priv *priv = netdev_priv(dev);
+
+	if (vf >= priv->num_vfs)
+		return -EINVAL;
+
+	priv->vfinfo[vf].spoofchk_enabled = val;
+
+	return 0;
+}
+
+static int dummy_set_vf_rss_query_en(struct net_device *dev, int vf, bool val)
+{
+	struct dummy_priv *priv = netdev_priv(dev);
+
+	if (vf >= priv->num_vfs)
+		return -EINVAL;
+
+	priv->vfinfo[vf].rss_query_enabled = val;
+
+	return 0;
+}
+
+static int dummy_set_vf_trust(struct net_device *dev, int vf, bool val)
+{
+	struct dummy_priv *priv = netdev_priv(dev);
+
+	if (vf >= priv->num_vfs)
+		return -EINVAL;
+
+	priv->vfinfo[vf].trusted = val;
+
+	return 0;
+}
+
+static int dummy_get_vf_config(struct net_device *dev,
+			       int vf, struct ifla_vf_info *ivi)
+{
+	struct dummy_priv *priv = netdev_priv(dev);
+
+	if (vf >= priv->num_vfs)
+		return -EINVAL;
+
+	ivi->vf = vf;
+	memcpy(&ivi->mac, priv->vfinfo[vf].vf_mac, ETH_ALEN);
+	ivi->vlan = priv->vfinfo[vf].pf_vlan;
+	ivi->qos = priv->vfinfo[vf].pf_qos;
+	ivi->spoofchk = priv->vfinfo[vf].spoofchk_enabled;
+	ivi->linkstate = priv->vfinfo[vf].link_state;
+	ivi->min_tx_rate = priv->vfinfo[vf].min_tx_rate;
+	ivi->max_tx_rate = priv->vfinfo[vf].max_tx_rate;
+	ivi->rss_query_en = priv->vfinfo[vf].rss_query_enabled;
+	ivi->trusted = priv->vfinfo[vf].trusted;
+	ivi->vlan_proto = priv->vfinfo[vf].vlan_proto;
+
+	return 0;
+}
+
+static int dummy_set_vf_link_state(struct net_device *dev, int vf, int state)
+{
+	struct dummy_priv *priv = netdev_priv(dev);
+
+	if (vf >= priv->num_vfs)
+		return -EINVAL;
+
+	priv->vfinfo[vf].link_state = state;
+
+	return 0;
+}
+
 static const struct net_device_ops dummy_netdev_ops = {
 	.ndo_init		= dummy_dev_init,
 	.ndo_uninit		= dummy_dev_uninit,
@@ -121,6 +298,15 @@ static const struct net_device_ops dummy_netdev_ops = {
 	.ndo_set_mac_address	= eth_mac_addr,
 	.ndo_get_stats64	= dummy_get_stats64,
 	.ndo_change_carrier	= dummy_change_carrier,
+	.ndo_fix_features	= dummy_fix_features,
+	.ndo_set_vf_mac		= dummy_set_vf_mac,
+	.ndo_set_vf_vlan	= dummy_set_vf_vlan,
+	.ndo_set_vf_rate	= dummy_set_vf_rate,
+	.ndo_set_vf_spoofchk	= dummy_set_vf_spoofchk,
+	.ndo_set_vf_trust	= dummy_set_vf_trust,
+	.ndo_get_vf_config	= dummy_get_vf_config,
+	.ndo_set_vf_link_state	= dummy_set_vf_link_state,
+	.ndo_set_vf_rss_query_en = dummy_set_vf_rss_query_en,
 };
 
 static void dummy_get_drvinfo(struct net_device *dev,
@@ -134,6 +320,14 @@ static const struct ethtool_ops dummy_ethtool_ops = {
 	.get_drvinfo            = dummy_get_drvinfo,
 };
 
+static void dummy_free_netdev(struct net_device *dev)
+{
+	struct dummy_priv *priv = netdev_priv(dev);
+
+	kfree(priv->vfinfo);
+	free_netdev(dev);
+}
+
 static void dummy_setup(struct net_device *dev)
 {
 	ether_setup(dev);
@@ -141,7 +335,7 @@ static void dummy_setup(struct net_device *dev)
 	/* Initialize the device structure. */
 	dev->netdev_ops = &dummy_netdev_ops;
 	dev->ethtool_ops = &dummy_ethtool_ops;
-	dev->destructor = free_netdev;
+	dev->destructor = dummy_free_netdev;
 
 	/* Fill in device structure with ethernet-generic values. */
 	dev->flags |= IFF_NOARP;
@@ -172,6 +366,7 @@ static int dummy_validate(struct nlattr *tb[], struct nlattr *data[])
 
 static struct rtnl_link_ops dummy_link_ops __read_mostly = {
 	.kind		= DRV_NAME,
+	.priv_size	= sizeof(struct dummy_priv),
 	.setup		= dummy_setup,
 	.validate	= dummy_validate,
 };
@@ -180,12 +375,16 @@ static struct rtnl_link_ops dummy_link_ops __read_mostly = {
 module_param(numdummies, int, 0);
 MODULE_PARM_DESC(numdummies, "Number of dummy pseudo devices");
 
+module_param(num_vfs, int, 0);
+MODULE_PARM_DESC(num_vfs, "Number of dummy VFs per dummy device");
+
 static int __init dummy_init_one(void)
 {
 	struct net_device *dev_dummy;
 	int err;
 
-	dev_dummy = alloc_netdev(0, "dummy%d", NET_NAME_UNKNOWN, dummy_setup);
+	dev_dummy = alloc_netdev(sizeof(struct dummy_priv),
+				 "dummy%d", NET_NAME_UNKNOWN, dummy_setup);
 	if (!dev_dummy)
 		return -ENOMEM;
 
@@ -204,6 +403,8 @@ static int __init dummy_init_module(void)
 {
 	int i, err = 0;
 
+	pdev_sriov.num_VFs = num_vfs;
+
 	rtnl_lock();
 	err = __rtnl_link_register(&dummy_link_ops);
 	if (err < 0)
-- 
2.11.0

^ permalink raw reply related

* [PATCH] net: qcom/emac: add ethtool support
From: Timur Tabi @ 2016-12-20 22:32 UTC (permalink / raw)
  To: David Miller, Florian Fainelli, netdev, Christopher Covington,
	Alok Chauhan

Add support for some ethtool methods: get/set link settings, get/set
message level, get statistics, get link status, and restart
autonegotiation.

Signed-off-by: Timur Tabi <timur@codeaurora.org>
---
 drivers/net/ethernet/qualcomm/emac/Makefile       |   2 +-
 drivers/net/ethernet/qualcomm/emac/emac-ethtool.c | 156 ++++++++++++++++++++++
 drivers/net/ethernet/qualcomm/emac/emac.c         |  51 ++++---
 drivers/net/ethernet/qualcomm/emac/emac.h         |   3 +
 4 files changed, 191 insertions(+), 21 deletions(-)
 create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-ethtool.c

diff --git a/drivers/net/ethernet/qualcomm/emac/Makefile b/drivers/net/ethernet/qualcomm/emac/Makefile
index 7a66879..fc57ced 100644
--- a/drivers/net/ethernet/qualcomm/emac/Makefile
+++ b/drivers/net/ethernet/qualcomm/emac/Makefile
@@ -4,6 +4,6 @@
 
 obj-$(CONFIG_QCOM_EMAC) += qcom-emac.o
 
-qcom-emac-objs := emac.o emac-mac.o emac-phy.o emac-sgmii.o \
+qcom-emac-objs := emac.o emac-mac.o emac-phy.o emac-sgmii.o emac-ethtool.o \
 		  emac-sgmii-fsm9900.o emac-sgmii-qdf2432.o \
 		  emac-sgmii-qdf2400.o
diff --git a/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c b/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c
new file mode 100644
index 0000000..6de5152
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/emac/emac-ethtool.c
@@ -0,0 +1,156 @@
+/* Copyright (c) 2016, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/ethtool.h>
+#include <linux/phy.h>
+
+#include "emac.h"
+
+static const char * const emac_ethtool_stat_strings[] = {
+	"rx_ok",
+	"rx_bcast",
+	"rx_mcast",
+	"rx_pause",
+	"rx_ctrl",
+	"rx_fcs_err",
+	"rx_len_err",
+	"rx_byte_cnt",
+	"rx_runt",
+	"rx_frag",
+	"rx_sz_64",
+	"rx_sz_65_127",
+	"rx_sz_128_255",
+	"rx_sz_256_511",
+	"rx_sz_512_1023",
+	"rx_sz_1024_1518",
+	"rx_sz_1519_max",
+	"rx_sz_ov",
+	"rx_rxf_ov",
+	"rx_align_err",
+	"rx_bcast_byte_cnt",
+	"rx_mcast_byte_cnt",
+	"rx_err_addr",
+	"rx_crc_align",
+	"rx_jabbers",
+	"tx_ok",
+	"tx_bcast",
+	"tx_mcast",
+	"tx_pause",
+	"tx_exc_defer",
+	"tx_ctrl",
+	"tx_defer",
+	"tx_byte_cnt",
+	"tx_sz_64",
+	"tx_sz_65_127",
+	"tx_sz_128_255",
+	"tx_sz_256_511",
+	"tx_sz_512_1023",
+	"tx_sz_1024_1518",
+	"tx_sz_1519_max",
+	"tx_1_col",
+	"tx_2_col",
+	"tx_late_col",
+	"tx_abort_col",
+	"tx_underrun",
+	"tx_rd_eop",
+	"tx_len_err",
+	"tx_trunc",
+	"tx_bcast_byte",
+	"tx_mcast_byte",
+	"tx_col",
+};
+
+#define EMAC_STATS_LEN	ARRAY_SIZE(emac_ethtool_stat_strings)
+
+static u32 emac_get_msglevel(struct net_device *netdev)
+{
+	struct emac_adapter *adpt = netdev_priv(netdev);
+
+	return adpt->msg_enable;
+}
+
+static void emac_set_msglevel(struct net_device *netdev, u32 data)
+{
+	struct emac_adapter *adpt = netdev_priv(netdev);
+
+	adpt->msg_enable = data;
+}
+
+static int emac_get_sset_count(struct net_device *netdev, int sset)
+{
+	switch (sset) {
+	case ETH_SS_STATS:
+		return EMAC_STATS_LEN;
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static void emac_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
+{
+	unsigned int i;
+
+	switch (stringset) {
+	case ETH_SS_STATS:
+		for (i = 0; i < EMAC_STATS_LEN; i++) {
+			strlcpy(data, emac_ethtool_stat_strings[i],
+				ETH_GSTRING_LEN);
+			data += ETH_GSTRING_LEN;
+		}
+		break;
+	}
+}
+
+static void emac_get_ethtool_stats(struct net_device *netdev,
+				   struct ethtool_stats *stats,
+				   u64 *data)
+{
+	struct emac_adapter *adpt = netdev_priv(netdev);
+
+	spin_lock(&adpt->stats.lock);
+
+	emac_update_hw_stats(adpt);
+	memcpy(data, &adpt->stats, EMAC_STATS_LEN * sizeof(u64));
+
+	spin_unlock(&adpt->stats.lock);
+}
+
+static int emac_nway_reset(struct net_device *netdev)
+{
+	struct phy_device *phydev = netdev->phydev;
+
+	if (!phydev)
+		return -ENODEV;
+
+	return genphy_restart_aneg(phydev);
+}
+
+static const struct ethtool_ops emac_ethtool_ops = {
+	.get_link_ksettings = phy_ethtool_get_link_ksettings,
+	.set_link_ksettings = phy_ethtool_set_link_ksettings,
+
+	.get_msglevel    = emac_get_msglevel,
+	.set_msglevel    = emac_set_msglevel,
+
+	.get_sset_count  = emac_get_sset_count,
+	.get_strings = emac_get_strings,
+	.get_ethtool_stats = emac_get_ethtool_stats,
+
+	.nway_reset = emac_nway_reset,
+
+	.get_link = ethtool_op_get_link,
+};
+
+void emac_set_ethtool_ops(struct net_device *netdev)
+{
+	netdev->ethtool_ops = &emac_ethtool_ops;
+}
diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c b/drivers/net/ethernet/qualcomm/emac/emac.c
index 422289c..1ab4478 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac.c
@@ -311,45 +311,55 @@ static int emac_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
 	return phy_mii_ioctl(netdev->phydev, ifr, cmd);
 }
 
-/* Provide network statistics info for the interface */
-static struct rtnl_link_stats64 *emac_get_stats64(struct net_device *netdev,
-						  struct rtnl_link_stats64 *net_stats)
+/**
+ * emac_update_hw_stats - read the EMAC stat registers
+ *
+ * Reads the stats registers and write the values to adpt->stats.
+ *
+ * adpt->stats.lock must be held while calling this function.
+ */
+void emac_update_hw_stats(struct emac_adapter *adpt)
 {
-	struct emac_adapter *adpt = netdev_priv(netdev);
-	unsigned int addr = REG_MAC_RX_STATUS_BIN;
 	struct emac_stats *stats = &adpt->stats;
 	u64 *stats_itr = &adpt->stats.rx_ok;
-	u32 val;
-
-	spin_lock(&stats->lock);
+	void __iomem *base = adpt->base;
+	unsigned int addr;
 
+	addr = REG_MAC_RX_STATUS_BIN;
 	while (addr <= REG_MAC_RX_STATUS_END) {
-		val = readl_relaxed(adpt->base + addr);
-		*stats_itr += val;
+		*stats_itr += readl_relaxed(base + addr);
 		stats_itr++;
 		addr += sizeof(u32);
 	}
 
 	/* additional rx status */
-	val = readl_relaxed(adpt->base + EMAC_RXMAC_STATC_REG23);
-	adpt->stats.rx_crc_align += val;
-	val = readl_relaxed(adpt->base + EMAC_RXMAC_STATC_REG24);
-	adpt->stats.rx_jabbers += val;
+	stats->rx_crc_align += readl_relaxed(base + EMAC_RXMAC_STATC_REG23);
+	stats->rx_jabbers += readl_relaxed(base + EMAC_RXMAC_STATC_REG24);
 
 	/* update tx status */
 	addr = REG_MAC_TX_STATUS_BIN;
-	stats_itr = &adpt->stats.tx_ok;
+	stats_itr = &stats->tx_ok;
 
 	while (addr <= REG_MAC_TX_STATUS_END) {
-		val = readl_relaxed(adpt->base + addr);
-		*stats_itr += val;
-		++stats_itr;
+		*stats_itr += readl_relaxed(base + addr);
+		stats_itr++;
 		addr += sizeof(u32);
 	}
 
 	/* additional tx status */
-	val = readl_relaxed(adpt->base + EMAC_TXMAC_STATC_REG25);
-	adpt->stats.tx_col += val;
+	stats->tx_col += readl_relaxed(base + EMAC_TXMAC_STATC_REG25);
+}
+
+/* Provide network statistics info for the interface */
+static struct rtnl_link_stats64 *
+emac_get_stats64(struct net_device *netdev, struct rtnl_link_stats64 *net_stats)
+{
+	struct emac_adapter *adpt = netdev_priv(netdev);
+	struct emac_stats *stats = &adpt->stats;
+
+	spin_lock(&stats->lock);
+
+	emac_update_hw_stats(adpt);
 
 	/* return parsed statistics */
 	net_stats->rx_packets = stats->rx_ok;
@@ -620,6 +630,7 @@ static int emac_probe(struct platform_device *pdev)
 
 	dev_set_drvdata(&pdev->dev, netdev);
 	SET_NETDEV_DEV(netdev, &pdev->dev);
+	emac_set_ethtool_ops(netdev);
 
 	adpt = netdev_priv(netdev);
 	adpt->netdev = netdev;
diff --git a/drivers/net/ethernet/qualcomm/emac/emac.h b/drivers/net/ethernet/qualcomm/emac/emac.h
index 0c76e6c..4b8483c 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac.h
+++ b/drivers/net/ethernet/qualcomm/emac/emac.h
@@ -332,4 +332,7 @@ struct emac_adapter {
 void emac_reg_update32(void __iomem *addr, u32 mask, u32 val);
 irqreturn_t emac_isr(int irq, void *data);
 
+void emac_set_ethtool_ops(struct net_device *netdev);
+void emac_update_hw_stats(struct emac_adapter *adpt);
+
 #endif /* _EMAC_H_ */
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related

* Re: HalfSipHash Acceptable Usage
From: George Spelvin @ 2016-12-20 23:07 UTC (permalink / raw)
  To: Jason, tytso
  Cc: ak, davem, David.Laight, djb, ebiggers3, hannes,
	jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, linux, luto, netdev, tom, torvalds, vegard.nossum
In-Reply-To: <20161220213636.tiqj2o4uupasr4aj@thunk.org>

Theodore Ts'o wrote:
> On Mon, Dec 19, 2016 at 06:32:44PM +0100, Jason A. Donenfeld wrote:
>> 1) Anything that requires actual long-term security will use
>> SipHash2-4, with the 64-bit output and the 128-bit key. This includes
>> things like TCP sequence numbers. This seems pretty uncontroversial to
>> me. Seem okay to you?

> Um, why do TCP sequence numbers need long-term security?  So long as
> you rekey every 5 minutes or so, TCP sequence numbers don't need any
> more security than that, since even if you break the key used to
> generate initial sequence numbers even a minute or two later, any
> pending TCP connections will have timed out long before.
> 
> See the security analysis done in RFC 6528[1], where among other
> things, it points out why MD5 is acceptable with periodic rekeying,
> although there is the concern that this could break certain heuristics
> used when establishing new connections during the TIME-WAIT state.

Because we don't rekey TCP sequence numbers, ever.  See commit
6e5714eaf77d79ae1c8b47e3e040ff5411b717ec

To rekey them requires dividing the sequence number base into a "random"
part and some "generation" msbits.  While we can do better than the
previous 8+24 split (I'd suggest 4+28 or 3+29), only 2 is tricky, and
1 generation bit isn't enough.

So while it helps in the long term, it reduces the security offered by
the random part in the short term.  (If I know 4 bits of your ISN,
I only need to send 256 MB to hit your TCP window.)
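
To make the parenthetical concrete, here is a rough sketch of the
arithmetic. It assumes one sequence number per payload byte and that
knowing k generation bits leaves 32 - k bits of the ISN to guess. The
helper name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many sequence numbers (one per payload byte)
 * remain to be covered when `known_bits` of a 32-bit ISN are known. */
static uint64_t isn_search_space(unsigned int known_bits)
{
	assert(known_bits <= 32);
	return (uint64_t)1 << (32 - known_bits);
}
```

With known_bits = 4 this returns 2^28 = 268435456 bytes, i.e. 256 MiB;
with the old 8+24 split it would be 2^24 bytes, i.e. 16 MiB.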

At the time, I objected, and suggested doing two hashes, with a fixed
32-bit base plus a split rekeyed portion, but that was vetoed on the
grounds of performance.

On further consideration, the fixed base doesn't help much.
(Details below for anyone that cares.)

Suppose we let the TCP initial sequence number be:

(Hash(<srcIP,dstIP,srcPort,dstPort>, fixed_key) & 0xffffffff) +
(i << 28) + (Hash(<srcIP,dstIP,srcPort,dstPort>, key[i]) & 0x0fffffff) +
(current_time_in_nanoseconds / 64)
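
As a sketch only, the formula above can be written out in C as below.
toy_hash() is a made-up placeholder mixer, not SipHash and not secure;
struct flow, isn() and the 4+28 constants are likewise assumptions for
illustration, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define GEN_BITS	4			/* the 4+28 split */
#define RAND_BITS	(32 - GEN_BITS)
#define RAND_MASK	((1u << RAND_BITS) - 1)	/* 0x0fffffff */

struct flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Placeholder keyed mixer standing in for Hash(<tuple>, key).
 * NOT SipHash and NOT secure; it only lets the arithmetic run. */
static uint64_t toy_hash(const struct flow *f, uint64_t key)
{
	uint32_t ports = ((uint32_t)f->sport << 16) | f->dport;
	uint64_t h = key ^ 0x9e3779b97f4a7c15ull;

	h = (h ^ f->saddr) * 0xff51afd7ed558ccdull;
	h = (h ^ f->daddr) * 0xc4ceb9fe1a85ec53ull;
	h = (h ^ ports) * 0xff51afd7ed558ccdull;
	return h ^ (h >> 33);
}

/* ISN = fixed-key hash + generation bits + rekeyed 28-bit hash + clock.
 * All additions wrap modulo 2^32. */
static uint32_t isn(const struct flow *f, uint64_t fixed_key,
		    const uint64_t *gen_keys, unsigned int gen,
		    uint64_t now_ns)
{
	uint32_t base, rekeyed;

	assert(gen < (1u << GEN_BITS));
	base = (uint32_t)toy_hash(f, fixed_key);
	rekeyed = (uint32_t)toy_hash(f, gen_keys[gen]) & RAND_MASK;

	return base + ((uint32_t)gen << RAND_BITS) + rekeyed +
	       (uint32_t)(now_ns / 64);
}
```

For a fixed tuple and generation, two ISNs taken 6400 ns apart differ by
exactly 100, which is why an observer who knows the timing can subtract
the clock term.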

It's not hugely difficult to mount an effective attack against a
64-bit fixed_key.

As an attacker, I can ask the target to send me these numbers for dstPort
values I control and other values I know.  I can (with high probability)
detect the large jumps when the generation changes, so I can make a
significant number of queries with the same generation.  After 23-ish
queries, I have enough information to identify a 64-bit fixed_key.

I don't know the current generation counter "i", but I know it's the
same for all my queries, so for any two queries, the maximum difference
between the 28-bit hash values is 29 bits.  (We can also add a small
margin to allow for timing uncertainty, but that's even less.)

So if I guess a fixed key, hash my known plaintexts with that guess,
subtract the ciphertexts from the observed sequence numbers, and the
difference between the remaining (unknown) 28-bit hash values plus
timestamps exceeds what's possible, my guess is wrong.

I can then repeat with additional known plaintexts, reducing the space
of admissible keys by about 3 bits each time.
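
A minimal sketch of that rejection test, under assumptions: the caller
supplies the low 32 bits of Hash(tuple, guessed_key) for each query, all
queries fall in one generation, and each new residual is compared only
against the first. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RAND_BITS 28

/* What remains of an observed ISN once the guessed fixed-key hash and
 * the clock term are removed.  If the guess is right this equals
 * (i << 28) + the 28-bit rekeyed hash, modulo 2^32. */
static uint32_t residual(uint32_t observed_isn, uint32_t guess_hash,
			 uint64_t ns)
{
	return observed_isn - guess_hash - (uint32_t)(ns / 64);
}

/* Residuals from the same generation differ by less than 2^28 in either
 * direction (mod 2^32), plus `slack` for timing uncertainty.  That
 * admits roughly 2^29 of the 2^32 possible differences. */
static bool pair_admissible(uint32_t ra, uint32_t rb, uint32_t slack)
{
	uint32_t d = ra - rb;
	uint32_t bound = (1u << RAND_BITS) + slack;

	return d < bound || (uint32_t)(0u - d) < bound;
}

/* A guessed key survives only if every residual is consistent with the
 * first one. */
static bool guess_survives(const uint32_t *res, unsigned int n,
			   uint32_t slack)
{
	unsigned int i;

	for (i = 1; i < n; i++)
		if (!pair_admissible(res[0], res[i], slack))
			return false;
	return true;
}
```

A wrong guess yields an effectively random difference, which passes with
probability of about 2^29 / 2^32 = 1/8. That is the source of the 3 bits
per known plaintext, and 64 / 3 rounds up to 22 differences.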

Assuming I can rent GPU horsepower from a bitcoin miner to do this in a
reasonable period of time, after 22 known plaintext differences, I have
uniquely identified the key.

Of course, what I'd do in practice is a first pass with maybe 6 plaintexts
on the GPU, and then deal with the candidates found in a second pass.
But either way, it's about 2.3 SipHash evaluations per key tested.
As I noted earlier, a bitcoin blockchain block, worth 25 bitcoins,
currently costs 2^71 evaluations of SHA-2 (2^70 evaluations of double
SHA-2), and that's accomplished every 10 minutes, so this is definitely
practical.
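
For scale, a back-of-the-envelope sketch using only the figures quoted
above (2^64 keys, 2.3 evaluations per key, 2^71 evaluations per 10
minutes). It assumes, loosely, that one SipHash evaluation costs about as
much as one SHA-2 evaluation and that the whole network's rate is
available; the function name is hypothetical.

```c
#include <assert.h>

/* Seconds needed to test 2^key_bits keys at `evals_per_key` evaluations
 * each, given a rate of 2^rate_log2 evaluations per `period_s` seconds.
 * Only handles key_bits <= rate_log2. */
static double search_seconds(unsigned int key_bits, double evals_per_key,
			     unsigned int rate_log2, double period_s)
{
	assert(key_bits <= rate_log2 && rate_log2 - key_bits < 64);
	return evals_per_key * period_s /
	       (double)(1ull << (rate_log2 - key_bits));
}
```

search_seconds(64, 2.3, 71, 600.0) comes to 2.3 * 600 / 128, or about 11
seconds of the network's total hash rate.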

^ permalink raw reply
