Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next 02/10] net: netcp: ethss: add support of 10gbe pcsr link status
From: Rob Herring @ 2016-12-22 22:30 UTC (permalink / raw)
  To: Murali Karicheri
  Cc: netdev, linux-omap, grygorii.strashko, mugunthanvnm, linux-kernel,
	arnd, davem, devicetree, mark.rutland
In-Reply-To: <1482271793-7671-3-git-send-email-m-karicheri2@ti.com>

On Tue, Dec 20, 2016 at 05:09:45PM -0500, Murali Karicheri wrote:
> From: WingMan Kwok <w-kwok2@ti.com>
> 
> The 10GBASE-R Physical Coding Sublayer (PCS-R) module provides
> functionality of a physical coding sublayer (PCS) on data being
> transferred between a demuxed XGMII and SerDes supporting a 16
> or 32 bit interface.  From the driver point of view, whether
> a ethernet link is up or not depends also on the status of the
> block-lock bit of the PCSR.  This patch adds the checking of that
> bit in order to determine the link status.

I would think this would be a common thing and the phy driver should 
provide the status, rather than trying to give the ethernet driver 
direct access to the phy registers. Is the PCSR the serdes phy or 
registers in addition to that?

Rob

^ permalink raw reply

* Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: Hannes Frederic Sowa @ 2016-12-22 21:38 UTC (permalink / raw)
  To: George Spelvin, luto
  Cc: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, Jason,
	jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, netdev, tom, torvalds, tytso, vegard.nossum
In-Reply-To: <20161222211140.2816.qmail@ns.sciencehorizons.net>

On 22.12.2016 22:11, George Spelvin wrote:
>> I do tend to like Ted's version in which we use batched
>> get_random_bytes() output.  If it's fast enough, it's simpler and lets
>> us get the full strength of a CSPRNG.
> 
> With the ChaCha20 generator, that's fine, although note that this abandons
> anti-backtracking entirely.
> 
> It also takes locks, something the previous get_random_int code
> path avoided.  Do we need to audit the call sites to ensure that's safe?

We have spin_lock_irq* locks on the way. Of course they can hurt in when
contended. The situation should be the same as with get_random_bytes,
callable from every possible situation in the kernel without fear, so
this should also work for get_random_int. A lockdep test should still be
done. ;)

> And there is the issue that the existing callers assume that there's a
> fixed cost per word.  A good half of get_random_long calls are followed by
> "& ~PAGE_MASK" to extract the low 12 bits.  Or "& ((1ul << mmap_rnd_bits)
> - 1)" to extract the low 28.  If we have a buffer we're going to have to
> pay to refill, it would be nice to use less than 8 bytes to satisfy those.
> 
> But that can be a followup patch.  I'm thinking
> 
> unsigned long get_random_bits(unsigned bits)
> 	E.g. get_random_bits(PAGE_SHIFT),
> 	     get_random_bits(mmap_rnd_bits),
> 	u32 imm_rnd = get_random_bits(32)
> 
> unsigned get_random_mod(unsigned modulus)
> 	E.g. get_random_mod(hole) & ~(alignment - 1);
> 	     get_random_mod(port_scan_backoff)
> 	(Althogh probably drivers/s390/scsi/zfcp_fc.c should be changed
> 	to prandom.)
> 
> with, until the audit is completed:
> #define get_random_int() get_random_bits(32)
> #define get_random_long() get_random_bits(BITS_PER_LONG)

Yes, that does look nice indeed. Accounting for bits instead of bytes
shouldn't be a huge problem either. Maybe it gets a bit more verbose in
case you can't satisfy a request with one batched entropy block and have
to consume randomness from two.

>> It could only mix the output back in every two calls, in which case
>> you can backtrack up to one call but you need to do 2^128 work to
>> backtrack farther.  But yes, this is getting excessively complicated.
> 
> No, if you're willing to accept limited backtrack, this is a perfectly
> acceptable solution, and not too complicated.  You could do it phase-less
> if you like; store the previous output, then after generating the new
> one, mix in both.  Then overwrite the previous output.  (But doing two
> rounds of a crypto primtive to avoid one conditional jump is stupid,
> so forget that.)

Can you quickly explain why we lose the backtracking capability?

ChaCha as a block cipher gives a "perfect" permutation from the output
of either the CRNG or the CPRNG, which actually itself has backtracking
protection.

Thanks for explaining,
Hannes

^ permalink raw reply

* [PATCH 6/6 net-next] inet: reset tb->fastreuseport when adding a reuseport sk
From: Josef Bacik @ 2016-12-22 21:26 UTC (permalink / raw)
  To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev,
	kernel-team
In-Reply-To: <1482441998-28359-1-git-send-email-jbacik@fb.com>

If we have non reuseport sockets on a tb we will set tb->fastreuseport to 0 and
never set it again.  Which means that in the future if we end up adding a bunch
of reuseport sk's to that tb we'll have to do the expensive scan every time.
Instead add a sock_common to the tb so we know what reuseport sk succeeded last.
Once one sk has made it onto the list we know that there are no potential bind
conflicts on the owners list that match that sk's rcv_addr.  So copy the sk's
common into our tb->fastsock and set tb->fastruseport to FASTREUSESOCK_STRICT so
we know we have to do an extra check for subsequent reuseport sockets and skip
the expensive bind conflict check.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 include/net/inet_hashtables.h   |  4 ++++
 net/ipv4/inet_connection_sock.c | 53 +++++++++++++++++++++++++++++++++++++----
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 756ed16..4ccc18f 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -74,12 +74,16 @@ struct inet_ehash_bucket {
  * users logged onto your box, isn't it nice to know that new data
  * ports are created in O(1) time?  I thought so. ;-)	-DaveM
  */
+#define FASTREUSEPORT_ANY	1
+#define FASTREUSEPORT_STRICT	2
+
 struct inet_bind_bucket {
 	possible_net_t		ib_net;
 	unsigned short		port;
 	signed char		fastreuse;
 	signed char		fastreuseport;
 	kuid_t			fastuid;
+	struct sock_common	fastsock;
 	int			num_owners;
 	struct hlist_node	node;
 	struct hlist_head	owners;
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 923bd60..814382f 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -236,6 +236,32 @@ inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *
 	return head;
 }
 
+static inline int sk_reuseport_match(struct inet_bind_bucket *tb,
+				     struct sock *sk)
+{
+	struct sock *sk2 = (struct sock *)&tb->fastsock;
+	kuid_t uid = sock_i_uid(sk);
+
+	if (tb->fastreuseport <= 0)
+		return 0;
+	if (!sk->sk_reuseport)
+		return 0;
+	if (rcu_access_pointer(sk->sk_reuseport_cb))
+		return 0;
+	if (!uid_eq(tb->fastuid, uid))
+		return 0;
+	/* We only need to check the rcv_saddr if this tb was once marked
+	 * without fastreuseport and then was reset, as we can only know that
+	 * the fastsock has no potential bind conflicts with the rest of the
+	 * possible socks on the owners list.
+	 */
+	if (tb->fastreuseport == FASTREUSEPORT_ANY)
+		return 1;
+	if (!inet_rcv_saddr_equal(sk, sk2, true))
+		return 0;
+	return 1;
+}
+
 /* Obtain a reference to a local port for the given sock,
  * if snum is zero it means select any available local port.
  * We try to allocate an odd port (and leave even ports for connect())
@@ -275,9 +301,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 			goto success;
 
 		if ((tb->fastreuse > 0 && reuse) ||
-		     (tb->fastreuseport > 0 &&
-		      !rcu_access_pointer(sk->sk_reuseport_cb) &&
-		      sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
+		    sk_reuseport_match(tb, sk))
 			goto success;
 		if (inet_csk_bind_conflict(sk, tb, true, true))
 			goto fail_unlock;
@@ -288,14 +312,35 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		if (sk->sk_reuseport) {
 			tb->fastreuseport = 1;
 			tb->fastuid = uid;
+			memcpy(&tb->fastsock, &sk->__sk_common,
+			       sizeof(struct sock_common));
 		} else {
 			tb->fastreuseport = 0;
 		}
 	} else {
 		if (!reuse)
 			tb->fastreuse = 0;
-		if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid))
+		if (sk->sk_reuseport) {
+			/* We didn't match or we don't have fastreuseport set on
+			 * the tb, but we have sk_reuseport set on this socket
+			 * and we know that there are no bind conflicts with
+			 * this socket in this tb, so reset our tb's reuseport
+			 * settings so that any subsequent sockets that match
+			 * our current socket will be put on the fast path.
+			 *
+			 * If we reset we need to set FASTREUSEPORT_STRICT so we
+			 * do extra checking for all subsequent sk_reuseport
+			 * socks.
+			 */
+			if (!sk_reuseport_match(tb, sk)) {
+				tb->fastreuseport = FASTREUSEPORT_STRICT;
+				tb->fastuid = uid;
+				memcpy(&tb->fastsock, &sk->__sk_common,
+				       sizeof(struct sock_common));
+			}
+		} else {
 			tb->fastreuseport = 0;
+		}
 	}
 	if (!inet_csk(sk)->icsk_bind_hash)
 		inet_bind_hash(sk, tb, port);
-- 
2.5.5

^ permalink raw reply related

* [PATCH 3/6 net-next] inet: kill smallest_size and smallest_port
From: Josef Bacik @ 2016-12-22 21:26 UTC (permalink / raw)
  To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev,
	kernel-team
In-Reply-To: <1482441998-28359-1-git-send-email-jbacik@fb.com>

In inet_csk_get_port we seem to be using smallest_port to figure out where the
best place to look for a SO_REUSEPORT sk that matches with an existing set of
SO_REUSEPORT's.  However if we get to the logic

if (smallest_size != -1) {
	port = smallest_port;
	goto have_port;
}

we will do a useless search, because we would have already done the
inet_csk_bind_conflict for that port and it would have returned 1, otherwise we
would have gone to found_tb and succeeded.  Since this logic makes us do yet
another trip through inet_csk_bind_conflict for a port we know won't work just
delete this code and save us the time.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 net/ipv4/inet_connection_sock.c | 26 ++++----------------------
 1 file changed, 4 insertions(+), 22 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index a1c9055..d352366 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -165,7 +165,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 	bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
 	struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
 	int ret = 1, attempts = 5, port = snum;
-	int smallest_size = -1, smallest_port;
 	struct inet_bind_hashbucket *head;
 	struct net *net = sock_net(sk);
 	int i, low, high, attempt_half;
@@ -175,7 +174,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 	bool reuseport_ok = !!snum;
 
 	if (port) {
-have_port:
 		head = &hinfo->bhash[inet_bhashfn(net, port,
 						  hinfo->bhash_size)];
 		spin_lock_bh(&head->lock);
@@ -209,8 +207,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 	 * We do the opposite to not pollute connect() users.
 	 */
 	offset |= 1U;
-	smallest_size = -1;
-	smallest_port = low; /* avoid compiler warning */
 
 other_parity_scan:
 	port = low + offset;
@@ -224,15 +220,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		spin_lock_bh(&head->lock);
 		inet_bind_bucket_for_each(tb, &head->chain)
 			if (net_eq(ib_net(tb), net) && tb->port == port) {
-				if (((tb->fastreuse > 0 && reuse) ||
-				     (tb->fastreuseport > 0 &&
-				      sk->sk_reuseport &&
-				      !rcu_access_pointer(sk->sk_reuseport_cb) &&
-				      uid_eq(tb->fastuid, uid))) &&
-				    (tb->num_owners < smallest_size || smallest_size == -1)) {
-					smallest_size = tb->num_owners;
-					smallest_port = port;
-				}
 				if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok))
 					goto tb_found;
 				goto next_port;
@@ -243,10 +230,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		cond_resched();
 	}
 
-	if (smallest_size != -1) {
-		port = smallest_port;
-		goto have_port;
-	}
 	offset--;
 	if (!(offset & 1))
 		goto other_parity_scan;
@@ -268,19 +251,18 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		if (sk->sk_reuse == SK_FORCE_REUSE)
 			goto success;
 
-		if (((tb->fastreuse > 0 && reuse) ||
+		if ((tb->fastreuse > 0 && reuse) ||
 		     (tb->fastreuseport > 0 &&
 		      !rcu_access_pointer(sk->sk_reuseport_cb) &&
-		      sk->sk_reuseport && uid_eq(tb->fastuid, uid))) &&
-		    smallest_size == -1)
+		      sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
 			goto success;
 		if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) {
 			if ((reuse ||
 			     (tb->fastreuseport > 0 &&
 			      sk->sk_reuseport &&
 			      !rcu_access_pointer(sk->sk_reuseport_cb) &&
-			      uid_eq(tb->fastuid, uid))) &&
-			    !snum && smallest_size != -1 && --attempts >= 0) {
+			      uid_eq(tb->fastuid, uid))) && !snum &&
+			    --attempts >= 0) {
 				spin_unlock_bh(&head->lock);
 				goto again;
 			}
-- 
2.5.5

^ permalink raw reply related

* [PATCH 5/6 net-next] inet: split inet_csk_get_port into two functions
From: Josef Bacik @ 2016-12-22 21:26 UTC (permalink / raw)
  To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev,
	kernel-team
In-Reply-To: <1482441998-28359-1-git-send-email-jbacik@fb.com>

inet_csk_get_port does two different things, it either scans for an open port,
or it tries to see if the specified port is available for use.  Since these two
operations have different rules and are basically independent lets split them
into two different functions to make them both more readable.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 net/ipv4/inet_connection_sock.c | 66 +++++++++++++++++++++++++++--------------
 1 file changed, 44 insertions(+), 22 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index f6a34bc..923bd60 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -156,33 +156,21 @@ static int inet_csk_bind_conflict(const struct sock *sk,
 	return sk2 != NULL;
 }
 
-/* Obtain a reference to a local port for the given sock,
- * if snum is zero it means select any available local port.
- * We try to allocate an odd port (and leave even ports for connect())
+/*
+ * Find an open port number for the socket.  Returns with the
+ * inet_bind_hashbucket lock held.
  */
-int inet_csk_get_port(struct sock *sk, unsigned short snum)
+static struct inet_bind_hashbucket *
+inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *port_ret)
 {
-	bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
 	struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
-	int ret = 1, port = snum;
+	int port = 0;
 	struct inet_bind_hashbucket *head;
 	struct net *net = sock_net(sk);
 	int i, low, high, attempt_half;
 	struct inet_bind_bucket *tb;
-	kuid_t uid = sock_i_uid(sk);
 	u32 remaining, offset;
-	bool reuseport_ok = !!snum;
 
-	if (port) {
-		head = &hinfo->bhash[inet_bhashfn(net, port,
-						  hinfo->bhash_size)];
-		spin_lock_bh(&head->lock);
-		inet_bind_bucket_for_each(tb, &head->chain)
-			if (net_eq(ib_net(tb), net) && tb->port == port)
-				goto tb_found;
-
-		goto tb_not_found;
-	}
 	attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0;
 other_half_scan:
 	inet_get_local_port_range(net, &low, &high);
@@ -221,11 +209,12 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 			if (net_eq(ib_net(tb), net) && tb->port == port) {
 				if (hlist_empty(&tb->owners))
 					goto success;
-				if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok))
+				if (!inet_csk_bind_conflict(sk, tb, false, false))
 					goto success;
 				goto next_port;
 			}
-		goto tb_not_found;
+		tb = NULL;
+		goto success;
 next_port:
 		spin_unlock_bh(&head->lock);
 		cond_resched();
@@ -240,8 +229,41 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		attempt_half = 2;
 		goto other_half_scan;
 	}
-	return ret;
+	return NULL;
+success:
+	*port_ret = port;
+	*tb_ret = tb;
+	return head;
+}
+
+/* Obtain a reference to a local port for the given sock,
+ * if snum is zero it means select any available local port.
+ * We try to allocate an odd port (and leave even ports for connect())
+ */
+int inet_csk_get_port(struct sock *sk, unsigned short snum)
+{
+	bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
+	struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
+	int ret = 1, port = snum;
+	struct inet_bind_hashbucket *head;
+	struct net *net = sock_net(sk);
+	struct inet_bind_bucket *tb = NULL;
+	kuid_t uid = sock_i_uid(sk);
 
+	if (!port) {
+		head = inet_csk_find_open_port(sk, &tb, &port);
+		if (!head)
+			return ret;
+		if (!tb)
+			goto tb_not_found;
+		goto success;
+	}
+	head = &hinfo->bhash[inet_bhashfn(net, port,
+					  hinfo->bhash_size)];
+	spin_lock_bh(&head->lock);
+	inet_bind_bucket_for_each(tb, &head->chain)
+		if (net_eq(ib_net(tb), net) && tb->port == port)
+			goto tb_found;
 tb_not_found:
 	tb = inet_bind_bucket_create(hinfo->bind_bucket_cachep,
 				     net, head, port);
@@ -257,7 +279,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		      !rcu_access_pointer(sk->sk_reuseport_cb) &&
 		      sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
 			goto success;
-		if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok))
+		if (inet_csk_bind_conflict(sk, tb, true, true))
 			goto fail_unlock;
 	}
 success:
-- 
2.5.5

^ permalink raw reply related

* [PATCH 4/6 net-next] inet: don't check for bind conflicts twice when searching for a port
From: Josef Bacik @ 2016-12-22 21:26 UTC (permalink / raw)
  To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev,
	kernel-team
In-Reply-To: <1482441998-28359-1-git-send-email-jbacik@fb.com>

This is just wasted time, we've already found a tb that doesn't have a bind
conflict, and we don't drop the head lock so scanning again isn't going to give
us a different answer.  Instead move the tb->reuse setting logic outside of the
found_tb path and put it in the success: path.  Then make it so that we don't
goto again if we find a bind conflict in the found_tb path as we won't reach
this anymore when we are scanning for an ephemeral port.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 net/ipv4/inet_connection_sock.c | 33 +++++++++++++--------------------
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index d352366..f6a34bc 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -164,7 +164,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 {
 	bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
 	struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
-	int ret = 1, attempts = 5, port = snum;
+	int ret = 1, port = snum;
 	struct inet_bind_hashbucket *head;
 	struct net *net = sock_net(sk);
 	int i, low, high, attempt_half;
@@ -183,7 +183,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 
 		goto tb_not_found;
 	}
-again:
 	attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0;
 other_half_scan:
 	inet_get_local_port_range(net, &low, &high);
@@ -220,8 +219,10 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		spin_lock_bh(&head->lock);
 		inet_bind_bucket_for_each(tb, &head->chain)
 			if (net_eq(ib_net(tb), net) && tb->port == port) {
+				if (hlist_empty(&tb->owners))
+					goto success;
 				if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok))
-					goto tb_found;
+					goto success;
 				goto next_port;
 			}
 		goto tb_not_found;
@@ -256,23 +257,11 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		      !rcu_access_pointer(sk->sk_reuseport_cb) &&
 		      sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
 			goto success;
-		if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) {
-			if ((reuse ||
-			     (tb->fastreuseport > 0 &&
-			      sk->sk_reuseport &&
-			      !rcu_access_pointer(sk->sk_reuseport_cb) &&
-			      uid_eq(tb->fastuid, uid))) && !snum &&
-			    --attempts >= 0) {
-				spin_unlock_bh(&head->lock);
-				goto again;
-			}
+		if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok))
 			goto fail_unlock;
-		}
-		if (!reuse)
-			tb->fastreuse = 0;
-		if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid))
-			tb->fastreuseport = 0;
-	} else {
+	}
+success:
+	if (!hlist_empty(&tb->owners)) {
 		tb->fastreuse = reuse;
 		if (sk->sk_reuseport) {
 			tb->fastreuseport = 1;
@@ -280,8 +269,12 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		} else {
 			tb->fastreuseport = 0;
 		}
+	} else {
+		if (!reuse)
+			tb->fastreuse = 0;
+		if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid))
+			tb->fastreuseport = 0;
 	}
-success:
 	if (!inet_csk(sk)->icsk_bind_hash)
 		inet_bind_hash(sk, tb, port);
 	WARN_ON(inet_csk(sk)->icsk_bind_hash != tb);
-- 
2.5.5

^ permalink raw reply related

* [PATCH 2/6 net-next] inet: drop ->bind_conflict
From: Josef Bacik @ 2016-12-22 21:26 UTC (permalink / raw)
  To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev,
	kernel-team
In-Reply-To: <1482441998-28359-1-git-send-email-jbacik@fb.com>

The only difference between inet6_csk_bind_conflict and inet_csk_bind_conflict
is how they check the rcv_saddr, so delete this call back and simply
change inet_csk_bind_conflict to call inet_rcv_saddr_equal.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 include/net/inet6_connection_sock.h |  5 -----
 include/net/inet_connection_sock.h  |  6 ------
 net/dccp/ipv4.c                     |  2 +-
 net/dccp/ipv6.c                     |  2 --
 net/ipv4/inet_connection_sock.c     | 22 +++++++-------------
 net/ipv4/tcp_ipv4.c                 |  2 +-
 net/ipv6/inet6_connection_sock.c    | 40 -------------------------------------
 net/ipv6/tcp_ipv6.c                 |  2 --
 8 files changed, 9 insertions(+), 72 deletions(-)

diff --git a/include/net/inet6_connection_sock.h b/include/net/inet6_connection_sock.h
index 3212b39..8ec87b6 100644
--- a/include/net/inet6_connection_sock.h
+++ b/include/net/inet6_connection_sock.h
@@ -15,16 +15,11 @@
 
 #include <linux/types.h>
 
-struct inet_bind_bucket;
 struct request_sock;
 struct sk_buff;
 struct sock;
 struct sockaddr;
 
-int inet6_csk_bind_conflict(const struct sock *sk,
-			    const struct inet_bind_bucket *tb, bool relax,
-			    bool soreuseport_ok);
-
 struct dst_entry *inet6_csk_route_req(const struct sock *sk, struct flowi6 *fl6,
 				      const struct request_sock *req, u8 proto);
 
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 85ee387..add75c7 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -62,9 +62,6 @@ struct inet_connection_sock_af_ops {
 				char __user *optval, int __user *optlen);
 #endif
 	void	    (*addr2sockaddr)(struct sock *sk, struct sockaddr *);
-	int	    (*bind_conflict)(const struct sock *sk,
-				     const struct inet_bind_bucket *tb,
-				     bool relax, bool soreuseport_ok);
 	void	    (*mtu_reduced)(struct sock *sk);
 };
 
@@ -261,9 +258,6 @@ inet_csk_rto_backoff(const struct inet_connection_sock *icsk,
 
 struct sock *inet_csk_accept(struct sock *sk, int flags, int *err);
 
-int inet_csk_bind_conflict(const struct sock *sk,
-			   const struct inet_bind_bucket *tb, bool relax,
-			   bool soreuseport_ok);
 int inet_csk_get_port(struct sock *sk, unsigned short snum);
 
 struct dst_entry *inet_csk_route_req(const struct sock *sk, struct flowi4 *fl4,
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index d859a5c..ed6f99b 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -17,6 +17,7 @@
 #include <linux/skbuff.h>
 #include <linux/random.h>
 
+#include <net/addrconf.h>
 #include <net/icmp.h>
 #include <net/inet_common.h>
 #include <net/inet_hashtables.h>
@@ -904,7 +905,6 @@ static const struct inet_connection_sock_af_ops dccp_ipv4_af_ops = {
 	.getsockopt	   = ip_getsockopt,
 	.addr2sockaddr	   = inet_csk_addr2sockaddr,
 	.sockaddr_len	   = sizeof(struct sockaddr_in),
-	.bind_conflict	   = inet_csk_bind_conflict,
 #ifdef CONFIG_COMPAT
 	.compat_setsockopt = compat_ip_setsockopt,
 	.compat_getsockopt = compat_ip_getsockopt,
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index adfc790..08bcdc3 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -937,7 +937,6 @@ static const struct inet_connection_sock_af_ops dccp_ipv6_af_ops = {
 	.getsockopt	   = ipv6_getsockopt,
 	.addr2sockaddr	   = inet6_csk_addr2sockaddr,
 	.sockaddr_len	   = sizeof(struct sockaddr_in6),
-	.bind_conflict	   = inet6_csk_bind_conflict,
 #ifdef CONFIG_COMPAT
 	.compat_setsockopt = compat_ipv6_setsockopt,
 	.compat_getsockopt = compat_ipv6_getsockopt,
@@ -958,7 +957,6 @@ static const struct inet_connection_sock_af_ops dccp_ipv6_mapped = {
 	.getsockopt	   = ipv6_getsockopt,
 	.addr2sockaddr	   = inet6_csk_addr2sockaddr,
 	.sockaddr_len	   = sizeof(struct sockaddr_in6),
-	.bind_conflict	   = inet6_csk_bind_conflict,
 #ifdef CONFIG_COMPAT
 	.compat_setsockopt = compat_ipv6_setsockopt,
 	.compat_getsockopt = compat_ipv6_getsockopt,
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index ba597cb..a1c9055 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -116,9 +116,9 @@ void inet_get_local_port_range(struct net *net, int *low, int *high)
 }
 EXPORT_SYMBOL(inet_get_local_port_range);
 
-int inet_csk_bind_conflict(const struct sock *sk,
-			   const struct inet_bind_bucket *tb, bool relax,
-			   bool reuseport_ok)
+static int inet_csk_bind_conflict(const struct sock *sk,
+				  const struct inet_bind_bucket *tb,
+				  bool relax, bool reuseport_ok)
 {
 	struct sock *sk2;
 	bool reuse = sk->sk_reuse;
@@ -134,7 +134,6 @@ int inet_csk_bind_conflict(const struct sock *sk,
 
 	sk_for_each_bound(sk2, &tb->owners) {
 		if (sk != sk2 &&
-		    !inet_v6_ipv6only(sk2) &&
 		    (!sk->sk_bound_dev_if ||
 		     !sk2->sk_bound_dev_if ||
 		     sk->sk_bound_dev_if == sk2->sk_bound_dev_if)) {
@@ -144,23 +143,18 @@ int inet_csk_bind_conflict(const struct sock *sk,
 			     rcu_access_pointer(sk->sk_reuseport_cb) ||
 			     (sk2->sk_state != TCP_TIME_WAIT &&
 			     !uid_eq(uid, sock_i_uid(sk2))))) {
-
-				if (!sk2->sk_rcv_saddr || !sk->sk_rcv_saddr ||
-				    sk2->sk_rcv_saddr == sk->sk_rcv_saddr)
+				if (inet_rcv_saddr_equal(sk, sk2, true))
 					break;
 			}
 			if (!relax && reuse && sk2->sk_reuse &&
 			    sk2->sk_state != TCP_LISTEN) {
-
-				if (!sk2->sk_rcv_saddr || !sk->sk_rcv_saddr ||
-				    sk2->sk_rcv_saddr == sk->sk_rcv_saddr)
+				if (inet_rcv_saddr_equal(sk, sk2, true))
 					break;
 			}
 		}
 	}
 	return sk2 != NULL;
 }
-EXPORT_SYMBOL_GPL(inet_csk_bind_conflict);
 
 /* Obtain a reference to a local port for the given sock,
  * if snum is zero it means select any available local port.
@@ -239,8 +233,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 					smallest_size = tb->num_owners;
 					smallest_port = port;
 				}
-				if (!inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, false,
-									      reuseport_ok))
+				if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok))
 					goto tb_found;
 				goto next_port;
 			}
@@ -281,8 +274,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		      sk->sk_reuseport && uid_eq(tb->fastuid, uid))) &&
 		    smallest_size == -1)
 			goto success;
-		if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, true,
-							     reuseport_ok)) {
+		if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) {
 			if ((reuse ||
 			     (tb->fastreuseport > 0 &&
 			      sk->sk_reuseport &&
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 30d81f5..7664386 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -63,6 +63,7 @@
 #include <linux/times.h>
 #include <linux/slab.h>
 
+#include <net/addrconf.h>
 #include <net/net_namespace.h>
 #include <net/icmp.h>
 #include <net/inet_hashtables.h>
@@ -1817,7 +1818,6 @@ const struct inet_connection_sock_af_ops ipv4_specific = {
 	.getsockopt	   = ip_getsockopt,
 	.addr2sockaddr	   = inet_csk_addr2sockaddr,
 	.sockaddr_len	   = sizeof(struct sockaddr_in),
-	.bind_conflict	   = inet_csk_bind_conflict,
 #ifdef CONFIG_COMPAT
 	.compat_setsockopt = compat_ip_setsockopt,
 	.compat_getsockopt = compat_ip_getsockopt,
diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index 55ee2ea..97074c4 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -28,46 +28,6 @@
 #include <net/inet6_connection_sock.h>
 #include <net/sock_reuseport.h>
 
-int inet6_csk_bind_conflict(const struct sock *sk,
-			    const struct inet_bind_bucket *tb, bool relax,
-			    bool reuseport_ok)
-{
-	const struct sock *sk2;
-	bool reuse = !!sk->sk_reuse;
-	bool reuseport = !!sk->sk_reuseport && reuseport_ok;
-	kuid_t uid = sock_i_uid((struct sock *)sk);
-
-	/* We must walk the whole port owner list in this case. -DaveM */
-	/*
-	 * See comment in inet_csk_bind_conflict about sock lookup
-	 * vs net namespaces issues.
-	 */
-	sk_for_each_bound(sk2, &tb->owners) {
-		if (sk != sk2 &&
-		    (!sk->sk_bound_dev_if ||
-		     !sk2->sk_bound_dev_if ||
-		     sk->sk_bound_dev_if == sk2->sk_bound_dev_if)) {
-			if ((!reuse || !sk2->sk_reuse ||
-			     sk2->sk_state == TCP_LISTEN) &&
-			    (!reuseport || !sk2->sk_reuseport ||
-			     rcu_access_pointer(sk->sk_reuseport_cb) ||
-			     (sk2->sk_state != TCP_TIME_WAIT &&
-			      !uid_eq(uid,
-				      sock_i_uid((struct sock *)sk2))))) {
-				if (inet_rcv_saddr_equal(sk, sk2, true))
-					break;
-			}
-			if (!relax && reuse && sk2->sk_reuse &&
-			    sk2->sk_state != TCP_LISTEN &&
-			    inet_rcv_saddr_equal(sk, sk2, true))
-				break;
-		}
-	}
-
-	return sk2 != NULL;
-}
-EXPORT_SYMBOL_GPL(inet6_csk_bind_conflict);
-
 struct dst_entry *inet6_csk_route_req(const struct sock *sk,
 				      struct flowi6 *fl6,
 				      const struct request_sock *req,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 73bc8fc6..bd846d1 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1620,7 +1620,6 @@ static const struct inet_connection_sock_af_ops ipv6_specific = {
 	.getsockopt	   = ipv6_getsockopt,
 	.addr2sockaddr	   = inet6_csk_addr2sockaddr,
 	.sockaddr_len	   = sizeof(struct sockaddr_in6),
-	.bind_conflict	   = inet6_csk_bind_conflict,
 #ifdef CONFIG_COMPAT
 	.compat_setsockopt = compat_ipv6_setsockopt,
 	.compat_getsockopt = compat_ipv6_getsockopt,
@@ -1651,7 +1650,6 @@ static const struct inet_connection_sock_af_ops ipv6_mapped = {
 	.getsockopt	   = ipv6_getsockopt,
 	.addr2sockaddr	   = inet6_csk_addr2sockaddr,
 	.sockaddr_len	   = sizeof(struct sockaddr_in6),
-	.bind_conflict	   = inet6_csk_bind_conflict,
 #ifdef CONFIG_COMPAT
 	.compat_setsockopt = compat_ipv6_setsockopt,
 	.compat_getsockopt = compat_ipv6_getsockopt,
-- 
2.5.5

^ permalink raw reply related

* [PATCH 1/6 net-next] inet: collapse ipv4/v6 rcv_saddr_equal functions into one
From: Josef Bacik @ 2016-12-22 21:26 UTC (permalink / raw)
  To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev,
	kernel-team
In-Reply-To: <1482441998-28359-1-git-send-email-jbacik@fb.com>

We pass these per-protocol equal functions around in various places, but
we can just have one function that checks the sk->sk_family and then do
the right comparison function.  I've also changed the ipv4 version to
not cast to inet_sock since it is unneeded.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 include/net/addrconf.h           |  4 +--
 include/net/inet_hashtables.h    |  5 +--
 include/net/udp.h                |  1 -
 net/ipv4/inet_connection_sock.c  | 72 ++++++++++++++++++++++++++++++++++++++++
 net/ipv4/inet_hashtables.c       | 16 +++------
 net/ipv4/udp.c                   | 58 +++++++-------------------------
 net/ipv6/inet6_connection_sock.c |  4 +--
 net/ipv6/inet6_hashtables.c      | 46 +------------------------
 net/ipv6/udp.c                   |  2 +-
 9 files changed, 95 insertions(+), 113 deletions(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 8f998af..17c6fd8 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -88,9 +88,7 @@ int __ipv6_get_lladdr(struct inet6_dev *idev, struct in6_addr *addr,
 		      u32 banned_flags);
 int ipv6_get_lladdr(struct net_device *dev, struct in6_addr *addr,
 		    u32 banned_flags);
-int ipv4_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2,
-			 bool match_wildcard);
-int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2,
+int inet_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2,
 			 bool match_wildcard);
 void addrconf_join_solict(struct net_device *dev, const struct in6_addr *addr);
 void addrconf_leave_solict(struct inet6_dev *idev, const struct in6_addr *addr);
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 0574493..756ed16 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -203,10 +203,7 @@ void inet_hashinfo_init(struct inet_hashinfo *h);
 
 bool inet_ehash_insert(struct sock *sk, struct sock *osk);
 bool inet_ehash_nolisten(struct sock *sk, struct sock *osk);
-int __inet_hash(struct sock *sk, struct sock *osk,
-		int (*saddr_same)(const struct sock *sk1,
-				  const struct sock *sk2,
-				  bool match_wildcard));
+int __inet_hash(struct sock *sk, struct sock *osk);
 int inet_hash(struct sock *sk);
 void inet_unhash(struct sock *sk);
 
diff --git a/include/net/udp.h b/include/net/udp.h
index 1661791..c9d8b8e 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -204,7 +204,6 @@ static inline void udp_lib_close(struct sock *sk, long timeout)
 }
 
 int udp_lib_get_port(struct sock *sk, unsigned short snum,
-		     int (*)(const struct sock *, const struct sock *, bool),
 		     unsigned int hash2_nulladdr);
 
 u32 udp_flow_hashrnd(void);
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 19ea045..ba597cb 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -31,6 +31,78 @@ const char inet_csk_timer_bug_msg[] = "inet_csk BUG: unknown timer value\n";
 EXPORT_SYMBOL(inet_csk_timer_bug_msg);
 #endif
 
+#if IS_ENABLED(CONFIG_IPV6)
+/* match_wildcard == true:  IPV6_ADDR_ANY equals to any IPv6 addresses if IPv6
+ *                          only, and any IPv4 addresses if not IPv6 only
+ * match_wildcard == false: addresses must be exactly the same, i.e.
+ *                          IPV6_ADDR_ANY only equals to IPV6_ADDR_ANY,
+ *                          and 0.0.0.0 equals to 0.0.0.0 only
+ */
+static int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2,
+				bool match_wildcard)
+{
+	const struct in6_addr *sk2_rcv_saddr6 = inet6_rcv_saddr(sk2);
+	int sk2_ipv6only = inet_v6_ipv6only(sk2);
+	int addr_type = ipv6_addr_type(&sk->sk_v6_rcv_saddr);
+	int addr_type2 = sk2_rcv_saddr6 ? ipv6_addr_type(sk2_rcv_saddr6) : IPV6_ADDR_MAPPED;
+
+	/* if both are mapped, treat as IPv4 */
+	if (addr_type == IPV6_ADDR_MAPPED && addr_type2 == IPV6_ADDR_MAPPED) {
+		if (!sk2_ipv6only) {
+			if (sk->sk_rcv_saddr == sk2->sk_rcv_saddr)
+				return 1;
+			if (!sk->sk_rcv_saddr || !sk2->sk_rcv_saddr)
+				return match_wildcard;
+		}
+		return 0;
+	}
+
+	if (addr_type == IPV6_ADDR_ANY && addr_type2 == IPV6_ADDR_ANY)
+		return 1;
+
+	if (addr_type2 == IPV6_ADDR_ANY && match_wildcard &&
+	    !(sk2_ipv6only && addr_type == IPV6_ADDR_MAPPED))
+		return 1;
+
+	if (addr_type == IPV6_ADDR_ANY && match_wildcard &&
+	    !(ipv6_only_sock(sk) && addr_type2 == IPV6_ADDR_MAPPED))
+		return 1;
+
+	if (sk2_rcv_saddr6 &&
+	    ipv6_addr_equal(&sk->sk_v6_rcv_saddr, sk2_rcv_saddr6))
+		return 1;
+
+	return 0;
+}
+#endif
+
+/* match_wildcard == true:  0.0.0.0 equals to any IPv4 addresses
+ * match_wildcard == false: addresses must be exactly the same, i.e.
+ *                          0.0.0.0 only equals to 0.0.0.0
+ */
+static int ipv4_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2,
+				bool match_wildcard)
+{
+	if (!ipv6_only_sock(sk2)) {
+		if (sk->sk_rcv_saddr == sk2->sk_rcv_saddr)
+			return 1;
+		if (!sk->sk_rcv_saddr || !sk2->sk_rcv_saddr)
+			return match_wildcard;
+	}
+	return 0;
+}
+
+int inet_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2,
+			 bool match_wildcard)
+{
+#if IS_ENABLED(CONFIG_IPV6)
+	if (sk->sk_family == AF_INET6)
+		return ipv6_rcv_saddr_equal(sk, sk2, match_wildcard);
+#endif
+	return ipv4_rcv_saddr_equal(sk, sk2, match_wildcard);
+}
+EXPORT_SYMBOL(inet_rcv_saddr_equal);
+
 void inet_get_local_port_range(struct net *net, int *low, int *high)
 {
 	unsigned int seq;
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index ca97835..2ef9b01 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -435,10 +435,7 @@ bool inet_ehash_nolisten(struct sock *sk, struct sock *osk)
 EXPORT_SYMBOL_GPL(inet_ehash_nolisten);
 
 static int inet_reuseport_add_sock(struct sock *sk,
-				   struct inet_listen_hashbucket *ilb,
-				   int (*saddr_same)(const struct sock *sk1,
-						     const struct sock *sk2,
-						     bool match_wildcard))
+				   struct inet_listen_hashbucket *ilb)
 {
 	struct inet_bind_bucket *tb = inet_csk(sk)->icsk_bind_hash;
 	struct sock *sk2;
@@ -451,7 +448,7 @@ static int inet_reuseport_add_sock(struct sock *sk,
 		    sk2->sk_bound_dev_if == sk->sk_bound_dev_if &&
 		    inet_csk(sk2)->icsk_bind_hash == tb &&
 		    sk2->sk_reuseport && uid_eq(uid, sock_i_uid(sk2)) &&
-		    saddr_same(sk, sk2, false))
+		    inet_rcv_saddr_equal(sk, sk2, false))
 			return reuseport_add_sock(sk, sk2);
 	}
 
@@ -461,10 +458,7 @@ static int inet_reuseport_add_sock(struct sock *sk,
 	return 0;
 }
 
-int __inet_hash(struct sock *sk, struct sock *osk,
-		 int (*saddr_same)(const struct sock *sk1,
-				   const struct sock *sk2,
-				   bool match_wildcard))
+int __inet_hash(struct sock *sk, struct sock *osk)
 {
 	struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;
 	struct inet_listen_hashbucket *ilb;
@@ -479,7 +473,7 @@ int __inet_hash(struct sock *sk, struct sock *osk,
 
 	spin_lock(&ilb->lock);
 	if (sk->sk_reuseport) {
-		err = inet_reuseport_add_sock(sk, ilb, saddr_same);
+		err = inet_reuseport_add_sock(sk, ilb);
 		if (err)
 			goto unlock;
 	}
@@ -503,7 +497,7 @@ int inet_hash(struct sock *sk)
 
 	if (sk->sk_state != TCP_CLOSE) {
 		local_bh_disable();
-		err = __inet_hash(sk, NULL, ipv4_rcv_saddr_equal);
+		err = __inet_hash(sk, NULL);
 		local_bh_enable();
 	}
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9ca279b..cffbb65 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -137,11 +137,7 @@ EXPORT_SYMBOL(udp_memory_allocated);
 static int udp_lib_lport_inuse(struct net *net, __u16 num,
 			       const struct udp_hslot *hslot,
 			       unsigned long *bitmap,
-			       struct sock *sk,
-			       int (*saddr_comp)(const struct sock *sk1,
-						 const struct sock *sk2,
-						 bool match_wildcard),
-			       unsigned int log)
+			       struct sock *sk, unsigned int log)
 {
 	struct sock *sk2;
 	kuid_t uid = sock_i_uid(sk);
@@ -156,7 +152,7 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num,
 		    (!sk2->sk_reuseport || !sk->sk_reuseport ||
 		     rcu_access_pointer(sk->sk_reuseport_cb) ||
 		     !uid_eq(uid, sock_i_uid(sk2))) &&
-		    saddr_comp(sk, sk2, true)) {
+		    inet_rcv_saddr_equal(sk, sk2, true)) {
 			if (!bitmap)
 				return 1;
 			__set_bit(udp_sk(sk2)->udp_port_hash >> log, bitmap);
@@ -171,10 +167,7 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num,
  */
 static int udp_lib_lport_inuse2(struct net *net, __u16 num,
 				struct udp_hslot *hslot2,
-				struct sock *sk,
-				int (*saddr_comp)(const struct sock *sk1,
-						  const struct sock *sk2,
-						  bool match_wildcard))
+				struct sock *sk)
 {
 	struct sock *sk2;
 	kuid_t uid = sock_i_uid(sk);
@@ -191,7 +184,7 @@ static int udp_lib_lport_inuse2(struct net *net, __u16 num,
 		    (!sk2->sk_reuseport || !sk->sk_reuseport ||
 		     rcu_access_pointer(sk->sk_reuseport_cb) ||
 		     !uid_eq(uid, sock_i_uid(sk2))) &&
-		    saddr_comp(sk, sk2, true)) {
+		    inet_rcv_saddr_equal(sk, sk2, true)) {
 			res = 1;
 			break;
 		}
@@ -200,10 +193,7 @@ static int udp_lib_lport_inuse2(struct net *net, __u16 num,
 	return res;
 }
 
-static int udp_reuseport_add_sock(struct sock *sk, struct udp_hslot *hslot,
-				  int (*saddr_same)(const struct sock *sk1,
-						    const struct sock *sk2,
-						    bool match_wildcard))
+static int udp_reuseport_add_sock(struct sock *sk, struct udp_hslot *hslot)
 {
 	struct net *net = sock_net(sk);
 	kuid_t uid = sock_i_uid(sk);
@@ -217,7 +207,7 @@ static int udp_reuseport_add_sock(struct sock *sk, struct udp_hslot *hslot,
 		    (udp_sk(sk2)->udp_port_hash == udp_sk(sk)->udp_port_hash) &&
 		    (sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
 		    sk2->sk_reuseport && uid_eq(uid, sock_i_uid(sk2)) &&
-		    (*saddr_same)(sk, sk2, false)) {
+		    inet_rcv_saddr_equal(sk, sk2, false)) {
 			return reuseport_add_sock(sk, sk2);
 		}
 	}
@@ -233,14 +223,10 @@ static int udp_reuseport_add_sock(struct sock *sk, struct udp_hslot *hslot,
  *
  *  @sk:          socket struct in question
  *  @snum:        port number to look up
- *  @saddr_comp:  AF-dependent comparison of bound local IP addresses
  *  @hash2_nulladdr: AF-dependent hash value in secondary hash chains,
  *                   with NULL address
  */
 int udp_lib_get_port(struct sock *sk, unsigned short snum,
-		     int (*saddr_comp)(const struct sock *sk1,
-				       const struct sock *sk2,
-				       bool match_wildcard),
 		     unsigned int hash2_nulladdr)
 {
 	struct udp_hslot *hslot, *hslot2;
@@ -269,7 +255,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 			bitmap_zero(bitmap, PORTS_PER_CHAIN);
 			spin_lock_bh(&hslot->lock);
 			udp_lib_lport_inuse(net, snum, hslot, bitmap, sk,
-					    saddr_comp, udptable->log);
+					    udptable->log);
 
 			snum = first;
 			/*
@@ -301,12 +287,11 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 			if (hslot->count < hslot2->count)
 				goto scan_primary_hash;
 
-			exist = udp_lib_lport_inuse2(net, snum, hslot2,
-						     sk, saddr_comp);
+			exist = udp_lib_lport_inuse2(net, snum, hslot2, sk);
 			if (!exist && (hash2_nulladdr != slot2)) {
 				hslot2 = udp_hashslot2(udptable, hash2_nulladdr);
 				exist = udp_lib_lport_inuse2(net, snum, hslot2,
-							     sk, saddr_comp);
+							     sk);
 			}
 			if (exist)
 				goto fail_unlock;
@@ -314,8 +299,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 				goto found;
 		}
 scan_primary_hash:
-		if (udp_lib_lport_inuse(net, snum, hslot, NULL, sk,
-					saddr_comp, 0))
+		if (udp_lib_lport_inuse(net, snum, hslot, NULL, sk, 0))
 			goto fail_unlock;
 	}
 found:
@@ -324,7 +308,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 	udp_sk(sk)->udp_portaddr_hash ^= snum;
 	if (sk_unhashed(sk)) {
 		if (sk->sk_reuseport &&
-		    udp_reuseport_add_sock(sk, hslot, saddr_comp)) {
+		    udp_reuseport_add_sock(sk, hslot)) {
 			inet_sk(sk)->inet_num = 0;
 			udp_sk(sk)->udp_port_hash = 0;
 			udp_sk(sk)->udp_portaddr_hash ^= snum;
@@ -356,24 +340,6 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 }
 EXPORT_SYMBOL(udp_lib_get_port);
 
-/* match_wildcard == true:  0.0.0.0 equals to any IPv4 addresses
- * match_wildcard == false: addresses must be exactly the same, i.e.
- *                          0.0.0.0 only equals to 0.0.0.0
- */
-int ipv4_rcv_saddr_equal(const struct sock *sk1, const struct sock *sk2,
-			 bool match_wildcard)
-{
-	struct inet_sock *inet1 = inet_sk(sk1), *inet2 = inet_sk(sk2);
-
-	if (!ipv6_only_sock(sk2)) {
-		if (inet1->inet_rcv_saddr == inet2->inet_rcv_saddr)
-			return 1;
-		if (!inet1->inet_rcv_saddr || !inet2->inet_rcv_saddr)
-			return match_wildcard;
-	}
-	return 0;
-}
-
 static u32 udp4_portaddr_hash(const struct net *net, __be32 saddr,
 			      unsigned int port)
 {
@@ -389,7 +355,7 @@ int udp_v4_get_port(struct sock *sk, unsigned short snum)
 
 	/* precompute partial secondary hash */
 	udp_sk(sk)->udp_portaddr_hash = hash2_partial;
-	return udp_lib_get_port(sk, snum, ipv4_rcv_saddr_equal, hash2_nulladdr);
+	return udp_lib_get_port(sk, snum, hash2_nulladdr);
 }
 
 static int compute_score(struct sock *sk, struct net *net,
diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index 7396e75..55ee2ea 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -54,12 +54,12 @@ int inet6_csk_bind_conflict(const struct sock *sk,
 			     (sk2->sk_state != TCP_TIME_WAIT &&
 			      !uid_eq(uid,
 				      sock_i_uid((struct sock *)sk2))))) {
-				if (ipv6_rcv_saddr_equal(sk, sk2, true))
+				if (inet_rcv_saddr_equal(sk, sk2, true))
 					break;
 			}
 			if (!relax && reuse && sk2->sk_reuse &&
 			    sk2->sk_state != TCP_LISTEN &&
-			    ipv6_rcv_saddr_equal(sk, sk2, true))
+			    inet_rcv_saddr_equal(sk, sk2, true))
 				break;
 		}
 	}
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 02761c9..d090091 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -268,54 +268,10 @@ int inet6_hash(struct sock *sk)
 
 	if (sk->sk_state != TCP_CLOSE) {
 		local_bh_disable();
-		err = __inet_hash(sk, NULL, ipv6_rcv_saddr_equal);
+		err = __inet_hash(sk, NULL);
 		local_bh_enable();
 	}
 
 	return err;
 }
 EXPORT_SYMBOL_GPL(inet6_hash);
-
-/* match_wildcard == true:  IPV6_ADDR_ANY equals to any IPv6 addresses if IPv6
- *                          only, and any IPv4 addresses if not IPv6 only
- * match_wildcard == false: addresses must be exactly the same, i.e.
- *                          IPV6_ADDR_ANY only equals to IPV6_ADDR_ANY,
- *                          and 0.0.0.0 equals to 0.0.0.0 only
- */
-int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2,
-			 bool match_wildcard)
-{
-	const struct in6_addr *sk2_rcv_saddr6 = inet6_rcv_saddr(sk2);
-	int sk2_ipv6only = inet_v6_ipv6only(sk2);
-	int addr_type = ipv6_addr_type(&sk->sk_v6_rcv_saddr);
-	int addr_type2 = sk2_rcv_saddr6 ? ipv6_addr_type(sk2_rcv_saddr6) : IPV6_ADDR_MAPPED;
-
-	/* if both are mapped, treat as IPv4 */
-	if (addr_type == IPV6_ADDR_MAPPED && addr_type2 == IPV6_ADDR_MAPPED) {
-		if (!sk2_ipv6only) {
-			if (sk->sk_rcv_saddr == sk2->sk_rcv_saddr)
-				return 1;
-			if (!sk->sk_rcv_saddr || !sk2->sk_rcv_saddr)
-				return match_wildcard;
-		}
-		return 0;
-	}
-
-	if (addr_type == IPV6_ADDR_ANY && addr_type2 == IPV6_ADDR_ANY)
-		return 1;
-
-	if (addr_type2 == IPV6_ADDR_ANY && match_wildcard &&
-	    !(sk2_ipv6only && addr_type == IPV6_ADDR_MAPPED))
-		return 1;
-
-	if (addr_type == IPV6_ADDR_ANY && match_wildcard &&
-	    !(ipv6_only_sock(sk) && addr_type2 == IPV6_ADDR_MAPPED))
-		return 1;
-
-	if (sk2_rcv_saddr6 &&
-	    ipv6_addr_equal(&sk->sk_v6_rcv_saddr, sk2_rcv_saddr6))
-		return 1;
-
-	return 0;
-}
-EXPORT_SYMBOL_GPL(ipv6_rcv_saddr_equal);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 649efc2..9d2886f 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -103,7 +103,7 @@ int udp_v6_get_port(struct sock *sk, unsigned short snum)
 
 	/* precompute partial secondary hash */
 	udp_sk(sk)->udp_portaddr_hash = hash2_partial;
-	return udp_lib_get_port(sk, snum, ipv6_rcv_saddr_equal, hash2_nulladdr);
+	return udp_lib_get_port(sk, snum, hash2_nulladdr);
 }
 
 static void udp_v6_rehash(struct sock *sk)
-- 
2.5.5

^ permalink raw reply related

* [PATCH 0/6 net-next][V2] Rework inet_csk_get_port
From: Josef Bacik @ 2016-12-22 21:26 UTC (permalink / raw)
  To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev,
	kernel-team

V1->V2:
-Added a new patch 'inet: collapse ipv4/v6 rcv_saddr_equal functions into one'
 at Hannes' suggestion.
-Dropped ->bind_conflict and just use the new helper.
-Fixed a compile bug from the original ->bind_conflict patch.

The original description of the series follows

=========================================================

At some point recently the guys working on our load balancer added the ability
to use SO_REUSEPORT.  When they restarted their app with this option enabled
they immediately hit a softlockup on what appeared to be the
inet_bind_bucket->lock.  Eventually what all of our debugging and discussion led
us to was the fact that the application comes up without SO_REUSEPORT, shuts
down which creates around 100k twsk's, and then comes up and tries to open a
bunch of sockets using SO_REUSEPORT, which meant traversing the inet_bind_bucket
owners list under the lock.  Since this lock is needed for dealing with the
twsk's and basically anything else related to connections we would softlockup,
and sometimes not ever recover.

To solve this problem I did what you see in Path 5/5.  Once we have a
SO_REUSEPORT socket on the tb->owners list we know that the socket has no
conflicts with any of the other sockets on that list.  So we can add a copy of
the sock_common (really all we need is the recv_saddr but it seemed ugly to copy
just the ipv6, ipv4, and flag to indicate if we were ipv6 only in there so I've
copied the whole common) in order to check subsequent SO_REUSEPORT sockets.  If
they match the previous one then we can skip the expensive
inet_csk_bind_conflict check.  This is what eliminated the soft lockup that we
were seeing.

Patches 1-4 are cleanups and re-workings.  For instance when we specify port ==
0 we need to find an open port, but we would do two passes through
inet_csk_bind_conflict every time we found a possible port.  We would also keep
track of the smallest_port value in order to try and use it if we found no
port our first run through.  This however made no sense as it would have had to
fail the first pass through inet_csk_bind_conflict, so would not actually pass
the second pass through either.  Finally I split the function into two functions
in order to make it easier to read and to distinguish between the two behaviors.

I have tested this on one of our load balancing boxes during peak traffic and it
hasn't fallen over.  But this is not my area, so obviously feel free to point
out where I'm being stupid and I'll get it fixed up and retested.  Thanks,

Josef

^ permalink raw reply

* Re: [PATCH net-next 01/10] net: netcp: ethss: add support of subsystem register region regmap
From: Rob Herring @ 2016-12-22 21:24 UTC (permalink / raw)
  To: Murali Karicheri
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-omap-u79uwXL29TY76Z2rM5mHXA,
	grygorii.strashko-l0cyMroinI0, mugunthanvnm-l0cyMroinI0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, arnd-r2nGTMty4D4,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, devicetree-u79uwXL29TY76Z2rM5mHXA,
	mark.rutland-5wv7dgnIgG8
In-Reply-To: <1482271793-7671-2-git-send-email-m-karicheri2-l0cyMroinI0@public.gmane.org>

On Tue, Dec 20, 2016 at 05:09:44PM -0500, Murali Karicheri wrote:
> From: WingMan Kwok <w-kwok2-l0cyMroinI0@public.gmane.org>
> 
> 10gbe phy driver needs to access the 10gbe subsystem control
> register during phy initialization. To facilitate the shared
> access of the subsystem register region between the 10gbe Ethernet
> driver and the phy driver, this patch adds support of the
> subsystem register region defined by a syscon node in the dts.
> 
> Although there is no shared access to the gbe subsystem register
> region, using syscon for that is for the sake of consistency.
> 
> This change is backward compatible with previously released gbe
> devicetree bindings.
> 
> Signed-off-by: WingMan Kwok <w-kwok2-l0cyMroinI0@public.gmane.org>
> Signed-off-by: Murali Karicheri <m-karicheri2-l0cyMroinI0@public.gmane.org>
> Signed-off-by: Sekhar Nori <nsekhar-l0cyMroinI0@public.gmane.org>
> ---
>  .../devicetree/bindings/net/keystone-netcp.txt     |  16 ++-
>  drivers/net/ethernet/ti/netcp_ethss.c              | 140 +++++++++++++++++----
>  2 files changed, 127 insertions(+), 29 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/keystone-netcp.txt b/Documentation/devicetree/bindings/net/keystone-netcp.txt
> index 04ba1dc..0854a73 100644
> --- a/Documentation/devicetree/bindings/net/keystone-netcp.txt
> +++ b/Documentation/devicetree/bindings/net/keystone-netcp.txt
> @@ -72,20 +72,24 @@ Required properties:
>  		"ti,netcp-gbe-2" for 1GbE N NetCP 1.5 (N=2)
>  		"ti,netcp-xgbe" for 10 GbE
>  
> +- syscon-subsys:	phandle to syscon node of the switch
> +			subsystem registers.
> +
>  - reg:		register location and the size for the following register
>  		regions in the specified order.
>  		- switch subsystem registers
> +		- sgmii module registers

This needs to go on the end of the list. Otherwise, it is not backwards 
compatible.

>  		- sgmii port3/4 module registers (only for NetCP 1.4)
>  		- switch module registers
>  		- serdes registers (only for 10G)
>  
>  		NetCP 1.4 ethss, here is the order
> -			index #0 - switch subsystem registers
> +			index #0 - sgmii module registers
>  			index #1 - sgmii port3/4 module registers
>  			index #2 - switch module registers
>  
>  		NetCP 1.5 ethss 9 port, 5 port and 2 port
> -			index #0 - switch subsystem registers
> +			index #0 - sgmii module registers
>  			index #1 - switch module registers
>  			index #2 - serdes registers
>  
> @@ -145,6 +149,11 @@ Optional properties:
>  
>  Example binding:
>  
> +gbe_subsys: subsys@2090000 {
> +	compatible = "syscon";
> +	reg = <0x02090000 0x100>;
> +};
> +
>  netcp: netcp@2000000 {
>  	reg = <0x2620110 0x8>;
>  	reg-names = "efuse";
> @@ -163,7 +172,8 @@ netcp: netcp@2000000 {
>  		ranges;
>  		gbe@90000 {
>  			label = "netcp-gbe";
> -			reg = <0x90000 0x300>, <0x90400 0x400>, <0x90800 0x700>;
> +			syscon-subsys = <&gbe_subsys>;
> +			reg = <0x90100 0x200>, <0x90400 0x200>, <0x90800 0x700>;
>  			/* enable-ale; */
>  			tx-queue = <648>;
>  			tx-channel = <8>;
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: George Spelvin @ 2016-12-22 21:11 UTC (permalink / raw)
  To: linux, luto
  Cc: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, hannes,
	Jason, jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, netdev, tom, torvalds, tytso, vegard.nossum
In-Reply-To: <CALCETrW63gL1821jCMvxnm8gR+o5ifdOXKnWHsLnqoFK77-epw@mail.gmail.com>

> I do tend to like Ted's version in which we use batched
> get_random_bytes() output.  If it's fast enough, it's simpler and lets
> us get the full strength of a CSPRNG.

With the ChaCha20 generator, that's fine, although note that this abandons
anti-backtracking entirely.

It also takes locks, something the previous get_random_int code
path avoided.  Do we need to audit the call sites to ensure that's safe?

And there is the issue that the existing callers assume that there's a
fixed cost per word.  A good half of get_random_long calls are followed by
"& ~PAGE_MASK" to extract the low 12 bits.  Or "& ((1ul << mmap_rnd_bits)
- 1)" to extract the low 28.  If we have a buffer we're going to have to
pay to refill, it would be nice to use less than 8 bytes to satisfy those.

But that can be a followup patch.  I'm thinking

unsigned long get_random_bits(unsigned bits)
	E.g. get_random_bits(PAGE_SHIFT),
	     get_random_bits(mmap_rnd_bits),
	u32 imm_rnd = get_random_bits(32)

unsigned get_random_mod(unsigned modulus)
	E.g. get_random_mod(hole) & ~(alignment - 1);
	     get_random_mod(port_scan_backoff)
	(Althogh probably drivers/s390/scsi/zfcp_fc.c should be changed
	to prandom.)

with, until the audit is completed:
#define get_random_int() get_random_bits(32)
#define get_random_long() get_random_bits(BITS_PER_LONG)

> It could only mix the output back in every two calls, in which case
> you can backtrack up to one call but you need to do 2^128 work to
> backtrack farther.  But yes, this is getting excessively complicated.

No, if you're willing to accept limited backtrack, this is a perfectly
acceptable solution, and not too complicated.  You could do it phase-less
if you like; store the previous output, then after generating the new
one, mix in both.  Then overwrite the previous output.  (But doing two
rounds of a crypto primtive to avoid one conditional jump is stupid,
so forget that.)

>> Hmm, interesting.  Although, for ASLR, we could use get_random_bytes()
>> directly and be done with it.  It won't be a bottleneck.

Isn't that what you already suggested?

I don't mind fewer primtives; I got a bit fixated on "Replace MD5 with
SipHash".  It's just the locking that I want to check isn't a problem.

^ permalink raw reply

* Re: [PATCH iproute2 v3 4/4] ifstat: Add "sw only" extended statistics to ifstat
From: Roopa Prabhu @ 2016-12-22 21:10 UTC (permalink / raw)
  To: Nogah Frankel
  Cc: netdev, stephen, roszenrami, ogerlitz, jiri, eladr, yotamg,
	idosch
In-Reply-To: <1482423795-6531-5-git-send-email-nogahf@mellanox.com>

On 12/22/16, 8:23 AM, Nogah Frankel wrote:
> Add support for extended statistics of SW only type, for counting only the
> packets that went via the cpu. (useful for systems with forward
> offloading). It reads it from filter type IFLA_STATS_LINK_OFFLOAD_XSTATS
> and sub type IFLA_OFFLOAD_XSTATS_CPU_HIT.
>
> It is under the name 'software'
> (or any shorten of it as 'soft' or simply 's')
>
> For example:
> ifstat -x s
>
>
Nogah, can we keep the option names closer to the attribute names ?
That would avoid some confusion and help with the follow-up stats.

ifstat -x offload cpu
or
ifstat -x cpu

for others it would be:

ifstat -x link [vlan|igmp]
ifstat -x vlan
ifstat -x igmp
ifstat -x lacp

and so on...

thanks!

^ permalink raw reply

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Hannes Frederic Sowa @ 2016-12-22 20:02 UTC (permalink / raw)
  To: Andy Lutomirski, Alexei Starovoitov
  Cc: Daniel Borkmann, Jason A. Donenfeld,
	kernel-hardening@lists.openwall.com, Theodore Ts'o, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CALCETrUWw2m8k3b8ibOeN-7o9b+8MrnX4btUWqS188WSsLzDjg@mail.gmail.com>

On 22.12.2016 20:56, Andy Lutomirski wrote:
> It's also not quite clear to me why userspace needs to be able to
> calculate the digest on its own.  A bpf(BPF_CALC_PROGRAM_DIGEST)
> command that takes a BPF program as input and hashes it would seem to
> serve the same purpose, and that would allow the kernel to key the
> digest and change the algorithm down the road without breaking things.

I think that people expect digests of BPF programs to be stable over
time and reboots.

^ permalink raw reply

* Re: [PATCH 1/5 net-next] inet: replace ->bind_conflict with ->rcv_saddr_equal
From: Hannes Frederic Sowa @ 2016-12-22 20:00 UTC (permalink / raw)
  To: Josef Bacik; +Cc: davem, kraigatgoog, eric.dumazet, tom, netdev, kernel-team
In-Reply-To: <1482333414.24490.12@smtp.office365.com>

On 21.12.2016 16:16, Josef Bacik wrote:
> On Wed, Dec 21, 2016 at 10:06 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> On Tue, 2016-12-20 at 15:07 -0500, Josef Bacik wrote:
>>>  The only difference between inet6_csk_bind_conflict and
>>> inet_csk_bind_conflict
>>>  is how they check the rcv_saddr.  Since we want to be able to check
>>> the saddr in
>>>  other places just drop the protocol specific ->bind_conflict and
>>> replace it with
>>>  ->rcv_saddr_equal, then make inet_csk_bind_conflict the one true
>>> bind conflict
>>>  function.
>>>
>>>  Signed-off-by: Josef Bacik <jbacik@fb.com>
>>>
>>
>>
>>
>>>  ---
>>>   include/net/inet6_connection_sock.h |  5 -----
>>>   include/net/inet_connection_sock.h  |  9 +++------
>>>   net/dccp/ipv4.c                     |  3 ++-
>>>   net/dccp/ipv6.c                     |  2 +-
>>>   net/ipv4/inet_connection_sock.c     | 22 +++++++-------------
>>>   net/ipv4/tcp_ipv4.c                 |  3 ++-
>>>   net/ipv4/udp.c                      |  1 +
>>>   net/ipv6/inet6_connection_sock.c    | 40
>>> -------------------------------------
>>>   net/ipv6/tcp_ipv6.c                 |  4 ++--
>>>   9 files changed, 18 insertions(+), 71 deletions(-)
>>>
>>>  diff --git a/include/net/inet6_connection_sock.h
>>> b/include/net/inet6_connection_sock.h
>>>  index 3212b39..8ec87b6 100644
>>>  --- a/include/net/inet6_connection_sock.h
>>>  +++ b/include/net/inet6_connection_sock.h
>>>  @@ -15,16 +15,11 @@
>>>
>>>   #include <linux/types.h>
>>>
>>>  -struct inet_bind_bucket;
>>>   struct request_sock;
>>>   struct sk_buff;
>>>   struct sock;
>>>   struct sockaddr;
>>>
>>>  -int inet6_csk_bind_conflict(const struct sock *sk,
>>>  -                const struct inet_bind_bucket *tb, bool relax,
>>>  -                bool soreuseport_ok);
>>>  -
>>>   struct dst_entry *inet6_csk_route_req(const struct sock *sk, struct
>>> flowi6 *fl6,
>>>                         const struct request_sock *req, u8 proto);
>>>
>>>  diff --git a/include/net/inet_connection_sock.h
>>> b/include/net/inet_connection_sock.h
>>>  index ec0479a..9cd43c5 100644
>>>  --- a/include/net/inet_connection_sock.h
>>>  +++ b/include/net/inet_connection_sock.h
>>>  @@ -62,9 +62,9 @@ struct inet_connection_sock_af_ops {
>>>                   char __user *optval, int __user *optlen);
>>>   #endif
>>>       void        (*addr2sockaddr)(struct sock *sk, struct sockaddr *);
>>>  -    int        (*bind_conflict)(const struct sock *sk,
>>>  -                     const struct inet_bind_bucket *tb,
>>>  -                     bool relax, bool soreuseport_ok);
>>>  +    int         (*rcv_saddr_equal)(const struct sock *sk1,
>>>  +                       const struct sock *sk2,
>>>  +                       bool match_wildcard);
>>>       void        (*mtu_reduced)(struct sock *sk);
>>>   };
>>>
>>>
>>
>> The patch looks as a nice code cleanup already!
>>
>> Have you looked if we can simply have one rcv_saddr_equal for both ipv4
>> and ipv6 that e.g. uses sk->sk_family instead of function pointers?
>> This could give us even more possibilities to remove some indirect
>> functions calls and thus might relieve some cycles?
> 
> I was going to do that but I'm not familiar enough with how sockets work
> to be comfortable.  My main concern is we have the ipv6_only() check on
> a socket, which seems to indicate to me that you can have a socket that
> can do both ipv4/ipv6, so what if we're specifically going through the
> ipv6 code, but we aren't ipv6_only() and we end up doing the ipv4
> address compare when we really need to do the ipv6 address compare?  If
> this can't happen (and honestly as I type it out it sounds crazier than
> it did in my head) then yeah I'll totally do that as well and we can
> just have a global function without the protocol specific callbacks, but
> I need you or somebody to tell me I'm crazy and that it would be ok to
> have it all in one function.  Thanks,

IPv6 sockets can do IPv4 via mapped addresses. The other way around
doesn't work, there are no IPv4 sockets that can speak IPv6. The
ipv6_only flags in IPv4 sockets should always stay 0 but they need to be
evaluated from there side for possible port conflicts.

My idea is to use sk->sk_family, which is in sync with
icsk->icsk_af_ops, which you use for the function pointer lookup, to
switch between those functions. Looking through a lot of callback, I
don't see this assumption violated so far.

This would also solve the problem which David described, your search
would be free of those indirect jumps.

Bye,
Hannes

^ permalink raw reply

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Andy Lutomirski @ 2016-12-22 19:56 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Hannes Frederic Sowa, Daniel Borkmann, Jason A. Donenfeld,
	kernel-hardening@lists.openwall.com, Theodore Ts'o, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CAADnVQ+memUT2zcc-wunP0QgWSLWpmnSJNRJZmfDdc+FBb=gEg@mail.gmail.com>

On Thu, Dec 22, 2016 at 11:34 AM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Thu, Dec 22, 2016 at 9:25 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>>> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>>>
>>> We don't prevent ebpf programs being loaded based on the digest but
>>> just to uniquely identify loaded programs from user space and match up
>>> with their source.
>>
>> The commit log talks about using the hash to see if the program has
>> already been compiled and JITted.  If that's done, then a collision
>> will directly cause the kernel to malfunction.
>
> Andy, please read the code.
> we could have used jhash there just as well.
> Collisions are fine.

There's relevant in the code to read yet AFAICS.  The code exports it
via fdinfo, and userspace is expected to do something with it.  The
commit message says:

    When programs are pinned and retrieved by an ELF loader, the loader
    can check the program's digest through fdinfo and compare it against
    one that was generated over the ELF file's program section to see
    if the program needs to be reloaded.

I assume this means that a userspace component is expected to compare
the digest of a loaded program to a digest of a program it wants to
load and to use the result of the comparison to decide whether the
programs are the same.  If that's indeed the case (and it sure sounds
like it, and I fully expect CRIU to do very similar things when
support is added), then malicious collisions do matter.

It's also not quite clear to me why userspace needs to be able to
calculate the digest on its own.  A bpf(BPF_CALC_PROGRAM_DIGEST)
command that takes a BPF program as input and hashes it would seem to
serve the same purpose, and that would allow the kernel to key the
digest and change the algorithm down the road without breaking things.

Regardless, adding a new hash algorithm that is almost-but-not-quite
SHA-1 and making it a stable interface to userspace is not a good
thing.

--Andy

^ permalink raw reply

* Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Theodore Ts'o @ 2016-12-22 19:50 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Jason A. Donenfeld, kernel-hardening, Andy Lutomirski, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <32db19ab-0c19-628c-3475-2ccb8050563a@stressinduktion.org>

On Thu, Dec 22, 2016 at 07:08:37PM +0100, Hannes Frederic Sowa wrote:
> I wasn't concerned about performance but more about DoS resilience. I
> wonder how safe half md4 actually is in terms of allowing users to
> generate long hash chains in the filesystem (in terms of length
> extension attacks against half_md4).
> 
> In ext4, is it actually possible that a "disrupter" learns about the
> hashing secret in the way how the inodes are returned during getdents?

They'd have to be a local user, who can execute telldir(3) --- in
which case there are plenty of other denial of service attacks one
could carry out that would be far more devastating.

It might also be an issue if the file system is exposed via NFS, but
again, there are so many other ways an attacker could DoS a NFS server
that I don't think of it as a much of a concern.

Keep in mind that worst someone can do is cause directory inserts to
fail with an ENOSPC, and there are plenty of other ways of doing that
--- such as consuming all of the blocks and inodes in the file system,
for example.

So it's a threat, but not a high priority one as far as I'm concerned.
And if this was a problem in actual practice, users could switch to
the TEA based hash, which should be far harder to attack, and
available today.

					- Ted

^ permalink raw reply

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Alexei Starovoitov @ 2016-12-22 19:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Hannes Frederic Sowa, Daniel Borkmann, Jason A. Donenfeld,
	kernel-hardening@lists.openwall.com, Theodore Ts'o, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CALCETrW_T1v4qKPJDs5dXwAnAit3M52AWMH-K+GJLb1WoLMuRQ@mail.gmail.com>

On Thu, Dec 22, 2016 at 9:25 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>>
>> We don't prevent ebpf programs being loaded based on the digest but
>> just to uniquely identify loaded programs from user space and match up
>> with their source.
>
> The commit log talks about using the hash to see if the program has
> already been compiled and JITted.  If that's done, then a collision
> will directly cause the kernel to malfunction.

Andy, please read the code.
we could have used jhash there just as well.
Collisions are fine.

^ permalink raw reply

* Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: Andy Lutomirski @ 2016-12-22 19:32 UTC (permalink / raw)
  To: George Spelvin
  Cc: Andrew Lutomirski, Andi Kleen, David S. Miller, David Laight,
	D. J. Bernstein, Eric Biggers, Eric Dumazet, Hannes Frederic Sowa,
	Jason A. Donenfeld, Jean-Philippe Aumasson,
	kernel-hardening@lists.openwall.com, Linux Crypto Mailing List,
	linux-kernel@vger.kernel.org, Network Development, Tom Herbert,
	Linus Torvalds, Ted Ts'o
In-Reply-To: <20161222192445.963.qmail@ns.sciencehorizons.net>

On Thu, Dec 22, 2016 at 11:24 AM, George Spelvin
<linux@sciencehorizons.net> wrote:
>> Having slept on this, I like it less.  The problem is that a
>> backtracking attacker doesn't just learn H(random seed || entropy_0 ||
>> secret || ...) -- they learn the internal state of the hash function
>> that generates that value.  This probably breaks any attempt to apply
>> security properties of the hash function.  For example, the internal
>> state could easily contain a whole bunch of prior outputs it in
>> verbatim.
>
> The problem is, anti-backtracing is in severe tension with your desire
> to use unmodified SipHash.
>
> First of all, I'd like to repeat that it isn't a design goal of the current
> generator and isn't necessary.

Agreed.

> Now, the main point:  it's not likely to be solvable.
>
> The problem with unmodified SipHash is that is has only 64 bits of
> output.  No mix-back mechanism can get around the fundamental problem
> that that's too small to prevent a brute-force guessing attack.  You need
> wider mix-back.  And getting more output from unmodified SipHash requires
> more finalization rounds, which is expensive.

It could only mix the output back in every two calls, in which case
you can backtrack up to one call but you need to do 2^128 work to
backtrack farther.  But yes, this is getting excessively complicated.

> Finally, your discomfort about an attacker learning the internal state...
> if an attacker knows the key and the input, they can construct the
> internal state.  Yes, we could discard the internal state and construct
> a fresh one on the next call to get_random_int, but what are you going
> to key it with?  What are you going to feed it?  What keeps *that*
> internal state any more secret from an attacker than one that's explicitly
> stored?

I do tend to like Ted's version in which we use batched
get_random_bytes() output.  If it's fast enough, it's simpler and lets
us get the full strength of a CSPRNG.

(Aside: some day I want to move all that code from drivers/ to lib/
and teach it to be buildable in userspace, too, so it's easy to play
with it, feed it test vectors, confirm that it's equivalent to a
reference implementation, write up the actual design and try to get
real cryptographers to analyze it, etc.)

> For example, clearly stating the concern over starting new processes
> with predictable layout, and the limits on the fresh entropy supply,
> has made me realize that there *is* a possible source: each exec()
> is passed 128 bits from get_random_bytes in the AT_RANDOM element of
> its auxv.  Since get_random_int() accesses "current" anyway, we could
> store some seed material there rather than using "pid".  While this is
> not fresh for each call to get_random_int, it *is* fresh for each new
> address space whose layout is being randomized.

Hmm, interesting.  Although, for ASLR, we could use get_random_bytes()
directly and be done with it.  It won't be a bottleneck.

^ permalink raw reply

* Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: George Spelvin @ 2016-12-22 19:24 UTC (permalink / raw)
  To: luto, luto
  Cc: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, hannes,
	Jason, jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, linux, netdev, tom, torvalds, tytso, vegard.nossum
In-Reply-To: <CALCETrUSM7KDNuVqh169241PNumJVOF5hrNQUq=k5Fnet6nSgA@mail.gmail.com>

> Having slept on this, I like it less.  The problem is that a
> backtracking attacker doesn't just learn H(random seed || entropy_0 ||
> secret || ...) -- they learn the internal state of the hash function
> that generates that value.  This probably breaks any attempt to apply
> security properties of the hash function.  For example, the internal
> state could easily contain a whole bunch of prior outputs it in
> verbatim.

The problem is, anti-backtracing is in severe tension with your desire
to use unmodified SipHash.

First of all, I'd like to repeat that it isn't a design goal of the current
generator and isn't necessary.

The current generator just returns hash[0] from the MD5 state, then
leaves the state stored.  The fact that it conceals earlier outputs is
an accident of the Davies-Meyer structure of md5_transform.

It isn't necessary, because every secret generated is stored unencrypted
for as long as it's of value.  A few values are used for retransmit
backoffs and random MAC addresses.  Those are revealed to the world as
soon as they're used.

Most values are used for ASLR.  These address are of interest to an
attacker trying to mount a buffer-overflow attack, but that only lasts
as long as the process is running to receive buffers.  After the process
exits, knowledge of its layout is worthless.

And this is stored as long as it's running in kernel-accessible data,
so storing a *second* copy in less conveniently kernel-accessible data
(the RNG state) doesn't make the situation any worse.

In addition to the above, if you're assuming a state capture, then
since we have (for necessary efficieny reasons) a negligible about of
fresh entropy, an attacker has the secret key and can predict *future*
outputs very easily.

Given that information, an attacker doesn't need to learn the layout of
vulnerable server X.  If they have a buffer overflow, they can crash
the current instance and wait for a fresh image to be started (with a
known address space) to launch their attack at.

Kernel state capture attacks are a very unlikely attack, mostly because
it's a narrow target a hair's breadth away from the much more interesting
outright kernel compromise (attacker gains write access as well as read)
which renders all this fancy cryptanaysis moot.

Now, the main point:  it's not likely to be solvable.

The problem with unmodified SipHash is that is has only 64 bits of
output.  No mix-back mechanism can get around the fundamental problem
that that's too small to prevent a brute-force guessing attack.  You need
wider mix-back.  And getting more output from unmodified SipHash requires
more finalization rounds, which is expensive.

(Aside: 64 bits does have the advantage that it can't be brute-forced on
the attacked machine.  It must be exfiltrated to the attacker, and the
solution returned to the attack code.  But doing this is getting easier
all the time.)

Adding antibacktracking to SipHash is trivial: just add a Davies-Meyer
structure around its internal state.  Remember the internal state before
hashing in the entropy and secret, generate the output, then add the
previous and final states together for storage.

This is a standard textbook construction, very cheap, and doesn't touch
the compression function which is the target of analysis and attacks,
but it requires poking around in SipHash's internal state.  (And people
who read the textbooks without understanding them will get upset because
the textbooks all talk about using this construction with block ciphers,
and SipHash's compression function is not advertised as a block cipher.)

Alternative designs exist; you could extract additional output from
earlier rounds of SipHash, using the duplex sponge construction you
mentioned earlier.  That output would be used for mixback purposes *only*,
so wouldn't affect the security proof of the "primary" output.
But this is also getting creative with SipHash's internals.

Now, you could use a completely *different* cryptographic primitive
to enforce one-way-ness, and apply SipHash as a strong output transform,
but that doesn't feel like good design, and is probably more expensive.

Finally, your discomfort about an attacker learning the internal state...
if an attacker knows the key and the input, they can construct the
internal state.  Yes, we could discard the internal state and construct
a fresh one on the next call to get_random_int, but what are you going
to key it with?  What are you going to feed it?  What keeps *that*
internal state any more secret from an attacker than one that's explicitly
stored?

Keeping the internal state around is a cacheing optimization, that's all.

*If* you're assuming a state capture, the only thing secret from the
attacker is any fresh entropy collected between the time of capture
and the time of generation.  Due to mandatory efficiency requirements,
this is very small.  

I really think you're wishing for the impossible here.

A final note: although I'm disagreeing with you, thank you very much for 
the informed discussion.  Knowing that someone will read and think about
this message carefully has forced me to think it through carefully myself.

For example, clearly stating the concern over starting new processes
with predictable layout, and the limits on the fresh entropy supply,
has made me realize that there *is* a possible source: each exec()
is passed 128 bits from get_random_bytes in the AT_RANDOM element of
its auxv.  Since get_random_int() accesses "current" anyway, we could
store some seed material there rather than using "pid".  While this is
not fresh for each call to get_random_int, it *is* fresh for each new
address space whose layout is being randomized.

^ permalink raw reply

* Re: [PATCH iproute2 0/2] Fix the usage of TC tunnel key
From: Stephen Hemminger @ 2016-12-22 19:03 UTC (permalink / raw)
  To: Hadar Hen Zion; +Cc: netdev, Simon Horman, Or Gerlitz, Roi Dayan
In-Reply-To: <1482394481-19624-1-git-send-email-hadarh@mellanox.com>

On Thu, 22 Dec 2016 10:14:39 +0200
Hadar Hen Zion <hadarh@mellanox.com> wrote:

> Add dest UDP port parameter to the usage of tc tunnle key action and
> classifcation.
> 
> 
> Hadar Hen Zion (2):
>   tc/cls_flower: Add to the usage encapsulation dest UDP port
>   tc/m_tunnel_key: Add to the usage encapsulation dest UDP port
> 
>  tc/f_flower.c     | 5 +++--
>  tc/m_tunnel_key.c | 2 +-
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 

Looks good, both applied

^ permalink raw reply

* Re: [PATCH net] net, sched: fix soft lockup in tc_classify
From: Cong Wang @ 2016-12-22 19:05 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David Miller, Shahar Klein, Or Gerlitz, Roi Dayan, Jiri Pirko,
	John Fastabend, Linux Kernel Network Developers
In-Reply-To: <585AEF24.4070800@iogearbox.net>

On Wed, Dec 21, 2016 at 1:07 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Ok, you mean for net. In that case I prefer the smaller sized fix to be
> honest. It also covers everything from the point where we fetch the chain
> via cops->tcf_chain() to the end of the function, which is where most of
> the complexity resides, and only the two mentioned commits do the relock,

I really wish the problem is only about relocking, but look at the code,
the deeper reason why we have this bug is the complexity of the logic
inside tc_ctl_tfilter(): 1) the replay logic is hard, we have to make it
idempotent; 2) the request logic itself is hard, because of tc filter design
and implementation.

This is why I worry more than just relocking.

> so as a fix I think it's fine as-is. As mentioned, if there's need to
> refactor tc_ctl_tfilter() net-next would be better, imho.

Refactor is a too strong word, just moving the replay out is not a refactor.
The end result will be still smaller than your commit d936377414fadbafb4,
which is already backported to stable.

^ permalink raw reply

* Re: [PATCH iproute2 v3 2/4] ifstat: Add extended statistics to ifstat
From: Stephen Hemminger @ 2016-12-22 18:59 UTC (permalink / raw)
  To: Nogah Frankel
  Cc: netdev, roopa, roszenrami, ogerlitz, jiri, eladr, yotamg, idosch
In-Reply-To: <1482423795-6531-3-git-send-email-nogahf@mellanox.com>

On Thu, 22 Dec 2016 18:23:13 +0200
Nogah Frankel <nogahf@mellanox.com> wrote:
On Thu, 22 Dec 2016 18:23:13 +0200
Nogah Frankel <nogahf@mellanox.com> wrote:

>  }
> @@ -691,18 +804,22 @@ static const struct option longopts[] = {
>  	{ "interval", 1, 0, 't' },
>  	{ "version", 0, 0, 'V' },
>  	{ "zeros", 0, 0, 'z' },
> +	{ "extended", 1, 0, 'x'},
>  	{ 0 }
>  };
>  
> +
>  int main(int argc, char *argv[])

You let extra whitespace changes creep in.


> +		case 'x':
> +			is_extended = true;
> +			memset(stats_type, 0, 64);
> +			strncpy(stats_type, optarg, 63);
> +			break;

This seems like doing this either the paranoid or hard way.
Why not:
	const char *stats_type = NULL;
...

	case 'x':
		stats_type = optarg;
		break;
...
		if (stats_type)
			snprintf(hist_name, sizeof(hist_name),
				 "%s/.%s_ifstat.u%d", P_tmpdir, stats_type,
				 getuid());
		else
			snprintf(hist_name, sizeof(hist_name),
				 "%s/.ifstat.u%d", P_tmpdir, getuid());


Since:
	1) optarg points to area in argv that is persistent (avoid copy)
	2) don't need is_extended flag value then

Please cleanup and resubmit.

^ permalink raw reply

* [PATCH] tc: add missing limits.h header
From: Baruch Siach @ 2016-12-22 18:52 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Baruch Siach

This fixes under musl build issues like:

f_matchall.c: In function ‘matchall_parse_opt’:
f_matchall.c:48:12: error: ‘LONG_MIN’ undeclared (first use in this function)
   if (h == LONG_MIN || h == LONG_MAX) {
            ^
f_matchall.c:48:12: note: each undeclared identifier is reported only once for each function it appears in
f_matchall.c:48:29: error: ‘LONG_MAX’ undeclared (first use in this function)
   if (h == LONG_MIN || h == LONG_MAX) {
                             ^

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
---
 tc/tc_util.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tc/tc_util.h b/tc/tc_util.h
index f198a4ad5554..4db26c6d5e25 100644
--- a/tc/tc_util.h
+++ b/tc/tc_util.h
@@ -2,6 +2,7 @@
 #define _TC_UTIL_H_ 1
 
 #define MAX_MSG 16384
+#include <limits.h>
 #include <linux/pkt_sched.h>
 #include <linux/pkt_cls.h>
 #include <linux/gen_stats.h>
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock
From: Rob Herring @ 2016-12-22 18:48 UTC (permalink / raw)
  To: Geoff Lansberry
  Cc: linux-wireless, Lauro Ramos Venancio, Aloisio Almeida Jr,
	Samuel Ortiz, Mark Rutland, netdev,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Mark Greer,
	Justin Bronder
In-Reply-To: <CAO7Z3WKmEbJCN_=srxTVKXvtz84vHGi41RS+9T7fmkmhRB6nEw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Mon, Dec 19, 2016 at 5:23 PM, Geoff Lansberry <geoff-R+k406RtEhcAvxtiuMwx3w@public.gmane.org> wrote:
> I can make that change, however, I worry that it may be a bit
> misleading, since there are only two supported clock frequencies, but
> a number like that to me implies that it could be set to any number
> you want.   I'm new at this, and so I'll go ahead and change it as you
> request, but I'd like to hear your thoughts on my concern.

Then the binding doc just needs to state what are the 2 valid frequencies.

Rob
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Jason A. Donenfeld @ 2016-12-22 18:19 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Andy Lutomirski, Daniel Borkmann, Alexei Starovoitov,
	kernel-hardening@lists.openwall.com, Theodore Ts'o, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <1482425969.2673.5.camel@stressinduktion.org>

On Thu, Dec 22, 2016 at 5:59 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> We don't prevent ebpf programs being loaded based on the digest but
> just to uniquely identify loaded programs from user space and match up
> with their source.

Okay, so in that case, a weak hashing function like SHA1 could result
in a real vulnerability. Therefore, this SHA1 stuff needs to be
reverted immediately, pending a different implementation. If this has
ever shipped in a kernel version, it could even deserve a CVE. No
SHA1!

> The hashing is not a proper sha1 neither, unfortunately. I think that
> is why it will have a custom implementation in iproute2?

Jeepers creepers. So for some ungodly reason, LKML has invented yet
another homebrewed crypto primitive. This story really gets more
horrifying every day. No bueno.

So yea, let's revert and re-commit (repeal and replace? just
kidding...). Out with SHA-1, in with Blake2 or SHA2.

Jason

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox