netdev.vger.kernel.org archive mirror
* [PATCH net-next 0/2] tcp: improve source port selection
@ 2015-05-20 17:59 Eric Dumazet
  2015-05-20 17:59 ` [PATCH net-next 1/2] inet_hashinfo: remove bsocket counter Eric Dumazet
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Eric Dumazet @ 2015-05-20 17:59 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

With the increase of TCP socket counts in hosts, we often hit
limitations caused by source port selection, due to randomization
and a poor selection strategy.

Eric Dumazet (2):
  inet_hashinfo: remove bsocket counter
  tcp: improve REUSEADDR/NOREUSEADDR cohabitation

 include/net/inet_hashtables.h   |  2 --
 net/ipv4/inet_connection_sock.c | 19 ++++++++++++++-----
 net/ipv4/inet_hashtables.c      |  7 -------
 3 files changed, 14 insertions(+), 14 deletions(-)

-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH net-next 1/2] inet_hashinfo: remove bsocket counter
  2015-05-20 17:59 [PATCH net-next 0/2] tcp: improve source port selection Eric Dumazet
@ 2015-05-20 17:59 ` Eric Dumazet
  2015-05-21 20:10   ` Flavio Leitner
  2015-05-20 17:59 ` [PATCH net-next 2/2] tcp: improve REUSEADDR/NOREUSEADDR cohabitation Eric Dumazet
  2015-05-21 22:55 ` [PATCH net-next 0/2] tcp: improve source port selection David Miller
  2 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2015-05-20 17:59 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Marcelo Ricardo Leitner,
	Flavio Leitner

We no longer need the bsocket atomic counter: since commit
2b05ad33e1e624e ("tcp: bind() fix autoselection to share ports"),
inet_csk_get_port() calls bind_conflict() regardless of its value.

This patch removes the overhead of maintaining this counter and
avoids double inet_csk_get_port() calls under pressure.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
Cc: Flavio Leitner <fbl@redhat.com>
---
 include/net/inet_hashtables.h   | 2 --
 net/ipv4/inet_connection_sock.c | 5 -----
 net/ipv4/inet_hashtables.c      | 7 -------
 3 files changed, 14 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 73fe0f9525d9..774d24151d4a 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -148,8 +148,6 @@ struct inet_hashinfo {
 	 */
 	struct inet_listen_hashbucket	listening_hash[INET_LHTABLE_SIZE]
 					____cacheline_aligned_in_smp;
-
-	atomic_t			bsockets;
 };
 
 static inline struct inet_ehash_bucket *inet_ehash_bucket(
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 8976ca423a07..b95fb263a13f 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -127,11 +127,6 @@ again:
 					    (tb->num_owners < smallest_size || smallest_size == -1)) {
 						smallest_size = tb->num_owners;
 						smallest_rover = rover;
-						if (atomic_read(&hashinfo->bsockets) > (high - low) + 1 &&
-						    !inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, false)) {
-							snum = smallest_rover;
-							goto tb_found;
-						}
 					}
 					if (!inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, false)) {
 						snum = rover;
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index c6fb80bd5826..3766bddb3e8a 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -90,10 +90,6 @@ void inet_bind_bucket_destroy(struct kmem_cache *cachep, struct inet_bind_bucket
 void inet_bind_hash(struct sock *sk, struct inet_bind_bucket *tb,
 		    const unsigned short snum)
 {
-	struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;
-
-	atomic_inc(&hashinfo->bsockets);
-
 	inet_sk(sk)->inet_num = snum;
 	sk_add_bind_node(sk, &tb->owners);
 	tb->num_owners++;
@@ -111,8 +107,6 @@ static void __inet_put_port(struct sock *sk)
 	struct inet_bind_hashbucket *head = &hashinfo->bhash[bhash];
 	struct inet_bind_bucket *tb;
 
-	atomic_dec(&hashinfo->bsockets);
-
 	spin_lock(&head->lock);
 	tb = inet_csk(sk)->icsk_bind_hash;
 	__sk_del_bind_node(sk);
@@ -608,7 +602,6 @@ void inet_hashinfo_init(struct inet_hashinfo *h)
 {
 	int i;
 
-	atomic_set(&h->bsockets, 0);
 	for (i = 0; i < INET_LHTABLE_SIZE; i++) {
 		spin_lock_init(&h->listening_hash[i].lock);
 		INIT_HLIST_NULLS_HEAD(&h->listening_hash[i].head,
-- 
2.2.0.rc0.207.ga3a616c


* [PATCH net-next 2/2] tcp: improve REUSEADDR/NOREUSEADDR cohabitation
  2015-05-20 17:59 [PATCH net-next 0/2] tcp: improve source port selection Eric Dumazet
  2015-05-20 17:59 ` [PATCH net-next 1/2] inet_hashinfo: remove bsocket counter Eric Dumazet
@ 2015-05-20 17:59 ` Eric Dumazet
  2015-05-21 20:37   ` Flavio Leitner
  2015-05-21 22:55 ` [PATCH net-next 0/2] tcp: improve source port selection David Miller
  2 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2015-05-20 17:59 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Marcelo Ricardo Leitner,
	Flavio Leitner

inet_csk_get_port() randomization tends to spread sockets over
the whole available range (ip_local_port_range).

This is unfortunate because SO_REUSEADDR sockets have fewer
requirements than non-SO_REUSEADDR ones.

If an application sets the SO_REUSEADDR hint, it is willing to
share its source port.

So instead of picking a random port number anywhere in
ip_local_port_range, let's first try the lower half of the range.

This leaves more of the upper half available for sockets with
strong requirements (those not using SO_REUSEADDR).

Note this patch does not add a new sysctl; it only changes the
way we pick a port number.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
Cc: Flavio Leitner <fbl@redhat.com>
---
 net/ipv4/inet_connection_sock.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index b95fb263a13f..60021d0d9326 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -99,6 +99,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 	struct net *net = sock_net(sk);
 	int smallest_size = -1, smallest_rover;
 	kuid_t uid = sock_i_uid(sk);
+	int attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0;
 
 	local_bh_disable();
 	if (!snum) {
@@ -106,6 +107,14 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 
 again:
 		inet_get_local_port_range(net, &low, &high);
+		if (attempt_half) {
+			int half = low + ((high - low) >> 1);
+
+			if (attempt_half == 1)
+				high = half;
+			else
+				low = half;
+		}
 		remaining = (high - low) + 1;
 		smallest_rover = rover = prandom_u32() % remaining + low;
 
@@ -154,6 +163,11 @@ again:
 				snum = smallest_rover;
 				goto have_snum;
 			}
+			if (attempt_half == 1) {
+				/* OK we now try the upper half of the range */
+				attempt_half = 2;
+				goto again;
+			}
 			goto fail;
 		}
 		/* OK, here is the one we will use.  HEAD is
-- 
2.2.0.rc0.207.ga3a616c

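The two-pass strategy this patch implements can be sketched as a simplified, deterministic model in plain C. This is not the kernel code: `port_busy()` is a hypothetical stand-in for the bind_conflict() check, and a linear scan replaces the kernel's random starting point within each half.

```c
/* Simplified model of the attempt_half logic added to
 * inet_csk_get_port(): SO_REUSEADDR sockets first search the lower
 * half of the ephemeral range, and only fall back to the upper half
 * once the lower half is exhausted. Non-reuse sockets scan the
 * whole range.
 */
static int pick_port(int low, int high, int reuse,
		     int (*port_busy)(int port))
{
	int attempt_half = reuse ? 1 : 0;
	int lo = low, hi = high;

again:
	if (attempt_half) {
		int half = low + ((high - low) >> 1);

		if (attempt_half == 1) {	/* first pass: lower half */
			lo = low;
			hi = half;
		} else {			/* second pass: upper half */
			lo = half;
			hi = high;
		}
	}
	for (int port = lo; port <= hi; port++) {
		if (!port_busy(port))
			return port;
	}
	if (attempt_half == 1) {
		/* lower half exhausted, retry the upper half */
		attempt_half = 2;
		goto again;
	}
	return -1;			/* whole range exhausted */
}
```

The key property is visible in the fallback: a reuse socket never gives up after the lower half alone, so the change costs at most one extra pass, while non-reuse sockets keep the full range.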

* Re: [PATCH net-next 1/2] inet_hashinfo: remove bsocket counter
  2015-05-20 17:59 ` [PATCH net-next 1/2] inet_hashinfo: remove bsocket counter Eric Dumazet
@ 2015-05-21 20:10   ` Flavio Leitner
  0 siblings, 0 replies; 6+ messages in thread
From: Flavio Leitner @ 2015-05-21 20:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, netdev, Eric Dumazet, Marcelo Ricardo Leitner

On Wed, May 20, 2015 at 10:59:01AM -0700, Eric Dumazet wrote:
> We no longer need the bsocket atomic counter: since commit
> 2b05ad33e1e624e ("tcp: bind() fix autoselection to share ports"),
> inet_csk_get_port() calls bind_conflict() regardless of its value.
> 
> This patch removes the overhead of maintaining this counter and
> avoids double inet_csk_get_port() calls under pressure.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
> Cc: Flavio Leitner <fbl@redhat.com>
> ---

The patch looks good to me.
Acked-by: Flavio Leitner <fbl@redhat.com>


* Re: [PATCH net-next 2/2] tcp: improve REUSEADDR/NOREUSEADDR cohabitation
  2015-05-20 17:59 ` [PATCH net-next 2/2] tcp: improve REUSEADDR/NOREUSEADDR cohabitation Eric Dumazet
@ 2015-05-21 20:37   ` Flavio Leitner
  0 siblings, 0 replies; 6+ messages in thread
From: Flavio Leitner @ 2015-05-21 20:37 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, netdev, Eric Dumazet, Marcelo Ricardo Leitner

On Wed, May 20, 2015 at 10:59:02AM -0700, Eric Dumazet wrote:
> inet_csk_get_port() randomization tends to spread sockets over
> the whole available range (ip_local_port_range).
> 
> This is unfortunate because SO_REUSEADDR sockets have fewer
> requirements than non-SO_REUSEADDR ones.
> 
> If an application sets the SO_REUSEADDR hint, it is willing to
> share its source port.
> 
> So instead of picking a random port number anywhere in
> ip_local_port_range, let's first try the lower half of the range.
> 
> This leaves more of the upper half available for sockets with
> strong requirements (those not using SO_REUSEADDR).
> 
> Note this patch does not add a new sysctl; it only changes the
> way we pick a port number.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
> Cc: Flavio Leitner <fbl@redhat.com>
> ---

The only downside I can see is that, after the patch, applications
using SO_REUSEADDR will reuse ports more often, which could
potentially trigger latent bugs.

Looks like a good change to me.

Acked-by: Flavio Leitner <fbl@redhat.com>

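For context, the application-side hint discussed above is just SO_REUSEADDR set before an ephemeral bind(). A minimal userspace sketch (the helper name `bind_ephemeral` is mine, not from the thread):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to an ephemeral port on loopback, optionally
 * setting SO_REUSEADDR first. With this series applied, the kernel
 * steers reuse sockets toward the lower half of ip_local_port_range.
 * Returns the chosen port, or -1 on error.
 */
int bind_ephemeral(int reuse)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int fd, port = -1;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	if (reuse) {
		int one = 1;

		if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
			       &one, sizeof(one)) < 0)
			goto out;
	}

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;		/* port 0: kernel autoselects */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
	    getsockname(fd, (struct sockaddr *)&addr, &len) == 0)
		port = ntohs(addr.sin_port);
out:
	close(fd);
	return port;
}
```

Since sk->sk_reuse becomes SK_CAN_REUSE only when the option is set, only the first call here takes the lower-half-first path in inet_csk_get_port().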

* Re: [PATCH net-next 0/2] tcp: improve source port selection
  2015-05-20 17:59 [PATCH net-next 0/2] tcp: improve source port selection Eric Dumazet
  2015-05-20 17:59 ` [PATCH net-next 1/2] inet_hashinfo: remove bsocket counter Eric Dumazet
  2015-05-20 17:59 ` [PATCH net-next 2/2] tcp: improve REUSEADDR/NOREUSEADDR cohabitation Eric Dumazet
@ 2015-05-21 22:55 ` David Miller
  2 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2015-05-21 22:55 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet

From: Eric Dumazet <edumazet@google.com>
Date: Wed, 20 May 2015 10:59:00 -0700

> With the increase of TCP socket counts in hosts, we often hit
> limitations caused by source port selection, due to randomization
> and a poor selection strategy.

This looks fine, nice work, applied.

Thanks Eric!

