* [PATCH net-next 0/2] tcp: improve source port selection
@ 2015-05-20 17:59 Eric Dumazet
From: Eric Dumazet @ 2015-05-20 17:59 UTC
To: David S. Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet
With the growing number of TCP sockets on hosts, we often hit
limitations caused by source port selection, due to randomization
and a poor selection strategy.
Eric Dumazet (2):
inet_hashinfo: remove bsocket counter
tcp: improve REUSEADDR/NOREUSEADDR cohabitation
include/net/inet_hashtables.h | 2 --
net/ipv4/inet_connection_sock.c | 19 ++++++++++++++-----
net/ipv4/inet_hashtables.c | 7 -------
3 files changed, 14 insertions(+), 14 deletions(-)
--
2.2.0.rc0.207.ga3a616c
* [PATCH net-next 1/2] inet_hashinfo: remove bsocket counter
From: Eric Dumazet @ 2015-05-20 17:59 UTC
To: David S. Miller
Cc: netdev, Eric Dumazet, Eric Dumazet, Marcelo Ricardo Leitner,
Flavio Leitner
We no longer need the bsockets atomic counter: since commit
2b05ad33e1e624e ("tcp: bind() fix autoselection to share ports"),
inet_csk_get_port() calls bind_conflict() regardless of its value.
This patch removes the overhead of maintaining this counter and the
double inet_csk_get_port() calls under pressure.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
Cc: Flavio Leitner <fbl@redhat.com>
---
include/net/inet_hashtables.h | 2 --
net/ipv4/inet_connection_sock.c | 5 -----
net/ipv4/inet_hashtables.c | 7 -------
3 files changed, 14 deletions(-)
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 73fe0f9525d9..774d24151d4a 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -148,8 +148,6 @@ struct inet_hashinfo {
*/
struct inet_listen_hashbucket listening_hash[INET_LHTABLE_SIZE]
____cacheline_aligned_in_smp;
-
- atomic_t bsockets;
};
static inline struct inet_ehash_bucket *inet_ehash_bucket(
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 8976ca423a07..b95fb263a13f 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -127,11 +127,6 @@ again:
(tb->num_owners < smallest_size || smallest_size == -1)) {
smallest_size = tb->num_owners;
smallest_rover = rover;
- if (atomic_read(&hashinfo->bsockets) > (high - low) + 1 &&
- !inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, false)) {
- snum = smallest_rover;
- goto tb_found;
- }
}
if (!inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, false)) {
snum = rover;
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index c6fb80bd5826..3766bddb3e8a 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -90,10 +90,6 @@ void inet_bind_bucket_destroy(struct kmem_cache *cachep, struct inet_bind_bucket
void inet_bind_hash(struct sock *sk, struct inet_bind_bucket *tb,
const unsigned short snum)
{
- struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;
-
- atomic_inc(&hashinfo->bsockets);
-
inet_sk(sk)->inet_num = snum;
sk_add_bind_node(sk, &tb->owners);
tb->num_owners++;
@@ -111,8 +107,6 @@ static void __inet_put_port(struct sock *sk)
struct inet_bind_hashbucket *head = &hashinfo->bhash[bhash];
struct inet_bind_bucket *tb;
- atomic_dec(&hashinfo->bsockets);
-
spin_lock(&head->lock);
tb = inet_csk(sk)->icsk_bind_hash;
__sk_del_bind_node(sk);
@@ -608,7 +602,6 @@ void inet_hashinfo_init(struct inet_hashinfo *h)
{
int i;
- atomic_set(&h->bsockets, 0);
for (i = 0; i < INET_LHTABLE_SIZE; i++) {
spin_lock_init(&h->listening_hash[i].lock);
INIT_HLIST_NULLS_HEAD(&h->listening_hash[i].head,
--
2.2.0.rc0.207.ga3a616c
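To illustrate what the autoselection scan does once the bsockets shortcut is gone, here is a user-space model. It is a sketch only: struct model_bucket, its fields and model_pick_port() are hypothetical. num_owners mirrors tb->num_owners, reusable is assumed to stand in for the reuse conditions that precede the num_owners test (only partly visible in the hunk above), and conflict stands in for the verdict of bind_conflict(). Locking, hashing, reserved ports and the later re-check at tb_found are left out. Each in-use candidate port gets exactly one conflict check; if the whole range is exhausted, the least-loaded reusable port (smallest_rover) is the fallback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct inet_bind_bucket (model only). */
struct model_bucket {
	int  num_owners;  /* sockets bound to this port; 0 means free */
	bool reusable;    /* assumed stand-in for the reuse conditions */
	bool conflict;    /* assumed stand-in for bind_conflict()'s verdict */
};

/* Walk ports [low, high] starting at 'start', wrapping around like
 * 'rover' in inet_csk_get_port(). b[p - low] describes port p.
 * Returns the chosen port, or -1 if nothing is usable. */
static int model_pick_port(const struct model_bucket *b,
			   int low, int high, int start)
{
	int remaining = (high - low) + 1;
	int rover = start;
	int smallest_size = -1, smallest_rover = start;

	assert(low <= high && start >= low && start <= high);
	do {
		const struct model_bucket *tb = &b[rover - low];

		if (tb->num_owners == 0)
			return rover;           /* free port: take it */
		if (tb->reusable &&
		    (tb->num_owners < smallest_size || smallest_size == -1)) {
			smallest_size = tb->num_owners;  /* least-loaded so far */
			smallest_rover = rover;
		}
		/* One conflict check per candidate; there is no separate
		 * bsockets-gated check on the same bucket. */
		if (!tb->conflict)
			return rover;
		if (++rover > high)
			rover = low;
	} while (--remaining > 0);

	/* Range exhausted: fall back to the least-loaded reusable port. */
	return smallest_size != -1 ? smallest_rover : -1;
}
```

In the pre-patch code, the removed branch could evaluate bind_conflict() on the same bucket a second time once bsockets exceeded the size of the range; the model simply has no such branch.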
* [PATCH net-next 2/2] tcp: improve REUSEADDR/NOREUSEADDR cohabitation
From: Eric Dumazet @ 2015-05-20 17:59 UTC
To: David S. Miller
Cc: netdev, Eric Dumazet, Eric Dumazet, Marcelo Ricardo Leitner,
Flavio Leitner
The randomization in inet_csk_get_port() tends to spread
sockets over the whole available range (ip_local_port_range).
This is unfortunate because SO_REUSEADDR sockets have
fewer requirements than non-SO_REUSEADDR ones.
An application that sets the SO_REUSEADDR hint is asking to
allow source ports to be shared.
So instead of picking a random port number anywhere in
ip_local_port_range, let's first try the lower half of the range.
This leaves more of the upper half of the range available for
sockets with strong requirements (those not using SO_REUSEADDR).
Note that this patch does not add a new sysctl; it only changes
the way we try to pick the port number.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
Cc: Flavio Leitner <fbl@redhat.com>
---
net/ipv4/inet_connection_sock.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index b95fb263a13f..60021d0d9326 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -99,6 +99,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
struct net *net = sock_net(sk);
int smallest_size = -1, smallest_rover;
kuid_t uid = sock_i_uid(sk);
+ int attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0;
local_bh_disable();
if (!snum) {
@@ -106,6 +107,14 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
again:
inet_get_local_port_range(net, &low, &high);
+ if (attempt_half) {
+ int half = low + ((high - low) >> 1);
+
+ if (attempt_half == 1)
+ high = half;
+ else
+ low = half;
+ }
remaining = (high - low) + 1;
smallest_rover = rover = prandom_u32() % remaining + low;
@@ -154,6 +163,11 @@ again:
snum = smallest_rover;
goto have_snum;
}
+ if (attempt_half == 1) {
+ /* OK we now try the upper half of the range */
+ attempt_half = 2;
+ goto again;
+ }
goto fail;
}
/* OK, here is the one we will use. HEAD is
--
2.2.0.rc0.207.ga3a616c
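The range split added by this patch can be checked in isolation. The sketch below is hypothetical: half_range() and first_rover() do not exist in the kernel (the patch does this inline in inet_csk_get_port()), the r argument stands in for prandom_u32(), and 32768..61000 is only an assumed ip_local_port_range setting used for the example.

```c
#include <assert.h>

/* Narrow [*low, *high] the way the attempt_half block does:
 *   0: full range (socket without SO_REUSEADDR)
 *   1: lower half, tried first by SO_REUSEADDR sockets
 *   2: upper half, tried after the lower half is exhausted */
static void half_range(int attempt_half, int *low, int *high)
{
	assert(*low <= *high);
	if (attempt_half) {
		int half = *low + ((*high - *low) >> 1);

		if (attempt_half == 1)
			*high = half;
		else
			*low = half;
	}
}

/* Starting port of the scan inside [low, high], mirroring
 * "prandom_u32() % remaining + low"; r replaces prandom_u32(). */
static int first_rover(unsigned int r, int low, int high)
{
	unsigned int remaining = (unsigned int)(high - low) + 1;

	return (int)(r % remaining) + low;
}
```

With 32768..61000, the first pass for an SO_REUSEADDR socket covers 32768..46884 and the second pass 46884..61000, so the midpoint belongs to both halves and can be examined twice. Sockets without SO_REUSEADDR keep attempt_half == 0 and still draw their starting point from the full range.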
* Re: [PATCH net-next 1/2] inet_hashinfo: remove bsocket counter
From: Flavio Leitner @ 2015-05-21 20:10 UTC
To: Eric Dumazet
Cc: David S. Miller, netdev, Eric Dumazet, Marcelo Ricardo Leitner
On Wed, May 20, 2015 at 10:59:01AM -0700, Eric Dumazet wrote:
> We no longer need the bsockets atomic counter: since commit
> 2b05ad33e1e624e ("tcp: bind() fix autoselection to share ports"),
> inet_csk_get_port() calls bind_conflict() regardless of its value.
>
> This patch removes the overhead of maintaining this counter and the
> double inet_csk_get_port() calls under pressure.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
> Cc: Flavio Leitner <fbl@redhat.com>
> ---
The patch looks good to me.
Acked-by: Flavio Leitner <fbl@redhat.com>
* Re: [PATCH net-next 2/2] tcp: improve REUSEADDR/NOREUSEADDR cohabitation
From: Flavio Leitner @ 2015-05-21 20:37 UTC
To: Eric Dumazet
Cc: David S. Miller, netdev, Eric Dumazet, Marcelo Ricardo Leitner
On Wed, May 20, 2015 at 10:59:02AM -0700, Eric Dumazet wrote:
> The randomization in inet_csk_get_port() tends to spread
> sockets over the whole available range (ip_local_port_range).
>
> This is unfortunate because SO_REUSEADDR sockets have
> fewer requirements than non-SO_REUSEADDR ones.
>
> An application that sets the SO_REUSEADDR hint is asking to
> allow source ports to be shared.
>
> So instead of picking a random port number anywhere in
> ip_local_port_range, let's first try the lower half of the range.
>
> This leaves more of the upper half of the range available for
> sockets with strong requirements (those not using SO_REUSEADDR).
>
> Note that this patch does not add a new sysctl; it only changes
> the way we try to pick the port number.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
> Cc: Flavio Leitner <fbl@redhat.com>
> ---
The only downside I can see is that, after the patch, applications
using SO_REUSEADDR will reuse ports more often, and that could
potentially trigger some bug.
Looks like a good change to me.
Acked-by: Flavio Leitner <fbl@redhat.com>
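From the application side, the SO_REUSEADDR hint discussed in this thread is just a setsockopt() call before bind(), and binding to port 0 is what sends the request through the autoselection path in inet_csk_get_port(). A minimal sketch; autobind_port() is a hypothetical helper name, not an existing API:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to a kernel-chosen port and return that port in
 * host byte order, or -1 on error. With reuse == 1, SO_REUSEADDR is
 * set first, which makes sk->sk_reuse == SK_CAN_REUSE in the kernel.
 * The socket is closed before returning. */
static int autobind_port(int reuse)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int one = 1;
	int port = -1;
	int fd;

	assert(reuse == 0 || reuse == 1);
	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	if (reuse &&
	    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0)
		goto out;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(0);      /* port 0: let the kernel pick */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;
	if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
		goto out;
	port = ntohs(addr.sin_port);
out:
	close(fd);
	return port;
}
```

The returned port should lie inside net.ipv4.ip_local_port_range. With this patch applied, a reuse == 1 socket would be expected to land in the lower half unless that half is exhausted, but the heuristic may differ in other kernel versions, so applications should not depend on it.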
* Re: [PATCH net-next 0/2] tcp: improve source port selection
From: David Miller @ 2015-05-21 22:55 UTC
To: edumazet; +Cc: netdev, eric.dumazet
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 20 May 2015 10:59:00 -0700
> With the growing number of TCP sockets on hosts, we often hit
> limitations caused by source port selection, due to randomization
> and a poor selection strategy.
This looks fine, nice work, applied.
Thanks Eric!