netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Spurious "TCP: too many of orphaned sockets", unable to allocate sockets
@ 2010-08-25  7:16 Anton Blanchard
  2010-08-25  7:17 ` [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k Anton Blanchard
  2010-08-25  7:59 ` Spurious "TCP: too many of orphaned sockets", unable to allocate sockets David Miller
  0 siblings, 2 replies; 17+ messages in thread
From: Anton Blanchard @ 2010-08-25  7:16 UTC (permalink / raw)
  To: netdev; +Cc: miltonm


Hi,

We have a machine running a network test that regularly hits:

TCP: too many of orphaned sockets

Which comes from:

                int orphan_count = percpu_counter_read_positive(
                                                sk->sk_prot->orphan_count);

                sk_mem_reclaim(sk);
                if (tcp_too_many_orphans(sk, orphan_count)) {
                        if (net_ratelimit())
                                printk(KERN_INFO "TCP: too many of orphaned "
                                       "sockets\n");
                        tcp_set_state(sk, TCP_CLOSE);
                        tcp_send_active_reset(sk, GFP_ATOMIC);
                        NET_INC_STATS_BH(sock_net(sk),
                                        LINUX_MIB_TCPABORTONMEMORY);
                }

Looking closer we have:

# cat /proc/sys/net/ipv4/tcp_max_orphans
4096

# grep processor /proc/cpuinfo | wc -l
128

The problem is we are using percpu_counter_read_positive, so the value can be
out num_online_cpus() * percpu_counter_batch. percpu_counter_batch is going to
be 32, so we might be out by 32 * 128 = 4k. Considering tcp_max_orphans is 4k
that explains the spurious printout and the inability to allocate sockets.

A couple of issues:

1. We size sysctl_tcp_max_orphans based on some second order heuristic
that uses pages which could be anything from 4k to 64k:

        /* Try to be a bit smarter and adjust defaults depending
         * on available memory.
         */
        for (order = 0; ((1 << order) << PAGE_SHIFT) <
                        (tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket));
                        order++)
                ;
        if (order >= 4) {
                tcp_death_row.sysctl_max_tw_buckets = 180000;
                sysctl_tcp_max_orphans = 4096 << (order - 4);
                sysctl_max_syn_backlog = 1024;
        } else if (order < 3) {
                tcp_death_row.sysctl_max_tw_buckets >>= (3 - order);
                sysctl_tcp_max_orphans >>= (3 - order);
                sysctl_max_syn_backlog = 128;
        }

I'll follow up with a patch to fix this for PAGE_SIZE != 4k

2. Even with this fixed we could hit the original issue. We have been known to
test on 1024 thread boxes and we would have the possibility of 32 * 1024
= 32k slack in the percpu counters. On this box tcp_max_orphans will be
64k after the fix which is a bit close for comfort. Should we do anything here?

Anton

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k
  2010-08-25  7:16 Spurious "TCP: too many of orphaned sockets", unable to allocate sockets Anton Blanchard
@ 2010-08-25  7:17 ` Anton Blanchard
  2010-08-25  7:39   ` Eric Dumazet
  2010-08-25 17:50   ` Eric Dumazet
  2010-08-25  7:59 ` Spurious "TCP: too many of orphaned sockets", unable to allocate sockets David Miller
  1 sibling, 2 replies; 17+ messages in thread
From: Anton Blanchard @ 2010-08-25  7:17 UTC (permalink / raw)
  To: netdev; +Cc: miltonm


We were hard coding 4096 when sizing sysctl_tcp_max_orphans which
causes problems when PAGE_SIZE is not 4k. We calculate an order based on
PAGE_SHIFT so the count should be based on PAGE_SIZE

Signed-off-By: Milton Miller <miltonm@bga.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
---

Index: powerpc.git/net/ipv4/tcp.c
===================================================================
--- powerpc.git.orig/net/ipv4/tcp.c	2010-08-25 17:04:51.190305401 +1000
+++ powerpc.git/net/ipv4/tcp.c	2010-08-25 17:05:15.463884764 +1000
@@ -3270,7 +3270,7 @@ void __init tcp_init(void)
 		;
 	if (order >= 4) {
 		tcp_death_row.sysctl_max_tw_buckets = 180000;
-		sysctl_tcp_max_orphans = 4096 << (order - 4);
+		sysctl_tcp_max_orphans = PAGE_SIZE << (order - 4);
 		sysctl_max_syn_backlog = 1024;
 	} else if (order < 3) {
 		tcp_death_row.sysctl_max_tw_buckets >>= (3 - order);

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k
  2010-08-25  7:17 ` [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k Anton Blanchard
@ 2010-08-25  7:39   ` Eric Dumazet
  2010-08-25  7:59     ` David Miller
  2010-08-25 17:50   ` Eric Dumazet
  1 sibling, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2010-08-25  7:39 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: netdev, miltonm

Le mercredi 25 août 2010 à 17:17 +1000, Anton Blanchard a écrit :
> We were hard coding 4096 when sizing sysctl_tcp_max_orphans which
> causes problems when PAGE_SIZE is not 4k. We calculate an order based on
> PAGE_SHIFT so the count should be based on PAGE_SIZE
> 
> Signed-off-By: Milton Miller <miltonm@bga.com>
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
> 
> Index: powerpc.git/net/ipv4/tcp.c
> ===================================================================
> --- powerpc.git.orig/net/ipv4/tcp.c	2010-08-25 17:04:51.190305401 +1000
> +++ powerpc.git/net/ipv4/tcp.c	2010-08-25 17:05:15.463884764 +1000
> @@ -3270,7 +3270,7 @@ void __init tcp_init(void)
>  		;
>  	if (order >= 4) {
>  		tcp_death_row.sysctl_max_tw_buckets = 180000;
> -		sysctl_tcp_max_orphans = 4096 << (order - 4);
> +		sysctl_tcp_max_orphans = PAGE_SIZE << (order - 4);
>  		sysctl_max_syn_backlog = 1024;
>  	} else if (order < 3) {
>  		tcp_death_row.sysctl_max_tw_buckets >>= (3 - order);
> --

Considering your previous mail, I suggest you add

	sysctl_tcp_max_orphans = max(sysctl_tcp_max_orphans,
		     percpu_counter_batch * num_possible_cpus());





^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Spurious "TCP: too many of orphaned sockets", unable to allocate sockets
  2010-08-25  7:16 Spurious "TCP: too many of orphaned sockets", unable to allocate sockets Anton Blanchard
  2010-08-25  7:17 ` [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k Anton Blanchard
@ 2010-08-25  7:59 ` David Miller
  2010-08-25  8:20   ` David Miller
  1 sibling, 1 reply; 17+ messages in thread
From: David Miller @ 2010-08-25  7:59 UTC (permalink / raw)
  To: anton; +Cc: netdev, miltonm

From: Anton Blanchard <anton@samba.org>
Date: Wed, 25 Aug 2010 17:16:26 +1000

> We have a machine running a network test that regularly hits:
> 
> TCP: too many of orphaned sockets
> 
> Which comes from:
> 
>                 int orphan_count = percpu_counter_read_positive(
>                                                 sk->sk_prot->orphan_count);
> 
>                 sk_mem_reclaim(sk);
>                 if (tcp_too_many_orphans(sk, orphan_count)) {
...
> 2. Even with this fixed we could hit the original issue. We have been known to
> test on 1024 thread boxes and we would have the possibility of 32 * 1024
> = 32k slack in the percpu counters. On this box tcp_max_orphans will be
> 64k after the fix which is a bit close for comfort. Should we do anything here?

Solution seems simple, if the too many orphan check triggers, simply
redo the check using the expensive but more accurate per-cpu counter
read (which avoids the skew) to make sure.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k
  2010-08-25  7:39   ` Eric Dumazet
@ 2010-08-25  7:59     ` David Miller
  0 siblings, 0 replies; 17+ messages in thread
From: David Miller @ 2010-08-25  7:59 UTC (permalink / raw)
  To: eric.dumazet; +Cc: anton, netdev, miltonm

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 25 Aug 2010 09:39:01 +0200

> Considering your previous mail, I suggest you add
> 
> 	sysctl_tcp_max_orphans = max(sysctl_tcp_max_orphans,
> 		     percpu_counter_batch * num_possible_cpus());

That seems unnecessary, see my other reply to Anton on how to fix this
skew problem.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Spurious "TCP: too many of orphaned sockets", unable to allocate sockets
  2010-08-25  7:59 ` Spurious "TCP: too many of orphaned sockets", unable to allocate sockets David Miller
@ 2010-08-25  8:20   ` David Miller
  2010-08-25  8:47     ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2010-08-25  8:20 UTC (permalink / raw)
  To: anton; +Cc: netdev, miltonm, eric.dumazet

From: David Miller <davem@davemloft.net>
Date: Wed, 25 Aug 2010 00:59:29 -0700 (PDT)

> Solution seems simple, if the too many orphan check triggers, simply
> redo the check using the expensive but more accurate per-cpu counter
> read (which avoids the skew) to make sure.

Something like this:

tcp: Combat per-cpu skew in orphan tests.

As reported by Anton Blanchard when we use
percpu_counter_read_positive() to make our orphan socket limit checks,
the check can be off by up to num_cpus_online() * batch (which is 32
by default) which on a 128 cpu machine can be as large as the default
orphan limit itself.

Fix this by doing the full expensive sum check if the optimized check
triggers.

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/net/tcp.h b/include/net/tcp.h
index df6a2eb..eaa9582 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -268,11 +268,21 @@ static inline int between(__u32 seq1, __u32 seq2, __u32 seq3)
 	return seq3 - seq2 >= seq1 - seq2;
 }
 
-static inline int tcp_too_many_orphans(struct sock *sk, int num)
+static inline bool tcp_too_many_orphans(struct sock *sk, int shift)
 {
-	return (num > sysctl_tcp_max_orphans) ||
-		(sk->sk_wmem_queued > SOCK_MIN_SNDBUF &&
-		 atomic_read(&tcp_memory_allocated) > sysctl_tcp_mem[2]);
+	struct percpu_counter *ocp = sk->sk_prot->orphan_count;
+	int orphans = percpu_counter_read_positive(ocp);
+
+	if (orphans << shift > sysctl_tcp_max_orphans) {
+		orphans = percpu_counter_sum_positive(ocp);
+		if (orphans << shift > sysctl_tcp_max_orphans)
+			return true;
+	}
+
+	if (sk->sk_wmem_queued > SOCK_MIN_SNDBUF &&
+	    atomic_read(&tcp_memory_allocated) > sysctl_tcp_mem[2])
+		return true;
+	return false;
 }
 
 /* syncookies: remember time of last synqueue overflow */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 176e11a..197b9b7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2011,11 +2011,8 @@ adjudge_to_death:
 		}
 	}
 	if (sk->sk_state != TCP_CLOSE) {
-		int orphan_count = percpu_counter_read_positive(
-						sk->sk_prot->orphan_count);
-
 		sk_mem_reclaim(sk);
-		if (tcp_too_many_orphans(sk, orphan_count)) {
+		if (tcp_too_many_orphans(sk, 0)) {
 			if (net_ratelimit())
 				printk(KERN_INFO "TCP: too many of orphaned "
 				       "sockets\n");
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 808bb92..c35b469 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -66,18 +66,18 @@ static void tcp_write_err(struct sock *sk)
 static int tcp_out_of_resources(struct sock *sk, int do_reset)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	int orphans = percpu_counter_read_positive(&tcp_orphan_count);
+	int shift = 0;
 
 	/* If peer does not open window for long time, or did not transmit
 	 * anything for long time, penalize it. */
 	if ((s32)(tcp_time_stamp - tp->lsndtime) > 2*TCP_RTO_MAX || !do_reset)
-		orphans <<= 1;
+		shift++;
 
 	/* If some dubious ICMP arrived, penalize even more. */
 	if (sk->sk_err_soft)
-		orphans <<= 1;
+		shift++;
 
-	if (tcp_too_many_orphans(sk, orphans)) {
+	if (tcp_too_many_orphans(sk, shift)) {
 		if (net_ratelimit())
 			printk(KERN_INFO "Out of socket memory\n");
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Spurious "TCP: too many of orphaned sockets", unable to allocate sockets
  2010-08-25  8:20   ` David Miller
@ 2010-08-25  8:47     ` Eric Dumazet
  2010-08-25  9:28       ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2010-08-25  8:47 UTC (permalink / raw)
  To: David Miller; +Cc: anton, netdev, miltonm

Le mercredi 25 août 2010 à 01:20 -0700, David Miller a écrit :
> From: David Miller <davem@davemloft.net>
> Date: Wed, 25 Aug 2010 00:59:29 -0700 (PDT)
> 
> > Solution seems simple, if the too many orphan check triggers, simply
> > redo the check using the expensive but more accurate per-cpu counter
> > read (which avoids the skew) to make sure.
> 
> Something like this:
> 
> tcp: Combat per-cpu skew in orphan tests.
> 
> As reported by Anton Blanchard when we use
> percpu_counter_read_positive() to make our orphan socket limit checks,
> the check can be off by up to num_cpus_online() * batch (which is 32
> by default) which on a 128 cpu machine can be as large as the default
> orphan limit itself.
> 
> Fix this by doing the full expensive sum check if the optimized check
> triggers.
> 
> Reported-by: Anton Blanchard <anton@samba.org>
> Signed-off-by: David S. Miller <davem@davemloft.net>

Very nice !

tcp_too_many_orphans() might be a bit large to still be inlined ...

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>


> 
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index df6a2eb..eaa9582 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -268,11 +268,21 @@ static inline int between(__u32 seq1, __u32 seq2, __u32 seq3)
>  	return seq3 - seq2 >= seq1 - seq2;
>  }
>  
> -static inline int tcp_too_many_orphans(struct sock *sk, int num)
> +static inline bool tcp_too_many_orphans(struct sock *sk, int shift)
>  {
> -	return (num > sysctl_tcp_max_orphans) ||
> -		(sk->sk_wmem_queued > SOCK_MIN_SNDBUF &&
> -		 atomic_read(&tcp_memory_allocated) > sysctl_tcp_mem[2]);
> +	struct percpu_counter *ocp = sk->sk_prot->orphan_count;
> +	int orphans = percpu_counter_read_positive(ocp);
> +
> +	if (orphans << shift > sysctl_tcp_max_orphans) {
> +		orphans = percpu_counter_sum_positive(ocp);
> +		if (orphans << shift > sysctl_tcp_max_orphans)
> +			return true;
> +	}
> +
> +	if (sk->sk_wmem_queued > SOCK_MIN_SNDBUF &&
> +	    atomic_read(&tcp_memory_allocated) > sysctl_tcp_mem[2])
> +		return true;
> +	return false;
>  }
>  
>  /* syncookies: remember time of last synqueue overflow */
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 176e11a..197b9b7 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2011,11 +2011,8 @@ adjudge_to_death:
>  		}
>  	}
>  	if (sk->sk_state != TCP_CLOSE) {
> -		int orphan_count = percpu_counter_read_positive(
> -						sk->sk_prot->orphan_count);
> -
>  		sk_mem_reclaim(sk);
> -		if (tcp_too_many_orphans(sk, orphan_count)) {
> +		if (tcp_too_many_orphans(sk, 0)) {
>  			if (net_ratelimit())
>  				printk(KERN_INFO "TCP: too many of orphaned "
>  				       "sockets\n");
> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> index 808bb92..c35b469 100644
> --- a/net/ipv4/tcp_timer.c
> +++ b/net/ipv4/tcp_timer.c
> @@ -66,18 +66,18 @@ static void tcp_write_err(struct sock *sk)
>  static int tcp_out_of_resources(struct sock *sk, int do_reset)
>  {
>  	struct tcp_sock *tp = tcp_sk(sk);
> -	int orphans = percpu_counter_read_positive(&tcp_orphan_count);
> +	int shift = 0;
>  
>  	/* If peer does not open window for long time, or did not transmit
>  	 * anything for long time, penalize it. */
>  	if ((s32)(tcp_time_stamp - tp->lsndtime) > 2*TCP_RTO_MAX || !do_reset)
> -		orphans <<= 1;
> +		shift++;
>  
>  	/* If some dubious ICMP arrived, penalize even more. */
>  	if (sk->sk_err_soft)
> -		orphans <<= 1;
> +		shift++;
>  
> -	if (tcp_too_many_orphans(sk, orphans)) {
> +	if (tcp_too_many_orphans(sk, shift)) {
>  		if (net_ratelimit())
>  			printk(KERN_INFO "Out of socket memory\n");
>  



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Spurious "TCP: too many of orphaned sockets", unable to allocate sockets
  2010-08-25  8:47     ` Eric Dumazet
@ 2010-08-25  9:28       ` David Miller
  0 siblings, 0 replies; 17+ messages in thread
From: David Miller @ 2010-08-25  9:28 UTC (permalink / raw)
  To: eric.dumazet; +Cc: anton, netdev, miltonm

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 25 Aug 2010 10:47:37 +0200

> tcp_too_many_orphans() might be a bit large to still be inlined ...

We can look into that once this propagates to net-next-2.6

> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Thanks for reviewing.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k
  2010-08-25  7:17 ` [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k Anton Blanchard
  2010-08-25  7:39   ` Eric Dumazet
@ 2010-08-25 17:50   ` Eric Dumazet
  2010-08-25 23:57     ` David Miller
  1 sibling, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2010-08-25 17:50 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: netdev, miltonm

Le mercredi 25 août 2010 à 17:17 +1000, Anton Blanchard a écrit :
> We were hard coding 4096 when sizing sysctl_tcp_max_orphans which
> causes problems when PAGE_SIZE is not 4k. We calculate an order based on
> PAGE_SHIFT so the count should be based on PAGE_SIZE
> 
> Signed-off-By: Milton Miller <miltonm@bga.com>
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
> 
> Index: powerpc.git/net/ipv4/tcp.c
> ===================================================================
> --- powerpc.git.orig/net/ipv4/tcp.c	2010-08-25 17:04:51.190305401 +1000
> +++ powerpc.git/net/ipv4/tcp.c	2010-08-25 17:05:15.463884764 +1000
> @@ -3270,7 +3270,7 @@ void __init tcp_init(void)
>  		;
>  	if (order >= 4) {
>  		tcp_death_row.sysctl_max_tw_buckets = 180000;
> -		sysctl_tcp_max_orphans = 4096 << (order - 4);
> +		sysctl_tcp_max_orphans = PAGE_SIZE << (order - 4);
>  		sysctl_max_syn_backlog = 1024;
>  	} else if (order < 3) {
>  		tcp_death_row.sysctl_max_tw_buckets >>= (3 - order);
> --

In fact, existing code makes litle sense....

(tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket))

is much bigger if spinlock debugging is on. Its strange to select bigger
limits in this case (where kernel structures are also bigger)

bhash_size max is 65536, and we get this value even for small machines. 

Sizing would probably be better if using ehash_size instead of
bhash_size

Maybe remove the 'order' loop and use ehash_size, already a result of
the available memory or thash_entries tunable.

unsigned int ehash_size = tcp_hashinfo.ehash_mask + 1;

tcp_death_row.sysctl_max_tw_buckets = cnt / 2;
sysctl_tcp_max_orphans = cnt / 2;
sysctl_max_syn_backlog = min(128, cnt / 256);




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k
  2010-08-25 17:50   ` Eric Dumazet
@ 2010-08-25 23:57     ` David Miller
  2010-08-26  0:38       ` Anton Blanchard
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2010-08-25 23:57 UTC (permalink / raw)
  To: eric.dumazet; +Cc: anton, netdev, miltonm

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 25 Aug 2010 19:50:34 +0200

> In fact, existing code makes litle sense....
> 
> (tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket))
> 
> is much bigger if spinlock debugging is on. Its strange to select bigger
> limits in this case (where kernel structures are also bigger)
> 
> bhash_size max is 65536, and we get this value even for small machines. 
> 
> Sizing would probably be better if using ehash_size instead of
> bhash_size

Yes, that would avoid debugging unduly influencing this tunable.

Also, I think the "4096" constant is a reference to thing we now
call SK_MEM_QUANTUM.  Back when this sizing code was added in:

commit 1f28b683339f74f9664b77532f4a2f1aad512451
Author: davem <davem>
Date:   Sun Jan 16 05:10:52 2000 +0000

    Merge in TCP/UDP optimizations and
    bug fixing from softnet patches.  Softnet patch
    set decreases size by approx. 300k

of the netdev-vger-cvs tree, we used the '4096' constant explicitly.

> Maybe remove the 'order' loop and use ehash_size, already a result of
> the available memory or thash_entries tunable.
> 
> unsigned int ehash_size = tcp_hashinfo.ehash_mask + 1;
> 
> tcp_death_row.sysctl_max_tw_buckets = cnt / 2;
> sysctl_tcp_max_orphans = cnt / 2;
> sysctl_max_syn_backlog = min(128, cnt / 256);

Yeah, something like the following, Anton?

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 197b9b7..403c029 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3209,7 +3209,7 @@ void __init tcp_init(void)
 {
 	struct sk_buff *skb = NULL;
 	unsigned long nr_pages, limit;
-	int order, i, max_share;
+	int order, i, max_share, cnt;
 	unsigned long jiffy = jiffies;
 
 	BUILD_BUG_ON(sizeof(struct tcp_skb_cb) > sizeof(skb->cb));
@@ -3258,22 +3258,11 @@ void __init tcp_init(void)
 		INIT_HLIST_HEAD(&tcp_hashinfo.bhash[i].chain);
 	}
 
-	/* Try to be a bit smarter and adjust defaults depending
-	 * on available memory.
-	 */
-	for (order = 0; ((1 << order) << PAGE_SHIFT) <
-			(tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket));
-			order++)
-		;
-	if (order >= 4) {
-		tcp_death_row.sysctl_max_tw_buckets = 180000;
-		sysctl_tcp_max_orphans = 4096 << (order - 4);
-		sysctl_max_syn_backlog = 1024;
-	} else if (order < 3) {
-		tcp_death_row.sysctl_max_tw_buckets >>= (3 - order);
-		sysctl_tcp_max_orphans >>= (3 - order);
-		sysctl_max_syn_backlog = 128;
-	}
+	cnt = tcp_hashinfo.ehash_mask + 1;
+
+	tcp_death_row.sysctl_max_tw_buckets = cnt / 2;
+	sysctl_tcp_max_orphans = cnt / 2;
+	sysctl_max_syn_backlog = min(128, cnt / 256);
 
 	/* Set the pressure threshold to be a fraction of global memory that
 	 * is up to 1/2 at 256 MB, decreasing toward zero with the amount of

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k
  2010-08-25 23:57     ` David Miller
@ 2010-08-26  0:38       ` Anton Blanchard
  2010-08-26  3:53         ` David Miller
                           ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Anton Blanchard @ 2010-08-26  0:38 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev, miltonm


Hi Dave,

> Yeah, something like the following, Anton?

Thanks! I tested on a 512GB box. One thing:

# cat /proc/sys/net/ipv4/tcp_max_syn_backlog 
128

We probably want to use max():

> -	sysctl_max_syn_backlog = min(128, cnt / 256);
> +	sysctl_max_syn_backlog = max(128, cnt / 256);

With that change:

# cat /proc/sys/net/ipv4/tcp_max_orphans 
262144
# cat /proc/sys/net/ipv4/tcp_max_tw_buckets
262144
# cat /proc/sys/net/ipv4/tcp_max_syn_backlog 
2048


Tested-by: Anton Blanchard <anton@samba.org>

Anton

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k
  2010-08-26  0:38       ` Anton Blanchard
@ 2010-08-26  3:53         ` David Miller
  2010-08-26  6:36           ` Anton Blanchard
  2010-08-26  4:45         ` Eric Dumazet
  2010-08-26  5:15         ` [PATCH] tcp: fix three tcp sysctls tuning Eric Dumazet
  2 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2010-08-26  3:53 UTC (permalink / raw)
  To: anton; +Cc: eric.dumazet, netdev, miltonm

From: Anton Blanchard <anton@samba.org>
Date: Thu, 26 Aug 2010 10:38:34 +1000

> # cat /proc/sys/net/ipv4/tcp_max_syn_backlog 
> 128
> 
> We probably want to use max():
> 
>> -	sysctl_max_syn_backlog = min(128, cnt / 256);
>> +	sysctl_max_syn_backlog = max(128, cnt / 256);
> 
> With that change:
 ...
> Tested-by: Anton Blanchard <anton@samba.org>

Thanks a lot Anton.

Did you test my percpu orphan fix yet?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k
  2010-08-26  0:38       ` Anton Blanchard
  2010-08-26  3:53         ` David Miller
@ 2010-08-26  4:45         ` Eric Dumazet
  2010-08-26  5:15         ` [PATCH] tcp: fix three tcp sysctls tuning Eric Dumazet
  2 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2010-08-26  4:45 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: David Miller, netdev, miltonm

Le jeudi 26 août 2010 à 10:38 +1000, Anton Blanchard a écrit :
> Hi Dave,
> 
> > Yeah, something like the following, Anton?
> 
> Thanks! I tested on a 512GB box. One thing:
> 
> # cat /proc/sys/net/ipv4/tcp_max_syn_backlog 
> 128
> 
> We probably want to use max():
> 
> > -	sysctl_max_syn_backlog = min(128, cnt / 256);
> > +	sysctl_max_syn_backlog = max(128, cnt / 256);
> 

I believe I make this min()/max() error once per week, sorry ;)

In my mind, I say "I want a minimum value of 128", then I write :

val = min(128, val);

instead of 

val = max(128, val);

Oh well...

> With that change:
> 
> # cat /proc/sys/net/ipv4/tcp_max_orphans 
> 262144
> # cat /proc/sys/net/ipv4/tcp_max_tw_buckets
> 262144
> # cat /proc/sys/net/ipv4/tcp_max_syn_backlog 
> 2048
> 
> 
> Tested-by: Anton Blanchard <anton@samba.org>
> 
> Anton

Seems pretty good to me, thanks !

BTW, we now auto-limit ehash to 512*1024 slots, so you probably have
same results on a 128GB or 4096GB machine ;)




^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] tcp: fix three tcp sysctls tuning
  2010-08-26  0:38       ` Anton Blanchard
  2010-08-26  3:53         ` David Miller
  2010-08-26  4:45         ` Eric Dumazet
@ 2010-08-26  5:15         ` Eric Dumazet
  2010-08-26  6:02           ` David Miller
  2 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2010-08-26  5:15 UTC (permalink / raw)
  To: Anton Blanchard, David Miller; +Cc: netdev, miltonm

As discovered by Anton Blanchard, current code to autotune 
tcp_death_row.sysctl_max_tw_buckets, sysctl_tcp_max_orphans and
sysctl_max_syn_backlog makes litle sense.

The bigger a page is, the less tcp_max_orphans is : 4096 on a 512GB
machine in Anton's case.

(tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket))
is much bigger if spinlock debugging is on. Its wrong to select bigger
limits in this case (where kernel structures are also bigger)

bhash_size max is 65536, and we get this value even for small machines. 

A better ground is to use size of ehash table, this also makes code
shorter and more obvious.

Based on a patch from Anton, and another from David.

Reported-and-tested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/tcp.c |   24 +++++++-----------------
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 197b9b7..e2add5f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3209,7 +3209,7 @@ void __init tcp_init(void)
 {
 	struct sk_buff *skb = NULL;
 	unsigned long nr_pages, limit;
-	int order, i, max_share;
+	int i, max_share, cnt;
 	unsigned long jiffy = jiffies;
 
 	BUILD_BUG_ON(sizeof(struct tcp_skb_cb) > sizeof(skb->cb));
@@ -3258,22 +3258,12 @@ void __init tcp_init(void)
 		INIT_HLIST_HEAD(&tcp_hashinfo.bhash[i].chain);
 	}
 
-	/* Try to be a bit smarter and adjust defaults depending
-	 * on available memory.
-	 */
-	for (order = 0; ((1 << order) << PAGE_SHIFT) <
-			(tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket));
-			order++)
-		;
-	if (order >= 4) {
-		tcp_death_row.sysctl_max_tw_buckets = 180000;
-		sysctl_tcp_max_orphans = 4096 << (order - 4);
-		sysctl_max_syn_backlog = 1024;
-	} else if (order < 3) {
-		tcp_death_row.sysctl_max_tw_buckets >>= (3 - order);
-		sysctl_tcp_max_orphans >>= (3 - order);
-		sysctl_max_syn_backlog = 128;
-	}
+
+	cnt = tcp_hashinfo.ehash_mask + 1;
+
+	tcp_death_row.sysctl_max_tw_buckets = cnt / 2;
+	sysctl_tcp_max_orphans = cnt / 2;
+	sysctl_max_syn_backlog = max(128, cnt / 256);
 
 	/* Set the pressure threshold to be a fraction of global memory that
 	 * is up to 1/2 at 256 MB, decreasing toward zero with the amount of



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] tcp: fix three tcp sysctls tuning
  2010-08-26  5:15         ` [PATCH] tcp: fix three tcp sysctls tuning Eric Dumazet
@ 2010-08-26  6:02           ` David Miller
  2010-08-26  6:21             ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2010-08-26  6:02 UTC (permalink / raw)
  To: eric.dumazet; +Cc: anton, netdev, miltonm

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 26 Aug 2010 07:15:53 +0200

> As discovered by Anton Blanchard, current code to autotune 
> tcp_death_row.sysctl_max_tw_buckets, sysctl_tcp_max_orphans and
> sysctl_max_syn_backlog makes litle sense.
> 
> The bigger a page is, the less tcp_max_orphans is : 4096 on a 512GB
> machine in Anton's case.
> 
> (tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket))
> is much bigger if spinlock debugging is on. Its wrong to select bigger
> limits in this case (where kernel structures are also bigger)
> 
> bhash_size max is 65536, and we get this value even for small machines. 
> 
> A better ground is to use size of ehash table, this also makes code
> shorter and more obvious.
> 
> Based on a patch from Anton, and another from David.
> 
> Reported-and-tested-by: Anton Blanchard <anton@samba.org>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

I had something in my tree already, but since I didn't push it out
it's just as easy for me to use this patch instead :-)

Thanks everyone!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] tcp: fix three tcp sysctls tuning
  2010-08-26  6:02           ` David Miller
@ 2010-08-26  6:21             ` Eric Dumazet
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2010-08-26  6:21 UTC (permalink / raw)
  To: David Miller; +Cc: anton, netdev, miltonm

Le mercredi 25 août 2010 à 23:02 -0700, David Miller a écrit :

> I had something in my tree already, but since I didn't push it out
> it's just as easy for me to use this patch instead :-)
> 
> Thanks everyone!

You're welcome, thanks David.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k
  2010-08-26  3:53         ` David Miller
@ 2010-08-26  6:36           ` Anton Blanchard
  0 siblings, 0 replies; 17+ messages in thread
From: Anton Blanchard @ 2010-08-26  6:36 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev, miltonm

 
Hi Dave,

> Did you test my percpu orphan fix yet?

The machine is running some tests, but I will either get time on it or
replicate the issue on a similar box.

Anton

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2010-08-26  6:39 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-08-25  7:16 Spurious "TCP: too many of orphaned sockets", unable to allocate sockets Anton Blanchard
2010-08-25  7:17 ` [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k Anton Blanchard
2010-08-25  7:39   ` Eric Dumazet
2010-08-25  7:59     ` David Miller
2010-08-25 17:50   ` Eric Dumazet
2010-08-25 23:57     ` David Miller
2010-08-26  0:38       ` Anton Blanchard
2010-08-26  3:53         ` David Miller
2010-08-26  6:36           ` Anton Blanchard
2010-08-26  4:45         ` Eric Dumazet
2010-08-26  5:15         ` [PATCH] tcp: fix three tcp sysctls tuning Eric Dumazet
2010-08-26  6:02           ` David Miller
2010-08-26  6:21             ` Eric Dumazet
2010-08-25  7:59 ` Spurious "TCP: too many of orphaned sockets", unable to allocate sockets David Miller
2010-08-25  8:20   ` David Miller
2010-08-25  8:47     ` Eric Dumazet
2010-08-25  9:28       ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).