* Spurious "TCP: too many of orphaned sockets", unable to allocate sockets
@ 2010-08-25 7:16 Anton Blanchard
2010-08-25 7:17 ` [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k Anton Blanchard
2010-08-25 7:59 ` Spurious "TCP: too many of orphaned sockets", unable to allocate sockets David Miller
0 siblings, 2 replies; 17+ messages in thread
From: Anton Blanchard @ 2010-08-25 7:16 UTC (permalink / raw)
To: netdev; +Cc: miltonm
Hi,
We have a machine running a network test that regularly hits:
TCP: too many of orphaned sockets
Which comes from:
int orphan_count = percpu_counter_read_positive(
sk->sk_prot->orphan_count);
sk_mem_reclaim(sk);
if (tcp_too_many_orphans(sk, orphan_count)) {
if (net_ratelimit())
printk(KERN_INFO "TCP: too many of orphaned "
"sockets\n");
tcp_set_state(sk, TCP_CLOSE);
tcp_send_active_reset(sk, GFP_ATOMIC);
NET_INC_STATS_BH(sock_net(sk),
LINUX_MIB_TCPABORTONMEMORY);
}
Looking closer we have:
# cat /proc/sys/net/ipv4/tcp_max_orphans
4096
# grep processor /proc/cpuinfo | wc -l
128
The problem is we are using percpu_counter_read_positive, so the value can be
out num_online_cpus() * percpu_counter_batch. percpu_counter_batch is going to
be 32, so we might be out by 32 * 128 = 4k. Considering tcp_max_orphans is 4k
that explains the spurious printout and the inability to allocate sockets.
A couple of issues:
1. We size sysctl_tcp_max_orphans based on some second order heuristic
that uses pages which could be anything from 4k to 64k:
/* Try to be a bit smarter and adjust defaults depending
* on available memory.
*/
for (order = 0; ((1 << order) << PAGE_SHIFT) <
(tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket));
order++)
;
if (order >= 4) {
tcp_death_row.sysctl_max_tw_buckets = 180000;
sysctl_tcp_max_orphans = 4096 << (order - 4);
sysctl_max_syn_backlog = 1024;
} else if (order < 3) {
tcp_death_row.sysctl_max_tw_buckets >>= (3 - order);
sysctl_tcp_max_orphans >>= (3 - order);
sysctl_max_syn_backlog = 128;
}
I'll follow up with a patch to fix this for PAGE_SIZE != 4k
2. Even with this fixed we could hit the original issue. We have been known to
test on 1024 thread boxes and we would have the possibility of 32 * 1024
= 32k slack in the percpu counters. On this box tcp_max_orphans will be
64k after the fix which is a bit close for comfort. Should we do anything here?
Anton
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k
2010-08-25 7:16 Spurious "TCP: too many of orphaned sockets", unable to allocate sockets Anton Blanchard
@ 2010-08-25 7:17 ` Anton Blanchard
2010-08-25 7:39 ` Eric Dumazet
2010-08-25 17:50 ` Eric Dumazet
2010-08-25 7:59 ` Spurious "TCP: too many of orphaned sockets", unable to allocate sockets David Miller
1 sibling, 2 replies; 17+ messages in thread
From: Anton Blanchard @ 2010-08-25 7:17 UTC (permalink / raw)
To: netdev; +Cc: miltonm
We were hard coding 4096 when sizing sysctl_tcp_max_orphans which
causes problems when PAGE_SIZE is not 4k. We calculate an order based on
PAGE_SHIFT so the count should be based on PAGE_SIZE
Signed-off-By: Milton Miller <miltonm@bga.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
---
Index: powerpc.git/net/ipv4/tcp.c
===================================================================
--- powerpc.git.orig/net/ipv4/tcp.c 2010-08-25 17:04:51.190305401 +1000
+++ powerpc.git/net/ipv4/tcp.c 2010-08-25 17:05:15.463884764 +1000
@@ -3270,7 +3270,7 @@ void __init tcp_init(void)
;
if (order >= 4) {
tcp_death_row.sysctl_max_tw_buckets = 180000;
- sysctl_tcp_max_orphans = 4096 << (order - 4);
+ sysctl_tcp_max_orphans = PAGE_SIZE << (order - 4);
sysctl_max_syn_backlog = 1024;
} else if (order < 3) {
tcp_death_row.sysctl_max_tw_buckets >>= (3 - order);
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k
2010-08-25 7:17 ` [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k Anton Blanchard
@ 2010-08-25 7:39 ` Eric Dumazet
2010-08-25 7:59 ` David Miller
2010-08-25 17:50 ` Eric Dumazet
1 sibling, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2010-08-25 7:39 UTC (permalink / raw)
To: Anton Blanchard; +Cc: netdev, miltonm
Le mercredi 25 août 2010 à 17:17 +1000, Anton Blanchard a écrit :
> We were hard coding 4096 when sizing sysctl_tcp_max_orphans which
> causes problems when PAGE_SIZE is not 4k. We calculate an order based on
> PAGE_SHIFT so the count should be based on PAGE_SIZE
>
> Signed-off-By: Milton Miller <miltonm@bga.com>
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
>
> Index: powerpc.git/net/ipv4/tcp.c
> ===================================================================
> --- powerpc.git.orig/net/ipv4/tcp.c 2010-08-25 17:04:51.190305401 +1000
> +++ powerpc.git/net/ipv4/tcp.c 2010-08-25 17:05:15.463884764 +1000
> @@ -3270,7 +3270,7 @@ void __init tcp_init(void)
> ;
> if (order >= 4) {
> tcp_death_row.sysctl_max_tw_buckets = 180000;
> - sysctl_tcp_max_orphans = 4096 << (order - 4);
> + sysctl_tcp_max_orphans = PAGE_SIZE << (order - 4);
> sysctl_max_syn_backlog = 1024;
> } else if (order < 3) {
> tcp_death_row.sysctl_max_tw_buckets >>= (3 - order);
> --
Considering your previous mail, I suggest you add
sysctl_tcp_max_orphans = max(sysctl_tcp_max_orphans,
percpu_counter_batch * num_possible_cpus());
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Spurious "TCP: too many of orphaned sockets", unable to allocate sockets
2010-08-25 7:16 Spurious "TCP: too many of orphaned sockets", unable to allocate sockets Anton Blanchard
2010-08-25 7:17 ` [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k Anton Blanchard
@ 2010-08-25 7:59 ` David Miller
2010-08-25 8:20 ` David Miller
1 sibling, 1 reply; 17+ messages in thread
From: David Miller @ 2010-08-25 7:59 UTC (permalink / raw)
To: anton; +Cc: netdev, miltonm
From: Anton Blanchard <anton@samba.org>
Date: Wed, 25 Aug 2010 17:16:26 +1000
> We have a machine running a network test that regularly hits:
>
> TCP: too many of orphaned sockets
>
> Which comes from:
>
> int orphan_count = percpu_counter_read_positive(
> sk->sk_prot->orphan_count);
>
> sk_mem_reclaim(sk);
> if (tcp_too_many_orphans(sk, orphan_count)) {
...
> 2. Even with this fixed we could hit the original issue. We have been known to
> test on 1024 thread boxes and we would have the possibility of 32 * 1024
> = 32k slack in the percpu counters. On this box tcp_max_orphans will be
> 64k after the fix which is a bit close for comfort. Should we do anything here?
Solution seems simple, if the too many orphan check triggers, simply
redo the check using the expensive but more accurate per-cpu counter
read (which avoids the skew) to make sure.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k
2010-08-25 7:39 ` Eric Dumazet
@ 2010-08-25 7:59 ` David Miller
0 siblings, 0 replies; 17+ messages in thread
From: David Miller @ 2010-08-25 7:59 UTC (permalink / raw)
To: eric.dumazet; +Cc: anton, netdev, miltonm
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 25 Aug 2010 09:39:01 +0200
> Considering your previous mail, I suggest you add
>
> sysctl_tcp_max_orphans = max(sysctl_tcp_max_orphans,
> percpu_counter_batch * num_possible_cpus());
That seems unnecessary, see my other reply to Anton on how to fix this
skew problem.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Spurious "TCP: too many of orphaned sockets", unable to allocate sockets
2010-08-25 7:59 ` Spurious "TCP: too many of orphaned sockets", unable to allocate sockets David Miller
@ 2010-08-25 8:20 ` David Miller
2010-08-25 8:47 ` Eric Dumazet
0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2010-08-25 8:20 UTC (permalink / raw)
To: anton; +Cc: netdev, miltonm, eric.dumazet
From: David Miller <davem@davemloft.net>
Date: Wed, 25 Aug 2010 00:59:29 -0700 (PDT)
> Solution seems simple, if the too many orphan check triggers, simply
> redo the check using the expensive but more accurate per-cpu counter
> read (which avoids the skew) to make sure.
Something like this:
tcp: Combat per-cpu skew in orphan tests.
As reported by Anton Blanchard when we use
percpu_counter_read_positive() to make our orphan socket limit checks,
the check can be off by up to num_cpus_online() * batch (which is 32
by default) which on a 128 cpu machine can be as large as the default
orphan limit itself.
Fix this by doing the full expensive sum check if the optimized check
triggers.
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/include/net/tcp.h b/include/net/tcp.h
index df6a2eb..eaa9582 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -268,11 +268,21 @@ static inline int between(__u32 seq1, __u32 seq2, __u32 seq3)
return seq3 - seq2 >= seq1 - seq2;
}
-static inline int tcp_too_many_orphans(struct sock *sk, int num)
+static inline bool tcp_too_many_orphans(struct sock *sk, int shift)
{
- return (num > sysctl_tcp_max_orphans) ||
- (sk->sk_wmem_queued > SOCK_MIN_SNDBUF &&
- atomic_read(&tcp_memory_allocated) > sysctl_tcp_mem[2]);
+ struct percpu_counter *ocp = sk->sk_prot->orphan_count;
+ int orphans = percpu_counter_read_positive(ocp);
+
+ if (orphans << shift > sysctl_tcp_max_orphans) {
+ orphans = percpu_counter_sum_positive(ocp);
+ if (orphans << shift > sysctl_tcp_max_orphans)
+ return true;
+ }
+
+ if (sk->sk_wmem_queued > SOCK_MIN_SNDBUF &&
+ atomic_read(&tcp_memory_allocated) > sysctl_tcp_mem[2])
+ return true;
+ return false;
}
/* syncookies: remember time of last synqueue overflow */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 176e11a..197b9b7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2011,11 +2011,8 @@ adjudge_to_death:
}
}
if (sk->sk_state != TCP_CLOSE) {
- int orphan_count = percpu_counter_read_positive(
- sk->sk_prot->orphan_count);
-
sk_mem_reclaim(sk);
- if (tcp_too_many_orphans(sk, orphan_count)) {
+ if (tcp_too_many_orphans(sk, 0)) {
if (net_ratelimit())
printk(KERN_INFO "TCP: too many of orphaned "
"sockets\n");
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 808bb92..c35b469 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -66,18 +66,18 @@ static void tcp_write_err(struct sock *sk)
static int tcp_out_of_resources(struct sock *sk, int do_reset)
{
struct tcp_sock *tp = tcp_sk(sk);
- int orphans = percpu_counter_read_positive(&tcp_orphan_count);
+ int shift = 0;
/* If peer does not open window for long time, or did not transmit
* anything for long time, penalize it. */
if ((s32)(tcp_time_stamp - tp->lsndtime) > 2*TCP_RTO_MAX || !do_reset)
- orphans <<= 1;
+ shift++;
/* If some dubious ICMP arrived, penalize even more. */
if (sk->sk_err_soft)
- orphans <<= 1;
+ shift++;
- if (tcp_too_many_orphans(sk, orphans)) {
+ if (tcp_too_many_orphans(sk, shift)) {
if (net_ratelimit())
printk(KERN_INFO "Out of socket memory\n");
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: Spurious "TCP: too many of orphaned sockets", unable to allocate sockets
2010-08-25 8:20 ` David Miller
@ 2010-08-25 8:47 ` Eric Dumazet
2010-08-25 9:28 ` David Miller
0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2010-08-25 8:47 UTC (permalink / raw)
To: David Miller; +Cc: anton, netdev, miltonm
Le mercredi 25 août 2010 à 01:20 -0700, David Miller a écrit :
> From: David Miller <davem@davemloft.net>
> Date: Wed, 25 Aug 2010 00:59:29 -0700 (PDT)
>
> > Solution seems simple, if the too many orphan check triggers, simply
> > redo the check using the expensive but more accurate per-cpu counter
> > read (which avoids the skew) to make sure.
>
> Something like this:
>
> tcp: Combat per-cpu skew in orphan tests.
>
> As reported by Anton Blanchard when we use
> percpu_counter_read_positive() to make our orphan socket limit checks,
> the check can be off by up to num_cpus_online() * batch (which is 32
> by default) which on a 128 cpu machine can be as large as the default
> orphan limit itself.
>
> Fix this by doing the full expensive sum check if the optimized check
> triggers.
>
> Reported-by: Anton Blanchard <anton@samba.org>
> Signed-off-by: David S. Miller <davem@davemloft.net>
Very nice !
tcp_too_many_orphans() might be a bit large to still be inlined ...
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index df6a2eb..eaa9582 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -268,11 +268,21 @@ static inline int between(__u32 seq1, __u32 seq2, __u32 seq3)
> return seq3 - seq2 >= seq1 - seq2;
> }
>
> -static inline int tcp_too_many_orphans(struct sock *sk, int num)
> +static inline bool tcp_too_many_orphans(struct sock *sk, int shift)
> {
> - return (num > sysctl_tcp_max_orphans) ||
> - (sk->sk_wmem_queued > SOCK_MIN_SNDBUF &&
> - atomic_read(&tcp_memory_allocated) > sysctl_tcp_mem[2]);
> + struct percpu_counter *ocp = sk->sk_prot->orphan_count;
> + int orphans = percpu_counter_read_positive(ocp);
> +
> + if (orphans << shift > sysctl_tcp_max_orphans) {
> + orphans = percpu_counter_sum_positive(ocp);
> + if (orphans << shift > sysctl_tcp_max_orphans)
> + return true;
> + }
> +
> + if (sk->sk_wmem_queued > SOCK_MIN_SNDBUF &&
> + atomic_read(&tcp_memory_allocated) > sysctl_tcp_mem[2])
> + return true;
> + return false;
> }
>
> /* syncookies: remember time of last synqueue overflow */
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 176e11a..197b9b7 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2011,11 +2011,8 @@ adjudge_to_death:
> }
> }
> if (sk->sk_state != TCP_CLOSE) {
> - int orphan_count = percpu_counter_read_positive(
> - sk->sk_prot->orphan_count);
> -
> sk_mem_reclaim(sk);
> - if (tcp_too_many_orphans(sk, orphan_count)) {
> + if (tcp_too_many_orphans(sk, 0)) {
> if (net_ratelimit())
> printk(KERN_INFO "TCP: too many of orphaned "
> "sockets\n");
> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> index 808bb92..c35b469 100644
> --- a/net/ipv4/tcp_timer.c
> +++ b/net/ipv4/tcp_timer.c
> @@ -66,18 +66,18 @@ static void tcp_write_err(struct sock *sk)
> static int tcp_out_of_resources(struct sock *sk, int do_reset)
> {
> struct tcp_sock *tp = tcp_sk(sk);
> - int orphans = percpu_counter_read_positive(&tcp_orphan_count);
> + int shift = 0;
>
> /* If peer does not open window for long time, or did not transmit
> * anything for long time, penalize it. */
> if ((s32)(tcp_time_stamp - tp->lsndtime) > 2*TCP_RTO_MAX || !do_reset)
> - orphans <<= 1;
> + shift++;
>
> /* If some dubious ICMP arrived, penalize even more. */
> if (sk->sk_err_soft)
> - orphans <<= 1;
> + shift++;
>
> - if (tcp_too_many_orphans(sk, orphans)) {
> + if (tcp_too_many_orphans(sk, shift)) {
> if (net_ratelimit())
> printk(KERN_INFO "Out of socket memory\n");
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Spurious "TCP: too many of orphaned sockets", unable to allocate sockets
2010-08-25 8:47 ` Eric Dumazet
@ 2010-08-25 9:28 ` David Miller
0 siblings, 0 replies; 17+ messages in thread
From: David Miller @ 2010-08-25 9:28 UTC (permalink / raw)
To: eric.dumazet; +Cc: anton, netdev, miltonm
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 25 Aug 2010 10:47:37 +0200
> tcp_too_many_orphans() might be a bit large to still be inlined ...
We can look into that once this propagates to net-next-2.6
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Thanks for reviewing.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k
2010-08-25 7:17 ` [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k Anton Blanchard
2010-08-25 7:39 ` Eric Dumazet
@ 2010-08-25 17:50 ` Eric Dumazet
2010-08-25 23:57 ` David Miller
1 sibling, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2010-08-25 17:50 UTC (permalink / raw)
To: Anton Blanchard; +Cc: netdev, miltonm
Le mercredi 25 août 2010 à 17:17 +1000, Anton Blanchard a écrit :
> We were hard coding 4096 when sizing sysctl_tcp_max_orphans which
> causes problems when PAGE_SIZE is not 4k. We calculate an order based on
> PAGE_SHIFT so the count should be based on PAGE_SIZE
>
> Signed-off-By: Milton Miller <miltonm@bga.com>
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
>
> Index: powerpc.git/net/ipv4/tcp.c
> ===================================================================
> --- powerpc.git.orig/net/ipv4/tcp.c 2010-08-25 17:04:51.190305401 +1000
> +++ powerpc.git/net/ipv4/tcp.c 2010-08-25 17:05:15.463884764 +1000
> @@ -3270,7 +3270,7 @@ void __init tcp_init(void)
> ;
> if (order >= 4) {
> tcp_death_row.sysctl_max_tw_buckets = 180000;
> - sysctl_tcp_max_orphans = 4096 << (order - 4);
> + sysctl_tcp_max_orphans = PAGE_SIZE << (order - 4);
> sysctl_max_syn_backlog = 1024;
> } else if (order < 3) {
> tcp_death_row.sysctl_max_tw_buckets >>= (3 - order);
> --
In fact, existing code makes litle sense....
(tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket))
is much bigger if spinlock debugging is on. Its strange to select bigger
limits in this case (where kernel structures are also bigger)
bhash_size max is 65536, and we get this value even for small machines.
Sizing would probably be better if using ehash_size instead of
bhash_size
Maybe remove the 'order' loop and use ehash_size, already a result of
the available memory or thash_entries tunable.
unsigned int ehash_size = tcp_hashinfo.ehash_mask + 1;
tcp_death_row.sysctl_max_tw_buckets = cnt / 2;
sysctl_tcp_max_orphans = cnt / 2;
sysctl_max_syn_backlog = min(128, cnt / 256);
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k
2010-08-25 17:50 ` Eric Dumazet
@ 2010-08-25 23:57 ` David Miller
2010-08-26 0:38 ` Anton Blanchard
0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2010-08-25 23:57 UTC (permalink / raw)
To: eric.dumazet; +Cc: anton, netdev, miltonm
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 25 Aug 2010 19:50:34 +0200
> In fact, existing code makes litle sense....
>
> (tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket))
>
> is much bigger if spinlock debugging is on. Its strange to select bigger
> limits in this case (where kernel structures are also bigger)
>
> bhash_size max is 65536, and we get this value even for small machines.
>
> Sizing would probably be better if using ehash_size instead of
> bhash_size
Yes, that would avoid debugging unduly influencing this tunable.
Also, I think the "4096" constant is a reference to thing we now
call SK_MEM_QUANTUM. Back when this sizing code was added in:
commit 1f28b683339f74f9664b77532f4a2f1aad512451
Author: davem <davem>
Date: Sun Jan 16 05:10:52 2000 +0000
Merge in TCP/UDP optimizations and
bug fixing from softnet patches. Softnet patch
set decreases size by approx. 300k
of the netdev-vger-cvs tree, we used the '4096' constant explicitly.
> Maybe remove the 'order' loop and use ehash_size, already a result of
> the available memory or thash_entries tunable.
>
> unsigned int ehash_size = tcp_hashinfo.ehash_mask + 1;
>
> tcp_death_row.sysctl_max_tw_buckets = cnt / 2;
> sysctl_tcp_max_orphans = cnt / 2;
> sysctl_max_syn_backlog = min(128, cnt / 256);
Yeah, something like the following, Anton?
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 197b9b7..403c029 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3209,7 +3209,7 @@ void __init tcp_init(void)
{
struct sk_buff *skb = NULL;
unsigned long nr_pages, limit;
- int order, i, max_share;
+ int order, i, max_share, cnt;
unsigned long jiffy = jiffies;
BUILD_BUG_ON(sizeof(struct tcp_skb_cb) > sizeof(skb->cb));
@@ -3258,22 +3258,11 @@ void __init tcp_init(void)
INIT_HLIST_HEAD(&tcp_hashinfo.bhash[i].chain);
}
- /* Try to be a bit smarter and adjust defaults depending
- * on available memory.
- */
- for (order = 0; ((1 << order) << PAGE_SHIFT) <
- (tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket));
- order++)
- ;
- if (order >= 4) {
- tcp_death_row.sysctl_max_tw_buckets = 180000;
- sysctl_tcp_max_orphans = 4096 << (order - 4);
- sysctl_max_syn_backlog = 1024;
- } else if (order < 3) {
- tcp_death_row.sysctl_max_tw_buckets >>= (3 - order);
- sysctl_tcp_max_orphans >>= (3 - order);
- sysctl_max_syn_backlog = 128;
- }
+ cnt = tcp_hashinfo.ehash_mask + 1;
+
+ tcp_death_row.sysctl_max_tw_buckets = cnt / 2;
+ sysctl_tcp_max_orphans = cnt / 2;
+ sysctl_max_syn_backlog = min(128, cnt / 256);
/* Set the pressure threshold to be a fraction of global memory that
* is up to 1/2 at 256 MB, decreasing toward zero with the amount of
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k
2010-08-25 23:57 ` David Miller
@ 2010-08-26 0:38 ` Anton Blanchard
2010-08-26 3:53 ` David Miller
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Anton Blanchard @ 2010-08-26 0:38 UTC (permalink / raw)
To: David Miller; +Cc: eric.dumazet, netdev, miltonm
Hi Dave,
> Yeah, something like the following, Anton?
Thanks! I tested on a 512GB box. One thing:
# cat /proc/sys/net/ipv4/tcp_max_syn_backlog
128
We probably want to use max():
> - sysctl_max_syn_backlog = min(128, cnt / 256);
> + sysctl_max_syn_backlog = max(128, cnt / 256);
With that change:
# cat /proc/sys/net/ipv4/tcp_max_orphans
262144
# cat /proc/sys/net/ipv4/tcp_max_tw_buckets
262144
# cat /proc/sys/net/ipv4/tcp_max_syn_backlog
2048
Tested-by: Anton Blanchard <anton@samba.org>
Anton
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k
2010-08-26 0:38 ` Anton Blanchard
@ 2010-08-26 3:53 ` David Miller
2010-08-26 6:36 ` Anton Blanchard
2010-08-26 4:45 ` Eric Dumazet
2010-08-26 5:15 ` [PATCH] tcp: fix three tcp sysctls tuning Eric Dumazet
2 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2010-08-26 3:53 UTC (permalink / raw)
To: anton; +Cc: eric.dumazet, netdev, miltonm
From: Anton Blanchard <anton@samba.org>
Date: Thu, 26 Aug 2010 10:38:34 +1000
> # cat /proc/sys/net/ipv4/tcp_max_syn_backlog
> 128
>
> We probably want to use max():
>
>> - sysctl_max_syn_backlog = min(128, cnt / 256);
>> + sysctl_max_syn_backlog = max(128, cnt / 256);
>
> With that change:
...
> Tested-by: Anton Blanchard <anton@samba.org>
Thanks a lot Anton.
Did you test my percpu orphan fix yet?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k
2010-08-26 0:38 ` Anton Blanchard
2010-08-26 3:53 ` David Miller
@ 2010-08-26 4:45 ` Eric Dumazet
2010-08-26 5:15 ` [PATCH] tcp: fix three tcp sysctls tuning Eric Dumazet
2 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2010-08-26 4:45 UTC (permalink / raw)
To: Anton Blanchard; +Cc: David Miller, netdev, miltonm
Le jeudi 26 août 2010 à 10:38 +1000, Anton Blanchard a écrit :
> Hi Dave,
>
> > Yeah, something like the following, Anton?
>
> Thanks! I tested on a 512GB box. One thing:
>
> # cat /proc/sys/net/ipv4/tcp_max_syn_backlog
> 128
>
> We probably want to use max():
>
> > - sysctl_max_syn_backlog = min(128, cnt / 256);
> > + sysctl_max_syn_backlog = max(128, cnt / 256);
>
I believe I make this min()/max() error once per week, sorry ;)
In my mind, I say "I want a minimum value of 128", then I write :
val = min(128, val);
instead of
val = max(128, val);
Oh well...
> With that change:
>
> # cat /proc/sys/net/ipv4/tcp_max_orphans
> 262144
> # cat /proc/sys/net/ipv4/tcp_max_tw_buckets
> 262144
> # cat /proc/sys/net/ipv4/tcp_max_syn_backlog
> 2048
>
>
> Tested-by: Anton Blanchard <anton@samba.org>
>
> Anton
Seems pretty good to me, thanks !
BTW, we now auto-limit ehash to 512*1024 slots, so you probably have
same results on a 128GB or 4096GB machine ;)
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] tcp: fix three tcp sysctls tuning
2010-08-26 0:38 ` Anton Blanchard
2010-08-26 3:53 ` David Miller
2010-08-26 4:45 ` Eric Dumazet
@ 2010-08-26 5:15 ` Eric Dumazet
2010-08-26 6:02 ` David Miller
2 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2010-08-26 5:15 UTC (permalink / raw)
To: Anton Blanchard, David Miller; +Cc: netdev, miltonm
As discovered by Anton Blanchard, current code to autotune
tcp_death_row.sysctl_max_tw_buckets, sysctl_tcp_max_orphans and
sysctl_max_syn_backlog makes litle sense.
The bigger a page is, the less tcp_max_orphans is : 4096 on a 512GB
machine in Anton's case.
(tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket))
is much bigger if spinlock debugging is on. Its wrong to select bigger
limits in this case (where kernel structures are also bigger)
bhash_size max is 65536, and we get this value even for small machines.
A better ground is to use size of ehash table, this also makes code
shorter and more obvious.
Based on a patch from Anton, and another from David.
Reported-and-tested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
net/ipv4/tcp.c | 24 +++++++-----------------
1 file changed, 7 insertions(+), 17 deletions(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 197b9b7..e2add5f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3209,7 +3209,7 @@ void __init tcp_init(void)
{
struct sk_buff *skb = NULL;
unsigned long nr_pages, limit;
- int order, i, max_share;
+ int i, max_share, cnt;
unsigned long jiffy = jiffies;
BUILD_BUG_ON(sizeof(struct tcp_skb_cb) > sizeof(skb->cb));
@@ -3258,22 +3258,12 @@ void __init tcp_init(void)
INIT_HLIST_HEAD(&tcp_hashinfo.bhash[i].chain);
}
- /* Try to be a bit smarter and adjust defaults depending
- * on available memory.
- */
- for (order = 0; ((1 << order) << PAGE_SHIFT) <
- (tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket));
- order++)
- ;
- if (order >= 4) {
- tcp_death_row.sysctl_max_tw_buckets = 180000;
- sysctl_tcp_max_orphans = 4096 << (order - 4);
- sysctl_max_syn_backlog = 1024;
- } else if (order < 3) {
- tcp_death_row.sysctl_max_tw_buckets >>= (3 - order);
- sysctl_tcp_max_orphans >>= (3 - order);
- sysctl_max_syn_backlog = 128;
- }
+
+ cnt = tcp_hashinfo.ehash_mask + 1;
+
+ tcp_death_row.sysctl_max_tw_buckets = cnt / 2;
+ sysctl_tcp_max_orphans = cnt / 2;
+ sysctl_max_syn_backlog = max(128, cnt / 256);
/* Set the pressure threshold to be a fraction of global memory that
* is up to 1/2 at 256 MB, decreasing toward zero with the amount of
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] tcp: fix three tcp sysctls tuning
2010-08-26 5:15 ` [PATCH] tcp: fix three tcp sysctls tuning Eric Dumazet
@ 2010-08-26 6:02 ` David Miller
2010-08-26 6:21 ` Eric Dumazet
0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2010-08-26 6:02 UTC (permalink / raw)
To: eric.dumazet; +Cc: anton, netdev, miltonm
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 26 Aug 2010 07:15:53 +0200
> As discovered by Anton Blanchard, current code to autotune
> tcp_death_row.sysctl_max_tw_buckets, sysctl_tcp_max_orphans and
> sysctl_max_syn_backlog makes litle sense.
>
> The bigger a page is, the less tcp_max_orphans is : 4096 on a 512GB
> machine in Anton's case.
>
> (tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket))
> is much bigger if spinlock debugging is on. Its wrong to select bigger
> limits in this case (where kernel structures are also bigger)
>
> bhash_size max is 65536, and we get this value even for small machines.
>
> A better ground is to use size of ehash table, this also makes code
> shorter and more obvious.
>
> Based on a patch from Anton, and another from David.
>
> Reported-and-tested-by: Anton Blanchard <anton@samba.org>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
I had something in my tree already, but since I didn't push it out
it's just as easy for me to use this patch instead :-)
Thanks everyone!
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] tcp: fix three tcp sysctls tuning
2010-08-26 6:02 ` David Miller
@ 2010-08-26 6:21 ` Eric Dumazet
0 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2010-08-26 6:21 UTC (permalink / raw)
To: David Miller; +Cc: anton, netdev, miltonm
Le mercredi 25 août 2010 à 23:02 -0700, David Miller a écrit :
> I had something in my tree already, but since I didn't push it out
> it's just as easy for me to use this patch instead :-)
>
> Thanks everyone!
You're welcome, thanks David.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k
2010-08-26 3:53 ` David Miller
@ 2010-08-26 6:36 ` Anton Blanchard
0 siblings, 0 replies; 17+ messages in thread
From: Anton Blanchard @ 2010-08-26 6:36 UTC (permalink / raw)
To: David Miller; +Cc: eric.dumazet, netdev, miltonm
Hi Dave,
> Did you test my percpu orphan fix yet?
The machine is running some tests, but I will either get time on it or
replicate the issue on a similar box.
Anton
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2010-08-26 6:39 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-08-25 7:16 Spurious "TCP: too many of orphaned sockets", unable to allocate sockets Anton Blanchard
2010-08-25 7:17 ` [PATCH] tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k Anton Blanchard
2010-08-25 7:39 ` Eric Dumazet
2010-08-25 7:59 ` David Miller
2010-08-25 17:50 ` Eric Dumazet
2010-08-25 23:57 ` David Miller
2010-08-26 0:38 ` Anton Blanchard
2010-08-26 3:53 ` David Miller
2010-08-26 6:36 ` Anton Blanchard
2010-08-26 4:45 ` Eric Dumazet
2010-08-26 5:15 ` [PATCH] tcp: fix three tcp sysctls tuning Eric Dumazet
2010-08-26 6:02 ` David Miller
2010-08-26 6:21 ` Eric Dumazet
2010-08-25 7:59 ` Spurious "TCP: too many of orphaned sockets", unable to allocate sockets David Miller
2010-08-25 8:20 ` David Miller
2010-08-25 8:47 ` Eric Dumazet
2010-08-25 9:28 ` David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).