* [PATCH 1/2] socket: increase default maximum listen queue length
From: Hagen Paul Pfeifer @ 2011-03-25 18:31 UTC
To: netdev; +Cc: Eric Dumazet

sysctl_somaxconn specifies the maximum number of sockets in state
SYN_RECV per listen socket and is initialized with 128 (SOMAXCONN).

sysctl_max_syn_backlog, on the other hand, provides similar
functionality: a system-wide upper limit of request sockets per listen
socket. But sysctl_max_syn_backlog provides a more accurate value by
considering the actual memory situation of the system: 256 by default,
128 for systems with low memory and up to 1024 for larger systems.

This patch increases sysctl_somaxconn to 256. This gives environments
with an increased RTT and many connections per second a better default
value, while smaller systems will not suffer from increased memory
usage: sysctl_max_syn_backlog is already a good guard.

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/socket.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index edbb1d0..bf35ce2 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -237,7 +237,7 @@ struct ucred {
 #define PF_MAX		AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
-#define SOMAXCONN	128
+#define SOMAXCONN	256
 
 /* Flags we can use with send/ and recv. 
    Added those for 1003.1g not all are supported yet
--
1.7.2.3
* [PATCH 2/2] socket: add minimum listen queue length sysctl
From: Hagen Paul Pfeifer @ 2011-03-25 18:31 UTC
To: netdev; +Cc: Eric Dumazet

If a server programmer misjudges the network characteristics, the
backlog parameter passed to listen(2) may not be adequate to utilize the
host's capabilities and may lead to unnecessary SYN retransmissions; an
underestimated backlog value can thus form an artificial limitation.

A listen queue length of 8 is often way too small, but several server
authors do not know about this limitation (from Eric's server setup):

ss -a | head
State      Recv-Q Send-Q      Local Address:Port          Peer Address:Port
LISTEN     0      8                       *:imaps                    *:*
LISTEN     0      8                       *:pop3s                    *:*
LISTEN     0      50                      *:mysql                    *:*
LISTEN     0      8                       *:pop3                     *:*
LISTEN     0      8                       *:imap2                    *:*
LISTEN     0      511                     *:www                      *:*

Until now it was not possible for the system (network) administrator to
increase this value. A bug report must be filed, the backlog increased
and a new version released or, even worse, if the software is closed
source, nothing can be done at all.

sysctl_min_syn_backlog provides the ability to increase the minimum
queue length.
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/request_sock.h |    1 +
 net/core/request_sock.c    |    6 +++++-
 net/ipv4/af_inet.c         |    2 +-
 net/ipv4/sysctl_net_ipv4.c |    7 +++++++
 4 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 99e6e19..3e8865f 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -89,6 +89,7 @@ static inline void reqsk_free(struct request_sock *req)
 }
 
 extern int sysctl_max_syn_backlog;
+extern int sysctl_min_syn_backlog;
 
 /** struct listen_sock - listen state
  *
diff --git a/net/core/request_sock.c b/net/core/request_sock.c
index 182236b..0e968b6 100644
--- a/net/core/request_sock.c
+++ b/net/core/request_sock.c
@@ -35,6 +35,9 @@
 int sysctl_max_syn_backlog = 256;
 EXPORT_SYMBOL(sysctl_max_syn_backlog);
 
+int sysctl_min_syn_backlog = 0;
+EXPORT_SYMBOL(sysctl_min_syn_backlog);
+
 int reqsk_queue_alloc(struct request_sock_queue *queue,
 		      unsigned int nr_table_entries)
 {
@@ -42,7 +45,8 @@ int reqsk_queue_alloc(struct request_sock_queue *queue,
 	struct listen_sock *lopt;
 
 	nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
-	nr_table_entries = max_t(u32, nr_table_entries, 8);
+	nr_table_entries = max_t(u32, nr_table_entries,
+				 max_t(u32, 8, sysctl_min_syn_backlog));
 	nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
 	lopt_size += nr_table_entries * sizeof(struct request_sock *);
 	if (lopt_size > PAGE_SIZE)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 807d83c..c580d7c 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -213,7 +213,7 @@ int inet_listen(struct socket *sock, int backlog)
 		if (err)
 			goto out;
 	}
-	sk->sk_max_ack_backlog = backlog;
+	sk->sk_max_ack_backlog = max_t(u32, backlog, sysctl_min_syn_backlog);
 	err = 0;
 
 out:
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 1a45665..cc03c62 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -298,6 +298,13 @@ static struct ctl_table ipv4_table[] = {
 		.proc_handler	= proc_dointvec
 	},
 	{
+		.procname	= "tcp_min_syn_backlog",
+		.data		= &sysctl_min_syn_backlog,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
+	{
 		.procname	= "ip_local_port_range",
 		.data		= &sysctl_local_ports.range,
 		.maxlen		= sizeof(sysctl_local_ports.range),
--
1.7.2.3
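The effect of this version of the patch can be traced with a small user-space model. This is a sketch under stated assumptions, not kernel code: the helper names are hypothetical, `min_syn_backlog` stands in for the proposed tcp_min_syn_backlog sysctl, and `backlog` is assumed to be the value that reaches inet_listen() after listen(2) has already capped it at somaxconn.

```c
#include <stdint.h>

/* Hypothetical user-space model of the patch above; not kernel code.
 * min_syn_backlog stands in for the proposed tcp_min_syn_backlog sysctl. */

static uint32_t umax(uint32_t a, uint32_t b) { return a > b ? a : b; }
static uint32_t umin(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Smallest power of two >= n, for 1 <= n <= 2^31
 * (a stand-in for the kernel's roundup_pow_of_two()). */
static uint32_t pow2_roundup(uint32_t n)
{
    uint32_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* inet_listen() with the patch: the accept-queue limit is raised to the floor. */
uint32_t patched_max_ack_backlog(uint32_t backlog, uint32_t min_syn_backlog)
{
    return umax(backlog, min_syn_backlog);
}

/* reqsk_queue_alloc() with the patch: cap at max_syn_backlog first,
 * then apply the floor max(8, min_syn_backlog), then round (n + 1) up. */
uint32_t patched_syn_table_slots(uint32_t backlog, uint32_t max_syn_backlog,
                                 uint32_t min_syn_backlog)
{
    uint32_t n = umin(backlog, max_syn_backlog);
    n = umax(n, umax(8, min_syn_backlog));
    return pow2_roundup(n + 1);
}
```

With the sysctl at its default of 0 nothing changes: a listen(fd, 8) socket keeps an accept-queue limit of 8 and a 16-slot table. Setting it to 256 turns the same call into a limit of 256 and a 512-slot table. Because the floor is applied after both caps, the model also shows that a large tcp_min_syn_backlog can push the accept-queue limit past somaxconn and the table past tcp_max_syn_backlog.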
* Re: [PATCH 2/2] socket: add minimum listen queue length sysctl
From: Rick Jones @ 2011-03-25 20:24 UTC
To: Hagen Paul Pfeifer; +Cc: netdev, Eric Dumazet

On Fri, 2011-03-25 at 19:31 +0100, Hagen Paul Pfeifer wrote:
> If a server programmer misjudges the network characteristics, the
> backlog parameter passed to listen(2) may not be adequate to utilize the
> host's capabilities and may lead to unnecessary SYN retransmissions; an
> underestimated backlog value can thus form an artificial limitation.
> 
> A listen queue length of 8 is often way too small, but several server
> authors do not know about this limitation (from Eric's server setup):
> 
> ss -a | head
> State      Recv-Q Send-Q      Local Address:Port          Peer Address:Port
> LISTEN     0      8                       *:imaps                    *:*
> LISTEN     0      8                       *:pop3s                    *:*
> LISTEN     0      50                      *:mysql                    *:*
> LISTEN     0      8                       *:pop3                     *:*
> LISTEN     0      8                       *:imap2                    *:*
> LISTEN     0      511                     *:www                      *:*
> 
> Until now it was not possible for the system (network) administrator to
> increase this value. A bug report must be filed, the backlog increased
> and a new version released or, even worse, if the software is closed
> source, nothing can be done at all.

Well, one could LD_PRELOAD something that intercepts listen() calls, no?

> sysctl_min_syn_backlog provides the ability to increase the minimum
> queue length.

Is there already a similar minimum the admin can configure for when an
application makes an explicit setsockopt() call against SO_SNDBUF or
SO_RCVBUF with a "too small" value?

rick jones
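The LD_PRELOAD approach suggested above can be sketched as a small interposing library. This is a minimal sketch, not a tested tool: the environment variable MIN_LISTEN_BACKLOG and the helper `clamp_backlog()` are hypothetical names chosen for illustration, and the shim relies on glibc's `dlsym(RTLD_NEXT, ...)` to reach the real listen().

```c
#define _GNU_SOURCE /* for RTLD_NEXT */
#include <dlfcn.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/socket.h>

/* Hypothetical environment variable holding the minimum backlog. */
#define MIN_BACKLOG_ENV "MIN_LISTEN_BACKLOG"

/* Raise a requested backlog to at least min_backlog; larger values pass
 * through. Negative requests (e.g. ~0) already mean "as large as allowed"
 * to the kernel, so they are left untouched. */
int clamp_backlog(int requested, int min_backlog)
{
    if (requested < 0)
        return requested;
    return requested < min_backlog ? min_backlog : requested;
}

/* Interposed listen(): when this object is loaded with LD_PRELOAD, the
 * dynamic linker binds the application's listen() calls here first. */
int listen(int fd, int backlog)
{
    static int (*real_listen)(int, int);

    /* Look up the next definition of listen() (normally libc's). */
    if (!real_listen)
        real_listen = (int (*)(int, int))dlsym(RTLD_NEXT, "listen");
    if (!real_listen) {
        errno = ENOSYS;
        return -1;
    }

    const char *env = getenv(MIN_BACKLOG_ENV);
    if (env)
        backlog = clamp_backlog(backlog, atoi(env));

    return real_listen(fd, backlog);
}
```

Built as a shared object (for example `gcc -shared -fPIC -o minbacklog.so minbacklog.c -ldl`), it would be used as `MIN_LISTEN_BACKLOG=256 LD_PRELOAD=./minbacklog.so ./server`. Two limits apply: the kernel still caps the result at net.core.somaxconn, and statically linked programs never go through the dynamic linker, so they are not affected.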
* Re: [PATCH 2/2] socket: add minimum listen queue length sysctl
From: Hagen Paul Pfeifer @ 2011-03-25 23:51 UTC
To: Rick Jones; +Cc: netdev, Eric Dumazet

* Rick Jones | 2011-03-25 13:24:37 [-0700]:

Hello Rick

>Well, one could LD_PRELOAD something that intercepts listen() calls, no?

Partly: for dynamically linked programs yes, for statically linked ones no.

Furthermore, for programs shipped by a distribution, an administrator
would not want to alter the init script or the like. Editing
/etc/sysctl.conf is as simple as ...

>Is there already a similar minimum the admin can configure for when an
>application makes an explicit setsockopt() call against SO_SNDBUF or
>SO_RCVBUF with a "too small" value?

net.ipv4.tcp_rmem, net.ipv4.tcp_mem, net.core.rmem_default, ...?

IMHO, _if_ a programmer modifies the send or receive buffer he _knows_
exactly why. If he does not modify the buffer that is fine too, because
_we_ tune the buffers as well as we can - and we are good at this.

But the backlog is different. Often the programmer does _not_ know how
to tune this variable. And often the backlog depends on the target
system, the network characteristics and the like.

Therefore we provide the system administrator with the _ability_ to tune
the actual backlog.

Hagen
* Re: [PATCH 2/2] socket: add minimum listen queue length sysctl
From: Rick Jones @ 2011-03-26 0:21 UTC
To: Hagen Paul Pfeifer; +Cc: netdev, Eric Dumazet

On Sat, 2011-03-26 at 00:51 +0100, Hagen Paul Pfeifer wrote:
> * Rick Jones | 2011-03-25 13:24:37 [-0700]:
> 
> Hello Rick
> 
> >Well, one could LD_PRELOAD something that intercepts listen() calls, no?
> 
> Partly: for dynamically linked programs yes, for statically linked ones no.
> 
> Furthermore, for programs shipped by a distribution, an administrator
> would not want to alter the init script or the like. Editing
> /etc/sysctl.conf is as simple as ...
> 
> >Is there already a similar minimum the admin can configure for when an
> >application makes an explicit setsockopt() call against SO_SNDBUF or
> >SO_RCVBUF with a "too small" value?
> 
> net.ipv4.tcp_rmem, net.ipv4.tcp_mem, net.core.rmem_default, ...?

I believe (based on my netperf experience) tcp_rmem and tcp_wmem aren't
consulted when one makes an explicit setsockopt() call against the
SO_*BUF sizes.
And net.core.[rw]mem_default are used by UDP sockets:

raj@tardy:~/netperf2_trunk$ uname -a
Linux tardy 2.6.35-28-generic #49-Ubuntu SMP Tue Mar 1 14:39:03 UTC 2011 x86_64 GNU/Linux
raj@tardy:~/netperf2_trunk$ sysctl net.ipv4.tcp_rmem
net.ipv4.tcp_rmem = 4096	87380	4194304
raj@tardy:~/netperf2_trunk$ sysctl net.ipv4.tcp_wmem
net.ipv4.tcp_wmem = 4096	16384	4194304
raj@tardy:~/netperf2_trunk$ sysctl net.core.wmem_default
net.core.wmem_default = 126976
raj@tardy:~/netperf2_trunk$ sysctl net.core.rmem_default
net.core.rmem_default = 126976

(lss == local socket send; rsr == remote socket receive)

src/netperf -t omni -- -k lss_size,lss_size_end,rsr_size,rsr_size_end
OMNI TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost.localdomain (127.0.0.1) port 0 AF_INET : demo
LSS_SIZE=16384
LSS_SIZE_END=2679048
RSR_SIZE=87380
RSR_SIZE_END=4194304
raj@tardy:~/netperf2_trunk$ src/netperf -t omni -- -k lss_size,lss_size_end,rsr_size,rsr_size_end -T udp -m 1024
OMNI TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost.localdomain (127.0.0.1) port 0 AF_INET : demo
LSS_SIZE=126976
LSS_SIZE_END=126976
RSR_SIZE=126976
RSR_SIZE_END=126976

I believe that net.core.[rw]mem_max are the upper limits (modulo the 2X?)
applied when making explicit setsockopt() calls:

raj@tardy:~/netperf2_trunk$ sysctl net.core.rmem_max
net.core.rmem_max = 131071
raj@tardy:~/netperf2_trunk$ sysctl net.core.wmem_max
net.core.wmem_max = 131071
raj@tardy:~/netperf2_trunk$ src/netperf -t omni -- -k lss_size,lss_size_end,rsr_size,rsr_size_end -T udp -m 1024 -s 1M -S 1M
OMNI TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost.localdomain (127.0.0.1) port 0 AF_INET : demo
LSS_SIZE=262142
LSS_SIZE_END=262142
RSR_SIZE=262142
RSR_SIZE_END=262142
raj@tardy:~/netperf2_trunk$ src/netperf -t omni -- -k lss_size,lss_size_end,rsr_size,rsr_size_end -s 1M -S 1M
OMNI TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost.localdomain (127.0.0.1) port 0 AF_INET : demo
LSS_SIZE=262142
LSS_SIZE_END=262142
RSR_SIZE=262142
RSR_SIZE_END=262142

When, though, one asks for single-byte socket buffers:

raj@tardy:~/netperf2_trunk$ src/netperf -t omni -- -k lss_size,lss_size_end,rsr_size,rsr_size_end -s 1 -S 1
OMNI TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost.localdomain (127.0.0.1) port 0 AF_INET : demo
LSS_SIZE=2048
LSS_SIZE_END=2048
RSR_SIZE=256
RSR_SIZE_END=256

one gets values that at face value don't seem to be related to sysctl
settings.
Although perhaps the receive socket size comes from the min mss:

raj@tardy:~/netperf2_trunk$ sysctl -a | grep 256
error: permission denied on key 'kernel.cad_pid'
error: permission denied on key 'fs.binfmt_misc.register'
vm.lowmem_reserve_ratio = 256	256	32
fs.mqueue.queues_max = 256
error: permission denied on key 'net.ipv4.route.flush'
net.ipv4.route.min_adv_mss = 256
error: permission denied on key 'net.ipv6.route.flush'
raj@tardy:~/netperf2_trunk$ sysctl -a | grep 2048
error: permission denied on key 'kernel.cad_pid'
error: permission denied on key 'fs.binfmt_misc.register'
error: permission denied on key 'net.ipv4.route.flush'
net.core.optmem_max = 20480
net.ipv4.route.redirect_silence = 2048
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv6.xfrm6_gc_thresh = 2048
error: permission denied on key 'net.ipv6.route.flush'

> IMHO, _if_ a programmer modifies the send or receive buffer he _knows_
> exactly why.

I admire your optimism - particularly in the face of all the 10GbE NIC
vendors' suggestions that everyone use 16 MB socket buffers (or at least
set the auto-tuning limits to 16 MB).

> If he does not modify the buffer that is fine too, because
> _we_ tune the buffers as well as we can - and we are good at this.

The "bloat" folks might disagree :)

> But the backlog is different. Often the programmer does _not_ know how
> to tune this variable. And often the backlog depends on the target
> system, the network characteristics and the like.

As do the settings for socket buffer sizes. So, how is it that the
programmer is educated and intelligent enough to set a minimum socket
buffer size but not a minimum listen queue backlog?

> Therefore we provide the system administrator with the _ability_ to tune
> the actual backlog.

And, perhaps, do something that flies in the face of what the programmer
was trying to do: limiting how many connections could be queued, and so
changing the behaviour for the N+1st connection attempt while the service
was backlogged.
It really is a rather existential "Who's right? The Programmer or the
Administrator?" question. And perhaps it is also behind my asking
whether there should be a (possibly foolish) consistency.

rick jones
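The netperf runs above suggest that an explicit SO_RCVBUF/SO_SNDBUF request bypasses tcp_rmem/tcp_wmem, is capped by net.core.rmem_max/wmem_max and is then reported doubled (131071 becomes 262142). A quick way to check this on a given machine is to set the option and read it back. The sketch below assumes Linux semantics; the helper name `effective_rcvbuf()` is hypothetical, and the exact values depend on the kernel version and sysctl settings.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Ask for a receive buffer of `requested` bytes on a fresh UDP socket and
 * return what getsockopt() reports back, or -1 on error. On Linux the
 * reported value is the kernel's accounting size: typically twice the
 * request after it has been capped at net.core.rmem_max, and never below
 * a kernel-imposed minimum. */
int effective_rcvbuf(int requested)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    int got = -1;
    socklen_t len = sizeof got;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof requested) < 0 ||
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len) < 0)
        got = -1;

    close(fd);
    return got;
}
```

On a current Linux host, requesting 4096 typically reads back 8192, while requesting 1 reads back a floor rather than 1. Kernels of that era hard-coded such floors (SOCK_MIN_SNDBUF and SOCK_MIN_RCVBUF in include/net/sock.h), which may be what the 2048- and 256-byte values above reflect; treat that attribution as an assumption rather than something the runs prove.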
* Re: [PATCH 2/2] socket: add minimum listen queue length sysctl
From: Eric Dumazet @ 2011-03-26 7:06 UTC
To: Hagen Paul Pfeifer; +Cc: Rick Jones, netdev

On Saturday 26 March 2011 at 00:51 +0100, Hagen Paul Pfeifer wrote:

> IMHO, _if_ a programmer modifies the send or receive buffer he _knows_
> exactly why. If he does not modify the buffer that is fine too, because
> _we_ tune the buffers as well as we can - and we are good at this.
> 
> But the backlog is different. Often the programmer does _not_ know how
> to tune this variable. And often the backlog depends on the target
> system, the network characteristics and the like.
> 
> Therefore we provide the system administrator with the _ability_ to tune
> the actual backlog.

What you want to tune is not the backlog (the maximum number of
connections ready to be delivered to accept()), but the number of
SYN_RECV half-open connections still waiting for the second packet from
clients.

An application might really want to use listen(fd, 1) to accept one
incoming connection, but still be able to survive a SYN flood.

By the way, you still seem confused: tcp_max_syn_backlog has nothing to
do with the 'backlog'. As I already mentioned, it is a parameter that
caps the size of the hash table associated with a listener socket.

You can have a hash table with 1024 slots and still have a backlog of
16384, for example.
* Re: [PATCH 1/2] socket: increase default maximum listen queue length
From: David Miller @ 2011-03-31 5:52 UTC
To: hagen; +Cc: netdev, eric.dumazet

From: Hagen Paul Pfeifer <hagen@jauu.net>
Date: Fri, 25 Mar 2011 19:31:38 +0100

> sysctl_max_syn_backlog, on the other hand, provides similar
> functionality: a system-wide upper limit of request sockets per listen
> socket. But sysctl_max_syn_backlog provides a more accurate value by
> considering the actual memory situation of the system: 256 by default,
> 128 for systems with low memory and up to 1024 for larger systems.

sysctl_max_syn_backlog is not "256 by default"; the calculation is:

	cnt = tcp_hashinfo.ehash_mask + 1;
	...
	sysctl_max_syn_backlog = max(128, cnt / 256);

And I think we should leave the SOMAXCONN define alone (it's a historic
relic for userspace, and any sane server passes something like ~0 into
listen()).

Instead, calculate the somaxconn sysctl at run time using a formula
similar to the one used for sysctl_max_syn_backlog.
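The quoted formula makes the default depend on the number of buckets in TCP's established-connection hash table, which the kernel sizes from available memory at boot, so there is no fixed "256 by default". A minimal sketch of that arithmetic, with a hypothetical function name:

```c
#include <stdint.h>

/* Boot-time default of sysctl_max_syn_backlog as quoted above:
 * max(128, cnt / 256), where cnt = ehash_mask + 1 is the number of
 * buckets in TCP's established-connection hash table. */
uint32_t default_max_syn_backlog(uint32_t ehash_mask)
{
    uint32_t cnt = ehash_mask + 1;
    uint32_t scaled = cnt / 256;
    return scaled > 128 ? scaled : 128;
}
```

With 16384 buckets the 128 floor applies, 65536 buckets give 256, and the tcp_max_syn_backlog = 2048 in Rick Jones's sysctl output would correspond to 524288 buckets. A run-time somaxconn default could follow the same pattern, but the reply does not spell out that formula, so none is shown here.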
* Re: [PATCH] socket: increase default maximum listen queue length
From: Hagen Paul Pfeifer @ 2011-03-20 12:14 UTC
To: Eric Dumazet; +Cc: netdev

* Eric Dumazet | 2011-03-20 12:55:44 [+0100]:

>I am not sure you understood what I said.
>
>Even if you change kernel limits, many applications still use low
>limits: listen(fd, 8)

Right, but there is a discrepancy between system administrators and
server authors: the latter group will probably notice that
listen(fd, 8) is not adequate (e.g. when someone sends a bug report).
System administrators, on the other hand, have no obvious indicator that
something is going wrong in the system. Most of them would not even
notice that the backlog is overflowing.

>I remember some other OS (was it HPUX or Solaris...) had a minimum
>limit: even if the application said 8, an admin could impose a value of
>256, for example.

Not the worst idea! It is nice that a server author can adjust that
value. But between you and me: the system administrator may have more
information about the network behavior (how many incoming
connections/minute, RTT, memory characteristics, ...). The system
administrator should be able to increase the value; currently he is
stuck if the server author missed that. E.g.

http://www.dovecot.org/list/dovecot-cvs/2009-September/014567.html

I will spin a patch for that.

Hagen
* [PATCH 1/2] socket: increase default maximum listen queue length
From: Hagen Paul Pfeifer @ 2011-03-20 23:04 UTC
To: netdev; +Cc: Hagen Paul Pfeifer, Eric Dumazet

sysctl_somaxconn (SOMAXCONN: 128) specifies the maximum number of
sockets in state SYN_RECV per listen socket queue. At listen(2) time the
backlog is capped to this limit if it is bigger. Afterwards, in
reqsk_queue_alloc(), the backlog value is checked again
(nr_table_entries == backlog):

	nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
	nr_table_entries = max_t(u32, nr_table_entries, 8);
	nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);

sysctl_max_syn_backlog, on the other hand, is dynamically adjusted
depending on the memory characteristics of the system. The default is
256, 128 for small systems and up to 1024 for bigger systems.

For real server work the de facto sysctl_somaxconn limit seems
inadequate: experiments with real servers show that it is absolutely not
enough even at 100 conn/sec. 256 cures most of the problems.

Increase the default sysctl_somaxconn from 128 to 256 to meet today's
conditions, while nr_table_entries remains limited by
sysctl_max_syn_backlog, which is based on the memory situation
(max(128, (tcp_hashinfo.ehash_mask + 1) / 256)).

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/socket.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index edbb1d0..bf35ce2 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -237,7 +237,7 @@ struct ucred {
 #define PF_MAX		AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
-#define SOMAXCONN	128
+#define SOMAXCONN	256
 
 /* Flags we can use with send/ and recv. 
    Added those for 1003.1g not all are supported yet
--
1.7.4.1.57.g0466.dirty
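The two clamps described above can be modelled in user space. This is a sketch, not kernel code: `capped_backlog()` mirrors the somaxconn check done at listen(2) time, where the unsigned comparison is our reading of net/socket.c (treat it as an assumption; it also explains why a server passing ~0 ends up with somaxconn), and `syn_table_slots()` follows the three reqsk_queue_alloc() lines quoted above. Both helper names and the power-of-two helper are hypothetical.

```c
#include <stdint.h>

/* Smallest power of two >= n, for 1 <= n <= 2^31
 * (a user-space stand-in for the kernel's roundup_pow_of_two()). */
static uint32_t roundup_pow2(uint32_t n)
{
    uint32_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* listen(2): cap the requested backlog at somaxconn. The comparison is
 * unsigned, so a negative request such as ~0 (-1) is treated as
 * "as large as allowed" and becomes somaxconn. */
uint32_t capped_backlog(int requested, uint32_t somaxconn)
{
    return (uint32_t)requested > somaxconn ? somaxconn : (uint32_t)requested;
}

/* reqsk_queue_alloc(): size of the SYN_RECV hash table for a backlog,
 * following the three lines quoted above. */
uint32_t syn_table_slots(uint32_t backlog, uint32_t max_syn_backlog)
{
    uint32_t n = backlog < max_syn_backlog ? backlog : max_syn_backlog;
    if (n < 8)
        n = 8;
    return roundup_pow2(n + 1);
}
```

Under these assumptions, with tcp_max_syn_backlog at 256, raising somaxconn from 128 to 256 changes a listen(fd, 511) socket from an accept-queue limit of 128 and a 256-slot table to 256 and 512 slots, while a listen(fd, 8) socket stays at 8 and 16 slots. The model also makes visible that the accept-queue limit and the hash-table size are two separate numbers.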
* [PATCH 2/2] socket: add minimum listen queue length sysctl
From: Hagen Paul Pfeifer @ 2011-03-20 23:04 UTC
To: netdev; +Cc: Hagen Paul Pfeifer

If a server programmer misjudges the network characteristics, the
backlog parameter passed to listen(2) may not be adequate to utilize the
host's capabilities and may lead to unnecessary SYN retransmissions; a
small backlog value can form an artificial limitation. A listen queue
length of 8 is often way too small (from Eric's server setup):

ss -a | head
State      Recv-Q Send-Q      Local Address:Port          Peer Address:Port
LISTEN     0      8                       *:imaps                    *:*
LISTEN     0      8                       *:pop3s                    *:*
LISTEN     0      50                      *:mysql                    *:*
LISTEN     0      8                       *:pop3                     *:*
LISTEN     0      8                       *:imap2                    *:*
LISTEN     0      511                     *:www                      *:*

Until now it has not been possible for the system (network) administrator
to increase this value. A bug report must be filed, the backlog increased
and a new version released or, even worse, if the software is closed
source, nothing can be done at all.

sysctl_min_syn_backlog provides the ability to increase the minimum
queue length. The default is 8.

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>

---
I will spin a second documentation patch if Davem accepts this patch.
---
 include/net/request_sock.h |    1 +
 net/core/request_sock.c    |    5 ++++-
 net/ipv4/sysctl_net_ipv4.c |    7 +++++++
 3 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 99e6e19..3e8865f 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -89,6 +89,7 @@ static inline void reqsk_free(struct request_sock *req)
 }
 
 extern int sysctl_max_syn_backlog;
+extern int sysctl_min_syn_backlog;
 
 /** struct listen_sock - listen state
  *
diff --git a/net/core/request_sock.c b/net/core/request_sock.c
index 182236b..e937e9c 100644
--- a/net/core/request_sock.c
+++ b/net/core/request_sock.c
@@ -35,6 +35,9 @@
 int sysctl_max_syn_backlog = 256;
 EXPORT_SYMBOL(sysctl_max_syn_backlog);
 
+int sysctl_min_syn_backlog = 8;
+EXPORT_SYMBOL(sysctl_min_syn_backlog);
+
 int reqsk_queue_alloc(struct request_sock_queue *queue,
 		      unsigned int nr_table_entries)
 {
@@ -42,7 +45,7 @@ int reqsk_queue_alloc(struct request_sock_queue *queue,
 	struct listen_sock *lopt;
 
 	nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
-	nr_table_entries = max_t(u32, nr_table_entries, 8);
+	nr_table_entries = max_t(u32, nr_table_entries, sysctl_min_syn_backlog);
 	nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
 	lopt_size += nr_table_entries * sizeof(struct request_sock *);
 	if (lopt_size > PAGE_SIZE)
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 1a45665..cc03c62 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -298,6 +298,13 @@ static struct ctl_table ipv4_table[] = {
 		.proc_handler	= proc_dointvec
 	},
 	{
+		.procname	= "tcp_min_syn_backlog",
+		.data		= &sysctl_min_syn_backlog,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
+	{
 		.procname	= "ip_local_port_range",
 		.data		= &sysctl_local_ports.range,
 		.maxlen		= sizeof(sysctl_local_ports.range),
--
1.7.4.1.57.g0466.dirty
* Re: [PATCH 2/2] socket: add minimum listen queue length sysctl
From: Eric Dumazet @ 2011-03-21 7:36 UTC
To: Hagen Paul Pfeifer; +Cc: netdev

On Monday 21 March 2011 at 00:04 +0100, Hagen Paul Pfeifer wrote:
> If a server programmer misjudges the network characteristics, the
> backlog parameter passed to listen(2) may not be adequate to utilize the
> host's capabilities and may lead to unnecessary SYN retransmissions; a
> small backlog value can form an artificial limitation. A listen queue
> length of 8 is often way too small (from Eric's server setup):
> 
> ss -a | head
> State      Recv-Q Send-Q      Local Address:Port          Peer Address:Port
> LISTEN     0      8                       *:imaps                    *:*
> LISTEN     0      8                       *:pop3s                    *:*
> LISTEN     0      50                      *:mysql                    *:*
> LISTEN     0      8                       *:pop3                     *:*
> LISTEN     0      8                       *:imap2                    *:*
> LISTEN     0      511                     *:www                      *:*
> 
> Until now it has not been possible for the system (network) administrator
> to increase this value. A bug report must be filed, the backlog increased
> and a new version released or, even worse, if the software is closed
> source, nothing can be done at all.
> 
> sysctl_min_syn_backlog provides the ability to increase the minimum
> queue length. The default is 8.
> 
> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
> 
> ---
> I will spin a second documentation patch if Davem accepts this patch.
> ---
>  include/net/request_sock.h |    1 +
>  net/core/request_sock.c    |    5 ++++-
>  net/ipv4/sysctl_net_ipv4.c |    7 +++++++
>  3 files changed, 12 insertions(+), 1 deletions(-)
> 
> diff --git a/include/net/request_sock.h b/include/net/request_sock.h
> index 99e6e19..3e8865f 100644
> --- a/include/net/request_sock.h
> +++ b/include/net/request_sock.h
> @@ -89,6 +89,7 @@ static inline void reqsk_free(struct request_sock *req)
>  }
>  
>  extern int sysctl_max_syn_backlog;
> +extern int sysctl_min_syn_backlog;
>  
>  /** struct listen_sock - listen state
>   *
> diff --git a/net/core/request_sock.c b/net/core/request_sock.c
> index 182236b..e937e9c 100644
> --- a/net/core/request_sock.c
> +++ b/net/core/request_sock.c
> @@ -35,6 +35,9 @@
>  int sysctl_max_syn_backlog = 256;
>  EXPORT_SYMBOL(sysctl_max_syn_backlog);
>  
> +int sysctl_min_syn_backlog = 8;
> +EXPORT_SYMBOL(sysctl_min_syn_backlog);
> +
>  int reqsk_queue_alloc(struct request_sock_queue *queue,
>  		      unsigned int nr_table_entries)
>  {
> @@ -42,7 +45,7 @@ int reqsk_queue_alloc(struct request_sock_queue *queue,
>  	struct listen_sock *lopt;
>  
>  	nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
> -	nr_table_entries = max_t(u32, nr_table_entries, 8);
> +	nr_table_entries = max_t(u32, nr_table_entries, sysctl_min_syn_backlog);
>  	nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
>  	lopt_size += nr_table_entries * sizeof(struct request_sock *);
>  	if (lopt_size > PAGE_SIZE)

I believe you are mistaken.

The code you change is the code sizing the hash table, not
sk->sk_max_ack_backlog.

This only matters if an application is able to change its listen
backlog during its lifetime.

Say it begins with:

listen(fd, 1);

Then, a bit later:

listen(fd, 8192);

This certainly is very unlikely...

With the current kernel, this does change the maximum number of SYN_RECV
sockets in flight, but the hash table is not resized and stays with 8
slots, so performance might be suboptimal, since the chains are going to
hold 1024 elements.
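The scenario above, calling listen() again on a socket that is already listening, is legal on Linux and can be tried from user space. The sketch below only checks that both calls succeed; the table size itself is not visible from user space. The helper names, the loopback address and the ephemeral port are illustrative choices, not part of the thread. Taking the quoted reqsk_queue_alloc() lines literally, listen(fd, 1) yields roundup_pow_of_two(8 + 1) = 16 slots, so 8192 request sockets would average 512 per chain (1024 with the 8 slots mentioned in the review); either way lookups turn into long list walks. Note also that the second backlog is still capped by net.core.somaxconn.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP listener on 127.0.0.1 (kernel-chosen port) with a small
 * backlog, then call listen() again with a larger one.
 * Returns 0 if both listen() calls succeed, -1 otherwise. */
int relisten_demo(int first_backlog, int second_backlog)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a port */

    int rc = -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        listen(fd, first_backlog) == 0 &&
        /* Already listening: this only updates the accept-queue limit;
         * per the review above, the SYN_RECV table is not resized. */
        listen(fd, second_backlog) == 0)
        rc = 0;

    close(fd);
    return rc;
}

/* Average chain length if `in_flight` request sockets share `slots` buckets. */
unsigned avg_chain_len(unsigned in_flight, unsigned slots)
{
    return in_flight / slots;
}
```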