* SOMAXCONN too low
@ 2003-10-29 6:58 David Mosberger
2003-10-29 12:33 ` Andi Kleen
2003-10-29 18:58 ` David S. Miller
0 siblings, 2 replies; 10+ messages in thread
From: David Mosberger @ 2003-10-29 6:58 UTC
To: netdev
I was a bit surprised to find that listen() still caps the backlog at
the hard limit of SOMAXCONN (which is still 128). This is ridiculously
low for high-performance servers. Today's servers can easily handle in
excess of 10,000 TCP connections/second, so a queue length of 128
corresponds to only about 13 ms of arrivals (128 / 10,000 per second =
12.8 ms); it doesn't take much bad scheduling etc. to overflow the queue.
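For illustration, a minimal sketch of the problem (make_listener is a
hypothetical helper; error checking omitted): the listen() call succeeds,
but the kernel quietly clamps the requested backlog, so the application
never learns it got only 128 slots:

	#include <sys/socket.h>
	#include <netinet/in.h>
	#include <string.h>

	int make_listener(unsigned short port)
	{
		struct sockaddr_in addr;
		int fd = socket(AF_INET, SOCK_STREAM, 0);

		memset(&addr, 0, sizeof(addr));
		addr.sin_family = AF_INET;
		addr.sin_addr.s_addr = htonl(INADDR_ANY);
		addr.sin_port = htons(port);
		bind(fd, (struct sockaddr *) &addr, sizeof(addr));

		/* Ask for a 10000-entry accept queue; sys_listen()
		 * silently reduces this to SOMAXCONN (128). */
		listen(fd, 10000);
		return fd;
	}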
You obviously want some control over how big the listen queue can
grow, but it seems to me that a sysctl would be appropriate here. I
found this patch to do just that, but saw no reaction to it:
http://www.ussg.iu.edu/hypermail/linux/kernel/0205.0/1287.html
I also found this message:
http://marc.theaimsgroup.com/?l=linux-net&m=98745977620384&w=2
but the argument makes little sense, because TUX bypasses sys_listen()
altogether and can therefore set the listen queue length to anything
it wants. In fact, tux2 defaults to a listen-queue size of 2048, so
if anything this is an argument _for_ increasing the max listen-queue
size.
Also, it appears that current SuSE kernels do indeed have a
net.core.somaxconn sysctl to let a sysadmin choose a larger SOMAXCONN
value.
--david
* Re: SOMAXCONN too low
2003-10-29 6:58 SOMAXCONN too low David Mosberger
@ 2003-10-29 12:33 ` Andi Kleen
2003-10-29 17:13 ` David Mosberger
2003-10-29 18:58 ` David S. Miller
1 sibling, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2003-10-29 12:33 UTC
To: davidm; +Cc: davidm, netdev
On Tue, 28 Oct 2003 22:58:00 -0800
David Mosberger <davidm@napali.hpl.hp.com> wrote:
> Also, it appears that current SuSE kernels do indeed have a
> net.core.somaxconn sysctl to let a sysadmin choose a larger SOMAXCONN
> value.
Yes, I did that patch some time ago for a server that needed it. If there is
interest I can submit it for 2.6, but I'm not sure it fits the "important bug fixes only" rule.
An alternative would be to derive the limit per socket from the listen() argument
(e.g. min(tcp_max_syn_backlog, max(128, 10% * listenarg))), so the application
can easily change it.
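In C, that heuristic might look like the following sketch
(effective_backlog is a hypothetical helper; max_syn_backlog stands in
for the tcp_max_syn_backlog sysctl value):

	/* Hypothetical sketch of the heuristic above: derive the
	 * accept-queue limit from the backlog passed to listen(),
	 * bounded below by the old default and above by the SYN
	 * backlog sysctl. */
	static int effective_backlog(int listen_arg, int max_syn_backlog)
	{
		int want = listen_arg / 10;	/* 10% of the listen() argument */

		if (want < 128)
			want = 128;
		if (want > max_syn_backlog)
			want = max_syn_backlog;
		return want;
	}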
-Andi
* Re: SOMAXCONN too low
2003-10-29 12:33 ` Andi Kleen
@ 2003-10-29 17:13 ` David Mosberger
2003-10-29 17:22 ` David S. Miller
0 siblings, 1 reply; 10+ messages in thread
From: David Mosberger @ 2003-10-29 17:13 UTC
To: Andi Kleen; +Cc: davidm, netdev
>>>>> On Wed, 29 Oct 2003 13:33:15 +0100, Andi Kleen <ak@suse.de> said:
Andi> On Tue, 28 Oct 2003 22:58:00 -0800
Andi> David Mosberger <davidm@napali.hpl.hp.com> wrote:
>> Also, it appears that current SuSE kernels do indeed have a
>> net.core.somaxconn sysctl to let a sysadmin choose a larger SOMAXCONN
>> value.
Andi> Yes, I did that patch some time ago for a server that needed
Andi> it. If there is interest I can submit it for 2.6, but I'm not
Andi> sure it fits the "important bug fixes only" rule.
Yes, I think the patch should be (re-)submitted. It certainly fixes a
performance bug. At the moment, in-kernel servers have an unfair
advantage over user-space servers for this reason.
In my opinion, SOMAXCONN should also be bumped, but that's less
critical if there is a sysctl to override the default value (plus I
only have a hand-wavy argument as to what SOMAXCONN should be...).
Another argument _for_ including the patch in 2.6.0 is that it avoids
introducing a user-visible change in the middle of 2.6 (which, among
other things, simplifies documentation). Plus, the patch is trivial.
Andi> An alternative would be to derive the limit per socket from
Andi> the listen() argument
Andi> (e.g. min(tcp_max_syn_backlog, max(128, 10% * listenarg))),
Andi> so the application can easily change it.
I don't understand what purpose this would serve. It seems to me it
would only make life more complicated for apps that know what they're
doing.
--david
* Re: SOMAXCONN too low
2003-10-29 17:13 ` David Mosberger
@ 2003-10-29 17:22 ` David S. Miller
2003-10-29 18:08 ` David Mosberger
0 siblings, 1 reply; 10+ messages in thread
From: David S. Miller @ 2003-10-29 17:22 UTC
To: davidm; +Cc: davidm, ak, netdev
On Wed, 29 Oct 2003 09:13:44 -0800
David Mosberger <davidm@napali.hpl.hp.com> wrote:
> >>>>> On Wed, 29 Oct 2003 13:33:15 +0100, Andi Kleen <ak@suse.de> said:
>
> Andi> An alternative would be to derive the limit per socket from
> Andi> the listen() argument
> Andi> (e.g. min(tcp_max_syn_backlog, max(128, 10% * listenarg))),
> Andi> so the application can easily change it.
>
> I don't understand what purpose this would serve. It seems to me it
> would only make life more complicated for apps that know what they're
> doing.
Andi is saying that the max backlog should be a function of
the queue length the user asks for in the listen()
system call.
Also note that we'll need to tweak the TCP listening socket SYNQ hash
table size if we modify these kinds of things.
> At the moment, in-kernel servers are at an unfair advantage over
> user-space servers for this reason.
I totally disagree. The only reason things like TuX perform better
than their userland counterparts and don't run into SOMAXCONN issues
is that they are threaded properly. This is where all of the "jitter"
stuff you keep talking about really comes from.
When I've asked in the past to see code for userland servers that have
a problem wrt. SOMAXCONN, it's always the case that the userland
server only lets one of its threads take new connections via
accept(). It's no wonder they have problems....
This is why I'm usually big-time against changing this value: if we
change it, we may be less likely to discover userland server stupidity
like that mentioned above.
* Re: SOMAXCONN too low
2003-10-29 17:22 ` David S. Miller
@ 2003-10-29 18:08 ` David Mosberger
2003-10-29 18:43 ` David S. Miller
0 siblings, 1 reply; 10+ messages in thread
From: David Mosberger @ 2003-10-29 18:08 UTC
To: David S. Miller; +Cc: davidm, ak, netdev
>>>>> On Wed, 29 Oct 2003 09:22:20 -0800, "David S. Miller" <davem@redhat.com> said:
DaveM> On Wed, 29 Oct 2003 09:13:44 -0800 David Mosberger
DaveM> <davidm@napali.hpl.hp.com> wrote:
>> >>>>> On Wed, 29 Oct 2003 13:33:15 +0100, Andi Kleen <ak@suse.de>
>> said:
Andi> An alternative would be to derive the limit per socket from
Andi> the listen() argument
Andi> (e.g. min(tcp_max_syn_backlog, max(128, 10% * listenarg))),
Andi> so the application can easily change it.
>> I don't understand what purpose this would serve. It seems to
>> me it would only make life more complicated for apps that know
>> what they're doing.
DaveM> Andi is saying that the max backlog should be a function of
DaveM> the queue length the user asks for in the listen() system
DaveM> call.
Sure, but I just don't see the point of doing that.
DaveM> Also note that we'll need to tweak the TCP listening socket
DaveM> SYNQ hash table size if we modify these kinds of things.
Perhaps, but is this really a first-order effect? Since it won't be
visible at user level, perhaps that could be done a bit later? (In the
interest of minimizing the 2.6.0 patch, I mean.)
>> At the moment, in-kernel servers are at an unfair advantage over
>> user-space servers for this reason.
DaveM> I totally disagree. The only reason things like TuX perform
DaveM> better than their userland counterparts and don't run into
DaveM> SOMAXCONN issues is that they are threaded properly. This is
DaveM> where all of the "jitter" stuff you keep talking about really
DaveM> comes from.
We noticed this problem with a server that uses one thread per CPU
(pinned). Why don't you run tux with the "backlog" parameter set to
128 and see what happens under heavy load?
--david
* Re: SOMAXCONN too low
2003-10-29 18:08 ` David Mosberger
@ 2003-10-29 18:43 ` David S. Miller
0 siblings, 0 replies; 10+ messages in thread
From: David S. Miller @ 2003-10-29 18:43 UTC
To: davidm; +Cc: davidm, ak, netdev
On Wed, 29 Oct 2003 10:08:25 -0800
David Mosberger <davidm@napali.hpl.hp.com> wrote:
> We noticed this problem with a server that uses one thread per CPU
> (pinned). Why don't you run tux with the "backlog" parameter set to
> 128 and see what happens under heavy load?
Then TuX could be improved too; what can I say? If the thread taking
in new connections does anything more involved than:

	while (1) {
		fd = accept(listen_fd, NULL, NULL);
		/* hand the new connection to a service thread... */
		thr = pick_service_thread();
		spin_lock(new_conn_queue[thr]);
		append(fd, new_conn_queue[thr]);
		spin_unlock(new_conn_queue[thr]);
		/* ...and wake it up */
		wake(thr);
	}

it's broken. I severely doubt that anyone can show that, when using
the above scheme, their multi-GHz CPU cannot handle whatever
connection load you put on the system.
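For completeness, the consuming side of that scheme might look like the
following sketch, in the same pseudocode style (pop() and
handle_connection() are hypothetical helpers):

	/* Each service thread drains its own private queue, so only
	 * the dispatcher above ever blocks in accept(); the real
	 * per-connection work happens outside the lock. */
	void service_thread(int thr)
	{
		int fd;

		for (;;) {
			wait_for_wakeup(thr);
			for (;;) {
				spin_lock(new_conn_queue[thr]);
				fd = pop(new_conn_queue[thr]);	/* -1 if empty */
				spin_unlock(new_conn_queue[thr]);
				if (fd < 0)
					break;
				handle_connection(fd);
			}
		}
	}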
The fact that people have written web servers that outperform
TuX and handle the load better is something else to think about.
They exist within the SOMAXCONN limits.
* Re: SOMAXCONN too low
2003-10-29 6:58 SOMAXCONN too low David Mosberger
2003-10-29 12:33 ` Andi Kleen
@ 2003-10-29 18:58 ` David S. Miller
2003-10-29 19:15 ` David Mosberger
2003-10-29 19:47 ` Andi Kleen
1 sibling, 2 replies; 10+ messages in thread
From: David S. Miller @ 2003-10-29 18:58 UTC
To: davidm; +Cc: davidm, netdev
On Tue, 28 Oct 2003 22:58:00 -0800
David Mosberger <davidm@napali.hpl.hp.com> wrote:
> You obviously want some control over how big the listen queue can
> grow, but it seems to me that a sysctl would be appropriate here. I
> found this patch to do just that, but saw no reaction to it:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0205.0/1287.html
I think I'm going to apply this patch.
People can then set the limit to what they want, the default
stays at 128, and the SOMAXCONN define itself does not change.
Ok David?
* Re: SOMAXCONN too low
2003-10-29 18:58 ` David S. Miller
@ 2003-10-29 19:15 ` David Mosberger
2003-10-29 19:47 ` Andi Kleen
1 sibling, 0 replies; 10+ messages in thread
From: David Mosberger @ 2003-10-29 19:15 UTC
To: David S. Miller; +Cc: davidm, netdev
>>>>> On Wed, 29 Oct 2003 10:58:09 -0800, "David S. Miller" <davem@redhat.com> said:
DaveM> On Tue, 28 Oct 2003 22:58:00 -0800 David Mosberger
DaveM> <davidm@napali.hpl.hp.com> wrote:
>> You obviously want some control over how big the listen queue can
>> grow, but it seems to me that a sysctl would be appropriate here.
>> I found this patch to do just that, but saw no reaction to it:
>> http://www.ussg.iu.edu/hypermail/linux/kernel/0205.0/1287.html
DaveM> I think I'm going to apply this patch.
DaveM> People can then set the limit to what they want, the default
DaveM> stays at 128, and the SOMAXCONN define itself does not
DaveM> change.
DaveM> Ok David?
Yup.
Thanks!
--david
* Re: SOMAXCONN too low
2003-10-29 18:58 ` David S. Miller
2003-10-29 19:15 ` David Mosberger
@ 2003-10-29 19:47 ` Andi Kleen
2003-10-29 19:45 ` David S. Miller
1 sibling, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2003-10-29 19:47 UTC
To: David S. Miller; +Cc: davidm, davidm, netdev
On Wed, 29 Oct 2003 10:58:09 -0800
"David S. Miller" <davem@redhat.com> wrote:
> On Tue, 28 Oct 2003 22:58:00 -0800
> David Mosberger <davidm@napali.hpl.hp.com> wrote:
>
> > You obviously want some control over how big the listen queue can
> > grow, but it seems to me that a sysctl would be appropriate here.
> > I found this patch to do just that, but saw no reaction to it:
> >
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0205.0/1287.html
>
> I think I'm going to apply this patch.
>
> People can then set the limit to what they want, the default
> stays at 128, and the SOMAXCONN define itself does not change.
>
> Ok David?
Can I respectfully ask that the sysctl be named net/core/somaxconn?
That is the name used in the SuSE/UL kernels; it has been shipping for
some time and already has a large user base, and there is no reason to
break compatibility for those users.
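With that name, an application can also discover the current cap at
run time and size its backlog to match; a sketch (assuming the sysctl
is exported as /proc/sys/net/core/somaxconn):

	/* Sketch: read the current limit from /proc so the application
	 * can pass a matching backlog to listen(). Falls back to the
	 * historical default of 128 if the file is absent. */
	#include <stdio.h>

	static int read_somaxconn(void)
	{
		FILE *f = fopen("/proc/sys/net/core/somaxconn", "r");
		int val = 128;

		if (f) {
			if (fscanf(f, "%d", &val) != 1)
				val = 128;
			fclose(f);
		}
		return val;
	}

An application would then call listen(fd, read_somaxconn()) to ask for
exactly the largest queue the kernel will grant.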
Thanks,
-Andi
Thread overview: 10+ messages
2003-10-29 6:58 SOMAXCONN too low David Mosberger
2003-10-29 12:33 ` Andi Kleen
2003-10-29 17:13 ` David Mosberger
2003-10-29 17:22 ` David S. Miller
2003-10-29 18:08 ` David Mosberger
2003-10-29 18:43 ` David S. Miller
2003-10-29 18:58 ` David S. Miller
2003-10-29 19:15 ` David Mosberger
2003-10-29 19:47 ` Andi Kleen
2003-10-29 19:45 ` David S. Miller