From: "George B." <georgeb@gmail.com>
To: Daniel Baluta <daniel.baluta@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
Gaspar Chilingarov <gasparch@gmail.com>,
netdev <netdev@vger.kernel.org>
Subject: Re: 'tcp: bind() fix when many ports are bound' problem
Date: Wed, 27 Apr 2011 10:36:45 -0700
Message-ID: <BANLkTik5-G2E+SKX2anv2t5+gJzd2WEp3Q@mail.gmail.com>
In-Reply-To: <AANLkTimuNM7vmeR89WKXtrUbJO1Wt1ivp9y=bQvtWg0j@mail.gmail.com>
On Wed, Jan 5, 2011 at 1:00 AM, Daniel Baluta <daniel.baluta@gmail.com> wrote:
>
> On Tue, Jan 4, 2011 at 1:22 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Tue, 04 Jan 2011 at 13:12 +0400, Gaspar Chilingarov wrote:
> >> Hi there!
> >>
> >> Well, that looks strange.
> >>
> >> On my own side I've just put workaround (manually binding to all ports
> >> in sequence :)
> >> and moved production code to FreeBSD as it has better scalable network stack.
> >>
> >> I can see the potential problem with that bind() problem on highly
> >> loaded DNS servers/resolvers which establish tons of outgoing UDP
> >> connections.
> >>
> >> In some cases those connections could fail, and since not receiving an
> >> answer is a normal condition for DNS, this will go totally unnoticed.
> >>
> >> I don't think anyone will hit this bug in production environment
> >> except the very high load applications.
> >
> > Dont mix TCP and UDP, they are not the same.
> >
> > Problem with TCP is you can have TIME_WAIT sockets, disallowing a port
> > to be reused. Not with UDP.
>
> Isn't SO_REUSEADDR supposed to fix this problem?
>
> Anyhow, inet_csk_get_port, the function used by bind(0), performs badly.
>
> Short reminder:
> 100 IP addr - 70K sockets created => 700 sockets per IP.
> bind(0) for all sockets will result in a large number of
> (port, addr) duplicates although there are more than 30000 ports available.
>
> This wouldn't be so bad if you didn't try to connect duplicates to the same
> remote (addr, port), which results in a connect failure.
>
> Eric's patch introduced the restriction 'forbid two reuse enabled sockets
> to bind on same (addr, port) tuple with a (non ANY addr)'.
>
> Well, I think this will break rule #2 from inet_hashtables.h:
> "
> If all sockets have sk->sk_reuse set, and none of them are in
> TCP_LISTEN state, the port may be shared.
> "
> and also caused problem ([1]), don't really know if they are the same.
>
> An attempt to fix this was to "always allow a reuse listen if
> no other listen is already active on the same IP".
>
> The problem with this fix is that at the moment of bind() we
> don't know what the usage of this socket will be. It can be
> bind -> connect or bind -> listen.
> >
> > The connect() [without a previous bind()], or a sendto() [without a
> > previous bind()] problem is more an API problem.
>
> Can you share your thoughts on this?
>
> Going back to my first email, are there any follow ups on your
> "tcp: bind() fix when many ports are bound" patch. I've searched
> netdev archives but no luck. I might have missed something.
>
> I really appreciate your help.
>
> thanks,
> Daniel.
> --
This is also causing a problem for me in a very high load application
where more than 64K sockets are being sourced from multiple IP
addresses.

I, too, would like to know if this has been followed up on. The old
patch that was reverted was actually working well for us. We never
hit the TIME_WAIT problem, but we are hitting the problem of
not being able to source a connection from an IP when the global
number of connections is >64K or so.
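
For reference, the userspace workaround Gaspar mentions above (binding to
explicit ports in sequence instead of relying on bind(0)) could look
roughly like the sketch below. This is only an illustration in Python, not
code from the thread or from any patch: bind_sequential() and
connect_from() are hypothetical helper names, and the list of candidate
ports is an assumption the caller has to supply per source IP.

```python
import errno
import socket


def bind_sequential(sock, source_ip, ports):
    """Bind sock to source_ip on the first candidate port that is free.

    Hypothetical helper: walks the caller-supplied ports in order and
    skips any that fail with EADDRINUSE. Returns the port that was bound.
    """
    for port in ports:
        try:
            sock.bind((source_ip, port))
            return port
        except OSError as exc:
            # Any error other than "address in use" is unexpected; re-raise it.
            if exc.errno != errno.EADDRINUSE:
                raise
    raise OSError(errno.EADDRINUSE, "no free port among the candidates")


def connect_from(source_ip, remote, ports):
    """Open a TCP connection to remote, sourced from source_ip.

    Hypothetical helper: the source port is chosen explicitly by
    bind_sequential() rather than by the kernel via bind(0).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        bind_sequential(sock, source_ip, ports)
        sock.connect(remote)
    except OSError:
        sock.close()
        raise
    return sock
```

Note that SO_REUSEADDR is deliberately left unset here, so each
(addr, port) pair is handed out at most once. That avoids the duplicate
(addr, port) bindings described above, at the cost of an O(n) scan in
userspace when the candidate range is mostly full.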