* tcp_tw_recycle broken?
From: Karl Pickett @ 2008-11-15  4:37 UTC
To: linux-kernel, netdev
Hey.  I'm developing an HTTP proxy on Fedora 9 (2.6.25) and running
into a strange issue.

When the proxy sets up and tears down 6000 TCP connections a second
to the same test server IP and port, it quickly blows up (within 5
seconds) because all 30000 ephemeral ports end up in TIME_WAIT.
Setting tcp_tw_recycle=1 fixed that, and there are never more than a
couple hundred ports in TIME_WAIT.

BUT...

Changing the load test to alternate between two test server IPs, it
blows up again: connect() fails with "Cannot assign requested
address".  (Note that I am not calling bind() beforehand; I tried
binding to port 0 first and it made no difference - it just blows up
during the bind() instead.)

And there are ~28K ports in TIME_WAIT.  For example:

proxy_ip:30000 load_test_1:8080 TIME_WAIT
proxy_ip:30000 load_test_2:8080 TIME_WAIT
...
but most are not duplicates of the same local port.

What. The. Heck.

So, short of rebuilding the kernel with the TIME_WAIT timeout set to
1 second, is there any other way not to brick my proxy?
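
(In case it matters: the pattern that triggers this is just a tight
connect/close loop against one address.  A rough sketch along these
lines reproduces it - the 192.0.2.1:8080 address is a placeholder,
this is not my actual test code:)

    /* Rough sketch: churn connections to one ip:port until connect()
     * starts failing with EADDRNOTAVAIL ("Cannot assign requested
     * address") because every ephemeral port is stuck in TIME_WAIT. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        struct sockaddr_in sa;

        memset(&sa, 0, sizeof(sa));
        sa.sin_family = AF_INET;
        sa.sin_port = htons(8080);
        inet_pton(AF_INET, "192.0.2.1", &sa.sin_addr);  /* placeholder */

        for (;;) {
            int fd = socket(AF_INET, SOCK_STREAM, 0);
            if (fd < 0) {
                perror("socket");
                return 1;
            }
            if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
                perror("connect");  /* EADDRNOTAVAIL once ports run out */
                close(fd);
                return 1;
            }
            close(fd);  /* active close => this side holds the TIME_WAIT */
        }
    }
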
--
Karl Pickett
* Re: tcp_tw_recycle broken?
From: Willy Tarreau @ 2008-11-15  5:57 UTC
To: Karl Pickett; +Cc: linux-kernel, netdev
On Fri, Nov 14, 2008 at 11:37:06PM -0500, Karl Pickett wrote:
> Hey.  I'm developing an HTTP proxy on Fedora 9 (2.6.25) and running
> into a strange issue.
>
> When the proxy sets up and tears down 6000 TCP connections a second
> to the same test server IP and port, it quickly blows up (within 5
> seconds) because all 30000 ephemeral ports end up in TIME_WAIT.
> Setting tcp_tw_recycle=1 fixed that, and there are never more than a
> couple hundred ports in TIME_WAIT.
>
> BUT...
>
> Changing the load test to alternate between two test server IPs, it
> blows up again: connect() fails with "Cannot assign requested
> address".  (Note that I am not calling bind() beforehand; I tried
> binding to port 0 first and it made no difference - it just blows up
> during the bind() instead.)
>
> And there are ~28K ports in TIME_WAIT.  For example:
>
> proxy_ip:30000 load_test_1:8080 TIME_WAIT
> proxy_ip:30000 load_test_2:8080 TIME_WAIT
> ...
> but most are not duplicates of the same local port.
>
> What. The. Heck.
>
> So, short of rebuilding the kernel with the TIME_WAIT timeout set to
> 1 second, is there any other way not to brick my proxy?
Two things:
  - set tcp_tw_reuse to 1 too
  - do a setsockopt(SO_REUSEADDR) before connect()

Using this, my proxy has no problem at 35K sessions/s on 2.6.25.  I'm
not sure whether it still works with either of them disabled.
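
In code, the connect side looks roughly like this (untested sketch,
IPv4 only, error handling trimmed, helper name is just for the
example; tcp_tw_reuse itself is just
"echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse"):

    /* Sketch: SO_REUSEADDR must be set before connect() (or before
     * bind() if you bind explicitly).  Assumes tcp_tw_reuse=1 is
     * already set system-wide. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int connect_out(const char *ip, unsigned short port)
    {
        int fd, one = 1;
        struct sockaddr_in sa;

        fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0)
            goto fail;

        memset(&sa, 0, sizeof(sa));
        sa.sin_family = AF_INET;
        sa.sin_port = htons(port);
        if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
            goto fail;

        if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
            goto fail;

        return fd;
    fail:
        close(fd);
        return -1;
    }
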
Hoping this helps,
Willy
* Re: tcp_tw_recycle broken?
From: Karl Pickett @ 2008-11-15  7:29 UTC
To: linux-kernel, netdev
On Sat, Nov 15, 2008 at 12:57 AM, Willy Tarreau <w@1wt.eu> wrote:
> On Fri, Nov 14, 2008 at 11:37:06PM -0500, Karl Pickett wrote:
>> Hey.  I'm developing an HTTP proxy on Fedora 9 (2.6.25) and running
>> into a strange issue.
>>
>> When the proxy sets up and tears down 6000 TCP connections a second
>> to the same test server IP and port, it quickly blows up (within 5
>> seconds) because all 30000 ephemeral ports end up in TIME_WAIT.
>> Setting tcp_tw_recycle=1 fixed that, and there are never more than a
>> couple hundred ports in TIME_WAIT.
>>
>> BUT...
>>
>> Changing the load test to alternate between two test server IPs, it
>> blows up again: connect() fails with "Cannot assign requested
>> address".  (Note that I am not calling bind() beforehand; I tried
>> binding to port 0 first and it made no difference - it just blows up
>> during the bind() instead.)
>>
>> And there are ~28K ports in TIME_WAIT.  For example:
>>
>> proxy_ip:30000 load_test_1:8080 TIME_WAIT
>> proxy_ip:30000 load_test_2:8080 TIME_WAIT
>> ...
>> but most are not duplicates of the same local port.
>>
>> What. The. Heck.
>>
>> So, short of rebuilding the kernel with the TIME_WAIT timeout set to
>> 1 second, is there any other way not to brick my proxy?
>
> Two things:
>   - set tcp_tw_reuse to 1 too
>   - do a setsockopt(SO_REUSEADDR) before connect()
>
> Using this, my proxy has no problem at 35K sessions/s on 2.6.25.  I'm
> not sure whether it still works with either of them disabled.
>
> Hoping this helps,
> Willy
>
>
Thanks for the help.  Well, it looks like tw_reuse is what I wanted,
not tw_recycle.  Based on a Python test program over loopback,
tw_reuse alone solves the problem; SO_REUSEADDR doesn't make any
difference.  And apparently the TCP code is too much for me: looking
at the source I thought tw_reuse could only kick in when timestamps
are enabled, but even after disabling timestamps tw_reuse still works
over loopback.

I'll have to wait until Monday to try it again in the lab.  I was
trying combinations of tw_reuse and tw_recycle - too many to
remember, apparently.

May I just confirm.. is tcp_tw_reuse NOT dependent on receiving timestamps?
--
Karl Pickett
* Re: tcp_tw_recycle broken?
From: Willy Tarreau @ 2008-11-15  7:45 UTC
To: Karl Pickett; +Cc: linux-kernel, netdev
On Sat, Nov 15, 2008 at 02:25:52AM -0500, Karl Pickett wrote:
> On Sat, Nov 15, 2008 at 12:57 AM, Willy Tarreau <w@1wt.eu> wrote:
>
> > On Fri, Nov 14, 2008 at 11:37:06PM -0500, Karl Pickett wrote:
> > > Hey.  I'm developing an HTTP proxy on Fedora 9 (2.6.25) and running
> > > into a strange issue.
> > >
> > > When the proxy sets up and tears down 6000 TCP connections a second
> > > to the same test server IP and port, it quickly blows up (within 5
> > > seconds) because all 30000 ephemeral ports end up in TIME_WAIT.
> > > Setting tcp_tw_recycle=1 fixed that, and there are never more than a
> > > couple hundred ports in TIME_WAIT.
> > >
> > > BUT...
> > >
> > > Changing the load test to alternate between two test server IPs, it
> > > blows up again: connect() fails with "Cannot assign requested
> > > address".  (Note that I am not calling bind() beforehand; I tried
> > > binding to port 0 first and it made no difference - it just blows up
> > > during the bind() instead.)
> > >
> > > And there are ~28K ports in TIME_WAIT.  For example:
> > >
> > > proxy_ip:30000 load_test_1:8080 TIME_WAIT
> > > proxy_ip:30000 load_test_2:8080 TIME_WAIT
> > > ...
> > > but most are not duplicates of the same local port.
> > >
> > > What. The. Heck.
> > >
> > > So, short of rebuilding the kernel with the TIME_WAIT timeout set to
> > > 1 second, is there any other way not to brick my proxy?
> >
> > Two things:
> >   - set tcp_tw_reuse to 1 too
> >   - do a setsockopt(SO_REUSEADDR) before connect()
> >
> > Using this, my proxy has no problem at 35K sessions/s on 2.6.25.  I'm
> > not sure whether it still works with either of them disabled.
> >
> > Hoping this helps,
> > Willy
> >
> >
> Well, it looks like tw_reuse is what I wanted, not tw_recycle.  Based on a
> Python test program over loopback, tw_reuse alone solves the problem;
> SO_REUSEADDR doesn't make any difference.  And apparently the TCP code is
> too much for me: looking at the source I thought tw_reuse could only kick
> in when timestamps are enabled, but even after disabling timestamps
> tw_reuse still works over loopback.
>
> I'll have to wait until Monday to try it again in the lab.
>
> May I just confirm.. is tcp_tw_reuse NOT dependent on receiving timestamps?
I have never observed any dependency between the two, though the code
tends to make me think there is one.  However, enabling timestamps is
often needed when you're reusing TIME_WAIT sockets - not because of
your local system, but because of possible intermediate systems
between the client and the server, such as firewalls which randomize
sequence numbers.  Not having tw_reuse prevents ports from being
reused too early; but with tw_reuse alone, the client will often
choose a source port for which a session still exists on a middle
box, with different sequence numbers, which causes trouble.  Enabling
timestamps solves that when the other end supports PAWS.

So as a rule of thumb, I would say that if you need tw_reuse, you
should also enable timestamps "just in case".
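
If you want to verify what a given connection actually negotiated,
TCP_INFO reports the options.  A quick sketch (the TCPI_OPT_TIMESTAMPS
value below is the one from linux/tcp.h, and the helper name is just
for the example):

    /* Sketch: returns 1 if the connected socket negotiated TCP
     * timestamps (so PAWS can apply on that path), 0 if it did not,
     * -1 on getsockopt() error. */
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <string.h>
    #include <sys/socket.h>

    #ifndef TCPI_OPT_TIMESTAMPS
    #define TCPI_OPT_TIMESTAMPS 1   /* value from linux/tcp.h */
    #endif

    int conn_has_timestamps(int fd)
    {
        struct tcp_info ti;
        socklen_t len = sizeof(ti);

        memset(&ti, 0, sizeof(ti));
        if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0)
            return -1;
        return (ti.tcpi_options & TCPI_OPT_TIMESTAMPS) ? 1 : 0;
    }
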
Willy
* Re: tcp_tw_recycle broken?
From: Andi Kleen @ 2008-11-15 13:09 UTC
To: Karl Pickett; +Cc: linux-kernel, netdev
"Karl Pickett" <karl.pickett@gmail.com> writes:
>
> May I just confirm.. is tcp_tw_reuse NOT dependent on receiving timestamps?
The big problem is that both are incompatible with NAT, so if you
ever talk to any NATed clients, don't use them.
-Andi
--
ak@linux.intel.com
* Re: tcp_tw_recycle broken?
From: Karl Pickett @ 2008-11-15 15:47 UTC
To: Andi Kleen; +Cc: linux-kernel, netdev
On Sat, Nov 15, 2008 at 8:09 AM, Andi Kleen <andi@firstfloor.org> wrote:
> "Karl Pickett" <karl.pickett@gmail.com> writes:
>>
>> May I just confirm.. is tcp_tw_reuse NOT dependent on receiving timestamps?
>
> The big problem is that both are incompatible with NAT, so if you
> ever talk to any NATed clients, don't use them.
>
> -Andi
>
> --
> ak@linux.intel.com
>
Hmph.  Running the test again - after getting a little sleep -
timestamps do indeed determine whether tw_reuse/tw_recycle work.  I
must not have let all the tw buckets expire before changing my
timestamp settings last night.

Since
A. I don't want to rely on arbitrary web servers having timestamps, and
B. people say it breaks NAT for clients, and the settings are global only,

I will just set TCP_TIMEWAIT_LEN to 10 seconds and call it a day.
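
(For reference, that's just the one define in include/net/tcp.h - the
stock value is 60 seconds.  A sketch of the local hack, not a proper
tunable:)

    /* include/net/tcp.h: shorten TIME_WAIT from the stock 60s to ~10s.
     * Compile-time constant only, there is no sysctl for it. */
    #define TCP_TIMEWAIT_LEN (10*HZ)    /* stock: (60*HZ) */
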
Sure would be nice if it were a tunable, so only the most heavily
loaded customers could set it...
--
Karl Pickett
* Re: tcp_tw_recycle broken?
From: Willy Tarreau @ 2008-11-15 15:52 UTC
To: Karl Pickett; +Cc: Andi Kleen, linux-kernel, netdev
On Sat, Nov 15, 2008 at 10:47:10AM -0500, Karl Pickett wrote:
> On Sat, Nov 15, 2008 at 8:09 AM, Andi Kleen <andi@firstfloor.org> wrote:
> > "Karl Pickett" <karl.pickett@gmail.com> writes:
> >>
> >> May I just confirm.. is tcp_tw_reuse NOT dependent on receiving timestamps?
> >
> > The big problem is that both are incompatible with NAT, so if you
> > ever talk to any NATed clients, don't use them.
> >
> > -Andi
> >
> > --
> > ak@linux.intel.com
> >
>
>
> Hmph.  Running the test again - after getting a little sleep -
> timestamps do indeed determine whether tw_reuse/tw_recycle work.  I
> must not have let all the tw buckets expire before changing my
> timestamp settings last night.
>
> Since
> A. I don't want to rely on arbitrary web servers having timestamps, and
> B. people say it breaks NAT for clients, and the settings are global only,
>
> I will just set TCP_TIMEWAIT_LEN to 10 seconds and call it a day.
You should increase it a bit: I've encountered occasional issues at
15s, but none at 20s.
> Sure would be nice if it were a tunable, so only the most heavily
> loaded customers could set it...
Indeed.  Other OSes (e.g. Solaris) ship with standard values and let
you adjust them.
Willy