The dreadful CLOSE_WAIT

public inbox for linux-kernel@vger.kernel.org

* The dreadful CLOSE_WAIT
@ 2004-07-27  8:39 DervishD
  2004-07-27  9:28 ` Måns Rullgård
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: DervishD @ 2004-07-27  8:39 UTC (permalink / raw)
  To: Linux-kernel

    Hi all :))

    Seems under Linux that, when a connection is in the CLOSE_WAIT
state, the only way to go to LAST_ACK is the application doing the
'shutdown()' or 'close()'. Doesn't seem to be a timeout for that.

    Well, I think this is dangerous because a bad application (and a
couple of widely used servers have this problem) can exhaust system
network resources (difficult, but possible). For example, a
concurrent FTP server with a race condition that doesn't do the
shutdown when the remote end aborts. Writing such a 'bad app' is very
easy, just do the socket->bind->listen->accept and after accepting
the connection forget the connected socket and keep on listening. If
the remote end aborts, the server leaves the connection in
CLOSE_WAIT. Sometimes it has an associated timer: when data remains in
the tx queue, it seems that the kernel tries to retransmit all that
data, which makes no sense: in CLOSE_WAIT state the other end is not
there... Surely I'm missing a lot :((
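A minimal Python sketch of the pattern described above may help; it is an illustration only, not code from the thread, and the names `handle_peer` and `demo` are hypothetical. A 'bad app' accepts the connection and never looks at the socket again. The handler below does the opposite: it treats an empty `recv()` as the peer's FIN and closes its own side, which is what moves the connection out of CLOSE_WAIT.

```python
import socket
import threading

def handle_peer(conn):
    """Read until the peer sends FIN, then close our side.

    recv() returning b"" means the remote end has closed its half. From that
    point the socket is in CLOSE_WAIT until this process calls close().
    """
    received = b""
    with conn:  # close() runs on exit, so our FIN is sent
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            received += chunk
    return received

def demo():
    # socket -> bind -> listen -> accept, as in the post (loopback only).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    def client():
        with socket.create_connection(("127.0.0.1", port), timeout=5) as c:
            c.sendall(b"hello")
        # Leaving the with-block closes the client socket (sends FIN).

    worker = threading.Thread(target=client)
    worker.start()
    conn, _ = listener.accept()
    conn.settimeout(5)
    data = handle_peer(conn)  # the 'bad app' would skip this call entirely
    worker.join()
    listener.close()
    return data, conn.fileno()  # fileno() is -1 once the socket is closed
```

Dropping the `handle_peer(conn)` call while keeping a reference to `conn` reproduces the leak: the descriptor stays open and the connection stays in CLOSE_WAIT for as long as the process lives.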

    Since I don't know if a timeout (or another solution) exists to
avoid this I won't give names, but it's pretty easy to do a DoS
attack on a very well-known FTP server just using 'wget' and your
favourite C-c keys.

    IMHO, Linux (Unix) is about not allowing a bad app to screw the
system, and the CLOSE_WAIT state allows that. I know: you can screw
the system using as root an application that allocates and locks
large chunks of memory, or other 'legal' bad things, the sysadmin
should not allow the use of crappy software, but would a CLOSE_WAIT
timeout do any harm?

    Thanks a lot for your help :)

    Raúl Núñez de Arenas Coronado

-- 
Linux Registered User 88736
http://www.pleyades.net & http://raul.pleyades.net/


* Re: The dreadful CLOSE_WAIT
  2004-07-27  8:39 The dreadful CLOSE_WAIT DervishD
@ 2004-07-27  9:28 ` Måns Rullgård
  2004-07-27  9:57   ` DervishD
  2004-07-27 16:00 ` William Lee Irwin III
  2004-07-27 16:45 ` Mike Waychison
  2 siblings, 1 reply; 12+ messages in thread
From: Måns Rullgård @ 2004-07-27  9:28 UTC (permalink / raw)
  To: linux-kernel

DervishD <raul@pleyades.net> writes:

>     Hi all :))
>
>     Seems under Linux that, when a connection is in the CLOSE_WAIT
> state, the only way to go to LAST_ACK is the application doing the
> 'shutdown()' or 'close()'. Doesn't seem to be a timeout for that.

Is that why some programs seem to hang forever when my NAT gateway
decides to drop a connection?

-- 
Måns Rullgård
mru@kth.se



* Re: The dreadful CLOSE_WAIT
  2004-07-27  9:28 ` Måns Rullgård
@ 2004-07-27  9:57   ` DervishD
  0 siblings, 0 replies; 12+ messages in thread
From: DervishD @ 2004-07-27  9:57 UTC (permalink / raw)
  To: Måns Rullgård; +Cc: linux-kernel

    Hi Måns :)

 * Måns Rullgård <mru@kth.se> dixit:
> >     Seems under Linux that, when a connection is in the CLOSE_WAIT
> > state, the only way to go to LAST_ACK is the application doing the
> > 'shutdown()' or 'close()'. Doesn't seem to be a timeout for that.
> Is that why some programs seem to hang forever when my NAT gateway
> decides to drop a connection?

    I don't know. Look at the output of your netstat command. If you
have connections in the CLOSE_WAIT state related to the NAT gateway,
it may be the cause :? But anyway the effect is just the opposite. It
is not the CLOSE_WAIT state that hangs a program, but a hung program
(or at least one not doing its duty) that leaves a connection in
CLOSE_WAIT state.
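As a complement to reading netstat output by eye, here is a small Python sketch that counts CLOSE_WAIT sockets from text in the format of Linux's /proc/net/tcp. It is an illustration, not part of the original message: the function name `close_wait_entries` is hypothetical, and it relies on the `st` column holding the state as a hex number, where 08 is CLOSE_WAIT.

```python
TCP_CLOSE_WAIT = 0x08  # state code used in the "st" column of /proc/net/tcp

def close_wait_entries(proc_net_tcp_text):
    """Return (local, remote) hex address pairs for sockets in CLOSE_WAIT."""
    entries = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        # fields: "sl:", local_address, rem_address, st, ...
        if int(fields[3], 16) == TCP_CLOSE_WAIT:
            entries.append((fields[1], fields[2]))
    return entries
```

On a Linux host this could be called as `close_wait_entries(open("/proc/net/tcp").read())`; the addresses are returned undecoded, exactly as they appear in the file.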

    Hope this helps.

    Raúl Núñez de Arenas Coronado

-- 
Linux Registered User 88736
http://www.pleyades.net & http://raul.pleyades.net/


* Re: The dreadful CLOSE_WAIT
  2004-07-27  8:39 The dreadful CLOSE_WAIT DervishD
  2004-07-27  9:28 ` Måns Rullgård
@ 2004-07-27 16:00 ` William Lee Irwin III
  2004-07-27 17:10   ` DervishD
  2004-07-27 16:45 ` Mike Waychison
  2 siblings, 1 reply; 12+ messages in thread
From: William Lee Irwin III @ 2004-07-27 16:00 UTC (permalink / raw)
  To: raul; +Cc: linux-kernel

On Tue, Jul 27, 2004 at 10:39:47AM +0200, DervishD wrote:
>     Seems under Linux that, when a connection is in the CLOSE_WAIT
> state, the only way to go to LAST_ACK is the application doing the
> 'shutdown()' or 'close()'. Doesn't seem to be a timeout for that.
>     Well, I think this is dangerous because a bad application (and a
> couple of widely used servers have this problem) can exhaust system
> network resources (difficult, but possible). For example, a
> concurrent FTP server with a race condition that doesn't do the
> shutdown when the remote end aborts. Writing such a 'bad app' is very
> easy, just do the socket->bind->listen->accept and after accepting
> the connection forget the connected socket and keep on listening. If
> the remote end aborts, the server leaves the connection in
> CLOSE_WAIT. Sometimes it has an associated timer: when data remains in
> the tx queue, it seems that the kernel tries to retransmit all that
> data, which makes no sense: in CLOSE_WAIT state the other end is not
> there... Surely I'm missing a lot :((

Probably best to implement timeouts by hand in your network daemon.
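One way to read "timeouts by hand" is a per-connection idle timeout in the daemon itself. The Python sketch below is an assumption about what such a timeout could look like, not code from the thread; the function name `read_with_idle_timeout` is hypothetical. The key point is that the socket is closed on every exit path, so neither a vanished peer nor a peer that has sent its FIN can leave the descriptor open.

```python
import socket

def read_with_idle_timeout(conn, idle_seconds):
    """Read until EOF or until the peer is silent for idle_seconds.

    Returns (data, timed_out). The socket is always closed before returning,
    so the connection cannot linger in CLOSE_WAIT on our account.
    """
    conn.settimeout(idle_seconds)
    chunks = []
    timed_out = False
    try:
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # peer closed its half
                break
            chunks.append(chunk)
    except socket.timeout:
        timed_out = True  # peer went quiet; give up on it
    finally:
        conn.close()
    return b"".join(chunks), timed_out
```

A real daemon would more likely track deadlines inside its event loop, but the rule is the same: every accepted socket needs some path that ends in `close()`.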


-- wli


* Re: The dreadful CLOSE_WAIT
  2004-07-27 16:00 ` William Lee Irwin III
@ 2004-07-27 17:10   ` DervishD
  2004-07-27 23:27     ` William Lee Irwin III
  0 siblings, 1 reply; 12+ messages in thread
From: DervishD @ 2004-07-27 17:10 UTC (permalink / raw)
  To: William Lee Irwin III, linux-kernel

    Hi William :)

 * William Lee Irwin III <wli@holomorphy.com> dixit:
> >     Seems under Linux that, when a connection is in the CLOSE_WAIT
> > state, the only way to go to LAST_ACK is the application doing the
> > 'shutdown()' or 'close()'. Doesn't seem to be a timeout for that.
> Probably best to implement timeouts by hand in your network daemon.

    Of course, this is a bug in the application, but anyway the
kernel (IMHO) shouldn't allow this.

    Raúl Núñez de Arenas Coronado

-- 
Linux Registered User 88736
http://www.pleyades.net & http://raul.pleyades.net/


* Re: The dreadful CLOSE_WAIT
  2004-07-27 17:10   ` DervishD
@ 2004-07-27 23:27     ` William Lee Irwin III
  2004-07-28  9:09       ` DervishD
  0 siblings, 1 reply; 12+ messages in thread
From: William Lee Irwin III @ 2004-07-27 23:27 UTC (permalink / raw)
  To: raul; +Cc: linux-kernel

* William Lee Irwin III <wli@holomorphy.com> dixit:
>> Probably best to implement timeouts by hand in your network daemon.

On Tue, Jul 27, 2004 at 07:10:25PM +0200, DervishD wrote:
>     Of course, this is a bug in the application, but anyway the
> kernel (IMHO) shouldn't allow this.

I suspect the sysctls controlling this, tcp_fin_timeout, tcp_max_orphans,
etc., may be useful to you. Check Documentation/networking/ip-sysctl.txt


-- wli


* Re: The dreadful CLOSE_WAIT
  2004-07-27 23:27     ` William Lee Irwin III
@ 2004-07-28  9:09       ` DervishD
  2004-07-28  9:24         ` William Lee Irwin III
  0 siblings, 1 reply; 12+ messages in thread
From: DervishD @ 2004-07-28  9:09 UTC (permalink / raw)
  To: William Lee Irwin III, linux-kernel

    Hi William :)

 * William Lee Irwin III <wli@holomorphy.com> dixit:
> >> Probably best to implement timeouts by hand in your network daemon.
> >     Of course, this is a bug in the application, but anyway the
> > kernel (IMHO) shouldn't allow this.
> I suspect the sysctls controlling this, tcp_fin_timeout, tcp_max_orphans,
> etc., may be useful to you. Check Documentation/networking/ip-sysctl.txt

    tcp_fin_timeout is of no help here, since the server is not stuck
in FIN_WAIT2 and, in addition, the connection is not closed; that is
exactly the problem. tcp_max_orphans refers to TCP connections
not attached to any user file handle, but a connection in state
CLOSE_WAIT is still attached to a file handle, and a valid one indeed.

    A grep in the kernel sources didn't give any useful hint about
which sysctl parameter would help :((

    Thanks anyway, William :) Maybe tcp_max_orphans can help, don't
know.

    Raúl Núñez de Arenas Coronado

-- 
Linux Registered User 88736
http://www.pleyades.net & http://raul.pleyades.net/


* Re: The dreadful CLOSE_WAIT
  2004-07-28  9:09       ` DervishD
@ 2004-07-28  9:24         ` William Lee Irwin III
  0 siblings, 0 replies; 12+ messages in thread
From: William Lee Irwin III @ 2004-07-28  9:24 UTC (permalink / raw)
  To: raul; +Cc: linux-kernel

On Wed, Jul 28, 2004 at 11:09:50AM +0200, DervishD wrote:
>     tcp_fin_timeout is of no help here, since the server is not stuck
> in FIN_WAIT2, and in addition to this, the connection is not closed,
> that is exactly the problem. tcp_max_orphans refers to TCP connections
> not attached to any user file handle, but a connection in state
> CLOSE_WAIT is still attached to a file handle, to a valid one indeed.
>     A grep in the kernel sources didn't give any useful guide about
> which sysctl parameter will help :((
>     Thanks anyway, William :) Maybe tcp_max_orphans can help, don't
> know.

I'd recommend trying various options and seeing if they help at all.


-- wli


* Re: The dreadful CLOSE_WAIT
  2004-07-27  8:39 The dreadful CLOSE_WAIT DervishD
  2004-07-27  9:28 ` Måns Rullgård
  2004-07-27 16:00 ` William Lee Irwin III
@ 2004-07-27 16:45 ` Mike Waychison
  2004-07-27 17:09   ` DervishD
  2 siblings, 1 reply; 12+ messages in thread
From: Mike Waychison @ 2004-07-27 16:45 UTC (permalink / raw)
  To: DervishD; +Cc: Linux-kernel


DervishD wrote:
>     Hi all :))
>
>     Seems under Linux that, when a connection is in the CLOSE_WAIT
> state, the only way to go to LAST_ACK is the application doing the
> 'shutdown()' or 'close()'. Doesn't seem to be a timeout for that.
>

This is by design.  It is possible to close a single direction of data
transmission in TCP, hence the shutdown system call.
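The half-close mentioned here can be sketched in Python as follows. This is an illustration under stated assumptions rather than anything from the thread: the function names are hypothetical, and the demo uses `socket.socketpair()` so that it runs without a network, although the states named in the comments apply to TCP. The client finishes sending with `shutdown(SHUT_WR)` and keeps reading; over TCP the server would sit in CLOSE_WAIT, quite legitimately, from the moment it sees EOF until it calls `close()`.

```python
import socket
import threading

def serve_after_eof(conn):
    """Collect the whole request, then answer while the peer is half-closed."""
    request = b""
    while True:
        chunk = conn.recv(4096)
        if not chunk:  # peer sent FIN; over TCP we would now be in CLOSE_WAIT
            break
        request += chunk
    conn.sendall(request.upper())  # still allowed: our direction is open
    conn.close()                   # only now does our own FIN go out

def request_with_half_close(sock, payload):
    """Send payload, close only the sending direction, read the full reply."""
    sock.sendall(payload)
    sock.shutdown(socket.SHUT_WR)  # "done sending", but keep reading
    reply = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        reply += chunk
    sock.close()
    return reply

def demo():
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve_after_eof, args=(server,))
    worker.start()
    reply = request_with_half_close(client, b"list files")
    worker.join()
    return reply
```

This is the case a kernel-side CLOSE_WAIT timeout would have to leave alone: the kernel cannot tell a server that is still preparing its reply from one that has forgotten the socket.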


>     Well, I think this is dangerous because a bad application (and a
> couple of widely used servers have this problem) can exhaust system
> network resources (difficult, but possible). For example, a
> concurrent FTP server with a race condition that doesn't do the
> shutdown when the remote end aborts. Writing such a 'bad app' is very
> easy, just do the socket->bind->listen->accept and after accepting
> the connection forget the connected socket and keep on listening. If
> the remote end aborts, the server leaves the connection in
> CLOSE_WAIT. Sometimes it has an associated timer: when data remains in
> the tx queue, it seems that the kernel tries to retransmit all that
> data, which makes no sense: in CLOSE_WAIT state the other end is not
> there... Surely I'm missing a lot :((

It may be half there.  It should be in FIN_WAIT1 state.

>
>     Since I don't know if a timeout (or another solution) exists to
> avoid this I won't give names, but it's pretty easy to do a DoS
> attack on a very well-known FTP server just using 'wget' and your
> favourite C-c keys.

This is broken application behaviour.  Forgetting about sockets (or any
other resource for that matter) is bad news.

>
>     IMHO, Linux (Unix) is about not allowing a bad app to screw the
> system, and the CLOSE_WAIT state allows that. I know: you can screw
> the system using as root an application that allocates and locks
> large chunks of memory, or other 'legal' bad things, the sysadmin
> should not allow the use of crappy software, but would a CLOSE_WAIT
> timeout do any harm?
>

This is the same idea as running a server that loses a bit of memory
on each bad request.  It would be an application bug, and similarly, the
kernel would have no way to know whether the application was doing
something wrong or not.

If you are _really_ concerned, you'd cap out at NR_OPEN per process
anyway :)


- --
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


* Re: The dreadful CLOSE_WAIT
  2004-07-27 16:45 ` Mike Waychison
@ 2004-07-27 17:09   ` DervishD
       [not found]     ` <20040728140622.2bc69fa5@kingfisher.intern.logi-track.com>
  0 siblings, 1 reply; 12+ messages in thread
From: DervishD @ 2004-07-27 17:09 UTC (permalink / raw)
  To: Mike Waychison; +Cc: Linux-kernel

    Hi Mike :)

 * Mike Waychison <Michael.Waychison@Sun.COM> dixit:
> >     Seems under Linux that, when a connection is in the CLOSE_WAIT
> > state, the only way to go to LAST_ACK is the application doing the
> > 'shutdown()' or 'close()'. Doesn't seem to be a timeout for that.
> This is by design.  It is possible to close a single direction of data
> transmission in TCP, hence the shutdown system call.

    I know, that's the only 'harm' a CLOSE_WAIT timeout will have,
but anyway I don't see any point in having a permanent CLOSE_WAIT
state. The other end is not there, it has sent us a FIN.

> > the tx queue, it seems that the kernel tries to retransmit all that
> > data, which makes no sense: in CLOSE_WAIT state the other end is not
> > there... Surely I'm missing a lot :((
> It may be half there.  It should be in FIN_WAIT1 state.

    OK, that's what I was missing. But anyway FIN_WAIT1 has a
timeout, so if our 'bad' application doesn't do the close and the
connection goes from CLOSE_WAIT to LAST_ACK due to a (new) timeout,
the other end will have its FIN anyway and our 'bad' app won't hold a
resource.

> >     Since I don't know if a timeout (or another solution) exists to
> > avoid this I won't give names, but it's pretty easy to do a DoS
> > attack on a very well-known FTP server just using 'wget' and your
> > favourite C-c keys.
> This is broken application behaviour.  Forgetting about sockets (or any
> other resource for that matter) is bad news.

    Of course! That's why I called it a 'bad app'. Obviously it is a bug
in the server, but I don't think it's a good idea to let a bug in an
application (well, we are all *VERY* good programmers, but bugs,
you know, die hard...) eat resources without control.
 
> >     IMHO, Linux (Unix) is about not allowing a bad app to screw the
> > system, and the CLOSE_WAIT state allows that. I know: you can screw
> > the system using as root an application that allocates and locks
> > large chunks of memory, or other 'legal' bad things, the sysadmin
> > should not allow the use of crappy software, but would a CLOSE_WAIT
> > timeout do any harm?
> This is the same idea as running a server that loses a bit of memory
> on each bad request.  It would be an application bug, and similarly, the
> kernel would have no way to know whether the application was doing
> something wrong or not.

    I think that being in CLOSE_WAIT more than a given amount of time
(this is system policy, I suppose, and kernel should not dictate
here) is wrong. I did a simple test and my 'bad server' (written that
way on purpose) held a connection in CLOSE_WAIT for a day! And it's
still there. Obviously the application is not doing anything correct.
I can't think of a scenario where holding a connection in CLOSE_WAIT
forever is correct. I mean, not doing the 'close()'.
 
> If you are _really_ concerned, you'd cap out at NR_OPEN per process
> anyway :)

    Well, it may be an idea ;) Anyway if you have, let's say, a
maximum of 10 connections in your server, and I do 10 wget+C-c, you
no longer have a running server. The kernel should not allow that. A
timeout of 3600 seconds seems very reasonable, or something like
that, am I wrong?

    Thanks for your help :)

    Raúl Núñez de Arenas Coronado

-- 
Linux Registered User 88736
http://www.pleyades.net & http://raul.pleyades.net/


[parent not found: <20040728140622.2bc69fa5@kingfisher.intern.logi-track.com>]

* Re: The dreadful CLOSE_WAIT
       [not found]     ` <20040728140622.2bc69fa5@kingfisher.intern.logi-track.com>
@ 2004-07-28 14:47       ` DervishD
  2004-07-29  9:46         ` Markus Schaber
  0 siblings, 1 reply; 12+ messages in thread
From: DervishD @ 2004-07-28 14:47 UTC (permalink / raw)
  To: Markus Schaber; +Cc: Linux-kernel

    Hi Markus :)

 * Markus Schaber <schabios@logi-track.com> dixit:
> >     I know, that's the only 'harm' a CLOSE_WAIT timeout will have,
> > but anyway I don't see any point in having a permanent CLOSE_WAIT
> > state. The other end is not there, it has sent us a FIN.
> Yes, but it may still want to read.

    I know, now I understand.

> >     Well, it may be an idea ;) Anyway if you have, let's say, a
> > maximum of 10 connections in your server, and I do 10 wget+C-c, you
> > no longer have a running server. The kernel should not allow that. A
> > timeout of 3600 seconds seems very reasonable, or something like
> > that, am I wrong?
> Well, when the other side is really dead, then connection keepalive
> should detect that (when enabled), by either timeout or getting a reset
> packet.

    But this must be enabled in the application using SO_KEEPALIVE,
am I wrong? Can it be enabled using sysctl or the like?

    Thanks for the information. When I saw the transitions, I thought
that the server got the FIN after the client died, but obviously it
can get it when the client does a half-close, and I didn't think of
it. Thanks, Markus :)

    Now, is there any sysctl that enables a keepalive for this kind
of connection (dead remote end, local in CLOSE_WAIT) for all
connections?
    
    Raúl Núñez de Arenas Coronado

-- 
Linux Registered User 88736
http://www.pleyades.net & http://raul.pleyades.net/


* Re: The dreadful CLOSE_WAIT
  2004-07-28 14:47       ` DervishD
@ 2004-07-29  9:46         ` Markus Schaber
  0 siblings, 0 replies; 12+ messages in thread
From: Markus Schaber @ 2004-07-29  9:46 UTC (permalink / raw)
  To: DervishD; +Cc: Linux-kernel

Hi, DervishD,

On Wed, 28 Jul 2004 16:47:23 +0200
DervishD <raul@pleyades.net> wrote:

>     Now, is there any sysctl that enables a keepalive for this kind
> of connection (dead remote end, local in CLOSE_WAIT) for all
> connections?

Hmm, on my 2.6.4 kernel, I have 

/proc/sys/net/ipv4$ for i in tcp_keepalive_* ; do echo $i $(cat $i) ; done
tcp_keepalive_intvl 75
tcp_keepalive_probes 9
tcp_keepalive_time 7200

So it seems those are only the tuning values for the keepalive. Linux
(following RFC 1122) by default waits for 7200 seconds = 2 hours before it
sends the probes. The reason for this is that idle connections (like
ssh) should not be dropped just because there are some temporary network
problems between the hosts. (See man tcp for details.)

It seems that there's no global enabling if you don't want to tweak the
kernel source yourself.
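Since there appears to be no global switch, keepalive has to be requested per socket by the application. The Python sketch below shows one way that might look; it is an illustration and not code from the thread. The helper name `enable_keepalive` is hypothetical, its defaults simply mirror the sysctl values listed above, and the `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` per-socket overrides are applied only where the platform's socket module exposes them.

```python
import socket

def enable_keepalive(sock, idle=7200, interval=75, probes=9):
    """Turn on TCP keepalive for one socket, optionally tuning its timers.

    idle:     seconds of silence before the first probe
    interval: seconds between probes
    probes:   unanswered probes before the connection is declared dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket overrides of the tcp_keepalive_* sysctls; these constants
    # are not available on every platform, so skip any that are missing.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

A server would call `enable_keepalive(conn)` on each accepted socket. Keepalive only lets the kernel notice a dead peer; the application still has to react to the resulting error and close the descriptor.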

Markus

-- 
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:schabios@logi-track.com | www.logi-track.com
