* The dreadful CLOSE_WAIT
@ 2004-07-27 8:39 DervishD
From: DervishD @ 2004-07-27 8:39 UTC
To: Linux-kernel
Hi all :))
Seems under Linux that, when a connection is in the CLOSE_WAIT
state, the only way to go to LAST_ACK is the application doing the
'shutdown()' or 'close()'. Doesn't seem to be a timeout for that.
Well, I think this is dangerous because a bad application (and a
couple of widely used servers have this problem) can exhaust system
network resources (difficult, but possible). For example, a
concurrent FTP server with a race condition that doesn't do the
shutdown when the remote end aborts. Writing such a 'bad app' is very
easy: just do the socket->bind->listen->accept and, after accepting
the connection, forget the connected socket and keep on listening. If
the remote end aborts, the server leaves the connection in
CLOSE_WAIT. Sometimes it has an associated timer: when data remains in
the tx queue, it seems that the kernel tries to retransmit all that
data, which makes no sense: in CLOSE_WAIT state the other end is not
there... Surely I'm missing a lot :((
Since I don't know if a timeout (or another solution) exists to
avoid this I won't give names, but it's pretty easy to do a DoS
attack on a very well-known FTP server just using 'wget' and your
favourite C-c keys.
IMHO, Linux (Unix) is about not allowing a bad app to screw the
system, and the CLOSE_WAIT state allows that. I know: you can screw
the system by running as root an application that allocates and locks
large chunks of memory, or other 'legal' bad things, and the sysadmin
should not allow the use of crappy software, but would a CLOSE_WAIT
timeout do any harm?
Thanks a lot for your help :)
Raúl Núñez de Arenas Coronado
--
Linux Registered User 88736
http://www.pleyades.net & http://raul.pleyades.net/
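
The 'bad app' described in the message above can be reproduced in a few lines. What follows is only a minimal sketch, assuming Linux and a loopback connection: the helper names (`tcp_state`, `leak_connection`, `wait_for_state`) are hypothetical, and the state is read through the `TCP_INFO` socket option, whose first byte is the kernel's state number (8 is `TCP_CLOSE_WAIT` in the kernel's `tcp_states.h`).

```python
import socket
import time

TCP_CLOSE_WAIT = 8  # numeric value of TCP_CLOSE_WAIT in the Linux kernel


def tcp_state(sock):
    """Return the kernel's TCP state number for a socket (Linux only)."""
    info = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 8)
    return info[0]  # first byte of struct tcp_info is tcpi_state


def leak_connection():
    """socket -> bind -> listen -> accept, then never close the accepted socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    client.close()    # the remote end goes away; its FIN reaches the server side
    listener.close()
    return conn       # deliberately left open, as the buggy server would do


def wait_for_state(sock, wanted, timeout=2.0):
    """Poll until the socket reaches the wanted state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if tcp_state(sock) == wanted:
            return True
        time.sleep(0.01)
    return False
```

Calling `wait_for_state(leak_connection(), TCP_CLOSE_WAIT)` should show the accepted socket sitting in CLOSE_WAIT for as long as the process keeps the descriptor open; the same connection would also appear in `netstat` output.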
* Re: The dreadful CLOSE_WAIT
@ 2004-07-27 9:28 Måns Rullgård
From: Måns Rullgård @ 2004-07-27 9:28 UTC
To: linux-kernel
DervishD <raul@pleyades.net> writes:
> Hi all :))
>
> Seems under Linux that, when a connection is in the CLOSE_WAIT
> state, the only way to go to LAST_ACK is the application doing the
> 'shutdown()' or 'close()'. Doesn't seem to be a timeout for that.
Is that why some programs seem to hang forever when my NAT gateway
decides to drop a connection?
--
Måns Rullgård
mru@kth.se
* Re: The dreadful CLOSE_WAIT
@ 2004-07-27 9:57 DervishD
From: DervishD @ 2004-07-27 9:57 UTC
To: Måns Rullgård
Cc: linux-kernel
Hi Måns :)
* Måns Rullgård <mru@kth.se> dixit:
> > Seems under Linux that, when a connection is in the CLOSE_WAIT
> > state, the only way to go to LAST_ACK is the application doing the
> > 'shutdown()' or 'close()'. Doesn't seem to be a timeout for that.
> Is that why some programs seem to hang forever when my NAT gateway
> decides to drop a connection?
I don't know. Look at the output of your netstat command. If you
have connections in the CLOSE_WAIT state related to the NAT gateway,
it may be the cause :?
But anyway the effect is just the opposite. It is not the CLOSE_WAIT
state that hangs a program, but a hung program (or at least one not
doing its duty) which puts a connection in the CLOSE_WAIT state.
Hope this helps.
Raúl Núñez de Arenas Coronado
--
Linux Registered User 88736
http://www.pleyades.net & http://raul.pleyades.net/
* Re: The dreadful CLOSE_WAIT
@ 2004-07-27 16:00 William Lee Irwin III
From: William Lee Irwin III @ 2004-07-27 16:00 UTC
To: raul
Cc: linux-kernel
On Tue, Jul 27, 2004 at 10:39:47AM +0200, DervishD wrote:
> Seems under Linux that, when a connection is in the CLOSE_WAIT
> state, the only way to go to LAST_ACK is the application doing the
> 'shutdown()' or 'close()'. Doesn't seem to be a timeout for that.
> Well, I think this is dangerous because a bad application (and a
> couple of widely used servers have this problem) can exhaust system
> network resources (difficult, but possible). For example, a
> concurrent FTP server with a race condition that doesn't do the
> shutdown when the remote end aborts. Writing such a 'bad app' is very
> easy: just do the socket->bind->listen->accept and, after accepting
> the connection, forget the connected socket and keep on listening. If
> the remote end aborts, the server leaves the connection in
> CLOSE_WAIT. Sometimes it has an associated timer: when data remains in
> the tx queue, it seems that the kernel tries to retransmit all that
> data, which makes no sense: in CLOSE_WAIT state the other end is not
> there... Surely I'm missing a lot :((
Probably best to implement timeouts by hand in your network daemon.
-- wli
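
The suggestion above, handling timeouts in the daemon itself, could look roughly like the following. This is a sketch under assumptions rather than anything taken from the thread: `serve_one` is a hypothetical per-connection handler that echoes data, treats an empty `recv()` as the peer's FIN, gives up after an idle period, and always closes the socket so the connection cannot linger in CLOSE_WAIT.

```python
import socket


def serve_one(conn, idle_timeout=30.0):
    """Echo data until the peer sends FIN or stays idle too long; always close."""
    conn.settimeout(idle_timeout)
    try:
        while True:
            try:
                data = conn.recv(4096)
            except socket.timeout:
                return "idle-timeout"   # hand-rolled timeout: peer was silent
            if not data:                # recv() returning b"" means FIN arrived
                return "peer-closed"
            conn.sendall(data)
    finally:
        conn.close()                    # lets the kernel send our own FIN
```

The `finally` clause is the important part of the design: whichever way the loop ends, including an unexpected exception, the descriptor is released.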
* Re: The dreadful CLOSE_WAIT
@ 2004-07-27 17:10 DervishD
From: DervishD @ 2004-07-27 17:10 UTC
To: William Lee Irwin III, linux-kernel
Hi William :)
* William Lee Irwin III <wli@holomorphy.com> dixit:
> > Seems under Linux that, when a connection is in the CLOSE_WAIT
> > state, the only way to go to LAST_ACK is the application doing the
> > 'shutdown()' or 'close()'. Doesn't seem to be a timeout for that.
> Probably best to implement timeouts by hand in your network daemon.
Of course, this is a bug in the application, but anyway the
kernel (IMHO) shouldn't allow this.
Raúl Núñez de Arenas Coronado
--
Linux Registered User 88736
http://www.pleyades.net & http://raul.pleyades.net/
* Re: The dreadful CLOSE_WAIT
@ 2004-07-27 23:27 William Lee Irwin III
From: William Lee Irwin III @ 2004-07-27 23:27 UTC
To: raul
Cc: linux-kernel
* William Lee Irwin III <wli@holomorphy.com> dixit:
>> Probably best to implement timeouts by hand in your network daemon.
On Tue, Jul 27, 2004 at 07:10:25PM +0200, DervishD wrote:
> Of course, this is a bug in the application, but anyway the
> kernel (IMHO) shouldn't allow this.
I suspect the sysctls controlling this, tcp_fin_timeout, tcp_max_orphans,
etc., may be useful to you. Check Documentation/networking/ip-sysctl.txt
-- wli
* Re: The dreadful CLOSE_WAIT
@ 2004-07-28 9:09 DervishD
From: DervishD @ 2004-07-28 9:09 UTC
To: William Lee Irwin III, linux-kernel
Hi William :)
* William Lee Irwin III <wli@holomorphy.com> dixit:
> >> Probably best to implement timeouts by hand in your network daemon.
> > Of course, this is a bug in the application, but anyway the
> > kernel (IMHO) shouldn't allow this.
> I suspect the sysctls controlling this, tcp_fin_timeout, tcp_max_orphans,
> etc., may be useful to you. Check Documentation/networking/ip-sysctl.txt
tcp_fin_timeout is of no help here, since the server is not stuck
in FIN_WAIT2 and, in addition to this, the connection is not closed;
that is exactly the problem. tcp_max_orphans refers to TCP connections
not attached to any user file handle, but a connection in state
CLOSE_WAIT is still attached to a file handle, and to a valid one indeed.
A grep in the kernel sources didn't give any useful guide about
which sysctl parameter will help :((
Thanks anyway, William :) Maybe tcp_max_orphans can help, don't
know.
Raúl Núñez de Arenas Coronado
--
Linux Registered User 88736
http://www.pleyades.net & http://raul.pleyades.net/
* Re: The dreadful CLOSE_WAIT
@ 2004-07-28 9:24 William Lee Irwin III
From: William Lee Irwin III @ 2004-07-28 9:24 UTC
To: raul
Cc: linux-kernel
On Wed, Jul 28, 2004 at 11:09:50AM +0200, DervishD wrote:
> tcp_fin_timeout is of no help here, since the server is not stuck
> in FIN_WAIT2 and, in addition to this, the connection is not closed;
> that is exactly the problem. tcp_max_orphans refers to TCP connections
> not attached to any user file handle, but a connection in state
> CLOSE_WAIT is still attached to a file handle, and to a valid one indeed.
> A grep in the kernel sources didn't give any useful guide about
> which sysctl parameter will help :((
> Thanks anyway, William :) Maybe tcp_max_orphans can help, don't
> know.
I'd recommend trying various options and seeing if they help at all.
-- wli
* Re: The dreadful CLOSE_WAIT
@ 2004-07-27 16:45 Mike Waychison
From: Mike Waychison @ 2004-07-27 16:45 UTC
To: DervishD
Cc: Linux-kernel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
DervishD wrote:
> Hi all :))
>
> Seems under Linux that, when a connection is in the CLOSE_WAIT
> state, the only way to go to LAST_ACK is the application doing the
> 'shutdown()' or 'close()'. Doesn't seem to be a timeout for that.
>
This is by design. It is possible to close a single direction of data
transmission in TCP, hence the shutdown system call.
> Well, I think this is dangerous because a bad application (and a
> couple of widely used servers have this problem) can exhaust system
> network resources (difficult, but possible). For example, a
> concurrent FTP server with a race condition that doesn't do the
> shutdown when the remote end aborts. Writing such a 'bad app' is very
> easy: just do the socket->bind->listen->accept and, after accepting
> the connection, forget the connected socket and keep on listening. If
> the remote end aborts, the server leaves the connection in
> CLOSE_WAIT. Sometimes it has an associated timer: when data remains in
> the tx queue, it seems that the kernel tries to retransmit all that
> data, which makes no sense: in CLOSE_WAIT state the other end is not
> there... Surely I'm missing a lot :((
It may be half there. It should be in FIN_WAIT1 state.
>
> Since I don't know if a timeout (or another solution) exists to
> avoid this I won't give names, but it's pretty easy to do a DoS
> attack on a very well-known FTP server just using 'wget' and your
> favourite C-c keys.
This is broken application behaviour. Forgetting about sockets (or any
other resource for that matter) is bad news.
>
> IMHO, Linux (Unix) is about not allowing a bad app to screw the
> system, and the CLOSE_WAIT state allows that. I know: you can screw
> the system by running as root an application that allocates and locks
> large chunks of memory, or other 'legal' bad things, and the sysadmin
> should not allow the use of crappy software, but would a CLOSE_WAIT
> timeout do any harm?
>
This is the same idea as running a server that loses a bit of memory
on each bad request. It would be an application bug, and similarly, the
kernel would have no way to know whether the application was doing
something wrong or not.
If you are _really_ concerned, you'd cap out at NR_OPEN per process
anyway :)
- --
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
http://www.sun.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE: The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
iD8DBQFBBoaZdQs4kOxk3/MRArNBAJ91A7CCycwWfwZqUJNuL/y7GrlYngCfYOkC
mM1vvp7GVHe6pBrvPXtuEIY=
=VcrF
-----END PGP SIGNATURE-----
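
The point about closing a single direction can be illustrated with a small loopback exchange. The sketch below is an assumption-laden illustration (the function name `half_close_demo` and the payloads are made up): the client shuts down only its sending side, which puts the server side in CLOSE_WAIT, yet the server can still send a reply that the client reads.

```python
import socket


def half_close_demo():
    """Client half-closes; the server, in CLOSE_WAIT, can still send a reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()

    client.sendall(b"request")
    client.shutdown(socket.SHUT_WR)   # client sends FIN but keeps reading

    request = b""
    while True:
        chunk = server.recv(4096)
        if not chunk:                 # EOF: the server side has seen the FIN
            break
        request += chunk

    server.sendall(b"reply to " + request)  # still legal after the peer's FIN
    server.close()                          # only now does the server send its FIN

    reply = b""
    while True:
        chunk = client.recv(4096)
        if not chunk:
            break
        reply += chunk
    client.close()
    return request, reply
```

This is why the kernel cannot simply assume that a connection in CLOSE_WAIT is useless: from its point of view a slow but legitimate server and a server that forgot the socket look the same.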
* Re: The dreadful CLOSE_WAIT
@ 2004-07-27 17:09 DervishD
From: DervishD @ 2004-07-27 17:09 UTC
To: Mike Waychison
Cc: Linux-kernel
Hi Mike :)
* Mike Waychison <Michael.Waychison@Sun.COM> dixit:
> > Seems under Linux that, when a connection is in the CLOSE_WAIT
> > state, the only way to go to LAST_ACK is the application doing the
> > 'shutdown()' or 'close()'. Doesn't seem to be a timeout for that.
> This is by design. It is possible to close a single direction of data
> transmission in TCP, hence the shutdown system call.
I know, that's the only 'harm' a CLOSE_WAIT timeout will have,
but anyway I don't see any point in having a permanent CLOSE_WAIT
state. The other end is not there, it has sent us a FIN.
> > the tx queue, it seems that the kernel tries to retransmit all that
> > data, which makes no sense: in CLOSE_WAIT state the other end is not
> > there... Surely I'm missing a lot :((
> It may be half there. It should be in FIN_WAIT1 state.
OK, that's what I was missing. But anyway FIN_WAIT1 has a
timeout, so if our 'bad' application doesn't do the close and the
connection goes from CLOSE_WAIT to LAST_ACK due to a (new) timeout,
the other end will have its FIN anyway and our 'bad' app won't hold a
resource.
> > Since I don't know if a timeout (or another solution) exists to
> > avoid this I won't give names, but it's pretty easy to do a DoS
> > attack on a very well-known FTP server just using 'wget' and your
> > favourite C-c keys.
> This is broken application behaviour. Forgetting about sockets (or any
> other resource for that matter) is bad news.
Of course! That's why I called it 'bad app'. Obviously it is a bug in
the server, but I don't think it's a good idea to let a bug in an
application (well, we are all *VERY* good programmers, but bugs are,
you know, die hard...) eat resources without control.
> > IMHO, Linux (Unix) is about not allowing a bad app to screw the
> > system, and the CLOSE_WAIT state allows that. I know: you can screw
> > the system by running as root an application that allocates and locks
> > large chunks of memory, or other 'legal' bad things, and the sysadmin
> > should not allow the use of crappy software, but would a CLOSE_WAIT
> > timeout do any harm?
> This is the same idea as running a server that loses a bit of memory
> on each bad request. It would be an application bug, and similarly, the
> kernel would have no way to know whether the application was doing
> something wrong or not.
I think that being in CLOSE_WAIT more than a given amount of time
(this is system policy, I suppose, and the kernel should not dictate
here) is wrong. I did a simple test and my 'bad server' (written that
way on purpose) held a connection in CLOSE_WAIT for a day! And it's
still there. Obviously the application is not doing anything correct.
I can't think of a scenario where holding a connection in CLOSE_WAIT
forever is correct. I mean, not doing the 'close()'.
> If you are _really_ concerned, you'd cap out at NR_OPEN per process
> anyway :)
Well, it may be an idea ;) Anyway if you have, let's say, a
maximum of 10 connections in your server, and I do 10 wget+C-c, you
no longer have a running server. The kernel should not allow that. A
timeout of 3600 seconds seems very reasonable, or something like
that, am I wrong?
Thanks for your help :)
Raúl Núñez de Arenas Coronado
--
Linux Registered User 88736
http://www.pleyades.net & http://raul.pleyades.net/
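
Both sides of the exchange above can be seen in a small experiment with the per-process descriptor limit. This is a hedged sketch: it uses `RLIMIT_NOFILE` (the user-visible per-process limit, not the kernel's `NR_OPEN` constant itself), and `sockets_until_limit` is a hypothetical helper that leaks sockets until the kernel refuses with `EMFILE`, then restores the previous limit.

```python
import errno
import resource
import socket


def sockets_until_limit(soft_limit=64):
    """Leak sockets until the per-process fd limit is hit; return (count, errno)."""
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if old_soft != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, old_soft)   # never raise the existing limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    leaked = []
    try:
        while True:
            # Each "forgotten" socket pins one file descriptor.
            leaked.append(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    except OSError as exc:
        return len(leaked), exc.errno
    finally:
        for sock in leaked:
            sock.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
```

The cap keeps a leaking process from consuming descriptors without bound, which is the point about limits; but once the cap is reached that process can no longer accept connections, which is the concern about a server with only ten slots.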
[parent not found: <20040728140622.2bc69fa5@kingfisher.intern.logi-track.com>]
* Re: The dreadful CLOSE_WAIT
@ 2004-07-28 14:47 DervishD
From: DervishD @ 2004-07-28 14:47 UTC
To: Markus Schaber
Cc: Linux-kernel
Hi Markus :)
* Markus Schaber <schabios@logi-track.com> dixit:
> > I know, that's the only 'harm' a CLOSE_WAIT timeout will have,
> > but anyway I don't see any point in having a permanent CLOSE_WAIT
> > state. The other end is not there, it has sent us a FIN.
> Yes, but it may still want to read.
I know, now I understand.
> > Well, it may be an idea ;) Anyway if you have, let's say, a
> > maximum of 10 connections in your server, and I do 10 wget+C-c, you
> > no longer have a running server. The kernel should not allow that. A
> > timeout of 3600 seconds seems very reasonable, or something like
> > that, am I wrong?
> Well, when the other side is really dead, then connection keepalive
> should detect that (when enabled), by either timeout or getting a reset
> packet.
But this must be enabled in the application using SO_KEEPALIVE,
am I wrong? Can it be enabled using sysctl or the like?
Thanks for the information. When I saw the transitions, I thought
that the server got the FIN after the client died, but obviously it
can also get it when the client does a half-close, and I didn't think
of that. Thanks, Markus :)
Now, is there any sysctl that enables a keepalive for this kind
of connection (dead remote end, local in CLOSE_WAIT) for all
connections?
Raúl Núñez de Arenas Coronado
--
Linux Registered User 88736
http://www.pleyades.net & http://raul.pleyades.net/
* Re: The dreadful CLOSE_WAIT
@ 2004-07-29 9:46 Markus Schaber
From: Markus Schaber @ 2004-07-29 9:46 UTC
To: DervishD
Cc: Linux-kernel
Hi, DervishD,
On Wed, 28 Jul 2004 16:47:23 +0200
DervishD <raul@pleyades.net> wrote:
> Now, is there any sysctl that enables a keepalive for this kind
> of connection (dead remote end, local in CLOSE_WAIT) for all
> connections?
Hmm, on my 2.6.4 kernel, I have
    /proc/sys/net/ipv4$ for i in tcp_keepalive_* ; do echo $i $(cat $i) ; done
    tcp_keepalive_intvl 75
    tcp_keepalive_probes 9
    tcp_keepalive_time 7200
So it seems those are only the tuning values for the keepalive. Linux
(following RFC 1122) by default waits for 7200 seconds = 2 hours before
it sends the probes. The reason for this is that idle connections
(like ssh) should not be dropped just because there are some temporary
network problems between the hosts. (See man tcp for details.)
It seems that there's no global enabling if you don't want to tweak
the kernel source yourself.
Markus
--
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:schabios@logi-track.com | www.logi-track.com
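
Since keepalive has to be requested per socket, an application would enable it roughly as follows. This is a minimal sketch assuming Linux: `enable_keepalive` is a hypothetical helper, the defaults simply mirror the sysctl values quoted above, and `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are the Linux per-socket overrides described in `man tcp`.

```python
import socket


def enable_keepalive(sock, idle=7200, interval=75, probes=9):
    """Turn on TCP keepalive for one socket and override the system defaults."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first probe (cf. tcp_keepalive_time).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Seconds between unanswered probes (cf. tcp_keepalive_intvl).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    # Unanswered probes before the connection is dropped (cf. tcp_keepalive_probes).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

A server would typically call this on each socket returned by `accept()`. Keepalive only helps detect a peer that has really vanished; the application still has to close the descriptor itself.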