From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S266465AbUG0RJd (ORCPT ); Tue, 27 Jul 2004 13:09:33 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S266466AbUG0RJd (ORCPT ); Tue, 27 Jul 2004 13:09:33 -0400 Received: from madrid10.amenworld.com ([62.193.203.32]:7954 "EHLO madrid10.amenworld.com") by vger.kernel.org with ESMTP id S266465AbUG0RJY (ORCPT ); Tue, 27 Jul 2004 13:09:24 -0400 Date: Tue, 27 Jul 2004 19:09:07 +0200 From: DervishD To: Mike Waychison Cc: Linux-kernel Subject: Re: The dreadful CLOSE_WAIT Message-ID: <20040727170907.GA26136@DervishD> Mail-Followup-To: Mike Waychison , Linux-kernel References: <20040727083947.GB31766@DervishD> <4106869A.5030905@sun.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <4106869A.5030905@sun.com> User-Agent: Mutt/1.4.2.1i Organization: Pleyades Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Hi Mike :) * Mike Waychison dixit: > > Seems under Linux that, when a connection is in the CLOSE_WAIT > > state, the only wait to go to LAST_ACK is the application doing the > > 'shutdown()' or 'close()'. Doesn't seem to be a timeout for that. > This is by design. It is possible to close a single direction of data > transmission in TCP, hence the shutdown system call. I know, that's the only 'harm' a CLOSE_WAIT timeout will have, but anyway I don't see any point in having a permanent CLOSE_WAIT state. The other end is not there, it has sent us a FIN. > > the tx queue, it seems that the kernel tries to retransmit all that > > data, which makes no sense: in CLOSE_WAIT state the other end is not > > there... Surely I'm missing a lot :(( > It may be half there. It should be in FIN_WAIT1 state. OK, that's what I was missing. But anyway FIN_WAIT1 has a timeout, so if our 'bad' application doesn't do the close and the connection goes from CLOSE_WAIT to LAST_ACK due to a (new) timeout, the other end will have its FIN anyway and our 'bad' app won't hold a resource. > > Since I don't know if a timeout (or another solution) exists to > > avoid this I won't give names, but it's pretty easy to do a DoS > > attack over a very known FTP server just using 'wget' and your > > favourite C-c keys. > This is broken application behaviour. Forgetting about sockets (or any > other resource for that matter) is bad news. Of course!, that's why I called it 'bad app'. Obviously is a bug in the server, but I don't think it's a good idea to let a bug in an application (well, we are all *VERY* good programmers, but bugs are, you know, die hard...) eat resources without control. > > IMHO, Linux (Unix) is about not allowing a bad app to screw the > > system, and the CLOSE_WAIT state allows that. I know: you can screw > > the system using as root an application that allocates and locks > > large chunks of memory, or other 'legal' bad things, the sysadmin > > should not allow the use of crappy software, but will do any harm a > > CLOSE_WAIT timeout? > This is the same idea of having a server run that loses a bit of memory > on each bad request. It would be an application bug, and similarly, the > kernel would have no way to know whether the application was doing > something wrong or not. I think that being in CLOSE_WAIT more than a given amount of time (this is system policy, I suppose, and kernel should not dictate here) is wrong. I did a simple test and my 'bad server' (written that way on purpose) held a connection in CLOSE_WAIT for a day! And it's still there. Obviously the application is not doing anything correct. I can't think of an scenario where holding a connection in CLOSE_WAIT for ever is correct. I mean, not doing the 'close()'. > If you are _really_ concerned, you'd cap out at NR_OPEN per process > anyway :) Well, it may be an idea ;) Anyway if you have, let's say, a maximum of 10 connections in your server, and I do 10 wget+C-c, you no longer have a running server. The kernel should not allow that. A timeout of 3600 seconds seems very reasonable, or somethink like that, am I wrong? Thanks for your help :) Raúl Núñez de Arenas Coronado -- Linux Registered User 88736 http://www.pleyades.net & http://raul.pleyades.net/