From: Jeff Garzik
Subject: Re: [chunkd patch 1/6] Fix the leak of suddenly closed connections
Date: Tue, 25 May 2010 18:25:14 -0400
Message-ID: <4BFC4E4A.3090203@garzik.org>
In-Reply-To: <20100520225401.1480c10c@redhat.com>
To: Pete Zaitcev
Cc: Project Hail List

On 05/21/2010 12:54 AM, Pete Zaitcev wrote:
> After a period of uptime, chunkd may stop working with this:
>
> May 20 08:51:47 azdragon2 chunkd[4034]: tcp accept: Too many open files
>
> An examination with lsof shows that file descriptors for sockets and
> object data files are leaked in neat pairs. As it turns out, the root
> cause is that chunkd does not handle the case where tabled opens a
> connection to read an object, then closes it before the data is
> transferred. On some systems, sendfile returns no error in such a case,
> but rather the amount of data that it attempted to send before it
> recognized that the socket was closed. If that happens, chunkd will not
> receive a POLLOUT indication, and the struct cli will linger forever
> with a non-empty write queue.
>
> The fix has two parts:
>
> 1. Permit a client in evt_recycle state to process outstanding
>    writes in the same manner as a client in evt_dispose does.
>
>    Note that in our specific failure case no actual processing
>    is going to occur, so this part has the effect of permitting
>    the dispatch to work. If we do not do this, a POLLIN may
>    throw us into the evt_read_fixed stage.
>
> 2. Once we are being dispatched, dispose of clients that
>    had connections closed, using the unmaskable POLLHUP bit.
>
> As an aside, tabled 0.5-0.7.x resets the connections when Firefox
> asks for a file only if it was modified after a certain date. In that
> case, tabled wants to know when the file was modified, so it reads the
> header off chunkd. If it turns out that the client is not interested
> in the data, tabled simply closes the connection without reading
> whatever data has arrived. This may change in the future, but the
> bug in chunkd should be fixed anyway, for general robustness.
>
> Signed-off-by: Pete Zaitcev

Applied 1-6, after fixing a newly introduced truncation bug.
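Part 2 of the fix leans on the fact that poll() always reports POLLHUP in revents whether or not the caller asked for it (POSIX says the flag is ignored in the events member). The standalone sketch below is not chunkd code; it uses an AF_UNIX socket pair as a stand-in for the tabled-to-chunkd TCP connection, closes one end, and polls the other with events set to 0. The helper name is hypothetical, and the behavior shown is what Linux does for AF_UNIX stream sockets; the exact conditions under which a TCP socket reports POLLHUP differ.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 1 if poll() reports POLLHUP on fd even
 * though no events were requested (events = 0), 0 if it does not, and
 * -1 if poll() itself fails. POLLHUP, POLLERR and POLLNVAL are always
 * eligible for revents; the events field cannot mask them off.
 */
static int hup_reported_without_asking(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = 0, .revents = 0 };

	assert(fd >= 0);
	if (poll(&pfd, 1, 0) < 0)	/* timeout 0: sample state, no wait */
		return -1;
	return (pfd.revents & POLLHUP) != 0;
}
```

A typical check: create the pair with socketpair(AF_UNIX, SOCK_STREAM, 0, sv), confirm the helper returns 0 for sv[0] while both ends are open, then close(sv[1]) to play the peer that hangs up early and observe that it now returns 1.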