From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeff Garzik Subject: Re: [cld patch 1/1] fix crash in cldc_close due to array reallocations Date: Sat, 15 May 2010 16:36:58 -0400 Message-ID: <4BEF05EA.60807@garzik.org> References: <20100515133227.24e72d0d@redhat.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:message-id:date:from :user-agent:mime-version:to:cc:subject:references:in-reply-to :content-type:content-transfer-encoding; bh=2cE/qkgsgVsqM1bTBVO4B359Z3tpwqoR71P6nG8c6gI=; b=ekG0dgJDUzpsx8ThAAnUotrrV54YvzifR1iy94D9Wnr/EBGAI4UQUNVWXb6VJLBhIX 2nDNfjbXg4CmIpZQr3AuS9F38Fd9BhGmlUT+cBbQNIsDcX2g1FujCvT+PFWAaTczJt0D ACOTDiPMU3KI14QrnMh8+mvAnIGrE8OIx3O9o= In-Reply-To: <20100515133227.24e72d0d@redhat.com> Sender: hail-devel-owner@vger.kernel.org List-ID: Content-Type: text/plain; charset="us-ascii"; format="flowed" To: Pete Zaitcev Cc: Project Hail List On 05/15/2010 03:32 PM, Pete Zaitcev wrote: > A long-running tabled is virtually guaranteed to crash when it's shut > down, with the traceback like this: > > Program terminated with signal 11, Segmentation fault. > #0 cldc_close (fh=0x10a71c8, copts=0x7fff2fdef780) at cldc.c:1049 > 1049 if (sess->expired) { > (gdb) where > #0 cldc_close (fh=0x10a71c8, copts=0x7fff2fdef780) at cldc.c:1049 > #1 0x00007f8a878ed7c7 in ncld_close (fh=0x10a7790) at cldc.c:2215 > #2 0x00007f8a878ed918 in ncld_sess_close (nsess=0x10a6f10) at cldc.c:2247 > #3 0x0000000000406252 in cld_end () at cldu.c:701 > #4 0x000000000040d6c4 in main (argc=, > argv=) at server.c:1980 > (gdb) print *fh > $3 = {fh = 140232956496032, sess = 0x0, valid = 176} > > Obviously the boolean value should not be 176. > > The reason this happens is array reallocation. The API for libcldc > (used internally by ncld) returns a pointer to struct cldc_fh that > the user is expected to save. However, this structure is placed > inside a GArray. As array expands, it can be relocated in memory > and the file handle pointer is left hanging. Tabled is vulnerable > to this because 1) it keeps an open handle for the CLD file that > it locked and 2) it periodically opens and closes CLD files for > chunkservers, causing array to grow. > > We observed that the indexes into the array are not the same > as file handles. Therefore, we never access elements by index > and thus it is not necessary to use an array structure. > > This patch replaces array with a list, so that pointers to handle > structures can be stable. > > Also, the patch frees the handles when they are closed, thus > ending the endless growth of the file handle table. This is > done in a compatible way, so handle is available to callers > of cldc_close (ncld does not need it, but it may be useful). > > Although we change globally visible structures, we only replace > one ostensibly opaque pointer type (GArray) with another one > (GList), so applications need not be rebuilt. > > Signed-off-by: Pete Zaitcev applied