From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeff Garzik
Subject: Re: [PATCH 2/6] chunkd: change the prefix length of object pathname from 4 to 3
Date: Wed, 13 Jan 2010 05:52:00 -0500
Message-ID: <4B4DA5D0.8050708@garzik.org>
References: <1263212721-11210-1-git-send-email-akinobu.mita@gmail.com>
 <1263212721-11210-2-git-send-email-akinobu.mita@gmail.com>
 <4B4BE4AB.4040501@garzik.org>
 <961aa3351001120028t8c81487n62289c841a5dcb49@mail.gmail.com>
 <4B4C5C93.6010709@garzik.org>
 <961aa3351001121950x72f083d5w2ac596dcbca53b35@mail.gmail.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Return-path:
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
 h=domainkey-signature:received:received:sender:message-id:date:from
 :user-agent:mime-version:to:cc:subject:references:in-reply-to
 :content-type:content-transfer-encoding;
 bh=QhOlR27+4xtUWubUCUH+zytsdBzVL9ribWFJPXeKNOU=;
 b=k/pcAdPWWhAV/RsBoBs921BpJx+PTcQQ/88KL5g4cMT8kJG4AjJqtOczLLrPc7FdZU
 wixwLyhP8phSKHEwUX9AT+QQlYKE5rYkftB/K33P0A+sEYh/9Q+L8tI1am8GaCsfpyFF
 H35eYMUIMh6gMvGqrokUBbb9Jxef+T9fR3F+A=
In-Reply-To: <961aa3351001121950x72f083d5w2ac596dcbca53b35@mail.gmail.com>
Sender: hail-devel-owner@vger.kernel.org
List-ID:
Content-Type: text/plain; charset="us-ascii"; format="flowed"
To: Akinobu Mita
Cc: hail-devel@vger.kernel.org

On 01/12/2010 10:50 PM, Akinobu Mita wrote:
>>>> This patch makes sense, but it also raises the question of whether or
>>>> not we should move to a two-level directory scheme, eg.
>>>>
>>>>         123/456/7890ABCDEF
>>>> rather than
>>>>         123/4567890ABCDEF
>>>>
>>>> to limit the size of the top-level directories. It really depends on the
>>>> object counts a typical chunkd node will be seeing. As with the other
>>>> patch, I will give this some thought after sleep.
>>>
>>> Two-level directory scheme looks good.
>>>
>>> I will do it unless someone thinks 536,870,912,000 (=4096*4096*32000)
>>> objects in one table is not enough :)
>>
>> FWIW, 32000 is only the limit on directories-within-a-directory. You can
>> easily have millions of regular files in a single ext3 directory. So it is
>> really 4096*4096*millions.
>
> Oops, how embarrassing... so 1-level directory scheme with 3-byte prefix
> is nearly unlimited in maximum count of objects.

Yes. It mainly becomes a question of balancing lookup costs, at that point:

With a 1-level directory scheme, millions of objects could imply
prohibitively long directory-lookup times as those directories grow
[although super-large directories are better handled in ext3+htree,
ext4, btrfs and XFS].

On the other hand, a 2-level directory scheme would reduce or eliminate
the occurrence of large directories, at the cost of having to perform
many more mkdir(2) calls during object creation. Additional costs
include a larger dcache footprint and added fs_list_objs() complexity.

> BTW, chunkd cannot have more than 32000 tables on ext3 for the same reason
> (EXT3_MAX_LINK). So, should we use two or three-level directory scheme
> for table_id in object pathname ?

At this point, I think it is unlikely that people will create more than
32000 tables on a single server. If I am wrong, we can eliminate this
limit at a later date.

	Jeff
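
For reference, the two layouts compared above (123/4567890ABCDEF versus
123/456/7890ABCDEF) can be sketched as a single path-building helper.
This is only an illustration, not chunkd's actual code: the function
name obj_path(), the PREFIX_LEN constant and the "levels" parameter are
all hypothetical, and the key is assumed to already be a hex string.

```c
#include <string.h>

/* Hypothetical: 3 hex digits per directory level, i.e. 4096 dirs/level. */
#define PREFIX_LEN 3

/*
 * Build a relative object path from a hex key string (illustrative only).
 *   levels = 1  ->  "123/4567890ABCDEF"
 *   levels = 2  ->  "123/456/7890ABCDEF"
 * Returns 0 on success, -1 if the key is too short for the requested
 * number of levels or if buf cannot hold the result.
 */
static int obj_path(char *buf, size_t buflen, const char *key, int levels)
{
	size_t klen = strlen(key);
	size_t used = 0;
	int i;

	/* At least one character must remain for the file name itself. */
	if (levels < 1 || klen <= (size_t)levels * PREFIX_LEN)
		return -1;

	/* Output is the key plus one '/' per level plus the trailing NUL. */
	if (buflen < klen + (size_t)levels + 1)
		return -1;

	/* Peel off one fixed-width prefix per directory level. */
	for (i = 0; i < levels; i++) {
		memcpy(buf + used, key, PREFIX_LEN);
		used += PREFIX_LEN;
		buf[used++] = '/';
		key += PREFIX_LEN;
	}

	/* The remainder of the key becomes the file name. */
	strcpy(buf + used, key);
	return 0;
}
```

With levels = 2, each object creation may need up to two mkdir(2) calls
instead of one before the file can be opened, which is the extra cost
mentioned above.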