From: Jeff Garzik <jeff@garzik.org>
To: Akinobu Mita <akinobu.mita@gmail.com>
Cc: hail-devel@vger.kernel.org
Subject: Re: [PATCH 2/6] chunkd: change the prefix length of object pathname from 4 to 3
Date: Wed, 13 Jan 2010 05:52:00 -0500
Message-ID: <4B4DA5D0.8050708@garzik.org>
In-Reply-To: <961aa3351001121950x72f083d5w2ac596dcbca53b35@mail.gmail.com>
On 01/12/2010 10:50 PM, Akinobu Mita wrote:
>>>> This patch makes sense, but it also raises the question of whether or not
>>>> we
>>>> should move to a two-level directory scheme, eg.
>>>>
>>>> 123/456/7890ABCDEF
>>>> rather than
>>>> 123/4567890ABCDEF
>>>>
>>>> to limit the size of the top-level directories. It really depends on the
>>>> object counts a typical chunkd node will be seeing. As with the other
>>>> patch, I will give this some thought after sleep.
>>>
>>> Two-level directory scheme looks good.
>>>
>>> I will do it unless someone thinks 536,870,912,000(=4096*4096*32000)
>>> objects in one table is not enough :)
>>
>> FWIW, 32000 is only the limit on directories-with-a-directory. You can
>> easily have millions of regular files in a single ext3 directory. So it is
>> really 4096*4096*millions.
>
> Oops, how embarrassing... so a 1-level directory scheme with a 3-byte
> prefix is nearly unlimited in the maximum count of objects.

Yes.  At that point, it mainly becomes a question of balancing lookup costs:

With a 1-level directory scheme, millions of objects could imply
prohibitively long directory-lookup times as those directories grow
[although super-large directories are better handled in ext3+htree,
ext4, btrfs and XFS].

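
To put rough numbers on the directory sizes, here is a small sketch. It assumes object names are hex strings (so a 3-character prefix selects one of 16^3 = 4096 directories, matching the 4096 figure quoted above) and that names spread uniformly across prefixes. The function name is made up for illustration and is not part of chunkd.

```python
# Assumption: hex object names, so a 3-character prefix selects
# one of 16**3 = 4096 directories at each level.
PREFIX_LEN = 3
FANOUT = 16 ** PREFIX_LEN  # 4096


def avg_entries_per_leaf_dir(n_objects, levels):
    """Average number of object files per leaf directory.

    Assumes names are spread uniformly over all prefixes, which is
    only an approximation for real workloads.
    """
    return n_objects / (FANOUT ** levels)


# Compare a 1-level and a 2-level scheme for a few object counts.
for n in (1_000_000, 100_000_000, 10_000_000_000):
    one = avg_entries_per_leaf_dir(n, 1)
    two = avg_entries_per_leaf_dir(n, 2)
    print(f"{n:>14,} objects: 1-level ~{one:,.0f}/dir, 2-level ~{two:,.2f}/dir")
```

Under these assumptions a 1-level scheme only reaches millions of entries per directory once the node holds billions of objects, so which scheme wins depends on the object counts a typical node actually sees.
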
On the other hand, a 2-level directory scheme would reduce or eliminate
the occurrence of large directories, at the cost of having to perform
many more mkdir(2) calls during object creation.  Additional costs
include a larger dcache footprint and added fs_list_objs() complexity.

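
For illustration only, the two layouts discussed in this thread could be sketched as below. This is not chunkd's code: object_path() and dirs_to_create() are hypothetical helpers, and the key is assumed to be a hex string longer than the combined prefixes.

```python
import os

PREFIX_LEN = 3  # prefix length proposed in this patch


def object_path(table_dir, key_hex, levels):
    """Build an object pathname using `levels` directory levels.

    Each level consumes PREFIX_LEN characters of the key; the rest of
    the key becomes the file name.
    """
    cut = PREFIX_LEN * levels
    if len(key_hex) <= cut:
        raise ValueError("key too short for requested directory depth")
    parts = [key_hex[i:i + PREFIX_LEN] for i in range(0, cut, PREFIX_LEN)]
    return os.path.join(table_dir, *parts, key_hex[cut:])


def dirs_to_create(path, table_dir):
    """Directories between table_dir and the object file.

    Each one may need its own mkdir(2) the first time it is used.
    """
    rel = os.path.relpath(os.path.dirname(path), table_dir)
    return [] if rel == "." else rel.split(os.sep)


key = "1234567890ABCDEF"
one_level = object_path("table", key, 1)   # table/123/4567890ABCDEF
two_level = object_path("table", key, 2)   # table/123/456/7890ABCDEF
print(one_level, dirs_to_create(one_level, "table"))
print(two_level, dirs_to_create(two_level, "table"))
```

The second level doubles the number of directories that may have to be created per new object, and any listing code has to walk one more level, which is the trade-off described above.
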
> BTW, chunkd cannot have more than 32000 tables on ext3 for the same reason
> (EXT3_MAX_LINK). So, should we use a two- or three-level directory scheme
> for table_id in the object pathname?
At this point, I think it is unlikely that people will create more than
32000 tables on a single server. If I am wrong, we can eliminate this
limit at a later date.
Jeff