All of lore.kernel.org
 help / color / mirror / Atom feed
From: "U.Mutlu" <for-gmane@mutluit.com>
To: linux-ext4@vger.kernel.org
Subject: Re: Htree concept
Date: Thu, 14 May 2015 04:50:15 +0200	[thread overview]
Message-ID: <mj12h7$nq3$1@ger.gmane.org> (raw)
In-Reply-To: <20150513211854.GA25272@thunk.org>

Theodore Ts'o wrote on 05/13/2015 11:18 PM:
> On Wed, May 13, 2015 at 07:37:36PM +0200, U.Mutlu wrote:
>> I think I slowly grasp how HTree works: it keeps a (rb/avl tree)
>> b*tree-db (I guess it stores it on disk) of the hashes (as keys).
>
> The reason for using hashes is it keeps the fanout of the tree very
> high, which in turn keeps the depth of the tree very short.  This
> means that we can do search a very large directory using at most three
> disk reads (two levels of internal node, where each node can store up
> to 340 hashes plus pointers the next level of the tree, plus a
> directory leaf block).

Yes, I see. I'll do similar in my prj, perhaps adding one more level,
ie. 3 reads to locate an item among about 10 million items (as said
just a toy-fs for fun :-), plus 1 more read for the item itself.

>> In contrast to that here my idea: keep the hdr blocks (ie. where the
>> dir/file names are) always in a sorted order. Then a bsearch should be doable.
>> This would eliminate the need for any b*tree-db usage.
>
> The problem with using a binary search is (a) it's more expensive to
> search each disk read divides the search space in half (in contrast,
> in the best case using htree, the first disk read can divide the
> search space by factor of 340), and (b) insertions are very expensive;
> suppose you have a 400 megabyte directory, and you need to insert a
> filename into the very beginning of the list.  You will have to
> performance 800 megabytes of I/O to make room for directory entry, if
> you want to keep all of the directory entries sorted.

Yes, my initial idea to use bsearch leads to much more disk i/o.
Thx for the info.

-- 
cu
Uenal



      reply	other threads:[~2015-05-14  2:50 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-13 15:37 Htree concept U.Mutlu
2015-05-13 16:24 ` U.Mutlu
2015-05-13 16:29 ` Eric Sandeen
2015-05-13 17:22   ` U.Mutlu
2015-05-13 17:37     ` U.Mutlu
2015-05-13 21:18       ` Theodore Ts'o
2015-05-14  2:50         ` U.Mutlu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='mj12h7$nq3$1@ger.gmane.org' \
    --to=for-gmane@mutluit.com \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.