Re: Btree directories (Re: Status of HFS+ support)

linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed

From: flar@allandria.com
To: dgatwood@deepspace.mklinux.org (David A. Gatwood)
Cc: ak@suse.de (Andi Kleen), lord@sgi.com (Steve Lord),
	matthew@wil.cx (Matthew Wilcox),
	khalfmann@libra.de (Halfmann Klaus),
	roman@augan.com (Roman Zippel),
	linuxppc-dev@lists.linuxppc.org, linux-fsdevel@vger.kernel.org
Subject: Re: Btree directories (Re: Status of HFS+ support)
Date: Wed, 30 Aug 2000 22:49:20 -0700 (PDT)	[thread overview]
Message-ID: <200008310549.WAA03911@marcus.allandria.com> (raw)
In-Reply-To: <Pine.LNX.3.96.1000830133511.10035C-100000@deepspace.mklinux.org> from "David A. Gatwood" at Aug 30, 2000 01:49:49 PM

David A. Gatwood wrote:
> On Wed, 30 Aug 2000 flar@allandria.com wrote:
> > Yes, that's the main problem with HFS/HFS+. All catalog entries are keyed
> > on a combination of parent directory ID and filename, because this is the
> > data MacOS asks for when a file is opened. These entries have to be sorted
> > inside the tree, and the sort rules are a little messy, since the string
> > compares are supposed to be case-insensitive. If you are opening a file,
> > or doing a lookup() it's very convenient. If you are doing readdir() it's
> > a massive headache. Basically you have to find the first entry with a
> > parent ID that matches the directory being read, and then read every
> > entry in order until you find one with a different parent. Because of
> > the sorting issue, it's not just a problem with shrinking the directory,
> > because growing the directory could cause the same problem. In fact,
> > growing or shrinking ANY directory on the filesystem could theoretically
> > change the entire tree by forcing the tree to add a new layer of index
> > nodes and rebalancing the tree. It gets very messy. There's a reason the
> > HFS filesystem code in the kernel has always had a habit of destroying
> > disks any time it has to do anything too complex.
>
> In my mind, this really screams the need for a pure vnode API in addition
> to keeping the current API for more traditional filesystems.  You could
> then keep track of these complex keys with either a table or hashing
> method into slots/buckets with 32 bit identifiers, then provide these to
> the vnode layer.  Sounds much easier than trying to make this look like a
> more traditional fs.

Actually, one of the main points of HFS+ is that it's more like a traditional
UNIX filesystem than HFS was.

> Maybe someone could create a filesystem of type vnodefs that simply
> provides, through the current VFS, a pure vnode API that could be used by
> sub-filesystems (which would then absolutely be required to definitively
> probe for fs type automatically...).  Thoughts?

Doesn't sound any cleaner, and it certainly sounds less efficient.

> Also, let me see if I understand this... HFS+ basically needs a
> hierarchial lock scheme with three states (shared, exclusive, unlocked) at
> each level, right?  And I'm assuming that the tree balancing is for speed
> reasons, and that an unbalanced tree doesn't rise to the level of
> "corruption", right?

Well, I'll try to explain. The tree doesn't have to be perfectly balanced,
but there are some rules to follow. The first rule is that all keys have
to be sorted at all times, although there can be empty space in the catalog
between two keys for adding in new ones. The other thing is that all leaf
nodes in the tree have to have the same height. This causes some interesting
problems when you fill up the currently allocated space, since you have to
restructure the tree pretty considerably in order to create more possible
entries.

Deleting in many cases is cleaner and simpler than adding. In most
cases, when you delete an entry, you only have to shuffle around a
few entries in order to make the tree consistent. Basically, the disk
space for the tree is divided into nodes, and each node can have as
many entries as can fit, but it can only have a single contiguous block
of free space. When the free space is all allocated in a node, you can
rebalance the tree using existing nodes if the tree has become heavily
unbalanced. However, if there is very little free space in the tree,
you have to basically rebuild the thing from scratch with an extra
layer of indexing which makes the tree deeper and thus slower to
search. When a new filesystem is created, it only has the root of
the tree, which is also the only leaf node. When that fills up, you
create a new root which is an index node and split the single leaf
nodes into multiple leaf nodes, with each having a key in the index
node. This continues as long as you need more room for entries in
the tree. It's hard to summarize, but the documentation from Apple
has some great diagrams.

http://developer.apple.com/technotes/tn/tn1150.html

Most things in HFS+ can be mapped easily into VFS, and readdir() is
the only thing that is giving me any real problems. From what I could
tell, readdir() is somewhat messy for every filesystem anyway. I've
been doing my coding on 2.2.x mostly, but I've looked at the 2.4.x
interfaces. I personally think it looks a lot cleaner than the 2.2.x
for many things, particularly for read/write of files. I will in the
near future try to find all the 2.4 documentation that wasn't finished
the last time I looked and see what I can figure out of the stuff that
wasn't obvious from the code.

I'm not really sure I answered your questions, but I hope this explains
some of the strange situations with HFS+ and HFS (which mostly fits the
above description as well)

	Brad Boyer
	flar@pants.nu

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/

next prev parent reply	other threads:[~2000-08-31  5:49 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2000-08-30 12:06 Btree directories (Re: Status of HFS+ support) Steve Lord
2000-08-30 15:07 ` Andi Kleen
2000-08-30 16:07   ` flar
2000-08-30 20:49     ` David A. Gatwood
2000-08-30 20:45       ` Steve Lord
2000-08-30 21:05       ` Alexander Viro
2000-08-30 21:51         ` David A. Gatwood
2000-08-30 21:49           ` Alexander Viro
2000-08-30 22:13             ` David A. Gatwood
2000-08-30 22:17               ` Alexander Viro
2000-08-30 23:11                 ` David A. Gatwood
2000-08-31  0:10                   ` Alexander Viro
2000-08-31  5:49       ` flar [this message]
2000-08-31 10:33         ` Alexander Viro
2000-08-31 17:48           ` flar
2000-08-31 19:54             ` Alexander Viro
2000-08-31 22:09               ` flar
2000-09-01  2:40                 ` Alexander Viro
2000-09-01  3:52                   ` flar
2000-09-02  4:04                     ` Tony Mantler
  -- strict thread matches above, loose matches on Subject: below --
2000-09-01 17:05 Halfmann, Klaus
2000-09-01 17:40 ` flar
2000-08-29 16:40 Status of HFS+ support (was hfs support for blocksize != 512) Halfmann, Klaus
2000-08-29 17:18 ` Btree directories (Re: Status of HFS+ support) Matthew Wilcox
2000-08-29 18:45   ` Steve Lord
2000-08-29 21:25     ` Matthew Wilcox
2000-08-30  2:22   ` flar
2000-08-30  8:35     ` Andi Kleen
2000-08-30 14:25   ` Chris Mason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200008310549.WAA03911@marcus.allandria.com \
    --to=flar@allandria.com \
    --cc=ak@suse.de \
    --cc=dgatwood@deepspace.mklinux.org \
    --cc=khalfmann@libra.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linuxppc-dev@lists.linuxppc.org \
    --cc=lord@sgi.com \
    --cc=matthew@wil.cx \
    --cc=roman@augan.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).