From: Mark Fasheh <mark.fasheh@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] OCFS2 features RFC
Date: Tue, 25 Apr 2006 15:24:33 -0700
Message-ID: <20060425222433.GC10524@ca-server1.us.oracle.com>
In-Reply-To: <20060425215548.GB16170@lst.de>
On Tue, Apr 25, 2006 at 11:55:48PM +0200, Christoph Hellwig wrote:
> On Tue, Apr 25, 2006 at 11:35:53AM -0700, Mark Fasheh wrote:
> > -Htree support
>
> Please not. htree is just the worst possible directory format around.
> Do some nice hashed or btree directories, but don't try this odd hack
> again. Especially as the only reason it was developed for in ext2/3
> doesn't work very well in a cluster filesystem anyway - to access the
> new htree all nodes would have to support the format anyway, so the
> whole easy up/downgrade thing doesn't matter at all.
Interesting. You make a good point about the up/downgrade code - we
certainly couldn't use that (at least not without jumping through some
hoops). I have to admit that I haven't looked very deeply into htree yet,
but if it's that bad and we won't be compatible in any case, it certainly
makes sense to try something new. Would you mind pointing out a few of the
htree issues that make it so poor?
>
> > -Extended attributes: This might be another area where we
> > steal^H^H^H^H^Hcopy some good code from Ext3 :) On top of this one can
> > trivially implement posix acls. We're not likely to support EA block
> > sharing though as it becomes difficult to manage across the cluster.
>
> again the ext3 implementation might not be the best. I'd say look at
> jfs or xfs (in the latter case of course with a less monsterous btree
> implementation)
I agree the XFS implementation seems a bit overboard... The problem I'm
having is that I can't seem to determine what size the average set of
extended attributes will be. Basically, as far as I can tell, ext3 will
allow about 1 block plus whatever will fit in the inode, minus overhead.
We'd like to have inline EAs, but want to be able to move them out to a
block if the number of extents we need grows to the end of the inode
block - this is to avoid having to create an allocation btree. So if we
take the one-block-attached-to-the-inode approach, we'd have a capacity a
little less than ext3's.
I've also noticed that, while the ext3 EA entries are stored in sorted
order, the search for them is linear. I wonder if that could be improved
upon (or if it even matters if you're just limited to one block).
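To make the sorted-but-linear point concrete, here is a rough sketch of what
a binary search over sorted entries might look like. It assumes a
hypothetical in-memory array of fixed-size entries (struct ea_entry and
ea_find are made up for illustration) sorted by plain strcmp() order; it is
not ext3's on-disk layout or sort order, and a packed variable-length format
would need some kind of index before it could be searched this way.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory EA entry; NOT ext3's on-disk format. */
struct ea_entry {
	const char *name;
	const char *value;
};

/*
 * Binary search over 'n' entries sorted by strcmp() order of 'name'.
 * Returns the matching entry, or NULL if the name is not present.
 */
const struct ea_entry *ea_find(const struct ea_entry *tab, size_t n,
			       const char *name)
{
	size_t lo = 0, hi = n;	/* search the half-open range [lo, hi) */

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int c = strcmp(name, tab[mid].name);

		if (c == 0)
			return &tab[mid];
		if (c < 0)
			hi = mid;	/* target sorts before tab[mid] */
		else
			lo = mid + 1;	/* target sorts after tab[mid] */
	}
	return NULL;
}
```

With only one block's worth of entries the difference from a linear scan is
probably small, which is the "if it even matters" part of the question.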
If one block is insufficient, then certainly we need to look at some other
format. My first inclination would be to have a single level tree with
pointers to leaf nodes stored in hashed order to speed up lookups.
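As a very rough sketch of that single-level idea (nothing here is a real
OCFS2 structure; struct ea_root_rec, ea_name_hash and ea_root_lookup are
hypothetical names, and FNV-1a is used only as a stand-in hash): the root
holds records sorted by the lowest name hash each leaf covers, and a lookup
hashes the name, binary-searches the root for the covering leaf, then
searches only that leaf.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical root record: lowest name hash covered by a leaf block. */
struct ea_root_rec {
	uint32_t first_hash;
	uint64_t leaf_blkno;
};

/* Stand-in name hash (32-bit FNV-1a), chosen only for illustration. */
uint32_t ea_name_hash(const char *name)
{
	uint32_t h = 2166136261u;

	while (*name) {
		h ^= (unsigned char)*name++;
		h *= 16777619u;
	}
	return h;
}

/*
 * 'recs' holds 'n' records sorted ascending by first_hash.
 * Returns the index of the last record with first_hash <= hash,
 * i.e. the leaf whose range covers 'hash', or -1 if there is none.
 */
long ea_root_lookup(const struct ea_root_rec *recs, size_t n, uint32_t hash)
{
	size_t lo = 0, hi = n;

	/* Find the first record whose first_hash is greater than 'hash'. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (recs[mid].first_hash <= hash)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (long)lo - 1;	/* the record just before it covers 'hash' */
}
```

A caller would compute ea_name_hash(name), pass it to ea_root_lookup(), and
read recs[i].leaf_blkno to get the one leaf block that needs searching.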
--Mark
--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh at oracle.com