From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark Fasheh
Date: Tue, 25 Apr 2006 15:24:33 -0700
Subject: [Ocfs2-devel] OCFS2 features RFC
In-Reply-To: <20060425215548.GB16170@lst.de>
References: <20060425183553.GB10524@ca-server1.us.oracle.com> <20060425215548.GB16170@lst.de>
Message-ID: <20060425222433.GC10524@ca-server1.us.oracle.com>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: ocfs2-devel@oss.oracle.com

On Tue, Apr 25, 2006 at 11:55:48PM +0200, Christoph Hellwig wrote:
> On Tue, Apr 25, 2006 at 11:35:53AM -0700, Mark Fasheh wrote:
> > -Htree support
>
> Please not. htree is just the worst possible directory format around.
> Do some nice hashed or btree directories, but don't try this odd hack
> again. Especially as the only reason it was developed for in ext2/3
> doesn't work very well in a cluster filesystem anyway - to access the
> new htree all nodes would have to support the format anyway, so the
> whole easy up/downgrade thing doesn't matter at all.

Interesting. You make a good point about the up/downgrade code - we
certainly couldn't use that (at least not without jumping through some
hoops). I have to admit that I haven't looked very deeply into htree yet
but if it's that bad and we won't be compatible in any case it certainly
makes sense to try something new. Would you mind pointing out a few of
the htree issues that make it so poor?

>
> > -Extended attributes: This might be another area where we
> > steal^H^H^H^H^Hcopy some good code from Ext3 :) On top of this one can
> > trivially implement posix acls. We're not likely to support EA block
> > sharing though as it becomes difficult to manage across the cluster.
>
> again the ext3 implementation might not be the best. I'd say look at
> jfs or xfs (in the latter case of course with a less monsterous btree
> implementation)

I agree the XFS implementation seems a bit overboard...
The problem I'm having is that I can't seem to determine what size the
average set of extended attributes will be. Basically, as far as I can
tell, ext3 will allow about 1 block plus whatever will fit in the inode,
minus overhead.

We'd like to have inlined EA but want to be able to move them out to a
block in the case that the number of extents we need grows to the end of
the inode block - this is to avoid having to create an allocation btree.
So then if we take the one-block-attached-to-the-inode approach, we'd
have a capacity a little less than ext3.

I've also noticed that, while the ext3 EA entries are stored in sorted
order, the search for them is linear. I wonder if that could be improved
upon (or if it even matters if you're just limited to one block).

If one block is insufficient, then certainly we need to look at some
other format. My first inclination would be to have a single level tree
with pointers to leaf nodes stored in hashed order to speed up lookups.
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh at oracle.com
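
For illustration only, here is a rough sketch of the "single level tree
with pointers to leaf nodes stored in hashed order" idea from the last
paragraph, written in Python for brevity rather than kernel C. Everything
in it is an assumption: the names EAIndex, Leaf and name_hash, the use of
CRC32 as the name hash, the tiny leaf capacity and the split policy are
all hypothetical, and none of it reflects what OCFS2 or ext3 actually
does on disk.

```python
import bisect
import zlib
from dataclasses import dataclass, field


def name_hash(name: str) -> int:
    """Hypothetical attribute-name hash (CRC32 is only a stand-in)."""
    return zlib.crc32(name.encode("utf-8"))


@dataclass
class Leaf:
    """One leaf block: holds attributes whose hash is >= first_hash."""
    first_hash: int
    entries: dict = field(default_factory=dict)  # name -> value


class EAIndex:
    """Single-level index: leaf pointers kept sorted by starting hash."""

    def __init__(self, max_entries_per_leaf: int = 4):
        self.max_entries = max_entries_per_leaf
        self.leaves = [Leaf(first_hash=0)]

    def _find_leaf(self, h: int) -> int:
        # Binary search over the starting hashes instead of a linear scan.
        starts = [leaf.first_hash for leaf in self.leaves]
        return bisect.bisect_right(starts, h) - 1

    def set(self, name: str, value: str) -> None:
        i = self._find_leaf(name_hash(name))
        self.leaves[i].entries[name] = value
        if len(self.leaves[i].entries) > self.max_entries:
            self._split(i)

    def get(self, name: str):
        leaf = self.leaves[self._find_leaf(name_hash(name))]
        return leaf.entries.get(name)

    def _split(self, i: int) -> None:
        leaf = self.leaves[i]
        by_hash = sorted(leaf.entries.items(), key=lambda kv: name_hash(kv[0]))
        split_hash = name_hash(by_hash[len(by_hash) // 2][0])
        if split_hash == leaf.first_hash:
            # Every entry collides on one hash; a real format would need
            # some overflow scheme here. The sketch just lets the leaf grow.
            return
        upper = {n: v for n, v in by_hash if name_hash(n) >= split_hash}
        leaf.entries = {n: v for n, v in by_hash if name_hash(n) < split_hash}
        # The new leaf goes right after the old one, keeping hashed order.
        self.leaves.insert(i + 1, Leaf(first_hash=split_hash, entries=upper))
```

A lookup hashes the name, binary-searches the index for the one leaf that
can hold it, and only then examines that leaf, so the cost of finding the
right block no longer grows linearly with the number of attributes. How
entries are searched inside a leaf (linear or sorted) is left open here,
as it is above.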