From mboxrd@z Thu Jan 1 00:00:00 1970 From: Steven Whitehouse Date: Thu, 08 Feb 2007 09:42:38 +0000 Subject: [Cluster-devel] [GFS2 Patch] pass formal ino in do_filldir_main In-Reply-To: <1170887711.7725.14.camel@d-cthon07-w-63.Connectathon.ORG> References: <1170823892.3414.3.camel@d-cthon07-w-63.Connectathon.ORG> <1170845370.11001.417.camel@quoit.chygwyn.com> <1170887711.7725.14.camel@d-cthon07-w-63.Connectathon.ORG> Message-ID: <1170927758.11001.467.camel@quoit.chygwyn.com> List-Id: To: cluster-devel.redhat.com MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Hi, On Wed, 2007-02-07 at 17:35 -0500, Wendy Cheng wrote: > On Wed, 2007-02-07 at 10:49 +0000, Steven Whitehouse wrote: > > Hi, > > > > I'm not sure that this is the right fix. no_formal_ino is used only as a > > hash value in the lookup, the "real" lookup is always via no_addr. So we > > can alter that hash if required to just use no_addr, but I'd rather not > > have to return no_formal_ino here since we have no way to map that back > > to the disk location of the inode in general. In fact given a maximum > > sized GFS2 filesystem and enough nodes to fill it with inodes in a > > reasonable time (yes, I know thats a lot and a long time!), its not even > > certain to be unique, > > > > NFS won't run well without this patch - I originally included this into > an "if command == nfsd" clause. Later checking into Ken's description > (in cluster-list at redhat.com) about this issue, I think his intention > here was that any ino returned to user space should use no_formal_ino. > And this is exactly what the filldir should be doing (by returning > no_formal_ino back to caller). > I'd rather not use no_formal_ino at all if possible. It is not, in general, unique for any fs which has seen a lot of activity and there is also no way to look up an inode from its no_formal_ino alone without potentially searching the entire fs. To my mind an inode number should be guaranteed to be unique for the lifetime of the inode and also be useable as a reference by which the inode can be looked up (whether thats directly by making it identical to the disk block number or indirectly by using a lookup table, for example), no_formal_ino doesn't appear to satisfy either condition where as no_addr does. If there was an index to map no_formal_ino to a disk block, then I'd agree, but I'd much prefer that we use no_addr in the mean time. I suspect that all you need to do is to alter gfs2_ilookup() and gfs2_iget() to use no_addr rather than no_formal_ino as the hash value. Since that hash is only used internally, there should be no problem in altering it and I think that will result in the same thing. What I've not been able to work out is how your scheme can work in the case that the inode is not already in cache, since filldir doesn't then appear to get passed the no_addr at all. > Another option is temporarily disabling this feature and make this works > as GFS1 (with these two inode numbers identical) ? > > I like the current patch though. > > -- Wendy Well one other option is to use the inode version number that was added recently to each inode. The idea of no_formal_ino is that its not really an inode number at all in the traditional sense, but just a way to ensure that an inode isn't being accessed from a stale filehandle. I added the inode version number with the intention that we'd be able to use it instead of no_formal_ino, eventually, with NFS. One reason is that because the max sized filesystem is 2^64 blocks and the no_formal_ino is also 64 bits in size its quite likely to correlate with the size of the fs and thus generate identical numbers once its wrapped for similar disk offsets. Thats also true for any smaller fs which is a power of two in size I think, but with a large number of rewrites of the fs required in each case. With the version number scheme, the version numbers are unique to a particular resource group so we are getting 2^64/2^32 (for a max sized resource group) = 2^32 (average) reallocations of each block within the rgrp as an inode before the counter will wrap, Steve.