* problems with reiserfs + nfs using 2.4.2-pre4 @ 2001-02-19 0:18 dek_ml 2001-02-19 0:37 ` Neil Brown 2001-02-19 1:55 ` Alan Cox 0 siblings, 2 replies; 22+ messages in thread From: dek_ml @ 2001-02-19 0:18 UTC (permalink / raw) To: linux-kernel Hi, I migrated some exported disks over to reiserfs and had no luck when I mounted the disk via NFS on another machine. I've noticed many messages about reiser and NFS in the archives, but my understanding was that it had been cleared up. In particular, the Configure.help in 2.4.2-pre4 says "reiserfs can be used for anything that ext2 can be used for". Here's my setup: server box running RH71beta, kernel 2.4.2-pre4 compiled with gcc-2.95.2. client box running RH71beta, kernel 2.4.2-pre4 compiled with gcc-2.95.2. On the server, I built a standard kernel with NFS server, LVM, and reiser. On the client, I built a standard kernel with NFS client. On the server, the filesystem was created over an LVM logical volume. When I mounted the server-exported filesystem on the client node, there are no visible files or directories. Reverting to ext2 fixed the problem. I'd like to hear from people who have tried recent 2.4 kernels with RH7.1, NFS, and reiser, to see if anybody has had luck. Dave ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: problems with reiserfs + nfs using 2.4.2-pre4 2001-02-19 0:18 problems with reiserfs + nfs using 2.4.2-pre4 dek_ml @ 2001-02-19 0:37 ` Neil Brown 2001-02-19 2:56 ` dek_ml 2001-02-19 1:55 ` Alan Cox 1 sibling, 1 reply; 22+ messages in thread From: Neil Brown @ 2001-02-19 0:37 UTC (permalink / raw) To: dek_ml; +Cc: linux-kernel On Sunday February 18, dek_ml@konerding.com wrote: > > Hi, > > I migrated some exported disks over to reiserfs and had no luck when I > mounted the disk via NFS on another machine. I've noticed many messages > about reiser and NFS in the archives, but my understanding was that > it had been cleared up. In particular, the Configure.help in 2.4.2-pre4 > says "reiserfs can be used for anything that ext2 can be used for". pure marketing! Last I checked, it didn't have quotas either (well, it is available as an extra patch, but they might make incompatable changes later I gather). If you go to http://www.cse.unsw.edu.au/~neilb/patches/linux/2.4.2-pre3/ and pick up patch-B-nfsdops and patch-C-reisernfs you should get reasonable nfs service for reiserfs. Note that this is not final code though. The format of the filehandle will probably change shortly as it doesn't currently contain a generation number. A similar patch is available somewhere under www.namesys.com I believe. NeilBrown ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: problems with reiserfs + nfs using 2.4.2-pre4 2001-02-19 0:37 ` Neil Brown @ 2001-02-19 2:56 ` dek_ml 2001-02-19 3:40 ` Neil Brown 0 siblings, 1 reply; 22+ messages in thread From: dek_ml @ 2001-02-19 2:56 UTC (permalink / raw) To: Neil Brown; +Cc: linux-kernel Neil Brown writes: >On Sunday February 18, dek_ml@konerding.com wrote: >> >> Hi, >> >> I migrated some exported disks over to reiserfs and had no luck when I >> mounted the disk via NFS on another machine. I've noticed many messages >> about reiser and NFS in the archives, but my understanding was that >> it had been cleared up. In particular, the Configure.help in 2.4.2-pre4 >> says "reiserfs can be used for anything that ext2 can be used for". > > >If you go to > > http://www.cse.unsw.edu.au/~neilb/patches/linux/2.4.2-pre3/ > >and pick up patch-B-nfsdops and patch-C-reisernfs >you should get reasonable nfs service for reiserfs. >Note that this is not final code though. The format of the filehandle >will probably change shortly as it doesn't currently contain a >generation number. >A similar patch is available somewhere under www.namesys.com I >believe. OK, I grabbed these patches and applied them against 2.4.2-pre4 and recompiled, rebooted. I am now able to use reiserfs with NFS, basic operations appear to work as expected but I haven't done large amounts of file IO or lots of concurrent requests. What is the plan with regards to these patches, or ones like it, making it into the distribution? I noticedAlan Cox just posted the following: >The configure.help is wrong on that and one other thing. NFS doesnt work >without extra patches and big endian boxes dont work with reiserfs currently >Chris - would it be worth sending me a patch that notes the NFS thing in >Configure.help and includes the patch url ? which suggests that the Configure.help mis-statement is going to be corrected but doesn't imply the fix is going in any time soon. I think NFS over Reiser is going to be interesting to people who are running big fileservers, so if this patch really does fix it (without breaking anything else) then itwould make sense for it to go in. Dave ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: problems with reiserfs + nfs using 2.4.2-pre4 2001-02-19 2:56 ` dek_ml @ 2001-02-19 3:40 ` Neil Brown 2001-02-19 11:23 ` Alan Cox 0 siblings, 1 reply; 22+ messages in thread From: Neil Brown @ 2001-02-19 3:40 UTC (permalink / raw) To: dek_ml; +Cc: Neil Brown, linux-kernel, nfs On Sunday February 18, dek_ml@konerding.com wrote: > > OK, I grabbed these patches and applied them against 2.4.2-pre4 and > recompiled, rebooted. I am now able to use reiserfs with NFS, > basic operations appear to work as expected but I haven't done large amounts > of file IO or lots of concurrent requests. > > What is the plan with regards to these patches, or ones like it, making it into > the distribution? The changes to knfsd are fairly big and mean that every filesystem type that is to be exported needs to explicitly provide support for knfsd. I have almost completed doing this for ext2fs reiserfs isofs ufs efs which were the only ones that were mentioned when I asked "what filesystem types do you want to export" on the nfs list. The isofs stuff needs more work to be *right*, but it should be at least as good as the current code provides. (i.e. it works as long as things don't get flushed out of the dentry cache). The reiserfs bit needs work to get generation number checking done. This is being worked on by Danilov Nikita. I hope to put out a patch set for testing in a day or so and possibly suggest it to Alan for his -ac series. I don't see it going into 2.4.2, but 2.4.3 might be possible if Linus agrees. NeilBrown ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: problems with reiserfs + nfs using 2.4.2-pre4 2001-02-19 3:40 ` Neil Brown @ 2001-02-19 11:23 ` Alan Cox 2001-02-20 0:40 ` Neil Brown 0 siblings, 1 reply; 22+ messages in thread From: Alan Cox @ 2001-02-19 11:23 UTC (permalink / raw) To: Neil Brown; +Cc: dek_ml, Neil Brown, linux-kernel, nfs > I hope to put out a patch set for testing in a day or so and possibly > suggest it to Alan for his -ac series. I don't see it going into > 2.4.2, but 2.4.3 might be possible if Linus agrees. Im not interested in a patch that requires NFS is hacked for each file system that tells me the implementation is wrong. The previous setup worked perfectly for everything but reiserfs, so why isnt the newer setup one allowing each fs to override a generic behaviour which is the current working behaviour, thereby meaning all the other fs's work. FS authors shouldnt have to do extra work to support knfsd unless they are doing interesting and unusual things ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: problems with reiserfs + nfs using 2.4.2-pre4 2001-02-19 11:23 ` Alan Cox @ 2001-02-20 0:40 ` Neil Brown 2001-02-20 1:15 ` Chris Mason ` (2 more replies) 0 siblings, 3 replies; 22+ messages in thread From: Neil Brown @ 2001-02-20 0:40 UTC (permalink / raw) To: Alan Cox; +Cc: dek_ml, linux-kernel, nfs, mason On Monday February 19, alan@lxorguk.ukuu.org.uk wrote: > > I hope to put out a patch set for testing in a day or so and possibly > > suggest it to Alan for his -ac series. I don't see it going into > > 2.4.2, but 2.4.3 might be possible if Linus agrees. > > Im not interested in a patch that requires NFS is hacked for each file system > that tells me the implementation is wrong. The previous setup worked perfectly > for everything but reiserfs, so why isnt the newer setup one allowing each fs > to override a generic behaviour which is the current working behaviour, thereby > meaning all the other fs's work. > > FS authors shouldnt have to do extra work to support knfsd unless they are doing > interesting and unusual things A fair question (it was a question, wasn't it :-) and I will see if I can provide a convincing answer. My answer has three parts: 1/ iget abuse 2/ lookup("..") 3/ requirements to support knfsd. Part 1 also contains some comments about reiserfs for Chris Mason. --------------------------------------------------------------- 1/ The linux-2.4 Todo list has, in section "10. To Do But Non Showstopper" - iget abuse in knfsd Part of the need for change to knfsd and filesystems is to fix this abuse. But what is this abuse? Well.. iget is a (n often misunderstood I think) helper function for filesystems. It helps maintain an inode cache. If a filesystem chooses, it may provide a read_inode super_operation and then call iget whenever it wants to find a particular inode. If iget finds the inode in the cache, it will return it. If it doesn't it will create a new inode, set the i_ino field (and a few others) and pass this to read_inode for the filesystem to finish off. iget handles any locking needed to make sure no inode gets created or read_inode'd twice. It is worth noting that read_inode does *not* have to actually read in the inode, though this is the typical (i.e. ext2fs) thing to do. nfs_read_inode, for example, just initialises a few extra fields in the inode. After the call to iget completes, other parts of the nfs code fills in the rest of the inode from data received from the NFS server. It is also worth noting that not all filesystems use iget/read_inode. msdos/fat maintains it's own inode cache and so doesn't use the one provided by iget. A final point to note - and this is actually quite important - is that the value passed to iget (an inode number) needs to contain enough information to be able to answer "is this the right inode" (recognition) but does not need to be able to answer "where can I find this inode" (recall), though these two go together for ext2 and similar file systems. The case of nfs, already mentioned, shows this. A filehandle is needed to find an inode, but a filehandle is not passed to iget. nfs did need an extension to iget though, and hence created iget4. The assumption that 4bytes is enough to recognise an inode is a bit dated. NFSv3 needs at least 8bytes (because the fileid is 64bit) and Trond decided (quite reasonably) to use 16bytes (including the fsid). Because these cannot be squeezed into a 32bit ino_t, he create iget4 which also had an opaque datum (arbitrary size) and an "actor" which would compare that datum against a given inode to see if there was a match. This works fine, but is a bit awkward as the whole opaque datum is not available for hash calculations. Maybe this can be tided up in 2.5. Note that this extension did not change the fact the iget has enough information to recognise, but not necessarily to recall an inode. When reiserfs came along, it abused this, and re-interpreted the opaque datum to contain information for recalling (locating) an inode - if read_inode2 was defined. I think this is wrong. What reiserfs *should* do (I think) is: - Just pass the 32bit inode number to iget - this seems to be enough to recognise an inode in reiserfs. - have read_inode mark the inode as "incomplete" in some way. - In reiserfs_iget, after calling "iget", it should + lock the inode, + check if it is incomplete or not + if it is incomplete, read in the inode + unlock the inode. This removes the need for read_inode2 and maintains the same concurrency controls. Whether the locking can be done with I_LOCK/i_wait or not I am not sure. Anyway, back to nfsd: nfsd currently gets an inode number out of the file handle and passes it to iget. The above discussion of iget should show that this is the wrong thing to do. In particular, knfsd needs to be able to recall (locate) an inode, and iget only guarantees being able to recognise one. It happens to work for ext2fs, but it is an abuse of the interface and must change. If knfsd is not to use iget, it needs some other interface to use. My patch provides such an interface by defining an nfsd_operations structure which contains encode_fh and decode_fh so that a filesystem can determine exactly how to treat a filehandle for that filesystem. It also provides helper functions so that the filesystem needs to do very little work itself. But it does need to do some as the next section discusses. ------------------------------------------------------ 2/ lookup(".."). The VFS layer requires that every directory in the dcache be properly connected to the root of the filesystem. A particular reason for this is so that it can provide protection against two concurrent diretory renames creating an unreachable loop in the filesystem. See s_vfs_rename_sem and is_subdir. When knfsd maps a filehandle to a directory, the ancestors of this directory may not be in the dcache, and so nfsd has to put them there before it does anything with the directory dentry. To be able to do this it must be able to find the parent of a given directory. However, this is something that no other part of the filesystem code needs to do. You might think that chdir("..") needs to find the parent of a directory, but the filesystem never gets a chance to see the "..". This lookup is performed entirely in the dcache by the VFS layer. See follow_dotdot in fs/namei.c. Most filesystems (i.e. ext2fs) do support lookup("..") and will find the parent inode. isofs is actually an interesting case. It tries to even though there is no need, and it fails :-( So currently knfsd does a lookup of ".." to find the parent directory, and this works fairly well. However the locking issues are a bit complex. Doing the 'lookup("..")' through normal (i.e. VFS, channels) causes a dentry for ".." to be created, which is quite un-natural for the VFS. It breaks rules such as "one dentry per directory". I think that the code in the current kernel that does this gets all the locking right, but I'm not 100% confident, particularly because it is difficult to reason about. It is hard to reason about behaviour when you are breaking rules. In the most recent rewrite of the knfsd code where I started getting the filesystem involved in resolving file handles, I found that it was much easier to be confident of correctness if I got the filesystem to do the lookup("..") underneath the VFS layer. It could be argued that lookup_parent should be an inode_operation instead of an nfsd_operation, but either way, it needs to be exported for knfsd to be able to do it's job properly. ------------------------------------------------------------- 3/ requirements to support knfsd. It is already the case that a filesystem needs to do some special things to support knfsd: - it should maintain a generation number in i_generation so that kfnsd can tell if an inode number has been reused. - it must provide a read_inode routine that completely fills in the inode - the read_inode routine should cope if asked to read an inode that doesn't exist, but returning a bad_inode. - it should support lookup(".."). Before 2.2.14, no filesystem lived up to all of these. Since then, ext2 does, but no other. What is needed for knfsd support with my patches? 1/ A filesystem needs to assert that it is exportable. Doing a lookup by filehandle is quite different from doing a lookup by name, and knfsd needs to know if it is reasonable with a given filesystem. knfsd currently guess exportable filesystems be rejecting: those that have NODEV those that have no read_inode This is fairly adhoc. The NODEV requirement is still there, so knfsd can identify the filesystem, but instead of the read_inode test, a filesystem simply presents a non-NULL nfsd_operations structure. It can leave all fields NULL, but this cannot give full support: REQUIREMENT: set sb->s_nfsd_op to an nfsd_operation structure. The filesystem can define encode_fh and decode_fh itself to do whatever it likes. It can also define get_name and get_dentry to direct the filehandle lookup at a finer grained level. However if it want to use the simple, least "interesting and unusual" approach, it must meet the following requirements: 2/ As discussed above, knfsd needs to lookup(".."), and the vfs interface does not provide a clean way do do that, so: REQUIREMENT: define get_parent in the nfsd_operation structure to be a function which find the parent of a directory. This is typically about the same size as the inode_operation->lookup function. 3/ Any time that an inode number gets reused for a different file, i_generation must be set to a new value (with high degree of probability at least). As I said above, this is already a requirement, and only ext2 meets it. REQUIREMENT: maintain i_generation 4/ As discussed above: REQUIREMENT: read_inode, must completely fill in the inode, and must reject requests for non-existing inodes by returning a bad_inode 5/ If a normal VFS operation builds a dcache path down to a directory at the same time that knfsd is trying to build a dcache path up from that directory, there is a real possibility of clashing and making duplicate dentries for some directories. To guard against this, lookup and get_parent must co-operate with locking to make sure that only one of them attaches a dentry to a given inode. This is easily done by calling a support routine in dcache.c REQUIREMENT: lookup must call d_splice_alias when it finds an inode, and get_parent must call d_make_alias. This may seem like a lot, but several of these are already requirements which most filesystems don't meet, and other are there to tidy-up interfaces and make locking more straight forward. If there anything here has seems unreasonable? NeilBrown ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: problems with reiserfs + nfs using 2.4.2-pre4 2001-02-20 0:40 ` Neil Brown @ 2001-02-20 1:15 ` Chris Mason 2001-02-20 1:34 ` Neil Brown 2001-02-20 1:24 ` Alan Cox 2001-02-20 2:38 ` Roman Zippel 2 siblings, 1 reply; 22+ messages in thread From: Chris Mason @ 2001-02-20 1:15 UTC (permalink / raw) To: Neil Brown, Alan Cox; +Cc: dek_ml, linux-kernel, nfs On Tuesday, February 20, 2001 11:40:24 AM +1100 Neil Brown <neilb@cse.unsw.edu.au> wrote: > > When reiserfs came along, it abused this, and re-interpreted the > opaque datum to contain information for recalling (locating) an > inode - if read_inode2 was defined. I think this is wrong. > I suggested to Al Viro a while ago to break iget up into two calls, and then got sucked into something else and didn't follow up. Independently, he came up with the below message during a thread on fs-devel about read_inode. I think it is very similar to what you've described, and it should work. I'm willing to help integrate/code once things settle down for 2.4. His last paragraph is also important, I'd rather see this as a new call...BTW, I believe XFS and GFS actually have 64 bit inode numbers, while reiserfs has a unique 32 bit inode number (objectid) and a unique 32 bit packing locality that are both required to find the object. Cut n' paste from Viro's mail, beware of newline munging: > From: Alexander Viro <viro@math.psu.edu> > To: Steve Whitehouse <Steve@ChyGwyn.com> > cc: linux-fsdevel@vger.kernel.org > Subject: Re: read_inode() extra argument ? > Date-Sent: Tuesday, December 19, 2000 06:41:42 AM -0500 > [snip] > Basically, read_inode() (and iget()) is _not_ a universal API. It's > a conveniency thing for filesystems that are comfortable with it. Notice > that quite a few filesystems do not have ->read_inode() at all. > > If you need more than just an inumber to fill the inode - don't use > iget() at all. It's just a wrong API for that kind of situations. > > In all fairness, ->read_inode() should not be a method. Correct way to > deal with that would be along the lines > > --- fs/inode.c Tue Dec 19 09:42:41 2000 > +++ fs/inode.c.new Tue Dec 19 09:59:37 2000 > @@ -661,22 +661,10 @@ > inode->i_ino = ino; > inode->i_flags = 0; > atomic_set(&inode->i_count, 1); > - inode->i_state = I_LOCK; > + inode->i_state = I_LOCK | I_NEW; > spin_unlock(&inode_lock); > > clean_inode(inode); > - sb->s_op->read_inode(inode); > - > - /* > - * This is special! We do not need the spinlock > - * when clearing I_LOCK, because we're guaranteed > - * that nobody else tries to do anything about the > - * state of the inode when it is locked, as we > - * just created it (so there can be no old holders > - * that haven't tested I_LOCK). > - */ > - inode->i_state &= ~I_LOCK; > - wake_up(&inode->i_wait); > > return inode; > } > @@ -693,6 +681,20 @@ > wait_on_inode(inode); > } > return inode; > +} > + > +void unlock_new_inode(struct inode *inode) > +{ > + /* > + * This is special! We do not need the spinlock > + * when clearing I_LOCK, because we're guaranteed > + * that nobody else tries to do anything about the > + * state of the inode when it is locked, as we > + * just created it (so there can be no old holders > + * that haven't tested I_LOCK). > + */ > + inode->i_state &= ~(I_LOCK|I_NEW); > + wake_up(&inode->i_wait); > } > > static inline unsigned long hash(struct super_block *sb, unsigned long > i_ino) --- fs/ext2/namei.c Tue Dec 19 09:59:54 2000 > +++ fs/ext2/namei.c.new Tue Dec 19 10:01:22 2000 > @@ -175,9 +175,12 @@ > unsigned long ino = le32_to_cpu(de->inode); > brelse (bh); > inode = iget(dir->i_sb, ino); > - > if (!inode) > return ERR_PTR(-EACCES); > + if (inode->i_state & I_NEW) { > + ext2_read_inode(inode); > + unlock_new_inode(inode); > + } > } > d_add(dentry, inode); > return NULL; > > (modulo obvious changes to fs.h, exporting (or inlining) > unlock_new_inode(), etc.) > > The point being: allow filesystems to use whatever means they find > convenient for filling the inode. All we really need from icache is an > analog of iget() that would leave new struct inode locked, recognizable > as new and would _not_ do anything fs-specific. Quite possibly we want a > new function for that leaving iget() as-is so that existing callers would > stay intact - patch above just describes the general idea. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: problems with reiserfs + nfs using 2.4.2-pre4 2001-02-20 1:15 ` Chris Mason @ 2001-02-20 1:34 ` Neil Brown 0 siblings, 0 replies; 22+ messages in thread From: Neil Brown @ 2001-02-20 1:34 UTC (permalink / raw) To: Chris Mason; +Cc: linux-kernel, nfs On Monday February 19, mason@suse.com wrote: > > > On Tuesday, February 20, 2001 11:40:24 AM +1100 Neil Brown > <neilb@cse.unsw.edu.au> wrote: > > > > When reiserfs came along, it abused this, and re-interpreted the > > opaque datum to contain information for recalling (locating) an > > inode - if read_inode2 was defined. I think this is wrong. > > > > I suggested to Al Viro a while ago to break iget up into two calls, and > then got sucked into something else and didn't follow up. Independently, > he came up with the below message during a thread on fs-devel about > read_inode. I think it is very similar to what you've described, and it > should work. I'm willing to help integrate/code once things settle down > for 2.4. I must have missed that thread... Yes, what Al Viro suggests is exactly the same idea as mine, except that I suggest leaving iget as it is and getting the filesystem to do a bit of locking. > > His last paragraph is also important, I'd rather see this as a new > call...BTW, I believe XFS and GFS actually have 64 bit inode numbers, while > reiserfs has a unique 32 bit inode number (objectid) and a unique 32 bit > packing locality that are both required to find the object. I think the particular need for a new call is to handle long inode identifiers better. The current iget4 is a bit ugly. If/when a new iget is done to handle long identifiers elegantly, it would probably be good to include the I_NEW stuff as well, but for now (2.4) both long identifiers and delayed fill-in can be done without changing iget. NeilBrown (what's happened to Al Viro anyway, he has been awfully quite lately?) ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: problems with reiserfs + nfs using 2.4.2-pre4 2001-02-20 0:40 ` Neil Brown 2001-02-20 1:15 ` Chris Mason @ 2001-02-20 1:24 ` Alan Cox 2001-02-20 1:37 ` Neil Brown 2001-02-20 2:11 ` dek_ml 2001-02-20 2:38 ` Roman Zippel 2 siblings, 2 replies; 22+ messages in thread From: Alan Cox @ 2001-02-20 1:24 UTC (permalink / raw) To: Neil Brown; +Cc: Alan Cox, dek_ml, linux-kernel, nfs, mason > This may seem like a lot, but several of these are already > requirements which most filesystems don't meet, and other are there > to tidy-up interfaces and make locking more straight forward. As a 2.5 thing it sounds like a very sensible path. It will also provide some of the operations groundwork needed for file systems that can only use NFS4 temporary handles ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: problems with reiserfs + nfs using 2.4.2-pre4 2001-02-20 1:24 ` Alan Cox @ 2001-02-20 1:37 ` Neil Brown 2001-02-20 2:11 ` dek_ml 1 sibling, 0 replies; 22+ messages in thread From: Neil Brown @ 2001-02-20 1:37 UTC (permalink / raw) To: Alan Cox; +Cc: linux-kernel, nfs On Tuesday February 20, alan@lxorguk.ukuu.org.uk wrote: > > This may seem like a lot, but several of these are already > > requirements which most filesystems don't meet, and other are there > > to tidy-up interfaces and make locking more straight forward. > > As a 2.5 thing it sounds like a very sensible path. It will also provide > some of the operations groundwork needed for file systems that can only use > NFS4 temporary handles I had wanted to get it in before 2.4, but didn't quite make it. I guess that means that, from your perspective, nfs exporting of reiserfs remains an extra-kernel patch at least until 2.5 starts up and the code has been tested extensively and it gets backported to 2.4, or maybe even until 2.6/3.0. Such is life. NeilBrown ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: problems with reiserfs + nfs using 2.4.2-pre4 2001-02-20 1:24 ` Alan Cox 2001-02-20 1:37 ` Neil Brown @ 2001-02-20 2:11 ` dek_ml 2001-02-20 22:54 ` Brian May 1 sibling, 1 reply; 22+ messages in thread From: dek_ml @ 2001-02-20 2:11 UTC (permalink / raw) To: Alan Cox; +Cc: Neil Brown, linux-kernel, nfs, mason, dek_ml Alan Cox writes: >> This may seem like a lot, but several of these are already >> requirements which most filesystems don't meet, and other are there >> to tidy-up interfaces and make locking more straight forward. > >As a 2.5 thing it sounds like a very sensible path. It will also provide >some of the operations groundwork needed for file systems that can only use >NFS4 temporary handles > OK so I think what I can take from this is: for kernel 2.4 in the foreseeable future, reiserfs over NFS won't work without a special patch. And, filesystems other than ext2 in general might not support NFS. This all has to do with internal design decisions, results of people coding against those decisions, possibly abusing the decisions to acheive their own goals, etc, etc, all of which lead to problems when the underlying design decisions changed and broke code which depended on the old decisions. For the foreseeable future I am going to stick with ext2 for my NFS-exported directories. I think these problems need to be carefully stated in the Configure.help for each filesystem which does not support NFS ... Configure.help is often the most authoritative and up-to-date technical description of a particular kernel's capabilities. Dave ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: problems with reiserfs + nfs using 2.4.2-pre4 2001-02-20 2:11 ` dek_ml @ 2001-02-20 22:54 ` Brian May 2001-02-20 23:30 ` Chris Mason 0 siblings, 1 reply; 22+ messages in thread From: Brian May @ 2001-02-20 22:54 UTC (permalink / raw) To: linux-kernel >>>>> "dek" == dek ml <dek_ml@konerding.com> writes: dek> OK so I think what I can take from this is: for kernel 2.4 in dek> the foreseeable future, reiserfs over NFS won't work without dek> a special patch. And, filesystems other than ext2 in general Does this apply to the user space NFS server? kernel space NFS server? Or both? -- Brian May <bam@snoopy.apana.org.au> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: problems with reiserfs + nfs using 2.4.2-pre4 2001-02-20 22:54 ` Brian May @ 2001-02-20 23:30 ` Chris Mason 0 siblings, 0 replies; 22+ messages in thread From: Chris Mason @ 2001-02-20 23:30 UTC (permalink / raw) To: Brian May, linux-kernel On Wednesday, February 21, 2001 09:54:19 AM +1100 Brian May <bam@snoopy.apana.org.au> wrote: >>>>>> "dek" == dek ml <dek_ml@konerding.com> writes: > > dek> OK so I think what I can take from this is: for kernel 2.4 in > dek> the foreseeable future, reiserfs over NFS won't work without > dek> a special patch. And, filesystems other than ext2 in general > > Does this apply to the user space NFS server? kernel space NFS server? > Or both? Both, but in different ways. Andi Kleen has a set patches for using the userspace NFS server with reiserfs: ftp.suse.com/pub/people/ak/nfs -chris ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: problems with reiserfs + nfs using 2.4.2-pre4 2001-02-20 0:40 ` Neil Brown 2001-02-20 1:15 ` Chris Mason 2001-02-20 1:24 ` Alan Cox @ 2001-02-20 2:38 ` Roman Zippel 2001-02-20 9:44 ` [NFS] " Trond Myklebust 2001-02-21 3:02 ` Neil Brown 2 siblings, 2 replies; 22+ messages in thread From: Roman Zippel @ 2001-02-20 2:38 UTC (permalink / raw) To: Neil Brown; +Cc: Alan Cox, dek_ml, linux-kernel, nfs, mason Hi, On Tue, 20 Feb 2001, Neil Brown wrote: > 2/ lookup(".."). A small question: Why exactly is this needed? bye, Roman ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [NFS] Re: problems with reiserfs + nfs using 2.4.2-pre4 2001-02-20 2:38 ` Roman Zippel @ 2001-02-20 9:44 ` Trond Myklebust 2001-02-20 14:02 ` Roman Zippel 2001-02-21 3:02 ` Neil Brown 1 sibling, 1 reply; 22+ messages in thread From: Trond Myklebust @ 2001-02-20 9:44 UTC (permalink / raw) To: Roman Zippel; +Cc: Neil Brown, Alan Cox, dek_ml, linux-kernel, nfs, mason >>>>> " " == Roman Zippel <zippel@fh-brandenburg.de> writes: > Hi, On Tue, 20 Feb 2001, Neil Brown wrote: >> 2/ lookup(".."). > A small question: Why exactly is this needed? Short answer: the existence of 'rename' makes it necessary, since it means that the directory path is volatile as far as the client is concerned. IIRC several NFS implementations (not Linux though) rely on being able to walk back up the directory tree in order to discover the path at any given moment. Under Linux, our reliance on dentries doesn't allow for directory paths to be volatile, so we don't support it. Instead we end up having to support aliased directories. Cheers, Trond ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [NFS] Re: problems with reiserfs + nfs using 2.4.2-pre4 2001-02-20 9:44 ` [NFS] " Trond Myklebust @ 2001-02-20 14:02 ` Roman Zippel 2001-02-20 15:16 ` Trond Myklebust 0 siblings, 1 reply; 22+ messages in thread From: Roman Zippel @ 2001-02-20 14:02 UTC (permalink / raw) To: Trond Myklebust; +Cc: Neil Brown, Alan Cox, dek_ml, linux-kernel, nfs, mason Hi, On 20 Feb 2001, Trond Myklebust wrote: > IIRC several NFS implementations (not Linux though) rely on being able > to walk back up the directory tree in order to discover the path at > any given moment. If I read the source correctly, namespace operation are done with dir file handle + file name. I'm playing with the idea if we could relax the rule, that all dentries must be connected to the root. Inode to dentry lookups are really evil, e.g. the current code ignores that there might be a fs that supports links to dirs (besides that vfs doesn't support that very well either). What IMO knfsd needs is only a file handle <-> inode operation and as long as the inode is not connected to a dcache entry (i_dentry is empty) it gets a dummy dentry, which is used for further lookups. As soon as a real dentry lookups that inode, we can flush the dummy dentry (small change to d_instantiate()). This would make it possible to support fs, that can't lookup ".." or it would avoid extra checks for fs, that don't have real ".." dir entries. All what a fs needs to do is to generate a 16(?) byte cookie, which can be used to find the inode back (with the default to i_ino + i_generation). This is nothing for 2.4, but IMO something that could be tried with 2.5. bye, Roman PS: /me is searching his fire proof underwear. :) ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [NFS] Re: problems with reiserfs + nfs using 2.4.2-pre4 2001-02-20 14:02 ` Roman Zippel @ 2001-02-20 15:16 ` Trond Myklebust 2001-02-20 16:26 ` Roman Zippel 0 siblings, 1 reply; 22+ messages in thread From: Trond Myklebust @ 2001-02-20 15:16 UTC (permalink / raw) To: Roman Zippel; +Cc: Neil Brown, Alan Cox, dek_ml, linux-kernel, nfs, mason >>>>> " " == Roman Zippel <zippel@fh-brandenburg.de> writes: > If I read the source correctly, namespace operation are done > with dir file handle + file name. I'm playing with the idea if > we could relax the rule, that all dentries must be connected to > the root. Inode to dentry lookups are really evil, e.g. the > current code ignores that there might be a fs that supports > links to dirs (besides that vfs doesn't support that very well > either). What IMO knfsd needs is only a file handle <-> inode > operation and as long as the inode is not connected to a dcache > entry (i_dentry is empty) it gets a dummy dentry, which is used > for further lookups. As soon as a real dentry lookups that > inode, we can flush the dummy dentry (small change to > d_instantiate()). This would make it possible to support fs, > that can't lookup ".." or it would avoid extra checks for fs, > that don't have real ".." dir entries. All what a fs needs to > do is to generate a 16(?) byte cookie, which can be used to > find the inode back (with the default to i_ino + i_generation). > This is nothing for 2.4, but IMO something that could be tried > with 2.5. Isn't this more or less the default already in 2.4? If I read the code correctly, we set the dentry d_flag DCACHE_NFSD_DISCONNECTED on such dummy dentries. We only force a lookup of the full path if the inode represents a directory or the NFSEXP_NOSUBTREECHECK export flag is not set. It doesn't seem like a major change to delay that full path lookup of the dentry until nfsd_lookup('..') is actually called (in the case where the 'subtree_check' flag isn't used). However, outright banning lookups of '..' by any one filesystem isn't an option: path lookups are used for a lot more than just `getcwd'. Imagine for instance trying to follow a relative soft link across such a filesystem. Cheers, Trond ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [NFS] Re: problems with reiserfs + nfs using 2.4.2-pre4 2001-02-20 15:16 ` Trond Myklebust @ 2001-02-20 16:26 ` Roman Zippel 0 siblings, 0 replies; 22+ messages in thread From: Roman Zippel @ 2001-02-20 16:26 UTC (permalink / raw) To: Trond Myklebust; +Cc: Neil Brown, Alan Cox, dek_ml, linux-kernel, nfs, mason Hi, On Tue, 20 Feb 2001, Trond Myklebust wrote: > If I read the code correctly, we set the dentry d_flag > DCACHE_NFSD_DISCONNECTED on such dummy dentries. We only force a > lookup of the full path if the inode represents a directory or the > NFSEXP_NOSUBTREECHECK export flag is not set. IMO you can't safely delay the release of the dummy entry without help of vfs. Are these dummy entries always properly released? It seems I forgot about the subtree check, so it seems a fs that can't provide a get_parent, can only be exported completely? > It doesn't seem like a major change to delay that full path lookup of > the dentry until nfsd_lookup('..') is actually called (in the case > where the 'subtree_check' flag isn't used). > However, outright banning lookups of '..' by any one filesystem isn't > an option: path lookups are used for a lot more than just > `getcwd'. Imagine for instance trying to follow a relative soft link > across such a filesystem. AFAIK this is already done in the generic code (in path_walk(), which is also called by vfs_follow_link()). bye, Roman ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: problems with reiserfs + nfs using 2.4.2-pre4 2001-02-20 2:38 ` Roman Zippel 2001-02-20 9:44 ` [NFS] " Trond Myklebust @ 2001-02-21 3:02 ` Neil Brown 2001-02-21 12:43 ` [NFS] " Trond Myklebust 1 sibling, 1 reply; 22+ messages in thread From: Neil Brown @ 2001-02-21 3:02 UTC (permalink / raw) To: Roman Zippel; +Cc: linux-kernel, nfs On Tuesday February 20, zippel@fh-brandenburg.de wrote: > Hi, > > On Tue, 20 Feb 2001, Neil Brown wrote: > > > 2/ lookup(".."). > > A small question: > Why exactly is this needed? > > bye, Roman Having read the subsequent posts, I now see what you are thinking and know how to answer this. The problem that I see is indeed related to rename, but not in the way that Trond mentions. (Tronds issue is a real, though very different, issue). Suppose I have a directory path /a/b/c/d/e/f/g Where filehandle X refers to /a/b and filehande Y refers to /a/b/c/d/e/f/g Then an NFS request comes in saying RENAME X/c to Y/h If this was performed, then you would end up with a directory path /a/b and a disconnected loop: d/e/f/g/h ^ v +-------+ which is not good. This possibility is protected against by the VFS layer. It serialises all directory renames using s_vfs_rename_sem and then checks that the source of the rename isn't an ancestor of the target. For this check to be reliable, each dentry for a directory must be properly connected to the root. Now if you have a filesystem that: - cannot do ".." lookups efficiently, or doesn't want to and - can protect against this sort of loop (and any other issues that the VFS usually protects against) itself then it can (with my patch) simply define decode_fh and encode_fh and do whatever it likes, such as create a dentry that isn't properly connected. get_parent is only called by nfsd_find_fh_dentry which is a helper routine which is normally called by the decode_fh function (and is called by the default decode_fh function). If a filesystem provides it's own decode_fh which doesn't call nfsd_find_fh_dentry, then get_parent won't be called and isn't needed. NeilBrown ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [NFS] Re: problems with reiserfs + nfs using 2.4.2-pre4 2001-02-21 3:02 ` Neil Brown @ 2001-02-21 12:43 ` Trond Myklebust 0 siblings, 0 replies; 22+ messages in thread From: Trond Myklebust @ 2001-02-21 12:43 UTC (permalink / raw) To: Neil Brown; +Cc: Roman Zippel, linux-kernel, nfs >>>>> " " == Neil Brown <neilb@cse.unsw.edu.au> writes: > - cannot do ".." lookups efficiently, or doesn't want to and > - can protect against this sort of loop (and any other issues > that > the VFS usually protects against) itself > then it can (with my patch) simply define decode_fh and > encode_fh and do whatever it likes, such as create a dentry > that isn't properly connected. Do we really want to design for this though? Being able to look up the parent directory is explicitly encoded in all revisions of the NFS protocol. In NFSv2/v3 the names '.' and '..' have a clearly defined meaning, whereas in NFSv4 you are required to support LOOKUPP. I can't speculate about the needs of other networked filesystems which may want to use this handle interface, but as far NFS is concerned we are not interested in considering such special cases. Cheers, Trond ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: problems with reiserfs + nfs using 2.4.2-pre4 2001-02-19 0:18 problems with reiserfs + nfs using 2.4.2-pre4 dek_ml 2001-02-19 0:37 ` Neil Brown @ 2001-02-19 1:55 ` Alan Cox 2001-02-19 15:37 ` Chris Mason 1 sibling, 1 reply; 22+ messages in thread From: Alan Cox @ 2001-02-19 1:55 UTC (permalink / raw) To: dek_ml; +Cc: linux-kernel > it had been cleared up. In particular, the Configure.help in 2.4.2-pre4 > says "reiserfs can be used for anything that ext2 can be used for". The configure.help is wrong on that and one other thing. NFS doesnt work without extra patches and big endian boxes dont work with reiserfs currently Chris - would it be worth sending me a patch that notes the NFS thing in Configure.help and includes the patch url ? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: problems with reiserfs + nfs using 2.4.2-pre4 2001-02-19 1:55 ` Alan Cox @ 2001-02-19 15:37 ` Chris Mason 0 siblings, 0 replies; 22+ messages in thread From: Chris Mason @ 2001-02-19 15:37 UTC (permalink / raw) To: Alan Cox, dek_ml; +Cc: linux-kernel On Monday, February 19, 2001 01:55:57 AM +0000 Alan Cox <alan@lxorguk.ukuu.org.uk> wrote: >> it had been cleared up. In particular, the Configure.help in 2.4.2-pre4 >> says "reiserfs can be used for anything that ext2 can be used for". > > The configure.help is wrong on that and one other thing. NFS doesnt work > without extra patches and big endian boxes dont work with reiserfs > currently > > Chris - would it be worth sending me a patch that notes the NFS thing in > Configure.help and includes the patch url ? Yes, it would. I'd rather put a link to www.reiserfs.org, where the quota/NFS/big endian patches will be linked to. Sound good? -chris ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2001-02-21 12:43 UTC | newest] Thread overview: 22+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2001-02-19 0:18 problems with reiserfs + nfs using 2.4.2-pre4 dek_ml 2001-02-19 0:37 ` Neil Brown 2001-02-19 2:56 ` dek_ml 2001-02-19 3:40 ` Neil Brown 2001-02-19 11:23 ` Alan Cox 2001-02-20 0:40 ` Neil Brown 2001-02-20 1:15 ` Chris Mason 2001-02-20 1:34 ` Neil Brown 2001-02-20 1:24 ` Alan Cox 2001-02-20 1:37 ` Neil Brown 2001-02-20 2:11 ` dek_ml 2001-02-20 22:54 ` Brian May 2001-02-20 23:30 ` Chris Mason 2001-02-20 2:38 ` Roman Zippel 2001-02-20 9:44 ` [NFS] " Trond Myklebust 2001-02-20 14:02 ` Roman Zippel 2001-02-20 15:16 ` Trond Myklebust 2001-02-20 16:26 ` Roman Zippel 2001-02-21 3:02 ` Neil Brown 2001-02-21 12:43 ` [NFS] " Trond Myklebust 2001-02-19 1:55 ` Alan Cox 2001-02-19 15:37 ` Chris Mason
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox