* Urgent help needed on an NFS question, please help!!!
@ 2006-08-10 5:04 Xin Zhao
  2006-08-10 5:11 ` Neil Brown
  0 siblings, 1 reply; 24+ messages in thread
From: Xin Zhao @ 2006-08-10 5:04 UTC
To: linux-kernel; +Cc: linux-fsdevel

I just ran into a problem about NFS. It might be a fundamental problem
of my current work. So please help!

I am wondering how NFS guarantees a client didn't get wrong file
attributes. Consider the following scenario:

Suppose we have an NFS server S and two clients C1 and C2.

Now C1 needs to access the file attributes of file X, it first does
lookup() to get the file handle of file X.

After C1 gets X's file handle and before C1 issues the getattr()
request, C2 cuts in. Now C2 deletes file X and creates a new file X1,
which has different name but the same inode number and device ID as
the nonexistent file X.

When C1 issues getattr() with the old file handle, it may get file
attribute on wrong file X1. Is this true?

If not, how NFS avoid this problem? Please direct me to the code that
verifies this.

Many many thanks!
-x

* Re: Urgent help needed on an NFS question, please help!!!
From: Neil Brown @ 2006-08-10 5:11 UTC
To: Xin Zhao; +Cc: linux-kernel, linux-fsdevel

On Thursday August 10, uszhaoxin@gmail.com wrote:
> I just ran into a problem about NFS. It might be a fundamental problem
> of my current work. So please help!
>
> I am wondering how NFS guarantees a client didn't get wrong file
> attributes. Consider the following scenario:
>
> Suppose we have an NFS server S and two clients C1 and C2.
>
> Now C1 needs to access the file attributes of file X, it first does
> lookup() to get the file handle of file X.
>
> After C1 gets X's file handle and before C1 issues the getattr()
> request, C2 cuts in. Now C2 deletes file X and creates a new file X1,
> which has different name but the same inode number and device ID as
> the nonexistent file X.
>
> When C1 issues getattr() with the old file handle, it may get file
> attribute on wrong file X1. Is this true?
>
> If not, how NFS avoid this problem? Please direct me to the code that
> verifies this.

Generation numbers.

When the filesystem creates a new file it assigns a random number
as the 'generation' number and stores that in the inode.
This gets included in the filehandle, and checked when the filehandle
lookup is done.

Look for references to 'i_generation' in fs/ext3/*

Other filesystems may approach this slightly differently, but the
filesystem is responsible for providing a unique-over-time filehandle,
and 'generation number' is the 'standard' way of doing this.

NeilBrown
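
The generation-number check Neil describes can be sketched in a few
lines of Python. This is a toy model under stated assumptions, not the
ext3 or nfsd code: ToyFilesystem, FileHandle, the free-list reuse
policy and the 32-bit random generation are all made up for
illustration. It replays the scenario from the first message: X is
deleted, X1 reuses X's inode number, and the old handle is rejected.

```python
import errno
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    """Toy handle. Real NFS handles are opaque bytes to the client."""
    ino: int
    generation: int


class ToyFilesystem:
    """Hypothetical in-memory filesystem that reuses inode numbers."""

    def __init__(self, next_generation=None):
        # Assumption: a 32-bit random generation per newly created file.
        self._next_generation = next_generation or (
            lambda: random.getrandbits(32))
        self._inodes = {}   # ino -> (generation, attrs)
        self._free = []     # inode numbers available for reuse
        self._next_ino = 1

    def create(self, attrs):
        # Reuse a freed inode number if there is one, else allocate anew.
        if self._free:
            ino = self._free.pop()
        else:
            ino = self._next_ino
            self._next_ino += 1
        gen = self._next_generation()
        self._inodes[ino] = (gen, dict(attrs))
        return FileHandle(ino, gen)

    def unlink(self, fh):
        self.getattr(fh)    # raises if the handle is already stale
        del self._inodes[fh.ino]
        self._free.append(fh.ino)

    def getattr(self, fh):
        # The handle is valid only if both the inode number and the
        # generation stored in the inode match the handle.
        entry = self._inodes.get(fh.ino)
        if entry is None or entry[0] != fh.generation:
            raise OSError(errno.ESTALE, "stale file handle")
        return entry[1]


if __name__ == "__main__":
    fs = ToyFilesystem()
    fh_x = fs.create({"name": "X"})      # C1: lookup(X)
    fs.unlink(fh_x)                      # C2: delete X
    fh_x1 = fs.create({"name": "X1"})    # C2: create X1, same inode number
    try:
        fs.getattr(fh_x)                 # C1: getattr() with the old handle
    except OSError as exc:
        print("old handle rejected:", exc)
```

With random generations there is still a tiny chance that the new file
draws the same number as the old one; the sketch makes no attempt to
rule that out.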

* Re: Urgent help needed on an NFS question, please help!!!
From: Xin Zhao @ 2006-08-10 5:54 UTC
To: Neil Brown; +Cc: linux-kernel, linux-fsdevel

Many thanks for your kind help!

Your answer is what I expected. But what frustrated me is that I
cannot find the code that verifies the generation number in NFS V3
codes. Do you know where it check the generation number?

Thanks,
-x

On 8/10/06, Neil Brown <neilb@suse.de> wrote:
> On Thursday August 10, uszhaoxin@gmail.com wrote:
> > I just ran into a problem about NFS. It might be a fundamental problem
> > of my current work. So please help!
> >
> > I am wondering how NFS guarantees a client didn't get wrong file
> > attributes. Consider the following scenario:
> >
> > Suppose we have an NFS server S and two clients C1 and C2.
> >
> > Now C1 needs to access the file attributes of file X, it first does
> > lookup() to get the file handle of file X.
> >
> > After C1 gets X's file handle and before C1 issues the getattr()
> > request, C2 cuts in. Now C2 deletes file X and creates a new file X1,
> > which has different name but the same inode number and device ID as
> > the nonexistent file X.
> >
> > When C1 issues getattr() with the old file handle, it may get file
> > attribute on wrong file X1. Is this true?
> >
> > If not, how NFS avoid this problem? Please direct me to the code that
> > verifies this.
>
> Generation numbers.
>
> When the filesystem creates a new file it assigns a random number
> as the 'generation' number and stores that in the inode.
> This gets included in the filehandle, and checked when the filehandle
> lookup is done.
>
> Look for references to 'i_generation' in fs/ext3/*
>
> Other filesystems may approach this slightly differently, but the
> filesystem is responsible for providing a unique-over-time filehandle,
> and 'generation number' is the 'standard' way of doing this.
>
> NeilBrown
>

* Re: Urgent help needed on an NFS question, please help!!!
From: Neil Brown @ 2006-08-10 6:03 UTC
To: Xin Zhao; +Cc: linux-kernel, linux-fsdevel

On Thursday August 10, uszhaoxin@gmail.com wrote:
> Many thanks for your kind help!
>
> Your answer is what I expected. But what frustrated me is that I
> cannot find the code that verifies the generation number in NFS V3
> codes. Do you know where it check the generation number?

NFSD doesn't. The individual filesystem does. You need to look in
the filesystem code.

Some filesystems use common code from fs/exportfs/expfs.c
See "export_iget".

NeilBrown.

* Re: Urgent help needed on an NFS question, please help!!!
From: Xin Zhao @ 2006-08-10 15:15 UTC
To: Neil Brown; +Cc: linux-kernel, linux-fsdevel

Hi,

I am considering another possibility: suppose client C1 does lookup()
on file X and gets a file handle, which include inode number,
generation number and parent's inode number. Before C1 issues
getattr(), C2 move the parent directory to a different place, which
will not change the parent's inode number, neither the file X's inode,
i_generation. So when C1 issues a getattr() request with this file
handle, the server seems to have no way to detect that file X is not
existent at the original path. Instead, the server will returns the
moved X's attributes, which are correct, but semantically wrong. Is
there any way that server deal with this problem?

Thanks a lot!
-x

On 8/10/06, Neil Brown <neilb@suse.de> wrote:
> On Thursday August 10, uszhaoxin@gmail.com wrote:
> > Many thanks for your kind help!
> >
> > Your answer is what I expected. But what frustrated me is that I
> > cannot find the code that verifies the generation number in NFS V3
> > codes. Do you know where it check the generation number?
>
> NFSD doesn't. The individual filesystem does. You need to look in
> the filesystem code.
>
> Some filesystems use common code from fs/exportfs/expfs.c
> See "export_iget".
>
> NeilBrown.
>

* Re: Urgent help needed on an NFS question, please help!!!
From: Matthew Wilcox @ 2006-08-10 16:11 UTC
To: Xin Zhao; +Cc: Neil Brown, linux-kernel, linux-fsdevel

On Thu, Aug 10, 2006 at 11:15:57AM -0400, Xin Zhao wrote:
> I am considering another possibility: suppose client C1 does lookup()
> on file X and gets a file handle, which include inode number,
> generation number and parent's inode number. Before C1 issues
> getattr(), C2 move the parent directory to a different place, which
> will not change the parent's inode number, neither the file X's inode,
> i_generation. So when C1 issues a getattr() request with this file
> handle, the server seems to have no way to detect that file X is not
> existent at the original path. Instead, the server will returns the
> moved X's attributes, which are correct, but semantically wrong. Is
> there any way that server deal with this problem?

It isn't semantically wrong. There is no way for the application to
distinguish between the events:

	open()
	stat()
	mv

and

	open()
	mv
	stat()

As long as the results are consistent with the former case, it doesn't
matter if the latter case actually happened.
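
Matthew's point has a direct local analogue that can be tried without
NFS. The sketch below is an illustration only, assuming a POSIX system
(renaming an open file generally fails on Windows); the function name
and the file names "X" and "moved-X" are hypothetical. It opens a file,
renames it, and stats the still-open descriptor before and after: the
descriptor names the object, not the path, so both results agree.

```python
import os
import tempfile


def stat_around_rename():
    """Open a file, rename it, and fstat() the descriptor both times."""
    with tempfile.TemporaryDirectory() as d:
        old = os.path.join(d, "X")
        new = os.path.join(d, "moved-X")
        with open(old, "w") as f:
            f.write("data")

        fd = os.open(old, os.O_RDONLY)   # open()
        try:
            before = os.fstat(fd)        # stat() before the move
            os.rename(old, new)          # mv
            after = os.fstat(fd)         # stat() after the move
        finally:
            os.close(fd)
        return before, after


if __name__ == "__main__":
    before, after = stat_around_rename()
    print(before.st_ino == after.st_ino, before.st_size == after.st_size)
```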

* Re: Urgent help needed on an NFS question, please help!!!
From: Xin Zhao @ 2006-08-10 16:23 UTC
To: Matthew Wilcox; +Cc: Neil Brown, linux-kernel, linux-fsdevel

That makes sense.

Can we make the following two conclusions?
1. In a single machine, inode+dev ID+i_generation can uniquely identify a file
2. Given a stored file handle and an inode object received from the
server, an NFS client can safely determine whether this inode
corresponds to the file handle by checking the inode+dev+i_generation.

Thanks,
-x

On 8/10/06, Matthew Wilcox <matthew@wil.cx> wrote:
> On Thu, Aug 10, 2006 at 11:15:57AM -0400, Xin Zhao wrote:
> > I am considering another possibility: suppose client C1 does lookup()
> > on file X and gets a file handle, which include inode number,
> > generation number and parent's inode number. Before C1 issues
> > getattr(), C2 move the parent directory to a different place, which
> > will not change the parent's inode number, neither the file X's inode,
> > i_generation. So when C1 issues a getattr() request with this file
> > handle, the server seems to have no way to detect that file X is not
> > existent at the original path. Instead, the server will returns the
> > moved X's attributes, which are correct, but semantically wrong. Is
> > there any way that server deal with this problem?
>
> It isn't semantically wrong. There is no way for the application to
> distinguish between the events:
>
> 	open()
> 	stat()
> 	mv
>
> and
>
> 	open()
> 	mv
> 	stat()
>
> As long as the results are consistent with the former case, it doesn't
> matter if the latter case actually happened.
>

* Re: Urgent help needed on an NFS question, please help!!!
From: Matthew Wilcox @ 2006-08-10 16:54 UTC
To: Xin Zhao; +Cc: Neil Brown, linux-kernel, linux-fsdevel

On Thu, Aug 10, 2006 at 12:23:12PM -0400, Xin Zhao wrote:
> That makes sense.
>
> Can we make the following two conclusions?
> 1. In a single machine, inode+dev ID+i_generation can uniquely identify a
> file

sure.

> 2. Given a stored file handle and an inode object received from the
> server, an NFS client can safely determine whether this inode
> corresponds to the file handle by checking the inode+dev+i_generation.

The NFS client makes up its own inode numbers for use on the local
machine. It doesn't know the device+inode+generation numbers on the
server (and indeed, the server may not even have the concepts of
inodes). To quote RFC 1813:

   The file handle contains all the information the server needs to
   distinguish an individual file. To the client, the file handle is
   opaque. The client stores file handles for use in a later request
   and can compare two file handles from the same server for equality by
   doing a byte-by-byte comparison, but cannot otherwise interpret the
   contents of file handles. If two file handles from the same server
   are equal, they must refer to the same file, but if they are not
   equal, no conclusions can be drawn. Servers should try to maintain
   a one-to-one correspondence between file handles and files, but this
   is not required. Clients should use file handle comparisons only to
   improve performance, not for correct behavior.
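
The comparison rule in the RFC 1813 passage Matthew quotes is
asymmetric, which is easy to get wrong. A minimal sketch, assuming the
client keeps handles as raw bytes together with an identifier for the
server they came from (the function name and the string server
identifier are hypothetical):

```python
from typing import Optional


def same_file(server_a: str, fh_a: bytes,
              server_b: str, fh_b: bytes) -> Optional[bool]:
    """Apply the RFC 1813 rule for comparing opaque file handles.

    Returns True when the handles certainly name the same file, and
    None when nothing can be concluded. It never returns False:
    unequal handles may still refer to the same file.
    """
    if server_a == server_b and fh_a == fh_b:   # byte-by-byte equality
        return True
    return None


if __name__ == "__main__":
    print(same_file("S", b"\x01\x02", "S", b"\x01\x02"))   # True
    print(same_file("S", b"\x01\x02", "S", b"\x01\x03"))   # None
```

Because the "not equal" case carries no information, a client can use
this only as a shortcut, for example to skip a redundant request, and
not to decide that two files differ.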

* Re: Urgent help needed on an NFS question, please help!!!
From: Xin Zhao @ 2006-08-10 17:08 UTC
To: Matthew Wilcox; +Cc: Neil Brown, linux-kernel, linux-fsdevel

Well. For regular NFS, because it needs to consider interoperability,
it cannot use file handle as an opaque object.

However, in our case, we essentially derived a VM based data sharing
infrastructure from NFS. This would allow multiple virtual machines in
a single server to share data efficiently. With some tricks, we are
able to export inode cache from server to client. Also, we modify the
file handle composer to carry the server-side inode address, inode
number, i_gen, dev along with a file handle. Upon receiving a file
handle, a client can directly access the inode object in the exported
inode cache and bypass the inter-VM communication.

So, in our case, we don't need to consider interoperability (at least
for now), and we DO know the inode number, generation, as well as
exported device info. I think this explains why I want to make sure
the conclusion is right:

Conclusion: Given a stored file handle and an inode object received
from the server, an NFS client can safely determine whether this inode
corresponds to the file handle by checking the inode+dev+i_generation.

Many thanks for this helpful discussion.

Xin

On 8/10/06, Matthew Wilcox <matthew@wil.cx> wrote:
> On Thu, Aug 10, 2006 at 12:23:12PM -0400, Xin Zhao wrote:
> > That makes sense.
> >
> > Can we make the following two conclusions?
> > 1. In a single machine, inode+dev ID+i_generation can uniquely identify a
> > file
>
> sure.
>
> > 2. Given a stored file handle and an inode object received from the
> > server, an NFS client can safely determine whether this inode
> > corresponds to the file handle by checking the inode+dev+i_generation.
>
> The NFS client makes up its own inode numbers for use on the local
> machine. It doesn't know the device+inode+generation numbers on the
> server (and indeed, the server may not even have the concepts of
> inodes). To quote RFC 1813:
>
>    The file handle contains all the information the server needs to
>    distinguish an individual file. To the client, the file handle is
>    opaque. The client stores file handles for use in a later request
>    and can compare two file handles from the same server for equality by
>    doing a byte-by-byte comparison, but cannot otherwise interpret the
>    contents of file handles. If two file handles from the same server
>    are equal, they must refer to the same file, but if they are not
>    equal, no conclusions can be drawn. Servers should try to maintain
>    a one-to-one correspondence between file handles and files, but this
>    is not required. Clients should use file handle comparisons only to
>    improve performance, not for correct behavior.
>

* Re: Urgent help needed on an NFS question, please help!!!
From: Trond Myklebust @ 2006-08-10 17:38 UTC
To: Xin Zhao; +Cc: Matthew Wilcox, Neil Brown, linux-kernel, linux-fsdevel

On Thu, 2006-08-10 at 13:08 -0400, Xin Zhao wrote:
> Well. For regular NFS, because it needs to consider interoperability,
> it cannot use file handle as an opaque object.
>
> However, in our case, we essentially derived a VM based data sharing
> infrastructure from NFS. This would allow multiple virtual machines in
> a single server to share data efficiently. With some tricks, we are
> able to export inode cache from server to client. Also, we modify the
> file handle composer to carry the server-side inode address, inode
> number, i_gen, dev along with a file handle. Upon receiving a file
> handle, a client can directly access the inode object in the exported
> inode cache and bypass the inter-VM communication.

The correct way to do this sort of thing is to use pNFS, which has
protocol support for this sort of thing, and is part of the draft NFSv4
minor version 1 specification. See

http://www.ietf.org/internet-drafts/draft-ietf-nfsv4-minorversion1-04.txt

Cheers,
  Trond

* Re: Urgent help needed on an NFS question, please help!!!
From: Trond Myklebust @ 2006-08-10 17:28 UTC
To: Xin Zhao; +Cc: Matthew Wilcox, Neil Brown, linux-kernel, linux-fsdevel

On Thu, 2006-08-10 at 12:23 -0400, Xin Zhao wrote:
> That makes sense.
>
> Can we make the following two conclusions?
> 1. In a single machine, inode+dev ID+i_generation can uniquely identify a file

Not really. The device id is frequently subject to change on server
reboot or device disconnect/reconnect.

> 2. Given a stored file handle and an inode object received from the
> server, an NFS client can safely determine whether this inode
> corresponds to the file handle by checking the inode+dev+i_generation.

No! The file handle is an opaque bag of bytes as far as clients are
concerned. If you change the server, then the filehandle format can and
will change. On linux, even changing the setting of the subtree_checking
export option will suffice to change the filehandle.

Cheers,
  Trond

* Re: Urgent help needed on an NFS question, please help!!!
From: Xin Zhao @ 2006-08-10 18:02 UTC
To: Trond Myklebust; +Cc: Matthew Wilcox, Neil Brown, linux-kernel, linux-fsdevel

Thanks. Trond.

The device is subject to change when server reboot? I don't quite
understand. If the backing device at the server side is not changed,
how come server reboot will cause device ID change?

One possibility that can cause device ID to change is exported device
change AFTER server reboots. But this can be detected by adding a
server generation number or device generation number. So maybe we can
say: "In a single machine, inode+dev
ID+i_generation+server_generation can uniquely identify a file". Is
this true?

About your comment on the second conclusion, I already explained in
one of my previous email. We assume that both server and clients are
under our control. That is, we don't consider too much about
interoperability. The file handle format will be static even the NFS
server is changed. Actually, in our inter-VM inode sharing scheme, we
don't even care about the normal file handle contents. Instead, we
only check our extended fields, which include: server-side inode
address, ino, dev info, i_generation and server_generation. An NFS
client first uses the server-side inode address to locate the inode
object in the server inode cache (we dynamically remapped the inode
cache into the client, in order to expedite metadata retrieval and
bypass inter-VM communication). After getting the inode object, the
NFS client has to validate this inode object corresponds to the file
handle so that it can read the right file attributes stored in the
inode. There are many possibilities that can cause a located inode
stores false information: the inode has been released because someone
on the server remove the file, the inode was filled by another file's
inode (other possibilities?). So we must validate the inode before
using the file attributes retrieved from the mapped inode.

That's why we bring up this question.

Also, does someone compare NFS v4's delegation mechanism with the
speculative execution mechanism proposed in SOSP 2005
http://www.cs.cmu.edu/~dga/15-849/papers/speculator-sosp2005.pdf?

What are the pros and cons of these two mechanisms?

I put the content of my previous email below.

----My previous email ---
Well. For regular NFS, because it needs to consider interoperability,
it cannot use file handle as an opaque object.

However, in our case, we essentially derived a VM based data sharing
infrastructure from NFS. This would allow multiple virtual machines in
a single server to share data efficiently. With some tricks, we are
able to export inode cache from server to client. Also, we modify the
file handle composer to carry the server-side inode address, inode
number, i_gen, dev along with a file handle. Upon receiving a file
handle, a client can directly access the inode object in the exported
inode cache and bypass the inter-VM communication.

So, in our case, we don't need to consider interoperability (at least
for now), and we DO know the inode number, generation, as well as
exported device info. I think this explains why I want to make sure
the conclusion is right:

Conclusion: Given a stored file handle and an inode object received
from the server, an NFS client can safely determine whether this inode
corresponds to the file handle by checking the inode+dev+i_generation.

Many thanks for this helpful discussion.

On 8/10/06, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> On Thu, 2006-08-10 at 12:23 -0400, Xin Zhao wrote:
> > That makes sense.
> >
> > Can we make the following two conclusions?
> > 1. In a single machine, inode+dev ID+i_generation can uniquely identify a file
>
> Not really. The device id is frequently subject to change on server
> reboot or device disconnect/reconnect.
>
> > 2. Given a stored file handle and an inode object received from the
> > server, an NFS client can safely determine whether this inode
> > corresponds to the file handle by checking the inode+dev+i_generation.
>
> No! The file handle is an opaque bag of bytes as far as clients are
> concerned. If you change the server, then the filehandle format can and
> will change. On linux, even changing the setting of the subtree_checking
> export option will suffice to change the filehandle.
>
> Cheers,
>   Trond
>
>
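
The validation step Xin describes (locate the inode through the
server-side address carried in the handle, then check ino, dev,
i_generation and server_generation before trusting its attributes) might
look roughly like the sketch below. Everything here is hypothetical:
the class names, the dict standing in for the remapped inode cache, and
the integer "slot" standing in for the server-side inode address. It
also ignores concurrent updates to the shared cache and the objections
raised elsewhere in the thread about device ids and handle opacity.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class ExtendedHandle:
    """Hypothetical extended fields carried alongside a file handle."""
    slot: int                # stands in for the server-side inode address
    ino: int
    dev: int
    i_generation: int
    server_generation: int


@dataclass
class CachedInode:
    """Hypothetical entry in the inode cache mapped into the client."""
    ino: int
    dev: int
    i_generation: int
    server_generation: int
    attrs: dict = field(default_factory=dict)


def attrs_if_valid(cache: Dict[int, CachedInode],
                   fh: ExtendedHandle) -> Optional[dict]:
    """Return cached attributes only if the inode still matches fh."""
    inode = cache.get(fh.slot)
    if inode is None:        # slot released, e.g. the file was removed
        return None
    identity = (inode.ino, inode.dev,
                inode.i_generation, inode.server_generation)
    expected = (fh.ino, fh.dev, fh.i_generation, fh.server_generation)
    if identity != expected:  # slot now holds a different file's inode
        return None
    return inode.attrs
```

When the function returns None, the client in this sketch would fall
back to an ordinary getattr() request to the server.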

* Re: Urgent help needed on an NFS question, please help!!!
From: Trond Myklebust @ 2006-08-10 19:59 UTC
To: Xin Zhao; +Cc: Matthew Wilcox, Neil Brown, linux-kernel, linux-fsdevel

On Thu, 2006-08-10 at 14:02 -0400, Xin Zhao wrote:
> Thanks. Trond.
>
> The device is subject to change when server reboot? I don't quite
> understand. If the backing device at the server side is not changed,
> how come server reboot will cause device ID change?

Things like USB, firewire, and fibre channel allocate their device ids
on the fly. There is no such thing as a fixed device id in those cases.

> About your comment on the second conclusion, I already explained in
> one of my previous email. We assume that both server and clients are
> under our control. That is, we don't consider too much about
> interoperability. The file handle format will be static even the NFS
> server is changed. Actually, in our inter-VM inode sharing scheme, we
> don't even care about the normal file handle contents. Instead, we
> only check our extended fields, which include: server-side inode
> address, ino, dev info, i_generation and server_generation. An NFS
> client first uses the server-side inode address to locate the inode
> object in the server inode cache (we dynamically remapped the inode
> cache into the client, in order to expedite metadata retrieval and
> bypass inter-VM communication). After getting the inode object, the
> NFS client has to validate this inode object corresponds to the file
> handle so that it can read the right file attributes stored in the
> inode. There are many possibilities that can cause a located inode
> stores false information: the inode has been released because someone
> on the server remove the file, the inode was filled by another file's
> inode (other possibilities?). So we must validate the inode before
> using the file attributes retrieved from the mapped inode.
>
> That's why we bring up this question.

Why do this, when people are working on standards and implementations
for doing precisely the above within the NFSv4 protocol?

> Also, does someone compare NFS v4's delegation mechanism with the
> speculative execution mechanism proposed in SOSP 2005
> http://www.cs.cmu.edu/~dga/15-849/papers/speculator-sosp2005.pdf?
>
> What are the pros and cons of these two mechanisms?

Delegations are all about caching. This paper appears to be about
getting round the bottlenecks due to synchronous operations. How are the
two issues related?

Cheers,
  Trond

* Re: Urgent help needed on an NFS question, please help!!!
From: Xin Zhao @ 2006-08-10 22:25 UTC
To: Trond Myklebust; +Cc: Matthew Wilcox, Neil Brown, linux-kernel, linux-fsdevel

The inter-VM inode helps reduce communication cost used to retrieve
file attributes in a VM environment. In a network environment, it is
impossible for a client to directly see the inode caches of the server.
But in the virtual server environment, where both client and server
running on the same physical host, this would be possible.

If clients have read-only access to server's inode cache, they can
directly retrieve file attributes without incurring expensive
getattr() rpc call. Of course the delegation is able to allow a client
to trust local cached file attributes without worry about server
change. But this only works when file is not shared by multiple
clients. Right? Does NFS4 has some other mechanisms that can further
improve performance on metadata access?

Thanks,
-x

On 8/10/06, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> On Thu, 2006-08-10 at 14:02 -0400, Xin Zhao wrote:
> > Thanks. Trond.
> >
> > The device is subject to change when server reboot? I don't quite
> > understand. If the backing device at the server side is not changed,
> > how come server reboot will cause device ID change?
>
> Things like USB, firewire, and fibre channel allocate their device ids
> on the fly. There is no such thing as a fixed device id in those cases.
>
> > About your comment on the second conclusion, I already explained in
> > one of my previous email. We assume that both server and clients are
> > under our control. That is, we don't consider too much about
> > interoperability. The file handle format will be static even the NFS
> > server is changed. Actually, in our inter-VM inode sharing scheme, we
> > don't even care about the normal file handle contents. Instead, we
> > only check our extended fields, which include: server-side inode
> > address, ino, dev info, i_generation and server_generation. An NFS
> > client first uses the server-side inode address to locate the inode
> > object in the server inode cache (we dynamically remapped the inode
> > cache into the client, in order to expedite metadata retrieval and
> > bypass inter-VM communication). After getting the inode object, the
> > NFS client has to validate this inode object corresponds to the file
> > handle so that it can read the right file attributes stored in the
> > inode. There are many possibilities that can cause a located inode
> > stores false information: the inode has been released because someone
> > on the server remove the file, the inode was filled by another file's
> > inode (other possibilities?). So we must validate the inode before
> > using the file attributes retrieved from the mapped inode.
> >
> > That's why we bring up this question.
>
> Why do this, when people are working on standards and implementations
> for doing precisely the above within the NFSv4 protocol?
>
> > Also, does someone compare NFS v4's delegation mechanism with the
> > speculative execution mechanism proposed in SOSP 2005
> > http://www.cs.cmu.edu/~dga/15-849/papers/speculator-sosp2005.pdf?
> >
> > What are the pros and cons of these two mechanisms?
>
> Delegations are all about caching. This paper appears to be about
> getting round the bottlenecks due to synchronous operations. How are the
> two issues related?
>
> Cheers,
>   Trond
>
>

* Re: Urgent help needed on an NFS question, please help!!!
From: Trond Myklebust @ 2006-08-11 0:44 UTC
To: Xin Zhao; +Cc: Matthew Wilcox, Neil Brown, linux-kernel, linux-fsdevel

On Thu, 2006-08-10 at 18:25 -0400, Xin Zhao wrote:
> The inter-VM inode helps reduce communication cost used to retrieve
> file attributes in a VM environment. In a network environment, it is
> impossible for a client to directly see the inode caches of the server.
> But in the virtual server environment, where both client and server
> running on the same physical host, this would be possible.
>
> If clients have read-only access to server's inode cache, they can
> directly retrieve file attributes without incurring expensive
> getattr() rpc call. Of course the delegation is able to allow a client
> to trust local cached file attributes without worry about server
> change. But this only works when file is not shared by multiple
> clients. Right? Does NFS4 has some other mechanisms that can further
> improve performance on metadata access?

Not metadata access, no. That would require some seriously messy locking
rules. It improves performance by allowing a client to access the block
device directly for data reads and writes if it has the capability of
doing so.

  Trond

* Re: Urgent help needed on an NFS question, please help!!!
From: Xin Zhao @ 2006-08-10 22:28 UTC
To: Trond Myklebust; +Cc: Matthew Wilcox, Neil Brown, linux-kernel, linux-fsdevel

Also, delegations are about caching. That's true. It improve NFS
performance because a client with a lease does not need to worry about
server change and can manipulate files using local cache. But if
speculative execution can achieve the same goal without incurring the
cost of lease renewal and revoke, delegation becomes less useful.

So my question is essentially: if speculative execution is there, why
do we still need delegation? Can delegation do anything better?

Xin

On 8/10/06, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> On Thu, 2006-08-10 at 14:02 -0400, Xin Zhao wrote:
> > Thanks. Trond.
> >
> > The device is subject to change when server reboot? I don't quite
> > understand. If the backing device at the server side is not changed,
> > how come server reboot will cause device ID change?
>
> Things like USB, firewire, and fibre channel allocate their device ids
> on the fly. There is no such thing as a fixed device id in those cases.
>
> > About your comment on the second conclusion, I already explained in
> > one of my previous email. We assume that both server and clients are
> > under our control. That is, we don't consider too much about
> > interoperability. The file handle format will be static even the NFS
> > server is changed. Actually, in our inter-VM inode sharing scheme, we
> > don't even care about the normal file handle contents. Instead, we
> > only check our extended fields, which include: server-side inode
> > address, ino, dev info, i_generation and server_generation. An NFS
> > client first uses the server-side inode address to locate the inode
> > object in the server inode cache (we dynamically remapped the inode
> > cache into the client, in order to expedite metadata retrieval and
> > bypass inter-VM communication). After getting the inode object, the
> > NFS client has to validate this inode object corresponds to the file
> > handle so that it can read the right file attributes stored in the
> > inode. There are many possibilities that can cause a located inode
> > stores false information: the inode has been released because someone
> > on the server remove the file, the inode was filled by another file's
> > inode (other possibilities?). So we must validate the inode before
> > using the file attributes retrieved from the mapped inode.
> >
> > That's why we bring up this question.
>
> Why do this, when people are working on standards and implementations
> for doing precisely the above within the NFSv4 protocol?
>
> > Also, does someone compare NFS v4's delegation mechanism with the
> > speculative execution mechanism proposed in SOSP 2005
> > http://www.cs.cmu.edu/~dga/15-849/papers/speculator-sosp2005.pdf?
> >
> > What are the pros and cons of these two mechanisms?
>
> Delegations are all about caching. This paper appears to be about
> getting round the bottlenecks due to synchronous operations. How are the
> two issues related?
>
> Cheers,
>   Trond
>
>
* Re: Urgent help needed on an NFS question, please help!!!
  2006-08-10 22:28 ` Xin Zhao
@ 2006-08-11 0:38 ` Trond Myklebust
  0 siblings, 0 replies; 24+ messages in thread
From: Trond Myklebust @ 2006-08-11 0:38 UTC
To: Xin Zhao; +Cc: Matthew Wilcox, Neil Brown, linux-kernel, linux-fsdevel

On Thu, 2006-08-10 at 18:28 -0400, Xin Zhao wrote:
> Also, delegations are about caching. That's true. They improve NFS
> performance because a client holding a lease does not need to worry
> about server-side changes and can manipulate files using its local
> cache. But if speculative execution can achieve the same goal without
> incurring the cost of lease renewal and revocation, delegation becomes
> less useful.

What am I missing? AFAICS the main purpose of speculative execution
would appear to be to reduce the latency of syscall execution on
clients. That doesn't suffice to replace caching even by a long shot.

Delegations are all about _not_ sending commands to the server when you
don't need to. They make NFS scale to larger numbers of clients.

> So my question is essentially: if speculative execution is there, why
> do we still need delegation? Can delegation do anything better?

Speculative execution is where? I see one academic paper detailing a
couple of lab experiments, but no published code. Do you know of anyone
who has reproduced these results in real-life environments?

I'm particularly curious to see how they resolved the requirement that
"...speculative state should never be visible to the user or any
external device.". The fact that they need to discuss having to roll
back operations like "mkdir", which create (very) user-visible state on
the server, is rather telling...

Trond
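To make the point about delegations avoiding server round trips concrete, here is a minimal sketch. It is a toy model, not the NFSv4 protocol or the Linux client: `DelegatingServer`, `CachingClient`, and every method name are hypothetical, and a single file with a single attribute stands in for the real state. While the client holds the delegation it answers attribute queries from its cache; once the server recalls the delegation because another client changes the file, the client has to ask the server again.

```python
class DelegatingServer:
    """Hypothetical server tracking one file and one delegation holder."""

    def __init__(self):
        self.attrs = {"size": 0}
        self.getattr_rpcs = 0   # how many getattr requests reached the server
        self.holder = None      # client currently holding the delegation

    def getattr(self):
        self.getattr_rpcs += 1
        return dict(self.attrs)

    def grant_delegation(self, client):
        # The client fetches attributes once, then may trust its cache.
        self.holder = client
        client.cached_attrs = self.getattr()
        client.has_delegation = True

    def write_from_other_client(self, new_size):
        # Recall the delegation before allowing a conflicting change.
        if self.holder is not None:
            self.holder.recall()
            self.holder = None
        self.attrs["size"] = new_size


class CachingClient:
    """Hypothetical client that serves stat() locally while delegated."""

    def __init__(self, server):
        self.server = server
        self.has_delegation = False
        self.cached_attrs = None

    def recall(self):
        self.has_delegation = False
        self.cached_attrs = None

    def stat(self):
        if self.has_delegation:
            return dict(self.cached_attrs)   # no request sent to the server
        return self.server.getattr()
```

In this model, a hundred `stat()` calls under a delegation cost the server a single `getattr`, which is the scaling effect described above. The sketch says nothing about latency hiding, which is what speculative execution targets.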
* Re: Urgent help needed on an NFS question, please help!!!
  2006-08-10 19:59 ` Trond Myklebust
  2006-08-10 22:25 ` Xin Zhao
  2006-08-10 22:28 ` Xin Zhao
@ 2006-08-10 23:42 ` Bryan Henderson
  2 siblings, 0 replies; 24+ messages in thread
From: Bryan Henderson @ 2006-08-10 23:42 UTC
To: Trond Myklebust; +Cc: linux-fsdevel, Matthew Wilcox, Neil Brown, Xin Zhao

>Things like USB, firewire, and fibre channel allocate their device ids
>on the fly. There is no such thing as a fixed device id in those cases.

Also good old ATA and parallel SCSI. These are more stable than the ones
where you routinely plug stuff in and out, but still the device numbers
are chosen at each boot, typically according to order of discovery, so
that if a different set of devices is operational at boot time, the
device numbers of the disk devices will be different.

There's also the issue of moving a device -- medium and all -- to a
different place in the configuration, and of moving a filesystem from
one device to another.

--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
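A small sketch of why discovery-order numbering undermines a device number as part of a file identity. The constants, disk names, and both functions are illustrative assumptions, not kernel code; the only idea modelled is "numbers are handed out in the order disks are found at boot".

```python
SD_MAJOR = 8           # illustrative major number
MINORS_PER_DISK = 16   # illustrative spacing between whole-disk minors


def assign_device_numbers(discovered_disks):
    """Give each disk a (major, minor) pair based purely on discovery order."""
    return {
        disk: (SD_MAJOR, index * MINORS_PER_DISK)
        for index, disk in enumerate(discovered_disks)
    }


def handle_for(dev_numbers, disk, ino, generation):
    """Hypothetical file handle that embeds the device number."""
    return (dev_numbers[disk], ino, generation)


# First boot: both disks are operational.
first_boot = assign_device_numbers(["disk_a", "disk_b"])
# Second boot: disk_a is missing, so disk_b is discovered first.
second_boot = assign_device_numbers(["disk_b"])

# Same file on the same disk, but the device-based handle no longer matches.
before = handle_for(first_boot, "disk_b", ino=12, generation=7)
after = handle_for(second_boot, "disk_b", ino=12, generation=7)
```

Here `before` and `after` differ only in the device component, which is why a handle built on the device number needs extra conditions that pin the device-to-filesystem association.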
* Re: Urgent help needed on an NFS question, please help!!!
  2006-08-10 16:23 ` Xin Zhao
  2006-08-10 16:54 ` Matthew Wilcox
  2006-08-10 17:28 ` Trond Myklebust
@ 2006-08-10 17:50 ` Bryan Henderson
  2006-08-10 18:15 ` Xin Zhao
  2006-08-10 21:00 ` Peter Staubach
  3 siblings, 1 reply; 24+ messages in thread
From: Bryan Henderson @ 2006-08-10 17:50 UTC
To: Xin Zhao; +Cc: linux-fsdevel, Matthew Wilcox, Neil Brown

>Can we make the following two conclusions?
>1. In a single machine, inode+dev ID+i_generation can uniquely identify a file

Without knowing the context, it's hard to say how picky you want to be,
but here are some reasons the conclusion wouldn't be valid:

- In some filesystems, the inode number isn't as unique and stable as
  you'd like it to be. In these, the inode number has no internal use
  and is just something made up to satisfy stat(). And no, it doesn't
  satisfy all requirements of stat(), but there's a compromise
  happening. This compromise is especially visible in filesystems that
  can have more than 4G filesystem objects, because inode numbers are
  32 bits.

  Consequently, NFS file handles for filesystem objects in these
  filesystems do not involve inode numbers.

- In some filesystems, there is no device number to speak of. There's a
  made-up one to satisfy Linux structures designed in the days when a
  filesystem lived on a disk, but you have to define some restrictions
  to consider that device number usable to uniquely identify a file.

  For this reason and others, Linux doesn't always use a device number
  in a file handle. In general, it uses an "export ID."

- In a traditional filesystem such as ext3, you have to define some
  restrictions in order for a device number to help uniquely identify a
  file. That's because in general, a filesystem's device number can
  change, especially because a disk device's device number can change.

>2. Given a stored file handle and an inode object received from the
>server, an NFS client can safely determine whether this inode
>corresponds to the file handle by checking the inode+dev+i_generation.

There's an unsettling theme in your questions that suggests that the NFS
protocol involves inodes and inode numbers. It doesn't, and I think
assuming it does can lead you down some bad paths. There is no inode
number mentioned in NFS specs, and an NFS client doesn't know what the
inode number of a file on the server is. The individual server decides
how to construct an NFS file handle. In the earliest implementations of
NFS servers, and continuing today in NFS serving of an ext3 filesystem,
the server decides to use inode numbers and device numbers to construct
the NFS file handle. But it doesn't have to.

--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
* Re: Urgent help needed on an NFS question, please help!!!
  2006-08-10 17:50 ` Bryan Henderson
@ 2006-08-10 18:15 ` Xin Zhao
  2006-08-11 0:07 ` Bryan Henderson
  0 siblings, 1 reply; 24+ messages in thread
From: Xin Zhao @ 2006-08-10 18:15 UTC
To: Bryan Henderson; +Cc: linux-fsdevel, Matthew Wilcox, Neil Brown

Hi, Bryan,

You made several good points here. But can you be more specific: which
filesystems do not use unique inode numbers? I am sorry for asking this
dumb question, but I am really eager to know.

Also, you mentioned that some filesystems have no device number. I
guess you are talking about ramfs or similar filesystems, right? In our
context, we don't consider this possibility.

I know many people dislike using an inode number or device number in a
file handle. The NFS specs also avoid mentioning inodes, because they
want to provide maximum interoperability. But our system is essentially
an NFS-derived filesystem dedicated to sharing data across multiple VMs
in a single virtual server. So we can safely assume that both NFS client
and server are under our control, and no further interoperability needs
to be considered.

But your points seem to be valid as long as you can give more details
on "some file systems". ;-)

-x
* Re: Urgent help needed on an NFS question, please help!!!
  2006-08-10 18:15 ` Xin Zhao
@ 2006-08-11 0:07 ` Bryan Henderson
  0 siblings, 0 replies; 24+ messages in thread
From: Bryan Henderson @ 2006-08-11 0:07 UTC
To: Xin Zhao; +Cc: linux-fsdevel, Matthew Wilcox, Neil Brown

>But your points seem to be valid as long as you can give more details
>on "some file systems". ;-)

I don't think I can do that. I couldn't even list ALL the filesystem
types, much less tell which ones are implemented which way -- in which
versions of Linux. I'd embarrass myself if I tried. I can barely
remember the generalities; I don't find the actual filesystem type
inventory to be very interesting myself.

If you're willing to build something that doesn't work with Linux
filesystems in general, but just works with the "important" ones, then
it makes more sense to go the other way -- identify the ones you care
about and ask if they fit the assumptions you require. I believe ext3
everywhere it exists today meets your expectations for inodes and device
numbers -- if you add some conditions so that the device number -
filesystem association is permanent.

Again being general, but a little less, I can say that the inode
assumption clearly doesn't work on a system that has more than 4G files.
I would suspect the various supercomputing cluster filesystems, and I
heard a long time ago that AFS broke the 32-bit inode number barrier.
Filesystems imported from non-Unix places often have trouble
synthesizing a Unix inode number.

Finding a filesystem without a useful normal device number to identify
it is a lot easier: any filesystem that doesn't live by design on a
single disk device, such as a network filesystem, distributed
filesystem, multi-volume filesystem, non-storage filesystem (like proc),
or unconventional storage filesystem such as ramfs or tmpfs.

--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
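The 4G limit mentioned above is a pigeonhole argument, and a few lines show it. This sketch assumes a filesystem with 64-bit internal object IDs that must report a 32-bit inode number to stat(); `synthesize_ino32` and its truncation strategy are hypothetical, chosen only because truncation is the simplest possible mapping.

```python
INO_BITS = 32
INO_MASK = (1 << INO_BITS) - 1


def synthesize_ino32(object_id):
    """Hypothetical mapping from a 64-bit object ID to a 32-bit inode number."""
    return object_id & INO_MASK


# Two distinct objects in a filesystem holding more than 2**32 objects.
object_a = 5
object_b = 5 + (1 << INO_BITS)

ino_a = synthesize_ino32(object_a)
ino_b = synthesize_ino32(object_b)
```

`ino_a` and `ino_b` come out equal even though the objects differ. A smarter mapping moves the collisions around but cannot remove them, since more than 2**32 objects cannot get distinct 32-bit numbers. That is why such filesystems build their file handles from something other than the reported inode number.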
* Re: Urgent help needed on an NFS question, please help!!!
  2006-08-10 16:23 ` Xin Zhao
  ` (2 preceding siblings ...)
  2006-08-10 17:50 ` Bryan Henderson
@ 2006-08-10 21:00 ` Peter Staubach
  3 siblings, 0 replies; 24+ messages in thread
From: Peter Staubach @ 2006-08-10 21:00 UTC
To: Xin Zhao; +Cc: Matthew Wilcox, Neil Brown, linux-kernel, linux-fsdevel

Xin Zhao wrote:
> That makes sense.
>
> Can we make the following two conclusions?
> 1. In a single machine, inode+dev ID+i_generation can uniquely
> identify a file
> 2. Given a stored file handle and an inode object received from the
> server, an NFS client can safely determine whether this inode
> corresponds to the file handle by checking the inode+dev+i_generation.
>

#1 seems safe enough to assume.

#2 either doesn't make sense to me or is assuming things about the file
handle that the client is not allowed to assume. A file handle is an
opaque string of bytes to the client. The only entity allowed to
interpret the contents is the entity which generated the file handle.

---

Is this situation any different from an application opening a file, "A",
and another process then renaming "A" to "B"? Now the original
application is reading from and writing to a file called "B" and has no
knowledge of this.

---

The bottom line is that the file handle uniquely identifies a particular
entity on a file system on the server. The name of the entity does not
matter.

Thanx...

ps
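The two points above (the handle is opaque to the client, and it names an entity rather than a path) can be sketched as follows. `ToyExport` and its 8-byte handle layout are assumptions made up for illustration; a real server picks its own encoding, and the client in this model only ever stores and compares the bytes.

```python
import struct


class ToyExport:
    """Hypothetical server-side export; only it may decode its handles."""

    def __init__(self):
        self._names = {}    # name -> object id
        self._data = {}     # object id -> file contents
        self._next_id = 1

    def _encode(self, object_id):
        # Layout is private to the server; clients see opaque bytes.
        return struct.pack(">Q", object_id)

    def _decode(self, handle):
        (object_id,) = struct.unpack(">Q", handle)
        return object_id

    def create(self, name):
        object_id = self._next_id
        self._next_id += 1
        self._names[name] = object_id
        self._data[object_id] = bytearray()
        return self._encode(object_id)

    def lookup(self, name):
        return self._encode(self._names[name])

    def rename(self, old, new):
        self._names[new] = self._names.pop(old)

    def write(self, handle, payload):
        self._data[self._decode(handle)] += payload

    def read(self, handle):
        return bytes(self._data[self._decode(handle)])
```

A client that obtained a handle for "A" keeps reading and writing the same object after another party renames it to "B", and a fresh `lookup("B")` yields byte-for-byte the same handle. The client never needs to know what the bytes mean.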
* Re: Urgent help needed on an NFS question, please help!!!
  2006-08-10 5:11 ` Neil Brown
  2006-08-10 5:54 ` Xin Zhao
@ 2006-08-10 6:04 ` Xin Zhao
  2006-08-10 6:15 ` Xin Zhao
  1 sibling, 1 reply; 24+ messages in thread
From: Xin Zhao @ 2006-08-10 6:04 UTC
To: Neil Brown; +Cc: linux-kernel, linux-fsdevel

I think nfs_compare_fh() might do the file handle verification task.
However, it is still possible that AFTER C1 gets a valid file handle,
BUT BEFORE C1 sends out the getattr() request, C2 deletes file X and
creates a different file X1 which has the same inode number. It looks
like the server side must verify the generation number carried in the
file handle. Unfortunately, I didn't find this code on the server
side. Any further insight on this?

Thanks,
Xin
* Re: Urgent help needed on an NFS question, please help!!!
  2006-08-10 6:04 ` Xin Zhao
@ 2006-08-10 6:15 ` Xin Zhao
  0 siblings, 0 replies; 24+ messages in thread
From: Xin Zhao @ 2006-08-10 6:15 UTC
To: Neil Brown; +Cc: linux-kernel, linux-fsdevel

I found where the server checks the generation number. It's in
fh_verify(). :)

Many thanks for your help, Neil!

-x
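The race discussed in this subthread (C1 looks up X, C2 deletes X and creates X1 on the same inode number, C1 then issues getattr with the old handle) can be replayed in a toy model. This is a sketch of the generation-number idea only, not the code in fh_verify() or ext3: `GenerationServer` and its methods are hypothetical, and a simple counter replaces the random generation value so the example stays deterministic.

```python
import errno


class GenerationServer:
    """Hypothetical server whose handles are (inode number, generation)."""

    def __init__(self):
        self.inodes = {}        # ino -> {"gen": int, "name": str, "size": int}
        self.free_inos = []     # inode numbers available for reuse
        self.next_ino = 1
        self.next_gen = 1000    # assumption: counter instead of a random value

    def create(self, name, size=0):
        if self.free_inos:
            ino = self.free_inos.pop()   # reuse a freed inode number
        else:
            ino = self.next_ino
            self.next_ino += 1
        gen = self.next_gen
        self.next_gen += 1
        self.inodes[ino] = {"gen": gen, "name": name, "size": size}
        return (ino, gen)

    def remove(self, handle):
        ino, _ = handle
        del self.inodes[ino]
        self.free_inos.append(ino)

    def getattr(self, handle):
        ino, gen = handle
        inode = self.inodes.get(ino)
        # The generation check is what rejects handles to recycled inodes.
        if inode is None or inode["gen"] != gen:
            raise OSError(errno.ESTALE, "stale file handle")
        return {"name": inode["name"], "size": inode["size"]}
```

Replaying the scenario: C1 holds the handle returned for X, C2 removes X and creates X1, which lands on the same inode number with a new generation. C1's `getattr` then fails with ESTALE instead of returning X1's attributes, while the handle for X1 works normally.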
end of thread, other threads: [~2006-08-11 0:45 UTC]

Thread overview: 24+ messages
2006-08-10 5:04 Urgent help needed on an NFS question, please help!!! Xin Zhao
2006-08-10 5:11 ` Neil Brown
2006-08-10 5:54 ` Xin Zhao
2006-08-10 6:03 ` Neil Brown
2006-08-10 15:15 ` Xin Zhao
2006-08-10 16:11 ` Matthew Wilcox
2006-08-10 16:23 ` Xin Zhao
2006-08-10 16:54 ` Matthew Wilcox
2006-08-10 17:08 ` Xin Zhao
2006-08-10 17:38 ` Trond Myklebust
2006-08-10 17:28 ` Trond Myklebust
2006-08-10 18:02 ` Xin Zhao
2006-08-10 19:59 ` Trond Myklebust
2006-08-10 22:25 ` Xin Zhao
2006-08-11 0:44 ` Trond Myklebust
2006-08-10 22:28 ` Xin Zhao
2006-08-11 0:38 ` Trond Myklebust
2006-08-10 23:42 ` Bryan Henderson
2006-08-10 17:50 ` Bryan Henderson
2006-08-10 18:15 ` Xin Zhao
2006-08-11 0:07 ` Bryan Henderson
2006-08-10 21:00 ` Peter Staubach
2006-08-10 6:04 ` Xin Zhao
2006-08-10 6:15 ` Xin Zhao