* NFS dentry caching mechanism
From: Usha Ketineni @ 2006-01-26 22:40 UTC
To: nfs

We are investigating an issue with the NFS client code in the 2.4.21
kernel:

To reproduce the issue:

(using machine A and machine B, and a file system mounted off an NFS
server called /home)

1) On machine A:

   ls /home/source
   ls: /home/source: No such file or directory

2) On machine B:

   touch /home/source

3) Back on machine A:

   rm /home/source
   rm: cannot lstat `source': No such file or directory

But source *does* exist.

This shows the problem.

===

There are workarounds:

1) Mount the file system with acdirmin=0 and acdirmax=0. But this then
   affects all system calls, not just unlink(). And it hurts NFS
   performance.

2) Mount the file system with the noac option, but the same negative
   effect as in #1 applies.

What happens is this:

0) Let F be a filename on the NFS file system. Initially this file
   does not exist.

1) The application on machine A does a stat() on F. The NFS client in
   the kernel sends a LOOKUP request to the NFS server, which obviously
   returns failure. The stat() fails with ENOENT. OK so far.

2) Immediately afterwards (a few seconds max), the application on
   machine B creates the file F. No problems so far.

3) When B is done with F, a few seconds later the application on
   machine A does an unlink() on F. Because of the negative dentry
   caching in the Linux kernel, it doesn't even bother to send an NFS
   REMOVE request to the NFS server, as (it thinks) it knows for sure
   the file doesn't exist. It lets the unlink() fail with ENOENT. But
   the file definitely exists.

Is there any other solution for this (including moving to a newer
kernel)?

Thanks
Usha
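The sequence in steps 0) to 3) above can be modelled in a few lines of
Python. This is only a minimal sketch under stated assumptions, not kernel
code: `Server`, `Client`, `ttl` and `reproduce` are hypothetical names, and
a fixed `ttl` window stands in for the acdirmin/acdirmax attribute cache
timeouts. Time is passed in explicitly so the model needs no sleeping.

```python
import errno


class Server:
    """Stands in for the NFS server: a set of names plus a request log."""

    def __init__(self):
        self.names = set()
        self.rpc_log = []              # every request that reached the server

    def lookup(self, name):
        self.rpc_log.append("LOOKUP")
        return name in self.names

    def create(self, name):
        self.rpc_log.append("CREATE")
        self.names.add(name)

    def remove(self, name):
        self.rpc_log.append("REMOVE")
        if name not in self.names:
            raise OSError(errno.ENOENT, "No such file or directory")
        self.names.remove(name)


class Client:
    """Client that trusts a negative entry for `ttl` seconds (assumption)."""

    def __init__(self, server, ttl):
        self.server = server
        self.ttl = ttl
        self.negative = {}             # name -> time until "absent" is trusted

    def _cached_absent(self, name, now):
        return self.negative.get(name, float("-inf")) > now

    def stat(self, name, now):
        if self._cached_absent(name, now):
            raise OSError(errno.ENOENT, "No such file or directory")
        if not self.server.lookup(name):
            self.negative[name] = now + self.ttl
            raise OSError(errno.ENOENT, "No such file or directory")

    def unlink(self, name, now):
        if self._cached_absent(name, now):
            # No REMOVE is sent: the cached "absent" answer wins.
            raise OSError(errno.ENOENT, "No such file or directory")
        self.negative.pop(name, None)
        self.server.remove(name)


def reproduce(ttl):
    """Replay steps 1-3 and return (result of unlink, requests seen by server)."""
    server = Server()
    machine_a = Client(server, ttl)
    try:
        machine_a.stat("source", now=0)        # step 1: fails, caches "absent"
    except OSError:
        pass
    server.create("source")                    # step 2: machine B runs touch
    try:
        machine_a.unlink("source", now=2)      # step 3: two seconds later
        result = "removed"
    except OSError as exc:
        result = errno.errorcode[exc.errno]
    return result, server.rpc_log
```

In this model `reproduce(30)` fails with ENOENT without a REMOVE ever
reaching the server, while `reproduce(0)` behaves like the acdirmin=0 and
acdirmax=0 workaround: the unlink goes over the wire and succeeds.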
* Re: NFS dentry caching mechanism
From: Trond Myklebust @ 2006-01-26 23:14 UTC
To: uketinen; +Cc: nfs

On Thu, 2006-01-26 at 14:40 -0800, Usha Ketineni wrote:
> We are investigating an issue with the NFS client code in the 2.4.21
> kernel:
>
> To reproduce the issue:
>
> (using machine A and machine B, and a file system mounted off an NFS
> server called /home)
>
> 1) On machine A:
>
> ls /home/source
> ls: /home/source: No such file or directory
>
> 2) On machine B:
> touch /home/source
>
> 3) Back on machine A:
> rm /home/source
> rm: cannot lstat `source': No such file or directory
>
> But source *does* exist.

Why on earth is 'rm' trying to lstat the file? That is both racy and
unnecessary.

> This shows the problem.
>
> ===
>
> There are workarounds:
>
> 1) Mount the file system with acdirmin=0 and acdirmax=0. But this then
> affects all system calls, not just unlink(). And it hurts NFS
> performance.
>
> 2) Mount the file system with the noac option, but the same negative
> effect as in #1 applies.
>
> What happens is this:
>
> 0) Let F be a filename on the NFS file system. Initially this file
> does not exist.
>
> 1) The application on machine A does a stat() on F. The NFS client in
> the kernel sends a LOOKUP request to the NFS server, which obviously
> returns failure. The stat() fails with ENOENT. OK so far.
>
> 2) Immediately afterwards (a few seconds max), the application on
> machine B creates the file F. No problems so far.
>
> 3) When B is done with F, a few seconds later the application on
> machine A does an unlink() on F. Because of the negative dentry
> caching in the Linux kernel, it doesn't even bother to send an NFS
> REMOVE request to the NFS server, as (it thinks) it knows for sure the
> file doesn't exist. It lets the unlink() fail with ENOENT. But the
> file definitely exists.
>
> Is there any other solution for this (including moving to a newer
> kernel)?

I suppose one could add a VFS intent for unlink in order to force
nfs_lookup_revalidate() to drop the negative dentry. We don't do that on
any existing kernels though (particularly not on 2.4 kernels, as they
don't support intents).

However I suspect that most non-Linux clients will similarly cache
negative DNLC entries, and be vulnerable to the same problem.

Cheers,
  Trond

-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems? Stop! Download the new AJAX search engine that makes
searching your log files as easy as surfing the web. DOWNLOAD SPLUNK!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
* Re: NFS dentry caching mechanism
From: Peter Staubach @ 2006-01-27 13:38 UTC
To: Trond Myklebust; +Cc: uketinen, nfs

Trond Myklebust wrote:
> On Thu, 2006-01-26 at 14:40 -0800, Usha Ketineni wrote:
>> We are investigating an issue with the NFS client code in the 2.4.21
>> kernel:
>>
>> To reproduce the issue:
>>
>> (using machine A and machine B, and a file system mounted off an NFS
>> server called /home)
>>
>> 1) On machine A:
>>
>> ls /home/source
>> ls: /home/source: No such file or directory
>>
>> 2) On machine B:
>> touch /home/source
>>
>> 3) Back on machine A:
>> rm /home/source
>> rm: cannot lstat `source': No such file or directory
>>
>> But source *does* exist.
>
> Why on earth is 'rm' trying to lstat the file? That is both racy and
> unnecessary.
>
>> This shows the problem.
>>
>> ===
>>
>> There are workarounds:
>>
>> 1) Mount the file system with acdirmin=0 and acdirmax=0. But this then
>> affects all system calls, not just unlink(). And it hurts NFS
>> performance.
>>
>> 2) Mount the file system with the noac option, but the same negative
>> effect as in #1 applies.
>>
>> What happens is this:
>>
>> 0) Let F be a filename on the NFS file system. Initially this file
>> does not exist.
>>
>> 1) The application on machine A does a stat() on F. The NFS client in
>> the kernel sends a LOOKUP request to the NFS server, which obviously
>> returns failure. The stat() fails with ENOENT. OK so far.
>>
>> 2) Immediately afterwards (a few seconds max), the application on
>> machine B creates the file F. No problems so far.
>>
>> 3) When B is done with F, a few seconds later the application on
>> machine A does an unlink() on F. Because of the negative dentry
>> caching in the Linux kernel, it doesn't even bother to send an NFS
>> REMOVE request to the NFS server, as (it thinks) it knows for sure the
>> file doesn't exist. It lets the unlink() fail with ENOENT. But the
>> file definitely exists.
>>
>> Is there any other solution for this (including moving to a newer
>> kernel)?
>
> I suppose one could add a VFS intent for unlink in order to force
> nfs_lookup_revalidate() to drop the negative dentry. We don't do that on
> any existing kernels though (particularly not on 2.4 kernels, as they
> don't support intents).
>
> However I suspect that most non-Linux clients will similarly cache
> negative DNLC entries, and be vulnerable to the same problem.

For systems which are based on the ONC+ code (i.e. Solaris), on writable
file systems, the negative cache entries are _always_ validated using a
forced over-the-wire GETATTR operation. Read-only file systems are
treated slightly differently by using the normal attribute cache mechanism
to do the validation. This keeps the client from falling into this trap.

It is okay for the client to think that a file exists which may not,
because it can detect the difference. It is not okay for a client to
decide that a file does not exist without a strong validation mechanism,
because there is no way for the application to determine otherwise.

ps
* Re: NFS dentry caching mechanism
From: Trond Myklebust @ 2006-01-27 13:44 UTC
To: Peter Staubach; +Cc: uketinen, nfs

On Fri, 2006-01-27 at 08:38 -0500, Peter Staubach wrote:
> For systems which are based on the ONC+ code (i.e. Solaris), on writable
> file systems, the negative cache entries are _always_ validated using a
> forced over-the-wire GETATTR operation. Read-only file systems are
> treated slightly differently by using the normal attribute cache mechanism
> to do the validation. This keeps the client from falling into this trap.
>
> It is okay for the client to think that a file exists which may not,
> because it can detect the difference. It is not okay for a client to
> decide that a file does not exist without a strong validation mechanism,
> because there is no way for the application to determine otherwise.

That makes negative dentries more or less worthless: if you are going to
force a GETATTR call every time, you might as well do a full lookup.

We revalidate the parent directory (following the standard attribute
caching rules - no forced GETATTR). If the parent directory has changed,
we drop the negative dentry, and force a new lookup.

Cheers,
  Trond
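The two validation policies contrasted above can be compared in a small
model. This is a sketch under assumptions, not either implementation:
`Directory`, `NegativeCache`, `attr_timeout`, `force_getattr`,
`late_create` and `repeated_misses` are hypothetical names, the GETATTR is
assumed to target the parent directory, and a change is detected by
comparing the parent's mtime. `force_getattr=True` models the forced
GETATTR on every use; `force_getattr=False` models revalidating the parent
under the normal attribute caching rules.

```python
class Directory:
    """Server-side directory: a set of names plus a modification time."""

    def __init__(self):
        self.names = set()
        self.mtime = 0
        self.getattr_calls = 0   # GETATTR requests received
        self.lookup_calls = 0    # LOOKUP requests received

    def create(self, name, now):
        self.names.add(name)
        self.mtime = now         # creating an entry changes the directory

    def getattr(self):
        self.getattr_calls += 1
        return self.mtime

    def lookup(self, name):
        self.lookup_calls += 1
        return name in self.names


class NegativeCache:
    """Client-side negative entries, validated against the parent's mtime."""

    def __init__(self, directory, attr_timeout, force_getattr):
        self.dir = directory
        self.attr_timeout = attr_timeout      # stands in for acdirmin/acdirmax
        self.force_getattr = force_getattr    # True: GETATTR on every use
        self.cached_mtime = None
        self.fetched_at = None
        self.negative = {}                    # name -> parent mtime at lookup

    def _parent_mtime(self, now):
        expired = (self.fetched_at is None
                   or now - self.fetched_at >= self.attr_timeout)
        if self.force_getattr or expired:
            self.cached_mtime = self.dir.getattr()
            self.fetched_at = now
        return self.cached_mtime

    def exists(self, name, now):
        mtime = self._parent_mtime(now)
        if self.negative.get(name) == mtime:
            return False                      # parent unchanged: trust the entry
        self.negative.pop(name, None)         # otherwise drop it and look up again
        found = self.dir.lookup(name)
        if not found:
            self.negative[name] = mtime
        return found


def late_create(force_getattr):
    """foo.o is absent at t=0, created remotely at t=5, probed at t=6 and t=31."""
    d = Directory()
    cache = NegativeCache(d, attr_timeout=30, force_getattr=force_getattr)
    answers = [cache.exists("foo.o", now=0)]
    d.create("foo.o", now=5)
    answers += [cache.exists("foo.o", now=6), cache.exists("foo.o", now=31)]
    return answers


def repeated_misses(force_getattr, probes=100):
    """Probe one missing name repeatedly, as a $PATH search would."""
    d = Directory()
    cache = NegativeCache(d, attr_timeout=30, force_getattr=force_getattr)
    for i in range(probes):
        cache.exists("nothere", now=i * 0.01)
    return d.getattr_calls, d.lookup_calls
```

In this model the forced policy sees the new file at t=6 but pays one
GETATTR per probe, while the relaxed policy answers repeated misses from
the cache and keeps reporting the file as absent until the parent's cached
attributes expire. That is the trade-off the two positions in this thread
weigh differently.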
* Re: NFS dentry caching mechanism
From: Peter Staubach @ 2006-01-27 13:49 UTC
To: Trond Myklebust; +Cc: uketinen, nfs

Trond Myklebust wrote:
> On Fri, 2006-01-27 at 08:38 -0500, Peter Staubach wrote:
>> For systems which are based on the ONC+ code (i.e. Solaris), on writable
>> file systems, the negative cache entries are _always_ validated using a
>> forced over-the-wire GETATTR operation. Read-only file systems are
>> treated slightly differently by using the normal attribute cache
>> mechanism to do the validation. This keeps the client from falling into
>> this trap.
>>
>> It is okay for the client to think that a file exists which may not,
>> because it can detect the difference. It is not okay for a client to
>> decide that a file does not exist without a strong validation mechanism,
>> because there is no way for the application to determine otherwise.
>
> That makes negative dentries more or less worthless: if you are going to
> force a GETATTR call every time, you might as well do a full lookup.

Well, the need for the stronger consistency in this case reduces the
performance benefits, but does not eliminate them. A GETATTR will
always be cheaper than a LOOKUP, especially one that will most
likely return ENOENT.

> We revalidate the parent directory (following the standard attribute
> caching rules - no forced GETATTR). If the parent directory has changed,
> we drop the negative dentry, and force a new lookup.

And this leads to the unacceptable problem that a correctly written
application may not work because of this cache.

Correctness first, then performance.

Thanx...

ps
* Re: NFS dentry caching mechanism
From: Trond Myklebust @ 2006-01-27 14:26 UTC
To: Peter Staubach; +Cc: uketinen, nfs

On Fri, 2006-01-27 at 08:49 -0500, Peter Staubach wrote:
> Well, the need for the stronger consistency in this case reduces the
> performance benefits, but does not eliminate them. A GETATTR will
> always be cheaper than a LOOKUP, especially one that will most
> likely return ENOENT.

This is still unacceptable: it leads to whole truckloads of unnecessary
forced GETATTR calls on something like an nfsroot system, where $PATH
and $LD_LIBRARY_PATH need to be explored every single time the user
types in a command.

You appeared to imply that the read-only filesystem case was treated
differently on Solaris, but that sucks too: a read-only flag just means
that _you_ can't modify the filesystem, not that others can't.

Furthermore, in cases such as the one that Usha describes, we don't
actually _care_ about revalidating a negative dentry and/or looking up a
new dentry. Do it using intents, and you can probably skip all the crap
in nfs_lookup_revalidate+nfs_lookup: after all, all you need in order to
send a valid REMOVE (or RMDIR) command is the filehandle of the parent,
and a name.

Cheers,
  Trond
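The intent idea can be sketched as follows. This is a toy model under
assumptions, not the kernel's intent mechanism: `RemoveArgs`, `Server`,
`Client`, `use_intent` and `try_unlink` are hypothetical names. The one
thing taken from the protocol is that a REMOVE request carries only the
parent directory's filehandle and a name, so the client does not need a
valid dentry for the target in order to send it.

```python
import errno
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoveArgs:
    """Arguments of a REMOVE request: parent directory handle plus a name."""
    dir_fh: bytes
    name: str


class Server:
    def __init__(self, dir_fh):
        self.dir_fh = dir_fh
        self.names = set()

    def remove(self, args):
        if args.dir_fh != self.dir_fh or args.name not in self.names:
            raise FileNotFoundError(errno.ENOENT, "No such file or directory")
        self.names.remove(args.name)


class Client:
    def __init__(self, server, dir_fh):
        self.server = server
        self.dir_fh = dir_fh
        self.negative = set()      # names this client believes are absent

    def unlink(self, name, use_intent):
        if not use_intent and name in self.negative:
            # Without an intent: answer from the cache and send nothing.
            raise FileNotFoundError(errno.ENOENT, "No such file or directory")
        # With an unlink intent: skip revalidation and lookup, and let the
        # server give the authoritative answer.
        self.negative.discard(name)
        self.server.remove(RemoveArgs(self.dir_fh, name))


def try_unlink(use_intent):
    server = Server(dir_fh=b"parent")
    client = Client(server, dir_fh=b"parent")
    client.negative.add("source")      # left behind by an earlier failed stat()
    server.names.add("source")         # created since then by another client
    try:
        client.unlink("source", use_intent)
        return "removed"
    except FileNotFoundError:
        return "ENOENT"
```

Here `try_unlink(False)` returns "ENOENT" from the stale negative entry,
while `try_unlink(True)` sends the request and returns "removed". If the
file really is absent, the server's own ENOENT reaches the application, so
the intent path costs one round trip and never gives a wrong answer.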
* Re: NFS dentry caching mechanism
From: Peter Staubach @ 2006-01-27 14:43 UTC
To: Trond Myklebust; +Cc: uketinen, nfs

Trond Myklebust wrote:
> On Fri, 2006-01-27 at 08:49 -0500, Peter Staubach wrote:
>> Well, the need for the stronger consistency in this case reduces the
>> performance benefits, but does not eliminate them. A GETATTR will
>> always be cheaper than a LOOKUP, especially one that will most
>> likely return ENOENT.
>
> This is still unacceptable: it leads to whole truckloads of unnecessary
> forced GETATTR calls on something like an nfsroot system, where $PATH
> and $LD_LIBRARY_PATH need to be explored every single time the user
> types in a command.
> You appeared to imply that the read-only filesystem case was treated
> differently on Solaris, but that sucks too: a read-only flag just means
> that _you_ can't modify the filesystem, not that others can't.

With no negative cache, you get LOOKUP operations which are most likely
all going to fail. With the negative cache, you can trade these failed
LOOKUP operations for GETATTR operations for a net win in CPU on both
the client and the server and also in network utilization, because the
GETATTR requests and responses are smaller than the LOOKUP requests and
responses. You can also retain the consistency semantics to be as
correct as possible.

Read-only file systems are treated differently because it seems a
fairly safe assumption that a file system which is read-only to a client
is probably changing slowly and thus, the normal attribute caching
mechanism is probably sufficient.

If only we knew that a file system was read-only throughout the entire
path and then we could eliminate all of the consistency checks... :-)

> Furthermore, in cases such as the one that Usha describes, we don't
> actually _care_ about revalidating a negative dentry and/or looking up a
> new dentry. Do it using intents, and you can probably skip all the crap
> in nfs_lookup_revalidate+nfs_lookup: after all, all you need in order to
> send a valid REMOVE (or RMDIR) command is the filehandle of the parent,
> and a name.

Well, yes, this would address this one particular aspect, but does not
solve the more general problem. Bad things can occur when the kernel tells
an application that a file does not exist, when it truly does. This is bad
because the application cannot discover the difference. Telling an
application that a file does exist when it does not is not quite so bad
because the application can discover the difference.

This situation could be addressed as described, but I suspect that we just
end up in the next situation and eventually need to fix the problem for
real.

Thanx...

ps
* Re: NFS dentry caching mechanism
From: Trond Myklebust @ 2006-01-27 15:13 UTC
To: Peter Staubach; +Cc: uketinen, nfs

On Fri, 2006-01-27 at 09:43 -0500, Peter Staubach wrote:
> With no negative cache, you get LOOKUP operations which are most likely
> all going to fail. With the negative cache, you can trade these failed
> LOOKUP operations for GETATTR operations for a net win in CPU on both
> the client and the server and also in network utilization, because the
> GETATTR requests and responses are smaller than the LOOKUP requests and
> responses. You can also retain the consistency semantics to be as
> correct as possible.

On a Linux server, the lookup and getattr have roughly the same overhead
since the server has to set up dentries for them.

> Read-only file systems are treated differently because it seems a
> fairly safe assumption that a file system which is read-only to a client
> is probably changing slowly and thus, the normal attribute caching
> mechanism is probably sufficient.
>
> If only we knew that a file system was read-only throughout the entire
> path and then we could eliminate all of the consistency checks... :-)

The v4.1 draft w/ the spec for directory delegations is approaching
final form.

> > Furthermore, in cases such as the one that Usha describes, we don't
> > actually _care_ about revalidating a negative dentry and/or looking up a
> > new dentry. Do it using intents, and you can probably skip all the crap
> > in nfs_lookup_revalidate+nfs_lookup: after all, all you need in order to
> > send a valid REMOVE (or RMDIR) command is the filehandle of the parent,
> > and a name.
>
> Well, yes, this would address this one particular aspect, but does not
> solve the more general problem. Bad things can occur when the kernel tells
> an application that a file does not exist, when it truly does. This is bad
> because the application cannot discover the difference. Telling an
> application that a file does exist when it does not is not quite so bad
> because the application can discover the difference.

I'm not sure I understand what you mean here. We have exclusive create
semantics on most operations that need them, so the application can
definitely discover the difference in those cases.

Operations such as RMDIR and unlink() do have a race, but in the case
where you have one client creating a directory and another client
destroying it, there will always be a race unless you have some method
of synchronisation between the processes on the clients.

There is a potential caching race if you try to open the file, but that
is (as I said previously) quite intentional: it is done for scalability
reasons.

> This situation could be addressed as described, but I suspect that we just
> end up in the next situation and eventually need to fix the problem for
> real.

Note that we already use intents in order to eliminate the need for
negative dentry validation for the case of O_EXCL opens. We could
probably do the same for mkdir(), symlink() and link() (for the case of
the target). That would fix the issue where you do have some method of
synchronisation between the clients.

Cheers,
  Trond
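From the application side, the exclusive create semantics mentioned above
look like this. A minimal sketch using the standard `os.open` call with
`O_CREAT | O_EXCL`; `create_exclusive` and `demo` are hypothetical helper
names, and the demonstration runs in a local temporary directory, so it
shows only the semantics an application sees, not how an NFS client
implements them.

```python
import os
import tempfile


def create_exclusive(path):
    """Return True if this call created `path`, False if it already existed."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def demo():
    """Create the same name twice; only the first attempt may succeed."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "foo.lock")
        return create_exclusive(path), create_exclusive(path)
```

The second call fails with EEXIST rather than silently succeeding, which
is how an application "discovers the difference" when its view of the
name was stale.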
* Re: NFS dentry caching mechanism
From: Peter Staubach @ 2006-01-27 15:36 UTC
To: Trond Myklebust; +Cc: uketinen, nfs

Trond Myklebust wrote:
> On Fri, 2006-01-27 at 09:43 -0500, Peter Staubach wrote:
>> With no negative cache, you get LOOKUP operations which are most likely
>> all going to fail. With the negative cache, you can trade these failed
>> LOOKUP operations for GETATTR operations for a net win in CPU on both
>> the client and the server and also in network utilization, because the
>> GETATTR requests and responses are smaller than the LOOKUP requests and
>> responses. You can also retain the consistency semantics to be as
>> correct as possible.
>
> On a Linux server, the lookup and getattr have roughly the same overhead
> since the server has to set up dentries for them.

On most other servers that I have seen, the file handle is translated to
something like a vnode. For a LOOKUP, a VOP_LOOKUP and then a VOP_GETATTR
is done as part of the processing for the post operation attributes. For a
GETATTR, only the VOP_GETATTR is done.

For Linux servers, the cost may be a wash, but for others, there is a
difference. If we exploit that difference, then everything is better.

>> Read-only file systems are treated differently because it seems a
>> fairly safe assumption that a file system which is read-only to a client
>> is probably changing slowly and thus, the normal attribute caching
>> mechanism is probably sufficient.
>>
>> If only we knew that a file system was read-only throughout the entire
>> path and then we could eliminate all of the consistency checks... :-)
>
> The v4.1 draft w/ the spec for directory delegations is approaching
> final form.

Cool!

>>> Furthermore, in cases such as the one that Usha describes, we don't
>>> actually _care_ about revalidating a negative dentry and/or looking up a
>>> new dentry. Do it using intents, and you can probably skip all the crap
>>> in nfs_lookup_revalidate+nfs_lookup: after all, all you need in order to
>>> send a valid REMOVE (or RMDIR) command is the filehandle of the parent,
>>> and a name.
>>
>> Well, yes, this would address this one particular aspect, but does not
>> solve the more general problem. Bad things can occur when the kernel
>> tells an application that a file does not exist, when it truly does.
>> This is bad because the application cannot discover the difference.
>> Telling an application that a file does exist when it does not is not
>> quite so bad because the application can discover the difference.
>
> I'm not sure I understand what you mean here. We have exclusive create
> semantics on most operations that need them, so the application can
> definitely discover the difference in those cases.

The situation that I usually think of can be something like a software
development environment which uses a distributed make scheme to use
multiple machines to build. All machines in the environment use NFS to
mount the source and build target spaces.

First, the master decides that it needs to build foo.o from foo.c. It
looks for the existence of foo.o, but it does not exist yet. The NFS
client on the master then creates a negative entry for foo.o. The master
then farms out a compile on one of the slave build servers. This system
compiles foo.c into foo.o and informs the master that the compile is done.
The build process on the master then attempts to use foo.o, but because of
the negative cache entry, is told that the file still does not exist.
Oops.

With close-to-open consistency and no negative caching, this should work
as expected. With negative caching and strong cache validation on the
negative entries, this should also work as expected. With negative caching
and the relaxed cache validation, this probably won't work, because the
compile will probably be faster than the timeout value which controls the
negative entries.

> Operations such as RMDIR and unlink() do have a race, but in the case
> where you have one client creating a directory and another client
> destroying it, there will always be a race unless you have some method
> of synchronisation between the processes on the clients.
>
> There is a potential caching race if you try to open the file, but that
> is (as I said previously) quite intentional: it is done for scalability
> reasons.

I don't think that I understand this last paragraph. Does this mean that
the consistency was purposefully relaxed in order to increase performance?

I think that it would have been nice to get all of the perceived possible
benefits from the negative cache entries, but in practice, I don't think
that the benefits outweigh the possible negative aspects.

Thanx...

ps

>> This situation could be addressed as described, but I suspect that we
>> just end up in the next situation and eventually need to fix the problem
>> for real.
>
> Note that we already use intents in order to eliminate the need for
> negative dentry validation for the case of O_EXCL opens. We could
> probably do the same for mkdir(), symlink() and link() (for the case of
> the target). That would fix the issue where you do have some method of
> synchronisation between the clients.
>
> Cheers,
>   Trond
* Re: NFS dentry caching mechanism
From: Trond Myklebust @ 2006-01-27 17:13 UTC
To: Peter Staubach; +Cc: uketinen, nfs

On Fri, 2006-01-27 at 10:36 -0500, Peter Staubach wrote:
> The situation that I usually think of can be something like a software
> development environment which uses a distributed make scheme to use
> multiple machines to build. All machines in the environment use NFS to
> mount the source and build target spaces.
>
> First, the master decides that it needs to build foo.o from foo.c. It
> looks for the existence of foo.o, but it does not exist yet. The NFS
> client on the master then creates a negative entry for foo.o. The master
> then farms out a compile on one of the slave build servers. This system
> compiles foo.c into foo.o and informs the master that the compile is done.
> The build process on the master then attempts to use foo.o, but because of
> the negative cache entry, is told that the file still does not exist.
> Oops.
>
> With close-to-open consistency and no negative caching, this should work
> as expected. With negative caching and strong cache validation on the
> negative entries, this should also work as expected. With negative caching
> and the relaxed cache validation, this probably won't work, because the
> compile will probably be faster than the timeout value which controls the
> negative entries.

Note that the strong cache validation you describe also relies heavily
on the mtime accuracy on the server. A typical exported ext3 or reiserfs
filesystem will still fail for the distributed make case since it has an
mtime resolution of 1 second.

> > Operations such as RMDIR and unlink() do have a race, but in the case
> > where you have one client creating a directory and another client
> > destroying it, there will always be a race unless you have some method
> > of synchronisation between the processes on the clients.
> >
> > There is a potential caching race if you try to open the file, but that
> > is (as I said previously) quite intentional: it is done for scalability
> > reasons.
>
> I don't think that I understand this last paragraph. Does this mean that
> the consistency was purposefully relaxed in order to increase performance?

Not performance as such, but scalability. Both the server and network
suffer in the case where you have nfsroot clients flooding the system
with GETATTR requests in order to revalidate negative dentries. Consider
for instance all the little shared libraries, config files, and other
junk that a typical GNOME or KDE desktop login involves, and you'll know
what I mean. The difference between negative dentry caching and not is
very significant in those cases, and so we were seeing some nasty
network floods in the early 2.4 series when it was briefly turned off.

I am basically very wary of increasing the number of GETATTR calls:
we're already seeing a large number of HPC sites complaining about the
scalability problems those cause on their servers, and asking for a
reduction in the number of unnecessary revalidations (particularly so
for 2.6 kernels).

Cheers,
  Trond
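The mtime resolution point can be made concrete with a little arithmetic.
This is a sketch under assumptions: `server_mtime`, `change_detected` and
`scenario` are hypothetical names, times are integer milliseconds, the
server is assumed to store timestamps rounded down to its resolution, and
the client is assumed to detect a directory change purely by comparing
mtime values. The timeline itself is invented for illustration.

```python
def server_mtime(event_ms, resolution_ms):
    """Timestamp the server stores: the event time rounded down to its resolution."""
    return (event_ms // resolution_ms) * resolution_ms


def change_detected(cached_mtime, current_mtime):
    """The client only sees a change if the two stored timestamps differ."""
    return cached_mtime != current_mtime


def scenario(resolution_ms):
    """Illustrative timeline within one second (all times in milliseconds).

    10100: another file is created in the build directory
    10200: the master looks up foo.o, misses, and remembers the directory mtime
    10700: a build slave creates foo.o
    10900: the master revalidates the directory with a GETATTR
    """
    mtime_seen_at_lookup = server_mtime(10100, resolution_ms)
    mtime_seen_at_revalidation = server_mtime(10700, resolution_ms)
    return change_detected(mtime_seen_at_lookup, mtime_seen_at_revalidation)
```

With a resolution of 1000 ms both events are stored as 10000, so
`scenario(1000)` reports no change and the negative entry survives even a
forced GETATTR. With a resolution of 1 ms, `scenario(1)` detects the
change.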
* Re: NFS dentry caching mechanism
From: Peter Staubach @ 2006-01-27 18:20 UTC
To: Trond Myklebust; +Cc: uketinen, nfs

Trond Myklebust wrote:
> Note that the strong cache validation you describe also relies heavily
> on the mtime accuracy on the server. A typical exported ext3 or reiserfs
> filesystem will still fail for the distributed make case since it has an
> mtime resolution of 1 second.

True. I don't believe that the client is the right place to work around
such a problem with a server though. We should get the server fixed, or
in this case, any relevant local file systems fixed so that they support
the required semantics. There are lots of NFS servers in the world which
do not have this issue.

I know that fixing/changing these file systems will be difficult, but as
they are, they don't make for very good NFS server service.

>>> Operations such as RMDIR and unlink() do have a race, but in the case
>>> where you have one client creating a directory and another client
>>> destroying it, there will always be a race unless you have some method
>>> of synchronisation between the processes on the clients.
>>>
>>> There is a potential caching race if you try to open the file, but that
>>> is (as I said previously) quite intentional: it is done for scalability
>>> reasons.
>>
>> I don't think that I understand this last paragraph. Does this mean that
>> the consistency was purposefully relaxed in order to increase
>> performance?
>
> Not performance as such, but scalability. Both the server and network
> suffer in the case where you have nfsroot clients flooding the system
> with GETATTR requests in order to revalidate negative dentries. Consider
> for instance all the little shared libraries, config files, and other
> junk that a typical GNOME or KDE desktop login involves, and you'll know
> what I mean. The difference between negative dentry caching and not is
> very significant in those cases, and so we were seeing some nasty
> network floods in the early 2.4 series when it was briefly turned off.
>
> I am basically very wary of increasing the number of GETATTR calls:
> we're already seeing a large number of HPC sites complaining about the
> scalability problems those cause on their servers, and asking for a
> reduction in the number of unnecessary revalidations (particularly so
> for 2.6 kernels).

I agree completely. It is in the decision making for "unnecessary" where
the issue lies. It is very easy to go too far and relax the consistency
too much, or to go too far in the other direction and end up with the
extra over-the-wire round trips for little or no gain.

Thanx...

ps