NFS dentry caching mechanism

* NFS dentry caching mechanism
@ 2006-01-26 22:40 Usha Ketineni
  2006-01-26 23:14 ` Trond Myklebust
  0 siblings, 1 reply; 11+ messages in thread
From: Usha Ketineni @ 2006-01-26 22:40 UTC (permalink / raw)
  To: nfs

We are investigating an issue with the NFS client code in 2.4.21 kernel:

To reproduce the issue:

(using machine A and machine B, and a file system mounted off an NFS
server called /home)

1) On Machine A: 

ls /home/source
ls: /home/source: No such file or directory 

2) On machine B: 
touch /home/source 

3) back on machine A: 
rm /home/source 
rm: cannot lstat `source': No such file or directory 

But source *does* exist. 

This shows the problem.

===

There are workarounds:

1) Mount the file system with acdirmin=0 and acdirmax=0. But this then
affects all system calls, not just unlink(). And it hurts NFS performance.

2) Mount the file system with the noac option, but the same negative
effect as in #1 applies.

What happens is this: 

0) Let F be a filename on the NFS file system. Initially this file
does not exist.

1) The application on machine A does a stat() on F. The NFS
client in the kernel sends a LOOKUP request to the NFS server, which
obviously returns failure. The stat() fails with ENOENT. OK so far.

2) Immediately afterwards (a few seconds max), the application on
machine B creates the file F. No problems so far.

3) When B is done with F, a few seconds later the application on
machine A does an unlink() on F. Because of the negative dentry
caching in the Linux kernel, it doesn't even bother to send an NFS
REMOVE request to the NFS server, as (it thinks) it knows for sure the
file doesn't exist. It lets the unlink() fail with ENOENT. But the
file definitely exists.
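
The sequence above can be sketched as a toy model in Python. This is not
kernel code: the Server and Client classes, the ttl parameter, and the
reproduce() helper are all hypothetical names made up for illustration, and
a plain per-name timeout is a simplification of what the client really does.

```python
import errno


class Server:
    """Hypothetical stand-in for the NFS server: the names in one directory."""

    def __init__(self):
        self.names = set()

    def lookup(self, name):
        return name in self.names

    def remove(self, name):
        if name not in self.names:
            return errno.ENOENT
        self.names.remove(name)
        return 0


class Client:
    """Toy client that trusts a cached ENOENT answer for ttl seconds."""

    def __init__(self, server, ttl):
        self.server = server
        self.ttl = ttl
        self.negative = {}  # name -> time at which ENOENT was cached

    def _known_absent(self, name, now):
        cached_at = self.negative.get(name)
        return cached_at is not None and now - cached_at < self.ttl

    def stat(self, name, now):
        if self._known_absent(name, now):
            return errno.ENOENT
        if self.server.lookup(name):
            self.negative.pop(name, None)
            return 0
        self.negative[name] = now
        return errno.ENOENT

    def unlink(self, name, now):
        if self._known_absent(name, now):
            return errno.ENOENT  # nothing is sent to the server
        self.negative.pop(name, None)
        return self.server.remove(name)


def reproduce(ttl):
    server = Server()
    machine_a = Client(server, ttl)
    first = machine_a.stat("source", now=0)     # step 1: A gets ENOENT
    server.names.add("source")                  # step 2: B creates the file
    second = machine_a.unlink("source", now=5)  # step 3: A tries to remove it
    return first, second, "source" in server.names
```

In this model, reproduce(30) returns ENOENT for the unlink while the file is
still on the server. reproduce(0), which is loosely comparable to workaround
1 above, lets the remove go to the server and succeed.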

Is there any other solution for this (including moving to a newer kernel)?

Thanks
Usha


* Re: NFS dentry caching mechanism
  2006-01-26 22:40 NFS dentry caching mechanism Usha Ketineni
@ 2006-01-26 23:14 ` Trond Myklebust
  2006-01-27 13:38   ` Peter Staubach
  0 siblings, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2006-01-26 23:14 UTC (permalink / raw)
  To: uketinen; +Cc: nfs

On Thu, 2006-01-26 at 14:40 -0800, Usha Ketineni wrote:
> 
> 
> 
> We are investigating an issue with the NFS client code in 2.4.21
> kernel:
> 
> To reproduce the issue: 
> 
> (using machine A and machine B, and a file system mounted off an NFS
> server 
> called /home)
> 
> 1) On Machine A: 
> 
> ls /home/source
> ls: /home/source: No such file or directory 
> 
> 2) On machine B: 
> touch /home/source 
> 
> 3) back on machine A: 
> rm /home/source 
> rm: cannot lstat `source': No such file or directory 
> 
> But source *does* exist. 
> 

Why on earth is 'rm' trying to lstat the file? That is both racy and
unnecessary.
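
For illustration, a minimal sketch of removing a file without checking for
it first, written in Python rather than the C that rm uses. remove_quietly()
and demo() are hypothetical helper names; os.unlink() is the standard
library call that removes a file. It only avoids the extra check. It does
not by itself get around a stale negative dentry.

```python
import os
import tempfile


def remove_quietly(path):
    """Remove path. Return True if it was removed, False if it was absent."""
    try:
        os.unlink(path)  # no stat/lstat beforehand, so no check-then-act race
        return True
    except FileNotFoundError:
        return False


def demo():
    # Create a scratch file, then remove it twice.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    return remove_quietly(path), remove_quietly(path)
```

The second call in demo() reports the missing file through the return value
instead of through a separate existence test.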

> This shows the problem.
> 
> ===
> 
> There are workarounds:
> 
> 1) Mount the file system with acdirmin=0 and acdirmax=0. But this then
> affects 
> all system calls, not just unlink(). And it hurts NFS performance.
> 
> 2) Mount the file system with the noac option, but the same negative
> effect as 
> in #1 applies.
> 
> What happens is this: 
> 
> 0) Let F be a filename on the NFS file system. Initially this file
> does not exist.
> 
> 1) The application on the machine A does a stat() on F. The NFS
> client in the kernel sends a LOOKUP request to the NFS server, which
> obviously returns failure. The stat() fails with ENOENT. OK so far.
> 
> 2) Immediately afterwards (a few seconds max), the application on
> machine B creates the file F. No problems so far.
> 
> 3) When B is done with F, a few seconds later the application on
> machine A does an unlink() on F. Because of the negative dentry
> caching in the Linux kernel, it doesn't even bother to send an NFS
> REMOVE request to the NFS server, as (it thinks) it knows for sure the
> file doesn't exist. It lets the unlink() fail with ENOENT. But the
> file definitely exists. 
> 
> Is there any other solution for this (including moving to a newer
> kernel)? 
> 
I suppose one could add a VFS intent for unlink in order to force
nfs_lookup_revalidate() to drop the negative dentry. We don't do that on
any existing kernels though (particularly not on 2.4 kernels, as they
don't support intents).

However I suspect that most non-linux clients will similarly cache
negative DNLC entries, and be vulnerable to the same problem.

Cheers,
  Trond



-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems?  Stop!  Download the new AJAX search engine that makes
searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

* Re: NFS dentry caching mechanism
  2006-01-26 23:14 ` Trond Myklebust
@ 2006-01-27 13:38   ` Peter Staubach
  2006-01-27 13:44     ` Trond Myklebust
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Staubach @ 2006-01-27 13:38 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: uketinen, nfs

Trond Myklebust wrote:

>On Thu, 2006-01-26 at 14:40 -0800, Usha Ketineni wrote:
>  
>
>>
>>We are investigating an issue with the NFS client code in 2.4.21
>>kernel:
>>
>>To reproduce the issue: 
>>
>>(using machine A and machine B, and a file system mounted off an NFS
>>server 
>>called /home)
>>
>>1) On Machine A: 
>>
>>ls /home/source
>>ls: /home/source: No such file or directory 
>>
>>2) On machine B: 
>>touch /home/source 
>>
>>3) back on machine A: 
>>rm /home/source 
>>rm: cannot lstat `source': No such file or directory 
>>
>>But source *does* exist. 
>>
>>    
>>
>
>Why on earth is 'rm' trying to lstat the file? That is both racy and
>unnecessary.
>
>  
>
>>This shows the problem.
>>
>>===
>>
>>There are workarounds:
>>
>>1) Mount the file system with acdirmin=0 and acdirmax=0. But this then
>>affects 
>>all system calls, not just unlink(). And it hurts NFS performance.
>>
>>2) Mount the file system with the noac option, but the same negative
>>effect as 
>>in #1 applies.
>>
>>What happens is this: 
>>
>>0) Let F be a filename on the NFS file system. Initially this file
>>does not exist.
>>
>>1) The application on the machine A does a stat() on F. The NFS
>>client in the kernel sends a LOOKUP request to the NFS server, which
>>obviously returns failure. The stat() fails with ENOENT. OK so far.
>>
>>2) Immediately afterwards (a few seconds max), the application on
>>machine B creates the file F. No problems so far.
>>
>>3) When B is done with F, a few seconds later the application on
>>machine A does an unlink() on F. Because of the negative dentry
>>caching in the Linux kernel, it doesn't even bother to send an NFS
>>REMOVE request to the NFS server, as (it thinks) it knows for sure the
>>file doesn't exist. It lets the unlink() fail with ENOENT. But the
>>file definitely exists. 
>>
>>Is there any other solution for this (including moving to a newer
>>kernel)? 
>>
>>    
>>
>I suppose one could add a VFS intent for unlink in order to force
>nfs_lookup_revalidate() to drop the negative dentry. We don't do that on
>any existing kernels though (particularly not on 2.4 kernels, as they
>don't support intents).
>
>However I suspect that most non-linux clients will similarly cache
>negative DNLC entries, and be vulnerable to the same problem.
>

For systems which are based on the ONC+ code (i.e., Solaris), on writable
file systems, the negative cache entries are _always_ validated using a
forced over-the-wire GETATTR operation.  Read-only file systems are
treated slightly differently by using the normal attribute cache mechanism
to do the validation.  This keeps the client from falling into this trap.

It is okay for the client to think that a file exists which may not, because
it can detect the difference.  It is not okay for a client to decide that a
file does not exist without a strong validation mechanism because there is
no way for the application to determine otherwise.

       ps



* Re: NFS dentry caching mechanism
  2006-01-27 13:38   ` Peter Staubach
@ 2006-01-27 13:44     ` Trond Myklebust
  2006-01-27 13:49       ` Peter Staubach
  0 siblings, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2006-01-27 13:44 UTC (permalink / raw)
  To: Peter Staubach; +Cc: uketinen, nfs

On Fri, 2006-01-27 at 08:38 -0500, Peter Staubach wrote:

> For systems which are based on the ONC+ code (i.e., Solaris), on writable
> file systems, the negative cache entries are _always_ validated using a
> forced over-the-wire GETATTR operation.  Read-only file systems are
> treated slightly differently by using the normal attribute cache mechanism
> to do the validation.  This keeps the client from falling into this trap.
> 
> It is okay for the client to think that a file exists which may not, because
> it can detect the difference.  It is not okay for a client to decide that a
> file does not exist without a strong validation mechanism because there is
> no way for the application to determine otherwise.

That makes negative dentries more or less worthless: if you are going to
force a GETATTR call every time, you might as well do a full lookup.

We revalidate the parent directory (following the standard attribute
caching rules - no forced GETATTR). If the parent directory has changed,
we drop the negative dentry, and force a new lookup.
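
The two validation policies under discussion can be compared in a small
Python model. Everything here is hypothetical and simplified: Directory,
NegativeEntry, still_valid() and demo() are made-up names, an integer change
counter stands in for the directory mtime, and it is an assumption that the
forced GETATTR is issued against the parent directory.

```python
class Directory:
    """Hypothetical server-side directory with a change counter as its mtime."""

    def __init__(self):
        self.names = set()
        self.mtime = 0
        self.getattr_calls = 0

    def create(self, name):
        self.names.add(name)
        self.mtime += 1

    def getattr(self):
        self.getattr_calls += 1
        return self.mtime


class NegativeEntry:
    def __init__(self, dir_mtime):
        self.dir_mtime = dir_mtime  # parent mtime seen when ENOENT was cached


def still_valid(entry, directory, cached_mtime, cache_age, ac_timeout, force):
    """Return True if the negative entry may still be trusted.

    force=True  : always ask the server for the parent's attributes.
    force=False : use the cached parent attributes until ac_timeout expires.
    """
    if force or cache_age >= ac_timeout:
        current = directory.getattr()
    else:
        current = cached_mtime
    return current == entry.dir_mtime


def demo(force):
    parent = Directory()
    entry = NegativeEntry(parent.getattr())  # failed lookup, remember mtime
    cached = entry.dir_mtime
    parent.create("source")                  # another client creates the file
    valid = still_valid(entry, parent, cached,
                        cache_age=5, ac_timeout=30, force=force)
    return valid, parent.getattr_calls
```

demo(True) notices the change at the cost of one more GETATTR. demo(False)
saves the call and keeps trusting the stale entry until the attribute cache
times out.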

Cheers,
  Trond




* Re: NFS dentry caching mechanism
  2006-01-27 13:44     ` Trond Myklebust
@ 2006-01-27 13:49       ` Peter Staubach
  2006-01-27 14:26         ` Trond Myklebust
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Staubach @ 2006-01-27 13:49 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: uketinen, nfs

Trond Myklebust wrote:

>On Fri, 2006-01-27 at 08:38 -0500, Peter Staubach wrote:
>
>  
>
>>For systems which are based on the ONC+ code (i.e., Solaris), on writable
>>file systems, the negative cache entries are _always_ validated using a
>>forced over-the-wire GETATTR operation.  Read-only file systems are
>>treated slightly differently by using the normal attribute cache mechanism
>>to do the validation.  This keeps the client from falling into this trap.
>>
>>It is okay for the client to think that a file exists which may not, because
>>it can detect the difference.  It is not okay for a client to decide that a
>>file does not exist without a strong validation mechanism because there is
>>no way for the application to determine otherwise.
>>    
>>
>
>That makes negative dentries more or less worthless: if you are going to
>force a GETATTR call every time, you might as well do a full lookup.
>
>  
>

Well, the need for the stronger consistency in this case reduces the
performance benefits, but does not eliminate them.  A GETATTR will
always be cheaper than a LOOKUP, especially one that will most
likely return ENOENT.

>We revalidate the parent directory (following the standard attribute
>caching rules - no forced GETATTR). If the parent directory has changed,
>we drop the negative dentry, and force a new lookup.
>

And this leads to the unacceptable problem that a correctly written
application may not work because of this cache.

Correctness first, then performance.

    Thanx...

       ps



* Re: NFS dentry caching mechanism
  2006-01-27 13:49       ` Peter Staubach
@ 2006-01-27 14:26         ` Trond Myklebust
  2006-01-27 14:43           ` Peter Staubach
  0 siblings, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2006-01-27 14:26 UTC (permalink / raw)
  To: Peter Staubach; +Cc: uketinen, nfs

On Fri, 2006-01-27 at 08:49 -0500, Peter Staubach wrote:
> Well, the need for the stronger consistency in this case reduces the
> performance benefits, but does not eliminate them.  A GETATTR will
> always be cheaper than a LOOKUP, especially one that will most
> likely return ENOENT.

This is still unacceptable: it leads to whole truckloads of unnecessary
forced GETATTR calls on something like an nfsroot system, where $PATH
and $LD_LIBRARY_PATH need to be explored every single time the user
types in a command.
You appeared to imply that the read-only filesystem case was treated
differently on Solaris, but that sucks too: a read-only flag just means
that _you_ can't modify the filesystem, not that others can't.

Furthermore, in cases such as the one that Usha describes, we don't
actually _care_ about revalidating a negative dentry and/or looking up a
new dentry. Do it using intents, and you can probably skip all the crap
in nfs_lookup_revalidate+nfs_lookup: after all you need in order to send
a valid RMDIR command is the filehandle of the parent, and a name.

Cheers,
  Trond


* Re: NFS dentry caching mechanism
  2006-01-27 14:26         ` Trond Myklebust
@ 2006-01-27 14:43           ` Peter Staubach
  2006-01-27 15:13             ` Trond Myklebust
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Staubach @ 2006-01-27 14:43 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: uketinen, nfs

Trond Myklebust wrote:

>On Fri, 2006-01-27 at 08:49 -0500, Peter Staubach wrote:
>  
>
>>Well, the need for the stronger consistency in this case reduces the
>>performance benefits, but does not eliminate them.  A GETATTR will
>>always be cheaper than a LOOKUP, especially one that will most
>>likely return ENOENT.
>>    
>>
>
>This is still unacceptable: it leads to whole truckloads of unnecessary
>forced GETATTR calls on something like an nfsroot system, where $PATH
>and $LD_LIBRARY_PATH need to be explored every single time the user
>types in a command.
>You appeared to imply that the read-only filesystem case was treated
>differently on Solaris, but that sucks too: a read-only flag just means
>that _you_ can't modify the filesystem, not that others can't.
>
>  
>

With no negative cache, you get LOOKUP operations which are most likely
all going to fail.  With the negative cache, you can trade these failed
LOOKUP operations for GETATTR operations for a net win in CPU on both
the client and the server and also in network utilization because the
GETATTR requests and responses are smaller than the LOOKUP requests and
responses.  You can also retain the consistency semantics to be as
correct as possible.

Read-only file systems are treated differently because it seems a
fairly safe assumption that a file system which is read-only to a client
is probably changing slowly and thus, the normal attribute caching
mechanism is probably sufficient.

If only we knew that a file system was read-only throughout the entire
path, then we could eliminate all of the consistency checks...  :-)

>Furthermore, in cases such as the one that Usha describes, we don't
>actually _care_ about revalidating a negative dentry and/or looking up a
>new dentry. Do it using intents, and you can probably skip all the crap
>in nfs_lookup_revalidate+nfs_lookup: after all you need in order to send
>a valid RMDIR command is the filehandle of the parent, and a name.
>

Well, yes, this would address this one particular aspect, but does not solve
the more general problem.  Bad things can occur when the kernel tells an
application that a file does not exist, when it truly does.  This is bad
because the application can not discover the difference.  Telling an
application that a file does exist when it does not is not quite so bad
because the application can discover the difference.

This situation could be addressed as described, but I suspect that we would
just end up in the next situation and eventually need to fix the problem for
real.

    Thanx...

       ps


* Re: NFS dentry caching mechanism
  2006-01-27 14:43           ` Peter Staubach
@ 2006-01-27 15:13             ` Trond Myklebust
  2006-01-27 15:36               ` Peter Staubach
  0 siblings, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2006-01-27 15:13 UTC (permalink / raw)
  To: Peter Staubach; +Cc: uketinen, nfs

On Fri, 2006-01-27 at 09:43 -0500, Peter Staubach wrote:
> With no negative cache, you get LOOKUP operations which are most likely
> all going to fail.  With the negative cache, you can trade these failed
> LOOKUP operations for GETATTR operations for a net win in CPU on both
> the client and the server and also in network utilization because the
> GETATTR requests and responses are smaller than the LOOKUP requests and
> responses.  You can also retain the consistency semantics to be as
> correct as possible.

On a Linux server, the lookup and getattr have roughly the same overhead
since the server has to set up dentries for them.

> Read-only file systems are treated differently because it seems a
> fairly safe assumption that a file system which is read-only to a client
> is probably changing slowly and thus, the normal attribute caching
> mechanism is probably sufficient.
> 
> If only we knew that a file system was read-only throughout the entire
> path and then we could eliminate all of the consistency checks...  :-)

The v4.1 draft w/ the spec for directory delegations is approaching
final form.

> >Furthermore, in cases such as the one that Usha describes, we don't
> >actually _care_ about revalidating a negative dentry and/or looking up a
> >new dentry. Do it using intents, and you can probably skip all the crap
> >in nfs_lookup_revalidate+nfs_lookup: after all you need in order to send
> >a valid RMDIR command is the filehandle of the parent, and a name.
> >
> 
> Well, yes, this would address this one particular aspect, but does not solve
> the more general problem.  Bad things can occur when the kernel tells an
> application that a file does not exist, when it truly does.  This is bad
> because the application can not discover the difference.  Telling an
> application that a file does exist when it does not is not quite so bad
> because the application can discover the difference.

I'm not sure I understand what you mean here. We have exclusive create
semantics on most operations that need them, so the application can
definitely discover the difference in those cases.

Operations such as RMDIR and unlink() do have a race, but in the case
where you have one client creating a directory and another client
destroying it, there will always be a race unless you have some method
of synchronisation between the processes on the clients.

There is a potential caching race if you try to open the file, but that
is (as I said previously) quite intentional: it is done for scalability
reasons.

> This situation could be addressed as described, but I suspect that we just
> end up in the next situation and eventually needing to fix the problem for
> real.

Note that we already use intents in order to eliminate the need for
negative dentry validation for the case of O_EXCL opens. We could
probably do the same for mkdir(), symlink() and link() (for the case of
the target). That would fix the issue where you do have some method of
synchronisation between the clients.
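
As an illustration of the exclusive create semantics mentioned above, here
is a minimal Python sketch. create_exclusively() and demo() are hypothetical
helper names, and demo() runs against a local temporary directory, so it
shows only the application-visible behaviour of O_EXCL, not what any
particular NFS client sends over the wire.

```python
import os
import tempfile


def create_exclusively(path):
    """Return True if this call created path, False if it already existed."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def demo():
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, "lockfile")
        # The first call creates the file, the second one finds it in place.
        return create_exclusively(path), create_exclusively(path)
```

The caller learns from the result whether the name was already taken,
without relying on an earlier existence check.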

Cheers,
  Trond




* Re: NFS dentry caching mechanism
  2006-01-27 15:13             ` Trond Myklebust
@ 2006-01-27 15:36               ` Peter Staubach
  2006-01-27 17:13                 ` Trond Myklebust
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Staubach @ 2006-01-27 15:36 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: uketinen, nfs

Trond Myklebust wrote:

>On Fri, 2006-01-27 at 09:43 -0500, Peter Staubach wrote:
>  
>
>>With no negative cache, you get LOOKUP operations which are most likely
>>all going to fail.  With the negative cache, you can trade these failed
>>LOOKUP operations for GETATTR operations for a net win in CPU on both
>>the client and the server and also in network utilization because the
>>GETATTR requests and responses are smaller than the LOOKUP requests and
>>responses.  You can also retain the consistency semantics to be as
>>correct as possible.
>>    
>>
>
>On a Linux server, the lookup and getattr have roughly the same overhead
>since the server has to set up dentries for them.
>
>  
>

On most other servers that I have seen, the file handle is translated to
something like a vnode.  For a LOOKUP, a VOP_LOOKUP and then a VOP_GETATTR
is done as part of the processing for the post operation attributes.  For
a GETATTR, only the VOP_GETATTR is done.

For Linux servers, the cost may be a wash, but for others, there is a
difference.  If we exploit that difference, then everything is better.

>>Read-only file systems are treated differently because it seems a
>>fairly safe assumption that a file system which is read-only to a client
>>is probably changing slowly and thus, the normal attribute caching
>>mechanism is probably sufficient.
>>
>>If only we knew that a file system was read-only throughout the entire
>>path and then we could eliminate all of the consistency checks...  :-)
>>    
>>
>
>The v4.1 draft w/ the spec for directory delegations is approaching
>final form.
>
>  
>

Cool!

>>>Furthermore, in cases such as the one that Usha describes, we don't
>>>actually _care_ about revalidating a negative dentry and/or looking up a
>>>new dentry. Do it using intents, and you can probably skip all the crap
>>>in nfs_lookup_revalidate+nfs_lookup: after all you need in order to send
>>>a valid RMDIR command is the filehandle of the parent, and a name.
>>>
>>>      
>>>
>>Well, yes, this would address this one particular aspect, but does not solve
>>the more general problem.  Bad things can occur when the kernel tells an
>>application that a file does not exist, when it truly does.  This is bad
>>because the application can not discover the difference.  Telling an
>>application that a file does exist when it does not is not quite so bad
>>because the application can discover the difference.
>>    
>>
>
>I'm not sure I understand what you mean here. We have exclusive create
>semantics on most operations that need them, so the application can
>definitely discover the difference in those cases.
>
>  
>

The situation that I usually think of can be something like a software
development environment which uses a distributed make scheme to use
multiple machines to build.  All machines in the environment use NFS to
mount the source and build target spaces.

First, the master decides that it needs to build foo.o from foo.c.  It
looks for the existence of foo.o, but it does not exist yet.  The NFS
client on the master then creates a negative entry for foo.o.  The master
then farms out a compile on one of the slave build servers.  This system
compiles foo.c into foo.o and informs the master that the compile is done.
The build process on the master then attempts to use foo.o, but because of
the negative cache entry, is told that the file still does not exist.
Oops.

With close-to-open consistency and no negative caching, this should work
as expected.  With negative caching and strong cache validation on the
negative entries, this should also work as expected.  With negative caching
and the relaxed cache validation, then this probably won't work because the
compile will probably be faster than the timeout value which controls the
negative entries.
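
The three cases above can be written down as a small Python sketch. The
function names, the policy labels and the parameters are all made up for
illustration, and the "strong" case assumes that the server reports the
directory change reliably.

```python
def master_sees_object(policy, compile_seconds, negative_timeout):
    """Return True if the master finds foo.o right after the compile ends.

    policy is one of:
      "none"    - no negative caching, every lookup goes to the server
      "strong"  - negative entries are revalidated with the server on use
      "relaxed" - negative entries are trusted until negative_timeout expires
    """
    if policy in ("none", "strong"):
        return True
    if policy == "relaxed":
        # The cached ENOENT is served until the entry has aged out.
        return compile_seconds >= negative_timeout
    raise ValueError("unknown policy: " + policy)


def summarize(compile_seconds, negative_timeout):
    """Evaluate all three policies for one build step."""
    return {policy: master_sees_object(policy, compile_seconds, negative_timeout)
            for policy in ("none", "strong", "relaxed")}
```

For a 2 second compile and a 30 second timeout, only the "relaxed" policy
fails to find foo.o.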

>Operations such as RMDIR and unlink() do have a race, but in the case
>where you have one client creating a directory and another client
>destroying it, there will always be a race unless you have some method
>of synchronisation between the processes on the clients.
>
>There is a potential caching race if you try to open the file, but that
>is (as I said previously) quite intentional: it is done for scalability
>reasons.
>
>  
>

I don't think that I understand this last paragraph.  Does this mean that
the consistency was purposefully relaxed in order to increase performance?

I think that it would have been nice to get all of the perceived possible
benefits from the negative cache entries, but in practice, I don't think
that the benefits outweigh the possible negative aspects.

    Thanx...

       ps

>>This situation could be addressed as described, but I suspect that we just
>>end up in the next situation and eventually needing to fix the problem for
>>real.
>>    
>>
>
>Note that we already use intents in order to eliminate the need for
>negative dentry validation for the case of O_EXCL opens. We could
>probably do the same for mkdir(), symlink() and link() (for the case of
>the target). That would fix the issue where you do have some method of
>synchronisation between the clients.
>
>Cheers,
>  Trond
>
>  
>


* Re: NFS dentry caching mechanism
  2006-01-27 15:36               ` Peter Staubach
@ 2006-01-27 17:13                 ` Trond Myklebust
  2006-01-27 18:20                   ` Peter Staubach
  0 siblings, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2006-01-27 17:13 UTC (permalink / raw)
  To: Peter Staubach; +Cc: uketinen, nfs

On Fri, 2006-01-27 at 10:36 -0500, Peter Staubach wrote:

> The situation that I usually think of can be something like a software
> development environment which uses a distributed make scheme to use
> multiple machines to build.  All machines in the environment use NFS to
> mount the source and build target spaces.
> 
> First, the master decides that it needs to build foo.o from foo.c.  It
> looks for the existence of foo.o, but it does not exist yet.  The NFS
> client on the master then creates a negative entry for foo.o.  The master
> then farms out a compile on one of the slave build servers.  This system
> compiles foo.c into foo.o and informs the master that the compile is done.
> The build process on the master then attempts to use foo.o, but because of
> the negative cache entry, is told that the file still does not exist.
> Oops.
> 
> With close-to-open consistency and no negative caching, this should work
> as expected.  With negative caching and strong cache validation on the
> negative entries, this should also work as expected.  With negative caching
> and the relaxed cache validation, then this probably won't work because the
> compile will probably be faster than the timeout value which controls the
> negative entries.

Note that the strong cache validation you describe also relies heavily
on the mtime accuracy on the server. A typical exported ext3 or reiserfs
filesystem will still fail for the distributed make case since it has an
mtime resolution of 1 second.
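
A small Python sketch of why the timestamp resolution matters.
parent_looks_changed() and its parameters are hypothetical, times are given
in milliseconds to keep the arithmetic exact, and the model assumes the
server simply truncates mtimes to its resolution.

```python
def parent_looks_changed(last_change_ms, create_ms, resolution_ms):
    """Compare the directory mtime before and after a create on the server.

    last_change_ms : last directory change before the failed lookup
    create_ms      : time at which the other client creates the file
    resolution_ms  : granularity of the mtime the server stores
    """
    def stamp(time_ms):
        # Truncate to the server's timestamp granularity.
        return time_ms - time_ms % resolution_ms

    return stamp(create_ms) != stamp(last_change_ms)
```

With a 1000 ms resolution, a directory changed at 10200 ms and again at
10800 ms carries the same mtime both times, so a client comparing mtimes
cannot tell that a file was added in between.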

> >Operations such as RMDIR and unlink() do have a race, but in the case
> >where you have one client creating a directory and another client
> >destroying it, there will always be a race unless you have some method
> >of synchronisation between the processes on the clients.
> >
> >There is a potential caching race if you try to open the file, but that
> >is (as I said previously) quite intentional: it is done for scalability
> >reasons.
> >
> >  
> >
> 
> I don't think that I understand this last paragraph.  Does this mean that
> the consistency was purposefully relaxed in order to increase performance?

Not performance as such, but scalability. Both the server and network
suffer in the case where you have nfsroot clients flooding the system
with GETATTR requests in order to revalidate negative dentries. Consider,
for instance, all the little shared libraries, config files, and other
junk that a typical GNOME or KDE desktop login involves, and you'll know
what I mean. The difference between negative dentry caching and not is
very significant in those cases, and so we were seeing some nasty
network floods in the early 2.4 series when it was briefly turned off.

I am basically very wary of increasing the number of GETATTR calls:
we're already seeing a large number of HPC sites complaining about the
scalability problems those cause on their servers, and asking for a
reduction in the number of unnecessary revalidations (particularly so
for 2.6 kernels).

Cheers,
  Trond





* Re: NFS dentry caching mechanism
  2006-01-27 17:13                 ` Trond Myklebust
@ 2006-01-27 18:20                   ` Peter Staubach
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Staubach @ 2006-01-27 18:20 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: uketinen, nfs

Trond Myklebust wrote:

>Note that the strong cache validation you describe also relies heavily
>on the mtime accuracy on the server. A typical exported ext3 or reiserfs
>filesystem will still fail for the distributed make case since it has an
>mtime resolution of 1 second.
>

True.  I don't believe that the client is the right place to work around
such a problem with a server though.  We should get the server fixed, or
in this case, any relevant local file systems fixed so that they support
the required semantics.  There are lots of NFS servers in the world which
do not have this issue.

I know that fixing/changing these file systems will be difficult, but
as they are, they don't provide a very good basis for NFS service.

>>>Operations such as RMDIR and unlink() do have a race, but in the case
>>>where you have one client creating a directory and another client
>>>destroying it, there will always be a race unless you have some method
>>>of synchronisation between the processes on the clients.
>>>
>>>There is a potential caching race if you try to open the file, but that
>>>is (as I said previously) quite intentional: it is done for scalability
>>>reasons.
>>>
>>I don't think that I understand this last paragraph.  Does this mean that
>>the consistency was purposefully relaxed in order to increase performance?
>
>Not performance as such, but scalability. Both the server and network
>suffer in the case where you have nfsroot clients flooding the system
>with GETATTR requests in order to revalidate negative dentries. Consider
>for instance all the little shared libraries, config files, and other
>junk that a typical GNOME or KDE desktop login involves, and you'll know
>what I mean. The difference between negative dentry caching and not is
>very significant in those cases, and so we were seeing some nasty
>network floods in the early 2.4 series when it was briefly turned off.
>
>I am basically very wary of increasing the number of GETATTR calls:
>we're already seeing a large number of HPC sites complaining about the
>scalability problems those cause on their servers, and asking for a
>reduction in the number of unnecessary revalidations (particularly so
>for 2.6 kernels).
>

I agree completely.  It is in deciding what counts as "unnecessary" that
the issue lies.  It is very easy to go too far and relax the consistency
too much, or go too far in the other direction and end up with extra
over-the-wire round trips for little or no gain.
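
The trade-off can be sketched with a toy timeout model in Python. This is an illustration under stated assumptions, not how the Linux client is implemented: a negative entry is simply trusted for `ttl` seconds after the failed lookup, and `simulate`, `ttl`, the probe times and the creation time are hypothetical.

```python
def simulate(ttl, probe_times, create_time):
    """Return (wire_requests, time_at_which_the_client_first_sees_the_file).

    The client probes the same name at each time in probe_times. A failed
    lookup is cached as a negative entry and trusted for ttl seconds.
    """
    wire = 0
    cached_until = None  # negative entry is trusted before this time
    for t in probe_times:
        if cached_until is not None and t < cached_until:
            continue  # negative entry still trusted, no request sent
        wire += 1
        if t >= create_time:
            return wire, t  # the server now reports the file
        cached_until = t + ttl
    return wire, None


# Probe once per second for a minute; another client creates the file at t = 5.
strict = simulate(ttl=0, probe_times=range(60), create_time=5)
relaxed = simulate(ttl=30, probe_times=range(60), create_time=5)
```

With `ttl=0` the client notices the file as soon as it exists but sends a request on every probe; with `ttl=30` it sends far fewer requests and keeps returning ENOENT until the negative entry expires.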

    Thanx...

       ps




end of thread, other threads:[~2006-01-27 18:21 UTC | newest]

Thread overview: 11+ messages
2006-01-26 22:40 NFS dentry caching mechanism Usha Ketineni
2006-01-26 23:14 ` Trond Myklebust
2006-01-27 13:38   ` Peter Staubach
2006-01-27 13:44     ` Trond Myklebust
2006-01-27 13:49       ` Peter Staubach
2006-01-27 14:26         ` Trond Myklebust
2006-01-27 14:43           ` Peter Staubach
2006-01-27 15:13             ` Trond Myklebust
2006-01-27 15:36               ` Peter Staubach
2006-01-27 17:13                 ` Trond Myklebust
2006-01-27 18:20                   ` Peter Staubach
