* [PATCH] mtime attribute is not being updated on client
@ 2005-04-08 0:52 Linda Dunaphant
2005-04-08 13:11 ` Trond Myklebust
0 siblings, 1 reply; 9+ messages in thread
From: Linda Dunaphant @ 2005-04-08 0:52 UTC (permalink / raw)
To: trond.myklebust; +Cc: linux-kernel
Hi Trond,
The acregmin (default=3) and acregmax (default=60) NFS attributes that
control the min and max attribute cache lifetimes don't appear to be
working after the first few timeouts. Using a test program that loops
on the following sequence:
- write to a file on an NFS3 mounted filesystem from the client
- sleep for one second
- stat the file to get the mtime
you see the mtime change once after ~56 seconds, but no further mtime
changes are detected by the test.
nfs_refresh_inode() currently bypasses the inode vs fattr mtime comparison
if data_unstable is true. At the end of nfs_refresh_inode(), it resets the
attribute timer. Since nfs_refresh_inode() is being called after every
write to the server (which occurs more often than the attribute timer is
set to expire), the attribute timer never expires again for this file past
the ~56 sec point.
In nfs_refresh_inode() I believe the mtime comparison should be moved outside
the check for data_unstable. The server might already have a newer value for
mtime than the value on the client. With this change, the test sees the mtime
change after each write completes on the server.
Regards,
Linda
The following patch is for 2.6.12-rc2:
diff -Nura base/fs/nfs/inode.c new/fs/nfs/inode.c
--- base/fs/nfs/inode.c 2005-04-07 16:04:40.933611205 -0400
+++ new/fs/nfs/inode.c 2005-04-07 16:12:34.484299540 -0400
@@ -1176,9 +1176,11 @@
}
/* Verify a few of the more important attributes */
+ if (!timespec_equal(&inode->i_mtime, &fattr->mtime))
+ nfsi->flags |= NFS_INO_INVALID_ATTR;
+
if (!data_unstable) {
- if (!timespec_equal(&inode->i_mtime, &fattr->mtime)
- || cur_size != new_isize)
+ if (cur_size != new_isize)
nfsi->flags |= NFS_INO_INVALID_ATTR;
} else if (S_ISREG(inode->i_mode) && new_isize > cur_size)
nfsi->flags |= NFS_INO_INVALID_ATTR;
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] mtime attribute is not being updated on client
2005-04-08 0:52 [PATCH] mtime attribute is not being updated on client Linda Dunaphant
@ 2005-04-08 13:11 ` Trond Myklebust
2005-04-08 20:54 ` Linda Dunaphant
0 siblings, 1 reply; 9+ messages in thread
From: Trond Myklebust @ 2005-04-08 13:11 UTC (permalink / raw)
To: linda.dunaphant; +Cc: linux-kernel
to den 07.04.2005 Klokka 20:52 (-0400) skreiv Linda Dunaphant:
> Hi Trond,
>
> The acregmin (default=3) and acregmax (default=60) NFS attributes that
> control the min and max attribute cache lifetimes don't appear to be
> working after the first few timeouts. Using a test program that loops
> on the following sequence:
> - write to a file on an NFS3 mounted filesystem from the client
> - sleep for one second
> - stat the file to get the mtime
> you see the mtime change once after ~56 seconds, but no further mtime
> changes are detected by the test.
>
> nfs_refresh_inode() currently bypasses the inode vs fattr mtime comparison
> if data_unstable is true. At the end of nfs_refresh_inode(), it resets the
> attribute timer. Since nfs_refresh_inode() is being called after every
> write to the server (which occurs more often than the attribute timer is
> set to expire), the attribute timer never expires again for this file past
> the ~56 sec point.
>
> In nfs_refresh_inode() I believe the mtime comparison should be moved outside
> the check for data_unstable. The server might already have a newer value for
> mtime than the value on the client. With this change, the test sees the mtime
> change after each write completes on the server.
>
> Regards,
> Linda
Hi Linda,
I'm a bit unclear as to what your end-goal is here. Is it basically to
ensure that fstat() always return the correct value for the mtime?
The reason I ask is that I think your change is likely to have nasty
consequences for the general performance in a lot of other syscalls that
use nfs_revalidate_inode(). I would expect a particularly nasty hit in
the of the write() syscalls themselves, and they really shouldn't have
to worry about the value of mtime in the close-to-open cache consistency
model.
I therefore think we should look for a more fine-grained solution that
addresses more precisely the issues you see.
Cheers,
Trond
--
Trond Myklebust <trond.myklebust@fys.uio.no>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] mtime attribute is not being updated on client
2005-04-08 13:11 ` Trond Myklebust
@ 2005-04-08 20:54 ` Linda Dunaphant
2005-04-09 1:23 ` Linda Dunaphant
0 siblings, 1 reply; 9+ messages in thread
From: Linda Dunaphant @ 2005-04-08 20:54 UTC (permalink / raw)
To: Trond Myklebust; +Cc: linux-kernel
On Fri, 2005-04-08 at 09:11, Trond Myklebust wrote:
> I'm a bit unclear as to what your end-goal is here. Is it basically to
>ensure that fstat() always return the correct value for the mtime?
>
> The reason I ask is that I think your change is likely to have nasty
>consequences for the general performance in a lot of other syscalls that
>use nfs_revalidate_inode(). I would expect a particularly nasty hit in
>the of the write() syscalls themselves, and they really shouldn't have
>to worry about the value of mtime in the close-to-open cache consistency
>model.
>I therefore think we should look for a more fine-grained solution that
>addresses more precisely the issues you see.
>
>Cheers,
> Trond
Hi Trond,
The goal wasn't to ensure that fstat() always return the correct value for
mtime. The goal is to update the mtime within the bounds of the min and max
attribute cache timeouts, which was not happening before if the test ran
for more than a minute.
nfs_refresh_inode() was already being called after every write to the server
and fattr->mtime was already set to the server's updated mtime value. However,
it didn't check for an updated mtime value if data_unstable was set. Since
nfs_refresh_inode() always resets the attribute timer (even when it skipped
the mtime check), and the calls to it occurred more frequently than the
attribute timer could expire, nfs_update_inode() was never being called
again to update the inode's mtime.
With the change I proposed, the test shows an mtime change every ~32 secs
which corresponds to when the client writes the data to the server. Before
this change, the test only showed one mtime change, even when it was run
for > 10 mins. I did not see any increase in the calls to either
nfs_revalidate_inode() or __nfs_revalidate_inode().
Do you think it would be better for nfs_refresh_inode() to check the mtime,
perform the mtime update if needed, and not set the NFS_INO_INVALID_ATTR
flag if the data_unstable flag is set? This is how nfs_update_inode()
handles its mtime check.
Regards,
Linda
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] mtime attribute is not being updated on client
2005-04-08 20:54 ` Linda Dunaphant
@ 2005-04-09 1:23 ` Linda Dunaphant
2005-06-07 0:28 ` Linda Dunaphant
0 siblings, 1 reply; 9+ messages in thread
From: Linda Dunaphant @ 2005-04-09 1:23 UTC (permalink / raw)
To: Trond Myklebust; +Cc: linux-kernel
On Fri, 2005-04-08 at 16:54, Linda Dunaphant wrote:
>Do you think it would be better for nfs_refresh_inode() to check the mtime,
>perform the mtime update if needed, and not set the NFS_INO_INVALID_ATTR
>flag if the data_unstable flag is set? This is how nfs_update_inode()
>handles its mtime check.
Hi again Trond,
I updated my first patch to nfs_refresh_inode() to be similar to the way
nfs_update_inode() does the check and update of the mtime. nfs_refresh_inode()
now checks to see if the mtime changed, and if so, it does the update of
the mtime. It only sets NFS_INO_INVALID_ATTR if data_unstable is not set.
I temporarily added some printk's to selected functions to monitor some of
the functions called after the data write to the server occurs. With this
latest patch, the sequence that I see with the test program is now the
same as it was originally without any patch - except the mtime is has been
updated:
nfs3_xdr_writeargs
xdr_decode_fattr <--- new mtime from server
nfs_refresh_inode <--- updates mtime in inode
nfs_attribute_timeout
nfs_attribute_timeout
xdr_decode_fattr
nfs_refresh_inode
With the first patch I proposed this sequence was:
nfs3_xdr_writeargs
xdr_decode_fattr <--- new mtime from server
nfs_refresh_inode <--- NFS_INO_INVALID_ATTR set
xdr_decode_fattr
nfs_update_inode <--- updates mtime in inode
nfs_attribute_timeout
xdr_decode_fattr
Thank you for pointing out that there may be other consequences if the
NFS_INO_INVALID_ATTR is always set by nfs_refresh_inode() when the mtime
values are different. I believe this second patch fixes the original
problem without affecting any other behaviour.
Cheers,
Linda
diff -ura base/fs/nfs/inode.c new/fs/nfs/inode.c
--- base/fs/nfs/inode.c 2005-04-07 16:04:40.000000000 -0400
+++ new/fs/nfs/inode.c 2005-04-08 19:23:44.151698674 -0400
@@ -1176,9 +1176,17 @@
}
/* Verify a few of the more important attributes */
+ if (!timespec_equal(&inode->i_mtime, &fattr->mtime)) {
+ memcpy(&inode->i_mtime, &fattr->mtime, sizeof(inode->i_mtime));
+#ifdef NFS_DEBUG_VERBOSE
+ printk(KERN_DEBUG "NFS: mtime change on %s/%ld\n", inode->i_sb->s_id, inode->i_ino);
+#endif
+ if (!data_unstable)
+ nfsi->flags |= NFS_INO_INVALID_ATTR;
+ }
+
if (!data_unstable) {
- if (!timespec_equal(&inode->i_mtime, &fattr->mtime)
- || cur_size != new_isize)
+ if (cur_size != new_isize)
nfsi->flags |= NFS_INO_INVALID_ATTR;
} else if (S_ISREG(inode->i_mode) && new_isize > cur_size)
nfsi->flags |= NFS_INO_INVALID_ATTR;
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] mtime attribute is not being updated on client
2005-04-09 1:23 ` Linda Dunaphant
@ 2005-06-07 0:28 ` Linda Dunaphant
2005-06-07 2:29 ` NFS: NFS3 lookup fails if creating a file with O_EXCL & multiple mountpoints in pathname Linda Dunaphant
0 siblings, 1 reply; 9+ messages in thread
From: Linda Dunaphant @ 2005-06-07 0:28 UTC (permalink / raw)
To: Trond Myklebust; +Cc: linux-kernel
Trond,
After running for awhile with the mtime related changes that I made to
nfs_refresh_inode() on 4/8/05, I noticed that I would sometimes see
stale file data on an an NFS client if the file had been updated but
it's new filesize was the same as the old filesize. The clients were
seeing the correct mtime for the file, but the data was stale.
In my previous change for mtime in nfs_refresh_inode(), the inode was
being marked invalid, but the data was not. In the following patch, if
the mtime is updated and the data_unstable flag is not true, the inode
and the data will both be marked invalid. These changes solved the
original mtime issue as well as the stale data problem that I was
seeing.
Linda
The following patch is for 2.6.12-rc2:
diff -Nrau base/fs/nfs/inode.c new/fs/nfs/inode.c
--- base/fs/nfs/inode.c 2005-04-07 16:04:40.000000000 -0400
+++ new/fs/nfs/inode.c 2005-04-18 21:48:28.000000000 -0400
@@ -1176,9 +1176,17 @@
}
/* Verify a few of the more important attributes */
+ if (!timespec_equal(&inode->i_mtime, &fattr->mtime)) {
+ memcpy(&inode->i_mtime, &fattr->mtime, sizeof(inode->i_mtime));
+#ifdef NFS_DEBUG_VERBOSE
+ printk(KERN_DEBUG "NFS: mtime change on %s/%ld\n", inode->i_sb->s_id, inode->i_ino);
+#endif
+ if (!data_unstable)
+ nfsi->flags |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA;
+ }
+
if (!data_unstable) {
- if (!timespec_equal(&inode->i_mtime, &fattr->mtime)
- || cur_size != new_isize)
+ if (cur_size != new_isize)
nfsi->flags |= NFS_INO_INVALID_ATTR;
} else if (S_ISREG(inode->i_mode) && new_isize > cur_size)
nfsi->flags |= NFS_INO_INVALID_ATTR;
^ permalink raw reply [flat|nested] 9+ messages in thread
* NFS: NFS3 lookup fails if creating a file with O_EXCL & multiple mountpoints in pathname
2005-06-07 0:28 ` Linda Dunaphant
@ 2005-06-07 2:29 ` Linda Dunaphant
2005-06-07 3:01 ` Trond Myklebust
2005-06-07 13:41 ` Trond Myklebust
0 siblings, 2 replies; 9+ messages in thread
From: Linda Dunaphant @ 2005-06-07 2:29 UTC (permalink / raw)
To: Trond Myklebust; +Cc: linux-kernel
Hi Trond,
One of our applications was doing an open on a NFS client with the flags
set for O_WRONLY|O_CREAT|O_EXCL|0x4. There are numerous symlinks in the
pathname for the file that go into two different NFS filesystems. The
file create works properly if the filesystems are mounted as NFS2, but
it fails with NFS3 mounts.
In nfs_lookup() there is a check to see if it is an exclusive create.
If it is exclusive, nfs_lookup() bypasses ("optimizes" away) the rest
of the lookup. When the problem occurs, I see it stop the lookup too
early. This only occurs when the basename portion of the pathname is
not currently in the NFS cache from a previous non- O_EXCL pathname lookup.
The nfs_is_exclusive_create() was returning TRUE, even when the
search wasn't at the last pathname component. This occurred because
the LOOKUP_CONTINUE flag is reset when it crosses the second
mountpoint into the NFS filesystem. nfs_is_exclusive_create() was
trying to use this flag to determine if it was at the end point
of the pathname search. Since it was reset when it crossed the
mountpoint boundary, it returned TRUE several pathname levels too
early.
I changed the nfs_is_exclusive_create() to use the LOOKUP_PARENT
flag instead of the LOOKUP_CONTINUE flag. I found that LOOKUP_PARENT
stays set until you get to the very last pathname level, which is
the correct point for it to check whether it is an exclusive create.
The following patch for 2.6.12-rc5 fixed the problem.
Regards,
Linda
diff -ura base/fs/nfs/dir.c new/fs/nfs/dir.c
--- base/fs/nfs/dir.c 2005-06-06 20:59:22.000000000 -0400
+++ new/fs/nfs/dir.c 2005-06-06 22:14:24.000000000 -0400
@@ -705,7 +705,7 @@
{
if (NFS_PROTO(dir)->version == 2)
return 0;
- if (!nd || (nd->flags & LOOKUP_CONTINUE) || !(nd->flags & LOOKUP_CREATE))
+ if (!nd || (nd->flags & LOOKUP_PARENT) || !(nd->flags & LOOKUP_CREATE))
return 0;
return (nd->intent.open.flags & O_EXCL) != 0;
}
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: NFS: NFS3 lookup fails if creating a file with O_EXCL & multiple mountpoints in pathname
2005-06-07 2:29 ` NFS: NFS3 lookup fails if creating a file with O_EXCL & multiple mountpoints in pathname Linda Dunaphant
@ 2005-06-07 3:01 ` Trond Myklebust
2005-06-07 13:41 ` Trond Myklebust
1 sibling, 0 replies; 9+ messages in thread
From: Trond Myklebust @ 2005-06-07 3:01 UTC (permalink / raw)
To: linda.dunaphant; +Cc: linux-kernel
må den 06.06.2005 Klokka 22:29 (-0400) skreiv Linda Dunaphant:
> Hi Trond,
>
> One of our applications was doing an open on a NFS client with the flags
> set for O_WRONLY|O_CREAT|O_EXCL|0x4. There are numerous symlinks in the
> pathname for the file that go into two different NFS filesystems. The
> file create works properly if the filesystems are mounted as NFS2, but
> it fails with NFS3 mounts.
>
> In nfs_lookup() there is a check to see if it is an exclusive create.
> If it is exclusive, nfs_lookup() bypasses ("optimizes" away) the rest
> of the lookup. When the problem occurs, I see it stop the lookup too
> early. This only occurs when the basename portion of the pathname is
> not currently in the NFS cache from a previous non- O_EXCL pathname lookup.
>
> The nfs_is_exclusive_create() was returning TRUE, even when the
> search wasn't at the last pathname component. This occurred because
> the LOOKUP_CONTINUE flag is reset when it crosses the second
> mountpoint into the NFS filesystem. nfs_is_exclusive_create() was
> trying to use this flag to determine if it was at the end point
> of the pathname search. Since it was reset when it crossed the
> mountpoint boundary, it returned TRUE several pathname levels too
> early.
Then the correct fix here is to fix the LOOKUP_CONTINUE flag so that it
is set in the above situations.
> I changed the nfs_is_exclusive_create() to use the LOOKUP_PARENT
> flag instead of the LOOKUP_CONTINUE flag. I found that LOOKUP_PARENT
> stays set until you get to the very last pathname level, which is
> the correct point for it to check whether it is an exclusive create.
Vetoed. LOOKUP_PARENT and LOOKUP_CONTINUE are _very_ different flags.
The former tells you what the goal of the lookup operation is. The
latter tells you about the current state of the lookup operation (namely
whether or not the VFS thinks it is on the last element of the path
lookup).
Cheers,
Trond
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: NFS: NFS3 lookup fails if creating a file with O_EXCL & multiple mountpoints in pathname
2005-06-07 2:29 ` NFS: NFS3 lookup fails if creating a file with O_EXCL & multiple mountpoints in pathname Linda Dunaphant
2005-06-07 3:01 ` Trond Myklebust
@ 2005-06-07 13:41 ` Trond Myklebust
2005-06-07 22:10 ` Linda Dunaphant
1 sibling, 1 reply; 9+ messages in thread
From: Trond Myklebust @ 2005-06-07 13:41 UTC (permalink / raw)
To: linda.dunaphant; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1250 bytes --]
må den 06.06.2005 Klokka 22:29 (-0400) skreiv Linda Dunaphant:
> I changed the nfs_is_exclusive_create() to use the LOOKUP_PARENT
> flag instead of the LOOKUP_CONTINUE flag. I found that LOOKUP_PARENT
> stays set until you get to the very last pathname level, which is
> the correct point for it to check whether it is an exclusive create.
The intent information as it stands today should really only be applied
if we're looking up the very last path component of an open()/create()
or access() call. When we're walking the path doing lookups, that rule
basically boils down to: never apply the intent if either
LOOKUP_CONTINUE or LOOKUP_PARENT are set.
It therefore turns out that your patch is in fact a valid fix in the
case of nfs_is_exclusive_create() since the VFS is guaranteed to always
call LOOKUP_PARENT first (my apologies).
However, when hunting through the NFS lookup code, I found several other
cases where we check for an intent and where we do need the more general
test. To avoid confusion, I'd therefore prefer to introduce a helper
that _always_ returns the correct intent information for the component
that is being looked up.
Could you therefore check if this patch works for you?
Cheers,
Trond
[-- Attachment #2: linux-2.6.12-00-fix_O_EXCL.dif --]
[-- Type: text/plain, Size: 3671 bytes --]
[NFS] Fix lookup intent handling
We should never apply a lookup intent to the path component of a
LOOKUP_PARENT call.
Introduce nd_lookup_intent() which always returns zero if LOOKUP_CONTINUE
or LOOKUP_PARENT is set, and otherwise returns the intent flags.
Problem noticed by Linda Dunaphant <linda.dunaphant@ccur.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
dir.c | 49 +++++++++++++++++++++++++++++++++++--------------
1 files changed, 35 insertions(+), 14 deletions(-)
Index: linux-2.6.12-rc6/fs/nfs/dir.c
===================================================================
--- linux-2.6.12-rc6.orig/fs/nfs/dir.c
+++ linux-2.6.12-rc6/fs/nfs/dir.c
@@ -528,19 +528,39 @@ static inline void nfs_renew_times(struc
dentry->d_time = jiffies;
}
+/*
+ * Return the intent data that applies to this particular path component
+ *
+ * Note that the current set of intents only apply to the very last
+ * component of the path.
+ * We check for this using LOOKUP_CONTINUE and LOOKUP_PARENT.
+ */
+static inline unsigned int lookup_check_intent(struct nameidata *nd, unsigned int mask)
+{
+ if (nd->flags & (LOOKUP_CONTINUE|LOOKUP_PARENT))
+ return 0;
+ return nd->flags & mask;
+}
+
+/*
+ * Inode and filehandle revalidation for lookups.
+ *
+ * We force revalidation in the cases where the VFS sets LOOKUP_REVAL,
+ * or if the intent information indicates that we're about to open this
+ * particular file and the "nocto" mount flag is not set.
+ *
+ */
static inline
int nfs_lookup_verify_inode(struct inode *inode, struct nameidata *nd)
{
struct nfs_server *server = NFS_SERVER(inode);
if (nd != NULL) {
- int ndflags = nd->flags;
/* VFS wants an on-the-wire revalidation */
- if (ndflags & LOOKUP_REVAL)
+ if (nd->flags & LOOKUP_REVAL)
goto out_force;
/* This is an open(2) */
- if ((ndflags & LOOKUP_OPEN) &&
- !(ndflags & LOOKUP_CONTINUE) &&
+ if (lookup_check_intent(nd, LOOKUP_OPEN) != 0 &&
!(server->flags & NFS_MOUNT_NOCTO))
goto out_force;
}
@@ -560,12 +580,8 @@ static inline
int nfs_neg_need_reval(struct inode *dir, struct dentry *dentry,
struct nameidata *nd)
{
- int ndflags = 0;
-
- if (nd)
- ndflags = nd->flags;
/* Don't revalidate a negative dentry if we're creating a new file */
- if ((ndflags & LOOKUP_CREATE) && !(ndflags & LOOKUP_CONTINUE))
+ if (nd != NULL && lookup_check_intent(nd, LOOKUP_CREATE) != 0)
return 0;
return !nfs_check_verifier(dir, dentry);
}
@@ -700,12 +716,16 @@ struct dentry_operations nfs_dentry_oper
.d_iput = nfs_dentry_iput,
};
+/*
+ * Use intent information to check whether or not we're going to do
+ * an O_EXCL create using this path component.
+ */
static inline
int nfs_is_exclusive_create(struct inode *dir, struct nameidata *nd)
{
if (NFS_PROTO(dir)->version == 2)
return 0;
- if (!nd || (nd->flags & LOOKUP_CONTINUE) || !(nd->flags & LOOKUP_CREATE))
+ if (nd == NULL || lookup_check_intent(nd, LOOKUP_CREATE) == 0)
return 0;
return (nd->intent.open.flags & O_EXCL) != 0;
}
@@ -772,12 +792,13 @@ struct dentry_operations nfs4_dentry_ope
.d_iput = nfs_dentry_iput,
};
+/*
+ * Use intent information to determine whether we need to substitute
+ * the NFSv4-style stateful OPEN for the LOOKUP call
+ */
static int is_atomic_open(struct inode *dir, struct nameidata *nd)
{
- if (!nd)
- return 0;
- /* Check that we are indeed trying to open this file */
- if ((nd->flags & LOOKUP_CONTINUE) || !(nd->flags & LOOKUP_OPEN))
+ if (nd == NULL || lookup_check_intent(nd, LOOKUP_OPEN) == 0)
return 0;
/* NFS does not (yet) have a stateful open for directories */
if (nd->flags & LOOKUP_DIRECTORY)
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: NFS: NFS3 lookup fails if creating a file with O_EXCL & multiple mountpoints in pathname
2005-06-07 13:41 ` Trond Myklebust
@ 2005-06-07 22:10 ` Linda Dunaphant
0 siblings, 0 replies; 9+ messages in thread
From: Linda Dunaphant @ 2005-06-07 22:10 UTC (permalink / raw)
To: Trond Myklebust; +Cc: linux-kernel
Hi Trond,
Thank you so much for looking into this further!
I duplicated the original problem using the unpatched 2.6.12-rc6. I then
applied your patch to 2.6.12-rc6 and retested. The original problem was
fixed by your patch.
I now need to backfit your patch to the older kernel that I am working
with.
Thanks again very much for your help!
Cheers!
Linda
On Tue, 2005-06-07 at 09:41, Trond Myklebust wrote:
> må den 06.06.2005 Klokka 22:29 (-0400) skreiv Linda Dunaphant:
> > I changed the nfs_is_exclusive_create() to use the LOOKUP_PARENT
> > flag instead of the LOOKUP_CONTINUE flag. I found that LOOKUP_PARENT
> > stays set until you get to the very last pathname level, which is
> > the correct point for it to check whether it is an exclusive create.
>
> The intent information as it stands today should really only be applied
> if we're looking up the very last path component of an open()/create()
> or access() call. When we're walking the path doing lookups, that rule
> basically boils down to: never apply the intent if either
> LOOKUP_CONTINUE or LOOKUP_PARENT are set.
>
> It therefore turns out that your patch is in fact a valid fix in the
> case of nfs_is_exclusive_create() since the VFS is guaranteed to always
> call LOOKUP_PARENT first (my apologies).
> However, when hunting through the NFS lookup code, I found several other
> cases where we check for an intent and where we do need the more general
> test. To avoid confusion, I'd therefore prefer to introduce a helper
> that _always_ returns the correct intent information for the component
> that is being looked up.
>
> Could you therefore check if this patch works for you?
>
> Cheers,
> Trond
>
>
> ______________________________________________________________________
> [NFS] Fix lookup intent handling
>
> We should never apply a lookup intent to the path component of a
> LOOKUP_PARENT call.
> Introduce nd_lookup_intent() which always returns zero if LOOKUP_CONTINUE
> or LOOKUP_PARENT is set, and otherwise returns the intent flags.
>
> Problem noticed by Linda Dunaphant <linda.dunaphant@ccur.com>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
> dir.c | 49 +++++++++++++++++++++++++++++++++++--------------
> 1 files changed, 35 insertions(+), 14 deletions(-)
>
> Index: linux-2.6.12-rc6/fs/nfs/dir.c
> ===================================================================
> --- linux-2.6.12-rc6.orig/fs/nfs/dir.c
> +++ linux-2.6.12-rc6/fs/nfs/dir.c
> @@ -528,19 +528,39 @@ static inline void nfs_renew_times(struc
> dentry->d_time = jiffies;
> }
>
> +/*
> + * Return the intent data that applies to this particular path component
> + *
> + * Note that the current set of intents only apply to the very last
> + * component of the path.
> + * We check for this using LOOKUP_CONTINUE and LOOKUP_PARENT.
> + */
> +static inline unsigned int lookup_check_intent(struct nameidata *nd, unsigned int mask)
> +{
> + if (nd->flags & (LOOKUP_CONTINUE|LOOKUP_PARENT))
> + return 0;
> + return nd->flags & mask;
> +}
> +
> +/*
> + * Inode and filehandle revalidation for lookups.
> + *
> + * We force revalidation in the cases where the VFS sets LOOKUP_REVAL,
> + * or if the intent information indicates that we're about to open this
> + * particular file and the "nocto" mount flag is not set.
> + *
> + */
> static inline
> int nfs_lookup_verify_inode(struct inode *inode, struct nameidata *nd)
> {
> struct nfs_server *server = NFS_SERVER(inode);
>
> if (nd != NULL) {
> - int ndflags = nd->flags;
> /* VFS wants an on-the-wire revalidation */
> - if (ndflags & LOOKUP_REVAL)
> + if (nd->flags & LOOKUP_REVAL)
> goto out_force;
> /* This is an open(2) */
> - if ((ndflags & LOOKUP_OPEN) &&
> - !(ndflags & LOOKUP_CONTINUE) &&
> + if (lookup_check_intent(nd, LOOKUP_OPEN) != 0 &&
> !(server->flags & NFS_MOUNT_NOCTO))
> goto out_force;
> }
> @@ -560,12 +580,8 @@ static inline
> int nfs_neg_need_reval(struct inode *dir, struct dentry *dentry,
> struct nameidata *nd)
> {
> - int ndflags = 0;
> -
> - if (nd)
> - ndflags = nd->flags;
> /* Don't revalidate a negative dentry if we're creating a new file */
> - if ((ndflags & LOOKUP_CREATE) && !(ndflags & LOOKUP_CONTINUE))
> + if (nd != NULL && lookup_check_intent(nd, LOOKUP_CREATE) != 0)
> return 0;
> return !nfs_check_verifier(dir, dentry);
> }
> @@ -700,12 +716,16 @@ struct dentry_operations nfs_dentry_oper
> .d_iput = nfs_dentry_iput,
> };
>
> +/*
> + * Use intent information to check whether or not we're going to do
> + * an O_EXCL create using this path component.
> + */
> static inline
> int nfs_is_exclusive_create(struct inode *dir, struct nameidata *nd)
> {
> if (NFS_PROTO(dir)->version == 2)
> return 0;
> - if (!nd || (nd->flags & LOOKUP_CONTINUE) || !(nd->flags & LOOKUP_CREATE))
> + if (nd == NULL || lookup_check_intent(nd, LOOKUP_CREATE) == 0)
> return 0;
> return (nd->intent.open.flags & O_EXCL) != 0;
> }
> @@ -772,12 +792,13 @@ struct dentry_operations nfs4_dentry_ope
> .d_iput = nfs_dentry_iput,
> };
>
> +/*
> + * Use intent information to determine whether we need to substitute
> + * the NFSv4-style stateful OPEN for the LOOKUP call
> + */
> static int is_atomic_open(struct inode *dir, struct nameidata *nd)
> {
> - if (!nd)
> - return 0;
> - /* Check that we are indeed trying to open this file */
> - if ((nd->flags & LOOKUP_CONTINUE) || !(nd->flags & LOOKUP_OPEN))
> + if (nd == NULL || lookup_check_intent(nd, LOOKUP_OPEN) == 0)
> return 0;
> /* NFS does not (yet) have a stateful open for directories */
> if (nd->flags & LOOKUP_DIRECTORY)
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2005-06-07 22:11 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-04-08 0:52 [PATCH] mtime attribute is not being updated on client Linda Dunaphant
2005-04-08 13:11 ` Trond Myklebust
2005-04-08 20:54 ` Linda Dunaphant
2005-04-09 1:23 ` Linda Dunaphant
2005-06-07 0:28 ` Linda Dunaphant
2005-06-07 2:29 ` NFS: NFS3 lookup fails if creating a file with O_EXCL & multiple mountpoints in pathname Linda Dunaphant
2005-06-07 3:01 ` Trond Myklebust
2005-06-07 13:41 ` Trond Myklebust
2005-06-07 22:10 ` Linda Dunaphant
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox