All of lore.kernel.org
 help / color / mirror / Atom feed
* time_after cannot be used alone by NFS code in 32bit architectures
@ 2007-04-26  4:33 Fabio Olive Leite
  2007-04-26  6:13 ` Neil Brown
  2007-05-02 11:52 ` Ian Kent
  0 siblings, 2 replies; 13+ messages in thread
From: Fabio Olive Leite @ 2007-04-26  4:33 UTC (permalink / raw)
  To: linux-nfs

[-- Attachment #1: Type: text/plain, Size: 4093 bytes --]

Hi nfs@,

This is going to be tough, so bear with me for a while. :)

While I was trying to understand the effects of wrapping jiffies in
the various timestamp comparisons the NFS code does among its data
structures, I was struck by cases where the code flow just did not
make sense (checking with nfs_debug set to 32767 and additional
systemtap scripts to extract debug info), considering the jiffy
timestamps involved. This is the investigation that ensued.

time_after and friends provide a very clever way to compare two jiffy
counts when those two counts are close together, and indeed does
provide sign flip resistance, when the two values are close together.
So it is ideal for short-lived timers and timeouts, since one would be
usually comparing values a few thousands or even millions apart.

But unfortunately for the NFS code, time_after and friends alone are
not well suited if we're just comparing jiffy counts taken at random
times in the past instead of at moments close together. Exactly the
protection against sign flips backfires when comparing timestamps over
2 billion counts apart.

The NFS data structures can easily be left idle for over 2 billion
jiffies, which is about 25 days with an HZ of 1000. So consider a
file's nfs_fattr data has a timestamp of 0x3fffffff. About 25 days
later this file is accessed again, and the new attributes from the
server get tagged with a timestamp of 0xc0000000. Those new attributes
are clearly newer than what we had before, yet
time_after(0xc0000000,0x3fffffff) returns 0, and the older attributes
remain in use. Yes, those values are corner cases, but they are
actually just the limits of a range of values with similar behavior.

The ascii chart below shows that the sign flip protection of
time_after extends to a lot more than what we'd like for the NFS
timestamp comparisons.

t      b  0 0 0 3 4 7 7 7 8 8 8 b c f f f
 i        0 0 0 f 0 f f f 0 0 0 f 0 f f f
  m       0 0 0 f 0 f f f 0 0 0 f 0 f f f
   e      0 0 0 f 0 f f f 0 0 0 f 0 f f f
    _     0 0 0 f 0 f f f 0 0 0 f 0 f f f
     a    0 0 0 f 0 f f f 0 0 0 f 0 f f f
      f   0 0 0 f 0 f f f 0 0 0 f 0 f f f
a      t  0 1 2 f 0 d e f 0 1 2 f 0 d e f

00000000  0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 
00000001  1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 
00000002  1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 
3fffffff  1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 
40000000  1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 
7ffffffd  1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 
7ffffffe  1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 
7fffffff  1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 
80000000  1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 
80000001  0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 
80000002  0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 
bfffffff  0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 
c0000000  0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 
fffffffd  0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 
fffffffe  0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 
ffffffff  0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 

This leads to several problematic situations in long-lived NFS mounts.
We can see old attributes not being updated, negative dentries that do
not revalidate even after touching their parent dir, etc.

So something more robust is needed when comparing two timestamps in
NFS data structures, or we should convert the code to use 64bit
jiffies everywhere, even on 32bit architectures. This probably affects
many other things in the rest of the kernel, but I don't think many
other data structures are as long-lived as the NFS ones.

Attached are two small programs to help understand the issue, one that
generates the matrix above, the other being a simple time_after
calculator for checking specific values. Compile and run on 32bit
boxes, of course.

Although this sounds like the perfect push for using autofs, I'm sure
long-lived (over 100 days, allowing for two rounds of u32 jiffies and
timestamps all over the place) NFS mounts on 32bit architectures will
still be very common for a long time. :)

Comments? I'd love to be proved wrong, as right now I'm very scared.

Regards,
Fábio
-- 
ex sed lex awk yacc, e pluribus unix, amem

[-- Attachment #2: time_after_matrix.c --]
[-- Type: text/plain, Size: 1521 bytes --]

#include <stdio.h>

#define typecheck(type,x) \
({      type __dummy; \
        typeof(x) __dummy2; \
        (void)(&__dummy == &__dummy2); \
        1; \
})

#define time_after(a,b)         \
        (typecheck(unsigned long, a) && \
         typecheck(unsigned long, b) && \
         ((long)(b) - (long)(a) < 0))
#define time_before(a,b)        time_after(b,a)

#define time_after_eq(a,b)      \
        (typecheck(unsigned long, a) && \
         typecheck(unsigned long, b) && \
         ((long)(a) - (long)(b) >= 0))
#define time_before_eq(a,b)     time_after_eq(b,a)

int main() {
	unsigned long i, j;
	unsigned long values[] = {
		0x00000000,
		0x00000001,
		0x00000002,
		0x3fffffff,
		0x40000000,
		0x7ffffffd,
		0x7ffffffe,
		0x7fffffff,
		0x80000000,
		0x80000001,
		0x80000002,
		0xbfffffff,
		0xc0000000,
		0xfffffffd,
		0xfffffffe,
		0xffffffff
	};

	/* Lame! =) */
	printf("t      b  0 0 0 3 4 7 7 7 8 8 8 b c f f f\n");
	printf(" i        0 0 0 f 0 f f f 0 0 0 f 0 f f f\n");
	printf("  m       0 0 0 f 0 f f f 0 0 0 f 0 f f f\n");
	printf("   e      0 0 0 f 0 f f f 0 0 0 f 0 f f f\n");
	printf("    _     0 0 0 f 0 f f f 0 0 0 f 0 f f f\n");
	printf("     a    0 0 0 f 0 f f f 0 0 0 f 0 f f f\n");
	printf("      f   0 0 0 f 0 f f f 0 0 0 f 0 f f f\n");
	printf("a      t  0 1 2 f 0 d e f 0 1 2 f 0 d e f\n");
	printf("\n");

	for (i = 0; i < 16; i++) {
		printf("%08lx  ", values[i]);
		for (j = 0; j < 16; j++) {
			printf("%d ", time_after(values[i],values[j]));
		}
		printf("\n");
	}

	return 0;
}

[-- Attachment #3: time_after_calc.c --]
[-- Type: text/plain, Size: 802 bytes --]

#include <stdio.h>

#define typecheck(type,x) \
({      type __dummy; \
        typeof(x) __dummy2; \
        (void)(&__dummy == &__dummy2); \
        1; \
})

#define time_after(a,b)         \
        (typecheck(unsigned long, a) && \
         typecheck(unsigned long, b) && \
         ((long)(b) - (long)(a) < 0))
#define time_before(a,b)        time_after(b,a)

#define time_after_eq(a,b)      \
        (typecheck(unsigned long, a) && \
         typecheck(unsigned long, b) && \
         ((long)(a) - (long)(b) >= 0))
#define time_before_eq(a,b)     time_after_eq(b,a)

int main() {
	unsigned long a, b;

	printf("Type in series of two space-separated hex values, ^D to end.\n");

	while (scanf("%lx %lx", &a, &b) == 2)
		printf("time_after(%lx,%lx) == %d\n", a, b, time_after(a,b));

	return 0;
}

[-- Attachment #4: Type: text/plain, Size: 286 bytes --]

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/

[-- Attachment #5: Type: text/plain, Size: 140 bytes --]

_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: time_after cannot be used alone by NFS code in 32bit architectures
  2007-04-26  4:33 time_after cannot be used alone by NFS code in 32bit architectures Fabio Olive Leite
@ 2007-04-26  6:13 ` Neil Brown
  2007-05-01  4:04   ` Ian Kent
  2007-05-02 11:52 ` Ian Kent
  1 sibling, 1 reply; 13+ messages in thread
From: Neil Brown @ 2007-04-26  6:13 UTC (permalink / raw)
  To: Fabio Olive Leite; +Cc: linux-nfs

On Thursday April 26, fleite@redhat.com wrote:
> Hi nfs@,
> 
> This is going to be tough, so bear with me for a while. :)
> 
> While I was trying to understand the effects of wrapping jiffies in
> the various timestamp comparisons the NFS code does among its data
> structures, I was struck by cases where the code flow just did not
> make sense (checking with nfs_debug set to 32767 and additional
> systemtap scripts to extract debug info), considering the jiffy
> timestamps involved. This is the investigation that ensued.
> 
> time_after and friends provide a very clever way to compare two jiffy
> counts when those two counts are close together, and indeed does
> provide sign flip resistance, when the two values are close together.
> So it is ideal for short-lived timers and timeouts, since one would be
> usually comparing values a few thousands or even millions apart.
> 
> But unfortunately for the NFS code, time_after and friends alone are
> not well suited if we're just comparing jiffy counts taken at random
> times in the past instead of at moments close together. Exactly the
> protection against sign flips backfires when comparing timestamps over
> 2 billion counts apart.

Having found and fixed one issue with jiffie wrap recently (after 24
days, some web servers using NFS to access static content started
generating much higher levels of GETATTR requests), I understand the
possible problems, and agree that there could be issues.

However when I went looking to see if there could be other issues with
jiffie wrapping, my cursory look suggested not.  Most data structures
(inodes, dentries) that are completely unused for 24 days are
extremely likely to get flushed out of cache (there will be something
else that wants the memory) and anything that is used will get
timestamps refreshed regularly based on the attribute timeout in nfs
(which is usually less than one minute).

So I think this is only likely to be a problem for timestamps that do
not get refreshed in the general running on NFS.

One example is nfsi->cache_change_attribute.  If a file is unmodified
for 24 days, that can wrap, but there is now a fix for this in place.
Commit 3e7d950a528454ad749a264feef3c8bad3faa108 in Linus' git tree.

You seem to have found an identical issue with directories.
Presumably a similar fix can be used there.

Are you aware of any other jiffie-based timestamps that can actually
overflow before they are purged from memory or refreshed?

It is possible that 'cache_change_attribute' should be switch to a
64bit jiffie counter, but I don't think the problem is more general
than that.

NeilBrown

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: time_after cannot be used alone by NFS code in 32bit architectures
  2007-04-26  6:13 ` Neil Brown
@ 2007-05-01  4:04   ` Ian Kent
  0 siblings, 0 replies; 13+ messages in thread
From: Ian Kent @ 2007-05-01  4:04 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-nfs

On Thu, 2007-04-26 at 16:13 +1000, Neil Brown wrote:
> On Thursday April 26, fleite@redhat.com wrote:
> > Hi nfs@,
> > 
> > This is going to be tough, so bear with me for a while. :)
> > 
> > While I was trying to understand the effects of wrapping jiffies in
> > the various timestamp comparisons the NFS code does among its data
> > structures, I was struck by cases where the code flow just did not
> > make sense (checking with nfs_debug set to 32767 and additional
> > systemtap scripts to extract debug info), considering the jiffy
> > timestamps involved. This is the investigation that ensued.
> > 
> > time_after and friends provide a very clever way to compare two jiffy
> > counts when those two counts are close together, and indeed does
> > provide sign flip resistance, when the two values are close together.
> > So it is ideal for short-lived timers and timeouts, since one would be
> > usually comparing values a few thousands or even millions apart.
> > 
> > But unfortunately for the NFS code, time_after and friends alone are
> > not well suited if we're just comparing jiffy counts taken at random
> > times in the past instead of at moments close together. Exactly the
> > protection against sign flips backfires when comparing timestamps over
> > 2 billion counts apart.
> 
> Having found and fixed one issue with jiffie wrap recently (after 24
> days, some web servers using NFS to access static content started
> generating much higher levels of GETATTR requests), I understand the
> possible problems, and agree that there could be issues.
> 
> However when I went looking to see if there could be other issues with
> jiffie wrapping, my cursory look suggested not.  Most data structures
> (inodes, dentries) that are completely unused for 24 days are
> extremely likely to get flushed out of cache (there will be something
> else that wants the memory) and anything that is used will get
> timestamps refreshed regularly based on the attribute timeout in nfs
> (which is usually less than one minute).

Yes, but I think Fabio is talking about negative "hashed" dentrys here.
They won't be going away will they?

> 
> So I think this is only likely to be a problem for timestamps that do
> not get refreshed in the general running on NFS.
> 
> One example is nfsi->cache_change_attribute.  If a file is unmodified
> for 24 days, that can wrap, but there is now a fix for this in place.
> Commit 3e7d950a528454ad749a264feef3c8bad3faa108 in Linus' git tree.

Yes, I saw that when I was poking around.
I got the impression there was another problem field but I haven't
looked further, I'm probably wrong about that.

This patch relies on the revalidation of the directory of a negative
hashed dentry during lookup which will happen during the path walk.

Fabio, can you offer a case where this won't work?

Ian



-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: time_after cannot be used alone by NFS code in 32bit architectures
  2007-04-26  4:33 time_after cannot be used alone by NFS code in 32bit architectures Fabio Olive Leite
  2007-04-26  6:13 ` Neil Brown
@ 2007-05-02 11:52 ` Ian Kent
  2007-05-08  2:37   ` Ian Kent
  1 sibling, 1 reply; 13+ messages in thread
From: Ian Kent @ 2007-05-02 11:52 UTC (permalink / raw)
  To: Fabio Olive Leite; +Cc: linux-nfs

On Thu, 2007-04-26 at 01:33 -0300, Fabio Olive Leite wrote: 
> This leads to several problematic situations in long-lived NFS mounts.
> We can see old attributes not being updated, negative dentries that do
> not revalidate even after touching their parent dir, etc.
> 

This is quite an interesting issue and I'm still having trouble
understanding it so let me try and explain my understanding of part of
it.

Lets consider the d_fsdata field of a negative dentry and the
nfs_neg_need_reval() function. Assuming no in progress io and the
request is not an exclusive open then nfs_neg_need_reval() will return
false when d_fsdata is after the modification timestamp of the
directory. This leads to the dentry being considered valid and so no
(VFS) lookup is performed. This is the only problem case I can see here
because otherwise the dentry is unhashed by nfs_lookup_revalidate()
forcing a new (VFS) lookup to be done (assuming the reference count of
the negative dentry is really 0).

So the problem you describe actually happens when the directory inode
cache_change_attribute warps and the dentry d_fstada timestamp is later
than inode cache_change_attribute. So updating the
cache_change_attribute field, as in the patch that Neil referred to,
will often not fix it.

Have I got this right yet?

Ian



-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: time_after cannot be used alone by NFS code in 32bit architectures
  2007-05-02 11:52 ` Ian Kent
@ 2007-05-08  2:37   ` Ian Kent
  2007-05-08  2:43     ` Ian Kent
  2007-05-08  6:19     ` Neil Brown
  0 siblings, 2 replies; 13+ messages in thread
From: Ian Kent @ 2007-05-08  2:37 UTC (permalink / raw)
  To: Fabio Olive Leite; +Cc: linux-nfs

On Wed, 2007-05-02 at 19:52 +0800, Ian Kent wrote:
> On Thu, 2007-04-26 at 01:33 -0300, Fabio Olive Leite wrote: 
> > This leads to several problematic situations in long-lived NFS mounts.
> > We can see old attributes not being updated, negative dentries that do
> > not revalidate even after touching their parent dir, etc.
> > 
> 
> This is quite an interesting issue and I'm still having trouble
> understanding it so let me try and explain my understanding of part of
> it.
> 

snip ...

> So the problem you describe actually happens when the directory inode
> cache_change_attribute warps and the dentry d_fstada timestamp is later
> than inode cache_change_attribute. So updating the
> cache_change_attribute field, as in the patch that Neil referred to,
> will often not fix it.

I believe the patch that Neil refers to will ensure that the inode
cache_change_attribute is updated to a reasonably recent time.

> 
> Have I got this right yet?

No response so I guess so!

Then how about this patch.
It doesn't step on other dentry fields.
I haven't tested this except that it compiles because I'm after comments
as to whether it is a sensible approach.

---
--- linux-2.6.21/fs/nfs/dir.c.use-timespec-for-verifier	2007-05-08 09:37:14.000000000 +0800
+++ linux-2.6.21/fs/nfs/dir.c	2007-05-08 10:33:31.000000000 +0800
@@ -633,28 +633,49 @@
 }
 
 /*
+ * Allocate verifier timespec once during lookup and free
+ * it during d_release.
+ */
+static inline int nfs_alloc_verifier(struct dentry *dentry)
+{
+	dentry->d_fsdata = kmalloc(sizeof(struct timespec), GFP_KERNEL);
+	if (!dentry->d_fsdata)
+		return 0;
+	return 1;
+}
+
+/*
  * A check for whether or not the parent directory has changed.
  * In the case it has, we assume that the dentries are untrustworthy
  * and may need to be looked up again.
  */
 static int nfs_check_verifier(struct inode *dir, struct dentry *dentry)
 {
+	struct timespec *chattr = dentry->d_fsdata;
+	struct timespec cache_chattr;
+
 	if (IS_ROOT(dentry))
 		return 1;
 	if ((NFS_I(dir)->cache_validity & NFS_INO_INVALID_ATTR) != 0
 			|| nfs_attribute_timeout(dir))
 		return 0;
-	return nfs_verify_change_attribute(dir, (unsigned long)dentry->d_fsdata);
+	jiffies_to_timespec(NFS_I(dir)->cache_change_attribute, &cache_chattr);
+	return !nfs_caches_unstable(dir)
+		&& (timespec_compare(chattr, &cache_chattr) >= 0);
 }
 
 static inline void nfs_set_verifier(struct dentry * dentry, unsigned long verf)
 {
-	dentry->d_fsdata = (void *)verf;
+	jiffies_to_timespec(verf, (struct timespec *) &dentry->d_fsdata);
 }
 
 static void nfs_refresh_verifier(struct dentry * dentry, unsigned long verf)
 {
-	if (time_after(verf, (unsigned long)dentry->d_fsdata))
+	struct timespec *last = dentry->d_fsdata;
+	struct timespec verifier;
+
+	jiffies_to_timespec(verf, &verifier);
+	if (timespec_compare(&verifier, last) > 0)
 		nfs_set_verifier(dentry, verf);
 }
 
@@ -860,10 +881,17 @@
 	iput(inode);
 }
 
+static void nfs_dentry_release(struct dentry *dentry)
+{
+	if (dentry->d_fsdata)
+		kfree(dentry->d_fsdata);
+}
+
 struct dentry_operations nfs_dentry_operations = {
 	.d_revalidate	= nfs_lookup_revalidate,
 	.d_delete	= nfs_dentry_delete,
 	.d_iput		= nfs_dentry_iput,
+	.d_release	= nfs_dentry_release,
 };
 
 /*
@@ -909,6 +937,8 @@
 
 	res = ERR_PTR(-ENOMEM);
 	dentry->d_op = NFS_PROTO(dir)->dentry_ops;
+	if (!nfs_alloc_verifier(dentry))
+		goto out;
 
 	lock_kernel();
 
@@ -967,6 +997,7 @@
 	.d_revalidate	= nfs_open_revalidate,
 	.d_delete	= nfs_dentry_delete,
 	.d_iput		= nfs_dentry_iput,
+	.d_release	= nfs_dentry_release,
 };
 
 /*



-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: time_after cannot be used alone by NFS code in 32bit architectures
  2007-05-08  2:37   ` Ian Kent
@ 2007-05-08  2:43     ` Ian Kent
  2007-05-08  6:19     ` Neil Brown
  1 sibling, 0 replies; 13+ messages in thread
From: Ian Kent @ 2007-05-08  2:43 UTC (permalink / raw)
  To: Fabio Olive Leite; +Cc: linux-nfs

On Tue, 2007-05-08 at 10:37 +0800, Ian Kent wrote:
> On Wed, 2007-05-02 at 19:52 +0800, Ian Kent wrote:
> > On Thu, 2007-04-26 at 01:33 -0300, Fabio Olive Leite wrote: 
> > > This leads to several problematic situations in long-lived NFS mounts.
> > > We can see old attributes not being updated, negative dentries that do
> > > not revalidate even after touching their parent dir, etc.
> > > 
> > 
> > This is quite an interesting issue and I'm still having trouble
> > understanding it so let me try and explain my understanding of part of
> > it.
> > 
> 
> snip ...
> 
> > So the problem you describe actually happens when the directory inode
> > cache_change_attribute warps and the dentry d_fstada timestamp is later
> > than inode cache_change_attribute. So updating the
> > cache_change_attribute field, as in the patch that Neil referred to,
> > will often not fix it.
> 
> I believe the patch that Neil refers to will ensure that the inode
> cache_change_attribute is updated to a reasonably recent time.
> 
> > 
> > Have I got this right yet?
> 
> No response so I guess so!
> 
> Then how about this patch.
> It doesn't step on other dentry fields.
> I haven't tested this except that it compiles because I'm after comments
> as to whether it is a sensible approach.
> 
> ---
> --- linux-2.6.21/fs/nfs/dir.c.use-timespec-for-verifier	2007-05-08 09:37:14.000000000 +0800
> +++ linux-2.6.21/fs/nfs/dir.c	2007-05-08 10:33:31.000000000 +0800
> @@ -633,28 +633,49 @@
>  }
>  
>  /*
> + * Allocate verifier timespec once during lookup and free
> + * it during d_release.
> + */
> +static inline int nfs_alloc_verifier(struct dentry *dentry)
> +{
> +	dentry->d_fsdata = kmalloc(sizeof(struct timespec), GFP_KERNEL);
> +	if (!dentry->d_fsdata)
> +		return 0;
> +	return 1;
> +}
> +
> +/*
>   * A check for whether or not the parent directory has changed.
>   * In the case it has, we assume that the dentries are untrustworthy
>   * and may need to be looked up again.
>   */
>  static int nfs_check_verifier(struct inode *dir, struct dentry *dentry)
>  {
> +	struct timespec *chattr = dentry->d_fsdata;
> +	struct timespec cache_chattr;
> +
>  	if (IS_ROOT(dentry))
>  		return 1;
>  	if ((NFS_I(dir)->cache_validity & NFS_INO_INVALID_ATTR) != 0
>  			|| nfs_attribute_timeout(dir))
>  		return 0;
> -	return nfs_verify_change_attribute(dir, (unsigned long)dentry->d_fsdata);
> +	jiffies_to_timespec(NFS_I(dir)->cache_change_attribute, &cache_chattr);
> +	return !nfs_caches_unstable(dir)
> +		&& (timespec_compare(chattr, &cache_chattr) >= 0);
>  }
>  
>  static inline void nfs_set_verifier(struct dentry * dentry, unsigned long verf)
>  {
> -	dentry->d_fsdata = (void *)verf;
> +	jiffies_to_timespec(verf, (struct timespec *) &dentry->d_fsdata);

Oops, that's wrong.
The & needs to be removed but you get the idea.

>  }
>  
>  static void nfs_refresh_verifier(struct dentry * dentry, unsigned long verf)
>  {
> -	if (time_after(verf, (unsigned long)dentry->d_fsdata))
> +	struct timespec *last = dentry->d_fsdata;
> +	struct timespec verifier;
> +
> +	jiffies_to_timespec(verf, &verifier);
> +	if (timespec_compare(&verifier, last) > 0)
>  		nfs_set_verifier(dentry, verf);
>  }
>  
> @@ -860,10 +881,17 @@
>  	iput(inode);
>  }
>  
> +static void nfs_dentry_release(struct dentry *dentry)
> +{
> +	if (dentry->d_fsdata)
> +		kfree(dentry->d_fsdata);
> +}
> +
>  struct dentry_operations nfs_dentry_operations = {
>  	.d_revalidate	= nfs_lookup_revalidate,
>  	.d_delete	= nfs_dentry_delete,
>  	.d_iput		= nfs_dentry_iput,
> +	.d_release	= nfs_dentry_release,
>  };
>  
>  /*
> @@ -909,6 +937,8 @@
>  
>  	res = ERR_PTR(-ENOMEM);
>  	dentry->d_op = NFS_PROTO(dir)->dentry_ops;
> +	if (!nfs_alloc_verifier(dentry))
> +		goto out;
>  
>  	lock_kernel();
>  
> @@ -967,6 +997,7 @@
>  	.d_revalidate	= nfs_open_revalidate,
>  	.d_delete	= nfs_dentry_delete,
>  	.d_iput		= nfs_dentry_iput,
> +	.d_release	= nfs_dentry_release,
>  };
>  
>  /*
> 
> 
> 
> -------------------------------------------------------------------------
> This SF.net email is sponsored by DB2 Express
> Download DB2 Express C - the FREE version of DB2 express and take
> control of your XML. No limits. Just data. Click to get it now.
> http://sourceforge.net/powerbar/db2/
> _______________________________________________
> NFS maillist  -  NFS@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: time_after cannot be used alone by NFS code in 32bit architectures
  2007-05-08  2:37   ` Ian Kent
  2007-05-08  2:43     ` Ian Kent
@ 2007-05-08  6:19     ` Neil Brown
  2007-05-08  8:50       ` Ian Kent
  2007-05-08 12:34       ` Trond Myklebust
  1 sibling, 2 replies; 13+ messages in thread
From: Neil Brown @ 2007-05-08  6:19 UTC (permalink / raw)
  To: Ian Kent; +Cc: linux-nfs

On Tuesday May 8, raven@themaw.net wrote:
> On Wed, 2007-05-02 at 19:52 +0800, Ian Kent wrote:
> > On Thu, 2007-04-26 at 01:33 -0300, Fabio Olive Leite wrote: 
> > > This leads to several problematic situations in long-lived NFS mounts.
> > > We can see old attributes not being updated, negative dentries that do
> > > not revalidate even after touching their parent dir, etc.
> > > 
> > 
> > This is quite an interesting issue and I'm still having trouble
> > understanding it so let me try and explain my understanding of part of
> > it.
> > 
> 
> snip ...
> 
> > So the problem you describe actually happens when the directory inode
> > cache_change_attribute warps and the dentry d_fstada timestamp is later
> > than inode cache_change_attribute. So updating the
> > cache_change_attribute field, as in the patch that Neil referred to,
> > will often not fix it.
> 
> I believe the patch that Neil refers to will ensure that the inode
> cache_change_attribute is updated to a reasonably recent time.
> 
> > 
> > Have I got this right yet?
> 
> No response so I guess so!

Maybe....

I'm still having trouble picturing the exact failure mode.

The only one I can see is:

   Perform a lookup of "foo/bar".
     Find that it doesn't exist.
     Create a negative dentry
   Wait 24 days (jiffie wrap time)
     During this time the negative dentry
     is never looked at, and doesn't fall out of cache.
   Some other access on 'foo' makes sure its
     change attributes is uptodate
   Now try to look at foo/bar again.  Due to
     jiffie wrap, it doesn't seem to be necessary to
     revalidate, so we don't.

Having the negative dentry remain in cache for 24 days while untouched
seems unlikely?

If this is the problem situation, then just updating the jiffie stamp
on the negative dentry whenever it is seem to be in the future
wouldn't work because we hardly ever look at it.  I think the best
solution would be to somehow to force dcache entries out before the
jiffie stamp had a chance to wrap.  But I not convinced that doesn't
happen already simply because 24days is a LONG time.
   

> 
> Then how about this patch.

As far as I can see, all it does is store the same number (jiffies) in
a larger field (a struct timespec) but doesn't actually do anything
different.



Are we sure this actually is a problem in practice??

NeilBrown

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: time_after cannot be used alone by NFS code in 32bit architectures
  2007-05-08  6:19     ` Neil Brown
@ 2007-05-08  8:50       ` Ian Kent
  2007-05-08  9:43         ` Ian Kent
  2007-05-08 12:34       ` Trond Myklebust
  1 sibling, 1 reply; 13+ messages in thread
From: Ian Kent @ 2007-05-08  8:50 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-nfs

On Tue, 2007-05-08 at 16:19 +1000, Neil Brown wrote:
> On Tuesday May 8, raven@themaw.net wrote:
> > On Wed, 2007-05-02 at 19:52 +0800, Ian Kent wrote:
> > > On Thu, 2007-04-26 at 01:33 -0300, Fabio Olive Leite wrote: 
> > > > This leads to several problematic situations in long-lived NFS mounts.
> > > > We can see old attributes not being updated, negative dentries that do
> > > > not revalidate even after touching their parent dir, etc.
> > > > 
> > > 
> > > This is quite an interesting issue and I'm still having trouble
> > > understanding it so let me try and explain my understanding of part of
> > > it.
> > > 
> > 
> > snip ...
> > 
> > > So the problem you describe actually happens when the directory inode
> > > cache_change_attribute warps and the dentry d_fstada timestamp is later
> > > than inode cache_change_attribute. So updating the
> > > cache_change_attribute field, as in the patch that Neil referred to,
> > > will often not fix it.
> > 
> > I believe the patch that Neil refers to will ensure that the inode
> > cache_change_attribute is updated to a reasonably recent time.
> > 
> > > 
> > > Have I got this right yet?
> > 
> > No response so I guess so!
> 
> Maybe....

But then maybe not, as you point out, oops!

> 
> I'm still having trouble picturing the exact failure mode.
> 
> The only one I can see is:
> 
>    Perform a lookup of "foo/bar".
>      Find that it doesn't exist.
>      Create a negative dentry
>    Wait 24 days (jiffie wrap time)
>      During this time the negative dentry
>      is never looked at, and doesn't fall out of cache.
>    Some other access on 'foo' makes sure its
>      change attributes is uptodate
>    Now try to look at foo/bar again.  Due to
>      jiffie wrap, it doesn't seem to be necessary to
>      revalidate, so we don't.

Yep, that's all I can see once the cache_change_attribute overflow patch
is present.

> 
> Having the negative dentry remain in cache for 24 days while untouched
> seems unlikely?

You'd think so, yes. It was surprising to me too.
But as Fabio pointed out to me dentrys are almost always kept around
after a lookup, hashed and negative if the lookup fails.
 
> 
> If this is the problem situation, then just updating the jiffie stamp
> on the negative dentry whenever it is seem to be in the future
> wouldn't work because we hardly ever look at it.  I think the best
> solution would be to somehow to force dcache entries out before the
> jiffie stamp had a chance to wrap.  But I not convinced that doesn't
> happen already simply because 24days is a LONG time.

Yep, I asked the same of Fabio and he replied:

<quote>
Please read the code in nfs_lookup(). If the server returns -ENOENT, it
will create a negative dentry and hash it, as it should. Otherwise this
negative dentry will not speed up any subsequent lookups.

Also, please look at nfs_dentry_delete(), and notice that NFS dentries almost
never get unhashed, so they can survive many u32 jiffies overflows.
</quote>

Which convinced me, maybe you'll see it differently.
I'm under the impression that this mechanism is used to optimize away
revalidate lookups occurring a reasonably short time after a failed
lookup and the d_fsdata (dentry) is expected to rapidly become less than
the cache_change_attribute (of the directory) so the lookup is again
done. It also occurred to me that the update of d_fsdata in the
revalidate might actually prevent the client from noticing changes as
well but I've not gone there yet.

>    
> 
> > 
> > Then how about this patch.
> 
> As far as I can see, all it does is store the same number (jiffies) in
> a larger field (a struct timespec) but doesn't actually do anything
> different.

Ya .. there's no reference to xtime in jiffies_to_timespec. ;)

> 
> 
> 
> Are we sure this actually is a problem in practice??

Think so.
Fabio has a bunch of test information in bug
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=228801.
Have a look see, if you have time, we may have it completely wrong, like
my patch.

Ian



-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: time_after cannot be used alone by NFS code in 32bit architectures
  2007-05-08  8:50       ` Ian Kent
@ 2007-05-08  9:43         ` Ian Kent
  0 siblings, 0 replies; 13+ messages in thread
From: Ian Kent @ 2007-05-08  9:43 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-nfs

On Tue, 2007-05-08 at 16:50 +0800, Ian Kent wrote:
> > 
> > If this is the problem situation, then just updating the jiffie stamp
> > on the negative dentry whenever it is seem to be in the future
> > wouldn't work because we hardly ever look at it.  I think the best
> > solution would be to somehow to force dcache entries out before the
> > jiffie stamp had a chance to wrap.  But I not convinced that doesn't
> > happen already simply because 24days is a LONG time.

Sure would be best to have some way of identifying these dentrys once
they were older than some fairly short time following creation but I
can't think of how to do it.

It's of particular interest to me because I plan on using something
similar for caching failed mount attempts in the autofs4 module when I
re-implement it.

Ian



-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: time_after cannot be used alone by NFS code in 32bit architectures
  2007-05-08  6:19     ` Neil Brown
  2007-05-08  8:50       ` Ian Kent
@ 2007-05-08 12:34       ` Trond Myklebust
  2007-05-08 13:07         ` Ian Kent
                           ` (2 more replies)
  1 sibling, 3 replies; 13+ messages in thread
From: Trond Myklebust @ 2007-05-08 12:34 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-nfs, Ian Kent

On Tue, 2007-05-08 at 16:19 +1000, Neil Brown wrote:
> I'm still having trouble picturing the exact failure mode.
> 
> The only one I can see is:
> 
>    Perform a lookup of "foo/bar".
>      Find that it doesn't exist.
>      Create a negative dentry
>    Wait 24 days (jiffie wrap time)
>      During this time the negative dentry
>      is never looked at, and doesn't fall out of cache.
>    Some other access on 'foo' makes sure its
>      change attributes is uptodate
>    Now try to look at foo/bar again.  Due to
>      jiffie wrap, it doesn't seem to be necessary to
>      revalidate, so we don't.
> 
> Having the negative dentry remain in cache for 24 days while untouched
> seems unlikely?

We can pretty trivially fix that scenario. See below.

Trond

------------------------------------------
commit 6d9b786c6c9b5daf8b20927c7495e42e1c7fde6a
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date:   Mon May 7 22:41:18 2007 -0400

    NFS: Fix a jiffie wraparound issue
    
    dentry verifiers are always set to the parent directory's
    cache_change_attribute. There is no reason to be testing for anything other
    than equality when we're trying to find out if the dentry has been checked
    since the last time the directory was modified.
    
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 625d8e5..fced7d1 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -650,12 +650,17 @@ int nfs_fsync_dir(struct file *filp, struct dentry *dentry, int datasync)
  */
 static int nfs_check_verifier(struct inode *dir, struct dentry *dentry)
 {
+	unsigned long verf;
+
 	if (IS_ROOT(dentry))
 		return 1;
+	verf = (unsigned long)dentry->d_fsdata;
 	if ((NFS_I(dir)->cache_validity & NFS_INO_INVALID_ATTR) != 0
-			|| nfs_attribute_timeout(dir))
+			|| nfs_attribute_timeout(dir)
+			|| nfs_caches_unstable(dir)
+			|| verf != NFS_I(dir)->cache_change_attribute)
 		return 0;
-	return nfs_verify_change_attribute(dir, (unsigned long)dentry->d_fsdata);
+	return 1;
 }
 
 static inline void nfs_set_verifier(struct dentry * dentry, unsigned long verf)
@@ -665,8 +670,7 @@ static inline void nfs_set_verifier(struct dentry * dentry, unsigned long verf)
 
 static void nfs_refresh_verifier(struct dentry * dentry, unsigned long verf)
 {
-	if (time_after(verf, (unsigned long)dentry->d_fsdata))
-		nfs_set_verifier(dentry, verf);
+	nfs_set_verifier(dentry, verf);
 }
 
 /*



-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: time_after cannot be used alone by NFS code in 32bit architectures
  2007-05-08 12:34       ` Trond Myklebust
@ 2007-05-08 13:07         ` Ian Kent
  2007-05-08 13:18         ` Neil Brown
  2007-05-08 17:09         ` Fabio Olive Leite
  2 siblings, 0 replies; 13+ messages in thread
From: Ian Kent @ 2007-05-08 13:07 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Neil Brown, linux-nfs

On Tue, 2007-05-08 at 08:34 -0400, Trond Myklebust wrote:
> ------------------------------------------
> commit 6d9b786c6c9b5daf8b20927c7495e42e1c7fde6a
> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
> Date:   Mon May 7 22:41:18 2007 -0400
> 
>     NFS: Fix a jiffie wraparound issue
>     
>     dentry verifiers are always set to the parent directory's
>     cache_change_attribute. There is no reason to be testing for anything other
>     than equality when we're trying to find out if the dentry has been checked
>     since the last time the directory was modified.

Of course.
Ha .. why is it so hard to see something so simple!
Thanks Trond

Ian



-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: time_after cannot be used alone by NFS code in 32bit architectures
  2007-05-08 12:34       ` Trond Myklebust
  2007-05-08 13:07         ` Ian Kent
@ 2007-05-08 13:18         ` Neil Brown
  2007-05-08 17:09         ` Fabio Olive Leite
  2 siblings, 0 replies; 13+ messages in thread
From: Neil Brown @ 2007-05-08 13:18 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, Ian Kent

On Tuesday May 8, trond.myklebust@fys.uio.no wrote:
> 
> We can pretty trivially fix that scenario. See below.
> 
> Trond
> 
> ------------------------------------------
> commit 6d9b786c6c9b5daf8b20927c7495e42e1c7fde6a
> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
> Date:   Mon May 7 22:41:18 2007 -0400
> 
>     NFS: Fix a jiffie wraparound issue
>     
>     dentry verifiers are always set to the parent directory's
>     cache_change_attribute. There is no reason to be testing for anything other
>     than equality when we're trying to find out if the dentry has been checked
>     since the last time the directory was modified.
>     
>     Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

Neat-o!!
It pays to have someone around who really knows the code well, doesn't
it :-)

Looks good to me!

NeilBrown

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: time_after cannot be used alone by NFS code in 32bit architectures
  2007-05-08 12:34       ` Trond Myklebust
  2007-05-08 13:07         ` Ian Kent
  2007-05-08 13:18         ` Neil Brown
@ 2007-05-08 17:09         ` Fabio Olive Leite
  2 siblings, 0 replies; 13+ messages in thread
From: Fabio Olive Leite @ 2007-05-08 17:09 UTC (permalink / raw)
  To: linux-nfs

On Tue, May 08, 2007 at 08:34:34AM -0400, Trond Myklebust wrote:
> =

>     dentry verifiers are always set to the parent directory's
>     cache_change_attribute. There is no reason to be testing for
>     anything other than equality when we're trying to find out if
>     the dentry has been checked since the last time the directory
>     was modified.

I think this is exactly the simple fix I was trying to get at, but
couldn't see. Thanks a real lot for taking care of it.

Now, if anyone wants to argument the 1 in 2^32 chance of hitting the
correct millisecond 50 days after the current update... =3D) joking!

Thanks,
F=E1bio
-- =

ex sed lex awk yacc, e pluribus unix, amem

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2007-05-08 17:09 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-04-26  4:33 time_after cannot be used alone by NFS code in 32bit architectures Fabio Olive Leite
2007-04-26  6:13 ` Neil Brown
2007-05-01  4:04   ` Ian Kent
2007-05-02 11:52 ` Ian Kent
2007-05-08  2:37   ` Ian Kent
2007-05-08  2:43     ` Ian Kent
2007-05-08  6:19     ` Neil Brown
2007-05-08  8:50       ` Ian Kent
2007-05-08  9:43         ` Ian Kent
2007-05-08 12:34       ` Trond Myklebust
2007-05-08 13:07         ` Ian Kent
2007-05-08 13:18         ` Neil Brown
2007-05-08 17:09         ` Fabio Olive Leite

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.