Linux NFS development
 help / color / mirror / Atom feed
From: Trond Myklebust <trond.myklebust@fys.uio.no>
To: Fabio Olive Leite <fleite@redhat.com>
Cc: linux-nfs <nfs@lists.sourceforge.net>
Subject: Re: [PATCH] Attribute timeout handling and wrapping u32	jiffies
Date: Wed, 25 Jul 2007 10:00:13 -0400	[thread overview]
Message-ID: <1185372013.6585.73.camel@localhost> (raw)
In-Reply-To: <20070725030809.GB27619@sleipnir.redhat.com>

On Wed, 2007-07-25 at 00:08 -0300, Fabio Olive Leite wrote:
> Hi all,
> 
> I would like to discuss the idea that the current checks for attribute
> timeout using time_after are inadequate for 32bit architectures, since
> time_after works correctly only when the two timestamps being compared
> are within 2^31 jiffies of each other. The signed overflow caused by
> comparing values more than 2^31 jiffies apart will flip the result,
> causing incorrect assumptions of validity.
> 
> 2^31 jiffies is a fairly large period of time (~25 days) when compared
> to the lifetime of most kernel data structures, but for long lived NFS
> mounts that can sit idle for months (think that for some reason autofs
> cannot be used), it is easy to compare inode attribute timestamps with
> very disparate or even bogus values (as in when jiffies have wrapped
> many times, where the comparison doesn't even make sense).
> 
> Currently the code tests for attribute timeout by simply adding the
> desired amount of jiffies to the stored timestamp and comparing that
> with the current timestamp of obtained attribute data with time_after.
> This is incorrect, as it returns true for the desired timeout period
> and another full 2^31 range of jiffies.
> 
> I would like to propose a different way for testing for timeouts: we
> first make sure the timestamps are comparable, meaning the "current"
> timestamp is checked for being after the "stored" timestamp (if it is
> before we assume it has wrapped), and after that we check if the
> difference between them is greater than the desired timeout/validity
> period.
> 
> This test does not cast the values to signed, and instead of comparing
> two possibly unrelated points in time, it actually tests if the
> timestamp wrapped, and if the period of time between the two
> timestamps is greater than the period represented by the timeout
> value. It seems to make more sense this way.
> 
> In testing with artificial jumps (several small jumps, not one big
> crank) of the jiffies I was able to reproduce a problem found in a
> server with very long lived NFS mounts, where attributes would not be
> refreshed even after touching files and directories in the server:
> 
> Initial uptime:
> 03:42:01 up 6 min, 0 users, load average: 0.01, 0.12, 0.07
> 
> NFS volume is mounted and time is advanced:
> 03:38:09 up 25 days, 2 min, 0 users, load average: 1.22, 1.05, 1.08
> 
> # ls -l /local/A/foo/bar /nfs/A/foo/bar
> -rw-r--r--  1 root root 0 Dec 17 03:38 /local/A/foo/bar
> -rw-r--r--  1 root root 0 Nov 22 00:36 /nfs/A/foo/bar
> 
> # touch /local/A/foo/bar
> 
> # ls -l /local/A/foo/bar /nfs/A/foo/bar
> -rw-r--r--  1 root root 0 Dec 17 03:47 /local/A/foo/bar
> -rw-r--r--  1 root root 0 Nov 22 00:36 /nfs/A/foo/bar
> 
> We can see the local mtime is updated, but the NFS mount still shows
> the old value. The patch below makes it work:
> 
> Initial setup...
> 07:11:02 up 25 days, 1 min,  0 users,  load average: 0.15, 0.03, 0.04
> 
> # ls -l /local/A/foo/bar /nfs/A/foo/bar
> -rw-r--r--  1 root root 0 Jan 11 07:11 /local/A/foo/bar
> -rw-r--r--  1 root root 0 Jan 11 07:11 /nfs/A/foo/bar
> 
> # touch /local/A/foo/bar
> 
> # ls -l /local/A/foo/bar /nfs/A/foo/bar
> -rw-r--r--  1 root root 0 Jan 11 07:14 /local/A/foo/bar
> -rw-r--r--  1 root root 0 Jan 11 07:14 /nfs/A/foo/bar
> 
> I'd love to hear comments and criticism. Even if this is not the
> correct way to fix it, I'm fairly sure we can agree that there's the
> need for a fix and a proper fix can be found.
> 
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index ea97408..8e85710 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1852,7 +1852,7 @@ int nfs_access_get_cached(struct inode *inode, struct rpc_cred *cred, struct nfs
>  	cache = nfs_access_search_rbtree(inode, cred);
>  	if (cache == NULL)
>  		goto out;
> -	if (time_after(jiffies, cache->jiffies + NFS_ATTRTIMEO(inode)))
> +	if (timeout_or_wrap(jiffies, cache->jiffies, NFS_ATTRTIMEO(inode)))
>  		goto out_stale;
>  	res->jiffies = cache->jiffies;
>  	res->cred = cache->cred;
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index bca6cdc..f611f95 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -662,7 +662,7 @@ int nfs_attribute_timeout(struct inode *inode)
>  
>  	if (nfs_have_delegation(inode, FMODE_READ))
>  		return 0;
> -	return time_after(jiffies, nfsi->read_cache_jiffies+nfsi->attrtimeo);
> +	return timeout_or_wrap(jiffies, nfsi->read_cache_jiffies, nfsi->attrtimeo);
>  }
>  
>  /**
> @@ -1061,7 +1061,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
>  		nfs_inc_stats(inode, NFSIOS_ATTRINVALIDATE);
>  		nfsi->attrtimeo = NFS_MINATTRTIMEO(inode);
>  		nfsi->attrtimeo_timestamp = now;
> -	} else if (time_after(now, nfsi->attrtimeo_timestamp+nfsi->attrtimeo)) {
> +	} else if (timeout_or_wrap(now, nfsi->attrtimeo_timestamp, nfsi->attrtimeo)) {
>  		if ((nfsi->attrtimeo <<= 1) > NFS_MAXATTRTIMEO(inode))
>  			nfsi->attrtimeo = NFS_MAXATTRTIMEO(inode);
>  		nfsi->attrtimeo_timestamp = now;
> diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
> index c080f61..ffda7cb 100644
> --- a/include/linux/jiffies.h
> +++ b/include/linux/jiffies.h
> @@ -115,6 +115,11 @@ static inline u64 get_jiffies_64(void)
>  	 ((long)(a) - (long)(b) >= 0))
>  #define time_before_eq(a,b)	time_after_eq(b,a)
>  
> +#define timeout_or_wrap(a,b,c)	\
> +	(typecheck(unsigned long, a) && \
> +	 typecheck(unsigned long, b) && \
> +	 ((a) < (b) || (a) - (b) > (c)))
> +

Ugly name. Besides, the above macro causes problems at the wraparound
boundary when (b) may indeed be < (a), but we're still in the allowed
range. There is also no guarantee that (c) is unsigned. How about

#define time_in_range(a,b,c) \
	(time_after_eq(a,b) && \
	 time_before_eq(a,c))

instead?

Trond


-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

  reply	other threads:[~2007-07-25 14:00 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-07-25  3:08 [PATCH] Attribute timeout handling and wrapping u32 jiffies Fabio Olive Leite
2007-07-25 14:00 ` Trond Myklebust [this message]
2007-07-25 14:28   ` Fabio Olive Leite
2007-07-25 14:32     ` Trond Myklebust
2007-07-25 14:54       ` Fabio Olive Leite
2007-07-27  1:59       ` Fabio Olive Leite
2007-07-31 16:25         ` Fabio Olive Leite
2007-07-31 19:47           ` Trond Myklebust
2007-08-22 18:25             ` Fabio Olive Leite
2007-08-22 18:47               ` J. Bruce Fields
2007-08-27 13:23               ` Trond Myklebust

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1185372013.6585.73.camel@localhost \
    --to=trond.myklebust@fys.uio.no \
    --cc=fleite@redhat.com \
    --cc=nfs@lists.sourceforge.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox