From: Peter Staubach <staubach@redhat.com>
To: chucklever@gmail.com
Cc: NFS list <linux-nfs@vger.kernel.org>,
Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: Re: [PATCH V2] make "noac" and "actimeo=0" work correctly
Date: Thu, 10 Jul 2008 13:41:02 -0400 [thread overview]
Message-ID: <487649AE.1090909@redhat.com> (raw)
In-Reply-To: <76bd70e30807100858g58fbf454uc9331035a2bbf264-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Chuck Lever wrote:
> Hi Peter-
>
>
Hi, Chuck.
> On Tue, Jul 8, 2008 at 12:08 PM, Peter Staubach <staubach@redhat.com> wrote:
>
>> Hi.
>>
>> I've been looking at a bugzilla which describes a problem where
>> a customer was advised to use either the "noac" or "actimeo=0"
>> mount options to solve a consistency problem that they were
>> seeing in the file attributes. It turned out that this solution
>> did not work reliably for them because sometimes, the local
>> attribute cache was believed to be valid and not timed out.
>> (With an attribute cache timeout of 0, the cache should always
>> appear to be timed out.)
>>
>> In looking at this situation, it appears to me that the problem
>> is that the attribute cache timeout code has an off-by-one
>> error in it. It is assuming that the cache is valid in the
>> region, [read_cache_jiffies, read_cache_jiffies + attrtimeo]. The
>> cache should be considered valid only in the region,
>> [read_cache_jiffies, read_cache_jiffies + attrtimeo). With this
>> change, the options, "noac" and "actimeo=0", work as originally
>> expected.
>>
>> While I was there, I addressed a problem with the jiffies
>> overflowing on 32 bit systems. When overflow occurs, the
>> value of read_cache_jiffies + attrtimeo can be less then the
>> value of read_cache_jiffies. This would cause an unnecessary
>> GETATTR over the wire.
>>
>> Thoughts and/or comments? This is an updated patch which includes
>> the previous support which was added to correct the noac/actimeo=0
>> handling.
>>
>
> A couple of random thoughts below.
>
>
Some thoughts in response --
>> Signed-off-by: Peter Staubach <staubach@redhat.com>
>>
>>
>> --- linux-2.6.25.i686/fs/nfs/dir.c.org
>> +++ linux-2.6.25.i686/fs/nfs/dir.c
>> @@ -1808,7 +1808,7 @@ static int nfs_access_get_cached(struct
>> cache = nfs_access_search_rbtree(inode, cred);
>> if (cache == NULL)
>> goto out;
>> - if (!time_in_range(jiffies, cache->jiffies, cache->jiffies +
>> nfsi->attrtimeo))
>> + if (!nfs_time_in_range_open(jiffies, cache->jiffies, cache->jiffies
>> + nfsi->attrtimeo))
>> goto out_stale;
>> res->jiffies = cache->jiffies;
>> res->cred = cache->cred;
>> --- linux-2.6.25.i686/fs/nfs/inode.c.org
>> +++ linux-2.6.25.i686/fs/nfs/inode.c
>> @@ -706,14 +706,7 @@ int nfs_attribute_timeout(struct inode *
>>
>> if (nfs_have_delegation(inode, FMODE_READ))
>> return 0;
>> - /*
>> - * Special case: if the attribute timeout is set to 0, then always
>> - * treat the cache as having expired (unless holding
>> - * a delegation).
>> - */
>> - if (nfsi->attrtimeo == 0)
>> - return 1;
>> - return !time_in_range(jiffies, nfsi->read_cache_jiffies,
>> nfsi->read_cache_jiffies + nfsi->attrtimeo);
>> + return !nfs_time_in_range_open(jiffies, nfsi->read_cache_jiffies,
>> nfsi->read_cache_jiffies + nfsi->attrtimeo);
>> }
>>
>> /**
>> @@ -1098,7 +1091,7 @@ static int nfs_update_inode(struct inode
>> nfsi->attrtimeo_timestamp = now;
>> nfsi->last_updated = now;
>> } else {
>> - if (!time_in_range(now, nfsi->attrtimeo_timestamp,
>> nfsi->attrtimeo_timestamp + nfsi->attrtimeo)) {
>> + if (!nfs_time_in_range_open(now, nfsi->attrtimeo_timestamp,
>> nfsi->attrtimeo_timestamp + nfsi->attrtimeo)) {
>> if ((nfsi->attrtimeo <<= 1) >
>> NFS_MAXATTRTIMEO(inode))
>> nfsi->attrtimeo = NFS_MAXATTRTIMEO(inode);
>> nfsi->attrtimeo_timestamp = now;
>> --- linux-2.6.25.i686/include/linux/nfs_fs.h.org
>> +++ linux-2.6.25.i686/include/linux/nfs_fs.h
>> @@ -121,7 +121,7 @@ struct nfs_inode {
>> *
>> * We need to revalidate the cached attrs for this inode if
>> *
>> - * jiffies - read_cache_jiffies > attrtimeo
>> + * jiffies - read_cache_jiffies >= attrtimeo
>> */
>> unsigned long read_cache_jiffies;
>> unsigned long attrtimeo;
>> @@ -244,6 +244,22 @@ static inline unsigned NFS_MAXATTRTIMEO(
>> return S_ISDIR(inode->i_mode) ? nfss->acdirmax : nfss->acregmax;
>> }
>>
>> +static inline int nfs_time_in_range_open(unsigned long a,
>> + unsigned long b, unsigned long c)
>>
>
> All of nfs_time_in_range_open's callers use a sum of 'b' and
> 'nfsi->attrtimeo' for 'c'. Would it be cleaner to pass in
> nfsi->attrtimeo' rather than 'b + nfsi->attrtimeo' and do the sum
> here? It might make sense to explicitly check nfsi->attrtimeo for
> zero here to document the special behavior of "actimeo=0".
>
>
The behavior of "actimeo=0" isn't any more special than "actimeo=1".
It simply indicates that the attribute timeout is 0 jiffies long.
I thought about reducing the arguments, but it didn't seem to yield
anything any clearer to me.
> Alternately, checking explicitly if b and c are equal might accomplish
> the same without changing the synopsis.
>
> Also, all of nfs_time_in_range_open's callers negate the return value.
> Would the code in the callers be cleaner if that negation was moved
> into nfs_time_in_range_open? You might rename
> nfs_time_in_range_open() as nfs_cache_has_expired(), for example, to
> make the 'if' statements in the callers make sense in English.
>
> My feeling is that if you have to sit and stare at this for 5 minutes
> to understand precisely how it works, it has already become too
> obfuscated. In addition to fixing any bugs, I wonder if we can make
> it easier to understand and maintain as well.
>
>
>> +{
>> + /*
>> + * If c is less then b, then the jiffies have wrapped.
>> + * If so, then check to see if a is between b and the
>> + * max jiffies value or between 0 and the value of c.
>> + * This is the range between b and c.
>>
>
> include/linux/jiffies.h claims it handles jiffy wrapping correctly.
> Why isn't time_in_range() sufficient if 'c' has wrapped? If it isn't,
> should you fix time_in_range() too?
>
>
Clearly, time_in_range() is not sufficient if the 'c' has
wrapped. It only tests to see if a >=b and a <= c. If 'c'
is less than 'b', then time_in_range() will return false.
I am reluctant to fix time_in_range() because I don't know
that it is broken. It appears to me that it works for other
uses, because otherwise, someone would have "fixed" it.
> You could then simplify this to "return b != c && time_in_range(a, b,
> c);" or something like that. Or if you negate the return value here:
>
> static inline nfs_attributes_have_expired(unsigned long current,
> unsigned long
> start, unsigned long end)
> {
> return (start == end) || !time_in_range(current, start, end);
> }
>
> My 0.02USD.
>
>
The change, which makes attrtimeo=0 work for free, is to figure out
that if the attrtimeo is N, then the attribute cache is valid from
time, T, to T + N - 1, not T + N. Thus, the current attribute
cache implementation is off by one because the attribute cache
should expire at time, T + N. The time_in_range() macro was handy
and looked right, but wasn't quite right for the desired semantics.
Adding tests to check to see if b and c are equal is tuning for
the wrong case, I think. I believe that the majority of file
systems are not mounted with "noac" or "actimeo=0", so the extra
test would just be overhead for the common case.
ps
>> + *
>> + * Otherwise, just check to see whether a is in [b, c).
>> + */
>> + if (c < b)
>> + return time_after_eq(a, b) || time_before(a, c);
>> + return time_after_eq(a, b) && time_before(a, c);
>> +}
>> +
>> static inline int NFS_STALE(const struct inode *inode)
>> {
>> return test_bit(NFS_INO_STALE, &NFS_I(inode)->flags);
>>
>
>
next prev parent reply other threads:[~2008-07-10 17:42 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-06-10 20:52 [PATCH] make "noac" and "actimeo=0" work correctly Peter Staubach
2008-07-08 16:08 ` [PATCH V2] " Peter Staubach
2008-07-10 15:58 ` Chuck Lever
[not found] ` <76bd70e30807100858g58fbf454uc9331035a2bbf264-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-07-10 17:41 ` Peter Staubach [this message]
2008-07-10 18:55 ` Chuck Lever
[not found] ` <76bd70e30807101155l226c1cceh24ca17157cb454bf-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-07-10 19:23 ` Peter Staubach
2008-07-10 19:31 ` Chuck Lever
2008-07-10 19:29 ` Trond Myklebust
2008-07-11 20:14 ` Peter Staubach
2008-07-11 20:19 ` Trond Myklebust
2008-07-11 20:24 ` Peter Staubach
2008-12-05 21:37 ` [PATCH V3] optimize attribute timeouts for "noac" and "actimeo=0" Peter Staubach
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=487649AE.1090909@redhat.com \
--to=staubach@redhat.com \
--cc=Trond.Myklebust@netapp.com \
--cc=chucklever@gmail.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.