From: Peter Staubach <staubach@redhat.com>
To: chucklever@gmail.com
Cc: NFS list <linux-nfs@vger.kernel.org>,
	Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: Re: [PATCH V2] make "noac" and "actimeo=0" work correctly
Date: Thu, 10 Jul 2008 15:23:15 -0400
Message-ID: <487661A3.7080507@redhat.com>
In-Reply-To: <76bd70e30807101155l226c1cceh24ca17157cb454bf-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Chuck Lever wrote:
> On Thu, Jul 10, 2008 at 1:41 PM, Peter Staubach <staubach@redhat.com> wrote:
>   
>> Chuck Lever wrote:
>>     
>>> Hi Peter-
>>>
>>>
>>>       
>> Hi, Chuck.
>>
>>     
>>> On Tue, Jul 8, 2008 at 12:08 PM, Peter Staubach <staubach@redhat.com>
>>> wrote:
>>>
>>>       
>>>> Hi.
>>>>
>>>> I've been looking at a bugzilla which describes a problem where
>>>> a customer was advised to use either the "noac" or "actimeo=0"
>>>> mount options to solve a consistency problem that they were
>>>> seeing in the file attributes.  It turned out that this solution
>>>> did not work reliably for them because sometimes, the local
>>>> attribute cache was believed to be valid and not timed out.
>>>> (With an attribute cache timeout of 0, the cache should always
>>>> appear to be timed out.)
>>>>
>>>> In looking at this situation, it appears to me that the problem
>>>> is that the attribute cache timeout code has an off-by-one
>>>> error in it.  It is assuming that the cache is valid in the
>>>> region, [read_cache_jiffies, read_cache_jiffies + attrtimeo].  The
>>>> cache should be considered valid only in the region,
>>>> [read_cache_jiffies, read_cache_jiffies + attrtimeo).  With this
>>>> change, the options, "noac" and "actimeo=0", work as originally
>>>> expected.
>>>>
>>>> While I was there, I addressed a problem with the jiffies
>>>> overflowing on 32 bit systems.  When overflow occurs, the
>>>> value of read_cache_jiffies + attrtimeo can be less than the
>>>> value of read_cache_jiffies.  This would cause an unnecessary
>>>> GETATTR over the wire.
>>>>
>>>> Thoughts and/or comments?  This is an updated patch which includes
>>>> the previous support which was added to correct the noac/actimeo=0
>>>> handling.
>>>>
>>>>         
>>> A couple of random thoughts below.
>>>
>>>
>>>       
>> Some thoughts in response --
>>
>>     
>>>> Signed-off-by: Peter Staubach <staubach@redhat.com>
>>>>
>>>>
>>>> --- linux-2.6.25.i686/fs/nfs/dir.c.org
>>>> +++ linux-2.6.25.i686/fs/nfs/dir.c
>>>> @@ -1808,7 +1808,7 @@ static int nfs_access_get_cached(struct
>>>>       cache = nfs_access_search_rbtree(inode, cred);
>>>>       if (cache == NULL)
>>>>               goto out;
>>>> -       if (!time_in_range(jiffies, cache->jiffies, cache->jiffies +
>>>> nfsi->attrtimeo))
>>>> +       if (!nfs_time_in_range_open(jiffies, cache->jiffies,
>>>> cache->jiffies
>>>> + nfsi->attrtimeo))
>>>>               goto out_stale;
>>>>       res->jiffies = cache->jiffies;
>>>>       res->cred = cache->cred;
>>>> --- linux-2.6.25.i686/fs/nfs/inode.c.org
>>>> +++ linux-2.6.25.i686/fs/nfs/inode.c
>>>> @@ -706,14 +706,7 @@ int nfs_attribute_timeout(struct inode *
>>>>
>>>>       if (nfs_have_delegation(inode, FMODE_READ))
>>>>               return 0;
>>>> -       /*
>>>> -        * Special case: if the attribute timeout is set to 0, then
>>>> always
>>>> -        *               treat the cache as having expired (unless
>>>> holding
>>>> -        *               a delegation).
>>>> -        */
>>>> -       if (nfsi->attrtimeo == 0)
>>>> -               return 1;
>>>> -       return !time_in_range(jiffies, nfsi->read_cache_jiffies,
>>>> nfsi->read_cache_jiffies + nfsi->attrtimeo);
>>>> +       return !nfs_time_in_range_open(jiffies, nfsi->read_cache_jiffies,
>>>> nfsi->read_cache_jiffies + nfsi->attrtimeo);
>>>>  }
>>>>
>>>>  /**
>>>> @@ -1098,7 +1091,7 @@ static int nfs_update_inode(struct inode
>>>>               nfsi->attrtimeo_timestamp = now;
>>>>               nfsi->last_updated = now;
>>>>       } else {
>>>> -               if (!time_in_range(now, nfsi->attrtimeo_timestamp,
>>>> nfsi->attrtimeo_timestamp + nfsi->attrtimeo)) {
>>>> +               if (!nfs_time_in_range_open(now,
>>>> nfsi->attrtimeo_timestamp,
>>>> nfsi->attrtimeo_timestamp + nfsi->attrtimeo)) {
>>>>                       if ((nfsi->attrtimeo <<= 1) >
>>>> NFS_MAXATTRTIMEO(inode))
>>>>                               nfsi->attrtimeo = NFS_MAXATTRTIMEO(inode);
>>>>                       nfsi->attrtimeo_timestamp = now;
>>>> --- linux-2.6.25.i686/include/linux/nfs_fs.h.org
>>>> +++ linux-2.6.25.i686/include/linux/nfs_fs.h
>>>> @@ -121,7 +121,7 @@ struct nfs_inode {
>>>>        *
>>>>        * We need to revalidate the cached attrs for this inode if
>>>>        *
>>>> -        *      jiffies - read_cache_jiffies > attrtimeo
>>>> +        *      jiffies - read_cache_jiffies >= attrtimeo
>>>>        */
>>>>       unsigned long           read_cache_jiffies;
>>>>       unsigned long           attrtimeo;
>>>> @@ -244,6 +244,22 @@ static inline unsigned NFS_MAXATTRTIMEO(
>>>>       return S_ISDIR(inode->i_mode) ? nfss->acdirmax : nfss->acregmax;
>>>>  }
>>>>
>>>> +static inline int nfs_time_in_range_open(unsigned long a,
>>>> +                               unsigned long b, unsigned long c)
>>>>
>>>>         
>>> All of nfs_time_in_range_open's callers use a sum of 'b' and
>>> 'nfsi->attrtimeo' for 'c'.  Would it be cleaner to pass in
>>> 'nfsi->attrtimeo' rather than 'b + nfsi->attrtimeo' and do the sum
>>> here?  It might make sense to explicitly check nfsi->attrtimeo for
>>> zero here to document the special behavior of "actimeo=0".
>>>
>>>
>>>       
>> The behavior of "actimeo=0" isn't any more special than "actimeo=1".
>> It simply indicates that the attribute timeout is 0 jiffies long.
>>     
>
> Right.  I'm simply suggesting that adding explicit code is good
> documentation for this case.  It calls it out so developers remember
> to check that case when they change this code.
>
> You are correct that "noac/actimeo=0" is not the common case; however,
> it is a case that gets ignored and therefore broken easily, and that
> usually results in corruption of a customer's data.
>
>   
>> I thought about reducing the arguments, but it didn't seem to yield
>> anything any clearer to me.
>>
>>     
>>> Alternately, checking explicitly if b and c are equal might accomplish
>>> the same without changing the synopsis.
>>>
>>> Also, all of nfs_time_in_range_open's callers negate the return value.
>>>  Would the code in the callers be cleaner if that negation was moved
>>> into nfs_time_in_range_open?  You might rename
>>> nfs_time_in_range_open() as nfs_cache_has_expired(), for example, to
>>> make the 'if' statements in the callers make sense in English.
>>>
>>> My feeling is that if you have to sit and stare at this for 5 minutes
>>> to understand precisely how it works, it has already become too
>>> obfuscated.  In addition to fixing any bugs, I wonder if we can make
>>> it easier to understand and maintain as well.
>>>
>>>
>>>       
>>>> +{
>>>> +       /*
>>>> +        * If c is less then b, then the jiffies have wrapped.
>>>> +        * If so, then check to see if a is between b and the
>>>> +        * max jiffies value or between 0 and the value of c.
>>>> +        * This is the range between b and c.
>>>>
>>>>         
>>> include/linux/jiffies.h claims it handles jiffy wrapping correctly.
>>> Why isn't time_in_range() sufficient if 'c' has wrapped?  If it isn't,
>>> should you fix time_in_range() too?
>>>
>>>
>>>       
>> Clearly, time_in_range() is not sufficient if the 'c' has
>> wrapped.  It only tests to see if a >=b and a <= c.  If 'c'
>> is less than 'b', then time_in_range() will return false.
>>
>> I am reluctant to fix time_in_range() because I don't know
>> that it is broken.  It appears to me that it works for other
>> uses, because otherwise, someone would have "fixed" it.
>>     
>
> The only callers I found are the NFS client and the RPC client's auth
> cache, so it is probably safe to change time_in_range() without
> concern for breaking someone else's code.  It's all ours, baby :-)
>
> <fleite@redhat.com> introduced time_in_range() a year ago with commit
> c7e15961 for, it appears from his patch description, very similar
> reasons to your fix.  It might be a good idea to discuss the wrapping
> bug with him.
>
>   
>>> You could then simplify this to "return b != c && time_in_range(a, b,
>>> c);" or something like that.  Or if you negate the return value here:
>>>
>>> static inline int nfs_attributes_have_expired(unsigned long current,
>>>                                               unsigned long start,
>>>                                               unsigned long end)
>>> {
>>>        return (start == end) || !time_in_range(current, start, end);
>>> }
>>>
>>> My 0.02USD.
>>>
>>>
>>>       
>> The change, which makes attrtimeo=0 work for free, is to figure out
>> that if the attrtimeo is N, then the attribute cache is valid from
>> time, T, to T + N - 1, not T + N.  Thus, the current attribute
>> cache implementation is off by one because the attribute cache
>> should expire at time, T + N.  The time_in_range() macro was handy
>> and looked right, but wasn't quite right for the desired semantics.
>>
>> Adding tests to check to see if b and c are equal is tuning for
>> the wrong case, I think.  I believe that the majority of file
>> systems are not mounted with "noac" or "actimeo=0", so the extra
>> test would just be overhead for the common case.
>>     
>
> True enough, but you can "fix" that simply by reversing the two checks:
>
>    return !time_in_range(a, b, c) || unlikely(b == c);
>
> Again, I think there is some value in explicitly documenting the
> actimeo=0 case here whether or not it is covered implicitly by
> time_in_range(), precisely because it is not the common case and is
> often forgotten when changing attribute cache-related logic.  This is
> exactly why we are now here fixing this problem!
>
> The comments you added here nicely explain the complexity of the time
> checks, but do not explicitly state that actimeo=0 must work after any
> changes to this code -- one of the important reasons that you have
> open-coded the time comparisons rather than reusing time_in_range().
>
> For me this is one of those times where cleverly folding all the cases
> into a single group of logic makes the code less good because it
> increases the chances of breakage later on, for example if
> time_in_range() is changed by someone else who doesn't have local
> knowledge of NFS.
>
>   

This was really just an off-by-one bug.  _All_ attribute cache
timeouts are one clock tick too long.
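
To make the off-by-one concrete, here is a small userspace sketch.  It
is not kernel code and not part of the patch; the names tick_t,
valid_closed, valid_half_open and count_valid are made up for
illustration, and wraparound is deliberately ignored.  It counts the
ticks on which a cache filled at time T with timeout N is treated as
valid under each interpretation.

```c
#include <assert.h>

/* Illustrative stand-in for jiffies; wraparound is ignored in this sketch. */
typedef unsigned long tick_t;

/* Closed range [start, start + timeo]: valid for timeo + 1 ticks. */
static inline int valid_closed(tick_t now, tick_t start, tick_t timeo)
{
	return now >= start && now <= start + timeo;
}

/* Half-open range [start, start + timeo): valid for exactly timeo ticks. */
static inline int valid_half_open(tick_t now, tick_t start, tick_t timeo)
{
	return now >= start && now < start + timeo;
}

/* Count the ticks in [start, start + 2*timeo + 1] on which valid() holds. */
static inline unsigned long count_valid(int (*valid)(tick_t, tick_t, tick_t),
					tick_t start, tick_t timeo)
{
	unsigned long n = 0;
	tick_t now;

	for (now = start; now <= start + 2 * timeo + 1; now++)
		n += valid(now, start, timeo);
	return n;
}

/* Hypothetical self-check showing the one-tick difference. */
static inline void off_by_one_demo(void)
{
	assert(count_valid(valid_closed, 100, 5) == 6);	/* one tick too long */
	assert(count_valid(valid_half_open, 100, 5) == 5);
	assert(count_valid(valid_closed, 100, 0) == 1);	/* timeout 0 still caches */
	assert(count_valid(valid_half_open, 100, 0) == 0);
}
```

With a timeout of 0, the closed range still treats the cache as valid
on the tick where it was filled, which is the "noac"/"actimeo=0"
symptom; the half-open range never does.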

Adding unlikely() around the test may help to reduce its cost,
but I don't think that it will make it zero cost.  Ordering the
tests will also help to minimize the cost, but still won't make
the additional test zero cost.

Actually, the _only_ reason that I implemented nfs_time_in_range_open()
instead of just modifying time_in_range() was that I didn't want
to impact things that were orthogonal to the bug that I needed to
fix.  Given that time_in_range() is only used by NFS and RPC,
perhaps we can safely modify it.  Had time_in_range() been used
only by those three tests, I would have simply updated it.

Simply correcting the math gets us the desired functionality at
zero additional cost over the current, broken code.  In my view,
it is also the easiest to understand because there won't be any
special cases to worry about and the math will match the desired
semantics.
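
As a rough illustration of "the math matching the semantics", the
expiry test can be written as a single unsigned subtraction, mirroring
the "jiffies - read_cache_jiffies >= attrtimeo" comment in nfs_fs.h.
This is a userspace sketch under assumptions, not the posted patch:
cache_has_expired and its demo function are hypothetical names, and the
check assumes that "now" is never earlier than "start" (an earlier
time would look like a very large elapsed value and report expiry).

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: the cache has expired once at least timeo ticks
 * have elapsed since start.  Unsigned subtraction is modulo
 * ULONG_MAX + 1, so the elapsed count stays correct when the tick
 * counter wraps between start and now.  No special case is needed for
 * timeo == 0: the elapsed count is always >= 0.
 */
static inline int cache_has_expired(unsigned long now, unsigned long start,
				    unsigned long timeo)
{
	return now - start >= timeo;
}

/* Hypothetical self-check exercising the wraparound case. */
static inline void cache_has_expired_demo(void)
{
	unsigned long start = ULONG_MAX - 1;	/* two ticks before the wrap */

	assert(cache_has_expired(start, start, 0));	 /* timeout 0: always expired */
	assert(!cache_has_expired(start + 2, start, 3)); /* now wrapped to 0, 2 ticks elapsed */
	assert(cache_has_expired(start + 3, start, 3));	 /* 3 ticks elapsed: expired */
}
```

The timeout-0 and wrapped cases fall out of the same comparison, so
there is no extra branch to pay for in the common case.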

    Thanx...

       ps

>>>> +        *
>>>> +        * Otherwise, just check to see whether a is in [b, c).
>>>> +        */
>>>> +       if (c < b)
>>>> +               return time_after_eq(a, b) || time_before(a, c);
>>>> +       return time_after_eq(a, b) && time_before(a, c);
>>>> +}
>>>> +
>>>>  static inline int NFS_STALE(const struct inode *inode)
>>>>  {
>>>>       return test_bit(NFS_INO_STALE, &NFS_I(inode)->flags);
>>>>
>>>>         
>>>       
>>     
>
>
>
>   

