From: Ric Wheeler <rwheeler@redhat.com>
To: Jeff Layton <jlayton@redhat.com>
Cc: linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
linux-kernel@vger.kernel.org, miklos@szeredi.hu,
viro@ZenIV.linux.org.uk, hch@infradead.org,
michael.brantley@deshaw.com, sven.breuner@itwm.fraunhofer.de,
chuck.lever@oracle.com, pstaubach@exagrid.com,
malahal@us.ibm.com, bfields@fieldses.org,
trond.myklebust@fys.uio.no, rees@umich.edu
Subject: Re: [PATCH RFC v3] vfs: make fstatat retry once on ESTALE errors from getattr call
Date: Sun, 22 Apr 2012 09:46:32 +0530
Message-ID: <4F938620.2080301@redhat.com>
In-Reply-To: <20120420104055.511e15bc@tlielax.poochiereds.net>
On 04/20/2012 08:10 PM, Jeff Layton wrote:
> On Wed, 18 Apr 2012 07:52:07 -0400
> Jeff Layton <jlayton@redhat.com> wrote:
>
>> ESTALE errors are a source of pain for many users of NFS. Usually they
>> occur when a file is removed from the server after a successful lookup
>> against it.
>>
>> Luckily, the remedy in these cases is usually simple. We should just
>> redo the lookup, forcing revalidations all the way in and then retry the
>> call. We of course cannot do this for syscalls that do not involve a
>> path, but for path-based syscalls we can and should attempt to recover
>> from an ESTALE.
>>
>> This patch implements this by having the VFS reattempt the lookup (with
>> LOOKUP_REVAL set) and call exactly once when it would ordinarily return
>> ESTALE. This should catch the bulk of these cases under normal usage,
>> without unduly inconveniencing other filesystems that return ESTALE on
>> path-based syscalls.
>>
>> Note that it's possible to hit this race more than once, but a single
>> retry should catch the bulk of these cases under normal circumstances.
>>
>> This patch is just an example. We'll alter most path-based syscalls in a
>> similar fashion to fix this correctly. At this point, I'm just trying to
>> ensure that the retry semantics are acceptable before I begin that work.
>>
>> Does anyone have strong objections to this patch? I'm aware that the
>> retry mechanism is not as robust as many (e.g. Peter) would like, but it
>> should at least improve the current situation.
>>
>> If no one has a strong objection, then I'll start going through and
>> adding similar code to the other syscalls. Hopefully we can get at
>> least some of them in for 3.5.
>>
>> Signed-off-by: Jeff Layton <jlayton@redhat.com>
>> ---
>> fs/stat.c | 9 ++++++++-
>> 1 files changed, 8 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/stat.c b/fs/stat.c
>> index c733dc5..0ee9cb4 100644
>> --- a/fs/stat.c
>> +++ b/fs/stat.c
>> @@ -73,7 +73,8 @@ int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
>>  {
>>  	struct path path;
>>  	int error = -EINVAL;
>> -	int lookup_flags = 0;
>> +	bool retried = false;
>> +	unsigned int lookup_flags = 0;
>>
>>  	if ((flag & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT |
>>  		      AT_EMPTY_PATH)) != 0)
>> @@ -84,12 +85,18 @@ int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
>>  	if (flag & AT_EMPTY_PATH)
>>  		lookup_flags |= LOOKUP_EMPTY;
>>
>> +retry:
>>  	error = user_path_at(dfd, filename, lookup_flags, &path);
>>  	if (error)
>>  		goto out;
>>
>>  	error = vfs_getattr(path.mnt, path.dentry, stat);
>>  	path_put(&path);
>> +	if (error == -ESTALE && !retried) {
>> +		retried = true;
>> +		lookup_flags |= LOOKUP_REVAL;
>> +		goto retry;
>> +	}
>>  out:
>>  	return error;
>>  }
> Apologies for replying to myself here. Just to beat on the deceased
> equine a little longer, I should note that the above approach does
> *not* fix Peter's reproducer in his original email. It fails rather
> quickly when run simultaneously on the client and server.
>
> At least one of the tests in it creates and removes a hierarchy of
> directories in a loop. During that, the lookup from the client can
> easily fail more than once with ESTALE. Since we give up after a single
> retry, that causes the call to return ESTALE.
>
> While testing this approach with mkdir and fstatat, I modified the
> patch to retry multiple times, to also retry when the lookup throws
> ESTALE, and to emit a printk when the number of retries was > 1,
> showing the number of retries the call did and the eventual error code.
>
> With Peter's (admittedly synthetic) test, we can get an answer of sorts
> to Trond's question from earlier in the thread as to how many retries
> is "enough":
>
> [ 45.023665] sys_mkdirat: retries=33 error=-2
> [ 47.889300] sys_mkdirat: retries=51 error=-2
> [ 49.172746] sys_mkdirat: retries=27 error=-2
> [ 52.325723] sys_mkdirat: retries=10 error=-2
> [ 58.082576] sys_mkdirat: retries=33 error=-2
> [ 59.810461] sys_mkdirat: retries=5 error=-2
> [ 63.387914] sys_mkdirat: retries=14 error=-2
> [ 63.630785] sys_mkdirat: retries=4 error=-2
> [ 68.268903] sys_mkdirat: retries=6 error=-2
> [ 71.124173] sys_mkdirat: retries=99 error=-2
> [ 75.657649] sys_mkdirat: retries=123 error=-2
> [ 76.903814] sys_mkdirat: retries=26 error=-2
> [ 82.009463] sys_mkdirat: retries=30 error=-2
> [ 84.807731] sys_mkdirat: retries=67 error=-2
> [ 89.825529] sys_mkdirat: retries=166 error=-2
> [ 91.599104] sys_mkdirat: retries=8 error=-2
> [ 95.621855] sys_mkdirat: retries=44 error=-2
> [ 98.164588] sys_mkdirat: retries=61 error=-2
> [ 99.783347] sys_mkdirat: retries=11 error=-2
> [ 105.593980] sys_mkdirat: retries=104 error=-2
> [ 110.348861] sys_mkdirat: retries=8 error=-2
> [ 112.087966] sys_mkdirat: retries=46 error=-2
> [ 117.613316] sys_mkdirat: retries=90 error=-2
> [ 120.117550] sys_mkdirat: retries=2 error=-2
> [ 122.624330] sys_mkdirat: retries=15 error=-2
>
> So, now I'm having buyer's remorse of sorts about proposing to just
> retry once as that may not be strong enough to fix some of the cases
> we're interested in.
>
> I guess the questions at this point are:
>
> 1) How representative is Peter's mkdir_test() of a real-world workload?
>
> 2) If we assume that it is fairly representative of one, how can we
> achieve retrying indefinitely with NFS, or at least some large finite
> number of times?
>
> I have my doubts as to whether it would really be as big a problem for
> other filesystems as Miklos and others have asserted, but I'll take
> their word for it at the moment. What's the best way to contain this
> behavior to just those filesystems that want to retry indefinitely when
> they get an ESTALE? Would we need to go with an entirely new
> ESTALERETRY after all?
>
Maybe we should have a default of a single loop and a tunable to allow clients
to crank it up?
Ric
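
A minimal userspace sketch of the idea above (a default of one retry, with a knob to raise it). Everything here is hypothetical and for illustration only: estale_max_retries, retry_on_estale(), the path_op_fn callback and the fake_race simulator are not kernel interfaces, and the reval argument merely stands in for LOOKUP_REVAL. It is not the patch under discussion, just one way the bounded loop could look.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical tunable: maximum number of ESTALE retries.
 * The default of 1 mirrors the retry-once behaviour of the RFC patch. */
static unsigned int estale_max_retries = 1;

/* Hypothetical callback: performs the lookup plus the operation.
 * A nonzero reval asks for forced revalidation (stand-in for LOOKUP_REVAL).
 * Returns 0 on success or a negative errno value. */
typedef int (*path_op_fn)(void *ctx, int reval);

/* Run op once; while it returns -ESTALE and the retry budget is not
 * exhausted, run it again with revalidation forced. The number of
 * retries actually performed is stored in *retries_out if non-NULL. */
static int retry_on_estale(path_op_fn op, void *ctx, unsigned int *retries_out)
{
	unsigned int retries = 0;
	int error;

	assert(op != NULL);

	error = op(ctx, 0);
	while (error == -ESTALE && retries < estale_max_retries) {
		retries++;
		error = op(ctx, 1);
	}

	if (retries_out)
		*retries_out = retries;
	return error;
}

/* Simulated race: the operation returns -ESTALE stale_left times,
 * then settles on final_error. */
struct fake_race {
	unsigned int stale_left;
	int final_error;
	unsigned int reval_calls;	/* calls made with reval set */
};

static int fake_op(void *ctx, int reval)
{
	struct fake_race *r = ctx;

	if (reval)
		r->reval_calls++;
	if (r->stale_left > 0) {
		r->stale_left--;
		return -ESTALE;
	}
	return r->final_error;
}
```

With the default of 1, a simulated race that returns ESTALE three times in a row still surfaces ESTALE to the caller. Raising the hypothetical knob lets the same call run through to its real result, -ENOENT in the simulation, which corresponds to the error=-2 seen in the quoted log.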