Re: [PATCH RFC v3] vfs: make fstatat retry once on ESTALE errors from getattr call

From: Steve Dickson <SteveD@redhat.com>
To: Jeff Layton <jlayton@redhat.com>
Cc: linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org, miklos@szeredi.hu,
	viro@ZenIV.linux.org.uk, hch@infradead.org,
	michael.brantley@deshaw.com, sven.breuner@itwm.fraunhofer.de,
	chuck.lever@oracle.com, pstaubach@exagrid.com,
	malahal@us.ibm.com, bfields@fieldses.org,
	trond.myklebust@fys.uio.no, rees@umich.edu
Subject: Re: [PATCH RFC v3] vfs: make fstatat retry once on ESTALE errors from getattr call
Date: Mon, 23 Apr 2012 14:06:46 -0400
Message-ID: <4F959A36.2080402@RedHat.com>
In-Reply-To: <20120423113216.01992555@tlielax.poochiereds.net>



On 04/23/2012 11:32 AM, Jeff Layton wrote:
> On Mon, 23 Apr 2012 10:55:24 -0400
> Steve Dickson <SteveD@redhat.com> wrote:
> 
>>
>>
>> On 04/20/2012 05:13 PM, Jeff Layton wrote:
>>> On Fri, 20 Apr 2012 16:18:37 -0400
>>> Steve Dickson <SteveD@redhat.com> wrote:
>>>
>>>> On 04/20/2012 10:40 AM, Jeff Layton wrote:
>>>>> I guess the questions at this point are:
>>>>>
>>>>> 1) How representative is Peter's mkdir_test() of a real-world workload?
>>>> Reading your email I had to wonder the same thing... What application
>>>> removes a hierarchy of directories in a loop from two different clients?
>>>> I would suspect not many, if any... esp over NFS...
>>>>  
>>>
>>> Peter's test just happens to demonstrate the problem well, but one
>>> could envision someone removing a hierarchy of directories on the
>>> server while we're trying to do other operations in it. At that point,
>>> we can easily end up hitting an ESTALE twice while doing the lookup and
>>> returning ESTALE back to userspace.
>> Just curious, what happens when you run Peter's mkdir_test() on a
>> local file system? Any errors returned? 
>>
>> I would think removing a hierarchy of directories while it is being
>> accessed has to cause even a local fs some type of havoc.
>>
> 
> Peter's test only treats an ESTALE error as a failure since it was
> specifically designed to ensure that those didn't make it into
> userspace.
> 
> If you run 2 copies on the same local fs and strace it, then you'll see
> the syscalls get back things like ENOENT or EEXIST as they step on each
> others' toes in the mkdir()/rmdir() calls.
I figured as much... I just don't see any real-world applications removing
directory hierarchies without some type of synchronization locking...
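
A minimal single-process sketch of what the losing side of that race
sees on a local fs, assuming a POSIX system. The helper names try_mkdir()
and try_rmdir() are made up for illustration and are not from Peter's
test; they just hand back the errno that mkdir(2)/rmdir(2) report:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 on success, else the errno from mkdir(2). */
static int try_mkdir(const char *path)
{
	assert(path != NULL);
	return mkdir(path, 0700) == 0 ? 0 : errno;
}

/* Hypothetical helper: returns 0 on success, else the errno from rmdir(2). */
static int try_rmdir(const char *path)
{
	assert(path != NULL);
	return rmdir(path) == 0 ? 0 : errno;
}
```

Calling try_mkdir() twice on the same path stands in for the second copy
of the test arriving late: the second call reports EEXIST. Likewise a
second try_rmdir() on the same path reports ENOENT. Neither is ESTALE.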
 
> 
>>>
>>>>>
>>>>> 2) if we assume that it is fairly representative of one, how can we
>>>>> achieve retrying indefinitely with NFS, or at least some large finite
>>>>> amount?
>>>> The amount of looping would be pure speculation. If the problem
>>>> cannot be handled by one simple retry I would say we simply pass the
>>>> error up to the app... It's an application issue...
>>>>  
>>>
>>> It's not an application issue. The application just asked the kernel
>>> to do an operation on a pathname. The only reason you're getting an
>>> ESTALE back in this situation is a shortcoming of the implementation.
>>>
>>> We passed it a pathname after all, not a filehandle. ESTALE really has
>>> no place as a return code in that situation...
>> We'll have to agree to disagree... I think any application that
>> removes hierarchies of files and directories w/out taking any
>> precautionary locking has a shortcoming in its own
>> implementation.
>>
> 
> I'm not saying they should never get an error in that situation. I'm
> just saying that an ESTALE return in this situation is wrong (or at
> least not helpful) since the syscall was provided a pathname not a
> filehandle or open fd or anything. When we still have the pathname,
> then we have the ability to reattempt on an ESTALE, and it would be
> preferable to do so.
Point. But if the reestablishment cannot be done in one try, then
I say we punt...
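
To make "one try, then punt" concrete, here is a userspace sketch. It is
not the patch under discussion. It assumes kernel-style negative errno
returns, and the names retry_once_on_estale(), struct flaky_fs and
flaky_getattr() are all hypothetical stand-ins for the VFS caller and a
filesystem that goes stale a fixed number of times:

```c
#include <errno.h>

/* Hypothetical operation type: returns 0 or a negative errno. */
typedef int (*path_op_t)(const char *path, void *arg);

/* Run op; if it fails with -ESTALE, redo it exactly once, then give up. */
static int retry_once_on_estale(path_op_t op, const char *path, void *arg)
{
	int ret = op(path, arg);

	if (ret == -ESTALE)
		ret = op(path, arg);	/* the single retry */
	return ret;			/* still -ESTALE here means we punt */
}

/* Hypothetical stand-in for a filesystem that is stale stale_left times. */
struct flaky_fs {
	int stale_left;
	int calls;
};

static int flaky_getattr(const char *path, void *arg)
{
	struct flaky_fs *fs = arg;

	(void)path;
	fs->calls++;
	if (fs->stale_left > 0) {
		fs->stale_left--;
		return -ESTALE;
	}
	return 0;
}
```

With stale_left = 1 the retry hides the error from the caller. With
stale_left = 2 (the "hit ESTALE twice" case Jeff describes) the caller
still gets -ESTALE after two calls.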

> 
>>>
>>>>>
>>>>> I have my doubts as to whether it would really be as big a problem for
>>>>> other filesystems as Miklos and others have asserted, but I'll take
>>>>> their word for it at the moment. What's the best way to contain this
>>>>> behavior to just those filesystems that want to retry indefinitely when
>>>>> they get an ESTALE? Would we need to go with an entirely new
>>>>> ESTALERETRY after all?
>>>>>
>>>> Introducing a new errno to handle this problem would be overkill IMHO...
>>>>
>>>> If we have to go to the looping approach, I would strongly suggest we
>>>> make the file systems register for this type of behavior...
>>>>
>>>
>>> Returning ESTALERETRY would be registering for it in a way and it is
>>> somewhat cleaner than having to go all the way back up to the fstype to
>>> figure out whether you want to retry it or not.
>> How would legacy apps handle this new errno, esp if they have logic
>> to take care of ESTALE errors?
>>
> 
> Userspace should never see that error. 
Why do you say this?  ESTALE the errno has been around forever...
It's defined in the errno man page "ESTALE - Stale file handle (POSIX.1)"

> The idea is that this would be a
> kernel-internal error code that indicates to the VFS that it should
> retry the lookup and operation. If the kernel decides to give up after
> the FS returns ESTALERETRY, then we'd have to convert that error
> into ESTALE.
Yeah... I understand the idea... I just don't think another error
code is needed to handle this problem... 
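
For reference, the internal-code idea as I read it, as a userspace
sketch only. ESTALERETRY does not exist in the kernel; the value 600
below is an arbitrary placeholder, and call_with_internal_retry() and
stale_n_times() are hypothetical names. The point is just that the
internal code drives the retry loop and is converted to ESTALE before
anything is returned to the caller:

```c
#include <errno.h>

/* Hypothetical kernel-internal code; the value is an arbitrary placeholder. */
#define ESTALERETRY 600

typedef int (*fs_op_t)(void *ctx);

/*
 * Call op up to max_tries times while it keeps asking for a retry.
 * The internal code is never returned: it is mapped to -ESTALE.
 */
static int call_with_internal_retry(fs_op_t op, void *ctx, int max_tries)
{
	int ret = -ESTALERETRY;

	for (int i = 0; i < max_tries && ret == -ESTALERETRY; i++)
		ret = op(ctx);

	return ret == -ESTALERETRY ? -ESTALE : ret;
}

/* Hypothetical filesystem op: asks for a retry *ctx more times, then succeeds. */
static int stale_n_times(void *ctx)
{
	int *left = ctx;

	if (*left > 0) {
		(*left)--;
		return -ESTALERETRY;
	}
	return 0;
}
```

A filesystem that never returns the internal code gets no retries at
all, which is the "registering for it in a way" part of the argument.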

> 
> It'd be preferable to me if we didn't require a new error code, but if
> different filesystems require different semantics from the VFS on an
> ESTALE return, then that is one way to achieve it.
> 
Well, I thought using the fs_flags to register for this type
of semantics was a good idea...
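
Roughly what I have in mind, as a userspace mock rather than real kernel
code. The flag name DEMO_FS_RETRY_ESTALE and struct demo_fs_type are
invented for this sketch; the only thing borrowed from the kernel is the
idea of a per-fstype fs_flags bitmask that the VFS can test:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit: "this filesystem wants ESTALE retried". */
#define DEMO_FS_RETRY_ESTALE 0x1

/* Mock of the two fields this sketch needs; not the kernel structure. */
struct demo_fs_type {
	const char *name;
	int fs_flags;
};

/* Retry only for -ESTALE, and only if the filesystem registered for it. */
static bool should_retry_estale(const struct demo_fs_type *fs, int err)
{
	assert(fs != NULL);
	return err == -ESTALE && (fs->fs_flags & DEMO_FS_RETRY_ESTALE) != 0;
}
```

Filesystems that leave the bit clear keep today's behavior, and no new
errno is involved.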

steved.
