* [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler @ 2024-05-21 12:58 Sagi Grimberg 2024-05-21 13:22 ` Jeff Layton 2024-05-21 14:02 ` Chuck Lever 0 siblings, 2 replies; 18+ messages in thread From: Sagi Grimberg @ 2024-05-21 12:58 UTC (permalink / raw) To: linux-nfs; +Cc: Jeff Layton, Chuck Lever, Dan Aloni, Christoph Hellwig There is an inherent race where a symlink file may have been overriden (by a different client) between lookup and readlink, resulting in a spurious EIO error returned to userspace. Fix this by propagating back ESTALE errors such that the vfs will retry the lookup/get_link (similar to nfs4_file_open) at least once. Cc: Dan Aloni <dan.aloni@vastdata.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> --- Note that with this change the vfs should retry once for ESTALE errors. However with an artificial reproducer of high frequency symlink overrides, nothing prevents the retry to also encounter ESTALE, propagating the error back to userspace. The man pages for openat/readlinkat do not list an ESTALE errno. An alternative attempt (implemented by Dan) was a local retry loop in nfs_get_link(), if this is an applicable approach, Dan can share his patch instead. fs/nfs/symlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c index 0e27a2e4e68b..13818129d268 100644 --- a/fs/nfs/symlink.c +++ b/fs/nfs/symlink.c @@ -41,7 +41,7 @@ static int nfs_symlink_filler(struct file *file, struct folio *folio) error: folio_set_error(folio); folio_unlock(folio); - return -EIO; + return error; } static const char *nfs_get_link(struct dentry *dentry, -- 2.40.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler 2024-05-21 12:58 [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler Sagi Grimberg @ 2024-05-21 13:22 ` Jeff Layton 2024-05-21 14:36 ` Dan Aloni 2024-05-21 15:05 ` Sagi Grimberg 2024-05-21 14:02 ` Chuck Lever 1 sibling, 2 replies; 18+ messages in thread From: Jeff Layton @ 2024-05-21 13:22 UTC (permalink / raw) To: Sagi Grimberg, linux-nfs; +Cc: Chuck Lever, Dan Aloni, Christoph Hellwig On Tue, 2024-05-21 at 15:58 +0300, Sagi Grimberg wrote: > There is an inherent race where a symlink file may have been overriden > (by a different client) between lookup and readlink, resulting in a > spurious EIO error returned to userspace. Fix this by propagating back > ESTALE errors such that the vfs will retry the lookup/get_link (similar > to nfs4_file_open) at least once. > > Cc: Dan Aloni <dan.aloni@vastdata.com> > Signed-off-by: Sagi Grimberg <sagi@grimberg.me> > --- > Note that with this change the vfs should retry once for > ESTALE errors. However with an artificial reproducer of high > frequency symlink overrides, nothing prevents the retry to > also encounter ESTALE, propagating the error back to userspace. > The man pages for openat/readlinkat do not list an ESTALE errno. > > An alternative attempt (implemented by Dan) was a local retry loop > in nfs_get_link(), if this is an applicable approach, Dan can > share his patch instead. > > fs/nfs/symlink.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c > index 0e27a2e4e68b..13818129d268 100644 > --- a/fs/nfs/symlink.c > +++ b/fs/nfs/symlink.c > @@ -41,7 +41,7 @@ static int nfs_symlink_filler(struct file *file, struct folio *folio) > error: > folio_set_error(folio); > folio_unlock(folio); > - return -EIO; > + return error; > } > > static const char *nfs_get_link(struct dentry *dentry, git blame seems to indicate that we've returned -EIO here since the beginning of the git era (and likely long before that). I see no reason for us to cloak the real error there though, especially with something like an ESTALE error. Reviewed-by: Jeff Layton <jlayton@kernel.org> FWIW, I think we shouldn't try to do any retry looping on ESTALE beyond what we already do. Yes, we can sometimes trigger ESTALE errors to bubble up to userland if we really thrash the underlying filesystem when testing, but I think that's actually desirable: If you have real workloads across multiple machines that are racing with other that tightly, then you should probably be using some sort of locking or other synchronization. If it's clever enough that it doesn''t need that, then it should be able to deal with the occasional ESTALE error by retrying on its own. Cheers, Jeff ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler 2024-05-21 13:22 ` Jeff Layton @ 2024-05-21 14:36 ` Dan Aloni 2024-05-21 15:05 ` Sagi Grimberg 1 sibling, 0 replies; 18+ messages in thread From: Dan Aloni @ 2024-05-21 14:36 UTC (permalink / raw) To: Jeff Layton; +Cc: Sagi Grimberg, linux-nfs, Chuck Lever, Christoph Hellwig On 2024-05-21 09:22:46, Jeff Layton wrote: > On Tue, 2024-05-21 at 15:58 +0300, Sagi Grimberg wrote: > > There is an inherent race where a symlink file may have been overriden > > (by a different client) between lookup and readlink, resulting in a > > spurious EIO error returned to userspace. Fix this by propagating back > > ESTALE errors such that the vfs will retry the lookup/get_link (similar > > to nfs4_file_open) at least once. [..] > FWIW, I think we shouldn't try to do any retry looping on ESTALE beyond > what we already do. > > Yes, we can sometimes trigger ESTALE errors to bubble up to userland if > we really thrash the underlying filesystem when testing, but I think > that's actually desirable: > > If you have real workloads across multiple machines that are racing > with other that tightly, then you should probably be using some sort of > locking or other synchronization. If it's clever enough that it > doesn''t need that, then it should be able to deal with the occasional > ESTALE error by retrying on its own. We saw an issue in a real workload employing the method I describe in the following test case, where the user was surprised getting an error. It's a symlink atomicity semantics case, where there's a symlink that is frequently being overridden using a rename. This use case works well with local file systems and with some other network file systems implementations (this was also noticeable as other options where tested). There is fixed set of regular files `test_file_{1,2,3}`, and a 'shunt' symlink that keeps getting recreated to one of them: while true; do i=1; while (( i <= 3 )) ; do ln -s -f test_file_$i shunt i=$((i + 1)) done done Behind the scenes `ln` creates a symlink with a random name, then performs the `rename` system call to override `shunt`. When another client is trying to call `open` via the symlink so it would necessarily resolve to one of the regular files client-side. The previous FH of `shunt` becomes stale and sometimes fails this test. It is why we saw a retry loop being worthwhile to implement, whether inside the NFS client or outside in the VFS layer, the justification for it was to prevent the workload from breaking when moving between file systems. -- Dan Aloni ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler 2024-05-21 13:22 ` Jeff Layton 2024-05-21 14:36 ` Dan Aloni @ 2024-05-21 15:05 ` Sagi Grimberg 2024-05-21 15:13 ` Trond Myklebust 1 sibling, 1 reply; 18+ messages in thread From: Sagi Grimberg @ 2024-05-21 15:05 UTC (permalink / raw) To: Jeff Layton, linux-nfs; +Cc: Chuck Lever, Dan Aloni, Christoph Hellwig On 21/05/2024 16:22, Jeff Layton wrote: > On Tue, 2024-05-21 at 15:58 +0300, Sagi Grimberg wrote: >> There is an inherent race where a symlink file may have been overriden >> (by a different client) between lookup and readlink, resulting in a >> spurious EIO error returned to userspace. Fix this by propagating back >> ESTALE errors such that the vfs will retry the lookup/get_link (similar >> to nfs4_file_open) at least once. >> >> Cc: Dan Aloni <dan.aloni@vastdata.com> >> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> >> --- >> Note that with this change the vfs should retry once for >> ESTALE errors. However with an artificial reproducer of high >> frequency symlink overrides, nothing prevents the retry to >> also encounter ESTALE, propagating the error back to userspace. >> The man pages for openat/readlinkat do not list an ESTALE errno. >> >> An alternative attempt (implemented by Dan) was a local retry loop >> in nfs_get_link(), if this is an applicable approach, Dan can >> share his patch instead. >> >> fs/nfs/symlink.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c >> index 0e27a2e4e68b..13818129d268 100644 >> --- a/fs/nfs/symlink.c >> +++ b/fs/nfs/symlink.c >> @@ -41,7 +41,7 @@ static int nfs_symlink_filler(struct file *file, struct folio *folio) >> error: >> folio_set_error(folio); >> folio_unlock(folio); >> - return -EIO; >> + return error; >> } >> >> static const char *nfs_get_link(struct dentry *dentry, > git blame seems to indicate that we've returned -EIO here since the > beginning of the git era (and likely long before that). I see no reason > for us to cloak the real error there though, especially with something > like an ESTALE error. > > Reviewed-by: Jeff Layton <jlayton@kernel.org> > > FWIW, I think we shouldn't try to do any retry looping on ESTALE beyond > what we already do. > > Yes, we can sometimes trigger ESTALE errors to bubble up to userland if > we really thrash the underlying filesystem when testing, but I think > that's actually desirable: Returning ESTALE would be an improvement over returning EIO IMO, but it may be surprising for userspace to see an undocumented errno. Maybe the man pages can be amended? > > If you have real workloads across multiple machines that are racing > with other that tightly, then you should probably be using some sort of > locking or other synchronization. If it's clever enough that it > doesn''t need that, then it should be able to deal with the occasional > ESTALE error by retrying on its own. I tend to agree. FWIW Solaris has a config knob for number of stale retries it does, maybe there is an appetite to have something like that as well? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler 2024-05-21 15:05 ` Sagi Grimberg @ 2024-05-21 15:13 ` Trond Myklebust 2024-05-21 15:24 ` Chuck Lever III 2024-05-21 16:09 ` Sagi Grimberg 0 siblings, 2 replies; 18+ messages in thread From: Trond Myklebust @ 2024-05-21 15:13 UTC (permalink / raw) To: linux-nfs@vger.kernel.org, sagi@grimberg.me, jlayton@kernel.org Cc: hch@lst.de, dan.aloni@vastdata.com, chuck.lever@oracle.com On Tue, 2024-05-21 at 18:05 +0300, Sagi Grimberg wrote: > > > On 21/05/2024 16:22, Jeff Layton wrote: > > On Tue, 2024-05-21 at 15:58 +0300, Sagi Grimberg wrote: > > > There is an inherent race where a symlink file may have been > > > overriden > > > (by a different client) between lookup and readlink, resulting in > > > a > > > spurious EIO error returned to userspace. Fix this by propagating > > > back > > > ESTALE errors such that the vfs will retry the lookup/get_link > > > (similar > > > to nfs4_file_open) at least once. > > > > > > Cc: Dan Aloni <dan.aloni@vastdata.com> > > > Signed-off-by: Sagi Grimberg <sagi@grimberg.me> > > > --- > > > Note that with this change the vfs should retry once for > > > ESTALE errors. However with an artificial reproducer of high > > > frequency symlink overrides, nothing prevents the retry to > > > also encounter ESTALE, propagating the error back to userspace. > > > The man pages for openat/readlinkat do not list an ESTALE errno. > > > > > > An alternative attempt (implemented by Dan) was a local retry > > > loop > > > in nfs_get_link(), if this is an applicable approach, Dan can > > > share his patch instead. > > > > > > fs/nfs/symlink.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c > > > index 0e27a2e4e68b..13818129d268 100644 > > > --- a/fs/nfs/symlink.c > > > +++ b/fs/nfs/symlink.c > > > @@ -41,7 +41,7 @@ static int nfs_symlink_filler(struct file > > > *file, struct folio *folio) > > > error: > > > folio_set_error(folio); > > > folio_unlock(folio); > > > - return -EIO; > > > + return error; > > > } > > > > > > static const char *nfs_get_link(struct dentry *dentry, > > git blame seems to indicate that we've returned -EIO here since the > > beginning of the git era (and likely long before that). I see no > > reason > > for us to cloak the real error there though, especially with > > something > > like an ESTALE error. > > > > Reviewed-by: Jeff Layton <jlayton@kernel.org> > > > > FWIW, I think we shouldn't try to do any retry looping on ESTALE > > beyond > > what we already do. > > > > Yes, we can sometimes trigger ESTALE errors to bubble up to > > userland if > > we really thrash the underlying filesystem when testing, but I > > think > > that's actually desirable: > > Returning ESTALE would be an improvement over returning EIO IMO, > but it may be surprising for userspace to see an undocumented errno. > Maybe the man pages can be amended? > > > > > If you have real workloads across multiple machines that are racing > > with other that tightly, then you should probably be using some > > sort of > > locking or other synchronization. If it's clever enough that it > > doesn''t need that, then it should be able to deal with the > > occasional > > ESTALE error by retrying on its own. > > I tend to agree. FWIW Solaris has a config knob for number of stale > retries > it does, maybe there is an appetite to have something like that as > well? > Any reason why we couldn't just return ENOENT in the case where the filehandle is stale? There will have been an unlink() on the symlink at some point in the recent past. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler 2024-05-21 15:13 ` Trond Myklebust @ 2024-05-21 15:24 ` Chuck Lever III 2024-05-22 4:41 ` Dan Aloni 2024-05-21 16:09 ` Sagi Grimberg 1 sibling, 1 reply; 18+ messages in thread From: Chuck Lever III @ 2024-05-21 15:24 UTC (permalink / raw) To: Trond Myklebust Cc: Linux NFS Mailing List, sagi@grimberg.me, jlayton@kernel.org, hch@lst.de, dan.aloni@vastdata.com > On May 21, 2024, at 11:13 AM, Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Tue, 2024-05-21 at 18:05 +0300, Sagi Grimberg wrote: >> >> >> On 21/05/2024 16:22, Jeff Layton wrote: >>> On Tue, 2024-05-21 at 15:58 +0300, Sagi Grimberg wrote: >>>> There is an inherent race where a symlink file may have been >>>> overriden >>>> (by a different client) between lookup and readlink, resulting in >>>> a >>>> spurious EIO error returned to userspace. Fix this by propagating >>>> back >>>> ESTALE errors such that the vfs will retry the lookup/get_link >>>> (similar >>>> to nfs4_file_open) at least once. >>>> >>>> Cc: Dan Aloni <dan.aloni@vastdata.com> >>>> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> >>>> --- >>>> Note that with this change the vfs should retry once for >>>> ESTALE errors. However with an artificial reproducer of high >>>> frequency symlink overrides, nothing prevents the retry to >>>> also encounter ESTALE, propagating the error back to userspace. >>>> The man pages for openat/readlinkat do not list an ESTALE errno. >>>> >>>> An alternative attempt (implemented by Dan) was a local retry >>>> loop >>>> in nfs_get_link(), if this is an applicable approach, Dan can >>>> share his patch instead. >>>> >>>> fs/nfs/symlink.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c >>>> index 0e27a2e4e68b..13818129d268 100644 >>>> --- a/fs/nfs/symlink.c >>>> +++ b/fs/nfs/symlink.c >>>> @@ -41,7 +41,7 @@ static int nfs_symlink_filler(struct file >>>> *file, struct folio *folio) >>>> error: >>>> folio_set_error(folio); >>>> folio_unlock(folio); >>>> - return -EIO; >>>> + return error; >>>> } >>>> >>>> static const char *nfs_get_link(struct dentry *dentry, >>> git blame seems to indicate that we've returned -EIO here since the >>> beginning of the git era (and likely long before that). I see no >>> reason >>> for us to cloak the real error there though, especially with >>> something >>> like an ESTALE error. >>> >>> Reviewed-by: Jeff Layton <jlayton@kernel.org> >>> >>> FWIW, I think we shouldn't try to do any retry looping on ESTALE >>> beyond >>> what we already do. >>> >>> Yes, we can sometimes trigger ESTALE errors to bubble up to >>> userland if >>> we really thrash the underlying filesystem when testing, but I >>> think >>> that's actually desirable: >> >> Returning ESTALE would be an improvement over returning EIO IMO, >> but it may be surprising for userspace to see an undocumented errno. >> Maybe the man pages can be amended? >> >>> >>> If you have real workloads across multiple machines that are racing >>> with other that tightly, then you should probably be using some >>> sort of >>> locking or other synchronization. If it's clever enough that it >>> doesn''t need that, then it should be able to deal with the >>> occasional >>> ESTALE error by retrying on its own. >> >> I tend to agree. FWIW Solaris has a config knob for number of stale >> retries >> it does, maybe there is an appetite to have something like that as >> well? >> > > Any reason why we couldn't just return ENOENT in the case where the > filehandle is stale? There will have been an unlink() on the symlink at > some point in the recent past. To me ENOENT is preferable to both EIO and ESTALE. -- Chuck Lever ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler 2024-05-21 15:24 ` Chuck Lever III @ 2024-05-22 4:41 ` Dan Aloni 2024-05-22 12:11 ` Trond Myklebust 0 siblings, 1 reply; 18+ messages in thread From: Dan Aloni @ 2024-05-22 4:41 UTC (permalink / raw) To: Chuck Lever III Cc: Trond Myklebust, Linux NFS Mailing List, sagi@grimberg.me, jlayton@kernel.org, hch@lst.de On 2024-05-21 15:24:19, Chuck Lever III wrote: > > > > On May 21, 2024, at 11:13 AM, Trond Myklebust <trondmy@hammerspace.com> wrote: > > > > On Tue, 2024-05-21 at 18:05 +0300, Sagi Grimberg wrote: > >> > >> > >> On 21/05/2024 16:22, Jeff Layton wrote: > >>> On Tue, 2024-05-21 at 15:58 +0300, Sagi Grimberg wrote: > >>>> There is an inherent race where a symlink file may have been > >>>> overriden > >>>> (by a different client) between lookup and readlink, resulting in > >>>> a > >>>> spurious EIO error returned to userspace. Fix this by propagating > >>>> back > >>>> ESTALE errors such that the vfs will retry the lookup/get_link > >>>> (similar > >>>> to nfs4_file_open) at least once. > >>>> > >>>> Cc: Dan Aloni <dan.aloni@vastdata.com> > >>>> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> > >>>> --- > >>>> Note that with this change the vfs should retry once for > >>>> ESTALE errors. However with an artificial reproducer of high > >>>> frequency symlink overrides, nothing prevents the retry to > >>>> also encounter ESTALE, propagating the error back to userspace. > >>>> The man pages for openat/readlinkat do not list an ESTALE errno. > >>>> > >>>> An alternative attempt (implemented by Dan) was a local retry > >>>> loop > >>>> in nfs_get_link(), if this is an applicable approach, Dan can > >>>> share his patch instead. > >>>> > >>>> fs/nfs/symlink.c | 2 +- > >>>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>>> > >>>> diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c > >>>> index 0e27a2e4e68b..13818129d268 100644 > >>>> --- a/fs/nfs/symlink.c > >>>> +++ b/fs/nfs/symlink.c > >>>> @@ -41,7 +41,7 @@ static int nfs_symlink_filler(struct file > >>>> *file, struct folio *folio) > >>>> error: > >>>> folio_set_error(folio); > >>>> folio_unlock(folio); > >>>> - return -EIO; > >>>> + return error; > >>>> } > >>>> > >>>> static const char *nfs_get_link(struct dentry *dentry, > >>> git blame seems to indicate that we've returned -EIO here since the > >>> beginning of the git era (and likely long before that). I see no > >>> reason > >>> for us to cloak the real error there though, especially with > >>> something > >>> like an ESTALE error. > >>> > >>> Reviewed-by: Jeff Layton <jlayton@kernel.org> > >>> > >>> FWIW, I think we shouldn't try to do any retry looping on ESTALE > >>> beyond > >>> what we already do. > >>> > >>> Yes, we can sometimes trigger ESTALE errors to bubble up to > >>> userland if > >>> we really thrash the underlying filesystem when testing, but I > >>> think > >>> that's actually desirable: > >> > >> Returning ESTALE would be an improvement over returning EIO IMO, > >> but it may be surprising for userspace to see an undocumented errno. > >> Maybe the man pages can be amended? > >> > >>> > >>> If you have real workloads across multiple machines that are racing > >>> with other that tightly, then you should probably be using some > >>> sort of > >>> locking or other synchronization. If it's clever enough that it > >>> doesn''t need that, then it should be able to deal with the > >>> occasional > >>> ESTALE error by retrying on its own. > >> > >> I tend to agree. FWIW Solaris has a config knob for number of stale > >> retries > >> it does, maybe there is an appetite to have something like that as > >> well? > >> > > > > Any reason why we couldn't just return ENOENT in the case where the > > filehandle is stale? There will have been an unlink() on the symlink at > > some point in the recent past. > > To me ENOENT is preferable to both EIO and ESTALE. Another view on that, where in the scenario of `rename` causing the unlinking, there was no situation of 'no entry' as the directory entry was only updated and not removed. So ENOENT in this regard by the meaning of 'no entry' would not reflect what has really happened. (unless you go with the 'no entity' interpretation of ENOENT, but that would be against most of the POSIX-spec cases where ENOENT is returned which deal primarily with missing path components i.e. names to objects and not the objects themselves) -- Dan Aloni ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler 2024-05-22 4:41 ` Dan Aloni @ 2024-05-22 12:11 ` Trond Myklebust 0 siblings, 0 replies; 18+ messages in thread From: Trond Myklebust @ 2024-05-22 12:11 UTC (permalink / raw) To: dan.aloni@vastdata.com, chuck.lever@oracle.com Cc: hch@lst.de, linux-nfs@vger.kernel.org, sagi@grimberg.me, jlayton@kernel.org On Wed, 2024-05-22 at 07:41 +0300, Dan Aloni wrote: > On 2024-05-21 15:24:19, Chuck Lever III wrote: > > > > > > > On May 21, 2024, at 11:13 AM, Trond Myklebust > > > <trondmy@hammerspace.com> wrote: > > > > > > On Tue, 2024-05-21 at 18:05 +0300, Sagi Grimberg wrote: > > > > > > > > > > > > On 21/05/2024 16:22, Jeff Layton wrote: > > > > > On Tue, 2024-05-21 at 15:58 +0300, Sagi Grimberg wrote: > > > > > > There is an inherent race where a symlink file may have > > > > > > been > > > > > > overriden > > > > > > (by a different client) between lookup and readlink, > > > > > > resulting in > > > > > > a > > > > > > spurious EIO error returned to userspace. Fix this by > > > > > > propagating > > > > > > back > > > > > > ESTALE errors such that the vfs will retry the > > > > > > lookup/get_link > > > > > > (similar > > > > > > to nfs4_file_open) at least once. > > > > > > > > > > > > Cc: Dan Aloni <dan.aloni@vastdata.com> > > > > > > Signed-off-by: Sagi Grimberg <sagi@grimberg.me> > > > > > > --- > > > > > > Note that with this change the vfs should retry once for > > > > > > ESTALE errors. However with an artificial reproducer of > > > > > > high > > > > > > frequency symlink overrides, nothing prevents the retry to > > > > > > also encounter ESTALE, propagating the error back to > > > > > > userspace. > > > > > > The man pages for openat/readlinkat do not list an ESTALE > > > > > > errno. > > > > > > > > > > > > An alternative attempt (implemented by Dan) was a local > > > > > > retry > > > > > > loop > > > > > > in nfs_get_link(), if this is an applicable approach, Dan > > > > > > can > > > > > > share his patch instead. > > > > > > > > > > > > fs/nfs/symlink.c | 2 +- > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c > > > > > > index 0e27a2e4e68b..13818129d268 100644 > > > > > > --- a/fs/nfs/symlink.c > > > > > > +++ b/fs/nfs/symlink.c > > > > > > @@ -41,7 +41,7 @@ static int nfs_symlink_filler(struct file > > > > > > *file, struct folio *folio) > > > > > > error: > > > > > > folio_set_error(folio); > > > > > > folio_unlock(folio); > > > > > > - return -EIO; > > > > > > + return error; > > > > > > } > > > > > > > > > > > > static const char *nfs_get_link(struct dentry *dentry, > > > > > git blame seems to indicate that we've returned -EIO here > > > > > since the > > > > > beginning of the git era (and likely long before that). I see > > > > > no > > > > > reason > > > > > for us to cloak the real error there though, especially with > > > > > something > > > > > like an ESTALE error. > > > > > > > > > > Reviewed-by: Jeff Layton <jlayton@kernel.org> > > > > > > > > > > FWIW, I think we shouldn't try to do any retry looping on > > > > > ESTALE > > > > > beyond > > > > > what we already do. > > > > > > > > > > Yes, we can sometimes trigger ESTALE errors to bubble up to > > > > > userland if > > > > > we really thrash the underlying filesystem when testing, but > > > > > I > > > > > think > > > > > that's actually desirable: > > > > > > > > Returning ESTALE would be an improvement over returning EIO > > > > IMO, > > > > but it may be surprising for userspace to see an undocumented > > > > errno. > > > > Maybe the man pages can be amended? > > > > > > > > > > > > > > If you have real workloads across multiple machines that are > > > > > racing > > > > > with other that tightly, then you should probably be using > > > > > some > > > > > sort of > > > > > locking or other synchronization. If it's clever enough that > > > > > it > > > > > doesn''t need that, then it should be able to deal with the > > > > > occasional > > > > > ESTALE error by retrying on its own. > > > > > > > > I tend to agree. FWIW Solaris has a config knob for number of > > > > stale > > > > retries > > > > it does, maybe there is an appetite to have something like that > > > > as > > > > well? > > > > > > > > > > Any reason why we couldn't just return ENOENT in the case where > > > the > > > filehandle is stale? There will have been an unlink() on the > > > symlink at > > > some point in the recent past. > > > > To me ENOENT is preferable to both EIO and ESTALE. > > Another view on that, where in the scenario of `rename` causing the > unlinking, there was no situation of 'no entry' as the directory > entry > was only updated and not removed. So ENOENT in this regard by the > meaning of 'no entry' would not reflect what has really happened. > > (unless you go with the 'no entity' interpretation of ENOENT, but > that > would be against most of the POSIX-spec cases where ENOENT is > returned > which deal primarily with missing path components i.e. names to > objects and not the objects themselves) > The Linux NFS client doesn't support volatile filehandles. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler 2024-05-21 15:13 ` Trond Myklebust 2024-05-21 15:24 ` Chuck Lever III @ 2024-05-21 16:09 ` Sagi Grimberg 2024-05-22 19:40 ` Sagi Grimberg 1 sibling, 1 reply; 18+ messages in thread From: Sagi Grimberg @ 2024-05-21 16:09 UTC (permalink / raw) To: Trond Myklebust, linux-nfs@vger.kernel.org, jlayton@kernel.org Cc: hch@lst.de, dan.aloni@vastdata.com, chuck.lever@oracle.com On 21/05/2024 18:13, Trond Myklebust wrote: > On Tue, 2024-05-21 at 18:05 +0300, Sagi Grimberg wrote: >> >> On 21/05/2024 16:22, Jeff Layton wrote: >>> On Tue, 2024-05-21 at 15:58 +0300, Sagi Grimberg wrote: >>>> There is an inherent race where a symlink file may have been >>>> overriden >>>> (by a different client) between lookup and readlink, resulting in >>>> a >>>> spurious EIO error returned to userspace. Fix this by propagating >>>> back >>>> ESTALE errors such that the vfs will retry the lookup/get_link >>>> (similar >>>> to nfs4_file_open) at least once. >>>> >>>> Cc: Dan Aloni <dan.aloni@vastdata.com> >>>> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> >>>> --- >>>> Note that with this change the vfs should retry once for >>>> ESTALE errors. However with an artificial reproducer of high >>>> frequency symlink overrides, nothing prevents the retry to >>>> also encounter ESTALE, propagating the error back to userspace. >>>> The man pages for openat/readlinkat do not list an ESTALE errno. >>>> >>>> An alternative attempt (implemented by Dan) was a local retry >>>> loop >>>> in nfs_get_link(), if this is an applicable approach, Dan can >>>> share his patch instead. >>>> >>>> fs/nfs/symlink.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c >>>> index 0e27a2e4e68b..13818129d268 100644 >>>> --- a/fs/nfs/symlink.c >>>> +++ b/fs/nfs/symlink.c >>>> @@ -41,7 +41,7 @@ static int nfs_symlink_filler(struct file >>>> *file, struct folio *folio) >>>> error: >>>> folio_set_error(folio); >>>> folio_unlock(folio); >>>> - return -EIO; >>>> + return error; >>>> } >>>> >>>> static const char *nfs_get_link(struct dentry *dentry, >>> git blame seems to indicate that we've returned -EIO here since the >>> beginning of the git era (and likely long before that). I see no >>> reason >>> for us to cloak the real error there though, especially with >>> something >>> like an ESTALE error. >>> >>> Reviewed-by: Jeff Layton <jlayton@kernel.org> >>> >>> FWIW, I think we shouldn't try to do any retry looping on ESTALE >>> beyond >>> what we already do. >>> >>> Yes, we can sometimes trigger ESTALE errors to bubble up to >>> userland if >>> we really thrash the underlying filesystem when testing, but I >>> think >>> that's actually desirable: >> Returning ESTALE would be an improvement over returning EIO IMO, >> but it may be surprising for userspace to see an undocumented errno. >> Maybe the man pages can be amended? >> >>> If you have real workloads across multiple machines that are racing >>> with other that tightly, then you should probably be using some >>> sort of >>> locking or other synchronization. If it's clever enough that it >>> doesn''t need that, then it should be able to deal with the >>> occasional >>> ESTALE error by retrying on its own. >> I tend to agree. FWIW Solaris has a config knob for number of stale >> retries >> it does, maybe there is an appetite to have something like that as >> well? >> > Any reason why we couldn't just return ENOENT in the case where the > filehandle is stale? There will have been an unlink() on the symlink at > some point in the recent past. > No reason that I can see. However given that this was observed in the wild, and essentially a common pattern with symlinks (overwrite a config file for example), I think its reasonable to have the vfs at least do a single retry, by simply returning ESTALE. However NFS cannot distinguish between first and second retries afaict... Perhaps the vfs can help with a ESTALE->ENOENT conversion? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler 2024-05-21 16:09 ` Sagi Grimberg @ 2024-05-22 19:40 ` Sagi Grimberg 2024-05-22 21:04 ` Trond Myklebust 0 siblings, 1 reply; 18+ messages in thread From: Sagi Grimberg @ 2024-05-22 19:40 UTC (permalink / raw) To: Trond Myklebust, linux-nfs@vger.kernel.org, jlayton@kernel.org Cc: hch@lst.de, dan.aloni@vastdata.com, chuck.lever@oracle.com Hey Trond, >> filehandle is stale? There will have been an unlink() on the symlink at >> some point in the recent past. >> > > No reason that I can see. However given that this was observed in the > wild, and essentially > a common pattern with symlinks (overwrite a config file for example), > I think its reasonable > to have the vfs at least do a single retry, by simply returning ESTALE. > However NFS cannot distinguish between first and second retries > afaict... Perhaps the > vfs can help with a ESTALE->ENOENT conversion? So what do you suggest we do here? IMO at a minimum NFS should retry once similar to nfs4_file_open (it would probably address 99.9% of the use-cases where symlinks are not overwritten in a high enough frequency for the client to see 2 consecutive stale readlink rpc rplies). I can send a patch paired with a vfs ESTALE conversion patch? alternatively retry locally in NFS... I would like to understand your position here. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler 2024-05-22 19:40 ` Sagi Grimberg @ 2024-05-22 21:04 ` Trond Myklebust 2024-05-22 21:19 ` Sagi Grimberg 2024-05-22 21:20 ` Jeff Layton 0 siblings, 2 replies; 18+ messages in thread From: Trond Myklebust @ 2024-05-22 21:04 UTC (permalink / raw) To: linux-nfs@vger.kernel.org, sagi@grimberg.me, jlayton@kernel.org Cc: hch@lst.de, dan.aloni@vastdata.com, chuck.lever@oracle.com On Wed, 2024-05-22 at 22:40 +0300, Sagi Grimberg wrote: > Hey Trond, > > > > filehandle is stale? There will have been an unlink() on the > > > symlink at > > > some point in the recent past. > > > > > > > No reason that I can see. However given that this was observed in > > the > > wild, and essentially > > a common pattern with symlinks (overwrite a config file for > > example), > > I think its reasonable > > to have the vfs at least do a single retry, by simply returning > > ESTALE. > > However NFS cannot distinguish between first and second retries > > afaict... Perhaps the > > vfs can help with a ESTALE->ENOENT conversion? > > So what do you suggest we do here? IMO at a minimum NFS should retry > once similar > to nfs4_file_open (it would probably address 99.9% of the use-cases > where symlinks are > not overwritten in a high enough frequency for the client to see 2 > consecutive stale readlink > rpc rplies). > > I can send a patch paired with a vfs ESTALE conversion patch? > alternatively retry locally in NFS... > I would like to understand your position here. > Looking more closely at nfs_get_link(), it is obvious that it can already return ESTALE (thanks to the call to nfs_revalidate_mapping()) and looking at do_readlinkat(), it has already been plumbed through with a call to retry_estale(). So I think we can take your patch as is, since it doesn't add any error cases that callers of readlink() don't have to handle already. We might still want to think about cleaning up the output of the VFS in all these cases, so that we don't return ESTALE when it isn't allowed by POSIX, but that would be a separate task. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler 2024-05-22 21:04 ` Trond Myklebust @ 2024-05-22 21:19 ` Sagi Grimberg 2024-05-30 18:08 ` Sagi Grimberg 2024-05-22 21:20 ` Jeff Layton 1 sibling, 1 reply; 18+ messages in thread From: Sagi Grimberg @ 2024-05-22 21:19 UTC (permalink / raw) To: Trond Myklebust, linux-nfs@vger.kernel.org, jlayton@kernel.org Cc: hch@lst.de, dan.aloni@vastdata.com, chuck.lever@oracle.com >> So what do you suggest we do here? IMO at a minimum NFS should retry >> once similar >> to nfs4_file_open (it would probably address 99.9% of the use-cases >> where symlinks are >> not overwritten in a high enough frequency for the client to see 2 >> consecutive stale readlink >> rpc rplies). >> >> I can send a patch paired with a vfs ESTALE conversion patch? >> alternatively retry locally in NFS... >> I would like to understand your position here. >> > Looking more closely at nfs_get_link(), it is obvious that it can > already return ESTALE (thanks to the call to nfs_revalidate_mapping()) > and looking at do_readlinkat(), it has already been plumbed through > with a call to retry_estale(). > > So I think we can take your patch as is, since it doesn't add any error > cases that callers of readlink() don't have to handle already. Sounds good. > > We might still want to think about cleaning up the output of the VFS in > all these cases, so that we don't return ESTALE when it isn't allowed > by POSIX, but that would be a separate task. > Yes, I can follow up on that later... ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler 2024-05-22 21:19 ` Sagi Grimberg @ 2024-05-30 18:08 ` Sagi Grimberg 2024-05-30 18:21 ` Trond Myklebust 0 siblings, 1 reply; 18+ messages in thread From: Sagi Grimberg @ 2024-05-30 18:08 UTC (permalink / raw) To: Trond Myklebust, linux-nfs@vger.kernel.org, jlayton@kernel.org Cc: hch@lst.de, dan.aloni@vastdata.com, chuck.lever@oracle.com On 23/05/2024 0:19, Sagi Grimberg wrote: > >>> So what do you suggest we do here? IMO at a minimum NFS should retry >>> once similar >>> to nfs4_file_open (it would probably address 99.9% of the use-cases >>> where symlinks are >>> not overwritten in a high enough frequency for the client to see 2 >>> consecutive stale readlink >>> rpc rplies). >>> >>> I can send a patch paired with a vfs ESTALE conversion patch? >>> alternatively retry locally in NFS... >>> I would like to understand your position here. >>> >> Looking more closely at nfs_get_link(), it is obvious that it can >> already return ESTALE (thanks to the call to nfs_revalidate_mapping()) >> and looking at do_readlinkat(), it has already been plumbed through >> with a call to retry_estale(). >> >> So I think we can take your patch as is, since it doesn't add any error >> cases that callers of readlink() don't have to handle already. > > Sounds good. > >> >> We might still want to think about cleaning up the output of the VFS in >> all these cases, so that we don't return ESTALE when it isn't allowed >> by POSIX, but that would be a separate task. >> > > Yes, I can follow up on that later... > Hey Trond, is there anything else you are expecting to see before this is taken to your tree? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler 2024-05-30 18:08 ` Sagi Grimberg @ 2024-05-30 18:21 ` Trond Myklebust 2024-05-31 4:24 ` Sagi Grimberg 0 siblings, 1 reply; 18+ messages in thread From: Trond Myklebust @ 2024-05-30 18:21 UTC (permalink / raw) To: linux-nfs@vger.kernel.org, sagi@grimberg.me, jlayton@kernel.org Cc: hch@lst.de, dan.aloni@vastdata.com, chuck.lever@oracle.com On Thu, 2024-05-30 at 21:08 +0300, Sagi Grimberg wrote: > > > On 23/05/2024 0:19, Sagi Grimberg wrote: > > > > > > So what do you suggest we do here? IMO at a minimum NFS should > > > > retry > > > > once similar > > > > to nfs4_file_open (it would probably address 99.9% of the use- > > > > cases > > > > where symlinks are > > > > not overwritten in a high enough frequency for the client to > > > > see 2 > > > > consecutive stale readlink > > > > rpc rplies). > > > > > > > > I can send a patch paired with a vfs ESTALE conversion patch? > > > > alternatively retry locally in NFS... > > > > I would like to understand your position here. > > > > > > > Looking more closely at nfs_get_link(), it is obvious that it can > > > already return ESTALE (thanks to the call to > > > nfs_revalidate_mapping()) > > > and looking at do_readlinkat(), it has already been plumbed > > > through > > > with a call to retry_estale(). > > > > > > So I think we can take your patch as is, since it doesn't add any > > > error > > > cases that callers of readlink() don't have to handle already. > > > > Sounds good. > > > > > > > > We might still want to think about cleaning up the output of the > > > VFS in > > > all these cases, so that we don't return ESTALE when it isn't > > > allowed > > > by POSIX, but that would be a separate task. > > > > > > > Yes, I can follow up on that later... > > > > Hey Trond, > is there anything else you are expecting to see before this is taken > to > your tree? > It's already queued in my testing branch: https://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/testing I'll probably push that into the linux-next branch over the weekend. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler 2024-05-30 18:21 ` Trond Myklebust @ 2024-05-31 4:24 ` Sagi Grimberg 0 siblings, 0 replies; 18+ messages in thread From: Sagi Grimberg @ 2024-05-31 4:24 UTC (permalink / raw) To: Trond Myklebust, linux-nfs@vger.kernel.org, jlayton@kernel.org Cc: hch@lst.de, dan.aloni@vastdata.com, chuck.lever@oracle.com >> Hey Trond, >> is there anything else you are expecting to see before this is taken >> to >> your tree? >> > It's already queued in my testing branch: > https://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/testing > I'll probably push that into the linux-next branch over the weekend. > Ah, I was looking at your linux-next. Thanks ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler 2024-05-22 21:04 ` Trond Myklebust 2024-05-22 21:19 ` Sagi Grimberg @ 2024-05-22 21:20 ` Jeff Layton 1 sibling, 0 replies; 18+ messages in thread From: Jeff Layton @ 2024-05-22 21:20 UTC (permalink / raw) To: Trond Myklebust, linux-nfs@vger.kernel.org, sagi@grimberg.me Cc: hch@lst.de, dan.aloni@vastdata.com, chuck.lever@oracle.com On Wed, 2024-05-22 at 21:04 +0000, Trond Myklebust wrote: > On Wed, 2024-05-22 at 22:40 +0300, Sagi Grimberg wrote: > > Hey Trond, > > > > > > filehandle is stale? There will have been an unlink() on the > > > > symlink at > > > > some point in the recent past. > > > > > > > > > > No reason that I can see. However given that this was observed in > > > the > > > wild, and essentially > > > a common pattern with symlinks (overwrite a config file for > > > example), > > > I think its reasonable > > > to have the vfs at least do a single retry, by simply returning > > > ESTALE. > > > However NFS cannot distinguish between first and second retries > > > afaict... Perhaps the > > > vfs can help with a ESTALE->ENOENT conversion? > > > > So what do you suggest we do here? IMO at a minimum NFS should retry > > once similar > > to nfs4_file_open (it would probably address 99.9% of the use-cases > > where symlinks are > > not overwritten in a high enough frequency for the client to see 2 > > consecutive stale readlink > > rpc rplies). > > > > I can send a patch paired with a vfs ESTALE conversion patch? > > alternatively retry locally in NFS... > > I would like to understand your position here. > > > > Looking more closely at nfs_get_link(), it is obvious that it can > already return ESTALE (thanks to the call to nfs_revalidate_mapping()) > and looking at do_readlinkat(), it has already been plumbed through > with a call to retry_estale(). > > So I think we can take your patch as is, since it doesn't add any error > cases that callers of readlink() don't have to handle already. > > We might still want to think about cleaning up the output of the VFS in > all these cases, so that we don't return ESTALE when it isn't allowed > by POSIX, but that would be a separate task. > I think we can effectively turn ESTALE into ENOENT in most (all?) syscalls that take a pathname, since you can argue that it would have been an ENOENT had we gotten in there just a little later. To fix this the right way, I think you'd have to plumb this translation into most path-based syscall handlers at a fairly high level. Maybe we need some sort of generic sanitize_errno() handler that we call from these sorts of calls? In any case, I think that's a somewhat larger project. :) -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler 2024-05-21 12:58 [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler Sagi Grimberg 2024-05-21 13:22 ` Jeff Layton @ 2024-05-21 14:02 ` Chuck Lever 2024-05-21 14:59 ` Sagi Grimberg 1 sibling, 1 reply; 18+ messages in thread From: Chuck Lever @ 2024-05-21 14:02 UTC (permalink / raw) To: Sagi Grimberg; +Cc: linux-nfs, Jeff Layton, Dan Aloni, Christoph Hellwig On Tue, May 21, 2024 at 03:58:40PM +0300, Sagi Grimberg wrote: > There is an inherent race where a symlink file may have been overriden Nit: Do you mean "overwritten" ? > (by a different client) between lookup and readlink, resulting in a > spurious EIO error returned to userspace. Fix this by propagating back > ESTALE errors such that the vfs will retry the lookup/get_link (similar > to nfs4_file_open) at least once. > > Cc: Dan Aloni <dan.aloni@vastdata.com> > Signed-off-by: Sagi Grimberg <sagi@grimberg.me> > --- > Note that with this change the vfs should retry once for > ESTALE errors. However with an artificial reproducer of high > frequency symlink overrides, nothing prevents the retry to Nit: "overwrites" ? > also encounter ESTALE, propagating the error back to userspace. > The man pages for openat/readlinkat do not list an ESTALE errno. Speaking only as a community member, I consider that an undesirable behavior regression. IMO it's a bug for a system call to return an errno that isn't documented. That's likely why this logic has worked this way for forever. > An alternative attempt (implemented by Dan) was a local retry loop > in nfs_get_link(), if this is an applicable approach, Dan can > share his patch instead. I'm not entirely convinced by your patch description that returning an EIO on occasion is a problem. Is it reasonable for the app to expect that readlinkat() will /never/ fail? Making symlink semantics more atomic on NFS mounts is probably a good goal. But IMO the proposed change by itself isn't going to get you that with high reliability and few or no undesirable side effects. Note that NFS client-side patches should be sent To: Trond, Anna, and Cc: linux-nfs@ . Trond and Anna need to weigh in on this. > fs/nfs/symlink.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c > index 0e27a2e4e68b..13818129d268 100644 > --- a/fs/nfs/symlink.c > +++ b/fs/nfs/symlink.c > @@ -41,7 +41,7 @@ static int nfs_symlink_filler(struct file *file, struct folio *folio) > error: > folio_set_error(folio); > folio_unlock(folio); > - return -EIO; > + return error; > } > > static const char *nfs_get_link(struct dentry *dentry, > -- > 2.40.1 > -- Chuck Lever ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler 2024-05-21 14:02 ` Chuck Lever @ 2024-05-21 14:59 ` Sagi Grimberg 0 siblings, 0 replies; 18+ messages in thread From: Sagi Grimberg @ 2024-05-21 14:59 UTC (permalink / raw) To: Chuck Lever; +Cc: linux-nfs, Jeff Layton, Dan Aloni, Christoph Hellwig On 21/05/2024 17:02, Chuck Lever wrote: > On Tue, May 21, 2024 at 03:58:40PM +0300, Sagi Grimberg wrote: >> There is an inherent race where a symlink file may have been overriden > Nit: Do you mean "overwritten" ? Yes. > > >> (by a different client) between lookup and readlink, resulting in a >> spurious EIO error returned to userspace. Fix this by propagating back >> ESTALE errors such that the vfs will retry the lookup/get_link (similar >> to nfs4_file_open) at least once. >> >> Cc: Dan Aloni <dan.aloni@vastdata.com> >> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> >> --- >> Note that with this change the vfs should retry once for >> ESTALE errors. However with an artificial reproducer of high >> frequency symlink overrides, nothing prevents the retry to > Nit: "overwrites" ? Yes. > > >> also encounter ESTALE, propagating the error back to userspace. >> The man pages for openat/readlinkat do not list an ESTALE errno. > Speaking only as a community member, I consider that an undesirable > behavior regression. IMO it's a bug for a system call to return an > errno that isn't documented. That's likely why this logic has worked > this way for forever. Well, if this is an issue, it would be paired with a vfs change that checks for the error-code of the retry and convert it back. Something like: -- diff --git a/fs/namei.c b/fs/namei.c index ceb9ddf8dfdd..9a06408bb52f 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -3812,6 +3812,8 @@ static struct file *path_openat(struct nameidata *nd, else error = -ESTALE; } + if (error == -ESTALE && (flags & LOOKUP_REVAL)) + error = -EIO; return ERR_PTR(error); } -- But we'd need to check with Al for this type of a change... > > >> An alternative attempt (implemented by Dan) was a local retry loop >> in nfs_get_link(), if this is an applicable approach, Dan can >> share his patch instead. > I'm not entirely convinced by your patch description that returning > an EIO on occasion is a problem. Is it reasonable for the app to > expect that readlinkat() will /never/ fail? Maybe not never, but its fairly easy to encounter, and it was definitely observed in the wild. > > Making symlink semantics more atomic on NFS mounts is probably a > good goal. But IMO the proposed change by itself isn't going to get > you that with high reliability and few or no undesirable side > effects. What undesirable effects? > > Note that NFS client-side patches should be sent To: Trond, Anna, > and Cc: linux-nfs@ . Trond and Anna need to weigh in on this. Yes, it was sent to linux-nfs so I expect Trond and Anna to see it, you and Jeff were CC'd because we briefly discussed about this last week at LSFMM. ^ permalink raw reply related [flat|nested] 18+ messages in thread
end of thread, other threads:[~2024-05-31 4:24 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-05-21 12:58 [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler Sagi Grimberg 2024-05-21 13:22 ` Jeff Layton 2024-05-21 14:36 ` Dan Aloni 2024-05-21 15:05 ` Sagi Grimberg 2024-05-21 15:13 ` Trond Myklebust 2024-05-21 15:24 ` Chuck Lever III 2024-05-22 4:41 ` Dan Aloni 2024-05-22 12:11 ` Trond Myklebust 2024-05-21 16:09 ` Sagi Grimberg 2024-05-22 19:40 ` Sagi Grimberg 2024-05-22 21:04 ` Trond Myklebust 2024-05-22 21:19 ` Sagi Grimberg 2024-05-30 18:08 ` Sagi Grimberg 2024-05-30 18:21 ` Trond Myklebust 2024-05-31 4:24 ` Sagi Grimberg 2024-05-22 21:20 ` Jeff Layton 2024-05-21 14:02 ` Chuck Lever 2024-05-21 14:59 ` Sagi Grimberg
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox