* [patch] fix infinite loop in generic_file_splice_read()
@ 2008-04-09 15:57 Miklos Szeredi
2008-04-09 17:05 ` Oliver Pinter
` (2 more replies)
0 siblings, 3 replies; 25+ messages in thread
From: Miklos Szeredi @ 2008-04-09 15:57 UTC (permalink / raw)
To: akpm, jens.axboe; +Cc: linux-kernel, linux-fsdevel
generic_file_splice_read() goes into an infinite loop if it races with
truncation. I've found this with fsx-linux on NFS over fuse.
Perhaps the whole while() loop is bogus, but I can't tell from a
cursory glance at __generic_file_splice_read() if it will return zero
only on EOF, or it can do that for other reasons as well. In the
latter case the loop is obviously needed.
This simplistic patch fixes the issue for me.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
fs/splice.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
Index: linux/fs/splice.c
===================================================================
--- linux.orig/fs/splice.c 2008-04-02 13:34:58.000000000 +0200
+++ linux/fs/splice.c 2008-04-09 17:35:06.000000000 +0200
@@ -481,19 +481,20 @@ ssize_t generic_file_splice_read(struct
{
ssize_t spliced;
int ret;
- loff_t isize, left;
-
- isize = i_size_read(in->f_mapping->host);
- if (unlikely(*ppos >= isize))
- return 0;
-
- left = isize - *ppos;
- if (unlikely(left < len))
- len = left;
ret = 0;
spliced = 0;
while (len && !spliced) {
+ loff_t isize, left;
+
+ isize = i_size_read(in->f_mapping->host);
+ if (unlikely(*ppos >= isize))
+ return 0;
+
+ left = isize - *ppos;
+ if (unlikely(left < len))
+ len = left;
+
ret = __generic_file_splice_read(in, ppos, pipe, len, flags);
if (ret < 0)
^ permalink raw reply [flat|nested] 25+ messages in thread* Re: [patch] fix infinite loop in generic_file_splice_read() 2008-04-09 15:57 [patch] fix infinite loop in generic_file_splice_read() Miklos Szeredi @ 2008-04-09 17:05 ` Oliver Pinter 2008-04-09 18:57 ` Andrew Morton 2008-04-10 19:51 ` nfs: infinite loop in fcntl(F_SETLKW) Miklos Szeredi 2 siblings, 0 replies; 25+ messages in thread From: Oliver Pinter @ 2008-04-09 17:05 UTC (permalink / raw) To: Miklos Szeredi; +Cc: akpm, jens.axboe, linux-kernel, linux-fsdevel Helló Miklós! ez a patch szükséges a 2.6.22.y kernelhez? (2.6.22.y-ba első ránézésre hunk nélkül hozzáadható) vagy várjam meg, amig bekerül a mainline kernelbe? On 4/9/08, Miklos Szeredi <miklos@szeredi.hu> wrote: > generic_file_splice_read() goes into an infinite loop if it races with > truncation. I've found this with fsx-linux on NFS over fuse. > > Perhaps the whole while() loop is bogus, but I can't tell from a > cursory glance at __generic_file_splice_read() if it will return zero > only on EOF, or it can do that for other reasons as well. In the > latter case the loop is obviously needed. > > This simplistic patch fixes the issue for me. > > Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> > --- > fs/splice.c | 19 ++++++++++--------- > 1 file changed, 10 insertions(+), 9 deletions(-) > > Index: linux/fs/splice.c > =================================================================== > --- linux.orig/fs/splice.c 2008-04-02 13:34:58.000000000 +0200 > +++ linux/fs/splice.c 2008-04-09 17:35:06.000000000 +0200 > @@ -481,19 +481,20 @@ ssize_t generic_file_splice_read(struct > { > ssize_t spliced; > int ret; > - loff_t isize, left; > - > - isize = i_size_read(in->f_mapping->host); > - if (unlikely(*ppos >= isize)) > - return 0; > - > - left = isize - *ppos; > - if (unlikely(left < len)) > - len = left; > > ret = 0; > spliced = 0; > while (len && !spliced) { > + loff_t isize, left; > + > + isize = i_size_read(in->f_mapping->host); > + if (unlikely(*ppos >= isize)) > + return 0; > + > + left = isize - *ppos; > + if (unlikely(left < len)) > + len = left; > + > ret = __generic_file_splice_read(in, ppos, pipe, len, flags); > > if (ret < 0) > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- Thanks, Oliver ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] fix infinite loop in generic_file_splice_read() 2008-04-09 15:57 [patch] fix infinite loop in generic_file_splice_read() Miklos Szeredi 2008-04-09 17:05 ` Oliver Pinter @ 2008-04-09 18:57 ` Andrew Morton 2008-04-09 19:25 ` Miklos Szeredi ` (2 more replies) 2008-04-10 19:51 ` nfs: infinite loop in fcntl(F_SETLKW) Miklos Szeredi 2 siblings, 3 replies; 25+ messages in thread From: Andrew Morton @ 2008-04-09 18:57 UTC (permalink / raw) To: Miklos Szeredi Cc: jens.axboe, linux-kernel, linux-fsdevel, Allard Hoeve, Neil Brown On Wed, 09 Apr 2008 17:57:56 +0200 Miklos Szeredi <miklos@szeredi.hu> wrote: > generic_file_splice_read() goes into an infinite loop if it races with > truncation. I've found this with fsx-linux on NFS over fuse. > > Perhaps the whole while() loop is bogus, but I can't tell from a > cursory glance at __generic_file_splice_read() if it will return zero > only on EOF, or it can do that for other reasons as well. In the > latter case the loop is obviously needed. > > This simplistic patch fixes the issue for me. > We found suspicious-looking code in generic_file_splice_read() back in February. See http://lkml.org/lkml/2008/2/29/443. I suspect that patch (if it works) will address the truncate lockup as well - it zaps the loop entirely. Unfortunately Allard never got back to us (probably because he's running 2.6.24 which has a quite different generic_file_splice_read()) and the patch didn't get anywhere. Here it is again. It needs a changelog. Nobody has tested this at all, to my knowledge. It's going to take some serious and sudden effort to get these bugs fixed for 2.6.25. fs/splice.c | 31 ++++++------------------------- 1 file changed, 6 insertions(+), 25 deletions(-) diff -puN fs/splice.c~generic_file_splice_read-fix-lockups fs/splice.c --- a/fs/splice.c~generic_file_splice_read-fix-lockups +++ a/fs/splice.c @@ -370,8 +370,10 @@ __generic_file_splice_read(struct file * * for an in-flight io page */ if (flags & SPLICE_F_NONBLOCK) { - if (TestSetPageLocked(page)) + if (TestSetPageLocked(page)) { + error = -EAGAIN; break; + } } else lock_page(page); @@ -479,9 +481,8 @@ ssize_t generic_file_splice_read(struct struct pipe_inode_info *pipe, size_t len, unsigned int flags) { - ssize_t spliced; - int ret; loff_t isize, left; + int ret; isize = i_size_read(in->f_mapping->host); if (unlikely(*ppos >= isize)) @@ -491,29 +492,9 @@ ssize_t generic_file_splice_read(struct if (unlikely(left < len)) len = left; - ret = 0; - spliced = 0; - while (len && !spliced) { - ret = __generic_file_splice_read(in, ppos, pipe, len, flags); - - if (ret < 0) - break; - else if (!ret) { - if (spliced) - break; - if (flags & SPLICE_F_NONBLOCK) { - ret = -EAGAIN; - break; - } - } - + ret = __generic_file_splice_read(in, ppos, pipe, len, flags); + if (ret > 0) *ppos += ret; - len -= ret; - spliced += ret; - } - - if (spliced) - return spliced; return ret; } _ ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] fix infinite loop in generic_file_splice_read() 2008-04-09 18:57 ` Andrew Morton @ 2008-04-09 19:25 ` Miklos Szeredi 2008-04-09 19:52 ` Jens Axboe 2008-04-10 6:29 ` Allard Hoeve 2 siblings, 0 replies; 25+ messages in thread From: Miklos Szeredi @ 2008-04-09 19:25 UTC (permalink / raw) To: akpm; +Cc: miklos, jens.axboe, linux-kernel, linux-fsdevel, allard, neilb > We found suspicious-looking code in generic_file_splice_read() back in > February. See http://lkml.org/lkml/2008/2/29/443. I suspect that patch > (if it works) will address the truncate lockup as well - it zaps the loop > entirely. > > Unfortunately Allard never got back to us (probably because he's running > 2.6.24 which has a quite different generic_file_splice_read()) and the > patch didn't get anywhere. > > Here it is again. > > It needs a changelog. > > Nobody has tested this at all, to my knowledge. Patch looks sane, and fixes the fsx-linux/NFS/fuse lockup, as expected. Thanks, Miklos ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] fix infinite loop in generic_file_splice_read() 2008-04-09 18:57 ` Andrew Morton 2008-04-09 19:25 ` Miklos Szeredi @ 2008-04-09 19:52 ` Jens Axboe 2008-04-10 6:29 ` Allard Hoeve 2 siblings, 0 replies; 25+ messages in thread From: Jens Axboe @ 2008-04-09 19:52 UTC (permalink / raw) To: Andrew Morton Cc: Miklos Szeredi, linux-kernel, linux-fsdevel, Allard Hoeve, Neil Brown On Wed, Apr 09 2008, Andrew Morton wrote: > On Wed, 09 Apr 2008 17:57:56 +0200 > Miklos Szeredi <miklos@szeredi.hu> wrote: > > > generic_file_splice_read() goes into an infinite loop if it races with > > truncation. I've found this with fsx-linux on NFS over fuse. > > > > Perhaps the whole while() loop is bogus, but I can't tell from a > > cursory glance at __generic_file_splice_read() if it will return zero > > only on EOF, or it can do that for other reasons as well. In the > > latter case the loop is obviously needed. > > > > This simplistic patch fixes the issue for me. > > > > We found suspicious-looking code in generic_file_splice_read() back in > February. See http://lkml.org/lkml/2008/2/29/443. I suspect that patch > (if it works) will address the truncate lockup as well - it zaps the loop > entirely. > > Unfortunately Allard never got back to us (probably because he's running > 2.6.24 which has a quite different generic_file_splice_read()) and the > patch didn't get anywhere. Hmm strange, I was pretty sure I pushed my patch back then. I'll double check and make sure it gets upstream asap. > Nobody has tested this at all, to my knowledge. The original reporter did not, however others did. -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] fix infinite loop in generic_file_splice_read() 2008-04-09 18:57 ` Andrew Morton 2008-04-09 19:25 ` Miklos Szeredi 2008-04-09 19:52 ` Jens Axboe @ 2008-04-10 6:29 ` Allard Hoeve 2 siblings, 0 replies; 25+ messages in thread From: Allard Hoeve @ 2008-04-10 6:29 UTC (permalink / raw) To: Andrew Morton Cc: Miklos Szeredi, jens.axboe, linux-kernel, linux-fsdevel, Neil Brown, Gertjan Oude Lohuis Hallo Andrew, others, > We found suspicious-looking code in generic_file_splice_read() back in > February. See http://lkml.org/lkml/2008/2/29/443. I suspect that patch > (if it works) will address the truncate lockup as well - it zaps the > loop entirely. > > Unfortunately Allard never got back to us (probably because he's running > 2.6.24 which has a quite different generic_file_splice_read()) and the > patch didn't get anywhere. Unfortunately, we cannot test the patch on the server that triggered the bug in 2.6.24.2, because it's too critical for operation. On other similar servers the bug hasn't been encountered and we were unable to reproduce it, so it must be some combination of load and typical usage (NFS server). We are now running 2.6.22.x, which does fine. I hope you understand our hesitation of running patched, known-bugged kernels on our fileservers :) Thanks for all the bughunting here. There is a maintenance window soon, I'll try to put the original patch on the agenda. Regards, Allard Hoeve ^ permalink raw reply [flat|nested] 25+ messages in thread
* nfs: infinite loop in fcntl(F_SETLKW) 2008-04-09 15:57 [patch] fix infinite loop in generic_file_splice_read() Miklos Szeredi 2008-04-09 17:05 ` Oliver Pinter 2008-04-09 18:57 ` Andrew Morton @ 2008-04-10 19:51 ` Miklos Szeredi 2008-04-10 21:02 ` Trond Myklebust 2 siblings, 1 reply; 25+ messages in thread From: Miklos Szeredi @ 2008-04-10 19:51 UTC (permalink / raw) To: neilb, Trond.Myklebust; +Cc: akpm, linux-nfs, linux-kernel, linux-fsdevel Another infinite loop, this one involving both client and server. Basically what happens is that on the server nlm_fopen() calls nfsd_open() which returns -EACCES, to which nlm_fopen() returns NLM_LCK_DENIED. On the client this will turn into a -EAGAIN (nlm_stat_to_errno()), which in will cause fcntl_setlk() to retry forever. I _think_ the solution is to turn NLM_LCK_DENIED into ENOLCK for blocking locks, as NLM_LCK_BLOCKED is for the contended case. For testing the lock leave NLM_LCK_DENIED as EAGAIN. That still could be misleading, but at least there's no infinite loop in that case. I've minimally tested this patch to verify that it cures the lockup, and that simple blocking locks keep working. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> --- fs/lockd/clntproc.c | 3 +++ 1 file changed, 3 insertions(+) Index: linux/fs/lockd/clntproc.c =================================================================== --- linux.orig/fs/lockd/clntproc.c 2008-04-02 13:34:57.000000000 +0200 +++ linux/fs/lockd/clntproc.c 2008-04-10 21:23:46.000000000 +0200 @@ -536,6 +536,9 @@ again: up_read(&host->h_rwsem); } status = nlm_stat_to_errno(resp->status); + /* Don't return EAGAIN, as that would make fcntl_setlk() loop */ + if (status == -EAGAIN) + status = -ENOLCK; out_unblock: nlmclnt_finish_block(block); /* Cancel the blocked request if it is still pending */ ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: nfs: infinite loop in fcntl(F_SETLKW) 2008-04-10 19:51 ` nfs: infinite loop in fcntl(F_SETLKW) Miklos Szeredi @ 2008-04-10 21:02 ` Trond Myklebust 2008-04-10 21:07 ` Trond Myklebust 0 siblings, 1 reply; 25+ messages in thread From: Trond Myklebust @ 2008-04-10 21:02 UTC (permalink / raw) To: Miklos Szeredi; +Cc: neilb, akpm, linux-nfs, linux-kernel, linux-fsdevel On Thu, 2008-04-10 at 21:51 +0200, Miklos Szeredi wrote: > Another infinite loop, this one involving both client and server. > > Basically what happens is that on the server nlm_fopen() calls > nfsd_open() which returns -EACCES, to which nlm_fopen() returns > NLM_LCK_DENIED. > > On the client this will turn into a -EAGAIN (nlm_stat_to_errno()), > which in will cause fcntl_setlk() to retry forever. > > I _think_ the solution is to turn NLM_LCK_DENIED into ENOLCK for > blocking locks, as NLM_LCK_BLOCKED is for the contended case. For > testing the lock leave NLM_LCK_DENIED as EAGAIN. That still could be > misleading, but at least there's no infinite loop in that case. > > I've minimally tested this patch to verify that it cures the lockup, > and that simple blocking locks keep working. > > Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> > --- > fs/lockd/clntproc.c | 3 +++ > 1 file changed, 3 insertions(+) > > Index: linux/fs/lockd/clntproc.c > =================================================================== > --- linux.orig/fs/lockd/clntproc.c 2008-04-02 13:34:57.000000000 +0200 > +++ linux/fs/lockd/clntproc.c 2008-04-10 21:23:46.000000000 +0200 > @@ -536,6 +536,9 @@ again: > up_read(&host->h_rwsem); > } > status = nlm_stat_to_errno(resp->status); > + /* Don't return EAGAIN, as that would make fcntl_setlk() loop */ > + if (status == -EAGAIN) > + status = -ENOLCK; > out_unblock: > nlmclnt_finish_block(block); > /* Cancel the blocked request if it is still pending */ Wait. There is something really weird going on here. According to the spec, LCK_DENIED means 'the request failed' (i.e. ENOLCK is definitely correct) OTOH, LCK_DENIED_NOLOCKS and LCK_DENIED_GRACE_PERIOD are both temporary failures, the first because the server had a resource problem, and the second because the server rebooted and is in the grace period (i.e. EAGAIN would appear to be more appropriate). See http://www.opengroup.org/onlinepubs/9629799/chap10.htm#tagcjh_11_02_02_02 AFAICS, the correct thing to do is to fix nlm_stat_to_errno() by swapping the return values for NLM_LCK_DENIED and NLM_LCK_DENIED_NOLOCKS/NLM_LCK_DENIED_GRACE_PERIOD. The problem is that there appears to be a similar confusion on the Linux server side in nlmsvc_lock(). :-( -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: nfs: infinite loop in fcntl(F_SETLKW) 2008-04-10 21:02 ` Trond Myklebust @ 2008-04-10 21:07 ` Trond Myklebust [not found] ` <1207861661.8180.18.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Trond Myklebust @ 2008-04-10 21:07 UTC (permalink / raw) To: Miklos Szeredi; +Cc: neilb, akpm, linux-nfs, linux-kernel, linux-fsdevel On Thu, 2008-04-10 at 17:02 -0400, Trond Myklebust wrote: > On Thu, 2008-04-10 at 21:51 +0200, Miklos Szeredi wrote: > > Another infinite loop, this one involving both client and server. > > > > Basically what happens is that on the server nlm_fopen() calls > > nfsd_open() which returns -EACCES, to which nlm_fopen() returns > > NLM_LCK_DENIED. > > > > On the client this will turn into a -EAGAIN (nlm_stat_to_errno()), > > which in will cause fcntl_setlk() to retry forever. > > > > I _think_ the solution is to turn NLM_LCK_DENIED into ENOLCK for > > blocking locks, as NLM_LCK_BLOCKED is for the contended case. For > > testing the lock leave NLM_LCK_DENIED as EAGAIN. That still could be > > misleading, but at least there's no infinite loop in that case. > > > > I've minimally tested this patch to verify that it cures the lockup, > > and that simple blocking locks keep working. > > > > Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> > > --- > > fs/lockd/clntproc.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > Index: linux/fs/lockd/clntproc.c > > =================================================================== > > --- linux.orig/fs/lockd/clntproc.c 2008-04-02 13:34:57.000000000 +0200 > > +++ linux/fs/lockd/clntproc.c 2008-04-10 21:23:46.000000000 +0200 > > @@ -536,6 +536,9 @@ again: > > up_read(&host->h_rwsem); > > } > > status = nlm_stat_to_errno(resp->status); > > + /* Don't return EAGAIN, as that would make fcntl_setlk() loop */ > > + if (status == -EAGAIN) > > + status = -ENOLCK; > > out_unblock: > > nlmclnt_finish_block(block); > > /* Cancel the blocked request if it is still pending */ > > > Wait. There is something really weird going on here. > > According to the spec, LCK_DENIED means 'the request failed' (i.e. > ENOLCK is definitely correct) > > OTOH, LCK_DENIED_NOLOCKS and LCK_DENIED_GRACE_PERIOD are both temporary > failures, the first because the server had a resource problem, and the > second because the server rebooted and is in the grace period (i.e. > EAGAIN would appear to be more appropriate). See > > http://www.opengroup.org/onlinepubs/9629799/chap10.htm#tagcjh_11_02_02_02 > > AFAICS, the correct thing to do is to fix nlm_stat_to_errno() by > swapping the return values for NLM_LCK_DENIED and > NLM_LCK_DENIED_NOLOCKS/NLM_LCK_DENIED_GRACE_PERIOD. > > The problem is that there appears to be a similar confusion on the Linux > server side in nlmsvc_lock(). :-( Duh... Sorry, EAGAIN is indeed the correct return value for fcntl() when the lock attempt failed. I should have reread the manpage/posix spec before replying. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <1207861661.8180.18.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>]
* Re: nfs: infinite loop in fcntl(F_SETLKW) [not found] ` <1207861661.8180.18.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> @ 2008-04-10 21:20 ` Trond Myklebust 2008-04-10 21:54 ` J. Bruce Fields 0 siblings, 1 reply; 25+ messages in thread From: Trond Myklebust @ 2008-04-10 21:20 UTC (permalink / raw) To: Miklos Szeredi, Marc Eshel, Dr. J. Bruce Fields Cc: neilb-l3A5Bk7waGM, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Thu, 2008-04-10 at 17:07 -0400, Trond Myklebust wrote: > On Thu, 2008-04-10 at 17:02 -0400, Trond Myklebust wrote: > > On Thu, 2008-04-10 at 21:51 +0200, Miklos Szeredi wrote: > > > Another infinite loop, this one involving both client and server. > > > > > > Basically what happens is that on the server nlm_fopen() calls > > > nfsd_open() which returns -EACCES, to which nlm_fopen() returns > > > NLM_LCK_DENIED. > > > > > > On the client this will turn into a -EAGAIN (nlm_stat_to_errno()), > > > which in will cause fcntl_setlk() to retry forever. > > > > > > I _think_ the solution is to turn NLM_LCK_DENIED into ENOLCK for > > > blocking locks, as NLM_LCK_BLOCKED is for the contended case. For > > > testing the lock leave NLM_LCK_DENIED as EAGAIN. That still could be > > > misleading, but at least there's no infinite loop in that case. > > > > > > I've minimally tested this patch to verify that it cures the lockup, > > > and that simple blocking locks keep working. > > > > > > Signed-off-by: Miklos Szeredi <mszeredi-AlSwsSmVLrQ@public.gmane.org> > > > --- > > > fs/lockd/clntproc.c | 3 +++ > > > 1 file changed, 3 insertions(+) > > > > > > Index: linux/fs/lockd/clntproc.c > > > =================================================================== > > > --- linux.orig/fs/lockd/clntproc.c 2008-04-02 13:34:57.000000000 +0200 > > > +++ linux/fs/lockd/clntproc.c 2008-04-10 21:23:46.000000000 +0200 > > > @@ -536,6 +536,9 @@ again: > > > up_read(&host->h_rwsem); > > > } > > > status = nlm_stat_to_errno(resp->status); > > > + /* Don't return EAGAIN, as that would make fcntl_setlk() loop */ > > > + if (status == -EAGAIN) > > > + status = -ENOLCK; > > > out_unblock: > > > nlmclnt_finish_block(block); > > > /* Cancel the blocked request if it is still pending */ > > > > > > Wait. There is something really weird going on here. > > > > According to the spec, LCK_DENIED means 'the request failed' (i.e. > > ENOLCK is definitely correct) > > > > OTOH, LCK_DENIED_NOLOCKS and LCK_DENIED_GRACE_PERIOD are both temporary > > failures, the first because the server had a resource problem, and the > > second because the server rebooted and is in the grace period (i.e. > > EAGAIN would appear to be more appropriate). See > > > > http://www.opengroup.org/onlinepubs/9629799/chap10.htm#tagcjh_11_02_02_02 > > > > AFAICS, the correct thing to do is to fix nlm_stat_to_errno() by > > swapping the return values for NLM_LCK_DENIED and > > NLM_LCK_DENIED_NOLOCKS/NLM_LCK_DENIED_GRACE_PERIOD. > > > > The problem is that there appears to be a similar confusion on the Linux > > server side in nlmsvc_lock(). :-( > > Duh... Sorry, EAGAIN is indeed the correct return value for fcntl() when > the lock attempt failed. I should have reread the manpage/posix spec > before replying. OK. So the correct fix here should really be applied to fcntl_setlk(). There is absolutely no reason why we should be looping at all if the filesystem has a ->lock() method. In fact, this looping behaviour was introduced recently in commit 7723ec9777d9832849b76475b1a21a2872a40d20. Marc, Bruce, why was this done, and how are filesystems now supposed to behave? -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: nfs: infinite loop in fcntl(F_SETLKW) 2008-04-10 21:20 ` Trond Myklebust @ 2008-04-10 21:54 ` J. Bruce Fields 2008-04-11 19:12 ` Miklos Szeredi [not found] ` <20080410215410.GF22324-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 0 siblings, 2 replies; 25+ messages in thread From: J. Bruce Fields @ 2008-04-10 21:54 UTC (permalink / raw) To: Trond Myklebust Cc: Miklos Szeredi, Marc Eshel, neilb, akpm, linux-nfs, linux-kernel, linux-fsdevel On Thu, Apr 10, 2008 at 05:20:36PM -0400, Trond Myklebust wrote: > > On Thu, 2008-04-10 at 17:07 -0400, Trond Myklebust wrote: > > On Thu, 2008-04-10 at 17:02 -0400, Trond Myklebust wrote: > > > On Thu, 2008-04-10 at 21:51 +0200, Miklos Szeredi wrote: > > > > Another infinite loop, this one involving both client and server. > > > > > > > > Basically what happens is that on the server nlm_fopen() calls > > > > nfsd_open() which returns -EACCES, to which nlm_fopen() returns > > > > NLM_LCK_DENIED. > > > > > > > > On the client this will turn into a -EAGAIN (nlm_stat_to_errno()), > > > > which in will cause fcntl_setlk() to retry forever. > > > > > > > > I _think_ the solution is to turn NLM_LCK_DENIED into ENOLCK for > > > > blocking locks, as NLM_LCK_BLOCKED is for the contended case. For > > > > testing the lock leave NLM_LCK_DENIED as EAGAIN. That still could be > > > > misleading, but at least there's no infinite loop in that case. > > > > > > > > I've minimally tested this patch to verify that it cures the lockup, > > > > and that simple blocking locks keep working. > > > > > > > > Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> > > > > --- > > > > fs/lockd/clntproc.c | 3 +++ > > > > 1 file changed, 3 insertions(+) > > > > > > > > Index: linux/fs/lockd/clntproc.c > > > > =================================================================== > > > > --- linux.orig/fs/lockd/clntproc.c 2008-04-02 13:34:57.000000000 +0200 > > > > +++ linux/fs/lockd/clntproc.c 2008-04-10 21:23:46.000000000 +0200 > > > > @@ -536,6 +536,9 @@ again: > > > > up_read(&host->h_rwsem); > > > > } > > > > status = nlm_stat_to_errno(resp->status); > > > > + /* Don't return EAGAIN, as that would make fcntl_setlk() loop */ > > > > + if (status == -EAGAIN) > > > > + status = -ENOLCK; > > > > out_unblock: > > > > nlmclnt_finish_block(block); > > > > /* Cancel the blocked request if it is still pending */ > > > > > > > > > Wait. There is something really weird going on here. > > > > > > According to the spec, LCK_DENIED means 'the request failed' (i.e. > > > ENOLCK is definitely correct) > > > > > > OTOH, LCK_DENIED_NOLOCKS and LCK_DENIED_GRACE_PERIOD are both temporary > > > failures, the first because the server had a resource problem, and the > > > second because the server rebooted and is in the grace period (i.e. > > > EAGAIN would appear to be more appropriate). See > > > > > > http://www.opengroup.org/onlinepubs/9629799/chap10.htm#tagcjh_11_02_02_02 > > > > > > AFAICS, the correct thing to do is to fix nlm_stat_to_errno() by > > > swapping the return values for NLM_LCK_DENIED and > > > NLM_LCK_DENIED_NOLOCKS/NLM_LCK_DENIED_GRACE_PERIOD. > > > > > > The problem is that there appears to be a similar confusion on the Linux > > > server side in nlmsvc_lock(). :-( > > > > Duh... Sorry, EAGAIN is indeed the correct return value for fcntl() when > > the lock attempt failed. I should have reread the manpage/posix spec > > before replying. > > OK. So the correct fix here should really be applied to fcntl_setlk(). > There is absolutely no reason why we should be looping at all if the > filesystem has a ->lock() method. > > In fact, this looping behaviour was introduced recently in commit > 7723ec9777d9832849b76475b1a21a2872a40d20. Apologies, that was indeed a behavioral change introduced in a commit that claimed just to be shuffling code around. > Marc, Bruce, why was this > done, and how are filesystems now supposed to behave? > The assumption must have been that -EAGAIN could only mean that we needed to keep blocking, and hence was a nonsensical return from a filesystem lock method that waited itself for the lock to become available--such a method would return 0, -EINTR (or some more exotic error), or continue waiting. If we agree that EAGAIN is actually a legimate error to return from a blocking lock, then, yes, we need take ->lock() back out of this loop. And I don't think there's any real reason we need the new behavior. So we should probably revert that--I'll take a closer look tomorrow.... --b. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: nfs: infinite loop in fcntl(F_SETLKW) 2008-04-10 21:54 ` J. Bruce Fields @ 2008-04-11 19:12 ` Miklos Szeredi 2008-04-11 19:19 ` J. Bruce Fields 2008-04-13 0:08 ` J. Bruce Fields [not found] ` <20080410215410.GF22324-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 1 sibling, 2 replies; 25+ messages in thread From: Miklos Szeredi @ 2008-04-11 19:12 UTC (permalink / raw) To: bfields Cc: trond.myklebust, miklos, eshel, neilb, akpm, linux-nfs, linux-kernel, linux-fsdevel > > OK. So the correct fix here should really be applied to fcntl_setlk(). > > There is absolutely no reason why we should be looping at all if the > > filesystem has a ->lock() method. > > > > In fact, this looping behaviour was introduced recently in commit > > 7723ec9777d9832849b76475b1a21a2872a40d20. > > Apologies, that was indeed a behavioral change introduced in a commit > that claimed just to be shuffling code around. Yeah, that patch looks totally wrong. It's not generally a good idea to do a loop where the exit condition depends on something you don't control. And error values from filesystem methods are typically like that. For example with fuse, the error code could come from an unprivileged userspace process. I didn't realize this aspect of the bug previously, because I concentrated on the lockd inconsistency. Btw, why hasn't this work been posted on -fsdevel prior to merging into mainline? > > > Marc, Bruce, why was this > > done, and how are filesystems now supposed to behave? > > > > The assumption must have been that -EAGAIN could only mean that we > needed to keep blocking, and hence was a nonsensical return from a > filesystem lock method that waited itself for the lock to become > available--such a method would return 0, -EINTR (or some more exotic > error), or continue waiting. EAGAIN for a blocking lock is nonsensical, so my original patch could still make sense. But that's no longer a regression, and not all that important. Miklos ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: nfs: infinite loop in fcntl(F_SETLKW) 2008-04-11 19:12 ` Miklos Szeredi @ 2008-04-11 19:19 ` J. Bruce Fields [not found] ` <20080411191910.GB16965-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 2008-04-13 0:08 ` J. Bruce Fields 1 sibling, 1 reply; 25+ messages in thread From: J. Bruce Fields @ 2008-04-11 19:19 UTC (permalink / raw) To: Miklos Szeredi Cc: trond.myklebust, eshel, neilb, akpm, linux-nfs, linux-kernel, linux-fsdevel On Fri, Apr 11, 2008 at 09:12:23PM +0200, Miklos Szeredi wrote: > > > OK. So the correct fix here should really be applied to fcntl_setlk(). > > > There is absolutely no reason why we should be looping at all if the > > > filesystem has a ->lock() method. > > > > > > In fact, this looping behaviour was introduced recently in commit > > > 7723ec9777d9832849b76475b1a21a2872a40d20. > > > > Apologies, that was indeed a behavioral change introduced in a commit > > that claimed just to be shuffling code around. > > Yeah, that patch looks totally wrong. It's not generally a good idea > to do a loop where the exit condition depends on something you don't > control. And error values from filesystem methods are typically like > that. For example with fuse, the error code could come from an > unprivileged userspace process. > > I didn't realize this aspect of the bug previously, because I > concentrated on the lockd inconsistency. > > Btw, why hasn't this work been posted on -fsdevel prior to merging > into mainline? http://marc.info/?l=linux-fsdevel&m=117581629326194&w=2 --b. > > > > > > Marc, Bruce, why was this > > > done, and how are filesystems now supposed to behave? > > > > > > > The assumption must have been that -EAGAIN could only mean that we > > needed to keep blocking, and hence was a nonsensical return from a > > filesystem lock method that waited itself for the lock to become > > available--such a method would return 0, -EINTR (or some more exotic > > error), or continue waiting. > > EAGAIN for a blocking lock is nonsensical, so my original patch could > still make sense. But that's no longer a regression, and not all that > important. ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20080411191910.GB16965-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* Re: nfs: infinite loop in fcntl(F_SETLKW) [not found] ` <20080411191910.GB16965-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> @ 2008-04-11 19:22 ` Miklos Szeredi 0 siblings, 0 replies; 25+ messages in thread From: Miklos Szeredi @ 2008-04-11 19:22 UTC (permalink / raw) To: bfields-uC3wQj2KruNg9hUCZPvPmw Cc: miklos-sUDqSbJrdHQHWmgEVkV9KA, trond.myklebust-41N18TsMXrtuMpJDpNschA, eshel-6kx38NOBMPqrIzol8Bc5pA, neilb-l3A5Bk7waGM, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA > > Btw, why hasn't this work been posted on -fsdevel prior to merging > > into mainline? > > http://marc.info/?l=linux-fsdevel&m=117581629326194&w=2 Ah, sorry. I was only looking at the last half year of archives :) Miklos -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: nfs: infinite loop in fcntl(F_SETLKW) 2008-04-11 19:12 ` Miklos Szeredi 2008-04-11 19:19 ` J. Bruce Fields @ 2008-04-13 0:08 ` J. Bruce Fields [not found] ` <20080413000830.GF31789-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 1 sibling, 1 reply; 25+ messages in thread From: J. Bruce Fields @ 2008-04-13 0:08 UTC (permalink / raw) To: Miklos Szeredi Cc: trond.myklebust, eshel, neilb, akpm, linux-nfs, linux-kernel, linux-fsdevel On Fri, Apr 11, 2008 at 09:12:23PM +0200, Miklos Szeredi wrote: > > > OK. So the correct fix here should really be applied to fcntl_setlk(). > > > There is absolutely no reason why we should be looping at all if the > > > filesystem has a ->lock() method. > > > > > > In fact, this looping behaviour was introduced recently in commit > > > 7723ec9777d9832849b76475b1a21a2872a40d20. > > > > Apologies, that was indeed a behavioral change introduced in a commit > > that claimed just to be shuffling code around. > > Yeah, that patch looks totally wrong. It's not generally a good idea > to do a loop where the exit condition depends on something you don't > control. And error values from filesystem methods are typically like > that. For example with fuse, the error code could come from an > unprivileged userspace process. > > I didn't realize this aspect of the bug previously, because I > concentrated on the lockd inconsistency. So, does this patch on its own fix the problem you saw? Any extra eyes welcome.... --b. commit e56100676b9ea3b2d5f3e937c3ce8a5149cffb84 Author: J. Bruce Fields <bfields@citi.umich.edu> Date: Sat Apr 12 18:12:15 2008 -0400 locks: fix possible infinite loop in fcntl(F_SETLKW) over nfs Miklos Szeredi found the bug: "Basically what happens is that on the server nlm_fopen() calls nfsd_open() which returns -EACCES, to which nlm_fopen() returns NLM_LCK_DENIED. "On the client this will turn into a -EAGAIN (nlm_stat_to_errno()), which in will cause fcntl_setlk() to retry forever." So, for example, opening a file on an nfs filesystem, changing permissions to forbid further access, then trying to lock the file, could result in an infinite loop. And Trond Myklebust identified the culprit, from Marc Eshel and I: 7723ec9777d9832849b76475b1a21a2872a40d20 "locks: factor out generic/filesystem switch from setlock code" That commit claimed to just be reshuffling code, but actually introduced a behavioral change by calling the lock method repeatedly as long as it returned -EAGAIN. We assumed this would be safe, since we assumed a lock of type SETLKW would only return with either success or an error other than -EAGAIN. However, nfs does can in fact return -EAGAIN in this situation, and independently of whether that behavior is correct or not, we don't actually need this change, and it seems far safer not to depend on such assumptions about the filesystem's ->lock method. Therefore, revert the problematic part of the original commit. This leaves vfs_lock_file() and its other callers unchanged, while returning fcntl_setlk and fcntl_setlk64 to their former behavior. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Marc Eshel <eshel@almaden.ibm.com> diff --git a/fs/locks.c b/fs/locks.c index d83fab1..43c0af2 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -1801,17 +1801,21 @@ again: if (error) goto out; - for (;;) { - error = vfs_lock_file(filp, cmd, file_lock, NULL); - if (error != -EAGAIN || cmd == F_SETLK) - break; - error = wait_event_interruptible(file_lock->fl_wait, - !file_lock->fl_next); - if (!error) - continue; + if (filp->f_op && filp->f_op->lock != NULL) + error = filp->f_op->lock(filp, cmd, file_lock); + else { + for (;;) { + error = posix_lock_file(filp, file_lock, NULL); + if (error != -EAGAIN || cmd == F_SETLK) + break; + error = wait_event_interruptible(file_lock->fl_wait, + !file_lock->fl_next); + if (!error) + continue; - locks_delete_block(file_lock); - break; + locks_delete_block(file_lock); + break; + } } /* @@ -1925,17 +1929,21 @@ again: if (error) goto out; - for (;;) { - error = vfs_lock_file(filp, cmd, file_lock, NULL); - if (error != -EAGAIN || cmd == F_SETLK64) - break; - error = wait_event_interruptible(file_lock->fl_wait, - !file_lock->fl_next); - if (!error) - continue; + if (filp->f_op && filp->f_op->lock != NULL) + error = filp->f_op->lock(filp, cmd, file_lock); + else { + for (;;) { + error = posix_lock_file(filp, file_lock, NULL); + if (error != -EAGAIN || cmd == F_SETLK64) + break; + error = wait_event_interruptible(file_lock->fl_wait, + !file_lock->fl_next); + if (!error) + continue; - locks_delete_block(file_lock); - break; + locks_delete_block(file_lock); + break; + } } /* ^ permalink raw reply related [flat|nested] 25+ messages in thread
[parent not found: <20080413000830.GF31789-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* Re: nfs: infinite loop in fcntl(F_SETLKW) [not found] ` <20080413000830.GF31789-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> @ 2008-04-13 8:13 ` Miklos Szeredi 2008-04-14 17:07 ` J. Bruce Fields [not found] ` <E1JkxKz-0003A8-9V-8f8m9JG5TPIdUIPVzhDTVZP2KDSNp7ea@public.gmane.org> 0 siblings, 2 replies; 25+ messages in thread From: Miklos Szeredi @ 2008-04-13 8:13 UTC (permalink / raw) To: bfields-uC3wQj2KruNg9hUCZPvPmw Cc: miklos-sUDqSbJrdHQHWmgEVkV9KA, trond.myklebust-41N18TsMXrtuMpJDpNschA, eshel-6kx38NOBMPqrIzol8Bc5pA, neilb-l3A5Bk7waGM, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA > > > > OK. So the correct fix here should really be applied to fcntl_setlk(). > > > > There is absolutely no reason why we should be looping at all if the > > > > filesystem has a ->lock() method. > > > > > > > > In fact, this looping behaviour was introduced recently in commit > > > > 7723ec9777d9832849b76475b1a21a2872a40d20. > > > > > > Apologies, that was indeed a behavioral change introduced in a commit > > > that claimed just to be shuffling code around. > > > > Yeah, that patch looks totally wrong. It's not generally a good idea > > to do a loop where the exit condition depends on something you don't > > control. And error values from filesystem methods are typically like > > that. For example with fuse, the error code could come from an > > unprivileged userspace process. > > > > I didn't realize this aspect of the bug previously, because I > > concentrated on the lockd inconsistency. > > So, does this patch on its own fix the problem you saw? Yes. With the patch applied, the test program returns "lockf: Resource temporarily unavailable" instead of hanging. Thanks, Miklos -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: nfs: infinite loop in fcntl(F_SETLKW) 2008-04-13 8:13 ` Miklos Szeredi @ 2008-04-14 17:07 ` J. Bruce Fields [not found] ` <E1JkxKz-0003A8-9V-8f8m9JG5TPIdUIPVzhDTVZP2KDSNp7ea@public.gmane.org> 1 sibling, 0 replies; 25+ messages in thread From: J. Bruce Fields @ 2008-04-14 17:07 UTC (permalink / raw) To: Miklos Szeredi Cc: trond.myklebust, eshel, neilb, akpm, linux-nfs, linux-kernel, linux-fsdevel On Sun, Apr 13, 2008 at 10:13:21AM +0200, Miklos Szeredi wrote: > > > > > OK. So the correct fix here should really be applied to fcntl_setlk(). > > > > > There is absolutely no reason why we should be looping at all if the > > > > > filesystem has a ->lock() method. > > > > > > > > > > In fact, this looping behaviour was introduced recently in commit > > > > > 7723ec9777d9832849b76475b1a21a2872a40d20. > > > > > > > > Apologies, that was indeed a behavioral change introduced in a commit > > > > that claimed just to be shuffling code around. > > > > > > Yeah, that patch looks totally wrong. It's not generally a good idea > > > to do a loop where the exit condition depends on something you don't > > > control. And error values from filesystem methods are typically like > > > that. For example with fuse, the error code could come from an > > > unprivileged userspace process. > > > > > > I didn't realize this aspect of the bug previously, because I > > > concentrated on the lockd inconsistency. > > > > So, does this patch on its own fix the problem you saw? > > Yes. With the patch applied, the test program returns "lockf: > Resource temporarily unavailable" instead of hanging. OK, thanks! --b. ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <E1JkxKz-0003A8-9V-8f8m9JG5TPIdUIPVzhDTVZP2KDSNp7ea@public.gmane.org>]
* [PATCH] locks: fix possible infinite loop in fcntl(F_SETLKW) over nfs [not found] ` <E1JkxKz-0003A8-9V-8f8m9JG5TPIdUIPVzhDTVZP2KDSNp7ea@public.gmane.org> @ 2008-04-14 19:03 ` J. Bruce Fields 0 siblings, 0 replies; 25+ messages in thread From: J. Bruce Fields @ 2008-04-14 19:03 UTC (permalink / raw) To: Linus Torvalds Cc: trond.myklebust-41N18TsMXrtuMpJDpNschA, eshel-6kx38NOBMPqrIzol8Bc5pA, neilb-l3A5Bk7waGM, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Miklos Szeredi, stable-DgEjT+Ai2ygdnm+yROfE0A From: J. Bruce Fields <bfields-vtMw8L3fJ9vSiEDVxGk4TQ@public.gmane.org> Miklos Szeredi found the bug: "Basically what happens is that on the server nlm_fopen() calls nfsd_open() which returns -EACCES, to which nlm_fopen() returns NLM_LCK_DENIED. "On the client this will turn into a -EAGAIN (nlm_stat_to_errno()), which in will cause fcntl_setlk() to retry forever." So, for example, opening a file on an nfs filesystem, changing permissions to forbid further access, then trying to lock the file, could result in an infinite loop. And Trond Myklebust identified the culprit, from Marc Eshel and I: 7723ec9777d9832849b76475b1a21a2872a40d20 "locks: factor out generic/filesystem switch from setlock code" That commit claimed to just be reshuffling code, but actually introduced a behavioral change by calling the lock method repeatedly as long as it returned -EAGAIN. We assumed this would be safe, since we assumed a lock of type SETLKW would only return with either success or an error other than -EAGAIN. However, nfs does can in fact return -EAGAIN in this situation, and independently of whether that behavior is correct or not, we don't actually need this change, and it seems far safer not to depend on such assumptions about the filesystem's ->lock method. Therefore, revert the problematic part of the original commit. This leaves vfs_lock_file() and its other callers unchanged, while returning fcntl_setlk and fcntl_setlk64 to their former behavior. Signed-off-by: J. Bruce Fields <bfields-vtMw8L3fJ9vSiEDVxGk4TQ@public.gmane.org> Tested-by: Miklos Szeredi <mszeredi-AlSwsSmVLrQ@public.gmane.org> Cc: Trond Myklebust <trond.myklebust-41N18TsMXrtuMpJDpNschA@public.gmane.org> Cc: Marc Eshel <eshel-6kx38NOBMPqrIzol8Bc5pA@public.gmane.org> --- fs/locks.c | 48 ++++++++++++++++++++++++++++-------------------- 1 files changed, 28 insertions(+), 20 deletions(-) This is appropriate for 2.6.25 (as well as older stable kernels--the bug was introduced in 2.6.22). --b. diff --git a/fs/locks.c b/fs/locks.c index d83fab1..43c0af2 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -1801,17 +1801,21 @@ again: if (error) goto out; - for (;;) { - error = vfs_lock_file(filp, cmd, file_lock, NULL); - if (error != -EAGAIN || cmd == F_SETLK) - break; - error = wait_event_interruptible(file_lock->fl_wait, - !file_lock->fl_next); - if (!error) - continue; + if (filp->f_op && filp->f_op->lock != NULL) + error = filp->f_op->lock(filp, cmd, file_lock); + else { + for (;;) { + error = posix_lock_file(filp, file_lock, NULL); + if (error != -EAGAIN || cmd == F_SETLK) + break; + error = wait_event_interruptible(file_lock->fl_wait, + !file_lock->fl_next); + if (!error) + continue; - locks_delete_block(file_lock); - break; + locks_delete_block(file_lock); + break; + } } /* @@ -1925,17 +1929,21 @@ again: if (error) goto out; - for (;;) { - error = vfs_lock_file(filp, cmd, file_lock, NULL); - if (error != -EAGAIN || cmd == F_SETLK64) - break; - error = wait_event_interruptible(file_lock->fl_wait, - !file_lock->fl_next); - if (!error) - continue; + if (filp->f_op && filp->f_op->lock != NULL) + error = filp->f_op->lock(filp, cmd, file_lock); + else { + for (;;) { + error = posix_lock_file(filp, file_lock, NULL); + if (error != -EAGAIN || cmd == F_SETLK64) + break; + error = wait_event_interruptible(file_lock->fl_wait, + !file_lock->fl_next); + if (!error) + continue; - locks_delete_block(file_lock); - break; + locks_delete_block(file_lock); + break; + } } /* -- 1.5.5.rc1 -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 25+ messages in thread
[parent not found: <20080410215410.GF22324-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* Re: nfs: infinite loop in fcntl(F_SETLKW) [not found] ` <20080410215410.GF22324-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> @ 2008-04-13 8:28 ` Miklos Szeredi 2008-04-14 17:19 ` J. Bruce Fields 0 siblings, 1 reply; 25+ messages in thread From: Miklos Szeredi @ 2008-04-13 8:28 UTC (permalink / raw) To: bfields-uC3wQj2KruNg9hUCZPvPmw Cc: trond.myklebust-41N18TsMXrtuMpJDpNschA, miklos-sUDqSbJrdHQHWmgEVkV9KA, eshel-6kx38NOBMPqrIzol8Bc5pA, neilb-l3A5Bk7waGM, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA > Apologies, that was indeed a behavioral change introduced in a commit > that claimed just to be shuffling code around. Another complaint about this series: using EINPROGRESS to signal asynchronous locking looks really fishy. How does the filesystem know, that the caller wants to do async locking? How do we make sure, that the filesystem (like fuse or 9p, which "blindly" return the error from the server) doesn't return EINPROGRESS even when it's _not_ doing an asynchronous lock? I think it would have been much cleaner to have a completely separate interface for async locking, instead of trying to cram that into f_op->lock(). Would that be possible to fix now? Or at least make EINPROGRESS a kernel-internal error value (>512), to make it that it has a special meaning for the _kernel only_? Thanks, Miklos -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: nfs: infinite loop in fcntl(F_SETLKW) 2008-04-13 8:28 ` nfs: infinite loop in fcntl(F_SETLKW) Miklos Szeredi @ 2008-04-14 17:19 ` J. Bruce Fields 2008-04-14 21:15 ` Miklos Szeredi 0 siblings, 1 reply; 25+ messages in thread From: J. Bruce Fields @ 2008-04-14 17:19 UTC (permalink / raw) To: Miklos Szeredi Cc: trond.myklebust, eshel, neilb, akpm, linux-nfs, linux-kernel, linux-fsdevel On Sun, Apr 13, 2008 at 10:28:08AM +0200, Miklos Szeredi wrote: > > Apologies, that was indeed a behavioral change introduced in a commit > > that claimed just to be shuffling code around. > > Another complaint about this series: using EINPROGRESS to signal > asynchronous locking looks really fishy. How does the filesystem > know, that the caller wants to do async locking? The caller sets a fl_grant callback. But I guess the interesting question is how the caller knows that the filesystem is really going to return the results asynchronously: > How do we make sure, > that the filesystem (like fuse or 9p, which "blindly" return the error > from the server) doesn't return EINPROGRESS even when it's _not_ doing > an asynchronous lock? Right, there's no safeguard there--if fuse returns EINPROGRESS, then we'll wait for a grant callback that's not going to come. It should time out, so that's not a total disaster, but still. Anyway, I'm not sure what to do about that. > > I think it would have been much cleaner to have a completely separate > interface for async locking, instead of trying to cram that into > f_op->lock(). Maybe so. ->lock() had quite a bit crammed into it even before this. > Would that be possible to fix now? Or at least make EINPROGRESS a > kernel-internal error value (>512), to make it that it has a special > meaning for the _kernel only_? Perhaps so. The behavior of lockd will still depend to some degree on the exact error returned from the filesystem--e.g. if you return -EAGAIN from ->lock() without later calling ->fl_grant() it will cause some unexpected delays. (Though again clients will eventually give up and poll for the lock.) --b. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: nfs: infinite loop in fcntl(F_SETLKW) 2008-04-14 17:19 ` J. Bruce Fields @ 2008-04-14 21:15 ` Miklos Szeredi 2008-04-15 18:58 ` J. Bruce Fields 0 siblings, 1 reply; 25+ messages in thread From: Miklos Szeredi @ 2008-04-14 21:15 UTC (permalink / raw) To: bfields Cc: miklos, trond.myklebust, eshel, neilb, akpm, linux-nfs, linux-kernel, linux-fsdevel > On Sun, Apr 13, 2008 at 10:28:08AM +0200, Miklos Szeredi wrote: > > > Apologies, that was indeed a behavioral change introduced in a commit > > > that claimed just to be shuffling code around. > > > > Another complaint about this series: using EINPROGRESS to signal > > asynchronous locking looks really fishy. How does the filesystem > > know, that the caller wants to do async locking? > > The caller sets a fl_grant callback. But I guess the interesting > question is how the caller knows that the filesystem is really going to > return the results asynchronously: > > > How do we make sure, > > that the filesystem (like fuse or 9p, which "blindly" return the error > > from the server) doesn't return EINPROGRESS even when it's _not_ doing > > an asynchronous lock? > > Right, there's no safeguard there--if fuse returns EINPROGRESS, then > we'll wait for a grant callback that's not going to come. It should > time out, so that's not a total disaster, but still. > > Anyway, I'm not sure what to do about that. > > > > > I think it would have been much cleaner to have a completely separate > > interface for async locking, instead of trying to cram that into > > f_op->lock(). > > Maybe so. ->lock() had quite a bit crammed into it even before this. Oh yeah, but at least that was 1:1 with the fcntl interface. > > Would that be possible to fix now? Or at least make EINPROGRESS a > > kernel-internal error value (>512), to make it that it has a special > > meaning for the _kernel only_? > > Perhaps so. > > The behavior of lockd will still depend to some degree on the exact > error returned from the filesystem--e.g. if you return -EAGAIN from > ->lock() without later calling ->fl_grant() it will cause some > unexpected delays. (Though again clients will eventually give up and > poll for the lock.) OK, so the semantics of vfs_lock_file() are now: 1) if !FL_SLEEP, then return 0 if granted, -EAGAIN on contention 2) if FL_SLEEP and fl_grant == NULL, then return 0 if granted, block on contention 3) if FL_SLEEP and fl_grant != NULL, then return 0 if granted, on contention: a) either return -EINPROGRESS and call fl_grant when granted b) or return -EAGAIN and call fl_notify when the lock needs retrying As a first step, how about doing the following: For 3) use LOCK_FILE_ASYNC as a return value. AFAICS there's no reason to distinguish between a) and b). For 1) leave the -EAGAIN error, since in that case no lock was queued. Here's an untested patch. Comments? Miklos --- fs/gfs2/locking/dlm/plock.c | 2 +- fs/lockd/svclock.c | 13 ++++--------- fs/locks.c | 20 ++++++++++---------- include/linux/fs.h | 15 +++++++++++++++ 4 files changed, 30 insertions(+), 20 deletions(-) Index: linux-2.6/fs/locks.c =================================================================== --- linux-2.6.orig/fs/locks.c 2008-04-14 22:19:57.000000000 +0200 +++ linux-2.6/fs/locks.c 2008-04-14 22:52:06.000000000 +0200 @@ -784,8 +784,10 @@ find_conflict: if (!flock_locks_conflict(request, fl)) continue; error = -EAGAIN; - if (request->fl_flags & FL_SLEEP) - locks_insert_block(fl, request); + if (!(request->fl_flags & FL_SLEEP)) + goto out; + error = FILE_LOCK_ASYNC; + locks_insert_block(fl, request); goto out; } if (request->fl_flags & FL_ACCESS) @@ -841,7 +843,7 @@ static int __posix_lock_file(struct inod error = -EDEADLK; if (posix_locks_deadlock(request, fl)) goto out; - error = -EAGAIN; + error = FILE_LOCK_ASYNC; locks_insert_block(fl, request); goto out; } @@ -1040,7 +1042,7 @@ int posix_lock_file_wait(struct file *fi might_sleep (); for (;;) { error = posix_lock_file(filp, fl, NULL); - if ((error != -EAGAIN) || !(fl->fl_flags & FL_SLEEP)) + if (error != FILE_LOCK_ASYNC) break; error = wait_event_interruptible(fl->fl_wait, !fl->fl_next); if (!error) @@ -1112,9 +1114,7 @@ int locks_mandatory_area(int read_write, for (;;) { error = __posix_lock_file(inode, &fl, NULL); - if (error != -EAGAIN) - break; - if (!(fl.fl_flags & FL_SLEEP)) + if (error != FILE_LOCK_ASYNC) break; error = wait_event_interruptible(fl.fl_wait, !fl.fl_next); if (!error) { @@ -1534,7 +1534,7 @@ int flock_lock_file_wait(struct file *fi might_sleep(); for (;;) { error = flock_lock_file(filp, fl); - if ((error != -EAGAIN) || !(fl->fl_flags & FL_SLEEP)) + if (error != FILE_LOCK_ASYNC) break; error = wait_event_interruptible(fl->fl_wait, !fl->fl_next); if (!error) @@ -1806,7 +1806,7 @@ again: else { for (;;) { error = posix_lock_file(filp, file_lock, NULL); - if (error != -EAGAIN || cmd == F_SETLK) + if (error != FILE_LOCK_ASYNC) break; error = wait_event_interruptible(file_lock->fl_wait, !file_lock->fl_next); @@ -1934,7 +1934,7 @@ again: else { for (;;) { error = posix_lock_file(filp, file_lock, NULL); - if (error != -EAGAIN || cmd == F_SETLK64) + if (error != FILE_LOCK_ASYNC) break; error = wait_event_interruptible(file_lock->fl_wait, !file_lock->fl_next); Index: linux-2.6/fs/gfs2/locking/dlm/plock.c =================================================================== --- linux-2.6.orig/fs/gfs2/locking/dlm/plock.c 2008-04-09 16:44:15.000000000 +0200 +++ linux-2.6/fs/gfs2/locking/dlm/plock.c 2008-04-14 22:54:05.000000000 +0200 @@ -108,7 +108,7 @@ int gdlm_plock(void *lockspace, struct l if (xop->callback == NULL) wait_event(recv_wq, (op->done != 0)); else - return -EINPROGRESS; + return FILE_LOCK_ASYNC; spin_lock(&ops_lock); if (!list_empty(&op->list)) { Index: linux-2.6/fs/lockd/svclock.c =================================================================== --- linux-2.6.orig/fs/lockd/svclock.c 2008-04-09 16:44:18.000000000 +0200 +++ linux-2.6/fs/lockd/svclock.c 2008-04-14 22:55:20.000000000 +0200 @@ -423,8 +423,8 @@ nlmsvc_lock(struct svc_rqst *rqstp, stru goto out; case -EAGAIN: ret = nlm_lck_denied; - break; - case -EINPROGRESS: + goto out; + case FILE_LOCK_ASYNC: if (wait) break; /* Filesystem lock operation is in progress @@ -439,10 +439,6 @@ nlmsvc_lock(struct svc_rqst *rqstp, stru goto out; } - ret = nlm_lck_denied; - if (!wait) - goto out; - ret = nlm_lck_blocked; /* Append to list of blocked */ @@ -520,7 +516,7 @@ nlmsvc_testlock(struct svc_rqst *rqstp, } error = vfs_test_lock(file->f_file, &lock->fl); - if (error == -EINPROGRESS) { + if (error == FILE_LOCK_ASYNC) { ret = nlmsvc_defer_lock_rqst(rqstp, block); goto out; } @@ -744,8 +740,7 @@ nlmsvc_grant_blocked(struct nlm_block *b switch (error) { case 0: break; - case -EAGAIN: - case -EINPROGRESS: + case FILE_LOCK_ASYNC: dprintk("lockd: lock still blocked error %d\n", error); nlmsvc_insert_block(block, NLM_NEVER); nlmsvc_release_block(block); Index: linux-2.6/include/linux/fs.h =================================================================== --- linux-2.6.orig/include/linux/fs.h 2008-04-09 16:44:54.000000000 +0200 +++ linux-2.6/include/linux/fs.h 2008-04-14 23:03:03.000000000 +0200 @@ -837,6 +837,21 @@ extern spinlock_t files_lock; #define FL_SLEEP 128 /* A blocking lock */ /* + * FILE_LOCK_ASYNC: + * + * Special return value from posix_lock_file() and vfs_lock_file() + * for asynchronous locking. + * + * For posix_lock_file() it means, that the lock has been queued on + * the block list. + * + * For vfs_lock_file() it means, that the filesystem will call back + * either fl_notify() or fl_granted() when the lock needs to be + * retried or if it has been granted. + */ +#define FILE_LOCK_ASYNC 1 + +/* * The POSIX file lock owner is determined by * the "struct files_struct" in the thread group * (or NULL for no owner - BSD locks). ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: nfs: infinite loop in fcntl(F_SETLKW) 2008-04-14 21:15 ` Miklos Szeredi @ 2008-04-15 18:58 ` J. Bruce Fields 2008-04-16 16:28 ` Miklos Szeredi 0 siblings, 1 reply; 25+ messages in thread From: J. Bruce Fields @ 2008-04-15 18:58 UTC (permalink / raw) To: Miklos Szeredi Cc: trond.myklebust, eshel, neilb, akpm, linux-nfs, linux-kernel, linux-fsdevel On Mon, Apr 14, 2008 at 11:15:53PM +0200, Miklos Szeredi wrote: > > On Sun, Apr 13, 2008 at 10:28:08AM +0200, Miklos Szeredi wrote: > > > > Apologies, that was indeed a behavioral change introduced in a commit > > > > that claimed just to be shuffling code around. > > > > > > Another complaint about this series: using EINPROGRESS to signal > > > asynchronous locking looks really fishy. How does the filesystem > > > know, that the caller wants to do async locking? > > > > The caller sets a fl_grant callback. But I guess the interesting > > question is how the caller knows that the filesystem is really going to > > return the results asynchronously: > > > > > How do we make sure, > > > that the filesystem (like fuse or 9p, which "blindly" return the error > > > from the server) doesn't return EINPROGRESS even when it's _not_ doing > > > an asynchronous lock? > > > > Right, there's no safeguard there--if fuse returns EINPROGRESS, then > > we'll wait for a grant callback that's not going to come. It should > > time out, so that's not a total disaster, but still. > > > > Anyway, I'm not sure what to do about that. > > > > > > > > I think it would have been much cleaner to have a completely separate > > > interface for async locking, instead of trying to cram that into > > > f_op->lock(). > > > > Maybe so. ->lock() had quite a bit crammed into it even before this. > > Oh yeah, but at least that was 1:1 with the fcntl interface. > > > > Would that be possible to fix now? Or at least make EINPROGRESS a > > > kernel-internal error value (>512), to make it that it has a special > > > meaning for the _kernel only_? > > > > Perhaps so. > > > > The behavior of lockd will still depend to some degree on the exact > > error returned from the filesystem--e.g. if you return -EAGAIN from > > ->lock() without later calling ->fl_grant() it will cause some > > unexpected delays. (Though again clients will eventually give up and > > poll for the lock.) > > OK, so the semantics of vfs_lock_file() are now: Not quite; the original idea was that we didn't care about the blocking lock case, since there's already a lock manager callback for that. (It's arguably a minor abuse for us to return NLM_LCK_BLOCKED and then do a grant later even in the case where there's no conflict, but it works.) So we only changed the nonblocking case. Which may be sacrificing elegance for expediency, and I'm not opposed to fixing that in whatever way seems best. As a start, we could document this better. So: > > 1) if !FL_SLEEP, then return 0 if granted, -EAGAIN on contention > 2) if FL_SLEEP and fl_grant == NULL, then return 0 if granted, block on > contention > 3) if FL_SLEEP and fl_grant != NULL, then return 0 if granted, on > contention: > a) either return -EINPROGRESS and call fl_grant when granted > b) or return -EAGAIN and call fl_notify when the lock needs retrying I'd put it this way (after a quick check of the code to convince myself I'm remembering this right...): 1) If FL_SLEEP, then return 0 if granted, and on contention either: a) block, or b) return -EAGAIN, and call fl_notify when the lock should be retried. 2) If !FL_SLEEP, then either a) return a result now, or b) return -EINPROGRESS, then call fl_grant with the result. In each case, b) is only of course allowed if the corresponding callback is defined. If you want to support NFS exports, then you should also take option b) when possible. (Though note only 1b is *required* to avoid deadlocks; 2b is basically just a performance enhancement. We could perhaps even ditch 2b if we fixed lockd's BKL reliance and made it multithreaded.) The names can be confusing, since fl_notify generally results in granting a previously blocked lock, whereas fl_grant just returns the result (positive or negative) of a single lock request. They have the names they do because of the difference in their semantics: - With fl_grant the filesystem says: I'm giving you the final result of the lock operation. In particular, if I'm telling you it succeeded, that means I've already granted you the lock; don't ask me about it again. - With fl_notify the filesystem says: something just happened to this lock. I'm not guaranteeing anything, I'm just telling you this would be a good time to try the lock again. Do it and maybe you'll get lucky! This difference is why the fl_grant() operation takes a status argument (to tell the lock manager what happened), whereas fl_notify() doesn't (since it's just a poke in the ribs, and only the subsequent lock call will give the real status). Is that explanation helpful? OK, but I haven't read your patch yet, apologies.... --b. > > As a first step, how about doing the following: > > For 3) use LOCK_FILE_ASYNC as a return value. AFAICS there's no > reason to distinguish between a) and b). For 1) leave the -EAGAIN > error, since in that case no lock was queued. > > Here's an untested patch. Comments? > > Miklos > > > --- > fs/gfs2/locking/dlm/plock.c | 2 +- > fs/lockd/svclock.c | 13 ++++--------- > fs/locks.c | 20 ++++++++++---------- > include/linux/fs.h | 15 +++++++++++++++ > 4 files changed, 30 insertions(+), 20 deletions(-) > > Index: linux-2.6/fs/locks.c > =================================================================== > --- linux-2.6.orig/fs/locks.c 2008-04-14 22:19:57.000000000 +0200 > +++ linux-2.6/fs/locks.c 2008-04-14 22:52:06.000000000 +0200 > @@ -784,8 +784,10 @@ find_conflict: > if (!flock_locks_conflict(request, fl)) > continue; > error = -EAGAIN; > - if (request->fl_flags & FL_SLEEP) > - locks_insert_block(fl, request); > + if (!(request->fl_flags & FL_SLEEP)) > + goto out; > + error = FILE_LOCK_ASYNC; > + locks_insert_block(fl, request); > goto out; > } > if (request->fl_flags & FL_ACCESS) > @@ -841,7 +843,7 @@ static int __posix_lock_file(struct inod > error = -EDEADLK; > if (posix_locks_deadlock(request, fl)) > goto out; > - error = -EAGAIN; > + error = FILE_LOCK_ASYNC; > locks_insert_block(fl, request); > goto out; > } > @@ -1040,7 +1042,7 @@ int posix_lock_file_wait(struct file *fi > might_sleep (); > for (;;) { > error = posix_lock_file(filp, fl, NULL); > - if ((error != -EAGAIN) || !(fl->fl_flags & FL_SLEEP)) > + if (error != FILE_LOCK_ASYNC) > break; > error = wait_event_interruptible(fl->fl_wait, !fl->fl_next); > if (!error) > @@ -1112,9 +1114,7 @@ int locks_mandatory_area(int read_write, > > for (;;) { > error = __posix_lock_file(inode, &fl, NULL); > - if (error != -EAGAIN) > - break; > - if (!(fl.fl_flags & FL_SLEEP)) > + if (error != FILE_LOCK_ASYNC) > break; > error = wait_event_interruptible(fl.fl_wait, !fl.fl_next); > if (!error) { > @@ -1534,7 +1534,7 @@ int flock_lock_file_wait(struct file *fi > might_sleep(); > for (;;) { > error = flock_lock_file(filp, fl); > - if ((error != -EAGAIN) || !(fl->fl_flags & FL_SLEEP)) > + if (error != FILE_LOCK_ASYNC) > break; > error = wait_event_interruptible(fl->fl_wait, !fl->fl_next); > if (!error) > @@ -1806,7 +1806,7 @@ again: > else { > for (;;) { > error = posix_lock_file(filp, file_lock, NULL); > - if (error != -EAGAIN || cmd == F_SETLK) > + if (error != FILE_LOCK_ASYNC) > break; > error = wait_event_interruptible(file_lock->fl_wait, > !file_lock->fl_next); > @@ -1934,7 +1934,7 @@ again: > else { > for (;;) { > error = posix_lock_file(filp, file_lock, NULL); > - if (error != -EAGAIN || cmd == F_SETLK64) > + if (error != FILE_LOCK_ASYNC) > break; > error = wait_event_interruptible(file_lock->fl_wait, > !file_lock->fl_next); > Index: linux-2.6/fs/gfs2/locking/dlm/plock.c > =================================================================== > --- linux-2.6.orig/fs/gfs2/locking/dlm/plock.c 2008-04-09 16:44:15.000000000 +0200 > +++ linux-2.6/fs/gfs2/locking/dlm/plock.c 2008-04-14 22:54:05.000000000 +0200 > @@ -108,7 +108,7 @@ int gdlm_plock(void *lockspace, struct l > if (xop->callback == NULL) > wait_event(recv_wq, (op->done != 0)); > else > - return -EINPROGRESS; > + return FILE_LOCK_ASYNC; > > spin_lock(&ops_lock); > if (!list_empty(&op->list)) { > Index: linux-2.6/fs/lockd/svclock.c > =================================================================== > --- linux-2.6.orig/fs/lockd/svclock.c 2008-04-09 16:44:18.000000000 +0200 > +++ linux-2.6/fs/lockd/svclock.c 2008-04-14 22:55:20.000000000 +0200 > @@ -423,8 +423,8 @@ nlmsvc_lock(struct svc_rqst *rqstp, stru > goto out; > case -EAGAIN: > ret = nlm_lck_denied; > - break; > - case -EINPROGRESS: > + goto out; > + case FILE_LOCK_ASYNC: > if (wait) > break; > /* Filesystem lock operation is in progress > @@ -439,10 +439,6 @@ nlmsvc_lock(struct svc_rqst *rqstp, stru > goto out; > } > > - ret = nlm_lck_denied; > - if (!wait) > - goto out; > - > ret = nlm_lck_blocked; > > /* Append to list of blocked */ > @@ -520,7 +516,7 @@ nlmsvc_testlock(struct svc_rqst *rqstp, > } > > error = vfs_test_lock(file->f_file, &lock->fl); > - if (error == -EINPROGRESS) { > + if (error == FILE_LOCK_ASYNC) { > ret = nlmsvc_defer_lock_rqst(rqstp, block); > goto out; > } > @@ -744,8 +740,7 @@ nlmsvc_grant_blocked(struct nlm_block *b > switch (error) { > case 0: > break; > - case -EAGAIN: > - case -EINPROGRESS: > + case FILE_LOCK_ASYNC: > dprintk("lockd: lock still blocked error %d\n", error); > nlmsvc_insert_block(block, NLM_NEVER); > nlmsvc_release_block(block); > Index: linux-2.6/include/linux/fs.h > =================================================================== > --- linux-2.6.orig/include/linux/fs.h 2008-04-09 16:44:54.000000000 +0200 > +++ linux-2.6/include/linux/fs.h 2008-04-14 23:03:03.000000000 +0200 > @@ -837,6 +837,21 @@ extern spinlock_t files_lock; > #define FL_SLEEP 128 /* A blocking lock */ > > /* > + * FILE_LOCK_ASYNC: > + * > + * Special return value from posix_lock_file() and vfs_lock_file() > + * for asynchronous locking. > + * > + * For posix_lock_file() it means, that the lock has been queued on > + * the block list. > + * > + * For vfs_lock_file() it means, that the filesystem will call back > + * either fl_notify() or fl_granted() when the lock needs to be > + * retried or if it has been granted. > + */ > +#define FILE_LOCK_ASYNC 1 > + > +/* > * The POSIX file lock owner is determined by > * the "struct files_struct" in the thread group > * (or NULL for no owner - BSD locks). ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: nfs: infinite loop in fcntl(F_SETLKW) 2008-04-15 18:58 ` J. Bruce Fields @ 2008-04-16 16:28 ` Miklos Szeredi 2008-04-17 22:26 ` J. Bruce Fields 0 siblings, 1 reply; 25+ messages in thread From: Miklos Szeredi @ 2008-04-16 16:28 UTC (permalink / raw) To: bfields Cc: miklos, trond.myklebust, eshel, neilb, akpm, linux-nfs, linux-kernel, linux-fsdevel > > > The behavior of lockd will still depend to some degree on the exact > > > error returned from the filesystem--e.g. if you return -EAGAIN from > > > ->lock() without later calling ->fl_grant() it will cause some > > > unexpected delays. (Though again clients will eventually give up and > > > poll for the lock.) > > > > OK, so the semantics of vfs_lock_file() are now: > > Not quite; the original idea was that we didn't care about the blocking > lock case, since there's already a lock manager callback for that. > (It's arguably a minor abuse for us to return NLM_LCK_BLOCKED and then > do a grant later even in the case where there's no conflict, but it > works.) So we only changed the nonblocking case. > > Which may be sacrificing elegance for expediency, and I'm not opposed to > fixing that in whatever way seems best. As a start, we could document > this better. So: > > > > > 1) if !FL_SLEEP, then return 0 if granted, -EAGAIN on contention > > 2) if FL_SLEEP and fl_grant == NULL, then return 0 if granted, block on > > contention > > 3) if FL_SLEEP and fl_grant != NULL, then return 0 if granted, on > > contention: > > a) either return -EINPROGRESS and call fl_grant when granted > > b) or return -EAGAIN and call fl_notify when the lock needs retrying > > I'd put it this way (after a quick check of the code to convince myself > I'm remembering this right...): > > 1) If FL_SLEEP, then return 0 if granted, and on contention either: > a) block, or > b) return -EAGAIN, and call fl_notify when the lock should be > retried. Gfs2 seems to return -EINPROGRESS regardless of the FL_SLEEP flag: c) return -EINPROGRESS and call f_grant when the lock is granted > 2) If !FL_SLEEP, then either > a) return a result now, or > b) return -EINPROGRESS, then call fl_grant with the result. OK, I missed that. > In each case, b) is only of course allowed if the corresponding callback > is defined. If you want to support NFS exports, then you should also > take option b) when possible. (Though note only 1b is *required* to > avoid deadlocks; 2b is basically just a performance enhancement. We > could perhaps even ditch 2b if we fixed lockd's BKL reliance and made it > multithreaded.) > > The names can be confusing, since fl_notify generally results in > granting a previously blocked lock, whereas fl_grant just returns the > result (positive or negative) of a single lock request. They have the > names they do because of the difference in their semantics: > > - With fl_grant the filesystem says: I'm giving you the final > result of the lock operation. In particular, if I'm telling > you it succeeded, that means I've already granted you the > lock; don't ask me about it again. > > - With fl_notify the filesystem says: something just happened to > this lock. I'm not guaranteeing anything, I'm just telling > you this would be a good time to try the lock again. Do it > and maybe you'll get lucky! > > This difference is why the fl_grant() operation takes a status argument > (to tell the lock manager what happened), whereas fl_notify() doesn't > (since it's just a poke in the ribs, and only the subsequent lock call > will give the real status). > > Is that explanation helpful? Yes, thanks. > OK, but I haven't read your patch yet, apologies.... No problem. Here it is again with some cosmetic fixes and testing. Miklos -- Use a special error value FILE_LOCK_DEFERRED to mean that a locking operation returned asynchronously. This is returned by posix_lock_file() for sleeping locks to mean that the lock has been queued on the block list, and will be woken up when it might become available and needs to be retried (either fl_lmops->fl_notify() is called or fl_wait is woken up). f_op->lock() to mean either the above, or that the filesystem will call back with fl_lmops->fl_grant() when the result of the locking operation is known. The filesystem can do this for sleeping as well as non-sleeping locks. This is to make sure, that return values of -EAGAIN and -EINPROGRESS by filesystems are not mistaken to mean an asynchronous locking. This also makes error handling in fs/locks.c and lockd/svclock.c slightly cleaner. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> --- fs/gfs2/locking/dlm/plock.c | 2 +- fs/lockd/svclock.c | 13 ++++--------- fs/locks.c | 28 ++++++++++++++-------------- include/linux/fs.h | 6 ++++++ 4 files changed, 25 insertions(+), 24 deletions(-) Index: linux-2.6/fs/locks.c =================================================================== --- linux-2.6.orig/fs/locks.c 2008-04-16 17:33:49.000000000 +0200 +++ linux-2.6/fs/locks.c 2008-04-16 17:38:01.000000000 +0200 @@ -784,8 +784,10 @@ find_conflict: if (!flock_locks_conflict(request, fl)) continue; error = -EAGAIN; - if (request->fl_flags & FL_SLEEP) - locks_insert_block(fl, request); + if (!(request->fl_flags & FL_SLEEP)) + goto out; + error = FILE_LOCK_DEFERRED; + locks_insert_block(fl, request); goto out; } if (request->fl_flags & FL_ACCESS) @@ -841,7 +843,7 @@ static int __posix_lock_file(struct inod error = -EDEADLK; if (posix_locks_deadlock(request, fl)) goto out; - error = -EAGAIN; + error = FILE_LOCK_DEFERRED; locks_insert_block(fl, request); goto out; } @@ -1040,7 +1042,7 @@ int posix_lock_file_wait(struct file *fi might_sleep (); for (;;) { error = posix_lock_file(filp, fl, NULL); - if ((error != -EAGAIN) || !(fl->fl_flags & FL_SLEEP)) + if (error != FILE_LOCK_DEFERRED) break; error = wait_event_interruptible(fl->fl_wait, !fl->fl_next); if (!error) @@ -1112,9 +1114,7 @@ int locks_mandatory_area(int read_write, for (;;) { error = __posix_lock_file(inode, &fl, NULL); - if (error != -EAGAIN) - break; - if (!(fl.fl_flags & FL_SLEEP)) + if (error != FILE_LOCK_DEFERRED) break; error = wait_event_interruptible(fl.fl_wait, !fl.fl_next); if (!error) { @@ -1534,7 +1534,7 @@ int flock_lock_file_wait(struct file *fi might_sleep(); for (;;) { error = flock_lock_file(filp, fl); - if ((error != -EAGAIN) || !(fl->fl_flags & FL_SLEEP)) + if (error != FILE_LOCK_DEFERRED) break; error = wait_event_interruptible(fl->fl_wait, !fl->fl_next); if (!error) @@ -1719,17 +1719,17 @@ out: * fl_grant is set. Callers expecting ->lock() to return asynchronously * will only use F_SETLK, not F_SETLKW; they will set FL_SLEEP if (and only if) * the request is for a blocking lock. When ->lock() does return asynchronously, - * it must return -EINPROGRESS, and call ->fl_grant() when the lock + * it must return FILE_LOCK_DEFERRED, and call ->fl_grant() when the lock * request completes. * If the request is for non-blocking lock the file system should return - * -EINPROGRESS then try to get the lock and call the callback routine with - * the result. If the request timed out the callback routine will return a + * FILE_LOCK_DEFERRED then try to get the lock and call the callback routine + * with the result. If the request timed out the callback routine will return a * nonzero return code and the file system should release the lock. The file * system is also responsible to keep a corresponding posix lock when it * grants a lock so the VFS can find out which locks are locally held and do * the correct lock cleanup when required. * The underlying filesystem must not drop the kernel lock or call - * ->fl_grant() before returning to the caller with a -EINPROGRESS + * ->fl_grant() before returning to the caller with a FILE_LOCK_DEFERRED * return code. */ int vfs_lock_file(struct file *filp, unsigned int cmd, struct file_lock *fl, struct file_lock *conf) @@ -1806,7 +1806,7 @@ again: else { for (;;) { error = posix_lock_file(filp, file_lock, NULL); - if (error != -EAGAIN || cmd == F_SETLK) + if (error != FILE_LOCK_DEFERRED) break; error = wait_event_interruptible(file_lock->fl_wait, !file_lock->fl_next); @@ -1934,7 +1934,7 @@ again: else { for (;;) { error = posix_lock_file(filp, file_lock, NULL); - if (error != -EAGAIN || cmd == F_SETLK64) + if (error != FILE_LOCK_DEFERRED) break; error = wait_event_interruptible(file_lock->fl_wait, !file_lock->fl_next); Index: linux-2.6/fs/gfs2/locking/dlm/plock.c =================================================================== --- linux-2.6.orig/fs/gfs2/locking/dlm/plock.c 2008-04-16 17:33:49.000000000 +0200 +++ linux-2.6/fs/gfs2/locking/dlm/plock.c 2008-04-16 17:35:46.000000000 +0200 @@ -108,7 +108,7 @@ int gdlm_plock(void *lockspace, struct l if (xop->callback == NULL) wait_event(recv_wq, (op->done != 0)); else - return -EINPROGRESS; + return FILE_LOCK_DEFERRED; spin_lock(&ops_lock); if (!list_empty(&op->list)) { Index: linux-2.6/fs/lockd/svclock.c =================================================================== --- linux-2.6.orig/fs/lockd/svclock.c 2008-04-16 17:33:49.000000000 +0200 +++ linux-2.6/fs/lockd/svclock.c 2008-04-16 17:35:46.000000000 +0200 @@ -423,8 +423,8 @@ nlmsvc_lock(struct svc_rqst *rqstp, stru goto out; case -EAGAIN: ret = nlm_lck_denied; - break; - case -EINPROGRESS: + goto out; + case FILE_LOCK_DEFERRED: if (wait) break; /* Filesystem lock operation is in progress @@ -439,10 +439,6 @@ nlmsvc_lock(struct svc_rqst *rqstp, stru goto out; } - ret = nlm_lck_denied; - if (!wait) - goto out; - ret = nlm_lck_blocked; /* Append to list of blocked */ @@ -520,7 +516,7 @@ nlmsvc_testlock(struct svc_rqst *rqstp, } error = vfs_test_lock(file->f_file, &lock->fl); - if (error == -EINPROGRESS) { + if (error == FILE_LOCK_DEFERRED) { ret = nlmsvc_defer_lock_rqst(rqstp, block); goto out; } @@ -744,8 +740,7 @@ nlmsvc_grant_blocked(struct nlm_block *b switch (error) { case 0: break; - case -EAGAIN: - case -EINPROGRESS: + case FILE_LOCK_DEFERRED: dprintk("lockd: lock still blocked error %d\n", error); nlmsvc_insert_block(block, NLM_NEVER); nlmsvc_release_block(block); Index: linux-2.6/include/linux/fs.h =================================================================== --- linux-2.6.orig/include/linux/fs.h 2008-04-16 17:33:49.000000000 +0200 +++ linux-2.6/include/linux/fs.h 2008-04-16 18:17:12.000000000 +0200 @@ -837,6 +837,12 @@ extern spinlock_t files_lock; #define FL_SLEEP 128 /* A blocking lock */ /* + * Special return value from posix_lock_file() and vfs_lock_file() for + * asynchronous locking. + */ +#define FILE_LOCK_DEFERRED 1 + +/* * The POSIX file lock owner is determined by * the "struct files_struct" in the thread group * (or NULL for no owner - BSD locks). ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: nfs: infinite loop in fcntl(F_SETLKW) 2008-04-16 16:28 ` Miklos Szeredi @ 2008-04-17 22:26 ` J. Bruce Fields [not found] ` <20080417222620.GL9912-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: J. Bruce Fields @ 2008-04-17 22:26 UTC (permalink / raw) To: Miklos Szeredi Cc: trond.myklebust, eshel, neilb, akpm, linux-nfs, linux-kernel, linux-fsdevel On Wed, Apr 16, 2008 at 06:28:05PM +0200, Miklos Szeredi wrote: > > > > The behavior of lockd will still depend to some degree on the exact > > > > error returned from the filesystem--e.g. if you return -EAGAIN from > > > > ->lock() without later calling ->fl_grant() it will cause some > > > > unexpected delays. (Though again clients will eventually give up and > > > > poll for the lock.) > > > > > > OK, so the semantics of vfs_lock_file() are now: > > > > Not quite; the original idea was that we didn't care about the blocking > > lock case, since there's already a lock manager callback for that. > > (It's arguably a minor abuse for us to return NLM_LCK_BLOCKED and then > > do a grant later even in the case where there's no conflict, but it > > works.) So we only changed the nonblocking case. > > > > Which may be sacrificing elegance for expediency, and I'm not opposed to > > fixing that in whatever way seems best. As a start, we could document > > this better. So: > > > > > > > > 1) if !FL_SLEEP, then return 0 if granted, -EAGAIN on contention > > > 2) if FL_SLEEP and fl_grant == NULL, then return 0 if granted, block on > > > contention > > > 3) if FL_SLEEP and fl_grant != NULL, then return 0 if granted, on > > > contention: > > > a) either return -EINPROGRESS and call fl_grant when granted > > > b) or return -EAGAIN and call fl_notify when the lock needs retrying > > > > I'd put it this way (after a quick check of the code to convince myself > > I'm remembering this right...): > > > > 1) If FL_SLEEP, then return 0 if granted, and on contention either: > > a) block, or > > b) return -EAGAIN, and call fl_notify when the lock should be > > retried. > > Gfs2 seems to return -EINPROGRESS regardless of the FL_SLEEP flag: Oops, you're right; in FL_SLEEP case fs/lockd/svclock.c:nlmsvc_lock() returns NLM_LCK_BLOCKED. I believe it'll get an fl_grant() callback after that and do a grant call back to the client, but I haven't checked.... Note, as has been pointed out by Mark Snitzer (http://article.gmane.org/gmane.linux.nfs/17801/), this limits the kind of error reporting the filesystem can do--if it needs to block on the initial lock call, it has to commit to queueing up, and eventually granting, the lock. > > OK, but I haven't read your patch yet, apologies.... > > No problem. Here it is again with some cosmetic fixes and testing. Thanks! Ping me in a couple days if I haven't made any comments. From a quick skim the GFS2 change and the error return change both seem reasonable. I need to a real GFS2 testing setup.... (Did you test GFS2 locking specifically?) --b. > > Miklos > -- > > > Use a special error value FILE_LOCK_DEFERRED to mean that a locking > operation returned asynchronously. This is returned by > > posix_lock_file() for sleeping locks to mean that the lock has been > queued on the block list, and will be woken up when it might become > available and needs to be retried (either fl_lmops->fl_notify() is > called or fl_wait is woken up). > > f_op->lock() to mean either the above, or that the filesystem will > call back with fl_lmops->fl_grant() when the result of the locking > operation is known. The filesystem can do this for sleeping as well > as non-sleeping locks. > > This is to make sure, that return values of -EAGAIN and -EINPROGRESS > by filesystems are not mistaken to mean an asynchronous locking. > > This also makes error handling in fs/locks.c and lockd/svclock.c > slightly cleaner. > > Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> > --- > fs/gfs2/locking/dlm/plock.c | 2 +- > fs/lockd/svclock.c | 13 ++++--------- > fs/locks.c | 28 ++++++++++++++-------------- > include/linux/fs.h | 6 ++++++ > 4 files changed, 25 insertions(+), 24 deletions(-) > > Index: linux-2.6/fs/locks.c > =================================================================== > --- linux-2.6.orig/fs/locks.c 2008-04-16 17:33:49.000000000 +0200 > +++ linux-2.6/fs/locks.c 2008-04-16 17:38:01.000000000 +0200 > @@ -784,8 +784,10 @@ find_conflict: > if (!flock_locks_conflict(request, fl)) > continue; > error = -EAGAIN; > - if (request->fl_flags & FL_SLEEP) > - locks_insert_block(fl, request); > + if (!(request->fl_flags & FL_SLEEP)) > + goto out; > + error = FILE_LOCK_DEFERRED; > + locks_insert_block(fl, request); > goto out; > } > if (request->fl_flags & FL_ACCESS) > @@ -841,7 +843,7 @@ static int __posix_lock_file(struct inod > error = -EDEADLK; > if (posix_locks_deadlock(request, fl)) > goto out; > - error = -EAGAIN; > + error = FILE_LOCK_DEFERRED; > locks_insert_block(fl, request); > goto out; > } > @@ -1040,7 +1042,7 @@ int posix_lock_file_wait(struct file *fi > might_sleep (); > for (;;) { > error = posix_lock_file(filp, fl, NULL); > - if ((error != -EAGAIN) || !(fl->fl_flags & FL_SLEEP)) > + if (error != FILE_LOCK_DEFERRED) > break; > error = wait_event_interruptible(fl->fl_wait, !fl->fl_next); > if (!error) > @@ -1112,9 +1114,7 @@ int locks_mandatory_area(int read_write, > > for (;;) { > error = __posix_lock_file(inode, &fl, NULL); > - if (error != -EAGAIN) > - break; > - if (!(fl.fl_flags & FL_SLEEP)) > + if (error != FILE_LOCK_DEFERRED) > break; > error = wait_event_interruptible(fl.fl_wait, !fl.fl_next); > if (!error) { > @@ -1534,7 +1534,7 @@ int flock_lock_file_wait(struct file *fi > might_sleep(); > for (;;) { > error = flock_lock_file(filp, fl); > - if ((error != -EAGAIN) || !(fl->fl_flags & FL_SLEEP)) > + if (error != FILE_LOCK_DEFERRED) > break; > error = wait_event_interruptible(fl->fl_wait, !fl->fl_next); > if (!error) > @@ -1719,17 +1719,17 @@ out: > * fl_grant is set. Callers expecting ->lock() to return asynchronously > * will only use F_SETLK, not F_SETLKW; they will set FL_SLEEP if (and only if) > * the request is for a blocking lock. When ->lock() does return asynchronously, > - * it must return -EINPROGRESS, and call ->fl_grant() when the lock > + * it must return FILE_LOCK_DEFERRED, and call ->fl_grant() when the lock > * request completes. > * If the request is for non-blocking lock the file system should return > - * -EINPROGRESS then try to get the lock and call the callback routine with > - * the result. If the request timed out the callback routine will return a > + * FILE_LOCK_DEFERRED then try to get the lock and call the callback routine > + * with the result. If the request timed out the callback routine will return a > * nonzero return code and the file system should release the lock. The file > * system is also responsible to keep a corresponding posix lock when it > * grants a lock so the VFS can find out which locks are locally held and do > * the correct lock cleanup when required. > * The underlying filesystem must not drop the kernel lock or call > - * ->fl_grant() before returning to the caller with a -EINPROGRESS > + * ->fl_grant() before returning to the caller with a FILE_LOCK_DEFERRED > * return code. > */ > int vfs_lock_file(struct file *filp, unsigned int cmd, struct file_lock *fl, struct file_lock *conf) > @@ -1806,7 +1806,7 @@ again: > else { > for (;;) { > error = posix_lock_file(filp, file_lock, NULL); > - if (error != -EAGAIN || cmd == F_SETLK) > + if (error != FILE_LOCK_DEFERRED) > break; > error = wait_event_interruptible(file_lock->fl_wait, > !file_lock->fl_next); > @@ -1934,7 +1934,7 @@ again: > else { > for (;;) { > error = posix_lock_file(filp, file_lock, NULL); > - if (error != -EAGAIN || cmd == F_SETLK64) > + if (error != FILE_LOCK_DEFERRED) > break; > error = wait_event_interruptible(file_lock->fl_wait, > !file_lock->fl_next); > Index: linux-2.6/fs/gfs2/locking/dlm/plock.c > =================================================================== > --- linux-2.6.orig/fs/gfs2/locking/dlm/plock.c 2008-04-16 17:33:49.000000000 +0200 > +++ linux-2.6/fs/gfs2/locking/dlm/plock.c 2008-04-16 17:35:46.000000000 +0200 > @@ -108,7 +108,7 @@ int gdlm_plock(void *lockspace, struct l > if (xop->callback == NULL) > wait_event(recv_wq, (op->done != 0)); > else > - return -EINPROGRESS; > + return FILE_LOCK_DEFERRED; > > spin_lock(&ops_lock); > if (!list_empty(&op->list)) { > Index: linux-2.6/fs/lockd/svclock.c > =================================================================== > --- linux-2.6.orig/fs/lockd/svclock.c 2008-04-16 17:33:49.000000000 +0200 > +++ linux-2.6/fs/lockd/svclock.c 2008-04-16 17:35:46.000000000 +0200 > @@ -423,8 +423,8 @@ nlmsvc_lock(struct svc_rqst *rqstp, stru > goto out; > case -EAGAIN: > ret = nlm_lck_denied; > - break; > - case -EINPROGRESS: > + goto out; > + case FILE_LOCK_DEFERRED: > if (wait) > break; > /* Filesystem lock operation is in progress > @@ -439,10 +439,6 @@ nlmsvc_lock(struct svc_rqst *rqstp, stru > goto out; > } > > - ret = nlm_lck_denied; > - if (!wait) > - goto out; > - > ret = nlm_lck_blocked; > > /* Append to list of blocked */ > @@ -520,7 +516,7 @@ nlmsvc_testlock(struct svc_rqst *rqstp, > } > > error = vfs_test_lock(file->f_file, &lock->fl); > - if (error == -EINPROGRESS) { > + if (error == FILE_LOCK_DEFERRED) { > ret = nlmsvc_defer_lock_rqst(rqstp, block); > goto out; > } > @@ -744,8 +740,7 @@ nlmsvc_grant_blocked(struct nlm_block *b > switch (error) { > case 0: > break; > - case -EAGAIN: > - case -EINPROGRESS: > + case FILE_LOCK_DEFERRED: > dprintk("lockd: lock still blocked error %d\n", error); > nlmsvc_insert_block(block, NLM_NEVER); > nlmsvc_release_block(block); > Index: linux-2.6/include/linux/fs.h > =================================================================== > --- linux-2.6.orig/include/linux/fs.h 2008-04-16 17:33:49.000000000 +0200 > +++ linux-2.6/include/linux/fs.h 2008-04-16 18:17:12.000000000 +0200 > @@ -837,6 +837,12 @@ extern spinlock_t files_lock; > #define FL_SLEEP 128 /* A blocking lock */ > > /* > + * Special return value from posix_lock_file() and vfs_lock_file() for > + * asynchronous locking. > + */ > +#define FILE_LOCK_DEFERRED 1 > + > +/* > * The POSIX file lock owner is determined by > * the "struct files_struct" in the thread group > * (or NULL for no owner - BSD locks). > > > ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20080417222620.GL9912-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* Re: nfs: infinite loop in fcntl(F_SETLKW) [not found] ` <20080417222620.GL9912-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> @ 2008-04-18 12:47 ` Miklos Szeredi 0 siblings, 0 replies; 25+ messages in thread From: Miklos Szeredi @ 2008-04-18 12:47 UTC (permalink / raw) To: bfields-uC3wQj2KruNg9hUCZPvPmw Cc: miklos-sUDqSbJrdHQHWmgEVkV9KA, trond.myklebust-41N18TsMXrtuMpJDpNschA, eshel-6kx38NOBMPqrIzol8Bc5pA, neilb-l3A5Bk7waGM, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA > > > > 1) if !FL_SLEEP, then return 0 if granted, -EAGAIN on contention > > > > 2) if FL_SLEEP and fl_grant == NULL, then return 0 if granted, block on > > > > contention > > > > 3) if FL_SLEEP and fl_grant != NULL, then return 0 if granted, on > > > > contention: > > > > a) either return -EINPROGRESS and call fl_grant when granted > > > > b) or return -EAGAIN and call fl_notify when the lock needs retrying > > > > > > I'd put it this way (after a quick check of the code to convince myself > > > I'm remembering this right...): > > > > > > 1) If FL_SLEEP, then return 0 if granted, and on contention either: > > > a) block, or > > > b) return -EAGAIN, and call fl_notify when the lock should be > > > retried. > > > > Gfs2 seems to return -EINPROGRESS regardless of the FL_SLEEP flag: > > Oops, you're right; in FL_SLEEP case fs/lockd/svclock.c:nlmsvc_lock() > returns NLM_LCK_BLOCKED. I believe it'll get an fl_grant() callback > after that and do a grant call back to the client, but I haven't > checked.... > > Note, as has been pointed out by Mark Snitzer > (http://article.gmane.org/gmane.linux.nfs/17801/), this limits the kind > of error reporting the filesystem can do--if it needs to block on the > initial lock call, it has to commit to queueing up, and eventually > granting, the lock. OK, but that AFAICS is a lock manager implementation issue, not an API issue. Which is important, but not as important as the API ;) > > > OK, but I haven't read your patch yet, apologies.... > > > > No problem. Here it is again with some cosmetic fixes and testing. > > Thanks! Ping me in a couple days if I haven't made any comments. From > a quick skim the GFS2 change and the error return change both seem > reasonable. > > I need to a real GFS2 testing setup.... (Did you test GFS2 locking > specifically?) No gfs2, only ext3. Thanks, Miklos -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2008-04-18 12:47 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-09 15:57 [patch] fix infinite loop in generic_file_splice_read() Miklos Szeredi
2008-04-09 17:05 ` Oliver Pinter
2008-04-09 18:57 ` Andrew Morton
2008-04-09 19:25 ` Miklos Szeredi
2008-04-09 19:52 ` Jens Axboe
2008-04-10 6:29 ` Allard Hoeve
2008-04-10 19:51 ` nfs: infinite loop in fcntl(F_SETLKW) Miklos Szeredi
2008-04-10 21:02 ` Trond Myklebust
2008-04-10 21:07 ` Trond Myklebust
[not found] ` <1207861661.8180.18.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2008-04-10 21:20 ` Trond Myklebust
2008-04-10 21:54 ` J. Bruce Fields
2008-04-11 19:12 ` Miklos Szeredi
2008-04-11 19:19 ` J. Bruce Fields
[not found] ` <20080411191910.GB16965-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2008-04-11 19:22 ` Miklos Szeredi
2008-04-13 0:08 ` J. Bruce Fields
[not found] ` <20080413000830.GF31789-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2008-04-13 8:13 ` Miklos Szeredi
2008-04-14 17:07 ` J. Bruce Fields
[not found] ` <E1JkxKz-0003A8-9V-8f8m9JG5TPIdUIPVzhDTVZP2KDSNp7ea@public.gmane.org>
2008-04-14 19:03 ` [PATCH] locks: fix possible infinite loop in fcntl(F_SETLKW) over nfs J. Bruce Fields
[not found] ` <20080410215410.GF22324-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2008-04-13 8:28 ` nfs: infinite loop in fcntl(F_SETLKW) Miklos Szeredi
2008-04-14 17:19 ` J. Bruce Fields
2008-04-14 21:15 ` Miklos Szeredi
2008-04-15 18:58 ` J. Bruce Fields
2008-04-16 16:28 ` Miklos Szeredi
2008-04-17 22:26 ` J. Bruce Fields
[not found] ` <20080417222620.GL9912-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2008-04-18 12:47 ` Miklos Szeredi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).