linux-nfs.vger.kernel.org archive mirror
* [NLM] fcntl(F_SETLKW) yields -ENOLCK when grace period expires.
@ 2011-08-04 10:30 Frank van Maarseveen
  2011-08-04 16:34 ` J. Bruce Fields
  0 siblings, 1 reply; 10+ messages in thread
From: Frank van Maarseveen @ 2011-08-04 10:30 UTC
  To: Linux NFS mailing list

Both client- and server run 2.6.39.3, NFSv3 over UDP (without the
relock_filesystem patch proposed earlier).

A second client has an exclusive lock on a file on the server. The
client under test calls fcntl(F_SETLKW) to wait for the same exclusive
lock. Wireshark sees NLM V4 LOCK calls resulting in NLM_BLOCKED.

Next the server is rebooted. The second client recovers the lock
correctly. The client under test now receives NLM_DENIED_GRACE_PERIOD for
every NLM V4 LOCK request resulting from the waiting fcntl(F_SETLKW). When
this changes to NLM_BLOCKED after grace period expiration the fcntl
returns -ENOLCK ("No locks available.") instead of continuing to wait.

server:/proc/locks shows two entries for the file after the -ENOLCK. When
the second client gives up its lock because the program running there
is killed one entry in server:/proc/locks remains indefinitely: as a
result no NFS client can lock the file anymore.
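
For readers who want to reproduce the wait, here is a minimal sketch of the kind of program the client under test could run. It is not the program from the report: the helper names and the file path are hypothetical. It requests an exclusive whole-file lock (l_start 0, l_len 0) with fcntl(F_SETLKW) and returns the error, so an unexpected -ENOLCK becomes visible.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Wait for an exclusive lock on the whole file using fcntl(F_SETLKW).
 * Returns 0 on success or a negative errno value (for example -ENOLCK).
 * Hypothetical helper, not taken from the original report.
 */
static int lock_whole_file_wait(int fd)
{
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_WRLCK;	/* exclusive lock */
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;		/* from offset 0 ... */
	fl.l_len = 0;		/* ... to end of file */

	for (;;) {
		if (fcntl(fd, F_SETLKW, &fl) == 0)
			return 0;
		if (errno != EINTR)	/* retry only when interrupted by a signal */
			return -errno;
	}
}

/* Open the file (the path is a placeholder) and report how the wait ended. */
static int try_lock_path(const char *path)
{
	int fd = open(path, O_RDWR | O_CREAT, 0644);
	int ret;

	if (fd < 0)
		return -errno;
	ret = lock_whole_file_wait(fd);
	if (ret < 0)
		fprintf(stderr, "F_SETLKW failed: %s\n", strerror(-ret));
	close(fd);	/* closing the descriptor also drops the lock */
	return ret;
}
```

Run against a file on the NFSv3 mount while the second client holds the lock, the call should simply keep waiting across the server reboot; a return of -ENOLCK would match the behavior described above.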

-- 
Frank


* Re: [NLM] fcntl(F_SETLKW) yields -ENOLCK when grace period expires.
  2011-08-04 10:30 [NLM] fcntl(F_SETLKW) yields -ENOLCK when grace period expires Frank van Maarseveen
@ 2011-08-04 16:34 ` J. Bruce Fields
  2011-08-04 16:43   ` Frank van Maarseveen
  0 siblings, 1 reply; 10+ messages in thread
From: J. Bruce Fields @ 2011-08-04 16:34 UTC
  To: Frank van Maarseveen; +Cc: Linux NFS mailing list

On Thu, Aug 04, 2011 at 12:30:19PM +0200, Frank van Maarseveen wrote:
> Both client- and server run 2.6.39.3, NFSv3 over UDP (without the
> relock_filesystem patch proposed earlier).
> 
> A second client has an exclusive lock on a file on the server. The
> client under test calls fcntl(F_SETLKW) to wait for the same exclusive
> lock. Wireshark sees NLM V4 LOCK calls resulting in NLM_BLOCKED.
> 
> Next the server is rebooted. The second client recovers the lock
> correctly. The client under test now receives NLM_DENIED_GRACE_PERIOD for
> every NLM V4 LOCK request resulting from the waiting fcntl(F_SETLKW). When
> this changes to NLM_BLOCKED after grace period expiration the fcntl
> returns -ENOLCK ("No locks available.") instead of continuing to wait.

So that sounds like a client bug, and correct behavior from the server
(assuming the second client was still holding the lock throughout).

> server:/proc/locks shows two entries for the file after the -ENOLCK. When
> the second client gives up its lock because the program running there
> is killed one entry in server:/proc/locks remains indefinitely: as a
> result no NFS client can lock the file anymore.

But that sounds like a server bug--what do the two entries look like?

Also, what filesystem are you exporting?

--b.


* Re: [NLM] fcntl(F_SETLKW) yields -ENOLCK when grace period expires.
  2011-08-04 16:34 ` J. Bruce Fields
@ 2011-08-04 16:43   ` Frank van Maarseveen
  2011-08-04 16:49     ` J. Bruce Fields
  0 siblings, 1 reply; 10+ messages in thread
From: Frank van Maarseveen @ 2011-08-04 16:43 UTC
  To: J. Bruce Fields; +Cc: Linux NFS mailing list

On Thu, Aug 04, 2011 at 12:34:52PM -0400, J. Bruce Fields wrote:
> On Thu, Aug 04, 2011 at 12:30:19PM +0200, Frank van Maarseveen wrote:
> > Both client- and server run 2.6.39.3, NFSv3 over UDP (without the
> > relock_filesystem patch proposed earlier).
> > 
> > A second client has an exclusive lock on a file on the server. The
> > client under test calls fcntl(F_SETLKW) to wait for the same exclusive
> > lock. Wireshark sees NLM V4 LOCK calls resulting in NLM_BLOCKED.
> > 
> > Next the server is rebooted. The second client recovers the lock
> > correctly. The client under test now receives NLM_DENIED_GRACE_PERIOD for
> > every NLM V4 LOCK request resulting from the waiting fcntl(F_SETLKW). When
> > this changes to NLM_BLOCKED after grace period expiration the fcntl
> > returns -ENOLCK ("No locks available.") instead of continuing to wait.
> 
> So that sounds like a client bug, and correct behavior from the server
> (assuming the second client was still holding the lock throughout).

yes.

> 
> > server:/proc/locks shows two entries for the file after the -ENOLCK. When
> > the second client gives up its lock because the program running there
> > is killed one entry in server:/proc/locks remains indefinitely: as a
> > result no NFS client can lock the file anymore.
> 
> But that sounds like a server bug--what do the two entries look like?

I think the server assumes correct client behavior; the client under
test resulted in a '->' prefixed entry. The fcntl at the client just
shouldn't have returned yet.

> 
> Also, what filesystem are you exporting?

ext4

-- 
Frank


* Re: [NLM] fcntl(F_SETLKW) yields -ENOLCK when grace period expires.
  2011-08-04 16:43   ` Frank van Maarseveen
@ 2011-08-04 16:49     ` J. Bruce Fields
  2011-08-04 17:10       ` Trond Myklebust
  2011-08-04 17:24       ` Frank van Maarseveen
  0 siblings, 2 replies; 10+ messages in thread
From: J. Bruce Fields @ 2011-08-04 16:49 UTC
  To: Frank van Maarseveen; +Cc: Linux NFS mailing list

On Thu, Aug 04, 2011 at 06:43:13PM +0200, Frank van Maarseveen wrote:
> On Thu, Aug 04, 2011 at 12:34:52PM -0400, J. Bruce Fields wrote:
> > On Thu, Aug 04, 2011 at 12:30:19PM +0200, Frank van Maarseveen wrote:
> > > Both client- and server run 2.6.39.3, NFSv3 over UDP (without the
> > > relock_filesystem patch proposed earlier).
> > > 
> > > A second client has an exclusive lock on a file on the server. The
> > > client under test calls fcntl(F_SETLKW) to wait for the same exclusive
> > > lock. Wireshark sees NLM V4 LOCK calls resulting in NLM_BLOCKED.
> > > 
> > > Next the server is rebooted. The second client recovers the lock
> > > correctly. The client under test now receives NLM_DENIED_GRACE_PERIOD for
> > > every NLM V4 LOCK request resulting from the waiting fcntl(F_SETLKW). When
> > > this changes to NLM_BLOCKED after grace period expiration the fcntl
> > > returns -ENOLCK ("No locks available.") instead of continuing to wait.
> > 
> > So that sounds like a client bug, and correct behavior from the server
> > (assuming the second client was still holding the lock throughout).
> 
> yes.
> 
> > 
> > > server:/proc/locks shows two entries for the file after the -ENOLCK. When
> > > the second client gives up its lock because the program running there
> > > is killed one entry in server:/proc/locks remains indefinitely: as a
> > > result no NFS client can lock the file anymore.
> > 
> > But that sounds like a server bug--what do the two entries look like?
> 
> I think the server assumes correct client behavior; the client under
> test resulted in a '->' prefixed entry. The fcntl at the client just
> shouldn't have returned yet.

Oh, right, so did you see a granted callback returned to the client?

--b.

> 
> > 
> > Also, what filesystem are you exporting?
> 
> ext4
> 
> -- 
> Frank


* Re: [NLM] fcntl(F_SETLKW) yields -ENOLCK when grace period expires.
  2011-08-04 16:49     ` J. Bruce Fields
@ 2011-08-04 17:10       ` Trond Myklebust
  2011-08-04 17:27         ` Frank van Maarseveen
  2011-08-04 17:24       ` Frank van Maarseveen
  1 sibling, 1 reply; 10+ messages in thread
From: Trond Myklebust @ 2011-08-04 17:10 UTC
  To: J. Bruce Fields; +Cc: Frank van Maarseveen, Linux NFS mailing list

On Thu, 2011-08-04 at 12:49 -0400, J. Bruce Fields wrote: 
> On Thu, Aug 04, 2011 at 06:43:13PM +0200, Frank van Maarseveen wrote:
> > On Thu, Aug 04, 2011 at 12:34:52PM -0400, J. Bruce Fields wrote:
> > > On Thu, Aug 04, 2011 at 12:30:19PM +0200, Frank van Maarseveen wrote:
> > > > Both client- and server run 2.6.39.3, NFSv3 over UDP (without the
> > > > relock_filesystem patch proposed earlier).
> > > > 
> > > > A second client has an exclusive lock on a file on the server. The
> > > > client under test calls fcntl(F_SETLKW) to wait for the same exclusive
> > > > lock. Wireshark sees NLM V4 LOCK calls resulting in NLM_BLOCKED.
> > > > 
> > > > Next the server is rebooted. The second client recovers the lock
> > > > correctly. The client under test now receives NLM_DENIED_GRACE_PERIOD for
> > > > every NLM V4 LOCK request resulting from the waiting fcntl(F_SETLKW). When
> > > > this changes to NLM_BLOCKED after grace period expiration the fcntl
> > > > returns -ENOLCK ("No locks available.") instead of continuing to wait.
> > > 
> > > So that sounds like a client bug, and correct behavior from the server
> > > (assuming the second client was still holding the lock throughout).
> > 
> > yes.

Is the client actually asking for a blocking lock after the grace period
expires?

> > > 
> > > > server:/proc/locks shows two entries for the file after the -ENOLCK. When
> > > > the second client gives up its lock because the program running there
> > > > is killed one entry in server:/proc/locks remains indefinitely: as a
> > > > result no NFS client can lock the file anymore.
> > > 
> > > But that sounds like a server bug--what do the two entries look like?
> > 
> > I think the server assumes correct client behavior; the client under
> > test resulted in a '->' prefixed entry. The fcntl at the client just
> > shouldn't have returned yet.
> 
> Oh, right, so did you see a granted callback returned to the client?

The client will reject any unsolicited GRANTED callbacks with an
NLM_LCK_DENIED. As far as I can see, nlmsvc_grant_reply() then only
removes the block, it doesn't cancel the lock...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com



* Re: [NLM] fcntl(F_SETLKW) yields -ENOLCK when grace period expires.
  2011-08-04 16:49     ` J. Bruce Fields
  2011-08-04 17:10       ` Trond Myklebust
@ 2011-08-04 17:24       ` Frank van Maarseveen
  1 sibling, 0 replies; 10+ messages in thread
From: Frank van Maarseveen @ 2011-08-04 17:24 UTC
  To: J. Bruce Fields; +Cc: Linux NFS mailing list

On Thu, Aug 04, 2011 at 12:49:13PM -0400, J. Bruce Fields wrote:
> On Thu, Aug 04, 2011 at 06:43:13PM +0200, Frank van Maarseveen wrote:
> > On Thu, Aug 04, 2011 at 12:34:52PM -0400, J. Bruce Fields wrote:
> > > On Thu, Aug 04, 2011 at 12:30:19PM +0200, Frank van Maarseveen wrote:
> > > > Both client- and server run 2.6.39.3, NFSv3 over UDP (without the
> > > > relock_filesystem patch proposed earlier).
> > > > 
> > > > A second client has an exclusive lock on a file on the server. The
> > > > client under test calls fcntl(F_SETLKW) to wait for the same exclusive
> > > > lock. Wireshark sees NLM V4 LOCK calls resulting in NLM_BLOCKED.
> > > > 
> > > > Next the server is rebooted. The second client recovers the lock
> > > > correctly. The client under test now receives NLM_DENIED_GRACE_PERIOD for
> > > > every NLM V4 LOCK request resulting from the waiting fcntl(F_SETLKW). When
> > > > this changes to NLM_BLOCKED after grace period expiration the fcntl
> > > > returns -ENOLCK ("No locks available.") instead of continuing to wait.
> > > 
> > > So that sounds like a client bug, and correct behavior from the server
> > > (assuming the second client was still holding the lock throughout).
> > 
> > yes.
> > 
> > > 
> > > > server:/proc/locks shows two entries for the file after the -ENOLCK. When
> > > > the second client gives up its lock because the program running there
> > > > is killed one entry in server:/proc/locks remains indefinitely: as a
> > > > result no NFS client can lock the file anymore.
> > > 
> > > But that sounds like a server bug--what do the two entries look like?
> > 
> > I think the server assumes correct client behavior; the client under
> > test resulted in a '->' prefixed entry. The fcntl at the client just
> > shouldn't have returned yet.
> 
> Oh, right, so did you see a granted callback returned to the client?

Hmm no, maybe it is a server bug. These are the final request and reply
(which result in the incorrect -ENOLCK for F_SETLKW at the client under
test), decoded by wireshark:

No.     Time        Source                Destination           Protocol Info
    529 225.386189  172.17.1.124          172.17.1.49           NLM      V4 LOCK Call (Reply In 530) FH:0xb17f38ea svid:10 pos:0-0

Frame 529: 246 bytes on wire (1968 bits), 246 bytes captured (1968 bits)
Network Lock Manager Protocol
    [Program Version: 4]
    [V4 Procedure: LOCK (2)]
    cookie: <DATA>
        length: 4
        contents: <DATA>
    block: Yes
    exclusive: Yes
    lock
        caller_name: lokka.tasking.nl
            length: 16
            contents: lokka.tasking.nl
        fh
            length: 28
            [hash (CRC-32): 0xb17f38ea]
            decode type as: unknown
            filehandle: 01000601e66f5c256cb3414eba710fcd882a67201b000000...
        owner: <DATA>
            length: 19
            contents: <DATA>
            fill bytes: opaque data
        svid: 10
        l_offset: 0
        l_len: 0
    reclaim: No
    state: 87

No.     Time        Source                Destination           Protocol Info
    530 225.386368  172.17.1.49           172.17.1.124          NLM      V4 LOCK Reply (Call In 529) NLM_BLOCKED

Frame 530: 78 bytes on wire (624 bits), 78 bytes captured (624 bits)
Network Lock Manager Protocol
    [Program Version: 4]
    [V4 Procedure: LOCK (2)]
    cookie: <DATA>
        length: 4
        contents: <DATA>
    stat: NLM_BLOCKED (3)


-- 
Frank


* Re: [NLM] fcntl(F_SETLKW) yields -ENOLCK when grace period expires.
  2011-08-04 17:10       ` Trond Myklebust
@ 2011-08-04 17:27         ` Frank van Maarseveen
  2011-08-04 18:17           ` Trond Myklebust
  0 siblings, 1 reply; 10+ messages in thread
From: Frank van Maarseveen @ 2011-08-04 17:27 UTC
  To: Trond Myklebust; +Cc: J. Bruce Fields, Linux NFS mailing list

On Thu, Aug 04, 2011 at 01:10:20PM -0400, Trond Myklebust wrote:
> On Thu, 2011-08-04 at 12:49 -0400, J. Bruce Fields wrote: 
> > On Thu, Aug 04, 2011 at 06:43:13PM +0200, Frank van Maarseveen wrote:
> > > On Thu, Aug 04, 2011 at 12:34:52PM -0400, J. Bruce Fields wrote:
> > > > On Thu, Aug 04, 2011 at 12:30:19PM +0200, Frank van Maarseveen wrote:
> > > > > Both client- and server run 2.6.39.3, NFSv3 over UDP (without the
> > > > > relock_filesystem patch proposed earlier).
> > > > > 
> > > > > A second client has an exclusive lock on a file on the server. The
> > > > > client under test calls fcntl(F_SETLKW) to wait for the same exclusive
> > > > > lock. Wireshark sees NLM V4 LOCK calls resulting in NLM_BLOCKED.
> > > > > 
> > > > > Next the server is rebooted. The second client recovers the lock
> > > > > correctly. The client under test now receives NLM_DENIED_GRACE_PERIOD for
> > > > > every NLM V4 LOCK request resulting from the waiting fcntl(F_SETLKW). When
> > > > > this changes to NLM_BLOCKED after grace period expiration the fcntl
> > > > > returns -ENOLCK ("No locks available.") instead of continuing to wait.
> > > > 
> > > > So that sounds like a client bug, and correct behavior from the server
> > > > (assuming the second client was still holding the lock throughout).
> > > 
> > > yes.
> 
> Is the client actually asking for a blocking lock after the grace period
> expires?

Yes, according to my interpretation of the Wireshark decode; see my reply to Bruce.

-- 
Frank


* Re: [NLM] fcntl(F_SETLKW) yields -ENOLCK when grace period expires.
  2011-08-04 17:27         ` Frank van Maarseveen
@ 2011-08-04 18:17           ` Trond Myklebust
  2011-08-05 13:28             ` Frank van Maarseveen
  0 siblings, 1 reply; 10+ messages in thread
From: Trond Myklebust @ 2011-08-04 18:17 UTC
  To: Frank van Maarseveen; +Cc: J. Bruce Fields, Linux NFS mailing list

On Thu, 2011-08-04 at 19:27 +0200, Frank van Maarseveen wrote: 
> On Thu, Aug 04, 2011 at 01:10:20PM -0400, Trond Myklebust wrote:
> > On Thu, 2011-08-04 at 12:49 -0400, J. Bruce Fields wrote: 
> > > On Thu, Aug 04, 2011 at 06:43:13PM +0200, Frank van Maarseveen wrote:
> > > > On Thu, Aug 04, 2011 at 12:34:52PM -0400, J. Bruce Fields wrote:
> > > > > On Thu, Aug 04, 2011 at 12:30:19PM +0200, Frank van Maarseveen wrote:
> > > > > > Both client- and server run 2.6.39.3, NFSv3 over UDP (without the
> > > > > > relock_filesystem patch proposed earlier).
> > > > > > 
> > > > > > A second client has an exclusive lock on a file on the server. The
> > > > > > client under test calls fcntl(F_SETLKW) to wait for the same exclusive
> > > > > > lock. Wireshark sees NLM V4 LOCK calls resulting in NLM_BLOCKED.
> > > > > > 
> > > > > > Next the server is rebooted. The second client recovers the lock
> > > > > > correctly. The client under test now receives NLM_DENIED_GRACE_PERIOD for
> > > > > > every NLM V4 LOCK request resulting from the waiting fcntl(F_SETLKW). When
> > > > > > this changes to NLM_BLOCKED after grace period expiration the fcntl
> > > > > > returns -ENOLCK ("No locks available.") instead of continuing to wait.
> > > > > 
> > > > > So that sounds like a client bug, and correct behavior from the server
> > > > > (assuming the second client was still holding the lock throughout).
> > > > 
> > > > yes.
> > 
> > Is the client actually asking for a blocking lock after the grace period
> > expires?
> 
> yes, according to my interpretation of that of wireshark, see reply to Bruce.
> 

OK... Does the following patch help?

Cheers
  Trond
--- 
diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index 8392cb8..40c0d88 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -270,6 +270,9 @@ nlmclnt_call(struct rpc_cred *cred, struct nlm_rqst *req, u32 proc)
 			return -ENOLCK;
 		msg.rpc_proc = &clnt->cl_procinfo[proc];
 
+		/* Reset the reply status */
+		if (argp->block)
+			resp->status = nlm_lck_blocked;
 		/* Perform the RPC call. If an error occurs, try again */
 		if ((status = rpc_call_sync(clnt, &msg, 0)) < 0) {
 			dprintk("lockd: rpc_call returned error %d\n", -status);

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com



* Re: [NLM] fcntl(F_SETLKW) yields -ENOLCK when grace period expires.
  2011-08-04 18:17           ` Trond Myklebust
@ 2011-08-05 13:28             ` Frank van Maarseveen
  2012-03-16 10:53               ` Ichiko Sakamoto
  0 siblings, 1 reply; 10+ messages in thread
From: Frank van Maarseveen @ 2011-08-05 13:28 UTC
  To: Trond Myklebust; +Cc: J. Bruce Fields, Linux NFS mailing list

On Thu, Aug 04, 2011 at 02:17:35PM -0400, Trond Myklebust wrote:
> On Thu, 2011-08-04 at 19:27 +0200, Frank van Maarseveen wrote: 
> > On Thu, Aug 04, 2011 at 01:10:20PM -0400, Trond Myklebust wrote:
> > > On Thu, 2011-08-04 at 12:49 -0400, J. Bruce Fields wrote: 
> > > > On Thu, Aug 04, 2011 at 06:43:13PM +0200, Frank van Maarseveen wrote:
> > > > > On Thu, Aug 04, 2011 at 12:34:52PM -0400, J. Bruce Fields wrote:
> > > > > > On Thu, Aug 04, 2011 at 12:30:19PM +0200, Frank van Maarseveen wrote:
> > > > > > > Both client- and server run 2.6.39.3, NFSv3 over UDP (without the
> > > > > > > relock_filesystem patch proposed earlier).
> > > > > > > 
> > > > > > > A second client has an exclusive lock on a file on the server. The
> > > > > > > client under test calls fcntl(F_SETLKW) to wait for the same exclusive
> > > > > > > lock. Wireshark sees NLM V4 LOCK calls resulting in NLM_BLOCKED.
> > > > > > > 
> > > > > > > Next the server is rebooted. The second client recovers the lock
> > > > > > > correctly. The client under test now receives NLM_DENIED_GRACE_PERIOD for
> > > > > > > every NLM V4 LOCK request resulting from the waiting fcntl(F_SETLKW). When
> > > > > > > this changes to NLM_BLOCKED after grace period expiration the fcntl
> > > > > > > returns -ENOLCK ("No locks available.") instead of continuing to wait.
> > > > > > 
> > > > > > So that sounds like a client bug, and correct behavior from the server
> > > > > > (assuming the second client was still holding the lock throughout).
> > > > > 
> > > > > yes.
> > > 
> > > Is the client actually asking for a blocking lock after the grace period
> > > expires?
> > 
> > yes, according to my interpretation of that of wireshark, see reply to Bruce.
> > 
> 
> OK... Does the following patch help?
> 
> Cheers
>   Trond
> --- 
> diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> index 8392cb8..40c0d88 100644
> --- a/fs/lockd/clntproc.c
> +++ b/fs/lockd/clntproc.c
> @@ -270,6 +270,9 @@ nlmclnt_call(struct rpc_cred *cred, struct nlm_rqst *req, u32 proc)
>  			return -ENOLCK;
>  		msg.rpc_proc = &clnt->cl_procinfo[proc];
>  
> +		/* Reset the reply status */
> +		if (argp->block)
> +			resp->status = nlm_lck_blocked;
>  		/* Perform the RPC call. If an error occurs, try again */
>  		if ((status = rpc_call_sync(clnt, &msg, 0)) < 0) {
>  			dprintk("lockd: rpc_call returned error %d\n", -status);
> 

Negative. I've tried it on the client under test and I'm seeing three
types of behavior: one good, two bad. In all cases the secondary
client (unmodified) correctly regains the lock after the server has
rebooted. The behavior of the client under test depends on whether it
queued the conflicting lock before or after the server reboot. If it
queued the lock afterwards, it seems to work with the above modification
(I don't know whether that was the case before, though).

When the client under test tries to lock before the server reboot, the
fcntl(F_SETLKW) returns -ENOLCK either right after the NSM NOTIFY,
without any NLM traffic, or when the NLM_DENIED_GRACE_PERIOD replies
change into NLM_BLOCKED (the original report).

-- 
Frank


* Re: [NLM] fcntl(F_SETLKW) yields -ENOLCK when grace period expires.
  2011-08-05 13:28             ` Frank van Maarseveen
@ 2012-03-16 10:53               ` Ichiko Sakamoto
  0 siblings, 0 replies; 10+ messages in thread
From: Ichiko Sakamoto @ 2012-03-16 10:53 UTC
  To: linux-nfs; +Cc: frankvm, Trond.Myklebust, bfields


(2011/08/05 22:28), Frank van Maarseveen wrote:

> On Thu, Aug 04, 2011 at 02:17:35PM -0400, Trond Myklebust wrote:
>> On Thu, 2011-08-04 at 19:27 +0200, Frank van Maarseveen wrote: 
>> > On Thu, Aug 04, 2011 at 01:10:20PM -0400, Trond Myklebust wrote:
>> > > On Thu, 2011-08-04 at 12:49 -0400, J. Bruce Fields wrote: 
>> > > > On Thu, Aug 04, 2011 at 06:43:13PM +0200, Frank van Maarseveen wrote:
>> > > > > On Thu, Aug 04, 2011 at 12:34:52PM -0400, J. Bruce Fields wrote:
>> > > > > > On Thu, Aug 04, 2011 at 12:30:19PM +0200, Frank van Maarseveen wrote:
>> > > > > > > Both client- and server run 2.6.39.3, NFSv3 over UDP (without the
>> > > > > > > relock_filesystem patch proposed earlier).
>> > > > > > > 
>> > > > > > > A second client has an exclusive lock on a file on the server. The
>> > > > > > > client under test calls fcntl(F_SETLKW) to wait for the same exclusive
>> > > > > > > lock. Wireshark sees NLM V4 LOCK calls resulting in NLM_BLOCKED.
>> > > > > > > 
>> > > > > > > Next the server is rebooted. The second client recovers the lock
>> > > > > > > correctly. The client under test now receives NLM_DENIED_GRACE_PERIOD for
>> > > > > > > every NLM V4 LOCK request resulting from the waiting fcntl(F_SETLKW). When
>> > > > > > > this changes to NLM_BLOCKED after grace period expiration the fcntl
>> > > > > > > returns -ENOLCK ("No locks available.") instead of continuing to wait.
>> > > > > > 
>> > > > > > So that sounds like a client bug, and correct behavior from the server
>> > > > > > (assuming the second client was still holding the lock throughout).
>> > > > > 
>> > > > > yes.
>> > > 
>> > > Is the client actually asking for a blocking lock after the grace period
>> > > expires?
>> > 
>> > yes, according to my interpretation of that of wireshark, see reply to Bruce.
>> > 
>> 
>> OK... Does the following patch help?
>> 
>> Cheers
>>   Trond
>> --- 
>> diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
>> index 8392cb8..40c0d88 100644
>> --- a/fs/lockd/clntproc.c
>> +++ b/fs/lockd/clntproc.c
>> @@ -270,6 +270,9 @@ nlmclnt_call(struct rpc_cred *cred, struct nlm_rqst *req, u32 proc)
>>  			return -ENOLCK;
>>  		msg.rpc_proc = &clnt->cl_procinfo[proc];
>>  
>> +		/* Reset the reply status */
>> +		if (argp->block)
>> +			resp->status = nlm_lck_blocked;
>>  		/* Perform the RPC call. If an error occurs, try again */
>>  		if ((status = rpc_call_sync(clnt, &msg, 0)) < 0) {
>>  			dprintk("lockd: rpc_call returned error %d\n", -status);
>> 
> 
> Negative. I've tried it on the client under test and I'm seeing three
> types of behavior, one good, two bad. In all cases the secondary
> client (unmodified) correctly regains the lock after the server has
> rebooted. Client under test behavior depends on whether it had queued
> the conflicting lock before or after the server reboot. Afterwards it
> seems to work with the above modification (don't know if that was the
> case before though).
> 
> When the client under test tries to lock before the server reboot then
> the fcntl(F_SETLKW) returns either right after the NSM NOTIFY with
> -ENOLCK without any NLM traffic or it returns with -ENOLCK when the
> NLM_DENIED_GRACE_PERIOD changes into NLM_BLOCKED (the original report).
> 




Hi all,

Was this fixed?
I have the same issue in 3.2.9-2.fc16.

When the client receives an NSM NOTIFY, the reclaimer() thread sets
block->b_status to nlm_lck_denied_grace_period.

fs/lockd/clntlock.c
            /* Now, wake up all processes that sleep on a blocked lock */
            spin_lock(&nlm_blocked_lock);
            list_for_each_entry(block, &nlm_blocked, b_list) {
                    if (block->b_host == host) {
 *                          block->b_status = nlm_lck_denied_grace_period;
                            wake_up(&block->b_wait);
                    }
            }
            spin_unlock(&nlm_blocked_lock);

The blocked process loops inside nlmclnt_call() during the grace period
and then receives NLM_BLOCKED again.
Then nlmclnt_block() copies block->b_status (== nlm_lck_denied_grace_period)
to req->a_res.status.

fs/lockd/clntlock.c
            ret = wait_event_interruptible_timeout(block->b_wait,
                            block->b_status != nlm_lck_blocked,
                            timeout);
            if (ret < 0)
                    return -ERESTARTSYS;
 *          req->a_res.status = block->b_status;
            return 0;

... and nlmclnt_lock() breaks out of the retry loop and returns -ENOLCK.

fs/lockd/clntproc.c
                    /* Wait on an NLM blocking lock */
                    status = nlmclnt_block(block, req, NLMCLNT_POLL_TIMEOUT);
                    if (status < 0)
                            break;
 *                  if (resp->status != nlm_lck_blocked)
 *                          break;
            }
            ...
            if (resp->status == nlm_lck_denied && (fl_flags & FL_SLEEP))
                    status = -ENOLCK;
            else
 *                  status = nlm_stat_to_errno(resp->status);
    out_unblock:
            nlmclnt_finish_block(block);
    out:
            nlmclnt_release_call(req);
 *          return status;
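
The sequence above can be condensed into a small user-space model. This is only a sketch under simplifying assumptions, not the kernel code: the model_* names are hypothetical, the wait is reduced to "copy b_status", and the errno mapping is simplified so that everything other than granted or denied becomes -ENOLCK. It shows how the status left behind by the reclaimer turns a later NLM_BLOCKED reply into -ENOLCK.

```c
#include <errno.h>

/* NLM status codes; NLM_BLOCKED is 3, as shown in the Wireshark decode. */
enum nlm_stat {
	NLM_GRANTED = 0,
	NLM_DENIED = 1,
	NLM_DENIED_NOLOCKS = 2,
	NLM_BLOCKED = 3,
	NLM_DENIED_GRACE_PERIOD = 4,
};

/* Model of the per-waiter state kept on the nlm_blocked list. */
struct wait_model {
	enum nlm_stat b_status;
};

/* Model of reclaimer(): overwrite the waiter's status (and wake it). */
static void model_reclaimer(struct wait_model *w)
{
	w->b_status = NLM_DENIED_GRACE_PERIOD;
}

/* Model of nlmclnt_block() once the wait ends: b_status becomes the reply. */
static enum nlm_stat model_block(const struct wait_model *w)
{
	return w->b_status;
}

/* Simplified mapping (assumption): only GRANTED and DENIED are special. */
static int model_stat_to_errno(enum nlm_stat s)
{
	switch (s) {
	case NLM_GRANTED:
		return 0;
	case NLM_DENIED:
		return -EAGAIN;
	default:
		return -ENOLCK;
	}
}

/*
 * Model of the retry loop. replies[] holds the status the server returns
 * for each LOCK call. Returns what the fcntl() caller would see, or 1 if
 * the waiter is still blocked when the scripted replies run out.
 */
static int model_lock(struct wait_model *w, const enum nlm_stat *replies, int n)
{
	for (int i = 0; i < n; i++) {
		enum nlm_stat resp = replies[i];

		if (resp == NLM_DENIED_GRACE_PERIOD)
			continue;	/* retried during the grace period */
		if (resp != NLM_BLOCKED)
			return model_stat_to_errno(resp);
		resp = model_block(w);	/* wait ends, b_status is copied */
		if (resp != NLM_BLOCKED)
			return model_stat_to_errno(resp);	/* stale status leaks out */
	}
	return 1;	/* still waiting */
}
```

In this model, a waiter whose b_status was overwritten by model_reclaimer() returns -ENOLCK on the first NLM_BLOCKED reply after the grace period, while an untouched waiter keeps waiting.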



The following patch works fine on my fc16 system.

--- a/fs/lockd/clntlock.c	2012-01-04 23:55:44.000000000 +0000
+++ b/fs/lockd/clntlock.c	2012-03-16 08:08:03.793687409 +0000
@@ -121,6 +121,7 @@
 int nlmclnt_block(struct nlm_wait *block, struct nlm_rqst *req, long timeout)
 {
 	long ret;
+	u32 nsmstate;

 	/* A borken server might ask us to block even if we didn't
 	 * request it. Just say no!
@@ -136,8 +137,10 @@
 	 * a 1 minute timeout would do. See the comment before
 	 * nlmclnt_lock for an explanation.
 	 */
+	nsmstate = block->b_host->h_nsmstate;
 	ret = wait_event_interruptible_timeout(block->b_wait,
-			block->b_status != nlm_lck_blocked,
+			block->b_status != nlm_lck_blocked ||
+			block->b_host->h_nsmstate != nsmstate,
 			timeout);
 	if (ret < 0)
 		return -ERESTARTSYS;
@@ -266,7 +269,6 @@
 	spin_lock(&nlm_blocked_lock);
 	list_for_each_entry(block, &nlm_blocked, b_list) {
 		if (block->b_host == host) {
-			block->b_status = nlm_lck_denied_grace_period;
 			wake_up(&block->b_wait);
 		}
 	}
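
The idea behind the patch can be illustrated with a small user-space model of the new wake condition. This is a sketch, not the kernel code: the host_model and patched_* names are hypothetical, and it assumes that h_nsmstate changes when the reboot notification is processed, which is what the patch relies on. The waiter wakes up on a reboot, but b_status is left at "blocked", so the retry loop sends the LOCK request again instead of giving up.

```c
#include <stdbool.h>
#include <stdint.h>

enum nlm_stat { NLM_GRANTED = 0, NLM_BLOCKED = 3 };

/* Model of the host: only the NSM state counter matters here. */
struct host_model {
	uint32_t h_nsmstate;
};

/* Model of a waiter on the nlm_blocked list. */
struct patched_wait {
	struct host_model *b_host;
	enum nlm_stat b_status;
};

/*
 * Wake condition of the patched wait: either the status changed (for
 * example a GRANTED callback arrived) or the server's NSM state differs
 * from the value sampled before sleeping.
 */
static bool patched_should_wake(const struct patched_wait *w,
				uint32_t saved_nsmstate)
{
	return w->b_status != NLM_BLOCKED ||
	       w->b_host->h_nsmstate != saved_nsmstate;
}

/* Status handed back to the retry loop; the reclaimer no longer alters it. */
static enum nlm_stat patched_wait_result(const struct patched_wait *w)
{
	return w->b_status;
}
```

After a simulated reboot (h_nsmstate changes), patched_should_wake() becomes true while patched_wait_result() still reports NLM_BLOCKED, so a caller following the retry loop would keep waiting for the lock.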


Thanks,
Ichiko


