lockd recovery not working on RH with 2.6 kernel


* lockd recovery not working on RH with 2.6 kernel
@ 2004-11-11 19:12 Marc Eshel
  2004-11-17 19:58 ` Steve Dickson
  2004-11-18 16:52 ` Steve Dickson
  0 siblings, 2 replies; 13+ messages in thread
From: Marc Eshel @ 2004-11-11 19:12 UTC (permalink / raw)
  To: NFS





The problem is that after the NFS server machine reboots, its statd sends a
notification to all NFS clients that had locking activity, but the clients
fail to reclaim their locks.

I tried it with RedHat ES, a 2.6.8 kernel, and nfs-utils 1.0.6, and also with
RedHat Fedora, a 2.6.5 kernel, and nfs-utils 1.0.6.

It did work when I mounted with '-o nfsvers=2', which used lockd version 1
instead of lockd version 4.
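
A minimal sketch of the version pairing mentioned above, assuming the client
picks NLM version 4 for NFSv3 mounts and NLM version 1 otherwise. The helper
name nlm_version_for_nfs() is hypothetical and is not a kernel function.

```c
/* Hypothetical helper: which NLM (lockd) version a mount would use.
 * Assumption: NFSv3 pairs with NLM v4, anything else with NLM v1. */
unsigned int nlm_version_for_nfs(unsigned int nfs_version)
{
	return (nfs_version == 3) ? 4 : 1;
}
```

With this pairing, '-o nfsvers=2' ends up on the NLM v1 code path, which is
the path that worked in the test above.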

Here are the debug messages on the NFS client:
      The debug messages with 'xxx' were added by me.
      As you can see in the 4th line, the protocol and version are both 0
      (p=0, v=0).
      In the following 2 lines you can see a valid protocol and version,
      but because they don't match the input protocol and version, the
      host is not found and the client will not reclaim its locks.

Nov 11 11:35:03 hiper53 kernel: lockd: request from 7f000001
Nov 11 11:35:03 hiper53 kernel: lockd: nlmsvc_dispatch vers 4 proc 16
Nov 11 11:35:03 hiper53 kernel: lockd: SM_NOTIFY     called
Nov 11 11:35:03 hiper53 kernel: lockd: nlm_lookup_host(09018c42, p=0, v=0)
Nov 11 11:35:03 hiper53 kernel: lockd: xxx1 nlm_lookup_host(server 0 s=0 p=17, v=4)
Nov 11 11:35:03 hiper53 kernel: lockd: xxx2 nlm_lookup_host(server 0 s=0 p=17, v=1)
Nov 11 11:35:03 hiper53 kernel: lockd: creating host entry
Nov 11 11:35:03 hiper53 kernel: lockd: rebind host 9.1.140.66
Nov 11 11:35:03 hiper53 kernel: NLM: reclaiming locks for host 9.1.140.66
lockd: xxx2 nlmclnt_recovery h_reclaiming 1
Nov 11 11:35:03 hiper53 kernel: lockd: get host 9.1.140.66
Nov 11 11:35:03 hiper53 kernel: lockd: release host 9.1.140.66
Nov 11 11:35:03 hiper53 kernel: nlmsvc_retry_blocked(00000000, when=0)
Nov 11 11:35:03 hiper53 kernel: nlmsvc_retry_blocked(00000000, when=0)
Nov 11 11:35:03 hiper53 kernel: lockd: xxx3 reclaimer start
Nov 11 11:35:03 hiper53 kernel: lockd: xxx4 reclaimer magic 6969 6969
Nov 11 11:35:03 hiper53 kernel: lockd: xxx5 reclaimer host f7d43d00(9.1.140.66) f744bb80(9.1.140.66)
Nov 11 11:35:04 hiper53 kernel: lockd: release host 9.1.140.66




_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: lockd recovery not working on RH with 2.6 kernel
  2004-11-11 19:12 lockd recovery not working on RH with 2.6 kernel Marc Eshel
@ 2004-11-17 19:58 ` Steve Dickson
  2004-11-18 16:52 ` Steve Dickson
  1 sibling, 0 replies; 13+ messages in thread
From: Steve Dickson @ 2004-11-17 19:58 UTC (permalink / raw)
  Cc: NFS

Marc Eshel wrote:

>The problem is that after the NFS sever machine reboots its statd sends a
>notification to all NFS clients that had locking activity but the clients
>fail to reclaim their locks.
>  
>
Looking into this... either I'm missing some really crucial patches or
lock recovery with the 2.6.9/10 kernels is really broken.  I'm really
hoping it's the former.... :) but here is what I'm seeing...

The client takes a lock. The server is rebooted (both 2.6.9 kernels).
The server statd sends the SM_NOTIFY to the client statd, and
client statd notifies the kernel, BUT not with enough information
for the kernel to find the granted lock, so the lock request is blown
off....

The details: since nlm4svc_decode_reboot() does not set argp->vers
or argp->proto, nlm_lookup_host() does not find the outstanding nlm_host
pointer so a new one is created, which causes both reclaimer() and
nlmclnt_mark_reclaim() not to find the file_lock pointer....
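
A rough illustration of why a freshly created host entry finds nothing to
reclaim, assuming locks are matched to a host by pointer identity. The types
struct host and struct lock_rec and the function mark_reclaim() are
hypothetical stand-ins, not the kernel structures.

```c
#include <stddef.h>

struct host { int id; };                       /* hypothetical host entry */
struct lock_rec {                              /* hypothetical lock record */
	const struct host *owner;
	int reclaim;
};

/* Mark for reclaim every lock owned by `h`; return how many were marked. */
size_t mark_reclaim(struct lock_rec *locks, size_t n, const struct host *h)
{
	size_t marked = 0;

	for (size_t i = 0; i < n; i++) {
		if (locks[i].owner == h) {     /* pointer comparison */
			locks[i].reclaim = 1;
			marked++;
		}
	}
	return marked;
}
```

If the lookup hands back a new host object instead of the one the lock was
taken under, mark_reclaim() returns 0 and the lock is never resent.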

Now, giving the kernel the correct information (i.e. setting both argp->vers
and argp->proto to the correct values), the correct nlm_host pointer is
found, which causes the client to query the server portmapper for lockd's
port. Unfortunately, lockd is not up yet so the portmap query fails and again,
the request is blown off....

The details:  nlmclnt_reclaim() calls nlmclnt_call() which fails with
-EACCES because the portmapper is up but lockd is not.

Now when a retry mechanism is added to nlmclnt_reclaim() which
ignores the EACCES, a lock request, with the reclaim bit set, is
sent to the server. Unfortunately, the server (for a reason I have yet
to figure out) denies the lock but then immediately grants the lock.
The really bizarre thing is both server messages have the same XID!
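
The retry idea can be sketched in user space as below. This is only a model
under stated assumptions: reclaim_with_retry(), the callback types, and the
fake_server helpers are all hypothetical, and the real code would sleep and
check for signals in the kernel rather than call a wait callback.

```c
#include <errno.h>

typedef int (*call_fn)(void *ctx);   /* stands in for the RPC call */
typedef int (*wait_fn)(void *ctx);   /* returns nonzero if interrupted */

/* Retry while the call fails with -EAGAIN or -EACCES; stop on anything else. */
int reclaim_with_retry(call_fn call, wait_fn wait_interrupted, void *ctx)
{
	for (;;) {
		int status = call(ctx);

		if (status != -EAGAIN && status != -EACCES)
			return status;         /* success or a hard error */
		if (wait_interrupted(ctx))
			return -EINTR;         /* interrupted while waiting */
	}
}

/* Hypothetical test double: fails a fixed number of times, then succeeds. */
struct fake_server { int failures_left; int calls; };

int fake_call(void *ctx)
{
	struct fake_server *s = ctx;

	s->calls++;
	if (s->failures_left > 0) {
		s->failures_left--;
		return -EACCES;                /* portmapper up, lockd not yet */
	}
	return 0;
}

int no_signal(void *ctx) { (void)ctx; return 0; }
int always_signalled(void *ctx) { (void)ctx; return 1; }
```

With a fake server that refuses twice, the loop makes three calls and then
returns 0; with a wait callback that always reports a signal, it gives up
after the first failure with -EINTR.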

Is anybody else seeing these types of oddities with lock recovery?

SteveD.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: lockd recovery not working on RH with 2.6 kernel
  2004-11-11 19:12 lockd recovery not working on RH with 2.6 kernel Marc Eshel
  2004-11-17 19:58 ` Steve Dickson
@ 2004-11-18 16:52 ` Steve Dickson
  2004-11-19 16:34   ` Trond Myklebust
  1 sibling, 1 reply; 13+ messages in thread
From: Steve Dickson @ 2004-11-18 16:52 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: NFS, Neil Brown

[-- Attachment #1: Type: text/plain, Size: 1402 bytes --]

Hey Trond,

Marc Eshel wrote:

>The problem is that after the NFS sever machine reboots its statd sends a
>notification to all NFS clients that had locking activity but the clients
>fail to reclaim their locks.
>  
>
Well it appears things are a bit broken. Here is a client side patch that
enables the client to reclaim locks on a rebooted server.

The two main issues were, one, nlm4svc_decode_reboot() not setting
the protocol, which caused the nlm_host structure not to be found,
and two, making nlmclnt_reclaim() retry when the portmapper was up
but lockd had not made it yet.... I also fixed a debugging
statement as well as added a couple... that I found useful....

Now the reclaim retry code currently retries forever in an
interruptible loop waiting for lockd to come up. This may or
may not be a good idea, but the client should not make any
assumptions about the health of the server, so I'm not sure there
is anything else that can be done....

Unfortunately this reclaim code freaks out the Linux server, causing it
to send two back-to-back messages (both using the same xid) that
fail and then grant the lock.... It seems the dentry_open() call
(in nfsd_open()) is returning a 30000 error value. It's not clear why or
what a 30000 value means....  I'm still looking into that, but this code
was tested with both a NetApp filer and a Solaris 10 server, which seem
to work fine..

Comments? 

SteveD.

[-- Attachment #2: linux-2.6.9-lockd-reclaims.patch --]
[-- Type: text/x-patch, Size: 2525 bytes --]

--- linux-2.6.9/fs/lockd/xdr4.c.org	2004-10-18 17:53:06.000000000 -0400
+++ linux-2.6.9/fs/lockd/xdr4.c	2004-11-18 10:44:27.324666000 -0500
@@ -355,6 +355,9 @@ nlm4svc_decode_reboot(struct svc_rqst *r
 	argp->state = ntohl(*p++);
 	/* Preserve the address in network byte order */
 	argp->addr = *p++;
+	argp->vers = *p++;
+	argp->proto = *p++;
+
 	return xdr_argsize_check(rqstp, p);
 }
 
--- linux-2.6.9/fs/lockd/clntlock.c.org	2004-11-12 05:43:13.508648000 -0500
+++ linux-2.6.9/fs/lockd/clntlock.c	2004-11-18 07:57:33.464093000 -0500
@@ -173,7 +173,7 @@ void nlmclnt_prepare_reclaim(struct nlm_
 	host->h_nextrebind = 0;
 	nlm_rebind_host(host);
 	nlmclnt_mark_reclaim(host);
-	dprintk("NLM: reclaiming locks for host %s", host->h_name);
+	dprintk("NLM: reclaiming locks for host %s\n", host->h_name);
 }
 
 /*
--- linux-2.6.9/fs/lockd/host.c.org	2004-10-18 17:54:31.000000000 -0400
+++ linux-2.6.9/fs/lockd/host.c	2004-11-18 07:58:26.263774000 -0500
@@ -190,15 +190,17 @@ nlm_bind_host(struct nlm_host *host)
 		}
 	} else {
 		xprt = xprt_create_proto(host->h_proto, &host->h_addr, NULL);
-		if (IS_ERR(xprt))
+		if (IS_ERR(xprt)) {
+			dprintk("lockd: xprt_create_proto failed: %ld\n", PTR_ERR(xprt));
 			goto forgetit;
-
+		}
 		xprt_set_timeout(&xprt->timeout, 5, nlmsvc_timeout);
 
 		clnt = rpc_create_client(xprt, host->h_name, &nlm_program,
 					host->h_version, host->h_authflavor);
 		if (IS_ERR(clnt)) {
 			xprt_destroy(xprt);
+			dprintk("lockd: rpc_create_client failed: %ld\n", PTR_ERR(clnt));
 			goto forgetit;
 		}
 		clnt->cl_autobind = 1;	/* turn on pmap queries */
--- linux-2.6.9/fs/lockd/clntproc.c.org	2004-10-18 17:55:36.000000000 -0400
+++ linux-2.6.9/fs/lockd/clntproc.c	2004-11-18 08:02:36.787274000 -0500
@@ -592,9 +592,25 @@ nlmclnt_reclaim(struct nlm_host *host, s
 	nlmclnt_setlockargs(req, fl);
 	req->a_args.reclaim = 1;
 
-	if ((status = nlmclnt_call(req, NLMPROC_LOCK)) >= 0
-	 && req->a_res.status == NLM_LCK_GRANTED)
-		return 0;
+again:
+	switch ((status = nlmclnt_call(req, NLMPROC_LOCK))) {
+	case 0:
+		if (req->a_res.status == NLM_LCK_GRANTED)
+			return 0;
+		break;
+	case -EAGAIN:
+	case -EACCES: /* portmapper might be up, but lockd isn't */
+		current->state = TASK_INTERRUPTIBLE;
+		schedule_timeout(10*HZ);
+		if (signalled()) {
+			status = -EINTR;
+			dprintk("lockd: reclaim got interrupted!\n");
+			break;
+		}
+		goto again;
+	default:
+		break;
+	}
 
 	printk(KERN_WARNING "lockd: failed to reclaim lock for pid %d "
 				"(errno %d, status %d)\n", fl->fl_pid,

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: lockd recovery not working on RH with 2.6 kernel
  2004-11-18 16:52 ` Steve Dickson
@ 2004-11-19 16:34   ` Trond Myklebust
  2004-11-19 17:50     ` Steve Dickson
  0 siblings, 1 reply; 13+ messages in thread
From: Trond Myklebust @ 2004-11-19 16:34 UTC (permalink / raw)
  To: Steve Dickson; +Cc: NFS, Neil Brown

[-- Attachment #1: Type: text/plain, Size: 1567 bytes --]

On Thu, 18.11.2004 at 11:52 (-0500), Steve Dickson wrote:
> Well it appears things are a bit broken. Here is a client side patch that
> enables the client to reclaim locks on a rebooted server.
> 
> The two main issues were nlm4svc_decode_reboot() not setting
> the protocol which cause the nlm_host structure not to be found
> and two, making nlmclnt_reclaim() retry when the portmapper was up
> but lockd had not made it yet.... I also fixed a debugging
> statement and well as added a couple... that I found useful....

Yep. Good work!

> Now the reclaim retry code currently retries forever in an
> interruptible loop waiting for lockd to come up. This may or
> may not be a good idea, but the client should not make any
> assumptions about the health of the server, to I'm not sure there
> is anything else that can be done....
> 
> Unfortunately this reclaim code freaks out the linux server, causing it
> to send two back-to-back messages (both using the same xid) that
> fails and then grant the lock.... It seems the dentry_open() call
> (in nfsd_open()) is returning 30000 error value. Its not clear why or
> what a 30000 value means....  I'm still looking in to that, but this code
> was tested with both a Neapps filer and Solaris 10 server which seem
> to work fine..

30000 ???? All kernel errors should be < 1000. Is this perhaps the
bug with the uninitialized variable in the mountd upcall code? I believe
the attached patch has already been committed to the nfs-utils CVS tree.

Cheers,
  Trond
-- 
Trond Myklebust <trond.myklebust@fys.uio.no>

[-- Attachment #2: Attached message - Fix a problem with an uninitialized variable in rpc.mountd... --]
[-- Type: message/rfc822, Size: 1300 bytes --]

From: Trond Myklebust <trond.myklebust@fys.uio.no>
To: Neil Brown <neilb@cse.unsw.edu.au>
Cc: "Dr. Bruce Fields" <bfields@fieldses.org>
Subject: Fix a problem with an uninitialized variable in rpc.mountd...
Date: Sun, 05 Sep 2004 22:34:26 -0400
Message-ID: <1094438066.10492.73.camel@lade.trondhjem.org>

Currently, mountd returns an "error -1073752996" on my laptop when it
cannot look up the IP address.

Cheers,
  Trond

--- nfs-utils-1.0.6/utils/mountd/auth.c.orig	2003-07-14 18:10:12.000000000 -0400
+++ nfs-utils-1.0.6/utils/mountd/auth.c	2004-09-05 21:25:09.000000000 -0400
@@ -80,6 +80,7 @@ auth_authenticate_internal(char *what, s
 			my_client.m_naddr = 0;
 			my_client.m_addrlist[0] = caller->sin_addr;
 			n = client_compose(caller->sin_addr);
+			*error = unknown_host;
 			if (!n)
 				return NULL;
 			strcpy(my_client.m_hostname, *n?n:"DEFAULT");


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: lockd recovery not working on RH with 2.6 kernel
  2004-11-19 16:34   ` Trond Myklebust
@ 2004-11-19 17:50     ` Steve Dickson
  2004-11-19 20:24       ` Trond Myklebust
  2004-11-19 20:38       ` Steve Dickson
  0 siblings, 2 replies; 13+ messages in thread
From: Steve Dickson @ 2004-11-19 17:50 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: NFS

Trond Myklebust wrote:

>On Thu, 18.11.2004 at 11:52 (-0500), Steve Dickson wrote:
>  
>
>>Well it appears things are a bit broken. Here is a client side patch that
>>enables the client to reclaim locks on a rebooted server.
>>
>>The two main issues were nlm4svc_decode_reboot() not setting
>>the protocol which cause the nlm_host structure not to be found
>>and two, making nlmclnt_reclaim() retry when the portmapper was up
>>but lockd had not made it yet.... I also fixed a debugging
>>statement and well as added a couple... that I found useful....
>>    
>>
>
>Yep. Good work!
>  
>
cool... can I assume the patch will be headed to one of the upstream
kernels soon?

>>Unfortunately this reclaim code freaks out the linux server, causing it
>>to send two back-to-back messages (both using the same xid) that
>>fails and then grant the lock.... It seems the dentry_open() call
>>(in nfsd_open()) is returning 30000 error value. Its not clear why or
>>what a 30000 value means....  I'm still looking in to that, but this code
>>was tested with both a Neapps filer and Solaris 10 server which seem
>>to work fine..
>>    
>>
>
>30000 ???? All kernel errors should be < 1000. Is this the perhaps the
>bug with the unintialized variable in the mountd upcall code? I believe
>the attached patch has already been committed to the nfs-utils CVS tree.
>  
>
Well after further review.... dentry_open() is not the one failing with an
error code of 30000, it's fh_verify() that's failing with 30000, which means
nfserr_dropit. Basically what this means is exp_find() is returning EAGAIN
because there is an upcall already in progress (or the cache is not yet
fully primed)....

Unfortunately the NLM protocol does not support an EAGAIN notion and, the way
the NLM rpc routines are set up, it does not seem possible to simply svc_drop
NLM messages....
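
A small sketch of where the 30000 comes from, assuming the status is kept in
network byte order and that -EAGAIN from the export lookup is turned into the
internal "drop it" value while other failures become a stale-handle status
(70 is the NFS stale file handle code). The function name
status_for_export_lookup() and the macro names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htonl, ntohl */

#define DROPIT_HOST_ORDER 30000u   /* internal value named in the thread */
#define STALE_HOST_ORDER     70u   /* NFS stale file handle status */

/* Hypothetical mapper from an export-lookup errno to a wire-order status. */
uint32_t status_for_export_lookup(int err)
{
	if (err == -EAGAIN)
		return htonl(DROPIT_HOST_ORDER);  /* upcall pending: drop request */
	if (err != 0)
		return htonl(STALE_HOST_ORDER);
	return 0;
}
```

Printing such a status with ntohl() applied shows 30000, which is not a real
errno and not a protocol status, only a marker telling the dispatcher not to
reply.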

So I've pinged Neil on how he would like to handle this....

SteveD.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: lockd recovery not working on RH with 2.6 kernel
  2004-11-19 17:50     ` Steve Dickson
@ 2004-11-19 20:24       ` Trond Myklebust
  2004-11-19 20:27         ` Trond Myklebust
  2004-11-19 21:40         ` Steve Dickson
  2004-11-19 20:38       ` Steve Dickson
  1 sibling, 2 replies; 13+ messages in thread
From: Trond Myklebust @ 2004-11-19 20:24 UTC (permalink / raw)
  To: Steve Dickson; +Cc: NFS

On Fri, 19.11.2004 at 12:50 (-0500), Steve Dickson wrote:
> cool... can I assuming the patch will be headed to one of the upstream 
> kernels soon?

Yes.

> >>Unfortunately this reclaim code freaks out the linux server, causing it
> >>to send two back-to-back messages (both using the same xid) that
> >>fails and then grant the lock.... It seems the dentry_open() call
> >>(in nfsd_open()) is returning 30000 error value. Its not clear why or
> >>what a 30000 value means....  I'm still looking in to that, but this code
> >>was tested with both a Neapps filer and Solaris 10 server which seem
> >>to work fine..
> >>    
> >>
> >
> >30000 ???? All kernel errors should be < 1000. Is this the perhaps the
> >bug with the unintialized variable in the mountd upcall code? I believe
> >the attached patch has already been committed to the nfs-utils CVS tree.
> >  
> >
> Well after further review.... dentry_open() is not the one failing with 
> an error
> code of 30000, its fh_verify() that's failing with 30000 which means 
> nfserr_dropit.
> Basically what this means is exp_find()  is returning EAGAIN because the 
> there
> is an upcall is already in process (or the cache is not yet fully 
> primed)....
> 
> Unfortunately the NLM protocol does not support a EAGAIN notion and the way
> the NLM rpc routines are setup, is does not seem possible to simply 
> svc_drop
> NLM messages....

See
  http://sourceforge.net/mailarchive/message.php?msg_id=9712677

Marc and Sridhar have set up a method to allow lockd to defer answering
to a locking request. Their goal is to make lockd work with clustered
filesystems, but the basic idea is pretty much the same as what you want
to do here.

Just out of curiosity, though... Does this mean that knfsd is now
sometimes returning NFS3ERR_JUKEBOX to NFSv2 clients?

Cheers,
  Trond
-- 
Trond Myklebust <trond.myklebust@fys.uio.no>




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: lockd recovery not working on RH with 2.6 kernel
  2004-11-19 20:24       ` Trond Myklebust
@ 2004-11-19 20:27         ` Trond Myklebust
  2004-11-19 21:40         ` Steve Dickson
  1 sibling, 0 replies; 13+ messages in thread
From: Trond Myklebust @ 2004-11-19 20:27 UTC (permalink / raw)
  To: Steve Dickson; +Cc: NFS

On Fri, 19.11.2004 at 15:24 (-0500), Trond Myklebust wrote:
> Just out of curiosity, though... Does this mean that knfsd is now
> sometimes returning NFS3ERR_JUKEBOX to NFSv2 clients?

Ah... No... Looks like it is just dropping those requests. That is going
to REALLY SUCK on NFSv2 over TCP...


Oh well...

Cheers,
 Trond

-- 
Trond Myklebust <trond.myklebust@fys.uio.no>




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: lockd recovery not working on RH with 2.6 kernel
  2004-11-19 17:50     ` Steve Dickson
  2004-11-19 20:24       ` Trond Myklebust
@ 2004-11-19 20:38       ` Steve Dickson
  2004-11-23  0:45         ` unlock during lockd recovery Marc Eshel
  1 sibling, 1 reply; 13+ messages in thread
From: Steve Dickson @ 2004-11-19 20:38 UTC (permalink / raw)
  To: Neil Brown; +Cc: NFS

[-- Attachment #1: Type: text/plain, Size: 1062 bytes --]

Hey Neil,

Steve Dickson wrote:

> Unfortunately the NLM protocol does not support a EAGAIN notion and 
> the way
> the NLM rpc routines are setup, is does not seem possible to simply 
> svc_drop
> NLM messages....

Well... it turns out making the nlm rpc routines drop messages was not that
difficult. Fairly straightforward actually....  basically copying working
code into other places and making things work just like kNFSd does...

So the attached patch does the following:
1) Adds an internal nlm_lck_dropit error code.
2) Adds a nlmsvc_dispatch() function that will drop a message when the NLM
    procedure function returns nlm_lck_dropit.
3) Changes nlm_fopen() and nlm_lookup_file() to handle the nlm_lck_dropit
    error code.
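
The three steps above boil down to a dispatcher that sends no reply when the
procedure asks for the request to be dropped. A user-space sketch, with
hypothetical names (dispatch(), proc_fn, proc_busy(), proc_ok()) and plain
host-order status values for simplicity:

```c
#include <stddef.h>
#include <stdint.h>

#define STATUS_OK      0u
#define STATUS_DROPIT  30000u   /* internal-only code, never sent on the wire */

typedef uint32_t (*proc_fn)(const void *args, void *resp);

/* Returns 1 if a reply should be sent, 0 if the request should be dropped. */
int dispatch(proc_fn proc, const void *args, void *resp, uint32_t *statp)
{
	*statp = proc(args, resp);
	if (*statp == STATUS_DROPIT)
		return 0;               /* stay silent; the client will resend */
	return 1;
}

/* Hypothetical procedures used only to exercise the dispatcher. */
uint32_t proc_busy(const void *args, void *resp)
{
	(void)args; (void)resp;
	return STATUS_DROPIT;           /* e.g. export cache not primed yet */
}

uint32_t proc_ok(const void *args, void *resp)
{
	(void)args; (void)resp;
	return STATUS_OK;
}
```

Dropping relies on the client retransmitting the request, so this sketch
assumes the client's RPC layer has a timeout and retry of its own.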

Finally, I left in some of the truly helpful debugging statements, the ones
that were key in helping me figure out what was going on... Now I'm not one
to force my debugging style on anybody, but... having fh_verify() and
exp_find_key() tell us why they are failing is a good thing... imho...

Comments?

SteveD.

[-- Attachment #2: linux-2.6.9-lockd-svc-reclaims.patch --]
[-- Type: text/plain, Size: 6774 bytes --]

--- linux-2.6.9/include/linux/lockd/xdr.h.orig	2004-11-18 15:06:39.000000000 -0500
+++ linux-2.6.9/include/linux/lockd/xdr.h	2004-11-19 14:32:31.880197648 -0500
@@ -21,6 +21,11 @@
 #define	nlm_lck_denied_nolocks	__constant_htonl(NLM_LCK_DENIED_NOLOCKS)
 #define	nlm_lck_blocked		__constant_htonl(NLM_LCK_BLOCKED)
 #define	nlm_lck_denied_grace_period	__constant_htonl(NLM_LCK_DENIED_GRACE_PERIOD)
+/* error codes for internal use */
+/* if a request fails due to kmalloc failure, it gets dropped.
+ *  Client should resend eventually
+ */
+#define	nlm_lck_dropit		__constant_htonl(30000)
 
 /* Lock info passed via NLM */
 struct nlm_lock {
--- linux-2.6.9/fs/nfsd/nfsfh.c.orig	2004-11-18 15:06:39.000000000 -0500
+++ linux-2.6.9/fs/nfsd/nfsfh.c	2004-11-19 14:51:20.079685256 -0500
@@ -142,13 +142,15 @@ fh_verify(struct svc_rqst *rqstp, struct
 		}
 
 		error = nfserr_dropit;
-		if (IS_ERR(exp) && PTR_ERR(exp) == -EAGAIN)
+		if (IS_ERR(exp) && PTR_ERR(exp) == -EAGAIN) {
+			dprintk("nfsd: fh_verify failed: nfserr_dropit\n");
 			goto out;
-
+		}
 		error = nfserr_stale; 
-		if (!exp || IS_ERR(exp))
+		if (!exp || IS_ERR(exp)) {
+			dprintk("nfsd: fh_verify failed: nfserr_stale\n");
 			goto out;
-
+		}
 		/* Check if the request originated from a secure port. */
 		error = nfserr_perm;
 		if (!rqstp->rq_secure && EX_SECURE(exp)) {
@@ -162,6 +164,7 @@ fh_verify(struct svc_rqst *rqstp, struct
 		/* Set user creds for this exportpoint */
 		error = nfsd_setuser(rqstp, exp);
 		if (error) {
+			dprintk("nfsd: nfsd_setuser failed: %d\n", error);
 			error = nfserrno(error);
 			goto out;
 		}
@@ -198,6 +201,7 @@ fh_verify(struct svc_rqst *rqstp, struct
 		if (dentry == NULL)
 			goto out;
 		if (IS_ERR(dentry)) {
+			dprintk("nfsd: CALL(nop,decode_fh) failed: %ld\n", PTR_ERR(dentry));
 			if (PTR_ERR(dentry) != -EINVAL)
 				error = nfserrno(PTR_ERR(dentry));
 			goto out;
@@ -243,6 +247,7 @@ fh_verify(struct svc_rqst *rqstp, struct
 			error = nfserr_isdir;
 		else
 			error = nfserr_inval;
+		dprintk("nfsd: bad type: %d\n", ntohl(error));
 		goto out;
 	}
 	if (type < 0 && (inode->i_mode & S_IFMT) == -type) {
@@ -252,6 +257,7 @@ fh_verify(struct svc_rqst *rqstp, struct
 			error = nfserr_isdir;
 		else
 			error = nfserr_notdir;
+		dprintk("nfsd: bad type2: %d\n", ntohl(error));
 		goto out;
 	}
 
--- linux-2.6.9/fs/nfsd/lockd.c.orig	2004-10-18 17:54:55.000000000 -0400
+++ linux-2.6.9/fs/nfsd/lockd.c	2004-11-19 10:10:10.239244488 -0500
@@ -42,15 +42,18 @@ nlm_fopen(struct svc_rqst *rqstp, struct
  	/* nlm and nfsd don't share error codes.
 	 * we invent: 0 = no error
 	 *            1 = stale file handle
-	 *	      2 = other error
+	 *            2 = nfserr_dropit (or -EAGAIN)
+	 *	          3 = other error
 	 */
 	switch (nfserr) {
 	case nfs_ok:
 		return 0;
 	case nfserr_stale:
 		return 1;
-	default:
+	case nfserr_dropit:
 		return 2;
+	default:
+		return 3;
 	}
 }
 
--- linux-2.6.9/fs/nfsd/export.c.orig	2004-10-18 17:54:32.000000000 -0400
+++ linux-2.6.9/fs/nfsd/export.c	2004-11-19 14:54:37.145726664 -0500
@@ -509,9 +509,12 @@ exp_find_key(svc_client *clp, int fsid_t
 	memcpy(key.ek_fsid, fsidv, key_len(fsid_type));
 
 	ek = svc_expkey_lookup(&key, 0);
-	if (ek != NULL)
-		if ((err = cache_check(&svc_expkey_cache, &ek->h, reqp)))
+	if (ek != NULL) {
+		if ((err = cache_check(&svc_expkey_cache, &ek->h, reqp))) {
+			dprintk("exp_find_key: cache_check failed: %d\n", err);
 			ek = ERR_PTR(err);
+		}
+	}
 	return ek;
 }
 
--- linux-2.6.9/fs/lockd/svcsubs.c.orig	2004-10-18 17:54:37.000000000 -0400
+++ linux-2.6.9/fs/lockd/svcsubs.c	2004-11-19 14:32:57.842250816 -0500
@@ -90,7 +90,7 @@ nlm_lookup_file(struct svc_rqst *rqstp, 
 	 * the file.
 	 */
 	if ((nfserr = nlmsvc_ops->fopen(rqstp, f, &file->f_file)) != 0) {
-		dprintk("lockd: open failed (nfserr %d)\n", ntohl(nfserr));
+		dprintk("lockd: open failed (nfserr %d)\n", nfserr);
 		goto out_free;
 	}
 
@@ -114,7 +114,10 @@ out_free:
 		nfserr = nlm4_stale_fh;
 	else
 #endif
-	nfserr = nlm_lck_denied;
+	if (nfserr == 2)
+		nfserr = nlm_lck_dropit;
+	else
+		nfserr = nlm_lck_denied;
 	goto out_unlock;
 }
 
--- linux-2.6.9/fs/lockd/svc4proc.c.orig	2004-11-18 15:06:39.000000000 -0500
+++ linux-2.6.9/fs/lockd/svc4proc.c	2004-11-19 14:56:36.204626960 -0500
@@ -128,9 +128,12 @@ nlm4svc_proc_lock(struct svc_rqst *rqstp
 	}
 
 	/* Obtain client and file */
-	if ((resp->status = nlm4svc_retrieve_args(rqstp, argp, &host, &file)))
+	if ((resp->status = nlm4svc_retrieve_args(rqstp, argp, &host, &file))) {
+		dprintk("lockd: LOCK(args)    status %d\n", ntohl(resp->status));
+		if (resp->status == nlm_lck_dropit)
+			return nlm_lck_dropit;
 		return rpc_success;
-
+	}
 #if 0
 	/* If supplied state doesn't match current state, we assume it's
 	 * an old request that time-warped somehow. Any error return would
--- linux-2.6.9/fs/lockd/svc.c.orig	2004-11-18 15:06:39.000000000 -0500
+++ linux-2.6.9/fs/lockd/svc.c	2004-11-19 14:39:07.076118736 -0500
@@ -86,6 +86,46 @@ static inline void clear_grace_period(vo
 {
 	nlmsvc_grace_period = 0;
 }
+int
+nlmsvc_dispatch(struct svc_rqst *rqstp, u32 *statp)
+{
+	struct svc_procedure	*procp;
+	kxdrproc_t		xdr;
+	struct kvec *argv;
+	struct kvec *resv;
+
+	dprintk("nlmsvc_dispatch: vers %d proc %d\n",
+				rqstp->rq_vers, rqstp->rq_proc);
+
+	procp = rqstp->rq_procinfo;
+	argv = &rqstp->rq_arg.head[0];
+	resv = &rqstp->rq_res.head[0];
+
+	/* Decode arguments */
+	xdr = procp->pc_decode;
+	if (xdr && !xdr(rqstp, argv->iov_base, rqstp->rq_argp)) {
+		dprintk("nlmsvc_dispatch: failed to decode arguments!\n");
+		*statp = rpc_garbage_args;
+		return 1;
+	}
+	*statp = procp->pc_func(rqstp, rqstp->rq_argp, rqstp->rq_resp);
+	if (*statp == nlm_lck_dropit) {
+		dprintk("nlmsvc_dispatch: dropping request\n");
+		return 0;
+	}
+
+	/* Encode reply */
+	if (*statp == rpc_success && (xdr = procp->pc_encode)
+	 && !xdr(rqstp, resv->iov_base+resv->iov_len, rqstp->rq_resp)) {
+		dprintk("nlmsvc_dispatch: failed to encode reply\n");
+		*statp = rpc_system_err;
+		return 1;
+	}
+
+	dprintk("nlmsvc_dispatch: statp %d\n", ntohl(*statp));
+
+	return 1;
+}
 
 /*
  * This is the lockd kernel thread
@@ -459,12 +499,14 @@ static struct svc_version	nlmsvc_version
 		.vs_vers	= 1,
 		.vs_nproc	= 17,
 		.vs_proc	= nlmsvc_procedures,
+		.vs_dispatch = nlmsvc_dispatch,
 		.vs_xdrsize	= NLMSVC_XDRSIZE,
 };
 static struct svc_version	nlmsvc_version3 = {
 		.vs_vers	= 3,
 		.vs_nproc	= 24,
 		.vs_proc	= nlmsvc_procedures,
+		.vs_dispatch = nlmsvc_dispatch,
 		.vs_xdrsize	= NLMSVC_XDRSIZE,
 };
 #ifdef CONFIG_LOCKD_V4
@@ -472,6 +514,7 @@ static struct svc_version	nlmsvc_version
 		.vs_vers	= 4,
 		.vs_nproc	= 24,
 		.vs_proc	= nlmsvc_procedures4,
+		.vs_dispatch = nlmsvc_dispatch,
 		.vs_xdrsize	= NLMSVC_XDRSIZE,
 };
 #endif

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: lockd recovery not working on RH with 2.6 kernel
  2004-11-19 20:24       ` Trond Myklebust
  2004-11-19 20:27         ` Trond Myklebust
@ 2004-11-19 21:40         ` Steve Dickson
  1 sibling, 0 replies; 13+ messages in thread
From: Steve Dickson @ 2004-11-19 21:40 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: NFS

Trond Myklebust wrote:

>See
>  http://sourceforge.net/mailarchive/message.php?msg_id=9712677
>  
>
This seems a bit more complicated than was needed to recover locks...
Although the nlmsvc_dispatch routine is very similar to the one
I posted....

>Just out of curiosity, though... Does this mean that knfsd is now
>sometimes returning NFS3ERR_JUKEBOX to NFSv2 clients?
>  
>
No... the patch I posted just drops messages. It does not change or
return any new return values.

SteveD.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* unlock during lockd recovery
  2004-11-19 20:38       ` Steve Dickson
@ 2004-11-23  0:45         ` Marc Eshel
  2004-11-23  8:10           ` Olaf Kirch
  0 siblings, 1 reply; 13+ messages in thread
From: Marc Eshel @ 2004-11-23  0:45 UTC (permalink / raw)
  To: NFS

Is there a reason why unlock needs to wait for lockd to come out of the
grace period?
If the protocol allows for it, I would let unlock requests through during
the grace period.
Marc.




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: unlock during lockd recovery
  2004-11-23  0:45         ` unlock during lockd recovery Marc Eshel
@ 2004-11-23  8:10           ` Olaf Kirch
  2004-11-23 17:44             ` Marc Eshel
  0 siblings, 1 reply; 13+ messages in thread
From: Olaf Kirch @ 2004-11-23  8:10 UTC (permalink / raw)
  To: Marc Eshel; +Cc: NFS

On Mon, Nov 22, 2004 at 04:45:53PM -0800, Marc Eshel wrote:
> Is there a reason way unlock needs to wait for the lockd to come out of
> grace period ?

The lock may not have been reclaimed by the client yet, so you may end up
with a stale lock.
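
One possible reading of this concern, sketched as a toy server model. It
assumes the unlock and the reclaim for the same lock can reach the server in
that order; the struct server type and the srv_unlock()/srv_reclaim()
functions are hypothetical and do not model real lockd state.

```c
/* Hypothetical server state right after a reboot: no locks known yet. */
struct server {
	int in_grace;   /* nonzero while the grace period is running */
	int held;       /* nonzero if the server believes the lock is held */
};

/* Returns 0 if processed, -1 if refused because of the grace period. */
int srv_unlock(struct server *s, int allow_during_grace)
{
	if (s->in_grace && !allow_during_grace)
		return -1;      /* client must retry after grace ends */
	s->held = 0;            /* nothing to release if not yet reclaimed */
	return 0;
}

/* Reclaims are only accepted during the grace period. */
int srv_reclaim(struct server *s)
{
	if (!s->in_grace)
		return -1;
	s->held = 1;
	return 0;
}
```

If an unlock is let through first and a reclaim for the same lock arrives
afterwards, the server ends with held == 1 although the application has
already released the lock.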

Olaf
-- 
Olaf Kirch     | Things that make Monday morning interesting, #2:
okir@suse.de   |        "We have 8,000 NFS mount points, why do we keep
---------------+ 	 running out of privileged ports?"



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: unlock during lockd recovery
  2004-11-23  8:10           ` Olaf Kirch
@ 2004-11-23 17:44             ` Marc Eshel
  2004-11-24  8:59               ` Olaf Kirch
  0 siblings, 1 reply; 13+ messages in thread
From: Marc Eshel @ 2004-11-23 17:44 UTC (permalink / raw)
  To: Olaf Kirch; +Cc: NFS, nfs-admin



nfs-admin@lists.sourceforge.net wrote on 11/23/2004 12:10:47 AM:

> On Mon, Nov 22, 2004 at 04:45:53PM -0800, Marc Eshel wrote:
> > Is there a reason why unlock needs to wait for the lockd to come out of
> > grace period ?

> The lock may not have been reclaimed by the client yet, so you may end up
> with a stale lock.


If the client application unlocks the lock before it has been reclaimed,
then it should not be reclaimed.
Isn't there some serialization on the client between the application's
lock/unlock requests and the reclaim process?

Marc.


> Olaf
> --
> Olaf Kirch     | Things that make Monday morning interesting, #2:
> okir@suse.de   |        "We have 8,000 NFS mount points, why do we keep
> ---------------+   running out of privileged ports?"






* Re: unlock during lockd recovery
  2004-11-23 17:44             ` Marc Eshel
@ 2004-11-24  8:59               ` Olaf Kirch
  0 siblings, 0 replies; 13+ messages in thread
From: Olaf Kirch @ 2004-11-24  8:59 UTC (permalink / raw)
  To: Marc Eshel; +Cc: NFS, nfs-admin

On Tue, Nov 23, 2004 at 09:44:10AM -0800, Marc Eshel wrote:
> If the client application unlocks the lock before it was reclaimed then it
> should not be reclaimed.

The problem is that your lockd server would make assumptions about the
client's implementation; in particular, this would mandate that the client
prevents any regular NLM activity while it's in the middle of a reclaim.

However, the X/Open spec for NLM says about NLM_LOCK: "During the grace
period, the server will only accept locks with reclaim set to true."
So the client is free to assume that it's okay to keep on retransmitting
LOCK/UNLOCK requests all along, without having to care about reclaim or not,
because the spec says the server will ignore them anyway.
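
A rough illustration of that rule, as a minimal Python sketch rather than
anything taken from lockd: the class, method names and status strings are
all hypothetical, there is only one client so no conflict checking, and
it assumes UNLOCK is refused during the grace period just like a
non-reclaim LOCK.

```python
class GraceServer:
    """Toy single-client lock server; names are illustrative, not lockd's."""

    def __init__(self):
        self.in_grace = True   # server has just rebooted
        self.locks = set()     # names of locks currently installed

    def handle_lock(self, name, reclaim=False):
        # During the grace period only requests with reclaim set are accepted.
        if self.in_grace and not reclaim:
            return "denied_grace_period"
        self.locks.add(name)
        return "granted"

    def handle_unlock(self, name):
        # Assumption: UNLOCK is refused during grace too, so the client
        # simply keeps retransmitting until the grace period is over.
        if self.in_grace:
            return "denied_grace_period"
        self.locks.discard(name)
        return "granted"


server = GraceServer()
print(server.handle_lock("X"))                # refused: not a reclaim
print(server.handle_lock("X", reclaim=True))  # accepted: reclaim
print(server.handle_unlock("X"))              # refused: still in grace
server.in_grace = False
print(server.handle_unlock("X"))              # accepted after grace
```

With a server like this the client needs no special ordering between
reclaims and ordinary requests; anything that is not a reclaim just gets
retried later.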

Consider this scenario:

 -	Server tells client to start reclaim
 -	Client sends reclaim request for lock X
 -	RPC packet gets lost
 -	Application requests to unlock X
 -	Client calls server, server finds there's nothing to
 	unlock, ACKs the RPC call
 -	Client retransmits reclaim packet, server re-installs
 	the lock
 -	you have a stale lock
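
The scenario above can be replayed with a small Python sketch. This is
only a toy model under stated assumptions: `replay` and its arguments are
made-up names, the server is reduced to a set of lock names, and a
refused unlock is assumed to be retried once after the grace period ends.

```python
def replay(arrivals, unlock_allowed_in_grace):
    """Apply RPCs in the order the server sees them during grace.

    Returns the set of locks left on the server after the grace period
    has ended and any refused unlocks have been retried.
    """
    locks = set()
    deferred = []  # unlocks refused during grace, retried afterwards
    for op, name in arrivals:
        if op == "reclaim":
            locks.add(name)            # server (re-)installs the lock
        elif op == "unlock":
            if unlock_allowed_in_grace:
                locks.discard(name)    # nothing to unlock yet, ACKed anyway
            else:
                deferred.append(name)  # refused; client will retry
    # Grace period over: the client retries the unlocks that were refused.
    for name in deferred:
        locks.discard(name)
    return locks


# The first reclaim for X was lost, so the server sees the unlock first
# and the retransmitted reclaim second.
arrivals = [("unlock", "X"), ("reclaim", "X")]

print(replay(arrivals, unlock_allowed_in_grace=True))   # {'X'}  stale lock
print(replay(arrivals, unlock_allowed_in_grace=False))  # set()
```

When the unlock is let through during grace, X is still installed at the
end although the application released it; when it is refused and retried
afterwards, nothing is left behind.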

Olaf
-- 
Olaf Kirch     | Things that make Monday morning interesting, #2:
okir@suse.de   |        "We have 8,000 NFS mount points, why do we keep
---------------+ 	 running out of privileged ports?"




end of thread, other threads:[~2004-11-24  8:59 UTC | newest]

Thread overview: 13+ messages
2004-11-11 19:12 lockd recovery not working on RH with 2.6 kernel Marc Eshel
2004-11-17 19:58 ` Steve Dickson
2004-11-18 16:52 ` Steve Dickson
2004-11-19 16:34   ` Trond Myklebust
2004-11-19 17:50     ` Steve Dickson
2004-11-19 20:24       ` Trond Myklebust
2004-11-19 20:27         ` Trond Myklebust
2004-11-19 21:40         ` Steve Dickson
2004-11-19 20:38       ` Steve Dickson
2004-11-23  0:45         ` unlock during lockd recovery Marc Eshel
2004-11-23  8:10           ` Olaf Kirch
2004-11-23 17:44             ` Marc Eshel
2004-11-24  8:59               ` Olaf Kirch
