From: Steve Dickson <SteveD@redhat.com>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: NFS@lists.sourceforge.net, Neil Brown <neilb@cse.unsw.edu.au>
Subject: Re: lockd recovery not working on RH with 2.6 kernel
Date: Thu, 18 Nov 2004 11:52:19 -0500
Message-ID: <419CD343.4000600@RedHat.com>
In-Reply-To: <OF573E8465.7E3D5CC0-ON88256F49.0064BB5C-88256F49.00698B56@us.ibm.com>
[-- Attachment #1: Type: text/plain, Size: 1402 bytes --]
Hey Trond,
Marc Eshel wrote:
>The problem is that after the NFS server machine reboots, its statd sends a
>notification to all NFS clients that had locking activity but the clients
>fail to reclaim their locks.
>
>
Well, it appears things are a bit broken. Here is a client-side patch that
enables the client to reclaim locks from a rebooted server.
The two main issues were, first, nlm4svc_decode_reboot() not setting
the protocol, which caused the nlm_host structure not to be found,
and second, nlmclnt_reclaim() needing to retry when the portmapper was up
but lockd had not come up yet. I also fixed a debugging
statement and added a couple more that I found useful.
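As a rough illustration of the first issue, here is a minimal user-space
sketch, assuming the host cache is matched on address, protocol, and
version together. The names host_key and find_host are hypothetical
stand-ins, not the kernel's own structures, and the numeric values are
only examples. It shows why a reboot notification decoded with an unset
protocol would miss an otherwise matching entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the fields a host lookup compares. */
struct host_key {
    uint32_t addr;   /* peer address, kept in network byte order */
    uint32_t proto;  /* transport protocol, e.g. 17 for UDP */
    uint32_t vers;   /* NLM protocol version */
};

/* Return the index of the entry matching all three fields, or -1 if none. */
static int find_host(const struct host_key *table, size_t n,
                     const struct host_key *want)
{
    assert(table != NULL && want != NULL);
    for (size_t i = 0; i < n; i++) {
        if (table[i].addr == want->addr &&
            table[i].proto == want->proto &&
            table[i].vers == want->vers)
            return (int)i;
    }
    return -1;
}
```

With one cached entry for an example address over UDP (17) at version 4,
a lookup whose proto field was left at zero returns -1, while the same
lookup with proto filled in returns index 0.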
Now the reclaim retry code currently retries forever in an
interruptible loop waiting for lockd to come up. This may or
may not be a good idea, but the client should not make any
assumptions about the health of the server, so I'm not sure there
is anything else that can be done.
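The retry behaviour described above can be sketched in user space as
follows. This is only an illustration under stated assumptions:
reclaim_with_retry, fake_server, fake_call, and fake_interrupted are
hypothetical names, the 10-second sleep from the patch is omitted, and
the max_attempts bound is not part of the patch (passing 0 gives the
retry-forever behaviour the patch implements).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callbacks standing in for the RPC call and the signal check. */
typedef int (*call_fn)(void *ctx);
typedef int (*interrupted_fn)(void *ctx);

/*
 * Retry while the call reports -EAGAIN or -EACCES (lockd not up yet).
 * max_attempts <= 0 means retry forever, as the patch does; a positive
 * value is an assumed alternative that gives up with -ETIMEDOUT.
 */
static int reclaim_with_retry(call_fn call, interrupted_fn interrupted,
                              void *ctx, int max_attempts)
{
    assert(call != NULL && interrupted != NULL);
    for (int attempt = 0; max_attempts <= 0 || attempt < max_attempts; attempt++) {
        int status = call(ctx);
        if (status != -EAGAIN && status != -EACCES)
            return status;          /* success or a non-retryable error */
        /* The kernel patch sleeps for 10 seconds here before checking signals. */
        if (interrupted(ctx))
            return -EINTR;
    }
    return -ETIMEDOUT;
}

/* Fake server used only to exercise the loop. */
struct fake_server {
    int not_ready_calls;  /* number of calls that fail with -EACCES first */
    int calls;            /* calls made so far */
    int signal_pending;   /* nonzero simulates a pending signal */
};

static int fake_call(void *ctx)
{
    struct fake_server *s = ctx;
    s->calls++;
    return (s->calls <= s->not_ready_calls) ? -EACCES : 0;
}

static int fake_interrupted(void *ctx)
{
    const struct fake_server *s = ctx;
    return s->signal_pending;
}
```

With a fake server that is not ready for its first three calls, the
unbounded form succeeds on the fourth call. A pending signal ends the
loop with -EINTR after the first failed call, which mirrors the
interruptible wait in the patch.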
Unfortunately this reclaim code freaks out the Linux server, causing it
to send two back-to-back messages (both using the same xid) that
fail, and then to grant the lock. It seems the dentry_open() call
(in nfsd_open()) is returning an error value of 30000. It's not clear why or
what a value of 30000 means. I'm still looking into that, but this code
was tested with both a NetApp filer and a Solaris 10 server, which seem
to work fine.
Comments?
SteveD.
[-- Attachment #2: linux-2.6.9-lockd-reclaims.patch --]
[-- Type: text/x-patch, Size: 2525 bytes --]
--- linux-2.6.9/fs/lockd/xdr4.c.org 2004-10-18 17:53:06.000000000 -0400
+++ linux-2.6.9/fs/lockd/xdr4.c 2004-11-18 10:44:27.324666000 -0500
@@ -355,6 +355,9 @@ nlm4svc_decode_reboot(struct svc_rqst *r
argp->state = ntohl(*p++);
/* Preserve the address in network byte order */
argp->addr = *p++;
+ argp->vers = *p++;
+ argp->proto = *p++;
+
return xdr_argsize_check(rqstp, p);
}
--- linux-2.6.9/fs/lockd/clntlock.c.org 2004-11-12 05:43:13.508648000 -0500
+++ linux-2.6.9/fs/lockd/clntlock.c 2004-11-18 07:57:33.464093000 -0500
@@ -173,7 +173,7 @@ void nlmclnt_prepare_reclaim(struct nlm_
host->h_nextrebind = 0;
nlm_rebind_host(host);
nlmclnt_mark_reclaim(host);
- dprintk("NLM: reclaiming locks for host %s", host->h_name);
+ dprintk("NLM: reclaiming locks for host %s\n", host->h_name);
}
/*
--- linux-2.6.9/fs/lockd/host.c.org 2004-10-18 17:54:31.000000000 -0400
+++ linux-2.6.9/fs/lockd/host.c 2004-11-18 07:58:26.263774000 -0500
@@ -190,15 +190,17 @@ nlm_bind_host(struct nlm_host *host)
}
} else {
xprt = xprt_create_proto(host->h_proto, &host->h_addr, NULL);
- if (IS_ERR(xprt))
+ if (IS_ERR(xprt)) {
+ dprintk("lockd: xprt_create_proto failed: %ld\n", PTR_ERR(xprt));
goto forgetit;
-
+ }
xprt_set_timeout(&xprt->timeout, 5, nlmsvc_timeout);
clnt = rpc_create_client(xprt, host->h_name, &nlm_program,
host->h_version, host->h_authflavor);
if (IS_ERR(clnt)) {
xprt_destroy(xprt);
+ dprintk("lockd: rpc_create_client failed: %ld\n", PTR_ERR(clnt));
goto forgetit;
}
clnt->cl_autobind = 1; /* turn on pmap queries */
--- linux-2.6.9/fs/lockd/clntproc.c.org 2004-10-18 17:55:36.000000000 -0400
+++ linux-2.6.9/fs/lockd/clntproc.c 2004-11-18 08:02:36.787274000 -0500
@@ -592,9 +592,25 @@ nlmclnt_reclaim(struct nlm_host *host, s
nlmclnt_setlockargs(req, fl);
req->a_args.reclaim = 1;
- if ((status = nlmclnt_call(req, NLMPROC_LOCK)) >= 0
- && req->a_res.status == NLM_LCK_GRANTED)
- return 0;
+again:
+ switch ((status = nlmclnt_call(req, NLMPROC_LOCK))) {
+ case 0:
+ if (req->a_res.status == NLM_LCK_GRANTED)
+ return 0;
+ break;
+ case -EAGAIN:
+ case -EACCES: /* portmapper might be up, but lockd isn't */
+ current->state = TASK_INTERRUPTIBLE;
+ schedule_timeout(10*HZ);
+ if (signalled()) {
+ status = -EINTR;
+ dprintk("lockd: reclaim got interrupted!\n");
+ break;
+ }
+ goto again;
+ default:
+ break;
+ }
printk(KERN_WARNING "lockd: failed to reclaim lock for pid %d "
"(errno %d, status %d)\n", fl->fl_pid,