From: "J. Bruce Fields" <bfields@fieldses.org>
To: Jonathan Couper <jonathan-3c88B9b11vOEi8DpZVb4nw@public.gmane.org>
Cc: linux-nfs@vger.kernel.org
Subject: Re: Mutex Hang on GC in lockd for 2.6.22-14
Date: Mon, 21 Jan 2008 13:31:21 -0500 [thread overview]
Message-ID: <20080121183121.GJ17468@fieldses.org> (raw)
In-Reply-To: <47942CA4.1060700-3c88B9b11vOEi8DpZVb4nw@public.gmane.org>
On Mon, Jan 21, 2008 at 06:24:52PM +1300, Jonathan Couper wrote:
> Heya NFS Experts,
>
> I'm pretty darn sure I've found (and am currently testing a fix for) a
> mutex bug in lockd that hangs NFS server pretty reliably. I have no
> idea how this bug has gone unnoticed for so long, since this is the
> current Gutsy Gibbon kernel!
>
> That makes me think that maybe I'm mistaken. But my system definitely
> hangs, and the code definitely appears to attempt to lock twice on a
> non-recursive mutex.
>
> All the details are:
>
> https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/181996
Hm; from:
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/181996/comments/2
"...and on the server side I will now see TWO "[lockd]"
processes where before I saw one."
That at least should be prevented by Jeff Layton's recent patches, which
you can get from
git://linux-nfs.org/~bfields/linux.git nfs-server-stable
(Quick git tutorial; you can get that with
apt-get install git-core
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
cd linux-2.6
git remote add bfields git://linux-nfs.org/~bfields/linux.git
git fetch bfields
git checkout bfields/nfs-server-stable
)
But, hm, I'm confused by your description in:
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/181996/comments/10
The double-lock you're identifying is from:
nlmsvc_lock() taking f_mutex, then calling
nlmsvc_create_block(), which calls
nlmsvc_lookup_host()
but nlmsvc_create_block() doesn't call nlmsvc_lookup_host(). Oh, I
see--it used to in v2.6.22; the commit below fixed it. (And note if you
apply that you should also apply
a6d85430424d44e946e0946bfaad607115510989 "NLM: Fix a memory leak in
nlmsvc_testlock"
> I'd LOVE to get my name into the Linux Kernel. Please tell me I've
> found a real bug!
Alas! So the secret is to test the most recent kernels, as those have
the most unfound bugs....
--b.
commit 255129d1e9ca0ed3d69d5517fae3e03d7ab4b806
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Tue Sep 25 15:55:03 2007 -0400
NLM: Fix a circular lock dependency in lockd
The problem is that the garbage collector for the 'host' structures
nlm_gc_hosts(), holds nlm_host_mutex while calling down to
nlmsvc_mark_resources, which, eventually takes the file->f_mutex.
We cannot therefore call nlmsvc_lookup_host() from within
nlmsvc_create_block, since the caller will already hold file->f_mutex, so
the attempt to grab nlm_host_mutex may deadlock.
Fix the problem by calling nlmsvc_lookup_host() outside the file->f_mutex.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index a21e4bc..d098c7a 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -171,19 +171,14 @@ found:
* GRANTED_RES message by cookie, without having to rely on the client's IP
* address. --okir
*/
-static inline struct nlm_block *
-nlmsvc_create_block(struct svc_rqst *rqstp, struct nlm_file *file,
- struct nlm_lock *lock, struct nlm_cookie *cookie)
+static struct nlm_block *
+nlmsvc_create_block(struct svc_rqst *rqstp, struct nlm_host *host,
+ struct nlm_file *file, struct nlm_lock *lock,
+ struct nlm_cookie *cookie)
{
struct nlm_block *block;
- struct nlm_host *host;
struct nlm_rqst *call = NULL;
- /* Create host handle for callback */
- host = nlmsvc_lookup_host(rqstp, lock->caller, lock->len);
- if (host == NULL)
- return NULL;
-
call = nlm_alloc_call(host);
if (call == NULL)
return NULL;
@@ -366,6 +361,7 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
struct nlm_lock *lock, int wait, struct nlm_cookie *cookie)
{
struct nlm_block *block = NULL;
+ struct nlm_host *host;
int error;
__be32 ret;
@@ -377,6 +373,10 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
(long long)lock->fl.fl_end,
wait);
+ /* Create host handle for callback */
+ host = nlmsvc_lookup_host(rqstp, lock->caller, lock->len);
+ if (host == NULL)
+ return nlm_lck_denied_nolocks;
/* Lock file against concurrent access */
mutex_lock(&file->f_mutex);
@@ -385,7 +385,8 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
*/
block = nlmsvc_lookup_block(file, lock);
if (block == NULL) {
- block = nlmsvc_create_block(rqstp, file, lock, cookie);
+ block = nlmsvc_create_block(rqstp, nlm_get_host(host), file,
+ lock, cookie);
ret = nlm_lck_denied_nolocks;
if (block == NULL)
goto out;
@@ -449,6 +450,7 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
out:
mutex_unlock(&file->f_mutex);
nlmsvc_release_block(block);
+ nlm_release_host(host);
dprintk("lockd: nlmsvc_lock returned %u\n", ret);
return ret;
}
@@ -477,10 +479,15 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file,
if (block == NULL) {
struct file_lock *conf = kzalloc(sizeof(*conf), GFP_KERNEL);
+ struct nlm_host *host;
if (conf == NULL)
return nlm_granted;
- block = nlmsvc_create_block(rqstp, file, lock, cookie);
+ /* Create host handle for callback */
+ host = nlmsvc_lookup_host(rqstp, lock->caller, lock->len);
+ if (host == NULL)
+ return nlm_lck_denied_nolocks;
+ block = nlmsvc_create_block(rqstp, host, file, lock, cookie);
if (block == NULL) {
kfree(conf);
return nlm_granted;
next prev parent reply other threads:[~2008-01-21 18:31 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <479427C5.8020203@spiderfan.org>
[not found] ` <18324.11227.8442.938224@notabene.brown>
[not found] ` <18324.11227.8442.938224-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2008-01-21 5:24 ` Mutex Hang on GC in lockd for 2.6.22-14 Jonathan Couper
[not found] ` <47942CA4.1060700-3c88B9b11vOEi8DpZVb4nw@public.gmane.org>
2008-01-21 18:31 ` J. Bruce Fields [this message]
2008-01-21 20:49 ` Jonathan Couper
[not found] ` <47950544.2080205-3c88B9b11vOEi8DpZVb4nw@public.gmane.org>
2008-01-21 21:03 ` J. Bruce Fields
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080121183121.GJ17468@fieldses.org \
--to=bfields@fieldses.org \
--cc=jonathan-3c88B9b11vOEi8DpZVb4nw@public.gmane.org \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox