Linux NFS development
 help / color / mirror / Atom feed
From: "J. Bruce Fields" <bfields@fieldses.org>
To: Jonathan Couper <jonathan-3c88B9b11vOEi8DpZVb4nw@public.gmane.org>
Cc: linux-nfs@vger.kernel.org
Subject: Re: Mutex Hang on GC in lockd for 2.6.22-14
Date: Mon, 21 Jan 2008 13:31:21 -0500	[thread overview]
Message-ID: <20080121183121.GJ17468@fieldses.org> (raw)
In-Reply-To: <47942CA4.1060700-3c88B9b11vOEi8DpZVb4nw@public.gmane.org>

On Mon, Jan 21, 2008 at 06:24:52PM +1300, Jonathan Couper wrote:
> Heya NFS Experts,
>
> 	I'm pretty darn sure I've found (and am currently testing a fix for) a  
> mutex bug in lockd that hangs NFS server pretty reliably.  I have no  
> idea how this bug has gone unnoticed for so long, since this is the
> current Gutsy Gibbon kernel!
>
> 	That makes me think that maybe I'm mistaken.  But my system definitely  
> hangs, and the code definitely appears to attempt to lock twice on a  
> non-recursive mutex.
>
> 	All the details are:
>
> https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/181996	

Hm; from:

	https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/181996/comments/2

	"...and on the server side I will now see TWO "[lockd]"
	processes where before I saw one."

That at least should be prevented by Jeff Layton's recent patches, which
you can get from

	git://linux-nfs.org/~bfields/linux.git nfs-server-stable

(Quick git tutorial; you can get that with

	apt-get install git-core
	git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
	cd linux-2.6
	git remote add bfields git://linux-nfs.org/~bfields/linux.git
	git fetch bfields
	git checkout bfields/nfs-server-stable

)

But, hm, I'm confused by your description in:

	https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/181996/comments/10

The double-lock you're identifying is from:

	nlmsvc_lock() taking f_mutex, then calling
		nlmsvc_create_block(), which calls
			nlmsvc_lookup_host()

but nlmsvc_create_block() doesn't call nlmsvc_lookup_host().  Oh, I
see--it used to in v2.6.22; the commit below fixed it.  (And note if you
apply that you should also apply
a6d85430424d44e946e0946bfaad607115510989 "NLM: Fix a memory leak in
nlmsvc_testlock"

> 	I'd LOVE to get my name into the Linux Kernel.  Please tell me I've  
> found a real bug!

Alas!  So the secret is to test the most recent kernels, as those have
the most unfound bugs....

--b.

commit 255129d1e9ca0ed3d69d5517fae3e03d7ab4b806
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date:   Tue Sep 25 15:55:03 2007 -0400

    NLM: Fix a circular lock dependency in lockd
    
    The problem is that the garbage collector for the 'host' structures
    nlm_gc_hosts(), holds nlm_host_mutex while calling down to
    nlmsvc_mark_resources, which, eventually takes the file->f_mutex.
    
    We cannot therefore call nlmsvc_lookup_host() from within
    nlmsvc_create_block, since the caller will already hold file->f_mutex, so
    the attempt to grab nlm_host_mutex may deadlock.
    
    Fix the problem by calling nlmsvc_lookup_host() outside the file->f_mutex.
    
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index a21e4bc..d098c7a 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -171,19 +171,14 @@ found:
  * GRANTED_RES message by cookie, without having to rely on the client's IP
  * address. --okir
  */
-static inline struct nlm_block *
-nlmsvc_create_block(struct svc_rqst *rqstp, struct nlm_file *file,
-		struct nlm_lock *lock, struct nlm_cookie *cookie)
+static struct nlm_block *
+nlmsvc_create_block(struct svc_rqst *rqstp, struct nlm_host *host,
+		    struct nlm_file *file, struct nlm_lock *lock,
+		    struct nlm_cookie *cookie)
 {
 	struct nlm_block	*block;
-	struct nlm_host		*host;
 	struct nlm_rqst		*call = NULL;
 
-	/* Create host handle for callback */
-	host = nlmsvc_lookup_host(rqstp, lock->caller, lock->len);
-	if (host == NULL)
-		return NULL;
-
 	call = nlm_alloc_call(host);
 	if (call == NULL)
 		return NULL;
@@ -366,6 +361,7 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
 			struct nlm_lock *lock, int wait, struct nlm_cookie *cookie)
 {
 	struct nlm_block	*block = NULL;
+	struct nlm_host		*host;
 	int			error;
 	__be32			ret;
 
@@ -377,6 +373,10 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
 				(long long)lock->fl.fl_end,
 				wait);
 
+	/* Create host handle for callback */
+	host = nlmsvc_lookup_host(rqstp, lock->caller, lock->len);
+	if (host == NULL)
+		return nlm_lck_denied_nolocks;
 
 	/* Lock file against concurrent access */
 	mutex_lock(&file->f_mutex);
@@ -385,7 +385,8 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
 	 */
 	block = nlmsvc_lookup_block(file, lock);
 	if (block == NULL) {
-		block = nlmsvc_create_block(rqstp, file, lock, cookie);
+		block = nlmsvc_create_block(rqstp, nlm_get_host(host), file,
+				lock, cookie);
 		ret = nlm_lck_denied_nolocks;
 		if (block == NULL)
 			goto out;
@@ -449,6 +450,7 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
 out:
 	mutex_unlock(&file->f_mutex);
 	nlmsvc_release_block(block);
+	nlm_release_host(host);
 	dprintk("lockd: nlmsvc_lock returned %u\n", ret);
 	return ret;
 }
@@ -477,10 +479,15 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file,
 
 	if (block == NULL) {
 		struct file_lock *conf = kzalloc(sizeof(*conf), GFP_KERNEL);
+		struct nlm_host	*host;
 
 		if (conf == NULL)
 			return nlm_granted;
-		block = nlmsvc_create_block(rqstp, file, lock, cookie);
+		/* Create host handle for callback */
+		host = nlmsvc_lookup_host(rqstp, lock->caller, lock->len);
+		if (host == NULL)
+			return nlm_lck_denied_nolocks;
+		block = nlmsvc_create_block(rqstp, host, file, lock, cookie);
 		if (block == NULL) {
 			kfree(conf);
 			return nlm_granted;

  parent reply	other threads:[~2008-01-21 18:31 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <479427C5.8020203@spiderfan.org>
     [not found] ` <18324.11227.8442.938224@notabene.brown>
     [not found]   ` <18324.11227.8442.938224-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2008-01-21  5:24     ` Mutex Hang on GC in lockd for 2.6.22-14 Jonathan Couper
     [not found]       ` <47942CA4.1060700-3c88B9b11vOEi8DpZVb4nw@public.gmane.org>
2008-01-21 18:31         ` J. Bruce Fields [this message]
2008-01-21 20:49           ` Jonathan Couper
     [not found]             ` <47950544.2080205-3c88B9b11vOEi8DpZVb4nw@public.gmane.org>
2008-01-21 21:03               ` J. Bruce Fields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080121183121.GJ17468@fieldses.org \
    --to=bfields@fieldses.org \
    --cc=jonathan-3c88B9b11vOEi8DpZVb4nw@public.gmane.org \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox