All of lore.kernel.org
 help / color / mirror / Atom feed
From: "J. Bruce Fields" <bfields@fieldses.org>
To: Jonathan Couper <jonathan-3c88B9b11vOEi8DpZVb4nw@public.gmane.org>
Cc: linux-nfs@vger.kernel.org
Subject: Re: Mutex Hang on GC in lockd for 2.6.22-14
Date: Mon, 21 Jan 2008 13:31:21 -0500	[thread overview]
Message-ID: <20080121183121.GJ17468@fieldses.org> (raw)
In-Reply-To: <47942CA4.1060700-3c88B9b11vOEi8DpZVb4nw@public.gmane.org>

On Mon, Jan 21, 2008 at 06:24:52PM +1300, Jonathan Couper wrote:
> Heya NFS Experts,
>
> 	I'm pretty darn sure I've found (and am currently testing a fix for) a  
> mutex bug in lockd that hangs NFS server pretty reliably.  I have no  
> idea how this bug has gone unnoticed for so long, since this is the
> current Gutsy Gibbon kernel!
>
> 	That makes me think that maybe I'm mistaken.  But my system definitely  
> hangs, and the code definitely appears to attempt to lock twice on a  
> non-recursive mutex.
>
> 	All the details are:
>
> https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/181996	

Hm; from:

	https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/181996/comments/2

	"...and on the server side I will now see TWO "[lockd]"
	processes where before I saw one."

That at least should be prevented by Jeff Layton's recent patches, which
you can get from

	git://linux-nfs.org/~bfields/linux.git nfs-server-stable

(Quick git tutorial; you can get that with

	apt-get install git-core
	git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
	cd linux-2.6
	git remote add bfields git://linux-nfs.org/~bfields/linux.git
	git fetch bfields
	git checkout bfields/nfs-server-stable

)

But, hm, I'm confused by your description in:

	https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/181996/comments/10

The double-lock you're identifying is from:

	nlmsvc_lock() taking f_mutex, then calling
		nlmsvc_create_block(), which calls
			nlmsvc_lookup_host()

but nlmsvc_create_block() doesn't call nlmsvc_lookup_host().  Oh, I
see--it used to in v2.6.22; the commit below fixed it.  (And note if you
apply that you should also apply
a6d85430424d44e946e0946bfaad607115510989 "NLM: Fix a memory leak in
nlmsvc_testlock"

> 	I'd LOVE to get my name into the Linux Kernel.  Please tell me I've  
> found a real bug!

Alas!  So the secret is to test the most recent kernels, as those have
the most unfound bugs....

--b.

commit 255129d1e9ca0ed3d69d5517fae3e03d7ab4b806
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date:   Tue Sep 25 15:55:03 2007 -0400

    NLM: Fix a circular lock dependency in lockd
    
    The problem is that the garbage collector for the 'host' structures
    nlm_gc_hosts(), holds nlm_host_mutex while calling down to
    nlmsvc_mark_resources, which, eventually takes the file->f_mutex.
    
    We cannot therefore call nlmsvc_lookup_host() from within
    nlmsvc_create_block, since the caller will already hold file->f_mutex, so
    the attempt to grab nlm_host_mutex may deadlock.
    
    Fix the problem by calling nlmsvc_lookup_host() outside the file->f_mutex.
    
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index a21e4bc..d098c7a 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -171,19 +171,14 @@ found:
  * GRANTED_RES message by cookie, without having to rely on the client's IP
  * address. --okir
  */
-static inline struct nlm_block *
-nlmsvc_create_block(struct svc_rqst *rqstp, struct nlm_file *file,
-		struct nlm_lock *lock, struct nlm_cookie *cookie)
+static struct nlm_block *
+nlmsvc_create_block(struct svc_rqst *rqstp, struct nlm_host *host,
+		    struct nlm_file *file, struct nlm_lock *lock,
+		    struct nlm_cookie *cookie)
 {
 	struct nlm_block	*block;
-	struct nlm_host		*host;
 	struct nlm_rqst		*call = NULL;
 
-	/* Create host handle for callback */
-	host = nlmsvc_lookup_host(rqstp, lock->caller, lock->len);
-	if (host == NULL)
-		return NULL;
-
 	call = nlm_alloc_call(host);
 	if (call == NULL)
 		return NULL;
@@ -366,6 +361,7 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
 			struct nlm_lock *lock, int wait, struct nlm_cookie *cookie)
 {
 	struct nlm_block	*block = NULL;
+	struct nlm_host		*host;
 	int			error;
 	__be32			ret;
 
@@ -377,6 +373,10 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
 				(long long)lock->fl.fl_end,
 				wait);
 
+	/* Create host handle for callback */
+	host = nlmsvc_lookup_host(rqstp, lock->caller, lock->len);
+	if (host == NULL)
+		return nlm_lck_denied_nolocks;
 
 	/* Lock file against concurrent access */
 	mutex_lock(&file->f_mutex);
@@ -385,7 +385,8 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
 	 */
 	block = nlmsvc_lookup_block(file, lock);
 	if (block == NULL) {
-		block = nlmsvc_create_block(rqstp, file, lock, cookie);
+		block = nlmsvc_create_block(rqstp, nlm_get_host(host), file,
+				lock, cookie);
 		ret = nlm_lck_denied_nolocks;
 		if (block == NULL)
 			goto out;
@@ -449,6 +450,7 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
 out:
 	mutex_unlock(&file->f_mutex);
 	nlmsvc_release_block(block);
+	nlm_release_host(host);
 	dprintk("lockd: nlmsvc_lock returned %u\n", ret);
 	return ret;
 }
@@ -477,10 +479,15 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file,
 
 	if (block == NULL) {
 		struct file_lock *conf = kzalloc(sizeof(*conf), GFP_KERNEL);
+		struct nlm_host	*host;
 
 		if (conf == NULL)
 			return nlm_granted;
-		block = nlmsvc_create_block(rqstp, file, lock, cookie);
+		/* Create host handle for callback */
+		host = nlmsvc_lookup_host(rqstp, lock->caller, lock->len);
+		if (host == NULL)
+			return nlm_lck_denied_nolocks;
+		block = nlmsvc_create_block(rqstp, host, file, lock, cookie);
 		if (block == NULL) {
 			kfree(conf);
 			return nlm_granted;

  parent reply	other threads:[~2008-01-21 18:31 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <479427C5.8020203@spiderfan.org>
     [not found] ` <18324.11227.8442.938224@notabene.brown>
     [not found]   ` <18324.11227.8442.938224-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2008-01-21  5:24     ` Mutex Hang on GC in lockd for 2.6.22-14 Jonathan Couper
     [not found]       ` <47942CA4.1060700-3c88B9b11vOEi8DpZVb4nw@public.gmane.org>
2008-01-21 18:31         ` J. Bruce Fields [this message]
2008-01-21 20:49           ` Jonathan Couper
     [not found]             ` <47950544.2080205-3c88B9b11vOEi8DpZVb4nw@public.gmane.org>
2008-01-21 21:03               ` J. Bruce Fields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080121183121.GJ17468@fieldses.org \
    --to=bfields@fieldses.org \
    --cc=jonathan-3c88B9b11vOEi8DpZVb4nw@public.gmane.org \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.