Linux NFS development
From: Olaf Kirch <okir@suse.de>
To: nfs@lists.sourceforge.net
Cc: Sebastian Hetze <s.hetze@linux-ag.de>
Subject: deadlock in lockd
Date: Thu, 10 Mar 2005 21:44:16 +0100
Message-ID: <20050310204416.GC3424@suse.de>

[-- Attachment #1: Type: text/plain, Size: 1672 bytes --]

Hi all,

I just debugged a deadlock in lockd in SLES8 (i.e. 2.4.21), but
I think the same problem exists in 2.6.

Here's the backtrace courtesy of sysrq:

 lockd         D 00000000  3648   791      1   792     795   783 (L-TLB)
 Call Trace:         [do_schedule+338/608] (36) [__down+131/224] (52) [__down_failed+8/12] (16)
   [.text.lock.svclock+5/150] (04) [vsnprintf+519/1120] (24) [nlm_traverse_files+324/368] (32) [nlmsvc_mark_resources+32/64] (12)
   [nlm_gc_hosts+69/384] (28) [nlm_lookup_host+139/800] (48) [nlmsvc_lookup_host+48/64] (20) [nlmsvc_create_block+145/352] (16)
   [posix_test_lock+132/160] (20) [nlmsvc_lock+222/784] (28) [nlm4svc_retrieve_args+199/288] (40) [nlm4svc_proc_lock+172/256] (44)
   [svc_process+827/1392] (56) [lockd+426/720] (40) [arch_kernel_thread+46/64] (08) [lockd+0/720] (04)

What happens is that nlmsvc_lock takes f_sema on the file the client wishes
to lock, then calls posix_test_lock, which finds there's a blocking lock. So
it calls nlmsvc_create_block, which calls nlmsvc_lookup_host - and the host
code decides to do a garbage collection pass.

The GC pass calls nlm_traverse_files, which hits the very file we're trying
to lock and invokes nlmsvc_traverse_blocks, which tries to down f_sema a
second time. And there it hangs...
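
To make the nesting concrete, here is a minimal user-space sketch of the
same pattern. It is not lockd code: struct demo_file and the demo_*
functions are hypothetical stand-ins, and a POSIX unnamed semaphore (Linux,
build with -pthread) plays the role of f_sema. The inner acquisition uses
sem_trywait() so the demo reports the conflict instead of hanging, where
the kernel's down() would sleep forever.

```c
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct nlm_file and its f_sema. */
struct demo_file {
	sem_t f_sema;
};

/*
 * Stand-in for nlmsvc_traverse_blocks(): needs f_sema.
 * Returns false if the semaphore is already held, i.e. the point
 * where a blocking down() would never return.
 */
static bool demo_traverse_blocks(struct demo_file *file)
{
	if (sem_trywait(&file->f_sema) != 0)
		return false;
	sem_post(&file->f_sema);
	return true;
}

/*
 * Stand-in for nlmsvc_lock(): holds f_sema across a call chain
 * (create_block -> lookup_host -> gc -> traverse_files) that ends
 * up wanting the same semaphore again.
 */
static bool demo_lock_with_gc_inside(struct demo_file *file)
{
	bool ok;

	sem_wait(&file->f_sema);
	ok = demo_traverse_blocks(file);
	sem_post(&file->f_sema);
	return ok;
}

/* Returns true if the nested acquisition would have deadlocked. */
bool demo_would_deadlock(void)
{
	struct demo_file file;
	bool ok;

	sem_init(&file.f_sema, 0, 1);	/* binary semaphore, initially free */
	ok = demo_lock_with_gc_inside(&file);
	sem_destroy(&file.f_sema);
	return !ok;
}
```

Calling demo_would_deadlock() returns true here, because the semaphore is
not recursive: the second acquisition by the same thread cannot succeed
while the first is still held.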

The attached (untested) patch changes the way we do garbage collection
passes: instead of doing it inside nlm_lookup_host, nlm_gc_hosts is now
called from the top-level service loop in lockd, where we don't hold any
locks. It looks saner anyway.
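
As a rough illustration of that structure (a user-space sketch, not the
patch itself), the following uses a pthread mutex in place of
nlm_host_sema. All demo_* names and the interval value are assumptions
made for the example. The _locked suffix stands in for the kernel's
double-underscore convention, and the plain >= comparison stands in for
time_after_eq(), which additionally copes with jiffies wrapping.

```c
#include <pthread.h>
#include <time.h>

#define DEMO_GC_INTERVAL 120	/* seconds; hypothetical value */

static pthread_mutex_t demo_host_lock = PTHREAD_MUTEX_INITIALIZER;
static time_t demo_next_gc;	/* earliest time of the next pass */
static unsigned demo_gc_passes;	/* number of passes actually run */

/* Caller must hold demo_host_lock (mirrors __nlm_gc_hosts). */
static void demo_gc_hosts_locked(time_t now)
{
	/* A real pass would mark and sweep unused hosts here. */
	demo_gc_passes++;
	demo_next_gc = now + DEMO_GC_INTERVAL;
}

/* Public entry point: takes the lock itself and checks the deadline. */
void demo_gc_hosts(time_t now)
{
	pthread_mutex_lock(&demo_host_lock);
	if (now >= demo_next_gc)
		demo_gc_hosts_locked(now);
	pthread_mutex_unlock(&demo_host_lock);
}

/*
 * Stand-in for the lockd service loop: the GC check runs at the top
 * of each iteration, before any per-file lock is taken.
 * Returns the total number of GC passes run so far.
 */
unsigned demo_service_loop(time_t start, int nrequests)
{
	for (int i = 0; i < nrequests; i++) {
		time_t now = start + i;	/* pretend one request per second */

		demo_gc_hosts(now);
		/* Request handling would take per-file locks here. */
	}
	return demo_gc_passes;
}
```

The point of the split is that the collector can only ever be entered from
a context that holds no file semaphore, so it is free to take one itself.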

Comments?

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
okir@suse.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax

[-- Attachment #2: lockd-gc-deadlock --]
[-- Type: text/plain, Size: 2428 bytes --]

Index: linux-2.4.21/fs/lockd/host.c
===================================================================
--- linux-2.4.21.orig/fs/lockd/host.c	2005-02-10 10:58:25.000000000 +0100
+++ linux-2.4.21/fs/lockd/host.c	2005-03-10 20:44:43.000000000 +0100
@@ -34,7 +34,7 @@ static int			nrhosts;
 static DECLARE_MUTEX(nlm_host_sema);
 
 
-static void			nlm_gc_hosts(void);
+static void			__nlm_gc_hosts(void);
 
 /*
  * Find an NLM server handle in the cache. If there is none, create it.
@@ -94,9 +94,6 @@ nlm_lookup_host(struct svc_client *clnt,
 	/* Lock hash table */
 	down(&nlm_host_sema);
 
-	if (time_after_eq(jiffies, next_gc))
-		nlm_gc_hosts();
-
 	for (hp = &nlm_hosts[hash]; (host = *hp); hp = &host->h_next) {
 		if (proto && host->h_proto != proto)
 			continue;
@@ -273,7 +270,7 @@ nlm_shutdown_hosts(void)
 	}
 
 	/* Then, perform a garbage collection pass */
-	nlm_gc_hosts();
+	__nlm_gc_hosts();
 	up(&nlm_host_sema);
 
 	/* complain if any hosts are left */
@@ -296,7 +293,7 @@ nlm_shutdown_hosts(void)
  * mark & sweep for resources held by remote clients.
  */
 static void
-nlm_gc_hosts(void)
+__nlm_gc_hosts(void)
 {
 	struct nlm_host	**q, *host;
 	struct rpc_clnt	*clnt;
@@ -341,3 +338,12 @@ nlm_gc_hosts(void)
 	next_gc = jiffies + NLM_HOST_COLLECT;
 }
 
+void
+nlm_gc_hosts(void)
+{
+	down(&nlm_host_sema);
+	if (time_after_eq(jiffies, next_gc))
+		__nlm_gc_hosts();
+	up(&nlm_host_sema);
+}
+
Index: linux-2.4.21/fs/lockd/svc.c
===================================================================
--- linux-2.4.21.orig/fs/lockd/svc.c	2005-02-10 10:57:58.000000000 +0100
+++ linux-2.4.21/fs/lockd/svc.c	2005-03-10 20:45:53.000000000 +0100
@@ -154,6 +154,9 @@ lockd(struct svc_rqst *rqstp)
 			break;
 		}
 
+		/* Perform hosts cache garbage collection */
+		nlm_gc_hosts();
+
 		dprintk("lockd: request from %08x\n",
 			(unsigned)ntohl(rqstp->rq_addr.sin_addr.s_addr));
 
Index: linux-2.4.21/include/linux/lockd/lockd.h
===================================================================
--- linux-2.4.21.orig/include/linux/lockd/lockd.h	2005-02-10 10:57:37.000000000 +0100
+++ linux-2.4.21/include/linux/lockd/lockd.h	2005-03-10 20:43:10.000000000 +0100
@@ -150,6 +150,7 @@ void		  nlm_rebind_host(struct nlm_host 
 struct nlm_host * nlm_get_host(struct nlm_host *);
 void		  nlm_release_host(struct nlm_host *);
 void		  nlm_shutdown_hosts(void);
+void		  nlm_gc_hosts(void);
 
 /*
  * Server-side lock handling

