* deadlock in lockd
@ 2005-03-10 20:44 Olaf Kirch
  2005-03-10 21:49 ` Daniel Forrest
  0 siblings, 1 reply; 3+ messages in thread
From: Olaf Kirch @ 2005-03-10 20:44 UTC (permalink / raw)
  To: nfs; +Cc: Sebastian Hetze

[-- Attachment #1: Type: text/plain, Size: 1672 bytes --]

Hi all,

I just debugged a deadlock in lockd in SLES8 (i.e. 2.4.21), but
I think the same problem exists in 2.6.

Here's the backtrace courtesy of sysrq:

 lockd         D 00000000  3648   791      1   792     795   783 (L-TLB)
 Call Trace:         [do_schedule+338/608] (36) [__down+131/224] (52) [__down_failed+8/12] (16)
   [.text.lock.svclock+5/150] (04) [vsnprintf+519/1120] (24) [nlm_traverse_files+324/368] (32) [nlmsvc_mark_resources+32/64] (12)
   [nlm_gc_hosts+69/384] (28) [nlm_lookup_host+139/800] (48) [nlmsvc_lookup_host+48/64] (20) [nlmsvc_create_block+145/352] (16)
   [posix_test_lock+132/160] (20) [nlmsvc_lock+222/784] (28) [nlm4svc_retrieve_args+199/288] (40) [nlm4svc_proc_lock+172/256] (44)
   [svc_process+827/1392] (56) [lockd+426/720] (40) [arch_kernel_thread+46/64] (08) [lockd+0/720] (04)

What happens is that nlmsvc_lock takes the f_sema on the file the client wishes to lock,
then calls posix_test_lock, which finds there's a conflicting lock. So it calls
nlmsvc_create_block, which calls nlmsvc_lookup_host, and the host code (nlm_lookup_host)
decides that next_gc has expired and does a garbage collection pass.

nlm_gc_hosts calls nlmsvc_mark_resources, which calls nlm_traverse_files. That hits
the very file we're trying to lock and invokes nlmsvc_traverse_blocks, which tries to
down f_sema once more. But lockd already holds f_sema, so there it hangs...

The attached (untested) patch changes the way we do garbage collection
passes: instead of doing it inside nlm_lookup_host, nlm_gc_hosts is now
called from the top-level service loop in lockd, where we don't hold any
locks. It looks saner anyway.

Comments?

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
okir@suse.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax

[-- Attachment #2: lockd-gc-deadlock --]
[-- Type: text/plain, Size: 2428 bytes --]

Index: linux-2.4.21/fs/lockd/host.c
===================================================================
--- linux-2.4.21.orig/fs/lockd/host.c	2005-02-10 10:58:25.000000000 +0100
+++ linux-2.4.21/fs/lockd/host.c	2005-03-10 20:44:43.000000000 +0100
@@ -34,7 +34,7 @@ static int			nrhosts;
 static DECLARE_MUTEX(nlm_host_sema);
 
 
-static void			nlm_gc_hosts(void);
+static void			__nlm_gc_hosts(void);
 
 /*
  * Find an NLM server handle in the cache. If there is none, create it.
@@ -94,9 +94,6 @@ nlm_lookup_host(struct svc_client *clnt,
 	/* Lock hash table */
 	down(&nlm_host_sema);
 
-	if (time_after_eq(jiffies, next_gc))
-		nlm_gc_hosts();
-
 	for (hp = &nlm_hosts[hash]; (host = *hp); hp = &host->h_next) {
 		if (proto && host->h_proto != proto)
 			continue;
@@ -273,7 +270,7 @@ nlm_shutdown_hosts(void)
 	}
 
 	/* Then, perform a garbage collection pass */
-	nlm_gc_hosts();
+	__nlm_gc_hosts();
 	up(&nlm_host_sema);
 
 	/* complain if any hosts are left */
@@ -296,7 +293,7 @@ nlm_shutdown_hosts(void)
  * mark & sweep for resources held by remote clients.
  */
 static void
-nlm_gc_hosts(void)
+__nlm_gc_hosts(void)
 {
 	struct nlm_host	**q, *host;
 	struct rpc_clnt	*clnt;
@@ -341,3 +338,12 @@ nlm_gc_hosts(void)
 	next_gc = jiffies + NLM_HOST_COLLECT;
 }
 
+void
+nlm_gc_hosts(void)
+{
+	down(&nlm_host_sema);
+	if (time_after_eq(jiffies, next_gc))
+		__nlm_gc_hosts();
+	up(&nlm_host_sema);
+}
+
Index: linux-2.4.21/fs/lockd/svc.c
===================================================================
--- linux-2.4.21.orig/fs/lockd/svc.c	2005-02-10 10:57:58.000000000 +0100
+++ linux-2.4.21/fs/lockd/svc.c	2005-03-10 20:45:53.000000000 +0100
@@ -154,6 +154,9 @@ lockd(struct svc_rqst *rqstp)
 			break;
 		}
 
+		/* Perform hosts cache garbage collection */
+		nlm_gc_hosts();
+
 		dprintk("lockd: request from %08x\n",
 			(unsigned)ntohl(rqstp->rq_addr.sin_addr.s_addr));
 
Index: linux-2.4.21/include/linux/lockd/lockd.h
===================================================================
--- linux-2.4.21.orig/include/linux/lockd/lockd.h	2005-02-10 10:57:37.000000000 +0100
+++ linux-2.4.21/include/linux/lockd/lockd.h	2005-03-10 20:43:10.000000000 +0100
@@ -150,6 +150,7 @@ void		  nlm_rebind_host(struct nlm_host 
 struct nlm_host * nlm_get_host(struct nlm_host *);
 void		  nlm_release_host(struct nlm_host *);
 void		  nlm_shutdown_hosts(void);
+void		  nlm_gc_hosts(void);
 
 /*
  * Server-side lock handling

Thread overview: 3+ messages
2005-03-10 20:44 deadlock in lockd Olaf Kirch
2005-03-10 21:49 ` Daniel Forrest
2005-03-10 22:42   ` Olaf Kirch
