From: Olaf Kirch <okir@suse.de>
To: nfs@lists.sourceforge.net
Cc: Sebastian Hetze <s.hetze@linux-ag.de>
Subject: deadlock in lockd
Date: Thu, 10 Mar 2005 21:44:16 +0100
Message-ID: <20050310204416.GC3424@suse.de>

[-- Attachment #1: Type: text/plain, Size: 1672 bytes --]

Hi all,

I just debugged a deadlock in lockd in SLES8 (i.e. 2.4.21), but
I think the same problem exists in 2.6.

Here's the backtrace courtesy of sysrq:

 lockd         D 00000000  3648   791      1   792     795   783 (L-TLB)
 Call Trace:         [do_schedule+338/608] (36) [__down+131/224] (52) [__down_failed+8/12] (16)
   [.text.lock.svclock+5/150] (04) [vsnprintf+519/1120] (24) [nlm_traverse_files+324/368] (32) [nlmsvc_mark_resources+32/64] (12)
   [nlm_gc_hosts+69/384] (28) [nlm_lookup_host+139/800] (48) [nlmsvc_lookup_host+48/64] (20) [nlmsvc_create_block+145/352] (16)
   [posix_test_lock+132/160] (20) [nlmsvc_lock+222/784] (28) [nlm4svc_retrieve_args+199/288] (40) [nlm4svc_proc_lock+172/256] (44)
   [svc_process+827/1392] (56) [lockd+426/720] (40) [arch_kernel_thread+46/64] (08) [lockd+0/720] (04)

What happens is that nlmsvc_lock takes the f_sema on the file the client wishes to lock,
then calls posix_test_lock, which finds there's a blocking lock. So it calls
nlmsvc_create_block, which calls nlmsvc_lookup_host - and the host code decides
to do a garbage collection pass.

The GC pass calls nlm_traverse_files, which hits the file we're currently
trying to lock and invokes nlmsvc_traverse_blocks, which tries to down
f_sema once more. And there it hangs...
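
To make the re-entry concrete, here is a minimal user-space sketch of the
same pattern. It is not kernel code: struct demo_sema and all demo_*
functions are hypothetical stand-ins, and the "try" variant of down() is
an assumption made so the example reports the hang instead of actually
hanging.

```c
#include <stdbool.h>

/* Toy, single-threaded stand-in for a kernel semaphore initialised to 1. */
struct demo_sema {
	int count;
};

/* Like down(), but reports failure where the real down() would sleep. */
static bool demo_try_down(struct demo_sema *s)
{
	if (s->count == 0)
		return false;
	s->count--;
	return true;
}

static void demo_up(struct demo_sema *s)
{
	s->count++;
}

/* Stand-in for nlmsvc_traverse_blocks: it needs the file's f_sema. */
static bool demo_traverse_blocks(struct demo_sema *f_sema)
{
	if (!demo_try_down(f_sema))
		return false;	/* the real down() would sleep forever here */
	demo_up(f_sema);
	return true;
}

/*
 * Stand-in for the nlmsvc_lock path: take f_sema, then reach the
 * traversal of the very same file through host lookup and GC.
 * Returns true if the nested down() could not succeed.
 */
bool demo_lock_path_deadlocks(void)
{
	struct demo_sema f_sema = { 1 };
	bool traversed;

	demo_try_down(&f_sema);			/* nlmsvc_lock takes f_sema */
	traversed = demo_traverse_blocks(&f_sema);	/* GC re-enters */
	demo_up(&f_sema);
	return !traversed;
}
```

Calling demo_lock_path_deadlocks() from a small main() returns true: the
semaphore is not recursive, so the second down() by the same thread can
never be satisfied.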

The attached (untested) patch changes the way we do garbage collection
passes. Instead of running them inside nlm_lookup_host, nlm_gc_hosts is
now called from the top-level service loop in lockd, where we don't hold
any locks. It looks saner anyway.
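
As a rough illustration of the restructured flow, the sketch below models
a service loop that runs the GC check before each request, while no file
semaphore is held. Everything named toy_* is hypothetical, the clock is a
plain counter standing in for jiffies, and the "due" test is only written
in the spirit of time_after_eq; the real change is the patch itself.

```c
#include <stdbool.h>

/* Toy binary semaphore: count 1 means free, 0 means held. */
struct toy_sema {
	int count;
};

struct toy_lockd {
	struct toy_sema host_sema;	/* models nlm_host_sema */
	struct toy_sema f_sema;		/* models one file's f_sema */
	unsigned long now;		/* models jiffies */
	unsigned long next_gc;		/* models next_gc */
	unsigned long gc_interval;	/* models NLM_HOST_COLLECT */
	int gc_passes;			/* completed GC passes */
	int blocked;			/* times a real down() would have slept */
};

static bool toy_try_down(struct toy_sema *s)
{
	if (s->count == 0)
		return false;
	s->count--;
	return true;
}

static void toy_up(struct toy_sema *s)
{
	s->count++;
}

/* Models __nlm_gc_hosts: caller holds host_sema; traversal needs f_sema. */
static void toy_gc_locked(struct toy_lockd *d)
{
	if (!toy_try_down(&d->f_sema)) {
		d->blocked++;
		return;
	}
	toy_up(&d->f_sema);
	d->gc_passes++;
	d->next_gc = d->now + d->gc_interval;
}

/* Models the new nlm_gc_hosts wrapper: lock, check whether due, unlock. */
static void toy_gc_hosts(struct toy_lockd *d)
{
	if (!toy_try_down(&d->host_sema)) {
		d->blocked++;
		return;
	}
	if ((long)(d->now - d->next_gc) >= 0)	/* wrap-safe "now >= next_gc" */
		toy_gc_locked(d);
	toy_up(&d->host_sema);
}

/* Models one request: f_sema is held, but host lookup no longer runs GC. */
static void toy_handle_request(struct toy_lockd *d)
{
	if (!toy_try_down(&d->f_sema)) {
		d->blocked++;
		return;
	}
	toy_up(&d->f_sema);
}

/* Runs the loop and returns how often a down() would have blocked. */
int toy_service_loop(struct toy_lockd *d, int iterations)
{
	for (int i = 0; i < iterations; i++) {
		toy_gc_hosts(d);	/* no file locks held at this point */
		toy_handle_request(d);
		d->now++;
	}
	return d->blocked;
}
```

With both semaphores initialised to 1 and gc_interval set to 3, seven
iterations starting from now = 0 complete three GC passes and never hit a
held semaphore, because the GC check and the request handling no longer
nest.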

Comments?

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
okir@suse.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax

[-- Attachment #2: lockd-gc-deadlock --]
[-- Type: text/plain, Size: 2428 bytes --]

Index: linux-2.4.21/fs/lockd/host.c
===================================================================
--- linux-2.4.21.orig/fs/lockd/host.c	2005-02-10 10:58:25.000000000 +0100
+++ linux-2.4.21/fs/lockd/host.c	2005-03-10 20:44:43.000000000 +0100
@@ -34,7 +34,7 @@ static int			nrhosts;
 static DECLARE_MUTEX(nlm_host_sema);
 
 
-static void			nlm_gc_hosts(void);
+static void			__nlm_gc_hosts(void);
 
 /*
  * Find an NLM server handle in the cache. If there is none, create it.
@@ -94,9 +94,6 @@ nlm_lookup_host(struct svc_client *clnt,
 	/* Lock hash table */
 	down(&nlm_host_sema);
 
-	if (time_after_eq(jiffies, next_gc))
-		nlm_gc_hosts();
-
 	for (hp = &nlm_hosts[hash]; (host = *hp); hp = &host->h_next) {
 		if (proto && host->h_proto != proto)
 			continue;
@@ -273,7 +270,7 @@ nlm_shutdown_hosts(void)
 	}
 
 	/* Then, perform a garbage collection pass */
-	nlm_gc_hosts();
+	__nlm_gc_hosts();
 	up(&nlm_host_sema);
 
 	/* complain if any hosts are left */
@@ -296,7 +293,7 @@ nlm_shutdown_hosts(void)
  * mark & sweep for resources held by remote clients.
  */
 static void
-nlm_gc_hosts(void)
+__nlm_gc_hosts(void)
 {
 	struct nlm_host	**q, *host;
 	struct rpc_clnt	*clnt;
@@ -341,3 +338,12 @@ nlm_gc_hosts(void)
 	next_gc = jiffies + NLM_HOST_COLLECT;
 }
 
+void
+nlm_gc_hosts(void)
+{
+	down(&nlm_host_sema);
+	if (time_after_eq(jiffies, next_gc))
+		__nlm_gc_hosts();
+	up(&nlm_host_sema);
+}
+
Index: linux-2.4.21/fs/lockd/svc.c
===================================================================
--- linux-2.4.21.orig/fs/lockd/svc.c	2005-02-10 10:57:58.000000000 +0100
+++ linux-2.4.21/fs/lockd/svc.c	2005-03-10 20:45:53.000000000 +0100
@@ -154,6 +154,9 @@ lockd(struct svc_rqst *rqstp)
 			break;
 		}
 
+		/* Perform hosts cache garbage collection */
+		nlm_gc_hosts();
+
 		dprintk("lockd: request from %08x\n",
 			(unsigned)ntohl(rqstp->rq_addr.sin_addr.s_addr));
 
Index: linux-2.4.21/include/linux/lockd/lockd.h
===================================================================
--- linux-2.4.21.orig/include/linux/lockd/lockd.h	2005-02-10 10:57:37.000000000 +0100
+++ linux-2.4.21/include/linux/lockd/lockd.h	2005-03-10 20:43:10.000000000 +0100
@@ -150,6 +150,7 @@ void		  nlm_rebind_host(struct nlm_host 
 struct nlm_host * nlm_get_host(struct nlm_host *);
 void		  nlm_release_host(struct nlm_host *);
 void		  nlm_shutdown_hosts(void);
+void		  nlm_gc_hosts(void);
 
 /*
  * Server-side lock handling
