From: Olaf Kirch <okir@suse.de>
To: nfs@lists.sourceforge.net
Cc: Sebastian Hetze <s.hetze@linux-ag.de>
Subject: deadlock in lockd
Date: Thu, 10 Mar 2005 21:44:16 +0100
Message-ID: <20050310204416.GC3424@suse.de>
[-- Attachment #1: Type: text/plain, Size: 1672 bytes --]
Hi all,
I just debugged a deadlock in lockd in SLES8 (i.e. 2.4.21), but
I think the same problem exists in 2.6.
Here's the backtrace courtesy of sysrq:
lockd D 00000000 3648 791 1 792 795 783 (L-TLB)
Call Trace: [do_schedule+338/608] (36) [__down+131/224] (52) [__down_failed+8/12] (16)
[.text.lock.svclock+5/150] (04) [vsnprintf+519/1120] (24) [nlm_traverse_files+324/368] (32) [nlmsvc_mark_resources+32/64] (12)
[nlm_gc_hosts+69/384] (28) [nlm_lookup_host+139/800] (48) [nlmsvc_lookup_host+48/64] (20) [nlmsvc_create_block+145/352] (16)
[posix_test_lock+132/160] (20) [nlmsvc_lock+222/784] (28) [nlm4svc_retrieve_args+199/288] (40) [nlm4svc_proc_lock+172/256] (44)
[svc_process+827/1392] (56) [lockd+426/720] (40) [arch_kernel_thread+46/64] (08) [lockd+0/720] (04)
What happens is that nlmsvc_lock takes f_sema on the file the client wishes to lock,
then calls posix_test_lock, which finds there's a blocking lock. So it calls
nlmsvc_create_block, which calls nlmsvc_lookup_host - and the host code decides
to do a garbage collection pass.
The GC pass calls nlm_traverse_files, which reaches the file we're in the middle
of locking and invokes nlmsvc_traverse_blocks, which tries to down f_sema a second time.
And there it hangs...
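
The sequence above can be modelled in user space. This is only a minimal
sketch, not the lockd code: every model_* name is hypothetical, the semaphore
is reduced to a counter, and a "try" variant of down() reports where the real
down() would sleep forever instead of actually hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a semaphore used as a mutex: 1 = free, 0 = held. */
struct model_sema {
	int count;
};

/* Non-blocking down(): false marks the spot where the real down() would sleep. */
static bool model_try_down(struct model_sema *s)
{
	if (s->count <= 0)
		return false;
	s->count--;
	return true;
}

static void model_up(struct model_sema *s)
{
	assert(s->count == 0);	/* only a held semaphore may be released */
	s->count = 1;
}

/* Hypothetical stand-in for the per-file state; only f_sema matters here. */
struct model_file {
	struct model_sema f_sema;
};

/* Stand-in for nlmsvc_traverse_blocks: needs f_sema to walk the blocks. */
static bool model_traverse_blocks(struct model_file *file)
{
	if (!model_try_down(&file->f_sema))
		return false;	/* already held by our own caller */
	model_up(&file->f_sema);
	return true;
}

/*
 * Stand-in for nlmsvc_lock: holds f_sema while the host lookup may start
 * a GC pass that walks back to the same file.
 * Returns false if the nested acquire would have hung.
 */
static bool model_lock_request(struct model_file *file, bool gc_due)
{
	bool ok = true;

	if (!model_try_down(&file->f_sema))
		return false;
	if (gc_due)
		ok = model_traverse_blocks(file);
	model_up(&file->f_sema);
	return ok;
}
```

With gc_due false the request completes; with gc_due true the nested acquire
fails, which corresponds to the point where lockd sleeps in __down.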
The attached (untested) patch changes the way we do garbage collection
passes. Instead of doing it inside nlm_lookup_host, nlm_gc_hosts is now
called from the top-level service loop in lockd, where we don't hold any
locks. It looks saner anyway.
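
The shape of that change can be sketched in the same user-space style. This is
an illustration under assumptions, not the patch itself: the model_* names, the
struct layout and the MODEL_COLLECT_INTERVAL value are all hypothetical, and
both semaphores are reduced to plain flags.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_COLLECT_INTERVAL 120UL	/* hypothetical interval, in ticks */

struct model_state {
	int host_sema;		/* 1 = free, 0 = held */
	int f_sema;		/* 1 = free, 0 = held */
	unsigned long now;	/* stands in for jiffies */
	unsigned long next_gc;
	unsigned int gc_passes;
};

/* Wrap-safe "a is at or after b", in the spirit of time_after_eq(). */
static bool model_time_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* Inner pass: caller must hold host_sema. False = real code would block. */
static bool model_gc_locked(struct model_state *st)
{
	assert(st->host_sema == 0);
	if (st->f_sema == 0)
		return false;	/* the file lock is held further up the stack */
	st->f_sema = 0;		/* traverse the file's blocks */
	st->f_sema = 1;
	st->gc_passes++;
	st->next_gc = st->now + MODEL_COLLECT_INTERVAL;
	return true;
}

/* Outer entry point: takes host_sema itself and checks whether GC is due. */
static bool model_gc_hosts(struct model_state *st)
{
	bool ok = true;

	assert(st->host_sema == 1);
	st->host_sema = 0;
	if (model_time_after_eq(st->now, st->next_gc))
		ok = model_gc_locked(st);
	st->host_sema = 1;
	return ok;
}

/* One service-loop iteration: GC first, then the request that takes f_sema. */
static bool model_service_once(struct model_state *st)
{
	bool ok = model_gc_hosts(st);	/* no file lock is held at this point */

	st->f_sema = 0;		/* request handling; lookup no longer runs GC */
	st->f_sema = 1;
	return ok;
}
```

Because the pass runs before any per-file lock is taken, the inner traversal
never meets a lock held by its own caller.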
Comments?
Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir@suse.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
[-- Attachment #2: lockd-gc-deadlock --]
[-- Type: text/plain, Size: 2428 bytes --]
Index: linux-2.4.21/fs/lockd/host.c
===================================================================
--- linux-2.4.21.orig/fs/lockd/host.c 2005-02-10 10:58:25.000000000 +0100
+++ linux-2.4.21/fs/lockd/host.c 2005-03-10 20:44:43.000000000 +0100
@@ -34,7 +34,7 @@ static int nrhosts;
static DECLARE_MUTEX(nlm_host_sema);
-static void nlm_gc_hosts(void);
+static void __nlm_gc_hosts(void);
/*
* Find an NLM server handle in the cache. If there is none, create it.
@@ -94,9 +94,6 @@ nlm_lookup_host(struct svc_client *clnt,
/* Lock hash table */
down(&nlm_host_sema);
- if (time_after_eq(jiffies, next_gc))
- nlm_gc_hosts();
-
for (hp = &nlm_hosts[hash]; (host = *hp); hp = &host->h_next) {
if (proto && host->h_proto != proto)
continue;
@@ -273,7 +270,7 @@ nlm_shutdown_hosts(void)
}
/* Then, perform a garbage collection pass */
- nlm_gc_hosts();
+ __nlm_gc_hosts();
up(&nlm_host_sema);
/* complain if any hosts are left */
@@ -296,7 +293,7 @@ nlm_shutdown_hosts(void)
* mark & sweep for resources held by remote clients.
*/
static void
-nlm_gc_hosts(void)
+__nlm_gc_hosts(void)
{
struct nlm_host **q, *host;
struct rpc_clnt *clnt;
@@ -341,3 +338,12 @@ nlm_gc_hosts(void)
next_gc = jiffies + NLM_HOST_COLLECT;
}
+void
+nlm_gc_hosts()
+{
+ down(&nlm_host_sema);
+ if (time_after_eq(jiffies, next_gc))
+ __nlm_gc_hosts();
+ up(&nlm_host_sema);
+}
+
Index: linux-2.4.21/fs/lockd/svc.c
===================================================================
--- linux-2.4.21.orig/fs/lockd/svc.c 2005-02-10 10:57:58.000000000 +0100
+++ linux-2.4.21/fs/lockd/svc.c 2005-03-10 20:45:53.000000000 +0100
@@ -154,6 +154,9 @@ lockd(struct svc_rqst *rqstp)
break;
}
+ /* Perform hosts cache garbage collection */
+ nlm_gc_hosts();
+
dprintk("lockd: request from %08x\n",
(unsigned)ntohl(rqstp->rq_addr.sin_addr.s_addr));
Index: linux-2.4.21/include/linux/lockd/lockd.h
===================================================================
--- linux-2.4.21.orig/include/linux/lockd/lockd.h 2005-02-10 10:57:37.000000000 +0100
+++ linux-2.4.21/include/linux/lockd/lockd.h 2005-03-10 20:43:10.000000000 +0100
@@ -150,6 +150,7 @@ void nlm_rebind_host(struct nlm_host
struct nlm_host * nlm_get_host(struct nlm_host *);
void nlm_release_host(struct nlm_host *);
void nlm_shutdown_hosts(void);
+void nlm_gc_hosts(void);
/*
* Server-side lock handling