From: Olaf Kirch <okir@suse.de>
To: nfs@lists.sourceforge.net
Cc: Sebastian Hetze <s.hetze@linux-ag.de>
Subject: deadlock in lockd
Date: Thu, 10 Mar 2005 21:44:16 +0100
Message-ID: <20050310204416.GC3424@suse.de>
[-- Attachment #1: Type: text/plain, Size: 1672 bytes --]
Hi all,
I just debugged a deadlock in lockd in SLES8 (i.e. 2.4.21), but
I think the same problem exists in 2.6.
Here's the backtrace courtesy of sysrq:
lockd D 00000000 3648 791 1 792 795 783 (L-TLB)
Call Trace: [do_schedule+338/608] (36) [__down+131/224] (52) [__down_failed+8/12] (16)
[.text.lock.svclock+5/150] (04) [vsnprintf+519/1120] (24) [nlm_traverse_files+324/368] (32) [nlmsvc_mark_resources+32/64] (12)
[nlm_gc_hosts+69/384] (28) [nlm_lookup_host+139/800] (48) [nlmsvc_lookup_host+48/64] (20) [nlmsvc_create_block+145/352] (16)
[posix_test_lock+132/160] (20) [nlmsvc_lock+222/784] (28) [nlm4svc_retrieve_args+199/288] (40) [nlm4svc_proc_lock+172/256] (44)
[svc_process+827/1392] (56) [lockd+426/720] (40) [arch_kernel_thread+46/64] (08) [lockd+0/720] (04)
What happens is that nlmsvc_lock takes the f_sema on the file the client wishes to lock,
then calls posix_test_lock, which finds there's a blocking lock. So it calls
nlmsvc_create_block, which calls nlmsvc_lookup_host - and the host code decides
to do a garbage collection pass.
We call nlm_traverse_files, which hits the very file we're trying to lock and
invokes nlmsvc_traverse_blocks, which tries to down f_sema once more.
And there it hangs...
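To illustrate the hang outside the kernel, here is a minimal userspace sketch. It is not lockd code: struct demo_file and the demo_* functions are hypothetical stand-ins, and a POSIX error-checking mutex replaces f_sema so that the nested acquire reports EDEADLK instead of sleeping forever the way the kernel semaphore does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a file with a per-file semaphore like f_sema. */
struct demo_file {
    pthread_mutex_t sema;
};

/* Set the lock up as error-checking, so that re-acquiring it from the
 * owning thread returns EDEADLK rather than blocking. */
static int demo_file_init(struct demo_file *f)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&f->sema, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Plays the role of the traversal that takes the per-file lock. */
static int demo_traverse(struct demo_file *f)
{
    int rc = pthread_mutex_lock(&f->sema);

    if (rc != 0)
        return rc;      /* EDEADLK if this thread already holds the lock */
    return pthread_mutex_unlock(&f->sema);
}

/* Plays the role of the lock request path: it takes the per-file lock
 * and then reaches the traversal while still holding it. */
static int demo_lock_request(struct demo_file *f)
{
    int rc = pthread_mutex_lock(&f->sema);
    int unlock_rc;

    if (rc != 0)
        return rc;
    rc = demo_traverse(f);          /* nested acquire of the same lock */
    unlock_rc = pthread_mutex_unlock(&f->sema);
    assert(unlock_rc == 0);         /* we own the lock here */
    (void)unlock_rc;
    return rc;
}
```

After demo_file_init, calling demo_lock_request on the file returns EDEADLK, while calling demo_traverse on its own (no lock held) returns 0. With a plain non-recursive lock, the nested call would simply never return.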
The attached (untested) patch changes the way we do garbage collection
passes: instead of doing it inside nlm_lookup_host, nlm_gc_hosts is
now called from the top-level service loop in lockd, where we don't hold
any locks. It looks saner anyway.
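The shape of that change can be sketched in userspace as follows. This is only an illustration under stated assumptions: the demo_* names and the 120-second interval are hypothetical, a pthread mutex stands in for nlm_host_sema, and a plain time_t comparison stands in for time_after_eq on jiffies (so it ignores counter wraparound).

```c
#include <assert.h>
#include <pthread.h>
#include <time.h>

#define DEMO_COLLECT_INTERVAL 120   /* seconds; hypothetical value */

static pthread_mutex_t demo_host_lock = PTHREAD_MUTEX_INITIALIZER;
static time_t demo_next_gc;         /* earliest time of the next pass */
static unsigned int demo_gc_runs;   /* number of passes performed */

/* The collection pass itself. The caller must hold demo_host_lock. */
static void demo_gc_locked(time_t now)
{
    demo_gc_runs++;                 /* real code would mark and sweep here */
    demo_next_gc = now + DEMO_COLLECT_INTERVAL;
}

/* Entry point for the top-level service loop, which holds no locks when
 * it calls in. Returns 1 if a pass ran and 0 if it was not yet due. */
static int demo_gc_hosts(time_t now)
{
    int ran = 0;
    int rc = pthread_mutex_lock(&demo_host_lock);

    assert(rc == 0);
    (void)rc;
    if (now >= demo_next_gc) {
        demo_gc_locked(now);
        ran = 1;
    }
    pthread_mutex_unlock(&demo_host_lock);
    return ran;
}
```

The service loop would call demo_gc_hosts(time(NULL)) once per iteration. Because the lookup path no longer triggers a pass, the traversal is not reached while a per-file lock is held.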
Comments?
Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir@suse.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
[-- Attachment #2: lockd-gc-deadlock --]
[-- Type: text/plain, Size: 2428 bytes --]
Index: linux-2.4.21/fs/lockd/host.c
===================================================================
--- linux-2.4.21.orig/fs/lockd/host.c 2005-02-10 10:58:25.000000000 +0100
+++ linux-2.4.21/fs/lockd/host.c 2005-03-10 20:44:43.000000000 +0100
@@ -34,7 +34,7 @@ static int nrhosts;
static DECLARE_MUTEX(nlm_host_sema);
-static void nlm_gc_hosts(void);
+static void __nlm_gc_hosts(void);
/*
* Find an NLM server handle in the cache. If there is none, create it.
@@ -94,9 +94,6 @@ nlm_lookup_host(struct svc_client *clnt,
/* Lock hash table */
down(&nlm_host_sema);
- if (time_after_eq(jiffies, next_gc))
- nlm_gc_hosts();
-
for (hp = &nlm_hosts[hash]; (host = *hp); hp = &host->h_next) {
if (proto && host->h_proto != proto)
continue;
@@ -273,7 +270,7 @@ nlm_shutdown_hosts(void)
}
/* Then, perform a garbage collection pass */
- nlm_gc_hosts();
+ __nlm_gc_hosts();
up(&nlm_host_sema);
/* complain if any hosts are left */
@@ -296,7 +293,7 @@ nlm_shutdown_hosts(void)
* mark & sweep for resources held by remote clients.
*/
static void
-nlm_gc_hosts(void)
+__nlm_gc_hosts(void)
{
struct nlm_host **q, *host;
struct rpc_clnt *clnt;
@@ -341,3 +338,12 @@ nlm_gc_hosts(void)
next_gc = jiffies + NLM_HOST_COLLECT;
}
+void
+nlm_gc_hosts(void)
+{
+ down(&nlm_host_sema);
+ if (time_after_eq(jiffies, next_gc))
+ __nlm_gc_hosts();
+ up(&nlm_host_sema);
+}
+
Index: linux-2.4.21/fs/lockd/svc.c
===================================================================
--- linux-2.4.21.orig/fs/lockd/svc.c 2005-02-10 10:57:58.000000000 +0100
+++ linux-2.4.21/fs/lockd/svc.c 2005-03-10 20:45:53.000000000 +0100
@@ -154,6 +154,9 @@ lockd(struct svc_rqst *rqstp)
break;
}
+ /* Perform hosts cache garbage collection */
+ nlm_gc_hosts();
+
dprintk("lockd: request from %08x\n",
(unsigned)ntohl(rqstp->rq_addr.sin_addr.s_addr));
Index: linux-2.4.21/include/linux/lockd/lockd.h
===================================================================
--- linux-2.4.21.orig/include/linux/lockd/lockd.h 2005-02-10 10:57:37.000000000 +0100
+++ linux-2.4.21/include/linux/lockd/lockd.h 2005-03-10 20:43:10.000000000 +0100
@@ -150,6 +150,7 @@ void nlm_rebind_host(struct nlm_host
struct nlm_host * nlm_get_host(struct nlm_host *);
void nlm_release_host(struct nlm_host *);
void nlm_shutdown_hosts(void);
+void nlm_gc_hosts(void);
/*
* Server-side lock handling