* deadlock in lockd
@ 2005-03-10 20:44 Olaf Kirch
2005-03-10 21:49 ` Daniel Forrest
0 siblings, 1 reply; 3+ messages in thread
From: Olaf Kirch @ 2005-03-10 20:44 UTC (permalink / raw)
To: nfs; +Cc: Sebastian Hetze
[-- Attachment #1: Type: text/plain, Size: 1672 bytes --]
Hi all,
I just debugged a deadlock in lockd in SLES8 (i.e 2.4.21), but
I think the same problem exists in 2.6.
Here's the backtrace courtesy of sysrq:
lockd D 00000000 3648 791 1 792 795 783 (L-TLB)
Call Trace: [do_schedule+338/608] (36) [__down+131/224] (52) [__down_failed+8/12] (16)
[.text.lock.svclock+5/150] (04) [vsnprintf+519/1120] (24) [nlm_traverse_files+324/368] (32) [nlmsvc_mark_resources+32/64] (12)
[nlm_gc_hosts+69/384] (28) [nlm_lookup_host+139/800] (48) [nlmsvc_lookup_host+48/64] (20) [nlmsvc_create_block+145/352] (16)
[posix_test_lock+132/160] (20) [nlmsvc_lock+222/784] (28) [nlm4svc_retrieve_args+199/288] (40) [nlm4svc_proc_lock+172/256] (44)
[svc_process+827/1392] (56) [lockd+426/720] (40) [arch_kernel_thread+46/64] (08) [lockd+0/720] (04)
What happens is that nlmsvc_lock takes the f_sema on the file the client wishes to lock,
then calls posix_test_lock, which finds there's a blocking lock. So it calls
nlmsvc_create_block, which calls nlmsvc_lookup_host - and the host code decides
to do a garbage collection pass.
We call nlm_traverse_files, that hits the file we're just trying to lock, and
invokes nlmsvc_traverse_blocks, which will try to down f_sema once more.
And there it hangs...
The attched (untested) patch changes the way we do garbage collection
passes, instead of doing it inside nlm_lookup_host, nlm_gc_hosts is
called from the top-level service loop in lockd now, where we don'T hold any
locks. It looks saner anyway.
Comments?
Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir@suse.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
[-- Attachment #2: lockd-gc-deadlock --]
[-- Type: text/plain, Size: 2428 bytes --]
Index: linux-2.4.21/fs/lockd/host.c
===================================================================
--- linux-2.4.21.orig/fs/lockd/host.c 2005-02-10 10:58:25.000000000 +0100
+++ linux-2.4.21/fs/lockd/host.c 2005-03-10 20:44:43.000000000 +0100
@@ -34,7 +34,7 @@ static int nrhosts;
static DECLARE_MUTEX(nlm_host_sema);
-static void nlm_gc_hosts(void);
+static void __nlm_gc_hosts(void);
/*
* Find an NLM server handle in the cache. If there is none, create it.
@@ -94,9 +94,6 @@ nlm_lookup_host(struct svc_client *clnt,
/* Lock hash table */
down(&nlm_host_sema);
- if (time_after_eq(jiffies, next_gc))
- nlm_gc_hosts();
-
for (hp = &nlm_hosts[hash]; (host = *hp); hp = &host->h_next) {
if (proto && host->h_proto != proto)
continue;
@@ -273,7 +270,7 @@ nlm_shutdown_hosts(void)
}
/* Then, perform a garbage collection pass */
- nlm_gc_hosts();
+ __nlm_gc_hosts();
up(&nlm_host_sema);
/* complain if any hosts are left */
@@ -296,7 +293,7 @@ nlm_shutdown_hosts(void)
* mark & sweep for resources held by remote clients.
*/
static void
-nlm_gc_hosts(void)
+__nlm_gc_hosts(void)
{
struct nlm_host **q, *host;
struct rpc_clnt *clnt;
@@ -341,3 +338,12 @@ nlm_gc_hosts(void)
next_gc = jiffies + NLM_HOST_COLLECT;
}
+void
+nlm_gc_hosts()
+{
+ down(&nlm_host_sema);
+ if (time_after_eq(jiffies, next_gc))
+ __nlm_gc_hosts();
+ up(&nlm_host_sema);
+}
+
Index: linux-2.4.21/fs/lockd/svc.c
===================================================================
--- linux-2.4.21.orig/fs/lockd/svc.c 2005-02-10 10:57:58.000000000 +0100
+++ linux-2.4.21/fs/lockd/svc.c 2005-03-10 20:45:53.000000000 +0100
@@ -154,6 +154,9 @@ lockd(struct svc_rqst *rqstp)
break;
}
+ /* Perform hosts cache garbage collection */
+ nlm_gc_hosts();
+
dprintk("lockd: request from %08x\n",
(unsigned)ntohl(rqstp->rq_addr.sin_addr.s_addr));
Index: linux-2.4.21/include/linux/lockd/lockd.h
===================================================================
--- linux-2.4.21.orig/include/linux/lockd/lockd.h 2005-02-10 10:57:37.000000000 +0100
+++ linux-2.4.21/include/linux/lockd/lockd.h 2005-03-10 20:43:10.000000000 +0100
@@ -150,6 +150,7 @@ void nlm_rebind_host(struct nlm_host
struct nlm_host * nlm_get_host(struct nlm_host *);
void nlm_release_host(struct nlm_host *);
void nlm_shutdown_hosts(void);
+void nlm_gc_hosts(void);
/*
* Server-side lock handling
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: deadlock in lockd
2005-03-10 20:44 deadlock in lockd Olaf Kirch
@ 2005-03-10 21:49 ` Daniel Forrest
2005-03-10 22:42 ` Olaf Kirch
0 siblings, 1 reply; 3+ messages in thread
From: Daniel Forrest @ 2005-03-10 21:49 UTC (permalink / raw)
To: Olaf Kirch; +Cc: nfs, Sebastian Hetze
Olaf,
> Hi all,
>
> I just debugged a deadlock in lockd in SLES8 (i.e 2.4.21), but
> I think the same problem exists in 2.6.
I found this ages ago.
Neil Brown submitted a patch on 12 Mar 2003:
Subject: [NFS] [PATCH] kNFSd - 3 of 6 - Fix deadlock problem in lockd.
Which fixes this. Apparently this made it into 2.5, but not 2.4.
He submitted the same patch on 29 Oct 2003:
Subject: [NFS] [PATCH] Fix deadlock problem in lockd.
With the comment: "(This was fixed in 2.5 8 months ago)".
You may want to check if this ever made it into the 2.4 source.
Here is that e-mail:
---
nlmsvc_lock calls nlmsvc_create_block with file->f_sema
held.
nlmsvc_create_block calls nlmclnt_lookup_host which might
call nlm_gc_hosts which might, eventually, try to claim
file->f_sema for the same file -> deadlock.
nlmsvc_create_block does not need any protection under
any lock as lockd is single-threaded and _create_block
only plays with internal data structures.
So we release the f_sema before calling in, and make sure
it gets claimed again afterwards.
(This was fixed in 2.5 8 months ago)
diff ./fs/lockd/svclock.c~current~ ./fs/lockd/svclock.c
--- ./fs/lockd/svclock.c~current~ 2003-10-29 09:07:10.000000000 +1100
+++ ./fs/lockd/svclock.c 2003-10-29 09:07:10.000000000 +1100
@@ -317,8 +317,6 @@ nlmsvc_lock(struct svc_rqst *rqstp, stru
(long long)lock->fl.fl_end,
wait);
- /* Lock file against concurrent access */
- down(&file->f_sema);
/* Get existing block (in case client is busy-waiting) */
block = nlmsvc_lookup_block(file, lock, 0);
@@ -326,6 +324,9 @@ nlmsvc_lock(struct svc_rqst *rqstp, stru
lock->fl.fl_flags |= FL_LOCKD;
again:
+ /* Lock file against concurrent access */
+ down(&file->f_sema);
+
if (!(conflock = posix_test_lock(&file->f_file, &lock->fl))) {
error = posix_lock_file(&file->f_file, &lock->fl, 0);
@@ -358,7 +359,10 @@ again:
/* If we don't have a block, create and initialize it. Then
* retry because we may have slept in kmalloc. */
+ /* We have to release f_sema as nlmsvc_create_block may try to
+ * claim it while doing host garbage collection */
if (block == NULL) {
+ up(&file->f_sema);
dprintk("lockd: blocking on this lock (allocating).\n");
if (!(block = nlmsvc_create_block(rqstp, file, lock, cookie)))
return nlm_lck_denied_nolocks;
--
Daniel K. Forrest Laboratory for Molecular and
forrest@lmcg.wisc.edu Computational Genomics
University of Wisconsin, Madison
-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: deadlock in lockd
2005-03-10 21:49 ` Daniel Forrest
@ 2005-03-10 22:42 ` Olaf Kirch
0 siblings, 0 replies; 3+ messages in thread
From: Olaf Kirch @ 2005-03-10 22:42 UTC (permalink / raw)
To: Daniel Forrest; +Cc: nfs, Sebastian Hetze
On Thu, Mar 10, 2005 at 03:49:44PM -0600, Daniel Forrest wrote:
> Subject: [NFS] [PATCH] kNFSd - 3 of 6 - Fix deadlock problem in lockd.
I see... I would have been surprised if I had been the first one
to run into this.
Still I think it's cleaner to move the nlm_gc_hosts calls out of
nlm_lookup_host.
Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir@suse.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2005-03-10 22:42 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-03-10 20:44 deadlock in lockd Olaf Kirch
2005-03-10 21:49 ` Daniel Forrest
2005-03-10 22:42 ` Olaf Kirch
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.