Linux NFS development
From: Wendy Cheng <wcheng@redhat.com>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Linux NFS Mailing List <nfs@lists.sourceforge.net>
Subject: Re: [PATCH] fix recursive nlm_file_mutex deadlock
Date: Wed, 09 Aug 2006 18:07:59 -0400
Message-ID: <44DA5CBF.3040309@redhat.com>
In-Reply-To: <1155159937.15624.22.camel@localhost>

Trond Myklebust wrote:

>On Wed, 2006-08-09 at 14:13 -0400, Wendy Cheng wrote:
>  
>
>>I was testing the NLM failover patches this morning and found that the 
>>command hangs. It looks like nlm_traverse_files(), which grabs 
>>nlm_file_mutex early in the call, can end up calling nlm_release_file() 
>>via nlmsvc_free_block() inside kref_put(). nlm_release_file() wants 
>>nlm_file_mutex too, so this deadlocks as follows:
>>
>>dhcp59-234 kernel: Call Trace:
>>[<c02dd749>] __mutex_lock_slowpath+0x4c/0x7e
>>[<c02dd78a>] .text.lock.mutex+0xf/0x14
>>[<f8afeacd>] nlm_release_file+0x2b/0xdf [lockd]
>>[<f8afda90>] nlmsvc_free_block+0x8c/0x9d [lockd]
>>[<f8afda04>] nlmsvc_free_block+0x0/0x9d [lockd]
>>[<c01be98d>] kref_put+0x4e/0x58
>>[<f8afd175>] nlmsvc_traverse_blocks+0xaf/0xc6 [lockd]
>>[<f8afe960>] nlm_traverse_files+0x108/0x1cd [lockd]
>>
>>The attached patch seems to fix the issue: it skips (defers) the file 
>>removal. Eventually either nlm_gc_hosts (some time later, when the client 
>>is unmonitored) or nlmsvc_traverse_files will finish the cleanup. Note 
>>that this is ten minutes' work, and I am not sure of its ramifications at 
>>this moment. Could you take a look?
>>
>>-- Wendy
>>
>>plain text document attachment (gfs_nlm_deadlock.patch)
>>--- linux-2/fs/lockd/svclock.c	2006-08-08 10:20:16.000000000 -0400
>>+++ linux/fs/lockd/svclock.c	2006-08-09 10:28:35.000000000 -0400
>>@@ -264,7 +264,9 @@ static void nlmsvc_free_block(struct kre
>> 
>> 	nlmsvc_freegrantargs(block->b_call);
>> 	nlm_release_call(block->b_call);
>>-	nlm_release_file(block->b_file);
>>+	down(&file->f_sema);
>>+	file->f_count--;
>>+	up(&file->f_sema);
>> 	kfree(block);
>> }
>>    
>>
>
>Vetoed. The block holds a reference to the file. It _must_ call
>nlm_release_file() in order to release that reference. It is in any case
>a bug to grab file->f_sema without holding a reference to the file.
>
>I suspect, rather, that the problem is due to nlmsvc_create_block()
>incrementing file->f_count without holding the nlm_file_mutex. If we
>convert it to an atomic_t instead, then that problem should be solved.
>  
>

Disagree! :)

The whole thing is about a deadlock, not about the reference count. Look 
at the logic: nlm_traverse_files() grabs nlm_file_mutex, then comes down 
(via nlmsvc_traverse_blocks(), kref_put() and nlmsvc_free_block()) to 
nlm_release_file(), which tries to take nlm_file_mutex again. I need to 
run now. Please re-read the issue and we can discuss it tomorrow morning.
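
To make the recursion concrete, here is a minimal user-space sketch of the 
same call pattern. It is not lockd code: file_mutex, release_file(), 
free_block() and traverse_files_demo() are hypothetical stand-ins for 
nlm_file_mutex, nlm_release_file(), nlmsvc_free_block() and 
nlm_traverse_files(). It also assumes a POSIX error-checking mutex, so the 
nested lock attempt returns EDEADLK instead of hanging the way the kernel 
mutex does.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose PTHREAD_MUTEX_ERRORCHECK */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for nlm_file_mutex. */
static pthread_mutex_t file_mutex;

/* Result of the nested lock attempt made while the mutex is already held. */
static int nested_lock_result;

/* Stand-in for nlm_release_file(): it takes the global mutex itself. */
static void release_file(void)
{
    nested_lock_result = pthread_mutex_lock(&file_mutex);
    if (nested_lock_result == 0)
        pthread_mutex_unlock(&file_mutex);
}

/* Stand-in for nlmsvc_free_block(), reached through kref_put(). */
static void free_block(void)
{
    release_file();
}

/*
 * Stand-in for nlm_traverse_files(): holds the mutex across the walk and
 * drops the last block reference while still holding it.
 * Returns the error code seen by the nested lock attempt.
 */
int traverse_files_demo(void)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    /* Error-checking type: relocking by the owner fails, not blocks. */
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&file_mutex, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&file_mutex);   /* outer lock, as in the traverse */
    assert(rc == 0);
    free_block();                           /* inner path wants the same lock */
    pthread_mutex_unlock(&file_mutex);
    pthread_mutex_destroy(&file_mutex);

    (void)rc;   /* silence unused warning when NDEBUG disables assert */
    return nested_lock_result;
}
```

Calling traverse_files_demo() returns EDEADLK, because the thread that 
already owns the mutex asks for it again. With a non-recursive lock that 
does no such checking, the same call chain simply never returns, which is 
the hang shown in the trace.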

-- Wendy

>aside: Note also that we want to get rid of all that mark and sweep
>braindamage in nlm_traverse_*() with all the silly counting of f_lock,
>f_blocks, f_shares,.... and replace those variables with proper
>references to the struct nlm_file by the locks, blocks (is already the
>case?), and shares.
>
>Cheers,
>  Trond
>
>  
>
