From: Benny Halevy <bhalevy@panasas.com>
To: Andrew Morton <akpm@linux-foundation.org>,
Steve Dickson <SteveD@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>,
Christoph Hellwig <hch@infradead.org>,
linux-fsdevel@vger.kernel.org,
linux-kernel <linux-kernel@vger.kernel.org>,
nfs@lists.sourceforge.net,
"J. Bruce Fields" <bfields@fieldses.org>,
Trond Myklebust <trond.myklebust@fys.uio.no>
Subject: Re: [PATCH] NFS: Stop sillyname renames and unmounts from racing
Date: Tue, 06 Nov 2007 10:24:50 +0200
Message-ID: <473024D2.6080105@panasas.com>
In-Reply-To: <20071105210636.2fc72e14.akpm@linux-foundation.org>
On Nov. 06, 2007, 7:06 +0200, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Sat, 03 Nov 2007 07:09:25 -0400 Steve Dickson <SteveD@redhat.com> wrote:
>
>> The following patch stops NFS sillyname renames and umounts from racing.
>
> (appropriate cc's added)
>
>> I have a test script that does the following:
>> 1) start nfs server
>> 2) mount loopback
>> 3) open file in background
>> 4) remove file
>> 5) stop nfs server
>> 6) kill -9 process which has file open
>> 7) restart nfs server
>> 8) umount loopback mount.
>>
>> After umount I got the "VFS: Busy inodes after unmount" message
>> because the processing of the rename has not finished.
>>
>> Below is a patch that uses the new silly_count mechanism to
>> synchronize sillyname processing and umounts. The patch introduces a
>> nfs_put_super() routine that waits until the nfsi->silly_count count
>> is zero.
>>
>> A side effect of finding and waiting for all the inodes to
>> finish the sillyname processing is that I need to traverse
>> the sb->s_inodes list in the super block. To do that
>> safely the inode_lock spin lock has to be held. So for
>> modules to be able to "see" that lock I needed to
>> EXPORT_SYMBOL_GPL() it.
>>
>> Any objections to exporting the inode_lock spin lock?
>> If so, how should modules _safely_ access the s_inodes list?
>>
>> steved.
>>
>>
>> Author: Steve Dickson <steved@redhat.com>
>> Date: Wed Oct 31 12:19:26 2007 -0400
>>
>> Close an unlink/sillyname rename and umount race by adding a
>> nfs_put_super routine that will run through all the inodes
>> currently on the super block, waiting for those that are
>> in the middle of a sillyname rename or removal.
>>
>> This patch stops the infamous "VFS: Busy inodes after unmount... "
>> warning during umounts.
>>
>> Signed-off-by: Steve Dickson <steved@redhat.com>
>>
>> diff --git a/fs/inode.c b/fs/inode.c
>> index ed35383..da9034a 100644
>> --- a/fs/inode.c
>> +++ b/fs/inode.c
>> @@ -81,6 +81,7 @@ static struct hlist_head *inode_hashtable __read_mostly;
>> * the i_state of an inode while it is in use..
>> */
>> DEFINE_SPINLOCK(inode_lock);
>> +EXPORT_SYMBOL_GPL(inode_lock);
>
> That's going to make hch unhappy.
>
> Your email client is performing space-stuffing.
> See http://mbligh.org/linuxdocs/Email/Clients/Thunderbird
>
>> static struct file_system_type nfs_fs_type = {
>> .owner = THIS_MODULE,
>> @@ -223,6 +225,7 @@ static const struct super_operations nfs_sops = {
>> .alloc_inode = nfs_alloc_inode,
>> .destroy_inode = nfs_destroy_inode,
>> .write_inode = nfs_write_inode,
>> + .put_super = nfs_put_super,
>> .statfs = nfs_statfs,
>> .clear_inode = nfs_clear_inode,
>> .umount_begin = nfs_umount_begin,
>> @@ -1767,6 +1770,30 @@ static void nfs4_kill_super(struct super_block *sb)
>> nfs_free_server(server);
>> }
>>
>> +void nfs_put_super(struct super_block *sb)
>
> This was (correctly) declared to be static. We should define it that way
> too (I didn't know you could do this, actually).
>
>> +{
>> + struct inode *inode;
>> + struct nfs_inode *nfsi;
>> + /*
>> + * Make sure there are no outstanding renames
>> + */
>> +relock:
>> + spin_lock(&inode_lock);
>> + list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
>> + nfsi = NFS_I(inode);
>> + if (atomic_read(&nfsi->silly_count) > 0) {
>> + /* Keep this inode around during the wait */
>> + atomic_inc(&inode->i_count);
>> + spin_unlock(&inode_lock);
>> + wait_event(nfsi->waitqueue,
>> + atomic_read(&nfsi->silly_count) == 1);
>> + iput(inode);
>> + goto relock;
>> + }
>> + }
>> + spin_unlock(&inode_lock);
>> +}
>
> That's an O(n^2) search. If it is at all possible to hit a catastrophic
> slowdown in here, you can bet that someone out there will indeed hit it in
> real life.
>
> I'm too lazy to look, but we might need to check things like I_FREEING
> and I_CLEAR before taking a ref on this inode.
It'd be very nice if the silly-renamed inodes (with silly_count > 1) were moved
to a different list in a first pass, under the inode_lock, and then waited on
until silly_count <= 1 in a second pass over only the filtered list. That makes
the whole search O(n) instead of O(n^2).
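
To make the two-pass idea concrete, here is a rough user-space sketch, not
kernel code and not a replacement for the patch. Everything in it is a
hypothetical stand-in: struct fake_inode, collect_busy(), drain_busy() and
fake_wait() do not exist in the kernel, and the "> 1" threshold simply follows
the wording above rather than the "> 0" test in the posted patch. In the real
thing the first pass would run under inode_lock and would have to pin each
inode it collects (and probably check I_FREEING/I_CLEAR first, as Andrew
notes); the second pass would call wait_event() and iput() with the lock
dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in; NOT the kernel's struct inode. */
struct fake_inode {
	int silly_count;              /* mirrors nfsi->silly_count */
	struct fake_inode *next;      /* link on the per-superblock list */
	struct fake_inode *busy_next; /* link on the private busy list */
};

/* Stand-in for wait_event(): pretend the sillyrename work has finished. */
static void fake_wait(struct fake_inode *inode)
{
	inode->silly_count = 1;
}

/*
 * Pass 1: a single walk over the full list. Inodes with sillyrename work
 * in flight are chained onto a private busy list; nothing sleeps here,
 * so the list never has to be rescanned from the top.
 */
static struct fake_inode *collect_busy(struct fake_inode *head, size_t *visited)
{
	struct fake_inode *busy = NULL;
	struct fake_inode *inode;

	for (inode = head; inode; inode = inode->next) {
		(*visited)++;
		if (inode->silly_count > 1) {
			inode->busy_next = busy;
			busy = inode;
		}
	}
	return busy;
}

/*
 * Pass 2: wait on each collected inode exactly once. Returns how many
 * inodes were waited on.
 */
static size_t drain_busy(struct fake_inode *busy,
			 void (*wait_fn)(struct fake_inode *))
{
	size_t waited = 0;
	struct fake_inode *inode;

	for (inode = busy; inode; inode = inode->busy_next) {
		wait_fn(inode);
		assert(inode->silly_count <= 1);
		waited++;
	}
	return waited;
}
```

Calling collect_busy() on the list head and then drain_busy() on its result
touches each inode once in the first pass and each busy inode once in the
second, which is where the linear bound comes from. The sketch keeps a separate
busy_next link instead of physically moving inodes off s_inodes; that is only a
simplification for the example.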