From: Josh Hunt <johunt@akamai.com>
To: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@ZenIV.linux.org.uk>,
"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
"sandeen@redhat.com" <sandeen@redhat.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [RFC][PATCH] ext2: Resolve i_nlink race in ext2_rename
Date: Thu, 24 Feb 2011 12:18:32 -0800 [thread overview]
Message-ID: <4D66BD18.2030207@akamai.com> (raw)
In-Reply-To: <20110224112010.GB23042@quack.suse.cz>
[-- Attachment #1: Type: text/plain, Size: 4242 bytes --]
On 02/24/2011 03:20 AM, Jan Kara wrote:
> On Thu 24-02-11 06:37:49, Al Viro wrote:
>> On Wed, Feb 23, 2011 at 10:21:41PM -0800, Josh Hunt wrote:
>>> [resending: left Jan off the original mail by accident]
>>>
>>> We have a multi-threaded workload which is currently "losing" files in the form
>>> of unattached inodes. The workload is link, rename, unlink intensive. This is
>>> happening on an ext2 filesystem and have reproduced the issue in kernel
>>> 2.6.37. Here's a sample strace:
>>>
>>> open("/a/tmp/tmpfile.1296184058", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 9
>>> link("/a/tmp/tmpfile.1296184058", "/a/tmp/tmpfile.28117.1296184059") = 0
>>> rename("/a/tmp/tmpfile.28117.1296184059", "/a/tmp/tmpfile") = 0
>>> stat64("/a/tmp/tmpfile", {st_mode=S_IFREG|0644, st_size=24248267, ...}) = 0
>>> link("/a/tmp/tmpfile", "/a/tmp/submit/tmpfile") = 0
>>> open("/a/tmp/tmpfile.1296184058", O_RDONLY) = 13
>>> open("/a/tmp/submit/tmpfile.send.q9SNoL", O_RDWR|O_CREAT|O_EXCL, 0600) = 824
>>> rename("/a/tmp/submit/tmpfile", "/a/tmp/submit/tmpfile.send.q9SNoL") = 0
>>> unlink("/a/tmp/tmpfile.1296184058") = 0
>>> open("/a/tmp/submit/tmpfile.send.q9SNoL", O_RDONLY) = 827
>>> open("/a/tmp/submit/tmpfile.send.q9SNoL", O_RDONLY) = 828
>>> open("/a/tmp/submit/tmpfile.send.q9SNoL", O_RDONLY) = 829
>>> unlink("/a/tmp/submit/tmpfile.send.q9SNoL") = 0
>>>
>>> The application behavior shown above repeats indefinitely with most filenames
>>> changing during each iteration except for 'tmpfile'. Looking into this issue I
>>> see that vfs_rename_other() only takes i_mutex for the new inode and the new
>>> inode's directory as well as the old directory's mutex. This works for
>>> modifying the dir entry and appears to be fine for most filesystems, but
>>> ext2 and a few others (exofs, minix, nilfs2, omfs, sysv, ufs) modify i_nlink
>>> inside of their respective rename functions without grabbing the i_mutex. The
>>> modifications are done through calls to inode_inc_link_count(old_inode) and
>>> inode_dec_link_count(old_inode), etc.
>>>
>>> Taking the mutex for the old inode appears to resolve the issue of the
>>> lost files/unattached inodes that I am seeing with this workload. I've attached
>>> a patch below doing what I've described above. If this is an accepted solution
>>> I believe other filesystems may also be affected by this and I could provide
>>> a patch for those as well.
>>
>> I don't know... The thing is, we mostly do that to make life easier for
>> fsck in case of crash. Other than that, there's no reason to play with
>> link count of that sucker at all. The question is, do we really want
>> such rename() interrupted by dirty shutdown to result in what looks like two
>> legitimate links to that inode without any indications of what had happened?
>> Note that fsck (at least on ext2) will correct link counts anyway and if
>> nothing else, we probably want some noise pointing to the inode in question...
> Yeah, I agree that playing with the link count is not worth it. It is
> even more disputable because it would have some reasonable effect only if
> we happened to write out the moved inode after it is linked to the new
> directory and before it is unlinked from the old one. Moreover we'd need
> to writeout the new directory and not the old directory before crash
> happens. All this is highly unlikely and even if that happens, it is
> questionable whether the result is worth it. So I'll just do away with
> those games with link count...
> The patch is attached. Josh, can you test it as well? Thanks.
>
> Honza
Jan
I'm not seeing the problem with your patch as was expected since we're
not messing with i_nlink anymore. Al suggested marking the inode as
dirty where we were previously doing the old_inode dec. I believe this
is needed as well since we are updating it's ctime. I've attached a
version marking the inode dirty and it also fixes the comment making
reference to calling inode_dec_link_count().
I'm not completely clear on the historical reasons for messing with the
link count of old_inode in the first place. It was just to simulate the
linking and unlinking of the old_inode?
Thanks
Josh
[-- Attachment #2: 0001-ext2-Resolve-i_nlink-count-corruption-with-heavy-unl.patch --]
[-- Type: text/x-patch, Size: 2039 bytes --]
>From cb347219bc21622dee06491b689c711e452005ce Mon Sep 17 00:00:00 2001
From: Josh Hunt <johunt@akamai.com>
Date: Thu, 24 Feb 2011 11:56:43 -0800
Subject: [PATCH] ext2: Resolve i_nlink count corruption with heavy unlink and rename
old_inode's i_nlink count can become corrupt in ext2_rename() because we do not
grab it's i_mutex in vfs_rename_other(). Looking into this further it has been
determined there is no good reason for changing old_inode's link counts at all.
Instead we just mark the inode as dirty when we are done.
CC: Al Viro <viro@ZenIV.linux.org.uk>
CC: Jan Kara <jack@suse.cz>
Signed-off-by: Josh Hunt <johunt@akamai.com>
---
fs/ext2/namei.c | 9 ++-------
1 files changed, 2 insertions(+), 7 deletions(-)
diff --git a/fs/ext2/namei.c b/fs/ext2/namei.c
index 2e1d834..adb9185 100644
--- a/fs/ext2/namei.c
+++ b/fs/ext2/namei.c
@@ -344,7 +344,6 @@ static int ext2_rename (struct inode * old_dir, struct dentry * old_dentry,
new_de = ext2_find_entry (new_dir, &new_dentry->d_name, &new_page);
if (!new_de)
goto out_dir;
- inode_inc_link_count(old_inode);
ext2_set_link(new_dir, new_de, new_page, old_inode, 1);
new_inode->i_ctime = CURRENT_TIME_SEC;
if (dir_de)
@@ -356,12 +355,9 @@ static int ext2_rename (struct inode * old_dir, struct dentry * old_dentry,
if (new_dir->i_nlink >= EXT2_LINK_MAX)
goto out_dir;
}
- inode_inc_link_count(old_inode);
err = ext2_add_link(new_dentry, old_inode);
- if (err) {
- inode_dec_link_count(old_inode);
+ if (err)
goto out_dir;
- }
if (dir_de)
inode_inc_link_count(new_dir);
}
@@ -369,12 +365,11 @@ static int ext2_rename (struct inode * old_dir, struct dentry * old_dentry,
/*
* Like most other Unix systems, set the ctime for inodes on a
* rename.
- * inode_dec_link_count() will mark the inode dirty.
*/
old_inode->i_ctime = CURRENT_TIME_SEC;
+ mark_inode_dirty(old_inode);
ext2_delete_entry (old_de, old_page);
- inode_dec_link_count(old_inode);
if (dir_de) {
if (old_dir != new_dir)
--
1.7.0.4
next prev parent reply other threads:[~2011-02-24 20:18 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-02-24 6:21 [RFC][PATCH] ext2: Resolve i_nlink race in ext2_rename Josh Hunt
2011-02-24 6:37 ` Al Viro
2011-02-24 6:42 ` fat64 implementation Keshava Munegowda
2011-02-24 11:20 ` [RFC][PATCH] ext2: Resolve i_nlink race in ext2_rename Jan Kara
2011-02-24 20:18 ` Josh Hunt [this message]
2011-02-25 7:38 ` Marco Stornelli
2011-02-28 17:57 ` Jan Kara
2011-02-28 20:17 ` Josh Hunt
2011-02-28 20:56 ` Jan Kara
2011-03-01 12:15 ` Al Viro
2011-03-01 11:07 ` Christoph Hellwig
-- strict thread matches above, loose matches on Subject: below --
2011-02-22 23:42 Josh Hunt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D66BD18.2030207@akamai.com \
--to=johunt@akamai.com \
--cc=jack@suse.cz \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=sandeen@redhat.com \
--cc=viro@ZenIV.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).