From: "zhangyi (F)" <yi.zhang@huawei.com>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
<Valdis.Kletnieks@vt.edu>, <linux-ext4@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>,
<adilger.kernel@dilger.ca>, Jan Kara <jack@suse.cz>,
<viro@zeniv.linux.org.uk>, <miaoxie@huawei.com>
Subject: Re: [RFC PATCH] ext4: increase the protection of drop nlink and ext4 inode destroy
Date: Mon, 16 Jan 2017 11:24:46 +0800
Message-ID: <b891c283-c83b-83fd-91f6-db25529b3c4a@huawei.com>
In-Reply-To: <20170111153449.ourcta6jraxo4mzy@thunk.org>
On 2017/1/11 23:34, Theodore Ts'o wrote:
> On Wed, Jan 11, 2017 at 05:07:29PM +0800, zhangyi (F) wrote:
>>
>> (1) The file we want to unlink have many hard links, but only one dcache entry in memory.
>> (2) open this file, but it's inode->i_nlink read from disk was 1 (too low).
>> (3) some one call rename and drop it's i_nlink to zero.
>> (4) it's inode is still in use and do not destroy (not closed), at the same time,
>> some others open it's hard link and create a dcache entry.
>> (5) call rename again and it's i_nlink will still underflow and cause memory corruption.
>
> Do you have reproducers that make it easy to reproduce situations like
> this? (It shouldn't be hard to write, but if you have them already
> will save me some effort. :-)
>
I have written a reproducer. The following steps reproduce the problem easily:
1) Mount an ext4 file system and create 3 files and 1 hard link:
#mount /dev/sdax /mnt
#cd /mnt
#touch old_file1 old_file2 new_file
#ln new_file new_link1
2) Unmount the file system and use debugfs to set new_file's
links_count to 1, which simulates the fs inconsistency:
#umount /mnt
#debugfs /dev/sdax -w
set_inode_field new_file links_count 1
3) Mount the fs again, then run the following program in /mnt (note:
do not run ls first, as it would create the second dcache entry):
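#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define RENAME_OLD_FILE_1 "old_file1"
#define RENAME_OLD_FILE_2 "old_file2"
#define RENAME_NEW_FILE "new_file"
#define NEW_FILE_LINK_1 "new_link1"

int main(int argc, char *argv[])
{
	int fd = 0;
	int err = 0;

	/* Keep new_file open so its inode stays in use after i_nlink reaches 0. */
	fd = open(RENAME_NEW_FILE, O_RDONLY);
	if (fd < 0) {
		printf("open error:%d\n", errno);
		return -1;
	}

	/* First rename replaces new_file and drops i_nlink from 1 to 0. */
	err = rename(RENAME_OLD_FILE_1, RENAME_NEW_FILE);
	if (err < 0) {
		printf("rename error:%d\n", errno);
		close(fd);
		return -1;
	}

	/* Second rename replaces the other hard link and drops i_nlink again. */
	err = rename(RENAME_OLD_FILE_2, NEW_FILE_LINK_1);
	if (err < 0) {
		printf("rename error:%d\n", errno);
		close(fd);
		return -1;
	}

	close(fd);
	return 0;
}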
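To watch the link count from userspace while stepping through the program above, one option is to call fstat() on the descriptor that is held open and read st_nlink. The helper below is only a sketch; link_count_of_fd is a hypothetical name, not part of the reproducer, and what st_nlink shows after the underflow has not been checked here.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: return the link count the kernel reports for an
 * open descriptor, or -1 if fstat() fails.
 */
static long link_count_of_fd(int fd)
{
	struct stat st;

	assert(fd >= 0);
	if (fstat(fd, &st) != 0)
		return -1;
	return (long)st.st_nlink;
}
```

Calling it on fd after open() and again after each rename() shows how the in-memory count changes while the file is still open.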
4) After this, new_file's inode->i_nlink has underflowed and the inode has been
added to the orphan list. The kernel dumps a warning like this:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1814 at fs/inode.c:282 drop_nlink+0x3e/0x50
...
Call Trace:
dump_stack+0x63/0x86
__warn+0xcb/0xf0
warn_slowpath_null+0x1d/0x20
drop_nlink+0x3e/0x50
ext4_rename+0x532/0x8c0
ext4_rename2+0x1d/0x30
vfs_rename+0x728/0x940
? __lookup_hash+0x20/0xa0
SyS_rename+0x3ba/0x3e0
entry_SYSCALL_64_fastpath+0x1a/0xa9
...
---[ end trace b157dacbc891e6e8 ]---
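A minimal userspace sketch of why the warning fires, assuming drop_nlink() warns when the count is already zero and then decrements anyway (which matches my reading of fs/inode.c at the time). The struct and function names are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical stand-in for the inode; only the link count is modelled. */
struct model_inode {
	unsigned int i_nlink;	/* unsigned, like the kernel's i_nlink */
	int warned;		/* set when a drop is attempted at zero */
};

/* Same shape as drop_nlink(): warn at zero, then decrement regardless. */
static void model_drop_nlink(struct model_inode *inode)
{
	assert(inode != NULL);
	if (inode->i_nlink == 0) {
		inode->warned = 1;
		fprintf(stderr, "WARNING: drop on inode with i_nlink == 0\n");
	}
	inode->i_nlink--;	/* unsigned arithmetic wraps around at zero */
}

/*
 * Replay the reproducer: the in-memory count starts at the (possibly
 * corrupted) on-disk value, and each rename over a link drops it once.
 */
static unsigned int model_replay(unsigned int on_disk_links,
				 int renames_over_links)
{
	struct model_inode inode = { .i_nlink = on_disk_links, .warned = 0 };

	for (int i = 0; i < renames_over_links; i++)
		model_drop_nlink(&inode);
	return inode.i_nlink;
}
```

With a consistent count, model_replay(2, 2) ends at 0. With the corrupted count, model_replay(1, 2) wraps to UINT_MAX, which is the underflow described in step 4.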
5) Then we trigger memory shrinking. The inode is destroyed while it is still
on the orphan list:
#echo 3 > /proc/sys/vm/drop_caches
Kernel dump:
EXT4-fs (sdb1): Inode 16 (ffff98f4b3285c20): orphan list check failed!
...
ffff98f4b3285d30: fa87e800 ffff98f4 b3285e80 ffff98f4 .........^(.....
ffff98f4b3285d40: b20829d8 ffff98f4 00000010 00000000 .)..............
ffff98f4b3285d50: ffffffff 00000000 00000000 00000000 ................
...
Call Trace:
dump_stack+0x63/0x86
ext4_destroy_inode+0xa0/0xb0
destroy_inode+0x3b/0x60
evict+0x130/0x1c0
dispose_list+0x4d/0x70
prune_icache_sb+0x5a/0x80
super_cache_scan+0x14b/0x1a0
shrink_slab.part.40+0x1f5/0x420
shrink_slab+0x29/0x30
drop_slab_node+0x31/0x60
drop_slab+0x3f/0x70
drop_caches_sysctl_handler+0x71/0xc0
proc_sys_call_handler+0xea/0x110
proc_sys_write+0x14/0x20
__vfs_write+0x37/0x160
? selinux_file_permission+0xd7/0x110
? security_file_permission+0x3b/0xc0
vfs_write+0xb5/0x1a0
SyS_write+0x55/0xc0
entry_SYSCALL_64_fastpath+0x1a/0xa9
...
bash (1594): drop_caches: 3
6) Some time later, any change to the orphan list will cause memory corruption,
because the list still references the destroyed inode.
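The danger in steps 5 and 6 can be sketched with a small userspace model of a doubly linked list. It assumes the orphan list is a circular list threaded through each inode (i_orphan), and that "safe to destroy" means the inode's node is no longer linked. All names below are hypothetical and simplified, not the ext4 implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified circular doubly linked list node. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *n)
{
	n->prev = n;
	n->next = n;
}

static bool list_is_empty(const struct list_node *n)
{
	return n->next == n;
}

static void list_add_after(struct list_node *head, struct list_node *n)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_remove(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);	/* leave the node self-linked, i.e. "empty" */
}

/* Hypothetical inode model: only the orphan list linkage. */
struct orphan_inode {
	struct list_node i_orphan;
};

/*
 * An inode may be freed only when it is off the orphan list. Freeing it
 * while linked leaves its neighbours pointing at freed memory, so the
 * next list update writes through dangling pointers.
 */
static bool model_safe_to_destroy(const struct orphan_inode *inode)
{
	assert(inode != NULL);
	return list_is_empty(&inode->i_orphan);
}
```

In this model, an inode added with list_add_after() fails the check until list_remove() is called, which corresponds to the "orphan list check failed!" message in step 5.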
Thanks.
zhangyi