From: Li Zefan <lizefan@huawei.com>
To: Jan Kara <jack@suse.cz>
Cc: <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Andrew Morton <akpm@linux-foundation.org>, <andi@firstfloor.org>,
	Wuqixuan <wuqixuan@huawei.com>, Al Viro <viro@ZenIV.linux.org.uk>,
	<gregkh@linuxfoundation.org>
Subject: Re: [RFC][PATCH] vfs: always protect directory file->fpos with inode mutex
Date: Tue, 19 Feb 2013 19:48:08 +0800
Message-ID: <51236678.3040509@huawei.com>
In-Reply-To: <20130219091931.GB21945@quack.suse.cz>

On 2013/2/19 17:19, Jan Kara wrote:
> On Tue 19-02-13 09:22:40, Li Zefan wrote:
>> There's a very long-standing bug; I don't even know when it dates
>> from.
>>
>> I've written and attached a simple program to reproduce this bug, and it can
>> trigger the bug immediately on my box. It uses two threads, one repeatedly
>> calling read() and the other readdir(), both on the same directory fd.
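The attached program is not preserved in this archive. What follows is a minimal sketch of the same idea, not Li's original program. It assumes Linux with glibc and pthreads. One thread keeps calling read() on a directory fd. The other rewinds the same fd and walks it with the raw getdents64 syscall, which is the syscall underneath readdir(3). The names `run_race()` and `struct race_ctx` and the iteration count are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

/* State shared by the two racing threads (hypothetical names). */
struct race_ctx {
	int dirfd;            /* ONE directory fd used by both threads */
	long iterations;      /* bounded so the sketch always terminates */
	long getdents_errors; /* failed getdents64() calls; written only
	                         by the readdir thread */
};

/* Thread 1: keep calling read() on the directory. It normally fails
 * (e.g. EISDIR), but on the kernels discussed in this thread sys_read()
 * still wrote its saved copy of f_pos back, without holding i_mutex. */
static void *read_thread(void *arg)
{
	struct race_ctx *ctx = arg;
	char buf[64];

	for (long i = 0; i < ctx->iterations; i++) {
		ssize_t r = read(ctx->dirfd, buf, sizeof(buf));
		(void)r; /* failure is expected for a directory */
	}
	return NULL;
}

/* Thread 2: rewind and list the directory with getdents64(). */
static void *readdir_thread(void *arg)
{
	struct race_ctx *ctx = arg;
	char buf[4096];

	for (long i = 0; i < ctx->iterations; i++) {
		long n;

		lseek(ctx->dirfd, 0, SEEK_SET);
		while ((n = syscall(SYS_getdents64, ctx->dirfd, buf, sizeof(buf))) > 0)
			; /* discard entries; only errors are counted */
		if (n < 0)
			ctx->getdents_errors++;
	}
	return NULL;
}

/* Returns the number of getdents64() errors, or -1 on setup failure. */
long run_race(const char *dir, long iterations)
{
	struct race_ctx ctx = { .iterations = iterations };
	pthread_t reader, lister;

	ctx.dirfd = open(dir, O_RDONLY | O_DIRECTORY);
	if (ctx.dirfd < 0)
		return -1;
	if (pthread_create(&reader, NULL, read_thread, &ctx) != 0) {
		close(ctx.dirfd);
		return -1;
	}
	if (pthread_create(&lister, NULL, readdir_thread, &ctx) != 0) {
		pthread_join(reader, NULL);
		close(ctx.dirfd);
		return -1;
	}
	pthread_join(reader, NULL);
	pthread_join(lister, NULL);
	close(ctx.dirfd);
	return ctx.getdents_errors;
}
```

Li's report shows the damage in the kernel log, in the EXT3-fs errors quoted further down. On an affected ext3 kernel, dmesg is therefore the place to look. The getdents64() error count is only a rough userspace signal.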
>   So the fact that read(), or even write(), on an fd opened O_RDONLY has
> *any* effect on f_pos looks really unexpected to me. I think the code
> there should really be:
> 	if (ret >= 0)
> 		file_pos_write(...);

I thought about this. The problem is that we would then have to audit every
fop->write() to see whether any of them can return -errno after changing
file->f_pos, and fix them. It's doable, though.

>   That would solve problems with read() and write() on directories for
> pretty much every filesystem, since the first usually returns -EISDIR and
> the second -EBADF.

Yeah, it seems ceph is the only filesystem that allows read() on directories.
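Jan's proposed guard can be modeled in userspace. This is a sketch under stated assumptions. `struct toy_file`, `toy_sys_read()` and `toy_dir_read()` are hypothetical stand-ins, not kernel APIs, for the file_pos_read()/vfs_read()/file_pos_write() sequence quoted below. The saved position is written back only when the read op succeeds. A failing read() on a directory therefore can no longer overwrite f_pos.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Toy stand-in for struct file (hypothetical, userspace only). */
struct toy_file {
	long long f_pos;
	/* Returns bytes read or -errno; may advance *pos. */
	ssize_t (*read)(struct toy_file *f, char *buf, size_t count,
			long long *pos);
};

/* read() path with the proposed guard: pos is written back only on
 * success, so an -EISDIR from a directory leaves f_pos alone. */
ssize_t toy_sys_read(struct toy_file *f, char *buf, size_t count)
{
	long long pos = f->f_pos;                   /* like file_pos_read() */
	ssize_t ret = f->read(f, buf, count, &pos); /* like vfs_read() */

	if (ret >= 0)
		f->f_pos = pos;                     /* like file_pos_write() */
	return ret;
}

/* What most filesystems do for read() on a directory. */
ssize_t toy_dir_read(struct toy_file *f, char *buf, size_t count,
		     long long *pos)
{
	(void)f; (void)buf; (void)count; (void)pos;
	return -EISDIR;
}
```

Li's concern still applies to this design. Any ->write() that changes the position and then returns -errno would silently lose that update under the guard, so each implementation would need auditing.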

> 
>> When I ran it on ext3 (ext2 or ext4 would do as well) with the _dir_index_
>> feature disabled, I got this:
>>
>> EXT3-fs error (device loop1): ext3_readdir: bad entry in directory #34817: rec_len is smaller than minimal - offset=993, inode=0, rec_len=0, name_len=0
>> EXT3-fs error (device loop1): ext3_readdir: bad entry in directory #34817: rec_len is smaller than minimal - offset=1009, inode=0, rec_len=0, name_len=0
>> EXT3-fs error (device loop1): ext3_readdir: bad entry in directory #34817: rec_len is smaller than minimal - offset=993, inode=0, rec_len=0, name_len=0
>> EXT3-fs error (device loop1): ext3_readdir: bad entry in directory #34817: rec_len is smaller than minimal - offset=1009, inode=0, rec_len=0, name_len=0
>> ...
>>
>> If the filesystem is mounted with errors=remount-ro, it will become read-only.
>>
>> SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
>> {
>> 	...
>> 		loff_t pos = file_pos_read(file);
>> 		ret = vfs_read(file, buf, count, &pos);
>> 		file_pos_write(file, pos);
>> 		fput_light(file, fput_needed);
>> 	...
>> }
>>
>> While readdir() is protected by i_mutex, f_pos can be changed without
>> any locking by the various read()/write() syscalls, which leads to this bug.
>>
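The approach named in the subject line is to serialize every f_pos update on a directory with the inode mutex. It can be modeled with a pthread mutex. This is a hedged sketch of the idea, not the patch itself. `struct model_inode`, `model_read()` and `model_readdir_step()` are hypothetical names. The readdir side is reduced to "advance f_pos by one record" under the same lock. The read path now snapshots f_pos and writes it back while holding the mutex, so it can no longer clobber an advance made by readdir.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of an inode with its i_mutex. */
struct model_inode {
	pthread_mutex_t i_mutex;
	bool is_dir;
};

struct model_file {
	struct model_inode *inode;
	long long f_pos;
	ssize_t (*read)(struct model_file *f, char *buf, size_t count,
			long long *pos);
};

/* read() path: for directories, the whole read-modify-write of f_pos
 * happens under i_mutex, the same lock readdir holds. */
ssize_t model_read(struct model_file *f, char *buf, size_t count)
{
	bool lock = f->inode->is_dir;
	long long pos;
	ssize_t ret;

	if (lock)
		pthread_mutex_lock(&f->inode->i_mutex);
	pos = f->f_pos;
	ret = f->read(f, buf, count, &pos);
	f->f_pos = pos;
	if (lock)
		pthread_mutex_unlock(&f->inode->i_mutex);
	return ret;
}

/* readdir() side: consume one directory record of reclen bytes. */
void model_readdir_step(struct model_file *f, long long reclen)
{
	pthread_mutex_lock(&f->inode->i_mutex);
	f->f_pos += reclen;
	pthread_mutex_unlock(&f->inode->i_mutex);
}
```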
>> What makes things worse is that Andi removed i_mutex from generic_file_llseek(),
>> so you can trigger the same bug by replacing read() with lseek() in the
>> test program.
>   Yes, and here I'd say it's a filesystem issue. If a filesystem needs f_pos
> to be changed only under i_mutex, it should use default_llseek() or take the
> mutex itself. That's what the callback is for. We shouldn't needlessly impose
> the i_mutex restriction on llseek of a directory on every filesystem.
> 
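Under Jan's suggestion, locking is left to each filesystem's ->llseek. Below is a userspace model of an llseek that takes the inode mutex, in the spirit of default_llseek(), which takes i_mutex in kernels of this period. Only SEEK_SET and SEEK_CUR are handled. `struct dir_model` and `dir_llseek_locked()` are hypothetical names.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h> /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Hypothetical directory model: one mutex standing in for i_mutex. */
struct dir_model {
	pthread_mutex_t i_mutex;
	long long f_pos;
};

/* ->llseek that updates f_pos only while holding i_mutex, so it
 * cannot interleave with a readdir that holds the same lock. */
long long dir_llseek_locked(struct dir_model *d, long long offset, int whence)
{
	long long ret;

	pthread_mutex_lock(&d->i_mutex);
	switch (whence) {
	case SEEK_CUR:
		offset += d->f_pos;
		/* fall through */
	case SEEK_SET:
		if (offset < 0) {
			ret = -EINVAL;
			break;
		}
		d->f_pos = offset;
		ret = offset;
		break;
	default:
		ret = -EINVAL; /* SEEK_END and others are not modeled */
		break;
	}
	pthread_mutex_unlock(&d->i_mutex);
	return ret;
}
```

This places the cost where Jan argues it belongs. Only filesystems that opt in serialize llseek on directories. Every other filesystem keeps the lock-free generic_file_llseek().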

One of my concerns is that concurrent lseek() and readdir() don't seem to be
well tested. I'll add a test case to xfstests.
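A test of the kind Li proposes might look like the sketch below. It is not an actual xfstests case, and `check_lseek_readdir()` and its parameters are hypothetical. One thread keeps rewinding a shared directory fd with lseek(). Another walks the same fd with getdents64 and checks that every returned record has a plausible d_reclen. The record layout follows getdents(2).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Layout of the records returned by getdents64 (see getdents(2)). */
struct linux_dirent64 {
	uint64_t d_ino;
	int64_t d_off;
	unsigned short d_reclen;
	unsigned char d_type;
	char d_name[];
};

struct seek_ctx {
	int fd;           /* directory fd shared by both threads */
	long iterations;  /* bounded so the test always terminates */
	long bad_records; /* written only by the listing thread */
};

/* Thread 1: keep resetting the shared directory offset. */
static void *seek_thread(void *arg)
{
	struct seek_ctx *c = arg;

	for (long i = 0; i < c->iterations; i++)
		lseek(c->fd, 0, SEEK_SET);
	return NULL;
}

/* Thread 2: read entries and validate each record header. */
static void *list_thread(void *arg)
{
	struct seek_ctx *c = arg;
	_Alignas(8) char buf[4096];

	for (long i = 0; i < c->iterations; i++) {
		long n = syscall(SYS_getdents64, c->fd, buf, sizeof(buf));

		if (n < 0) {
			c->bad_records++;
			continue;
		}
		for (long off = 0; off < n; ) {
			struct linux_dirent64 *d = (void *)(buf + off);

			/* zero or overrunning reclen means a corrupt walk */
			if (d->d_reclen == 0 || off + d->d_reclen > n) {
				c->bad_records++;
				break;
			}
			off += d->d_reclen;
		}
	}
	return NULL;
}

/* Returns the number of bad records/errors, or -1 on setup failure. */
long check_lseek_readdir(const char *dir, long iterations)
{
	struct seek_ctx c = { .iterations = iterations };
	pthread_t seeker, lister;

	c.fd = open(dir, O_RDONLY | O_DIRECTORY);
	if (c.fd < 0)
		return -1;
	if (pthread_create(&seeker, NULL, seek_thread, &c) != 0) {
		close(c.fd);
		return -1;
	}
	if (pthread_create(&lister, NULL, list_thread, &c) != 0) {
		pthread_join(seeker, NULL);
		close(c.fd);
		return -1;
	}
	pthread_join(seeker, NULL);
	pthread_join(lister, NULL);
	close(c.fd);
	return c.bad_records;
}
```

The directory should hold enough entries that a walk spans several records. A zero result only means this run saw nothing wrong. On an affected kernel, the filesystem's own complaints in dmesg are still worth checking.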
