Re: [RFC][PATCH] vfs: always protect diretory file->fpos with inode mutex

From: Jan Kara <jack@suse.cz>
To: Li Zefan <lizefan@huawei.com>
Cc: Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>,
	Theodore Ts'o <tytso@mit.edu>,
	Andrew Morton <akpm@linux-foundation.org>,
	andi@firstfloor.org, Wuqixuan <wuqixuan@huawei.com>,
	Al Viro <viro@ZenIV.linux.org.uk>,
	gregkh@linuxfoundation.org
Subject: Re: [RFC][PATCH] vfs: always protect diretory file->fpos with inode mutex
Date: Tue, 19 Feb 2013 13:59:13 +0100
Message-ID: <20130219125913.GD21945@quack.suse.cz>
In-Reply-To: <51236652.1050608@huawei.com>

On Tue 19-02-13 19:47:30, Li Zefan wrote:
> On 2013/2/19 17:19, Jan Kara wrote:
> > On Tue 19-02-13 09:22:40, Li Zefan wrote:
> >> There's a long-standing bug; it has been around so long that I don't
> >> know when it dates from.
> >>
> >> I've written and attached a simple program to reproduce this bug, and it can
> >> immediately trigger the bug in my box. It uses two threads, one keeps calling
> >> read(), and the other calling readdir(), both on the same directory fd.
> >   So the fact that read() or even write() to an fd opened O_RDONLY has
> > *any* effect on f_pos looks really unexpected to me. I think we really
> > should have this there:
> > 	if (ret >= 0)
> > 		file_pos_write(...);
> 
> I thought about this. The problem is that we then have to check every
> fop->write() to see whether any of them can return -errno with file->f_pos
> changed, and fix them, though it's doable.
  But returning an error and advancing f_pos would be a bug: the
specification says write() returns the number of bytes written or -1, and
f_pos should be advanced by the number of bytes written.
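
A minimal userspace sketch of the guard discussed above, not kernel code. The
names model_file, model_vfs_read_fail, model_vfs_read_ok, and model_sys_read
are hypothetical stand-ins introduced only for illustration; the model assumes
a read implementation that modifies the position it was handed and then fails.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical userspace stand-in for struct file; only f_pos is modelled. */
struct model_file {
	long long f_pos;
};

typedef ssize_t (*model_read_fn)(struct model_file *, size_t, long long *);

/*
 * Hypothetical misbehaving read: it advances the caller's local position
 * and then fails, as a read() on a directory might fail with -EISDIR.
 */
static ssize_t model_vfs_read_fail(struct model_file *file, size_t count,
				   long long *pos)
{
	(void)file;
	*pos += (long long)count;
	return -EISDIR;
}

/* Hypothetical successful read of 'count' bytes. */
static ssize_t model_vfs_read_ok(struct model_file *file, size_t count,
				 long long *pos)
{
	(void)file;
	*pos += (long long)count;
	return (ssize_t)count;
}

/* Models the read() syscall path with the "if (ret >= 0)" guard added. */
static ssize_t model_sys_read(struct model_file *file, size_t count,
			      model_read_fn do_read)
{
	long long pos = file->f_pos;		/* models file_pos_read() */
	ssize_t ret = do_read(file, count, &pos);

	if (ret >= 0)
		file->f_pos = pos;		/* models file_pos_write() */
	return ret;
}
```

In this model a failing read leaves f_pos untouched no matter what the read
implementation did to its local copy, while a successful read still advances
f_pos by the number of bytes returned.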

> >   That would solve problems with read() and write() on directories for
> > pretty much every filesystem since the first usually returns -EISDIR and
> > the second -EBADF.
> 
> Yeah, seems ceph is the only filesystem that allows read() on directories.
> 
> >> When I ran it on ext3 (ext2 or ext4 works as well) with the dir_index
> >> feature disabled, I got this:
> >>
> >> EXT3-fs error (device loop1): ext3_readdir: bad entry in directory #34817: rec_len is smaller than minimal - offset=993, inode=0, rec_len=0, name_len=0
> >> EXT3-fs error (device loop1): ext3_readdir: bad entry in directory #34817: rec_len is smaller than minimal - offset=1009, inode=0, rec_len=0, name_len=0
> >> EXT3-fs error (device loop1): ext3_readdir: bad entry in directory #34817: rec_len is smaller than minimal - offset=993, inode=0, rec_len=0, name_len=0
> >> EXT3-fs error (device loop1): ext3_readdir: bad entry in directory #34817: rec_len is smaller than minimal - offset=1009, inode=0, rec_len=0, name_len=0
> >> ...
> >>
> >> If errors=remount-ro is configured, the filesystem becomes read-only.
> >>
> >> SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
> >> {
> >> 	...
> >> 		loff_t pos = file_pos_read(file);
> >> 		ret = vfs_read(file, buf, count, &pos);
> >> 		file_pos_write(file, pos);
> >> 		fput_light(file, fput_needed);
> >> 	...
> >> }
> >>
> >> While readdir() is protected with i_mutex, f_pos can be changed without
> >> any locking in various read()/write() syscalls, which leads to this bug.
> >>
> >> What makes things worse is Andi removed i_mutex from generic_file_llseek,
> >> so you can trigger the same bug by replacing read() with lseek() in the
> >> test program.
> >   Yes, and here I'd say it's a filesystem issue. If a filesystem needs
> > f_pos changed only under i_mutex, it should use default_llseek() or take
> > the mutex itself. That's what the callback is for. We shouldn't
> > unnecessarily impose the i_mutex restriction on llseek on a directory for
> > every filesystem.
> 
> One of my concerns is that concurrent lseek() and readdir() don't seem to
> be well tested. I'll add a test case in xfstests.
  Yes, that might be a useful test to add.
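
A rough sketch of what such a stress test might look like. It is neither the
reproducer attached to the original report nor an xfstests case. The helper
name race_lseek_getdents is hypothetical, and the sketch uses fork() rather
than threads, on the assumption that a forked child shares the parent's open
file description and therefore the same f_pos. It calls getdents64 directly
through syscall(2), so it is Linux-specific.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: race lseek() against getdents64() on one directory
 * file description. Returns the number of failed getdents64() calls, or -1
 * if the directory could not be opened or the child could not be started.
 */
static long race_lseek_getdents(const char *dir, long iterations)
{
	char buf[4096];
	long failures = 0;
	pid_t child;
	int fd = open(dir, O_RDONLY | O_DIRECTORY);

	if (fd < 0)
		return -1;

	child = fork();
	if (child < 0) {
		close(fd);
		return -1;
	}
	if (child == 0) {
		/* Child: keep rewinding the shared file position. */
		for (;;)
			lseek(fd, 0, SEEK_SET);
	}

	/* Parent: keep reading entries through the same file position. */
	for (long i = 0; i < iterations; i++) {
		long n = syscall(SYS_getdents64, fd, buf, sizeof(buf));

		if (n < 0)
			failures++;
		else if (n == 0)
			lseek(fd, 0, SEEK_SET);	/* end of directory: restart */
	}

	kill(child, SIGKILL);
	waitpid(child, NULL, 0);
	close(fd);
	return failures;
}
```

A caller would point this at a directory on the filesystem under test, for
example race_lseek_getdents("/mnt/test/dir", 1000000) with a path of its own
choosing, and then check both the return value and the kernel log for
filesystem errors.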

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
