Re: [RFC] f_pos in readdir() (was Re: [RFC][PATCH] vfs: always protect diretory file->fpos with inode mutex)

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Li Zefan <lizefan@huawei.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>,
	Jan Kara <jack@suse.cz>, "Theodore Ts'o" <tytso@mit.edu>,
	Andrew Morton <akpm@linux-foundation.org>, <andi@firstfloor.org>,
	Wuqixuan <wuqixuan@huawei.com>, <gregkh@linuxfoundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [RFC] f_pos in readdir() (was Re: [RFC][PATCH] vfs: always protect diretory file->fpos with inode mutex)
Date: Mon, 25 Feb 2013 14:09:30 +0800	[thread overview]
Message-ID: <512B001A.8040704@huawei.com> (raw)
In-Reply-To: <20130223173521.GP4503@ZenIV.linux.org.uk>

>> We can fix this bug in each filesystem, but can't we just make sure i_mutex is
>> acquired in lseek(), read(), write() and readdir() for directory file operations?
>>
>> (the patch is for demonstration only)
> 
> No.  This is a very massive overkill.  If anything, we want to *reduce* the
> amount of time we hold ->i_mutex in that area.
> 
> There are several bugs mixed here:
> 	* disappearing exclusion between readdir and lseek for directories.
> Bad, since offset validation suddenly needs to be redone every time we look
> at ->f_pos in ->readdir() instances *and* since ->readdir() itself updates
> position, often by file->f_pos += something.
> 	* write(2) doing "get a copy of f_pos, try and fail ->write(),
> put that copy back".  With no locking whatsoever.  What we get is a f_pos
> value reverting to what it used to be at some random earlier point.  Makes
> life really nasty for everything that updates ->f_pos, obviously.
> 	* read(2) doing the same, *and* some directories apparently having
> ->read() now.
> 
> ->readdir() part of that would be the simplest one - we need to stop messing
> with ->f_pos and just pass an address of its copy, like we do for ->read()
> et.al.  Preserving the method prototype is not worth it and this particular
> method has needed an overhaul of calling conventions for many reasons.
> 
> The issue with write(2) and friends is potentially nastier.  I'm looking
> at the ->f_pos users right now, and while the majority are ->readdir()
> and ->llseek() instances, there's some stuff beyond that.  Some of that is
> done with struct file opened kernel-side and not accessible to userland;
> those are safe (and often really ugly - see drivers/media/pci/cx25821/
> hits for f_pos).  Some are simply wrong - e.g. dev_mem_read()
> (in drivers/net/wireless/ti/wlcore/debugfs.c) ignores *ppos value and uses
> file->f_pos instead; wrong behaviour for ->read() instance.  I'm about
> 20% through the list; so far everything seems to be possible to deal with
> (especially if we add a couple of helpers for common lseek variants and
> use existing generic_file_llseek_size()), so it might turn out to be
> not a serious issue, but it's a potential minefield.  Hell knows...
> 
> As for ->readdir(), I'd like to resurrect an old proposal to change the ABI
> of that sucker.  Quoting the thread from 4 years ago:
> ====
> As for the locking...  I toyed with one idea for a while: instead of passing
> a callback and one of many completely different structs, how about a common
> *beginning* of a struct, with callback stored in it along with several
> common fields?  Namely,
>         * count of filldir calls already made
>         * pointer to file in question
>         * "are we done" flag
> And instead of calling filldir(buf, ...) ->readdir() would call one of several
> helpers:
>         * readdir_emit(buf, ...) - obvious
>         * readdir_relax(buf) - called in a spot convenient for dropping
> and retaking lock; returns whether we need to do revalidation.
>         * readdir_eof(buf) - end of directory
>         * maybe readdir_dot() and readdir_dotdot() - those are certainly
> common enough
> That's the obvious flagday stuff, but since we need to give serious beating
> to most of the instances anyway...  Might as well consider something in
> that direction.
> ====
> 
> Back then it didn't go anywhere, but if we really go for change of calling
> conventions (passing a pointer to copy of ->f_pos), it would make a lot of
> sense, IMO.  Note that ->i_mutex contention could be seriously relieved that
> way - e.g. ext* would just call readdir_relax() at the block boundaries,
> since those locations are always valid there, etc.
> 

So there will be no lock to protect f_pos in read/write/llseek in your proposal.
Do we need to care about reading/writing fpos in 32 bit machine is not atomic?

next prev parent reply	other threads:[~2013-02-25  6:09 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-19  1:22 [RFC][PATCH] vfs: always protect diretory file->fpos with inode mutex Li Zefan
2013-02-19  4:06 ` Miao Xie
2013-02-19  9:19 ` Jan Kara
2013-02-19 11:47   ` Li Zefan
2013-02-19 12:59     ` Jan Kara
2013-02-20  1:49       ` Li Zefan
2013-02-19 11:48   ` Li Zefan
2013-02-19 12:33 ` Zheng Liu
2013-02-19 12:43   ` Li Zefan
2013-02-23 17:35 ` [RFC] f_pos in readdir() (was Re: [RFC][PATCH] vfs: always protect diretory file->fpos with inode mutex) Al Viro
2013-02-25  6:09   ` Li Zefan [this message]
2013-02-25 18:25   ` Zach Brown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=512B001A.8040704@huawei.com \
    --to=lizefan@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    --cc=viro@ZenIV.linux.org.uk \
    --cc=wuqixuan@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).