From: Dmitriy Monakhov <dmonakhov@sw.ru>
To: David Chinner <dgc@sgi.com>
Cc: Dmitriy Monakhov <dmonakhov@sw.ru>,
Dmitriy Monakhov <dmonakhov@openvz.org>,
linux-kernel@vger.kernel.org, devel@openvz.org,
Andrew Morton <akpm@osdl.org>,
xfs@oss.sgi.com
Subject: Re: [PATCH] incorrect direct io error handling
Date: Wed, 10 Jan 2007 17:36:57 +0300 [thread overview]
Message-ID: <87irffkkxi.fsf@sw.ru> (raw)
In-Reply-To: <20061220142631.GZ44411608@melbourne.sgi.com> (David Chinner's message of "Thu, 21 Dec 2006 01:26:31 +1100")
Sorry for long delay (russian holidays are very hard time :) )
David Chinner <dgc@sgi.com> writes:
> On Tue, Dec 19, 2006 at 09:07:12AM +0300, Dmitriy Monakhov wrote:
>> David Chinner <dgc@sgi.com> writes:
>> > On Mon, Dec 18, 2006 at 04:22:44PM +0300, Dmitriy Monakhov wrote:
>> >> diff --git a/mm/filemap.c b/mm/filemap.c
>> >> index 8332c77..7c571dd 100644
>> >> --- a/mm/filemap.c
>> >> +++ b/mm/filemap.c
>
> <snip stuff>
>
>> > You comment in the first hunk that i_mutex may not be held here,
>> > but there's no comment in __generic_file_aio_write_nolock() that the
>> > i_mutex must be held for !S_ISBLK devices.
>> Any one may call directly call generic_file_direct_write() with i_mutex not held.
>
> Only block devices based on the implementation (i.e. buffered I/O is
> done here). but one can't call vmtruncate without the i_mutex held,
> so if a filesystem is calling generic_file_direct_write() it won't
> be able to use __generic_file_aio_write_nolock() without the i_mutex
> held (because it can right now if it doesn't need the buffered I/O
> fallback path), then
>
>> >
>> >> @@ -2341,6 +2353,13 @@ ssize_t generic_file_aio_write_nolock(st
>> >> ssize_t ret;
>> >>
>> >> BUG_ON(iocb->ki_pos != pos);
>> >> + /*
>> >> + * generic_file_buffered_write() may be called inside
>> >> + * __generic_file_aio_write_nolock() even in case of
>> >> + * O_DIRECT for non S_ISBLK files. So i_mutex must be held.
>> >> + */
>> >> + if (!S_ISBLK(inode->i_mode))
>> >> + BUG_ON(!mutex_is_locked(&inode->i_mutex));
>> >>
>> >> ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs,
>> >> &iocb->ki_pos);
>> >
>> > I note that you comment here in generic_file_aio_write_nolock(),
>> > but it's not immediately obvious that this is refering to the
>> > vmtruncate() call in __generic_file_aio_write_nolock().
>> This is not about vmtruncate(). __generic_file_aio_write_nolock() may
>> call generic_file_buffered_write() even in case of O_DIRECT for !S_ISBLK, and
>
> No, the need for i_mutex is currently dependent on doing direct I/O
> and the return value from generic_file_buffered_write().
> A filesystem that doesn't fall back to buffered I/O (e.g. XFS) can currently
> use generic_file_aio_write_nolock() without needing to hold i_mutex.
> use generic_file_aio_write_nolock() without needing to hold i_mutex.
But it doesn't use it. XFS implement it's own write method with it's own locking
rules and explicitly call generic_file_direct_write() in case of O_DIRECT.
BTW XFS correctly handling ENOSPC in case of O_DIRECT (fs corruption not happend
after error occur).
>
> Your change prevents that by introducing a vmtruncate() before the
> generic_file_buffered_write() return value check, which means that a
> filesystem now _must_ hold the i_mutex when calling
> generic_file_aio_write_nolock() even when it doesn't do buffered I/O
> through this path.
Yes it's so. But it is just explicitly document the fact that every fs call
generic_file_aio_write_nolock() with i_mutex held (where is no any fs that
invoke it without i_mutex). As i understand Andrew Morton think so too:
http://lkml.org/lkml/2006/12/12/67
<snip>
I guess we can make that a rule (document it, add
BUG_ON(!mutex_is_locked(..)) if it isn't a blockdev) if needs be. After
really checking that this matches reality for all callers.
<snip>
>
>> generic_file_buffered_write() has documented locking rules (i_mutex held).
>> IMHO it is important to explicitly document this . And after we realize
>> that i_mutex always held, vmtruncate() may be safely called.
>
> I don't think changing the locking semantics of
> generic_file_aio_write_nolock() to require a lock for all
> filesystem-based users is a good way to fix a filesystem specific
> direct I/O problem which can be easily fixed in filesystem specific
> code - i.e. call vmtruncate() in ext3_file_write() on failure....
Where are more than 10 filesystems where we have to fix it then.
And fix is almost the same for all fs, so we have to do many copy/paste work
IMHO fix it inside generic_file_aio_write_nolock is realy straightforward way.
What do you think?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group
prev parent reply other threads:[~2007-01-10 14:37 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-12-18 13:22 [PATCH] incorrect direct io error handling Dmitriy Monakhov
2006-12-18 13:22 ` Dmitriy Monakhov
2006-12-18 19:56 ` Chen, Kenneth W
2006-12-18 19:56 ` Chen, Kenneth W
2006-12-19 6:31 ` Dmitriy Monakhov
2006-12-19 6:31 ` Dmitriy Monakhov
2006-12-18 22:15 ` David Chinner
2006-12-19 6:07 ` Dmitriy Monakhov
2006-12-20 14:26 ` David Chinner
2007-01-10 14:36 ` Dmitriy Monakhov [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87irffkkxi.fsf@sw.ru \
--to=dmonakhov@sw.ru \
--cc=akpm@osdl.org \
--cc=devel@openvz.org \
--cc=dgc@sgi.com \
--cc=dmonakhov@openvz.org \
--cc=linux-kernel@vger.kernel.org \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.