From: Dmitriy Monakhov <dmonakhov@sw.ru>
To: David Chinner <dgc@sgi.com>
Cc: Dmitriy Monakhov <dmonakhov@openvz.org>,
linux-kernel@vger.kernel.org, devel@openvz.org,
Andrew Morton <akpm@osdl.org>,
xfs@oss.sgi.com
Subject: Re: [PATCH] incorrect direct io error handling
Date: Tue, 19 Dec 2006 09:07:12 +0300 [thread overview]
Message-ID: <87psagto4v.fsf@sw.ru> (raw)
In-Reply-To: <20061218221515.GN44411608@melbourne.sgi.com> (David Chinner's message of "Tue, 19 Dec 2006 09:15:15 +1100")
David Chinner <dgc@sgi.com> writes:
> On Mon, Dec 18, 2006 at 04:22:44PM +0300, Dmitriy Monakhov wrote:
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 8332c77..7c571dd 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -2044,8 +2044,9 @@ generic_file_direct_write(struct kiocb *
>> /*
>> * Sync the fs metadata but not the minor inode changes and
>> * of course not the data as we did direct DMA for the IO.
>> - * i_mutex is held, which protects generic_osync_inode() from
>> - * livelocking. AIO O_DIRECT ops attempt to sync metadata here.
>> + * i_mutex may not being held (XFS does this), if so some specific locking
>> + * ordering must protect generic_osync_inode() from livelocking.
>> + * AIO O_DIRECT ops attempt to sync metadata here.
>> */
>> if ((written >= 0 || written == -EIOCBQUEUED) &&
>> ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
>> @@ -2279,6 +2280,17 @@ __generic_file_aio_write_nolock(struct k
>>
>> written = generic_file_direct_write(iocb, iov, &nr_segs, pos,
>> ppos, count, ocount);
>> + /*
>> + * If host is not S_ISBLK generic_file_direct_write() may
>> + * have instantiated a few blocks outside i_size files
>> + * Trim these off again.
>> + */
>> + if (unlikely(written < 0) && !S_ISBLK(inode->i_mode)) {
>> + loff_t isize = i_size_read(inode);
>> + if (pos + count > isize)
>> + vmtruncate(inode, isize);
>> + }
>> +
>> if (written < 0 || written == count)
>> goto out;
>
> You comment in the first hunk that i_mutex may not be held here,
> but there's no comment in __generic_file_aio_write_nolock() that the
> i_mutex must be held for !S_ISBLK devices.
Any one may call directly call generic_file_direct_write() with i_mutex not held.
>
>> @@ -2341,6 +2353,13 @@ ssize_t generic_file_aio_write_nolock(st
>> ssize_t ret;
>>
>> BUG_ON(iocb->ki_pos != pos);
>> + /*
>> + * generic_file_buffered_write() may be called inside
>> + * __generic_file_aio_write_nolock() even in case of
>> + * O_DIRECT for non S_ISBLK files. So i_mutex must be held.
>> + */
>> + if (!S_ISBLK(inode->i_mode))
>> + BUG_ON(!mutex_is_locked(&inode->i_mutex));
>>
>> ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs,
>> &iocb->ki_pos);
>
> I note that you comment here in generic_file_aio_write_nolock(),
> but it's not immediately obvious that this is refering to the
> vmtruncate() call in __generic_file_aio_write_nolock().
This is not about vmtruncate(). __generic_file_aio_write_nolock() may
call generic_file_buffered_write() even in case of O_DIRECT for !S_ISBLK, and
generic_file_buffered_write() has documented locking rules (i_mutex held).
IMHO it is important to explicitly document this . And after we realize
that i_mutex always held, vmtruncate() may be safely called.
>
> IOWs, wouldn't it be better to put this comment and check in
> __generic_file_aio_write_nolock() directly above the vmtruncate()
> call that cares about this?
>
>> @@ -2383,8 +2402,8 @@ ssize_t generic_file_aio_write(struct ki
>> EXPORT_SYMBOL(generic_file_aio_write);
>>
>> /*
>> - * Called under i_mutex for writes to S_ISREG files. Returns -EIO if something
>> - * went wrong during pagecache shootdown.
>> + * May be called without i_mutex for writes to S_ISREG files. XFS does this.
>> + * Returns -EIO if something went wrong during pagecache shootdown.
>> */
>
> Not sure you need to say "XFS does this" - other filesystems may do this
> in the future.....
Yes, but where are multiple comments about "reiserfs does this" in fs/buffer.c
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
next prev parent reply other threads:[~2006-12-19 6:07 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-12-18 13:22 [PATCH] incorrect direct io error handling Dmitriy Monakhov
2006-12-18 19:56 ` Chen, Kenneth W
2006-12-19 6:31 ` Dmitriy Monakhov
2006-12-18 22:15 ` David Chinner
2006-12-19 6:07 ` Dmitriy Monakhov [this message]
2006-12-20 14:26 ` David Chinner
2007-01-10 14:36 ` Dmitriy Monakhov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87psagto4v.fsf@sw.ru \
--to=dmonakhov@sw.ru \
--cc=akpm@osdl.org \
--cc=devel@openvz.org \
--cc=dgc@sgi.com \
--cc=dmonakhov@openvz.org \
--cc=linux-kernel@vger.kernel.org \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox