From: Sunil Mushran <sunil.mushran@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] ocfs2: serialize unaligned aio
Date: Mon, 27 Jun 2011 09:43:34 -0700
Message-ID: <4E08B336.9040701@oracle.com>
In-Reply-To: <20110627162306.GB20816@wotan.suse.de>
On 06/27/2011 09:23 AM, Mark Fasheh wrote:
> On Sun, Jun 26, 2011 at 12:22:48AM -0700, Joel Becker wrote:
>> On Wed, Jun 22, 2011 at 02:23:38PM -0700, Mark Fasheh wrote:
>>> Fix a corruption that can happen when we have (two or more) outstanding
>>> aio's to an overlapping unaligned region. Ext4
>>> (e9e3bcecf44c04b9e6b505fd8e2eb9cea58fb94d) and xfs recently had to fix
>>> similar issues.
>>>
>>> In our case, we can have an outstanding aio on a region, and if a write
>>> comes in with some bytes overlapping the original aio, we may decide to read
>>> that region into a page before continuing (typically because of buffered-io
>>> fallback). Since we have no ordering guarantees with the aio, we can read
>>> stale or bad data into the page and then write it back out.
>>>
>>> If the i/o is page and block aligned, then we avoid this issue as there
>>> won't be any need to read data from disk.
>>>
>>> I took the same approach as Eric in the ext4 patch and introduced some
>>> serialization of unaligned async direct i/o. I don't expect this to have an
>>> effect on the most common cases of AIO. Unaligned aio will be slower, but
>>> that's far more acceptable than data corruption.
>> The patch looks good, but I'm a little confused. Why doesn't
>> this matter for buffered I/O? Just because that data is going through
>> the pagecache? For a second, I couldn't see how unaligned dio was
>> possible, until I remembered this was block aligned, not sector aligned.
> Buffered I/O is synchronous, so we don't really have any situations in which
> there can be two buffered I/Os at the same time.
>
>
>> Don't most of the major DIO users (read: databases) do
>> sector-aligned I/O? Won't this affect them?
> In 2.6? Anyway, Sunil will have to answer that question... I would guess
> it isn't a big problem, though, since xfs and ext4 have the same patch and
> there don't seem to be major reports from Oracle of DB performance tanking.
> That's hardly solid evidence, of course.
Oracle DB is not a concern, as it fully allocates (and initializes) its blocks.
The exception is RMAN backup, which does extending (aio) direct writes.
If I remember correctly, this issue was reported by KVM users, at least
on ext4/xfs.
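A minimal userspace sketch in C of the two ideas discussed in this thread: detecting block-unaligned direct I/O, and serializing unaligned aio on a file so that no two are outstanding at once. It assumes a power-of-two block size and uses pthreads as a stand-in for kernel locking; `io_is_unaligned`, `struct unaligned_gate`, `gate_enter` and `gate_exit` are hypothetical names, not code from the ocfs2, ext4, or xfs patches.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: an I/O is "unaligned" if either its start or its
 * end falls inside a filesystem block. blocksize must be a power of two.
 * A sector-aligned (512-byte) write can still be unaligned here.
 */
static bool io_is_unaligned(uint64_t pos, uint64_t count, uint64_t blocksize)
{
	uint64_t mask = blocksize - 1;

	return (pos & mask) != 0 || ((pos + count) & mask) != 0;
}

/*
 * Hypothetical per-file gate that counts outstanding unaligned aio.
 * A new unaligned submission waits until the count drops to zero, so
 * two unaligned aios on the same file are never in flight together.
 */
struct unaligned_gate {
	pthread_mutex_t lock;
	pthread_cond_t drained;
	int in_flight;
};

#define UNALIGNED_GATE_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

/* Call before submitting an unaligned aio. */
static void gate_enter(struct unaligned_gate *g)
{
	pthread_mutex_lock(&g->lock);
	while (g->in_flight > 0)	/* loop guards against spurious wakeups */
		pthread_cond_wait(&g->drained, &g->lock);
	g->in_flight++;
	pthread_mutex_unlock(&g->lock);
}

/* Call from the completion path of an unaligned aio. */
static void gate_exit(struct unaligned_gate *g)
{
	pthread_mutex_lock(&g->lock);
	assert(g->in_flight > 0);
	if (--g->in_flight == 0)
		pthread_cond_broadcast(&g->drained);
	pthread_mutex_unlock(&g->lock);
}
```

For example, with 4096-byte blocks, a 512-byte write at offset 512 is sector aligned, yet `io_is_unaligned(512, 512, 4096)` is true; that is the case Joel asks about. Block-aligned aio never touches the gate, which is why the common aligned workloads should not slow down.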
Thread overview: 7+ messages
2011-06-22 21:23 [Ocfs2-devel] ocfs2: serialize unaligned aio Mark Fasheh
2011-06-23 3:44 ` Tao Ma
2011-06-26 7:22 ` Joel Becker
2011-06-27 16:23 ` Mark Fasheh
2011-06-27 16:43 ` Sunil Mushran [this message]
2011-06-27 17:26 ` Mark Fasheh
2011-07-28 9:08 ` Joel Becker