From: Dave Chinner <david@fromorbit.com>
To: Mingming Cao <cmm@us.ibm.com>
Cc: Matthew Wilcox <matthew@wil.cx>,
linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com,
sandeen@sandeen.net
Subject: Re: [PATCH] dio: track and serialise unaligned direct IO
Date: Wed, 4 Aug 2010 08:56:58 +1000
Message-ID: <20100803225658.GB26402@dastard>
In-Reply-To: <1280856865.2436.31.camel@mingming-laptop>

On Tue, Aug 03, 2010 at 10:34:25AM -0700, Mingming Cao wrote:
> On Fri, 2010-07-30 at 14:53 +1000, Dave Chinner wrote:
> > On Thu, Jul 29, 2010 at 08:53:24PM -0600, Matthew Wilcox wrote:
> > > On Fri, Jul 30, 2010 at 08:45:16AM +1000, Dave Chinner wrote:
> > > > If we get two unaligned direct IOs to the same filesystem block
> > > > that is marked as a new allocation (i.e. buffer_new), then both IOs will
> > > > zero the portion of the block they are not writing data to. As a
> > > > result, when the IOs complete there will be a portion of the block
> > > > that contains zeros from the last IO to complete rather than the
> > > > data that should be there.
....
> > I don't want any direct IO for XFS to go through the page cache -
> > unaligned or not. Using the page cache for the unaligned blocks
> > would also be much worse for performance than this method because it
> > turns unaligned direct IO into 3 IOs - the unaligned head block, the
> > aligned body and the unaligned tail block. It would also be a
> > performance hit you take on every single dio, whereas this way the
> > hit is only taken when an overlap is detected.
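
Just to make that three-way split concrete (the numbers and the
split_unaligned() helper below are invented for illustration - they are
not from any patch):

#include <stdio.h>

/* Show how an unaligned write would have to be split into an
 * unaligned head block, an aligned body and an unaligned tail
 * block for a given filesystem block size. */
static void split_unaligned(unsigned long long off, unsigned long long len,
			    unsigned long blksize)
{
	unsigned long long end = off + len;
	unsigned long long head_end = (off + blksize - 1) / blksize * blksize;
	unsigned long long tail_start = end / blksize * blksize;

	if (head_end > end)
		head_end = end;
	if (tail_start < head_end)
		tail_start = head_end;

	if (head_end > off)
		printf("head: %llu-%llu (unaligned)\n", off, head_end);
	if (tail_start > head_end)
		printf("body: %llu-%llu (aligned)\n", head_end, tail_start);
	if (end > tail_start)
		printf("tail: %llu-%llu (unaligned)\n", tail_start, end);
}

int main(void)
{
	/* e.g. a 12288 byte write at offset 512 on a 4096 byte block fs
	 * becomes three IOs: 512-4096, 4096-12288, 12288-12800 */
	split_unaligned(512, 12288, 4096);
	return 0;
}
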
>
> Is this problem also possible for the DIO, non-AIO case? (In the ext4
> case this only happens with AIO+DIO+unaligned.) If not, could we simply
> force unaligned AIO+DIO to be synchronous? It would still be direct IO...
There is nothing specific to AIO about this bug. XFS (at least)
allows concurrent DIO writes to the same inode regardless of whether
they are dispatched via AIO or multiple separate threads and so the
race condition exists outside just the AIO context...
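
For example, something as simple as the sketch below can hit the same
zeroing race with no AIO involved at all. This is only an illustration,
not a tested reproducer - the filename, sizes and offsets are arbitrary,
it assumes a 4096 byte block size filesystem, and it needs -pthread:

#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static int fd;

/* write one 2048 byte half of the first 4096 byte block */
static void *subblock_write(void *arg)
{
	long off = (long)arg;		/* 0 or 2048 */
	void *buf;
	ssize_t ret;

	/* O_DIRECT needs sector-aligned memory, offset and length */
	if (posix_memalign(&buf, 4096, 2048))
		return NULL;
	memset(buf, off ? 'B' : 'A', 2048);
	ret = pwrite(fd, buf, 2048, off);	/* sector aligned, not fs block aligned */
	(void)ret;
	free(buf);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	/* a new file, so the first block is a fresh (buffer_new) allocation */
	fd = open("testfile", O_CREAT | O_TRUNC | O_RDWR | O_DIRECT, 0644);
	if (fd < 0)
		return 1;

	pthread_create(&t1, NULL, subblock_write, (void *)0L);
	pthread_create(&t2, NULL, subblock_write, (void *)2048L);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);

	/* with the bug, whichever IO completes last zeroes the other half */
	close(fd);
	return 0;
}
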
> > And besides, such decisions on whether to use buffered IO have to be
> > made high up in the filesystem when we are deciding how to lock the
> > inode for the dio - buffered IO requires exclusive locking, not the
> > shared locking we do for dio writes. That further reduces the
> > performance of unaligned direct IO even when there are no overlaps,
> > and it's a solution that every filesystem needs to implement
> > itself in some way.
> >
> > I've looked at a couple of different ways to fix this (e.g. passing
> > the unaligned state to get_blocks() to allow the filesystem to
> > serialise there) but they've all died a horrible death of dodgy
> > locking and hacky IO completion detection, not to mention
> > requiring a different solution in every filesystem.
>
> I have also been thinking about other ways to fix this; these are only
> initial thoughts, and I'm concerned about the scalability.
>
> I was looking at ways to use buffer head state to mark any unaligned
> AIO DIO write to buffer_new buffers (do_direct_IO would set the state);
> any further unaligned IO would then need to wait for the AIO+DIO to
> complete on that buffer.
That's effectively what my fix does, except it does not overload
bufferheads to do it.
> but the buffer head passed from do_direct_IO to
> get_blocks() is a dummy buffer, so we would have to get the buffer head
> from the device...
Which, IMO, seems like a dangerous layering violation - it's mixing
device-associated bufferheads with filesystem IO (i.e. from
different address spaces) and, as such, the two are never guaranteed
to be coherent. To make matters more complex, XFS can have at least two
block devices (data and real-time) that direct IO is targeted at, so
doing something that is block device specific seems much more
complex than just tracking the IO in a separate list...
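
Roughly, the shape of that tracking is just a list of in-flight unaligned
ranges that any overlapping unaligned IO waits on. The userspace sketch
below is only an approximation of the idea - a mutex and condvar standing
in for the kernel's spinlock and wait queue, and none of the names are
from the actual patch:

#include <pthread.h>
#include <stdlib.h>

struct unaligned_io {
	struct unaligned_io *next;
	unsigned long long start, end;	/* byte range of the sub-block IO */
};

static struct unaligned_io *in_flight;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done = PTHREAD_COND_INITIALIZER;

static int overlaps(struct unaligned_io *io, unsigned long long s,
		    unsigned long long e)
{
	return io->start < e && s < io->end;
}

/* called before submitting an unaligned dio; blocks while any
 * in-flight unaligned dio overlaps the range */
struct unaligned_io *unaligned_io_start(unsigned long long s,
					unsigned long long e)
{
	struct unaligned_io *io, *new = malloc(sizeof(*new));

	if (!new)
		return NULL;
	new->start = s;
	new->end = e;
	pthread_mutex_lock(&lock);
restart:
	for (io = in_flight; io; io = io->next) {
		if (overlaps(io, s, e)) {
			pthread_cond_wait(&done, &lock);
			goto restart;
		}
	}
	new->next = in_flight;
	in_flight = new;
	pthread_mutex_unlock(&lock);
	return new;
}

/* called from IO completion: drop the range and wake any waiters */
void unaligned_io_end(struct unaligned_io *new)
{
	struct unaligned_io **p;

	pthread_mutex_lock(&lock);
	for (p = &in_flight; *p; p = &(*p)->next) {
		if (*p == new) {
			*p = new->next;
			break;
		}
	}
	pthread_cond_broadcast(&done);
	pthread_mutex_unlock(&lock);
	free(new);
}
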
> This would only impact IOs that really conflict with pending AIO DIOs...
> it should work for the ext4 case... would this be something usable for
> XFS? I got started on this approach last week but haven't gotten very
> far.
It might, but I don't think it's a viable approach.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com