From: Jamie Lokier <jamie@shareable.org>
To: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>, Nick Piggin <npiggin@suse.de>,
    Jan Kara <jack@suse.cz>, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] direct I/O fallback sync simplification
Date: Wed, 23 Sep 2009 15:04:30 +0100
Message-ID: <20090923140430.GB15256@shareable.org>
In-Reply-To: <20090923130730.GC10759@lst.de>

Christoph Hellwig wrote:
> In the case of direct I/O falling back to buffered I/O we sync data
> twice currently: once at the end of generic_file_buffered_write using
> filemap_write_and_wait_range and once a little later in
> __generic_file_aio_write using do_sync_mapping_range with all flags set.
>
> The wait before write of the do_sync_mapping_range call does not make
> any sense, so just keep the filemap_write_and_wait_range call and move
> it to the right spot.
Are you sure this is an expectation of O_DIRECT?

A few notes from the net, including some documentation from IBM,
advise using O_DIRECT|O_DSYNC if you need sync when direct I/O falls
back to buffered on some other OSes.

IBM (about AIX I believe):
http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/com.ibm.aix.genprogc/doc/genprogc/fileio.htm

    Direct I/O and Data I/O Integrity Completion

    Although direct I/O writes are done synchronously, they do not
    provide synchronized I/O data integrity completion, as defined by
    POSIX. Applications that need this feature should use O_DSYNC in
    addition to O_DIRECT. O_DSYNC guarantees that all of the data and
    enough of the metadata (for example, indirect blocks) have written
    to the stable store to be able to retrieve the data after a system
    crash. O_DIRECT only writes the data; it does not write the
    metadata.
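
A minimal sketch of what the quoted advice looks like from an application, assuming Linux with glibc. The helper names integrity_flags() and write_block() and the 4096-byte BLOCK alignment are illustrative assumptions, not taken from the IBM text or from the patch under discussion; real devices may need a different alignment.

```c
#define _GNU_SOURCE          /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK 4096           /* assumed alignment for buffer, length and offset */

/* Open flags for direct I/O plus synchronized data integrity completion. */
int integrity_flags(void)
{
    return O_WRONLY | O_CREAT | O_DIRECT | O_DSYNC;
}

/* Write one zero-filled, aligned block at `offset`.
 * Returns 0 on success, -1 on failure. */
int write_block(int fd, off_t offset)
{
    void *buf;

    /* Direct I/O needs aligned offsets; a misaligned one is a caller bug. */
    assert(offset % BLOCK == 0);

    if (posix_memalign(&buf, BLOCK, BLOCK) != 0)
        return -1;
    memset(buf, 0, BLOCK);

    ssize_t n = pwrite(fd, buf, BLOCK, offset);
    free(buf);
    return n == BLOCK ? 0 : -1;
}
```

A caller would do something like open(path, integrity_flags(), 0644) followed by write_block(fd, 0). With O_DSYNC in the flags, the write is expected to return only after data integrity completion, whether the kernel did it as direct I/O or fell back to buffered I/O.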
From an earlier thread, "O_DIRECT and barriers":

Theodore Tso wrote:
> On Fri, Aug 21, 2009 at 10:26:35AM -0400, Christoph Hellwig wrote:
> > > It turns out that applications needing integrity must use fdatasync or
> > > O_DSYNC (or O_SYNC) *already* with O_DIRECT, because the kernel may
> > > choose to use buffered writes at any time, with no signal to the
> > > application.
> >
> > The fallback was a relatively recent addition to the O_DIRECT semantics
> > for broken filesystems that can't handle holes very well. Fortunately
> > enough we do force O_SYNC (that is Linux O_SYNC aka Posix O_DSYNC)
> > semantics for that already.
>
> Um, actually, we don't. If we did that, we would have to wait for a
> journal commit to complete before allowing the write(2) to complete,
> which would be especially painfully slow for ext3.
There's no point in a "half-sync". Nobody expects or can usefully
depend on it. So imho we should drop the filemap_write_and_wait_range
entirely when O_DSYNC is not set.

O_DIRECT without syncing in the buffered fallback will be a useful
performance optimisation for applications (including virtual machines)
which do sequences of writes interspersed with fdatasync calls on
sparse files, or when extending files, or to filesystems which don't
implement O_DIRECT.

Since they need fdatasync anyway, even with direct I/O, to get
integrity on some hardware, that's a sensible coding pattern.
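
The coding pattern described above can be sketched roughly as follows, assuming Linux with glibc. The function name write_batch_then_sync() and the 4096-byte BLOCK size are hypothetical, chosen only for illustration. The point is that no individual write is synced; a single fdatasync() is the integrity point for the whole batch, regardless of whether each write went direct or through the page cache.

```c
#define _GNU_SOURCE          /* for O_DIRECT if the caller opens with it */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK 4096           /* assumed alignment; real devices may differ */

/* Write `nblocks` aligned blocks starting at offset 0, then flush once.
 * Returns 0 on success, -1 on any failure. */
int write_batch_then_sync(int fd, int nblocks, unsigned char fill)
{
    void *buf;

    assert(nblocks >= 0);
    if (posix_memalign(&buf, BLOCK, BLOCK) != 0)
        return -1;
    memset(buf, fill, BLOCK);

    int rc = 0;
    for (int i = 0; i < nblocks && rc == 0; i++) {
        /* No per-write sync: the kernel may use direct or buffered I/O. */
        if (pwrite(fd, buf, BLOCK, (off_t)i * BLOCK) != BLOCK)
            rc = -1;
    }
    free(buf);

    /* The single integrity point for the whole batch. */
    if (rc == 0 && fdatasync(fd) != 0)
        rc = -1;
    return rc;
}
```

The fd would typically come from open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644). If the buffered fallback also synced every write, the final fdatasync() would still be required, so the per-write sync would only add cost.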
-- Jamie