From: Theodore Tso <tytso@mit.edu>
To: Christoph Hellwig <hch@infradead.org>
Cc: Jamie Lokier <jamie@shareable.org>,
Jens Axboe <jens.axboe@oracle.com>,
linux-fsdevel@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: O_DIRECT and barriers
Date: Fri, 21 Aug 2009 18:08:52 -0400
Message-ID: <20090821220852.GM9529@mit.edu>
In-Reply-To: <20090821142635.GB30617@infradead.org>
On Fri, Aug 21, 2009 at 10:26:35AM -0400, Christoph Hellwig wrote:
> > It turns out that applications needing integrity must use fdatasync or
> > O_DSYNC (or O_SYNC) *already* with O_DIRECT, because the kernel may
> > choose to use buffered writes at any time, with no signal to the
> > application.
>
> The fallback was a relatively recent addition to the O_DIRECT semantics
> for broken filesystems that can't handle holes very well. Fortunately
> enough we do force O_SYNC (that is Linux O_SYNC aka Posix O_DSYNC)
> semantics for that already.
Um, actually, we don't. If we did that, we would have to wait for a
journal commit to complete before allowing the write(2) to complete,
which would be especially painfully slow for ext3.
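Because the kernel does not quietly upgrade an O_DIRECT write to O_SYNC/O_DSYNC semantics, an application that wants write(2) to return only after the data and the metadata needed to read it back are stable has to ask for that itself. Below is a minimal sketch, assuming Linux with glibc. The helper name `direct_write_dsync` and the 4096-byte alignment (standing in for the device's logical block size, which may be smaller) are illustrative assumptions, not anything proposed in this thread.

```c
#define _GNU_SOURCE            /* needed for O_DIRECT in <fcntl.h> */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumed alignment for buffer, offset and length under O_DIRECT. */
#define DIO_ALIGN 4096

/* Hypothetical helper: write len bytes (a multiple of DIO_ALIGN) to the
 * start of path, bypassing the page cache (O_DIRECT) and explicitly
 * requesting synchronous data integrity (O_DSYNC).
 * Returns 0 on success, -1 with errno set on failure. */
int direct_write_dsync(const char *path, const void *data, size_t len)
{
    if (len == 0 || len % DIO_ALIGN != 0) {
        errno = EINVAL;
        return -1;
    }

    /* O_DIRECT needs an aligned user buffer. */
    void *buf;
    int rc = posix_memalign(&buf, DIO_ALIGN, len);
    if (rc != 0) {
        errno = rc;
        return -1;
    }
    memcpy(buf, data, len);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT | O_DSYNC, 0644);
    if (fd < 0) {
        int e = errno;
        free(buf);
        errno = e;
        return -1;
    }

    ssize_t n = pwrite(fd, buf, len, 0);
    int write_errno = errno;
    free(buf);
    if (n != (ssize_t)len) {
        close(fd);
        errno = (n < 0) ? write_errno : EIO;  /* short write treated as error */
        return -1;
    }
    return close(fd);
}
```

On a journalling filesystem this is exactly the cost described above: with O_DSYNC set, each write may have to wait for a journal commit before it returns.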
This question recently came up on the ext4 developers' list, in the
context of how direct I/O to a preallocated (uninitialized) extent
should be handled. Are we supposed to guarantee synchronous updates of
the metadata by the time write(2) returns, or not? One of
the ext4 developers (I can't remember if it was Mingming or Eric)
asked an XFS developer what they did in that case, and I believe the
answer they were given was that XFS started a commit, but did *not*
wait for the commit to complete before returning from the Direct I/O
write. In fact, they were told (I believe this was from an SGI
engineer, but I don't remember the name; we can track that down if
it's important) that if an application wanted to guarantee metadata
would be updated for an extending write, they had to use fsync() or
O_SYNC/O_DSYNC.
Perhaps they were given an incorrect answer, but it's clear the
semantics of exactly how Direct I/O works in edge cases aren't well
defined, or at least aren't clearly and widely understood.
I have an early draft (for discussion only) of what we think it means
and what is currently implemented in Linux, which I've put up (again,
let me emphasize, for *discussion*) here:
http://ext4.wiki.kernel.org/index.php/Clarifying_Direct_IO's_Semantics
Comments are welcome, either on the wiki's talk page, or directly to
me, or on the linux-fsdevel or linux-ext4 mailing lists.
- Ted