From: Jamie Lokier <jamie@shareable.org>
To: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>, Nick Piggin <npiggin@suse.de>,
	Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] direct I/O fallback sync simplification
Date: Wed, 23 Sep 2009 15:04:30 +0100
Message-ID: <20090923140430.GB15256@shareable.org>
In-Reply-To: <20090923130730.GC10759@lst.de>

Christoph Hellwig wrote:
> In the case of direct I/O falling back to buffered I/O we sync data
> twice currently: once at the end of generic_file_buffered_write using
> filemap_write_and_wait_range and once a little later in
> __generic_file_aio_write using do_sync_mapping_range with all flags set.
> 
> The wait before write of the do_sync_mapping_range call does not make
> any sense, so just keep the filemap_write_and_wait_range call and move
> it to the right spot.

Are you sure this is an expectation of O_DIRECT?

A few notes from the net, including some documentation from IBM,
advise using O_DIRECT|O_DSYNC if you need sync when direct I/O falls
back to buffered on some other OSes.

IBM (about AIX I believe):

    http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/com.ibm.aix.genprogc/doc/genprogc/fileio.htm

    Direct I/O and Data I/O Integrity Completion

    Although direct I/O writes are done synchronously, they do not
    provide synchronized I/O data integrity completion, as defined by
    POSIX. Applications that need this feature should use O_DSYNC in
    addition to O_DIRECT. O_DSYNC guarantees that all of the data and
    enough of the metadata (for example, indirect blocks) have written
    to the stable store to be able to retrieve the data after a system
    crash. O_DIRECT only writes the data; it does not write the
    metadata.
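
As a rough illustration of the flag combination that text recommends,
here is a minimal sketch in C.  The helper name open_direct_dsync is
hypothetical, and the retry without O_DIRECT on EINVAL is only an
assumption about how an application might cope with a filesystem that
rejects the flag; O_DSYNC is kept either way, so each write(2) is still
expected to return only after the data is on stable storage.

```c
#define _GNU_SOURCE             /* asks glibc to expose O_DIRECT */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* assumption: platform has no O_DIRECT */
#endif

/* Hypothetical helper: open a file for direct writes that also carry
 * O_DSYNC integrity semantics.  Returns a file descriptor, or -1. */
int open_direct_dsync(const char *path)
{
    int flags = O_WRONLY | O_CREAT | O_DSYNC;
    int fd = open(path, flags | O_DIRECT, 0644);

    /* Some filesystems reject O_DIRECT with EINVAL; retry buffered,
     * keeping O_DSYNC so the integrity guarantee is unchanged. */
    if (fd < 0 && errno == EINVAL)
        fd = open(path, flags, 0644);
    return fd;
}
```

Buffers passed to write(2) on such a descriptor would still need to
meet the usual O_DIRECT alignment requirements.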

From an earlier thread, "O_DIRECT and barriers":

Theodore Tso wrote:
> On Fri, Aug 21, 2009 at 10:26:35AM -0400, Christoph Hellwig wrote:
> > > It turns out that applications needing integrity must use fdatasync or
> > > O_DSYNC (or O_SYNC) *already* with O_DIRECT, because the kernel may
> > > choose to use buffered writes at any time, with no signal to the
> > > application.
> >
> > The fallback was a relatively recent addition to the O_DIRECT semantics
> > for broken filesystems that can't handle holes very well.  Fortunately
> > enough we do force O_SYNC (that is Linux O_SYNC aka Posix O_DSYNC)
> > semantics for that already.
> 
> Um, actually, we don't.  If we did that, we would have to wait for a
> journal commit to complete before allowing the write(2) to complete,
> which would be especially painfully slow for ext3.

There's no point in a "half-sync".  Nobody expects or can usefully
depend on it.  So imho we should drop the filemap_write_and_wait_range
entirely when O_DSYNC is not set.

O_DIRECT without syncing in the buffered fallback will be a useful
performance optimisation for applications (including virtual machines)
which do sequences of writes interspersed with fdatasync calls on
sparse files, or when extending files, or to filesystems which don't
implement O_DIRECT.

Since they need fdatasync anyway to get integrity on some hardware,
even with direct I/O, that's a sensible coding pattern.
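
A minimal sketch of that coding pattern in C, under stated
assumptions: the helper name write_batch_then_fdatasync is
hypothetical, the 4096-byte alignment is a guess that real devices may
not share, and the retry without O_DIRECT on EINVAL stands in for a
filesystem that doesn't implement it.  The point is only that the
application issues a run of writes and pays for one fdatasync at the
end, whether or not the kernel used the buffered path.

```c
#define _GNU_SOURCE             /* asks glibc to expose O_DIRECT */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* assumption: platform has no O_DIRECT */
#endif

#define IO_ALIGN 4096           /* assumed buffer/offset alignment */

/* Hypothetical helper: write nblocks aligned blocks, then sync once.
 * Returns 0 on success, -1 on failure. */
int write_batch_then_fdatasync(const char *path, int nblocks)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0 && errno == EINVAL)      /* O_DIRECT not supported here */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* O_DIRECT needs an aligned buffer; posix_memalign provides one. */
    void *buf = NULL;
    if (posix_memalign(&buf, IO_ALIGN, IO_ALIGN) != 0) {
        close(fd);
        return -1;
    }
    memset(buf, 'x', IO_ALIGN);

    int rc = 0;
    for (int i = 0; i < nblocks && rc == 0; i++) {
        /* No per-write sync: the data may sit in the page cache if
         * the kernel fell back to buffered I/O. */
        if (write(fd, buf, IO_ALIGN) != IO_ALIGN)
            rc = -1;
    }

    /* One integrity point for the whole batch. */
    if (rc == 0 && fdatasync(fd) != 0)
        rc = -1;

    free(buf);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Something like write_batch_then_fdatasync("image.raw", 64) would then
cost a single sync for 64 writes instead of one per write.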

-- Jamie

