qemu-devel.nongnu.org archive mirror
From: Jens Axboe <qemu@kernel.dk>
To: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] bdrv_aio_flush
Date: Tue, 2 Sep 2008 16:28:07 +0200	[thread overview]
Message-ID: <20080902142807.GI20055@kernel.dk> (raw)
In-Reply-To: <18621.6542.901899.81559@mariner.uk.xensource.com>

On Tue, Sep 02 2008, Ian Jackson wrote:
> Jamie Lokier writes ("Re: [Qemu-devel] [PATCH] bdrv_aio_flush"):
> > Andrea thinks bdrv_aio_flush does guarantee that in-flight operations
> > are flushed, while bdrv_flush definitely does not (fsync doesn't).
> 
> I read Andrea as complaining that bdrv_aio_flush _should_ flush
> in-flight operations but does not.
> 
> > I vaguely recall from the discussion before, there was uncertainty
> > about whether that is true, and therefore the right thing to do was
> > wait for the in-flight AIOs to complete _first_ and then issue an
> > fsync or aio_fsync call.
> 
> Whether bdrv_aio_flush should do this is a question of the qemu
> internal API.
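A minimal sketch of the "wait for the in-flight AIOs first, then fsync" ordering, assuming plain POSIX AIO (<aio.h>) rather than QEMU's own block layer. drain_then_fsync() is a hypothetical helper, not an existing QEMU function. Because every request has completed before fsync() is called, the result does not depend on how a given aio_fsync() implementation treats requests that are still in flight.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait until every request in reqs[0..n-1] has
 * finished, then fsync() the descriptor. Returns 0 on success, -1 if any
 * request or the fsync() failed. */
static int drain_then_fsync(int fd, struct aiocb *reqs[], size_t n)
{
    int failed = 0;

    for (size_t i = 0; i < n; i++) {
        const struct aiocb *one[1] = { reqs[i] };
        int err;

        /* aio_suspend() may return early (e.g. EINTR), so re-check. */
        while ((err = aio_error(reqs[i])) == EINPROGRESS)
            (void)aio_suspend(one, 1, NULL);

        /* Reap the result exactly once, now that the request is done. */
        ssize_t ret = aio_return(reqs[i]);
        if (err != 0 || ret < 0)
            failed = 1;
    }

    /* Every request submitted before this call has completed, so
     * fsync() covers its data. */
    if (fsync(fd) != 0)
        return -1;
    return failed ? -1 : 0;
}
```

The cost is that the flush cannot start until the slowest outstanding write finishes, which is the price of not relying on aio_fsync() semantics.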
> 
> > The Open Group text for aio_fsync says: "shall asynchronously force
> > all I/O operations [...]  queued at the time of the call to aio_fsync
> > [...]".
> 
> We discussed the meaning of the Unix specs before.  I did a close
> textual analysis of them here:
> 
>   http://lists.nongnu.org/archive/html/qemu-devel/2008-04/msg00046.html
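For reference, issuing aio_fsync() and waiting for it looks roughly like this. It is a sketch assuming POSIX AIO; aio_fsync_wait() is a hypothetical name. It does not settle the question above: the spec only promises to cover operations "queued at the time of the call", and whether an aio_write() that is still in progress counts as queued is exactly what is being argued.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: queue aio_fsync() on fd and block until it has
 * completed. op is O_SYNC (fsync-like) or O_DSYNC (fdatasync-like). */
static int aio_fsync_wait(int fd, int op)
{
    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;  /* we poll; no signal */

    if (aio_fsync(op, &cb) != 0)
        return -1;

    /* cb lives on the stack, so wait for completion before returning. */
    const struct aiocb *list[1] = { &cb };
    int err;
    while ((err = aio_error(&cb)) == EINPROGRESS)
        (void)aio_suspend(list, 1, NULL);

    ssize_t ret = aio_return(&cb);
    return (err == 0 && ret == 0) ? 0 : -1;
}
```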
> 
> > Since then, I've read the Open Group specifications more closely, and
> > some other OS man pages, and they are consistent that _writes_ always
> > occur in the order they are submitted to aio_write.
> 
> Can you give me a chapter-and-verse quote for that?  I'm looking at
> SUSv3, e.g.
>   http://www.opengroup.org/onlinepubs/009695399/nfindex.html
> 
> All I can see is this:
>   If O_APPEND is set for the file descriptor, write operations append
>   to the file in the same order as the calls were made.
> 
> If things are as you say and aio writes are always executed in the
> order queued, there would be no need to specify that because it would
> be implied by the behaviour of write(2) and O_APPEND.
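To make the O_APPEND case concrete, here is a sketch that queues two aio_write() calls on a descriptor opened with O_APPEND, assuming POSIX AIO; wait_one() and queue_two_appends() are hypothetical helpers. Per the SUSv3 sentence quoted above, these two writes should land in call order. The code itself does not rely on that, and nothing here says anything about ordering without O_APPEND.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Wait for one request and reap it; returns bytes written or -1. */
static ssize_t wait_one(struct aiocb *cb)
{
    const struct aiocb *list[1] = { cb };
    while (aio_error(cb) == EINPROGRESS)
        (void)aio_suspend(list, 1, NULL);
    return aio_return(cb);
}

/* Hypothetical demo: fd must be opened with O_APPEND, so aio_offset is
 * not used and each write goes to the current end of file. */
static int queue_two_appends(int fd, const char *first, const char *second)
{
    struct aiocb a, b;
    memset(&a, 0, sizeof a);
    a.aio_fildes = fd;
    a.aio_sigevent.sigev_notify = SIGEV_NONE;
    b = a;
    /* aio_write() only reads these buffers, so dropping const is safe. */
    a.aio_buf = (void *)first;
    a.aio_nbytes = strlen(first);
    b.aio_buf = (void *)second;
    b.aio_nbytes = strlen(second);

    if (aio_write(&a) != 0)
        return -1;
    if (aio_write(&b) != 0) {
        (void)wait_one(&a);  /* don't leave a request outstanding */
        return -1;
    }
    ssize_t ra = wait_one(&a);
    ssize_t rb = wait_one(&b);
    return (ra == (ssize_t)a.aio_nbytes && rb == (ssize_t)b.aio_nbytes) ? 0 : -1;
}
```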
> 
> > What _can_ be queued is a WRITE FUA command: meaning write some data
> > and flush _this_ data to non-volatile storage.
> 
> In order to implement that interface without flushing other data
> unnecessarily, we need to be able to:
> 
>    submit other IO requests
>    submit aio_write request for WRITE FUA
>    asynchronously await completion of the aio_write for WRITE FUA
>    submit and perhaps collect completion of other IO requests
>    collect completion of aio_write for WRITE FUA
>    submit and perhaps collect completion of other IO requests
>    submit aio_fsync (for WRITE FUA)
>    submit and perhaps collect completion of other IO requests
>    collect aio_fsync (for WRITE FUA)
> 
> This is still not perfect because we unnecessarily flush some data,
> thus delaying the report of completion of the WRITE FUA.  But there is
> at least no need to wait for _other_ writes to complete.
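Ian's sequence, minus the interleaved "other IO requests", might be sketched with POSIX AIO as follows. emulate_write_fua() is a hypothetical helper, not a QEMU function, and the sketch assumes the only goal is that the FUA data has reached stable storage before completion is reported. It does not address how FUA and non-FUA writes are ordered at the device.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Wait for one request and reap it; returns its result or -1. */
static ssize_t wait_one(struct aiocb *cb)
{
    const struct aiocb *list[1] = { cb };
    while (aio_error(cb) == EINPROGRESS)
        (void)aio_suspend(list, 1, NULL);
    return aio_return(cb);
}

/* Hypothetical helper: emulate a guest WRITE FUA as "aio_write, collect
 * it, then aio_fsync(O_DSYNC), collect that". Other requests could be
 * submitted and reaped between the steps; that is left out here. */
static int emulate_write_fua(int fd, const void *buf, size_t len, off_t off)
{
    struct aiocb wr, fl;

    memset(&wr, 0, sizeof wr);
    wr.aio_fildes = fd;
    wr.aio_buf = (void *)buf;  /* only read by aio_write() */
    wr.aio_nbytes = len;
    wr.aio_offset = off;
    wr.aio_sigevent.sigev_notify = SIGEV_NONE;

    /* Submit the write and collect its completion first. */
    if (aio_write(&wr) != 0 || wait_one(&wr) != (ssize_t)len)
        return -1;

    /* The write has already completed, so this flush does not depend on
     * how aio_fsync() treats in-flight requests. */
    memset(&fl, 0, sizeof fl);
    fl.aio_fildes = fd;
    fl.aio_sigevent.sigev_notify = SIGEV_NONE;
    if (aio_fsync(O_DSYNC, &fl) != 0 || wait_one(&fl) != 0)
        return -1;
    return 0;
}
```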

I don't see how the above works. There's no ordering dependency between
FUA and non-FUA writes; in fact, FUA writes tend to jump the device
queue, because certain other operating systems use them in cases where
that is appropriate. So unless you do all writes using FUA, there's no
way around a flush for committing dirty data. Unfortunately there is no
FLUSH_RANGE command; a flush is just a big sledgehammer.
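At the POSIX level, the two choices described above roughly correspond to opening the file with O_DSYNC (each write returns only after synchronized I/O data integrity completion) or doing ordinary writes followed by fdatasync(), which flushes all of the file's dirty data rather than a chosen range. This is a sketch with hypothetical helper names; whether the kernel implements either choice with FUA writes or with full cache flushes depends on the OS and the device, and is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Choice 1 (hypothetical helper): dsync_fd was opened with O_DSYNC, so
 * pwrite() itself returns only after synchronized I/O data integrity
 * completion; no separate flush is issued. */
static int dsync_fd_write(int dsync_fd, const void *buf, size_t len, off_t off)
{
    return pwrite(dsync_fd, buf, len, off) == (ssize_t)len ? 0 : -1;
}

/* Choice 2 (hypothetical helper): an ordinary write followed by
 * fdatasync(), which flushes all of the file's dirty data, not just the
 * bytes written here. */
static int write_then_flush(int fd, const void *buf, size_t len, off_t off)
{
    if (pwrite(fd, buf, len, off) != (ssize_t)len)
        return -1;
    return fdatasync(fd);
}
```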

-- 
Jens Axboe

