From: Jamie Lokier <jamie@shareable.org>
To: Paul Brook <paul@codesourcery.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
armbru@redhat.com, Christoph Hellwig <hch@lst.de>,
qemu-devel@nongnu.org, Alexander Graf <agraf@suse.de>
Subject: Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
Date: Wed, 12 May 2010 11:09:59 +0100
Message-ID: <20100512100959.GB16879@shareable.org>
In-Reply-To: <201005112311.12875.paul@codesourcery.com>
Paul Brook wrote:
> > > Paul Brook wrote:
> > > > cache=none:
> > > > No host caching. Reads and writes both go directly to underlying
> > > > storage.
> > > >
> > > > Useful to avoid double-caching.
> > > >
> > > > cache=writethrough:
> > > >
> > > > Reads are cached. Writes go directly to underlying storage. Useful
> > > > for broken guests that aren't aware of drive caches.
> > >
> > > These are misleading descriptions - because cache=none does not push
> > > writes down to powerfail-safe storage, while cache=writethrough might.
> >
> > If so, then this is a serious bug.
>
> .. though it may be a kernel bug rather than a qemu bug, depending on the
> exact details.
It's not a kernel bug. cache=none uses O_DIRECT, and O_DIRECT is not
supposed to force writes to powerfail-safe storage. If it did, it would
be unusably slow for applications that use O_DIRECT as a performance
enhancer / memory saver. Those applications can call fsync/fdatasync
when they need integrity. (There might be kernel bugs in the latter
department.)
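
To make that concrete, here is a minimal C sketch, assuming a Linux/glibc
host. The helper name write_block_durably and the 4096-byte alignment are
illustrative choices, not anything QEMU uses. It treats O_DIRECT purely as
a caching hint and calls fdatasync() at the point where integrity is
actually needed, falling back to a normal open if the filesystem rejects
O_DIRECT with EINVAL.

```c
#define _GNU_SOURCE          /* exposes O_DIRECT on Linux/glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define ALIGN 4096  /* assumed: covers typical logical block / page sizes */

/* Hypothetical helper: write one aligned block, preferring O_DIRECT,
 * then fdatasync() to request that the data reach stable storage.
 * Returns 0 on success, -1 on error. */
static int write_block_durably(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0 && errno == EINVAL)   /* filesystem rejects O_DIRECT */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    void *buf;
    if (posix_memalign(&buf, ALIGN, ALIGN) != 0) {
        close(fd);
        return -1;
    }
    memset(buf, 'x', ALIGN);

    int rc = -1;
    /* O_DIRECT only avoids the host page cache; it does not by itself
     * ask the drive to flush its volatile write cache. The explicit
     * fdatasync() is the integrity point. */
    if (pwrite(fd, buf, ALIGN, 0) == ALIGN && fdatasync(fd) == 0)
        rc = 0;

    free(buf);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

The design choice is to keep the two concerns separate: the open flag
controls caching, while fdatasync() is issued only where a commit is
actually required.
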
> Either way, I consider any mode that inhibits host filesystem write
> cache but not volatile drive cache to be pretty worthless.
On the contrary, it greatly reduces host memory consumption so that
guest data isn't cached twice (it's already cached in the guest), and
it may improve performance by relaxing the POSIX write-serialisation
constraint (not sure if Linux cares; Solaris does).
> Either we guaranteed data integrity on completion or we don't.
The problem with the description of cache=none is that cache=none uses
O_DIRECT, which does not always push writes to powerfail-safe storage.

O_DIRECT is effectively a hint. It requests less caching in kernel
memory, may reduce memory usage and copying, and may invoke direct DMA.
O_DIRECT does not tell the disk hardware to commit to powerfail-safe
storage, i.e. it doesn't issue barriers or disable disk write caching.
(However, depending on the host setup, it might have that effect if the
disk write cache has been disabled by the admin.)

Also, it doesn't even always write directly to disk: it falls back to
buffered I/O in some circumstances, even on filesystems which support
it. See the recent btrfs patches, which use buffered I/O for O_DIRECT on
some parts of some files. (Many non-Linux OSes fall back to buffered I/O
when any other process holds a non-O_DIRECT file descriptor, or when
requests don't meet certain criteria.)
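
The criteria vary by OS and filesystem. As an assumption for
illustration, a common rule is that the buffer address, file offset and
length must all be multiples of the device's logical block size; many
Linux filesystems reject requests that break it with EINVAL, while some
other systems quietly service them through the page cache instead. The
hypothetical helper below checks only that alignment rule.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical helper: true if a request meets a typical direct-I/O
 * alignment rule (buffer, offset and length all multiples of
 * block_size). Real systems may impose further conditions. */
static bool dio_request_aligned(const void *buf, off_t offset, size_t len,
                                size_t block_size)
{
    /* Expect a non-zero, power-of-two block size such as 512 or 4096. */
    if (block_size == 0 || (block_size & (block_size - 1)) != 0)
        return false;
    if (offset < 0)
        return false;
    return ((uintptr_t)buf % block_size) == 0 &&
           ((uint64_t)offset % block_size) == 0 &&
           (len % block_size) == 0;
}
```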

The POSIX thing to use for cache=none would be O_DSYNC|O_RSYNC, and
that should work on some hosts, but Linux doesn't implement a real O_RSYNC.

A combination which ought to work is O_DSYNC|O_DIRECT. O_DIRECT is
the performance hint; O_DSYNC provides the commit request. Christoph
Hellwig has mentioned that combination elsewhere in this thread.
It makes sense to me for cache=none.

O_DIRECT by itself is a useful performance & memory hint, so there
does need to be some option which maps onto O_DIRECT alone. But it
shouldn't be documented as stronger than cache=writethrough, because
it isn't.
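
A hypothetical C sketch of the mapping argued for above, expressed as
open(2) flags. The mode name "directonly" is invented for illustration,
and mapping cache=writethrough to O_DSYNC is an assumption about the host
side, not a quote of QEMU's code.

```c
#define _GNU_SOURCE          /* exposes O_DIRECT on Linux/glibc */
#include <fcntl.h>
#include <string.h>

/* Hypothetical: translate a cache= value into extra open(2) flags,
 * following the proposal in this message. Returns -1 if unknown. */
static int cache_mode_to_flags(const char *mode)
{
    if (strcmp(mode, "writethrough") == 0)
        return O_DSYNC;              /* host cache used, writes committed */
    if (strcmp(mode, "none") == 0)
        return O_DIRECT | O_DSYNC;   /* bypass host cache AND commit */
    if (strcmp(mode, "directonly") == 0)
        return O_DIRECT;             /* hint only: no commit guarantee */
    return -1;
}
```

Keeping the O_DIRECT-only mode separate preserves its memory and
performance benefit without implying any commit guarantee.
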
-- Jamie