From: Jamie Lokier <jamie@shareable.org>
To: Christoph Hellwig <hch@lst.de>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 1/4] block: add enable_write_cache flag
Date: Tue, 1 Sep 2009 11:38:35 +0100
Message-ID: <20090901103835.GA9548@shareable.org>
In-Reply-To: <20090831230612.GC10220@lst.de>
Christoph Hellwig wrote:
> > Oh, and QEMU could call whatever "hdparm -F" does when using raw block
> > devices ;-)
>
> Actually for ide/scsi implementing cache control is on my todo list.
> Not sure about virtio yet.
I think hdparm -f -F does for some block devices what fdatasync
ideally does for files. What I was getting at was that, until we have
a perfect fdatasync on block devices for Linux, QEMU could use the
blockdev ioctls to accomplish the same thing on older kernels.
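To make the idea concrete, here is a rough sketch (Python for brevity; QEMU itself is C). The function name flush_to_media is made up for illustration. It assumes a Linux host, where BLKFLSBUF is the ioctl that "hdparm -f" issues to flush the kernel's buffer cache for a block device. It does not attempt the on-drive cache flush that "hdparm -F" performs, and BLKFLSBUF normally needs root.

```python
import fcntl
import os
import stat

# Linux BLKFLSBUF from <linux/fs.h>: _IO(0x12, 97)
BLKFLSBUF = 0x1261


def flush_to_media(fd):
    """Flush fd as far as this sketch knows how; return the steps taken.

    Hypothetical helper, not QEMU code.
    """
    steps = []
    os.fdatasync(fd)  # push dirty data for this descriptor to the device
    steps.append("fdatasync")
    if stat.S_ISBLK(os.fstat(fd).st_mode):
        # Raw block device: additionally ask the kernel to flush its
        # buffer cache for the device, as "hdparm -f" does.
        fcntl.ioctl(fd, BLKFLSBUF)
        steps.append("BLKFLSBUF")
    return steps
```

On a regular image file only the fdatasync step runs; the ioctl is attempted only when the descriptor refers to a block device.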
> > It goes to show no matter how hard we try, data integrity is a
> > slippery thing where getting it wrong does not show up under normal
> > circumstances, only during catastrophic system failures.
>
> Honestly, it should not. Digging through all this was a bit of work,
> but I was extremely [surprised] how carelessly most people that touched
> it before were. It's not rocket science and can be tested quite easily
> using various tools - qemu being the easiest nowadays but scsi_debug or
> an instrumented iscsi target would do the same thing.
Oh I agree - we have increasingly good debugging tools. What's
missing is a dirty script^H^H^H^H^H^H a good validation test which
stresses the various combinations of ways to sync data on block
devices and various filesystems, and various types of emulated
hardware with/without caches enabled, and various mount options, and
checks the I/O does what is desired in every case.
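A sketch of how such a test might enumerate its cases (Python for brevity). Every value in the lists below is an illustrative placeholder rather than a vetted test plan, and sync_test_matrix is a made-up name. The actual per-case check, which is the hard part, is left out.

```python
import itertools

# Placeholder dimensions; a real test would pick these with care.
SYNC_METHODS = ["fsync", "fdatasync", "O_SYNC", "O_DSYNC"]
FILESYSTEMS = ["ext3", "ext4", "xfs"]
EMULATED_HW = ["ide", "scsi", "virtio"]
WRITE_CACHE = [True, False]
MOUNT_OPTS = ["barriers-on", "barriers-off"]


def sync_test_matrix():
    """Yield one dict per combination of the dimensions above."""
    keys = ("sync", "fs", "hw", "write_cache", "mount_opts")
    dims = (SYNC_METHODS, FILESYSTEMS, EMULATED_HW, WRITE_CACHE, MOUNT_OPTS)
    for combo in itertools.product(*dims):
        yield dict(zip(keys, combo))
```

Each yielded case would then drive one run: set up the guest with that hardware and cache setting, write and sync with the given method, cut power to the emulated disk, and verify what actually reached it.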
> > It failed with fsync, which is also important to applications, but
> > filesystem integrity is the most important thing and it's been
> > good at that for many years.
>
> Users might disagree with that. With my user hat on I couldn't care
> less about what state the internal metadata is in as long as I get
> back my data which the OS has guaranteed me to have reached the disk
> after a successful fsync/fdatasync/O_SYNC write.
I guess it depends what you're doing. I've observed more instances of
filesystem corruption due to lack of barriers, resulting in an
inability to find files, than I've ever noticed missing data inside
files, but then I hardly ever keep large amounts of data in databases.
And I get so much mail I wouldn't notice if a few got lost ;-)
> > > E.g. if you want to move your old SCO Unix box into a VM it's the
> > > only safe option.
> >
> > I agree, and for that reason, cache=writethrough or cache=none are the
> > only reasonable defaults.
>
> despite the extremely misleading name cache=none is _NOT_ an alternative,
> unless we make it open the image using O_DIRECT|O_SYNC.
Good point about the misleading name, and good point about O_DIRECT
being insufficient too.
For a safe emulation default with reasonable performance, I wonder if
it would work to emulate drive cache _off_ at the beginning, but with
the capability for the guest to enable it? The theory is that old
guests don't know about drive caches and will leave it off and be safe
(getting O_DSYNC or O_DIRECT|O_DSYNC)[*], and newer guests will turn it on
if they also implement barriers (getting nothing or O_DIRECT, and
fdatasync when they issue barriers). Do you think that would work
with typical guests we know about?
[*] - O_DSYNC as opposed to O_SYNC strikes me as important once proper
cache flushes are implemented, as it may behave very similarly to real
hardware when doing data overwrites, whereas O_SYNC should seek back
and forth between the data and inode areas for every write, if it's
updating its nanosecond timestamps correctly.
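A rough sketch of the policy proposed above (Python for brevity; QEMU itself is C). The names host_open_flags and guest_flush are invented for illustration and are not QEMU functions. It assumes a Linux host, and a real implementation would also have to reopen the image, or change its flags, when the guest toggles the cache setting.

```python
import os

# O_DIRECT is Linux-specific; fall back to 0 where the platform lacks it.
O_DIRECT = getattr(os, "O_DIRECT", 0)


def host_open_flags(guest_cache_enabled, bypass_host_cache):
    """Map the emulated drive's write-cache state to host open() flags."""
    flags = os.O_RDWR
    if bypass_host_cache:
        flags |= O_DIRECT
    if not guest_cache_enabled:
        # Cache "off" (the proposed default): each write must be stable
        # before it completes, so open with O_DSYNC.
        flags |= os.O_DSYNC
    return flags


def guest_flush(fd, guest_cache_enabled):
    """Handle a guest flush/barrier; return True if a sync was issued."""
    if guest_cache_enabled:
        os.fdatasync(fd)
        return True
    # With O_DSYNC every completed write is already stable.
    return False
```

So an old guest that never touches the cache bit gets O_DSYNC or O_DIRECT|O_DSYNC, while a guest that enables the cache gets plain or O_DIRECT writes plus fdatasync on each barrier.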
-- Jamie