From: Christoph Hellwig <hch@lst.de>
To: Jamie Lokier <jamie@shareable.org>
Cc: Christoph Hellwig <hch@lst.de>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 1/4] block: add enable_write_cache flag
Date: Tue, 1 Sep 2009 01:06:12 +0200
Message-ID: <20090831230612.GC10220@lst.de>
In-Reply-To: <20090831224645.GD24318@shareable.org>
On Mon, Aug 31, 2009 at 11:46:45PM +0100, Jamie Lokier wrote:
> > On Mon, Aug 31, 2009 at 11:09:50PM +0100, Jamie Lokier wrote:
> > > Right now, on a Linux host O_SYNC is unsafe with hardware that has a
> > > volatile write cache. That might not be changed, but if it is then
> > > performance with cache=writethrough will plummet (due to issuing a
> > > CACHE FLUSH to the hardware after every write), while performance with
> > > cache=writeback will be reasonable.
> >
> > Currently all modes are more or less unsafe with volatile write caches
> > at least when using ext3 or raw block device accesses. XFS is safe
> > two thirds due to doing the right thing and one third due to sheer
> > luck.
>
> Right, but now you've made it worse. By not calling fdatasync at all,
> you've reduced the integrity. Previously it would reach the drive's
> cache, and take whatever (short) time it took to reach the platter.
> Now you're leaving data in the host cache which can stay for much
> longer, and is vulnerable to host kernel crashes.
Your last comment is about cache=writeback, which in Avi's proposal that
I implemented would indeed lose any guarantees and be for all practical
purposes unsafe. It's not true for any of the other options.
> Oh, and QEMU could call whatever "hdparm -F" does when using raw block
> devices ;-)
Actually for ide/scsi implementing cache control is on my todo list.
Not sure about virtio yet.
> Well I'd like to start by pointing out your patch introduces a
> regression in the combination cache=writeback with emulated SCSI,
> because it effectively removes the fdatasync calls in that case :-)
Yes, you already pointed this out above.
> It goes to show no matter how hard we try, data integrity is a
> slippery thing where getting it wrong does not show up under normal
> circumstances, only during catastrophic system failures.
Honestly, it should not. Digging through all this was a bit of work,
but I was extremely surprised how careless most people who touched it
before were. It's not rocket science and can be tested quite easily using
various tools - qemu being the easiest nowadays, but scsi_debug or
an instrumented iscsi target would do the same thing.
> It failed with fsync, which
> is also important to applications, but filesystem integrity is the
> most important thing and it's been good at that for many years.
Users might disagree with that. With my user hat on I couldn't care
less about what state the internal metadata is in, as long as I get back
the data which the OS has guaranteed me reached the disk after a
successful fsync/fdatasync/O_SYNC write.
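
To make the user-side expectation concrete, here is a minimal sketch of the "write, then fdatasync" pattern an application relies on. The helper name durable_write is hypothetical and not taken from QEMU or from this patch series. The sketch only shows the contract from the application's point of view; whether a successful return really means the data survived a power cut still depends on the filesystem and on any volatile write cache in the drive, which is the problem under discussion.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> int:
    """Write data to path and return only after the kernel reports it synced.

    Hypothetical helper for illustration. A successful return means the
    write() calls and the sync call both succeeded; it does not by itself
    prove that a drive with a volatile write cache has committed the data.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        written = 0
        # os.write may write fewer bytes than requested, so loop until done.
        while written < len(data):
            written += os.write(fd, view[written:])
        # Prefer fdatasync (syncs the data and only the metadata needed to
        # read it back); fall back to fsync where fdatasync is unavailable.
        sync = getattr(os, "fdatasync", os.fsync)
        sync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "example.img")
    print(durable_write(target, b"payload"))
```

An exception from os.write or from the sync call propagates to the caller, so an error is never reported as success.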
> > E.g. if you want to move your old SCO Unix box into a VM it's the
> > only safe option.
>
> I agree, and for that reason, cache=writethrough or cache=none are the
> only reasonable defaults.
Despite the extremely misleading name, cache=none is _NOT_ an alternative,
unless we make it open the image using O_DIRECT|O_SYNC.
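
The difference between the cache modes can be sketched as a mapping from mode to open(2) flags. This is an illustration only, not QEMU's actual code: the function names and the sync_direct parameter are hypothetical, and the mapping for cache=writeback (no extra flags) is an assumption based on this thread. It shows why cache=none, opened with O_DIRECT alone, bypasses the host page cache but does not make each write synchronous, while O_DIRECT|O_SYNC would.

```python
import os

# O_DIRECT is Linux-specific; fall back to 0 so the sketch runs elsewhere.
O_DIRECT = getattr(os, "O_DIRECT", 0)


def open_flags(cache_mode: str, sync_direct: bool = False) -> int:
    """Return illustrative open(2) flags for a cache mode (hypothetical)."""
    flags = os.O_RDWR
    if cache_mode == "writethrough":
        # Host page cache is used, but every write is synchronous.
        flags |= os.O_SYNC
    elif cache_mode == "none":
        # Bypass the host page cache; not synchronous on its own.
        flags |= O_DIRECT
        if sync_direct:
            # The O_DIRECT|O_SYNC combination mentioned above.
            flags |= os.O_SYNC
    elif cache_mode == "writeback":
        # Assumption: no extra flags; durability relies on explicit flushes.
        pass
    else:
        raise ValueError(f"unknown cache mode: {cache_mode!r}")
    return flags


def writes_are_synchronous(flags: int) -> bool:
    """True if all O_SYNC bits are set in flags."""
    return (flags & os.O_SYNC) == os.O_SYNC


if __name__ == "__main__":
    for mode in ("writethrough", "none", "writeback"):
        print(mode, writes_are_synchronous(open_flags(mode)))
    print("none + O_SYNC", writes_are_synchronous(open_flags("none", True)))
```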