From: Christoph Hellwig <hch@lst.de>
To: Jamie Lokier <jamie@shareable.org>
Cc: Christoph Hellwig <hch@lst.de>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 1/4] block: add enable_write_cache flag
Date: Tue, 1 Sep 2009 01:06:12 +0200
Message-ID: <20090831230612.GC10220@lst.de>
In-Reply-To: <20090831224645.GD24318@shareable.org>
On Mon, Aug 31, 2009 at 11:46:45PM +0100, Jamie Lokier wrote:
> > On Mon, Aug 31, 2009 at 11:09:50PM +0100, Jamie Lokier wrote:
> > > Right now, on a Linux host O_SYNC is unsafe with hardware that has a
> > > volatile write cache. That might not be changed, but if it is then
> > > performance with cache=writethrough will plummet (due to issuing a
> > > CACHE FLUSH to the hardware after every write), while performance with
> > > cache=writeback will be reasonable.
> >
> > Currently all modes are more or less unsafe with volatile write caches
> > at least when using ext3 or raw block device accesses. XFS is safe
> > two thirds due to doing the right thing and one third due to sheer
> > luck.
>
> Right, but now you've made it worse. By not calling fdatasync at all,
> you've reduced the integrity. Previously it would reach the drive's
> cache, and take whatever (short) time it took to reach the platter.
> Now you're leaving data in the host cache which can stay for much
> longer, and is vulnerable to host kernel crashes.
Your last comment is about cache=writeback, which in Avi's proposal that
I implemented would indeed lose any guarantees and be unsafe for all
practical purposes. It's not true for any of the other options.
> Oh, and QEMU could call whatever "hdparm -F" does when using raw block
> devices ;-)
Actually for ide/scsi implementing cache control is on my todo list.
Not sure about virtio yet.
> Well I'd like to start by pointing out your patch introduces a
> regression in the combination cache=writeback with emulated SCSI,
> because it effectively removes the fdatasync calls in that case :-)
Yes, you already pointed this out above.
> It goes to show no matter how hard we try, data integrity is a
> slippery thing where getting it wrong does not show up under normal
> circumstances, only during catastrophic system failures.
Honestly, it should not. Digging through all this was a bit of work,
but I was extremely surprised at how careless most people who touched it
before were. It's not rocket science and can be tested quite easily using
various tools - qemu being the easiest nowadays, but scsi_debug or
an instrumented iSCSI target would do the same thing.
> It failed with fsync, which
> is also important to applications, but filesystem integrity is the
> most important thing and it's been good at that for many years.
Users might disagree with that. With my user hat on, I couldn't care
less what state the internal metadata is in, as long as I get back
the data that the OS has guaranteed reached the disk after a
successful fsync/fdatasync/O_SYNC write.
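
For illustration, the application-side contract described above can be
sketched in C roughly as follows. This is a minimal sketch, not code from
QEMU or from the patch series; write_durably is a hypothetical helper name.
Whether a successful return really means the data survived a power loss
depends on the kernel and filesystem flushing the drive's volatile write
cache, which is exactly what this thread is arguing about.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the whole buffer, then ask the kernel to
 * push the file data to stable storage. Returns 0 on success, -1 on
 * error with errno set by write() or fdatasync().
 */
int write_durably(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data was written: retry */
            return -1;
        }
        p += n;             /* short write: advance and keep going */
        len -= (size_t)n;
    }

    /* Only after this returns 0 may the caller treat the data as durable. */
    return fdatasync(fd);
}
```

A caller would treat any non-zero return as "the data may not be on disk"
and must not acknowledge the write to whoever is waiting on it.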
> > E.g. if you want to move your old SCO Unix box into a VM it's the
> > only safe option.
>
> I agree, and for that reason, cache=writethrough or cache=none are the
> only reasonable defaults.
Despite the extremely misleading name, cache=none is _NOT_ an alternative,
unless we make it open the image using O_DIRECT|O_SYNC.
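
To make the distinction concrete, here is a rough C sketch of how the cache
modes discussed in this thread could map to open(2) flags on a Linux host.
The enum, the function name image_open_flags and the CACHE_NONE_SYNC mode
are assumptions made up for this example; this is not QEMU's actual code.
The mapping itself is inferred from the discussion above: O_DIRECT only
bypasses the host page cache, so without O_SYNC it says nothing about when
data becomes stable.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc exposes O_DIRECT only with this defined */
#endif
#include <fcntl.h>

/* Hypothetical names for the cache= modes discussed in the thread. */
enum cache_mode {
    CACHE_WRITEBACK,     /* host page cache, no sync on write        */
    CACHE_WRITETHROUGH,  /* host page cache, synchronous writes      */
    CACHE_NONE,          /* bypass host page cache only              */
    CACHE_NONE_SYNC      /* hypothetical: bypass cache and sync too  */
};

/* Return the open(2) flags this sketch would use for an image file. */
int image_open_flags(enum cache_mode mode)
{
    int flags = O_RDWR;

    switch (mode) {
    case CACHE_WRITEBACK:
        break;
    case CACHE_WRITETHROUGH:
        flags |= O_SYNC;
        break;
    case CACHE_NONE:
        flags |= O_DIRECT;            /* no durability guarantee by itself */
        break;
    case CACHE_NONE_SYNC:
        flags |= O_DIRECT | O_SYNC;   /* the combination suggested above */
        break;
    }
    return flags;
}
```

Note that even with O_SYNC set, a host kernel that does not flush the
drive's volatile write cache leaves the same hole described earlier in
the thread.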