qemu-devel.nongnu.org archive mirror
From: Christoph Hellwig <hch@lst.de>
To: Jamie Lokier <jamie@shareable.org>
Cc: Christoph Hellwig <hch@lst.de>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 1/4] block: add enable_write_cache flag
Date: Tue, 1 Sep 2009 01:06:12 +0200
Message-ID: <20090831230612.GC10220@lst.de>
In-Reply-To: <20090831224645.GD24318@shareable.org>

On Mon, Aug 31, 2009 at 11:46:45PM +0100, Jamie Lokier wrote:
> > On Mon, Aug 31, 2009 at 11:09:50PM +0100, Jamie Lokier wrote:
> > > Right now, on a Linux host O_SYNC is unsafe with hardware that has a
> > > volatile write cache.  That might not be changed, but if it is then
> > > performance with cache=writethrough will plummet (due to issuing a
> > > CACHE FLUSH to the hardware after every write), while performance with
> > > cache=writeback will be reasonable.
> > 
> > Currently all modes are more or less unsafe with volatile write caches
> > at least when using ext3 or raw block device accesses.  XFS is safe
> > two thirds due to doing the right thing and one third due to sheer
> > luck.
> 
> Right, but now you've made it worse.  By not calling fdatasync at all,
> you've reduced the integrity.  Previously it would reach the drive's
> cache, and take whatever (short) time it took to reach the platter.
> Now you're leaving data in the host cache which can stay for much
> longer, and is vulnerable to host kernel crashes.

Your last comment applies to cache=writeback, which in Avi's proposal that
I implemented would indeed lose any guarantees and be for all practical
purposes unsafe.  It's not true for any of the other options.

> Oh, and QEMU could call whatever "hdparm -F" does when using raw block
> devices ;-)

Actually for ide/scsi implementing cache control is on my todo list.
Not sure about virtio yet.

> Well I'd like to start by pointing out your patch introduces a
> regression in the combination cache=writeback with emulated SCSI,
> because it effectively removes the fdatasync calls in that case :-)

Yes, you already pointed this out above.

> It goes to show no matter how hard we try, data integrity is a
> slippery thing where getting it wrong does not show up under normal
> circumstances, only during catastrophic system failures.

Honestly, it should not.  Digging through all this was a bit of work,
but I was extremely surprised how careless most people that touched it
before were.  It's not rocket science and can be tested quite easily using
various tools - qemu being the easiest nowadays, but scsi_debug or
an instrumented iscsi target would do the same thing.

> It failed with fsync, which
> is also important to applications, but filesystem integrity is the
> most important thing and it's been good at that for many years.

Users might disagree with that.  With my user hat on I couldn't care
less what state the internal metadata is in, as long as I get back the
data that the OS has guaranteed me reached the disk after a
successful fsync/fdatasync/O_SYNC write.
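
For illustration only (this is not part of the original message): a minimal
C sketch of the application-side contract referred to above, where data is
treated as durable only once fdatasync() has returned success.  The helper
name durable_write is hypothetical.  As the thread itself notes, whether a
successful return really means the data left a volatile drive cache depends
on the host kernel, filesystem and hardware; the code shows only the calling
pattern, assuming a Linux/glibc host.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all of buf to fd, then ask the kernel to
 * flush the file's data with fdatasync().  Returns 0 on success and -1
 * on error (errno is left as set by write() or fdatasync()).
 */
static int durable_write(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;
        }
        p += n;                 /* handle short writes */
        len -= (size_t)n;
    }

    /* Only after this succeeds may the caller treat the data as stable. */
    return fdatasync(fd);
}
```

A caller should treat any non-zero return as "data may not be on disk".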

> > E.g. if you want to move your old SCO Unix box into a VM it's the
> > only safe option.
> 
> I agree, and for that reason, cache=writethrough or cache=none are the
> only reasonable defaults.

Despite the extremely misleading name, cache=none is _NOT_ an alternative,
unless we make it open the image using O_DIRECT|O_SYNC.
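
For illustration only (not part of the original message): a small C sketch
of how the cache modes discussed in this thread could map to open(2) flags.
The enum, its member names and the function are hypothetical, and the
mapping is an assumption drawn from the discussion above (writethrough uses
O_SYNC, none uses O_DIRECT alone), not a copy of QEMU's code.  The last
mode shows the O_DIRECT|O_SYNC combination suggested in the message.
O_DIRECT is Linux-specific and needs _GNU_SOURCE with glibc.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <fcntl.h>

/* Hypothetical names for the modes discussed in the thread. */
enum cache_mode {
    CACHE_WRITEBACK,            /* host page cache, no sync on write   */
    CACHE_WRITETHROUGH,         /* host page cache, O_SYNC writes      */
    CACHE_NONE,                 /* bypass host page cache only         */
    CACHE_NONE_SYNC             /* bypass host page cache and O_SYNC   */
};

/* Return the open(2) flags this sketch assumes for each mode. */
static int cache_mode_open_flags(enum cache_mode mode)
{
    int flags = O_RDWR;

    switch (mode) {
    case CACHE_WRITEBACK:
        break;
    case CACHE_WRITETHROUGH:
        flags |= O_SYNC;
        break;
    case CACHE_NONE:
        /* O_DIRECT alone skips the host cache but does not make each
         * write synchronous, which is the objection raised above. */
        flags |= O_DIRECT;
        break;
    case CACHE_NONE_SYNC:
        flags |= O_DIRECT | O_SYNC;
        break;
    default:
        assert(!"unknown cache mode");
    }
    return flags;
}
```

The returned value would be passed as the flags argument to open(2) on the
image file; note that O_DIRECT also imposes alignment requirements on
buffers and offsets, which this sketch does not address.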

Thread overview: 33+ messages
2009-08-31 20:16 [Qemu-devel] [PATCH 0/4] data integrity fixes Christoph Hellwig
2009-08-31 20:16 ` [Qemu-devel] [PATCH 1/4] block: add enable_write_cache flag Christoph Hellwig
2009-08-31 22:09   ` Jamie Lokier
2009-08-31 22:16     ` Christoph Hellwig
2009-08-31 22:46       ` Jamie Lokier
2009-08-31 23:06         ` Christoph Hellwig [this message]
2009-09-01 10:38           ` Jamie Lokier
2009-08-31 22:53       ` Anthony Liguori
2009-08-31 22:55         ` Jamie Lokier
2009-08-31 22:58         ` Christoph Hellwig
2009-08-31 22:59         ` Jamie Lokier
2009-08-31 23:06           ` Christoph Hellwig
2009-08-31 23:09             ` Christoph Hellwig
2009-09-02  3:53         ` Christoph Hellwig
2009-09-02 13:13           ` Anthony Liguori
2009-09-02 14:14             ` Christoph Hellwig
2009-09-02 19:49             ` Christoph Hellwig
2009-08-31 20:17 ` [Qemu-devel] [PATCH 2/4] block: use fdatasync instead of fsync Christoph Hellwig
2009-08-31 21:51   ` Jamie Lokier
2009-08-31 21:55     ` Christoph Hellwig
2009-08-31 22:48       ` Jamie Lokier
2009-08-31 22:57         ` Christoph Hellwig
2009-09-01 15:59   ` Blue Swirl
2009-09-01 16:04     ` Christoph Hellwig
2009-09-02  0:34       ` Jamie Lokier
2009-09-02  0:37         ` Christoph Hellwig
2009-09-02  1:18           ` Jamie Lokier
2009-09-02 14:02           ` Blue Swirl
2009-09-02 14:15             ` Christoph Hellwig
2009-08-31 20:17 ` [Qemu-devel] [PATCH 3/4] block: add bdrv_aio_flush operation Christoph Hellwig
2009-09-01 10:24   ` Avi Kivity
2009-09-01 14:25     ` Christoph Hellwig
2009-08-31 20:18 ` [Qemu-devel] [PATCH 4/4] virtio-blk: add volatile writecache feature Christoph Hellwig
