qemu-devel.nongnu.org archive mirror
From: Steve Ofsthun <sofsthun@virtualiron.com>
To: qemu-devel@nongnu.org
Cc: Chris Wright <chrisw@redhat.com>,
	Mark McLoughlin <markmc@redhat.com>,
	Ryan Harper <ryanh@us.ibm.com>,
	Laurent Vivier <Laurent.Vivier@bull.net>,
	kvm-devel <kvm-devel@lists.sourceforge.net>
Subject: Re: [Qemu-devel] [RFC] Disk integrity in QEMU
Date: Mon, 13 Oct 2008 10:38:09 -0400
Message-ID: <48F35D51.6080001@virtualiron.com>
In-Reply-To: <48F2ADD9.9000804@redhat.com>

Mark Wagner wrote:
> Anthony Liguori wrote:
>> Mark Wagner wrote:
>>> If you stopped and listened to yourself, you'd see that you are
>>> making my point...
>>>
>>> AFAIK, QEMU is neither designed nor intended to be an Enterprise
>>> Storage Array; I thought this group is designing a virtualization
>>> layer.  However, the persistent argument is that since Enterprise
>>> Storage products will often acknowledge a write before the data is
>>> actually on the disk, it's OK for QEMU to do the same.
>>
>> I think you're a little lost in this thread.  We're going to have QEMU
>> only acknowledge writes when they complete.  I've already sent out a
>> patch.  Just waiting a couple days to let everyone give their input.
>>
> Actually, I'm just not being clear enough in trying to point out that I
> don't think just setting a default value for "cache" goes far enough. My
> argument has nothing to do with the default value. It has to do with what
> the right thing to do is in specific situations, regardless of the value
> of the cache setting.
> 
> My point is that if a file is opened in the guest with O_DIRECT (or
> O_DSYNC), then QEMU *must* honor that regardless of the current value of
> "cache".

I disagree here.  QEMU's contract is not with any particular guest OS interface.  QEMU's contract is with the faithfulness of the hardware emulation.  The guest OS must perform the appropriate actions to guarantee the behavior it advertises to any particular application.  So your discussion should focus on what QEMU should do when asked to flush an I/O stream on a virtual device.  While the specific actions QEMU performs may differ based on caching mode, the end result should be that host caches are flushed to the underlying storage hierarchy.  Note that this still doesn't guarantee the I/O is on the disk unless the storage is configured properly.  QEMU shouldn't attempt to provide stronger guarantees than the host OS provides.
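To make the flush contract concrete, here is a minimal C sketch of a disk backend servicing a guest "flush cache" request. It is not QEMU's actual code. The names `enum cache_mode`, `open_image()` and `handle_guest_flush()` are hypothetical. The sketch assumes only two host caching modes:

- writeback, which goes through the host page cache;
- writethrough, which opens the image with `O_DSYNC`.

The action differs per mode, but the end state is the same: data the guest has already written has left the host's caches. As noted above, that still says nothing about a volatile write cache further down the storage hierarchy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical host-side caching modes for a virtual disk image. */
enum cache_mode {
    CACHE_WRITEBACK,    /* host page cache used; writes complete early */
    CACHE_WRITETHROUGH  /* image opened with O_DSYNC; each write is synchronous */
};

/* Open (or create) the backing image; open flags depend on the caching mode. */
int open_image(const char *path, enum cache_mode mode)
{
    int flags = O_RDWR | O_CREAT;
    if (mode == CACHE_WRITETHROUGH)
        flags |= O_DSYNC;
    return open(path, flags, 0600);
}

/*
 * Service a guest flush request on the virtual device.
 * Whatever the caching mode, the outcome is that data the guest has already
 * written is no longer held only in host caches.
 * Returns 0 on success or -errno (to be reported to the guest as an I/O error).
 */
int handle_guest_flush(int fd, enum cache_mode mode)
{
    assert(fd >= 0);
    if (mode == CACHE_WRITETHROUGH)
        return 0;  /* every write already completed with O_DSYNC semantics */
    if (fdatasync(fd) != 0)
        return -errno;
    return 0;
}
```

Under this assumption the writethrough path costs nothing at flush time, because the cost was already paid on every write. The writeback path defers the whole cost to the flush.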

Consider a parallel in the real world.  Most disk drives today ship with write caching enabled.  Most OSes accept this and allow delayed writes to the actual media.  Is this completely safe?  No.  Is this accepted?  Yes.  To be safe, an application must take extraordinary actions (various sync modes, etc.) to guarantee the data is on the media.  Yet even this can be circumvented by specific performance modes in the storage hierarchy.  However, there are best practices to follow to avoid unexpected vulnerabilities.  For certain application environments it is mandatory to disable write-back caching on the drives.  Yet we wouldn't want to impose this constraint on all application environments.  There are always tradeoffs.
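The "extraordinary actions" an application takes can be as simple as the pattern below. This is a minimal POSIX C sketch with a hypothetical `durable_write()` helper. It writes the data, then calls `fdatasync()` so the OS pushes it out of its cache before reporting success. Opening the file with `O_DSYNC` has a similar effect on every individual write.

Two caveats apply:

- For a newly created file, full durability also requires syncing the containing directory, which this sketch omits.
- Whether the data then reaches stable media still depends on the drive honoring cache flushes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes and EINTR. Returns 0 or -errno. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -errno;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: replace the contents of path with buf and ask the OS
 * to flush the file's data to the storage device before returning.
 * Returns 0 on success or -errno.
 */
int durable_write(const char *path, const void *buf, size_t len)
{
    assert(path != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -errno;
    int rc = write_all(fd, buf, len);
    if (rc == 0 && fdatasync(fd) != 0)  /* the "sync" step */
        rc = -errno;
    if (close(fd) != 0 && rc == 0)
        rc = -errno;
    return rc;
}
```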

Now, given that there are data safety issues to deal with, it is important to avoid a default behavior that recklessly endangers guest data.  A customer will expect a single virtual machine to exhibit the same data safety as a single physical machine.  Likewise, when a group of virtual machines runs on a single host, the guest users will expect the same reliability as a group of physical machines.  Note that the virtualization layer adds vulnerabilities (a host OS crash, for example) that reduce the reliability of the virtual machines relative to the physical machines they replace.  So the default behavior of a virtualization stack may need to be more conservative than that of the physical stack it replaces.

On the flip side, though, the virtualization layer can exploit new opportunities for optimization.  Imagine a single macro operation running within a virtual machine (a backup or an OS installation).  What matters is the integrity of the entire operation, not of the individual I/Os.  So by disabling all individual I/O synchronization semantics, I get a backup or installation to run in half the time.  This can be a key advantage for virtual deployments.  We don't want to rule this out merely to guarantee the integrity of half a backup, or half an install.
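The backup/install trade-off can be sketched the same way with a hypothetical `write_image()` helper. It is plain POSIX C, not QEMU code, and the `sync_each` switch is an assumption made for illustration:

- With `sync_each = true`, it pays for an `fdatasync()` after every block.
- With `sync_each = false`, it issues none until a single `fdatasync()` at the end.

Either way, the operation as a whole is only known to be durable once the function returns 0, and an interrupted run has to be redone. For a macro operation like this, the per-block syncs therefore buy little. No timing figures are implied here; the actual difference depends on the storage.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write n blocks of block_size bytes from data to path.
 * sync_each = true : fdatasync() after every block (per-I/O durability).
 * sync_each = false: no per-block syncs; one fdatasync() at the end.
 * Returns 0 only when every block was written and flushed; otherwise -errno.
 */
int write_image(const char *path, const char *data, size_t block_size,
                size_t n, bool sync_each)
{
    assert(block_size > 0);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -errno;
    int rc = 0;
    for (size_t i = 0; i < n && rc == 0; i++) {
        ssize_t w = write(fd, data + i * block_size, block_size);
        if (w < 0)
            rc = -errno;
        else if ((size_t)w != block_size)
            rc = -EIO;  /* short write treated as an error for brevity */
        else if (sync_each && fdatasync(fd) != 0)
            rc = -errno;
    }
    /* Batched mode: a single flush covers the whole macro operation. */
    if (rc == 0 && !sync_each && fdatasync(fd) != 0)
        rc = -errno;
    if (close(fd) != 0 && rc == 0)
        rc = -errno;
    return rc;
}
```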

Steve

Thread overview: 101+ messages
2008-10-09 17:00 [Qemu-devel] [RFC] Disk integrity in QEMU Anthony Liguori
2008-10-10  7:54 ` Gerd Hoffmann
2008-10-10  8:12   ` Mark McLoughlin
2008-10-12 23:10     ` Jamie Lokier
2008-10-14 17:15       ` Avi Kivity
2008-10-10  9:32   ` Avi Kivity
2008-10-12 23:00     ` Jamie Lokier
2008-10-10  8:11 ` Aurelien Jarno
2008-10-10 12:26   ` Anthony Liguori
2008-10-10 12:53     ` Paul Brook
2008-10-10 13:55       ` Anthony Liguori
2008-10-10 14:05         ` Paul Brook
2008-10-10 14:19         ` Avi Kivity
2008-10-17 13:14           ` Jens Axboe
2008-10-19  9:13             ` Avi Kivity
2008-10-10 15:48     ` Aurelien Jarno
2008-10-10  9:16 ` Avi Kivity
2008-10-10  9:58   ` Daniel P. Berrange
2008-10-10 10:26     ` Avi Kivity
2008-10-10 12:59       ` Paul Brook
2008-10-10 13:20         ` Avi Kivity
2008-10-10 12:34   ` Anthony Liguori
2008-10-10 12:56     ` Avi Kivity
2008-10-11  9:07     ` andrzej zaborowski
2008-10-11 17:54   ` Mark Wagner
2008-10-11 20:35     ` Anthony Liguori
2008-10-12  0:43       ` Mark Wagner
2008-10-12  1:50         ` Chris Wright
2008-10-12 16:22           ` Jamie Lokier
2008-10-12 17:54         ` Anthony Liguori
2008-10-12 18:14           ` nuitari-qemu
2008-10-13  0:27           ` Mark Wagner
2008-10-13  1:21             ` Anthony Liguori
2008-10-13  2:09               ` Mark Wagner
2008-10-13  3:16                 ` Anthony Liguori
2008-10-13  6:42                 ` Aurelien Jarno
2008-10-13 14:38                 ` Steve Ofsthun [this message]
2008-10-12  0:44       ` Chris Wright
2008-10-12 10:21         ` Avi Kivity
2008-10-12 14:37           ` Dor Laor
2008-10-12 15:35             ` Jamie Lokier
2008-10-12 18:00               ` Anthony Liguori
2008-10-12 18:02             ` Anthony Liguori
2008-10-15 10:17               ` Andrea Arcangeli
2008-10-12 17:59           ` Anthony Liguori
2008-10-12 18:34             ` Avi Kivity
2008-10-12 19:33               ` Izik Eidus
2008-10-14 17:08                 ` Avi Kivity
2008-10-12 19:59               ` Anthony Liguori
2008-10-12 20:43                 ` Avi Kivity
2008-10-12 21:11                   ` Anthony Liguori
2008-10-14 15:21                     ` Avi Kivity
2008-10-14 15:32                       ` Anthony Liguori
2008-10-14 15:43                         ` Avi Kivity
2008-10-14 19:25                       ` Laurent Vivier
2008-10-16  9:47                         ` Avi Kivity
2008-10-12 10:12       ` Avi Kivity
2008-10-17 13:20         ` Jens Axboe
2008-10-19  9:01           ` Avi Kivity
2008-10-19 18:10             ` Jens Axboe
2008-10-19 18:23               ` Avi Kivity
2008-10-19 19:17                 ` M. Warner Losh
2008-10-19 19:31                   ` Avi Kivity
2008-10-19 18:24               ` Avi Kivity
2008-10-19 18:36                 ` Jens Axboe
2008-10-19 19:11                   ` Avi Kivity
2008-10-19 19:30                     ` Jens Axboe
2008-10-19 20:16                       ` Avi Kivity
2008-10-20 14:14                       ` Avi Kivity
2008-10-10 10:03 ` Fabrice Bellard
2008-10-13 16:11 ` Laurent Vivier
2008-10-13 16:58   ` Anthony Liguori
2008-10-13 17:36     ` Jamie Lokier
2008-10-13 17:06 ` [Qemu-devel] " Ryan Harper
2008-10-13 18:43   ` Anthony Liguori
2008-10-14 16:42     ` Avi Kivity
2008-10-13 18:51   ` Laurent Vivier
2008-10-13 19:43     ` Ryan Harper
2008-10-13 20:21       ` Laurent Vivier
2008-10-13 21:05         ` Ryan Harper
2008-10-15 13:10           ` Laurent Vivier
2008-10-16 10:24             ` Laurent Vivier
2008-10-16 13:43               ` Anthony Liguori
2008-10-16 16:08                 ` Laurent Vivier
2008-10-17 12:48                 ` Avi Kivity
2008-10-17 13:17                   ` Laurent Vivier
2008-10-14 10:05       ` Kevin Wolf
2008-10-14 14:32         ` Ryan Harper
2008-10-14 16:37       ` Avi Kivity
2008-10-13 19:00   ` Mark Wagner
2008-10-13 19:15     ` Ryan Harper
2008-10-14 16:49       ` Avi Kivity
2008-10-13 17:58 ` [Qemu-devel] " Rik van Riel
2008-10-13 18:22   ` Jamie Lokier
2008-10-13 18:34     ` Rik van Riel
2008-10-14  1:56       ` Jamie Lokier
2008-10-14  2:28         ` nuitari-qemu
2008-10-28 17:34 ` Ian Jackson
2008-10-28 17:45   ` Anthony Liguori
2008-10-28 17:50     ` Ian Jackson
2008-10-28 18:19       ` Jamie Lokier
