From mboxrd@z Thu Jan  1 00:00:00 1970
From: Juan Quintela <quintela@redhat.com>
Subject: Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
Date: Wed, 09 Nov 2011 20:53:25 +0100
Message-ID: <m3hb2d843u.fsf@neno.neno>
References: <4EBAAA68.10801@redhat.com> <4EBAACAF.4080407@codemonkey.ws>
	<4EBAB236.2060409@redhat.com> <4EBAB9FA.3070601@codemonkey.ws>
Reply-To: quintela@redhat.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Avi Kivity <avi@redhat.com>,
	Lucas Meneghel Rodrigues <lmr@redhat.com>,
	Kevin Wolf <kwolf@redhat.com>,
	KVM mailing list <kvm@vger.kernel.org>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	QEMU devel <qemu-devel@nongnu.org>
To: Anthony Liguori <anthony@codemonkey.ws>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:21375 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753445Ab1KITys (ORCPT <rfc822;kvm@vger.kernel.org>);
	Wed, 9 Nov 2011 14:54:48 -0500
In-Reply-To: <4EBAB9FA.3070601@codemonkey.ws> (Anthony Liguori's message of
	"Wed, 09 Nov 2011 11:35:54 -0600")
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 11/09/2011 11:02 AM, Avi Kivity wrote:
>> On 11/09/2011 06:39 PM, Anthony Liguori wrote:
>>>
>>> Migration with qcow2 is not a supported feature for 1.0.  Migration is
>>> only supported with raw images using coherent shared storage[1].
>>>
>>> [1] NFS is only coherent with close-to-open which right now is not
>>> good enough for migration.
>>
>> Say what?
>
> Due to block format probing, we read at least the first sector of the
> disk during start up.
>
> Strictly going by what NFS guarantees, since we don't open on the
> destination *after* a close on the source, we aren't guaranteed to
> see what's written by the source.
>
> In practice, because of block format probing, unless we're using
> cache=none, the first sector can be out of sync with the source on the
> destination.  If you use cache=none on a Linux client with at least a
> Linux NFS server, you should be relatively safe.

You are not :-(

If you are using a format that "caches" metadata, like qcow2 with its
L1/L2 cache, you are not safe.  You need to reopen the image (or discard
the cached metadata and re-read it).  Notice that raw nowadays also has
metadata: we can resize the image on the fly, and we need to reopen it
to notice that.
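
To make the stale-metadata problem concrete, here is a toy sketch.  It
is not QEMU code: struct img_meta, meta_load(), meta_size() and
meta_invalidate() are hypothetical names.  The "metadata" is just the
image length, standing in for the qcow2 header and L1/L2 tables or the
size of a resized raw image.  The destination fills its cache once at
open time, someone else then changes the image, and the cached value
stays stale until it is explicitly dropped and re-read.  Both file
descriptors live in one process, so this only models the in-memory
cache, not NFS client caching.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical in-memory copy of image metadata, filled at open time.
 * Here it is only the image length; think of it as standing in for the
 * qcow2 header and L1/L2 tables. */
struct img_meta {
    uint64_t size;  /* cached image size in bytes */
    int valid;      /* 1 once the cache has been filled */
};

/* Read the metadata from the image itself (here: its current length). */
static int meta_load(int fd, struct img_meta *m)
{
    assert(m != NULL);
    off_t end = lseek(fd, 0, SEEK_END);
    if (end < 0)
        return -1;
    m->size = (uint64_t)end;
    m->valid = 1;
    return 0;
}

/* Accessor that trusts the cache once it is filled: this is the value
 * that goes stale when the image changes behind our back.  Errors from
 * meta_load() are ignored to keep the sketch short. */
static uint64_t meta_size(int fd, struct img_meta *m)
{
    if (!m->valid)
        meta_load(fd, m);
    return m->size;
}

/* What the destination has to do once the source stops writing:
 * drop the cached copy and re-read it from the image. */
static int meta_invalidate(int fd, struct img_meta *m)
{
    m->valid = 0;
    return meta_load(fd, m);
}
```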

About the coherence problem, I just sent the patches that we had on RHEL
to the list.  With cache=none, NFS, iSCSI and Fibre Channel are all OK
(modulo the metadata problem above).  If you look at the second
patch that I sent, it "tries" to flush the read cache for a block
device.  The problems with the patch are:
- BLKFLSBUF is Linux-specific
- BLKFLSBUF only works for "some block devices"
- Christoph just NACKed it for the reasons above.
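
For reference, the kind of call the second patch relies on looks
roughly like this.  It is a sketch, not the patch itself:
flush_read_cache() is a hypothetical helper.  BLKFLSBUF is the real
Linux ioctl; it needs CAP_SYS_ADMIN, the helper refuses anything that
is not a block device (e.g. an image file on NFS), and even on a block
device whether cached data really gets dropped depends on the driver.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/fs.h>  /* BLKFLSBUF: Linux-only */

/* Hypothetical helper: ask the kernel to drop its buffered data for a
 * block device before the destination starts reading it.  Returns 0 on
 * success or a negative errno value. */
static int flush_read_cache(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        int err = -errno;
        close(fd);
        return err;
    }
    if (!S_ISBLK(st.st_mode)) {
        /* Image files (on NFS or anywhere else) are not covered at all. */
        close(fd);
        return -ENOTBLK;
    }

    /* Needs CAP_SYS_ADMIN, and not every block driver honours it. */
    int ret = ioctl(fd, BLKFLSBUF, 0) < 0 ? -errno : 0;
    close(fd);
    return ret;
}
```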

In summary:
- If we use raw, we don't resize images, and we use a clustered
  filesystem, qemu.git migration works.

- If we change metadata (qcow2, raw resize, ...), we need to re-read
  the metadata (on RHEL we just close and reopen the image).

- If we use NFS: we need to use cache=none, or rely on close+open
  consistency.

- If we use iSCSI: we need to use cache=none; close+open is not enough
  for consistency.  The ioctl patch that I sent happens to work on
  Linux, but it is not even guaranteed to work there.  And if our block
  layer gurus tell us not to use the ioctl(), I think that we should do
  just that.

Later, Juan.