From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juan Quintela
Subject: Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
Date: Wed, 09 Nov 2011 20:53:25 +0100
Message-ID:
References: <4EBAAA68.10801@redhat.com> <4EBAACAF.4080407@codemonkey.ws> <4EBAB236.2060409@redhat.com> <4EBAB9FA.3070601@codemonkey.ws>
Reply-To: quintela@redhat.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Avi Kivity, Lucas Meneghel Rodrigues, Kevin Wolf, KVM mailing list, "Michael S. Tsirkin", Marcelo Tosatti, QEMU devel
To: Anthony Liguori
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:21375 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753445Ab1KITys (ORCPT ); Wed, 9 Nov 2011 14:54:48 -0500
In-Reply-To: <4EBAB9FA.3070601@codemonkey.ws> (Anthony Liguori's message of "Wed, 09 Nov 2011 11:35:54 -0600")
Sender: kvm-owner@vger.kernel.org
List-ID:

Anthony Liguori wrote:
> On 11/09/2011 11:02 AM, Avi Kivity wrote:
>> On 11/09/2011 06:39 PM, Anthony Liguori wrote:
>>>
>>> Migration with qcow2 is not a supported feature for 1.0. Migration is
>>> only supported with raw images using coherent shared storage[1].
>>>
>>> [1] NFS is only coherent with close-to-open which right now is not
>>> good enough for migration.
>>
>> Say what?
>
> Due to block format probing, we read at least the first sector of the
> disk during start up.
>
> Strictly going by what NFS guarantees, since we don't open on the
> destination *after* as close on the source, we aren't guaranteed to
> see what's written by the source.
>
> In practice, because of block format probing, unless we're using
> cache=none, the first sector can be out of sync with the source on the
> destination. If you use cache=none on a Linux client with at least a
> Linux NFS server, you should be relatively safe.

You are not :-( If you are using a format that "caches" data, like
qcow2 with the L1/L2 cache, you are not safe.
You need to reopen (or discard the metadata and re-read it). Notice
that raw nowadays also has metadata (we can resize the image on the
fly, and we need to reopen to notice that).

About the coherence problem, I just sent the patches that we had on
RHEL to the list. With cache=none, NFS, iSCSI and Fibre Channel are
all ok (modulo the previous problem of metadata).

If you look at the second patch that I sent, it "tries" to flush the
read cache for a block device. The problems with the patch are:
- BLKFLSBUF is Linux specific
- BLKFLSBUF only works for "some block devices"
- Christoph just NACKed it for the two reasons above.

In summary:
- If we use raw, we don't resize images, and we use a clustered
  filesystem, qemu.git migration works.
- If we change metadata (qcow2, raw resize, ...) we need to re-read
  the metadata (we just close+open on RHEL).
- If we use NFS: we need to use cache=none, or close+open consistency.
- If we use iSCSI: we need to use cache=none. close+open is not
  enough for consistency. The ioctl patch that I sent happens to
  work on Linux, but it is not even guaranteed to work there.

And if our block layer gurus told us not to use the ioctl(), I think
that we need to do just that.

Later, Juan.
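
For reference, asking Linux to drop the buffer cache of a block device
with BLKFLSBUF looks roughly like the sketch below. This is not the
code from the patch sent to the list: flush_block_read_cache() is a
hypothetical helper name, and the sketch only shows the ioctl itself.
It assumes a Linux host, and the ioctl normally needs CAP_SYS_ADMIN.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* BLKFLSBUF, Linux only */
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not the patch from the list).
 * Asks the kernel to flush the buffer cache of the block device at
 * 'path'.  Returns 0 on success, -1 with errno set on failure.
 * Anything that is not a block device is rejected with ENOTBLK.
 */
int flush_block_read_cache(const char *path)
{
    struct stat st;
    int saved_errno;
    int ret = -1;
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        return -1;              /* errno already set by open() */
    }
    if (fstat(fd, &st) < 0) {
        goto out;
    }
    if (!S_ISBLK(st.st_mode)) {
        errno = ENOTBLK;        /* BLKFLSBUF only applies to block devices */
        goto out;
    }
    if (ioctl(fd, BLKFLSBUF, 0) < 0) {
        goto out;               /* e.g. EACCES without CAP_SYS_ADMIN */
    }
    ret = 0;
out:
    saved_errno = errno;        /* close() may clobber errno */
    close(fd);
    errno = saved_errno;
    return ret;
}
```

A caller on the migration destination would invoke this before reading
from the device. A return value of 0 only means the ioctl was
accepted, which, as said above, is not a guarantee that every block
device actually dropped its cached data.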