From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Michael S. Tsirkin"
Subject: Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
Date: Mon, 14 Nov 2011 12:24:22 +0200
Message-ID: <20111114102421.GE16454@redhat.com>
References: <4EBAAA68.10801@redhat.com> <4EBAACAF.4080407@codemonkey.ws>
 <4EBAB236.2060409@redhat.com> <4EBAB9FA.3070601@codemonkey.ws>
 <4EBB919B.7040605@redhat.com> <4EBC1792.3030004@codemonkey.ws>
 <4EBC4260.1090405@codemonkey.ws> <4EBCF5DA.1000605@redhat.com>
 <4EBE499E.4030100@redhat.com> <20111114101610.GA32392@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Avi Kivity, Kevin Wolf, Anthony Liguori, Lucas Meneghel Rodrigues,
 KVM mailing list, Marcelo Tosatti, QEMU devel,
 Juan Jose Quintela Carreira, "libvir-list@redhat.com"
To: "Daniel P. Berrange"
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:31211 "EHLO mx1.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753324Ab1KNKXP
 (ORCPT ); Mon, 14 Nov 2011 05:23:15 -0500
Content-Disposition: inline
In-Reply-To: <20111114101610.GA32392@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Mon, Nov 14, 2011 at 10:16:10AM +0000, Daniel P. Berrange wrote:
> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
> > On 11/11/2011 12:15 PM, Kevin Wolf wrote:
> > > On 10.11.2011 22:30, Anthony Liguori wrote:
> > > > Live migration with qcow2 or any other image format is just not going to work
> > > > right now even with proper clustered storage. I think doing a block level flush
> > > > cache interface and letting block devices decide how to do it is the best approach.
> > >
> > > I would really prefer reusing the existing open/close code. It means
> > > less (duplicated) code, is existing code that is well tested and doesn't
> > > make migration much of a special case.
> > >
> > > If you want to avoid reopening the file on the OS level, we can reopen
> > > only the topmost layer (i.e. the format, but not the protocol) for now
> > > and in 1.1 we can use bdrv_reopen().
> > >
> >
> > Intuitively I dislike _reopen style interfaces. If the second open
> > yields different results from the first, does it invalidate any
> > computations in between?
> >
> > What's wrong with just delaying the open?
>
> If you delay the 'open' until the mgmt app issues 'cont', then you lose
> the ability to roll back to the source host upon open failure for most
> deployed versions of libvirt. We only fairly recently switched to a five
> stage migration handshake to cope with rollback when 'cont' fails.
>
> Daniel

I guess reopen can fail as well, so this seems to me to be an important
fix but not a blocker.

> --
> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org :|
> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|