From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Michael S. Tsirkin"
Subject: Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
Date: Mon, 14 Nov 2011 13:56:36 +0200
Message-ID: <20111114115635.GB17560@redhat.com>
References: <4EBC4260.1090405@codemonkey.ws> <4EBCF5DA.1000605@redhat.com>
 <4EBE499E.4030100@redhat.com> <20111114101610.GA32392@redhat.com>
 <20111114102421.GE16454@redhat.com> <20111114110802.GB32392@redhat.com>
 <4EC0F9D1.3060505@redhat.com> <20111114112918.GC32392@redhat.com>
 <20111114113415.GB17371@redhat.com> <20111114113727.GD32392@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Kevin Wolf, Lucas Meneghel Rodrigues, KVM mailing list,
 Juan Jose Quintela Carreira, "libvir-list@redhat.com", Marcelo Tosatti,
 QEMU devel, Avi Kivity
To: "Daniel P. Berrange"
Content-Disposition: inline
In-Reply-To: <20111114113727.GD32392@redhat.com>
Errors-To: qemu-devel-bounces+gceq-qemu-devel=gmane.org@nongnu.org
Sender: qemu-devel-bounces+gceq-qemu-devel=gmane.org@nongnu.org
List-Id: kvm.vger.kernel.org

On Mon, Nov 14, 2011 at 11:37:27AM +0000, Daniel P. Berrange wrote:
> On Mon, Nov 14, 2011 at 01:34:15PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Nov 14, 2011 at 11:29:18AM +0000, Daniel P. Berrange wrote:
> > > On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote:
> > > > On 14.11.2011 12:08, Daniel P. Berrange wrote:
> > > > > On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
> > > > >> On Mon, Nov 14, 2011 at 10:16:10AM +0000, Daniel P. Berrange wrote:
> > > > >>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
> > > > >>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote:
> > > > >>>>> On 10.11.2011 22:30, Anthony Liguori wrote:
> > > > >>>>>> Live migration with qcow2 or any other image format is just
> > > > >>>>>> not going to work right now, even with proper clustered
> > > > >>>>>> storage. I think doing a block-level flush cache interface
> > > > >>>>>> and letting block devices decide how to do it is the best
> > > > >>>>>> approach.
> > > > >>>>>
> > > > >>>>> I would really prefer reusing the existing open/close code. It
> > > > >>>>> means less (duplicated) code, is existing code that is well
> > > > >>>>> tested, and doesn't make migration much of a special case.
> > > > >>>>>
> > > > >>>>> If you want to avoid reopening the file on the OS level, we can
> > > > >>>>> reopen only the topmost layer (i.e. the format, but not the
> > > > >>>>> protocol) for now, and in 1.1 we can use bdrv_reopen().
> > > > >>>>
> > > > >>>> Intuitively I dislike _reopen style interfaces. If the second
> > > > >>>> open yields different results from the first, does it invalidate
> > > > >>>> any computations in between?
> > > > >>>>
> > > > >>>> What's wrong with just delaying the open?
> > > > >>>
> > > > >>> If you delay the 'open' until the mgmt app issues 'cont', then
> > > > >>> you lose the ability to roll back to the source host upon open
> > > > >>> failure for most deployed versions of libvirt. We only fairly
> > > > >>> recently switched to a five-stage migration handshake to cope
> > > > >>> with rollback when 'cont' fails.
> > > > >>>
> > > > >>> Daniel
> > > > >>
> > > > >> I guess reopen can fail as well, so this seems to me to be an
> > > > >> important fix, but not a blocker.
> > > > >
> > > > > If the initial open succeeds, then it is far more likely that a
> > > > > later re-open will succeed too, because you have already eliminated
> > > > > the possibility of configuration mistakes, and will have caught
> > > > > most storage runtime errors too. So there is a very significant
> > > > > difference in reliability between doing an 'open at startup +
> > > > > reopen at cont' vs. just 'open at cont'.
> > > > >
> > > > > Based on the bug reports I see, we want to be very good at
> > > > > detecting and gracefully handling open errors, because they are
> > > > > pretty frequent.
> > > >
> > > > Do you have some more details on the kind of errors? Missing files,
> > > > permissions, something like this? Or rather something related to the
> > > > actual content of an image file?
> > >
> > > Missing files due to wrong/missing NFS mounts, or incorrect SAN /
> > > iSCSI setup. Access permissions due to incorrect user / group setup,
> > > read-only mounts, or SELinux denials. Actual I/O errors are less
> > > common and are not so likely to cause QEMU to fail to start at all,
> > > since QEMU is likely to just report them to the guest OS instead.
> >
> > Do you run qemu with -S, then give a 'cont' command to start it?
>
> Yes
>
> Daniel

OK, so let's go back one step now - how is this related to
'rollback to source host'?

--
MST
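
[As a rough illustration of the trade-off discussed above: open the
protocol layer (the raw file) at startup, so that configuration
mistakes fail early while the source host can still roll back, and
only (re)read the format-layer metadata when the management app
issues 'cont'. The standalone C sketch below shows that split. It is
not QEMU code: BlockState, block_open_protocol() and
block_activate_format() are invented names, and real qcow2 activation
involves far more than re-reading a header.]

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for a block driver's state.  The protocol
 * layer (the raw file) is opened at startup; the format layer
 * (e.g. parsing the qcow2 header) is activated only at 'cont'. */
typedef struct {
    int fd;            /* protocol layer, opened early */
    int format_ready;  /* format layer, activated late */
} BlockState;

/* Open the protocol layer at startup.  Failures here catch the common
 * configuration mistakes (missing NFS mount, bad permissions, SELinux
 * denials) while rollback to the source is still possible. */
static int block_open_protocol(BlockState *bs, const char *path)
{
    bs->fd = open(path, O_RDWR);
    if (bs->fd < 0) {
        perror("open");
        return -1;
    }
    bs->format_ready = 0;
    return 0;
}

/* At 'cont', (re)read the format metadata so we see whatever the
 * source wrote during migration, rather than metadata cached before
 * the migration stream finished. */
static int block_activate_format(BlockState *bs)
{
    char header[512];

    if (pread(bs->fd, header, sizeof(header), 0) != (ssize_t)sizeof(header)) {
        return -1;  /* rare runtime error, but still reported */
    }
    bs->format_ready = 1;
    return 0;
}

int main(int argc, char **argv)
{
    BlockState bs;

    if (argc < 2 || block_open_protocol(&bs, argv[1]) < 0) {
        return EXIT_FAILURE;  /* early failure: rollback still possible */
    }
    /* ... the incoming migration stream would be consumed here ... */
    if (block_activate_format(&bs) < 0) {
        return EXIT_FAILURE;
    }
    printf("format layer activated; guest can run\n");
    close(bs.fd);
    return EXIT_SUCCESS;
}

[The point of the split is that the step most likely to fail, the
open, happens while rollback is still possible, and only the cheaper
revalidation step is deferred to 'cont'.]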