From mboxrd@z Thu Jan  1 00:00:00 1970
From: Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: [Qemu-devel] KVM call agenda for June 28
Date: Tue, 5 Jul 2011 15:18:19 -0300
Message-ID: <20110705181819.GA27175@amt.cnet>
References: <BANLkTin-7hkUnMHJN9jUY87m8Y=fHS_GYA@mail.gmail.com>
 <20110630143620.GA4366@amt.cnet>
 <4E0C8D90.8050305@redhat.com>
 <20110630183829.GA8752@amt.cnet>
 <4E12C4F5.9000100@redhat.com>
 <CAJSP0QXLJBZn_3RfCWJCBE8-6LMd2jn4gJ3DqM8VHG3gg+85iQ@mail.gmail.com>
 <20110705125858.GA21254@amt.cnet>
 <4E1313FA.1060905@redhat.com>
 <20110705143230.GA22955@amt.cnet>
 <CAJSP0QV5S592BZFygOj9HNqsTGTtPo9wVG1foQ+N9=vGPpm1Vg@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Dor Laor <dlaor@redhat.com>, Kevin Wolf <kwolf@redhat.com>,
	Chris Wright <chrisw@redhat.com>,
	KVM devel mailing list <kvm@vger.kernel.org>,
	quintela@redhat.com, jes sorensen <jes.sorensen@redhat.com>,
	qemu-devel@nongnu.org, Avi Kivity <avi@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:37988 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753156Ab1GESSc (ORCPT <rfc822;kvm@vger.kernel.org>);
	Tue, 5 Jul 2011 14:18:32 -0400
Content-Disposition: inline
In-Reply-To: <CAJSP0QV5S592BZFygOj9HNqsTGTtPo9wVG1foQ+N9=vGPpm1Vg@mail.gmail.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On Tue, Jul 05, 2011 at 04:37:08PM +0100, Stefan Hajnoczi wrote:
> On Tue, Jul 5, 2011 at 3:32 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > On Tue, Jul 05, 2011 at 04:39:06PM +0300, Dor Laor wrote:
> >> On 07/05/2011 03:58 PM, Marcelo Tosatti wrote:
> >> >On Tue, Jul 05, 2011 at 01:40:08PM +0100, Stefan Hajnoczi wrote:
> >> >>On Tue, Jul 5, 2011 at 9:01 AM, Dor Laor<dlaor@redhat.com> wrote:
> >> >>>I tried to re-arrange all of the requirements and use cases using this wiki
> >> >>>page: http://wiki.qemu.org/Features/LiveBlockMigration
> >> >>>
> >> >>>It would be the best to agree upon the most interesting use cases (while we
> >> >>>make sure we cover future ones) and agree to them.
> >> >>>The next step is to set the interface for all the various verbs since the
> >> >>>implementation seems to be converging.
> >> >>
> >> >>Live block copy was supposed to support snapshot merge.  I think the
> >> >>current favored approach is to make the source image a backing file to
> >> >>the destination image and essentially do image streaming.
> >> >>
> >> >>Using this mechanism for snapshot merge is tricky.  The COW file
> >> >>already uses the read-only snapshot base image.  So now we cannot
> >> >>trivally copy the COW file contents back into the snapshot base image
> >> >>using live block copy.
> >> >
> >> >It never did. Live copy creates a new image were both snapshot and
> >> >"current" are copied to.
> >> >
> >> >This is similar with image streaming.
> >>
> >> Not sure I realize what's bad to do in-place merge:
> >>
> >> Let's suppose we have this COW chain:
> >>
> >>   base <-- s1 <-- s2
> >>
> >> Now a live snapshot is created over s2, s2 becomes RO and s3 is RW:
> >>
> >>   base <-- s1 <-- s2 <-- s3
> >>
> >> Now we've done with s2 (post backup) and like to merge s3 into s2.
> >>
> >> With your approach we use live copy of s3 into newSnap:
> >>
> >>   base <-- s1 <-- s2 <-- s3
> >>   base <-- s1 <-- newSnap
> >>
> >> When it is over s2 and s3 can be erased.
> >> The down side is the IOs for copying s2 data and the temporary
> >> storage. I guess temp storage is cheap but excessive IO are
> >> expensive.
> >>
> >> My approach was to collapse s3 into s2 and erase s3 eventually:
> >>
> >> before: base <-- s1 <-- s2 <-- s3
> >> after:  base <-- s1 <-- s2
> >>
> >> If we use live block copy using mirror driver it should be safe as
> >> long as we keep the ordering of new writes into s3 during the
> >> execution.
> >> Even a failure in the the middle won't cause harm since the
> >> management will keep using s3 until it gets success event.
> >
> > Well, it is more complicated than simply streaming into a new
> > image. I'm not entirely sure it is necessary. The common case is:
> >
> > base -> sn-1 -> sn-2 -> ... -> sn-n
> >
> > When n reaches a limit, you do:
> >
> > base -> merge-1
> >
> > You're potentially copying similar amount of data when merging back into
> > a single image (and you can't easily merge multiple snapshots).
> >
> > If the amount of data thats not in 'base' is large, you create
> > leave a new external file around:
> >
> > base -> merge-1 -> sn-1 -> sn-2 ... -> sn-n
> > to
> > base -> merge-1 -> merge-2
> >
> >> >
> >> >>It seems like snapshot merge will require dedicated code that reads
> >> >>the allocated clusters from the COW file and writes them back into the
> >> >>base image.
> >> >>
> >> >>A very inefficient alternative would be to create a third image, the
> >> >>"merge" image file, which has the COW file as its backing file:
> >> >>snapshot (base) -> cow -> merge
> >
> > Remember there is a 'base' before snapshot, you don't copy the entire
> > image.
>
> One use case I have in mind is the Live Backup approach that Jagane
> has been developing.  Here the backup solution only creates a snapshot
> for the period of time needed to read out the dirty blocks.  Then the
> snapshot is deleted again and probably contains very little new data
> relative to the base image.  The backup solution does this operation
> every day.
>
> This is the pathalogical case for any approach that copies the entire
> base into a new file.  We could have avoided a lot of I/O by doing an
> in-place update.
>
> I want to make sure this works well.

This use case does not fit the streaming scheme that has come up. It's a
completely different operation.

IMO it should be implemented separately.
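
To make the difference concrete, here is a rough toy model (not QEMU
code). It assumes an image is just a map from cluster index to data, and
the function names and cluster counts are made up for illustration. It
only counts the cluster writes each approach needs for the backup case
described above, using Dor's s2/s3 naming:

```python
# Toy model: an image is a dict mapping cluster index -> data.
# Only allocated clusters appear in the dict; anything missing
# would be read from the backing file.

def stream_into_new_image(layers):
    """Copy every allocated cluster of `layers` (oldest first) into a
    fresh image. Returns (new_image, clusters_written)."""
    new_image = {}
    for layer in layers:
        # Newer layers override clusters from older ones.
        new_image.update(layer)
    return new_image, len(new_image)


def merge_in_place(top, below):
    """Write the allocated clusters of `top` into `below`, modifying
    `below`. Returns the number of clusters written."""
    below.update(top)
    return len(top)


# Hypothetical sizes: s2 holds a lot of data, the short-lived backup
# snapshot s3 holds very little.
s2 = {i: "old" for i in range(1000)}
s3 = {5: "new", 42: "new", 999: "new"}

new_snap, copied = stream_into_new_image([s2, s3])
written = merge_in_place(s3, s2)

print("streaming into newSnap wrote", copied, "clusters")
print("in-place merge wrote", written, "clusters")
```

Both paths end with the same contents, but the write counts depend on
different things: streaming scales with everything above the backing
file, the in-place merge only with what was written to s3.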

> Stefan