From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <20110705181819.GA27175@amt.cnet>
References: <20110630143620.GA4366@amt.cnet> <4E0C8D90.8050305@redhat.com> <20110630183829.GA8752@amt.cnet> <4E12C4F5.9000100@redhat.com> <20110705125858.GA21254@amt.cnet> <4E1313FA.1060905@redhat.com> <20110705143230.GA22955@amt.cnet> <20110705181819.GA27175@amt.cnet>
Date: Thu, 7 Jul 2011 16:25:37 +0100
From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] KVM call agenda for June 28
To: Marcelo Tosatti
Cc: Kevin Wolf, Chris Wright, KVM devel mailing list, quintela@redhat.com, Jes Sorensen, Dor Laor, qemu-devel@nongnu.org, Avi Kivity

On Tue, Jul 5, 2011 at 7:18 PM, Marcelo Tosatti wrote:
> On Tue, Jul 05, 2011 at 04:37:08PM +0100, Stefan Hajnoczi wrote:
>> On Tue, Jul 5, 2011 at 3:32 PM, Marcelo Tosatti wrote:
>> > On Tue, Jul 05, 2011 at 04:39:06PM +0300, Dor Laor wrote:
>> >> On 07/05/2011 03:58 PM, Marcelo Tosatti wrote:
>> >> >On Tue, Jul 05, 2011 at 01:40:08PM +0100, Stefan Hajnoczi wrote:
>> >> >>On Tue, Jul 5, 2011 at 9:01 AM, Dor Laor wrote:
>> >> >>>I tried to re-arrange all of the requirements and use cases using this wiki
>> >> >>>page: http://wiki.qemu.org/Features/LiveBlockMigration
>> >> >>>
>> >> >>>It would be best to agree upon the most interesting use cases (while we
>> >> >>>make sure we cover future ones) and agree to them.
>> >> >>>The next step is to set the interface for all the various verbs since the
>> >> >>>implementation seems to be converging.
>> >> >>
>> >> >>Live block copy was supposed to support snapshot merge.  I think the
>> >> >>current favored approach is to make the source image a backing file to
>> >> >>the destination image and essentially do image streaming.
>> >> >>
>> >> >>Using this mechanism for snapshot merge is tricky.  The COW file
>> >> >>already uses the read-only snapshot base image.  So now we cannot
>> >> >>trivially copy the COW file contents back into the snapshot base image
>> >> >>using live block copy.
>> >> >
>> >> >It never did. Live copy creates a new image where both snapshot and
>> >> >"current" are copied to.
>> >> >
>> >> >This is similar to image streaming.
>> >>
>> >> Not sure I see what's bad about doing an in-place merge:
>> >>
>> >> Let's suppose we have this COW chain:
>> >>
>> >>   base <-- s1 <-- s2
>> >>
>> >> Now a live snapshot is created over s2; s2 becomes RO and s3 is RW:
>> >>
>> >>   base <-- s1 <-- s2 <-- s3
>> >>
>> >> Now we're done with s2 (post backup) and would like to merge s3 into s2.
>> >>
>> >> With your approach we use live copy of s3 into newSnap:
>> >>
>> >>   base <-- s1 <-- s2 <-- s3
>> >>   base <-- s1 <-- newSnap
>> >>
>> >> When it is over, s2 and s3 can be erased.
>> >> The downside is the I/O for copying s2 data and the temporary
>> >> storage. I guess temp storage is cheap but excessive I/O is
>> >> expensive.
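The cost difference Dor is pointing at can be sketched with a toy Python
model. The image names, cluster layout, and contents below are invented for
illustration; this is a dict-per-image model of a COW backing chain, not QEMU
code:

```python
# Toy model of a COW chain: each image is a dict mapping cluster
# index -> data, and a read walks the chain from newest to oldest.

def read(chain, cluster):
    """Resolve a cluster through the backing chain (topmost image wins)."""
    for img in reversed(chain):
        if cluster in img:
            return img[cluster]
    return None

base = {0: "b0", 1: "b1", 2: "b2", 3: "b3"}   # large, shared base
s1 = {1: "s1-1"}
s2 = {1: "s2-1", 2: "s2-2"}                   # snapshot kept for backup
s3 = {3: "s3-3"}                              # writes made since the snapshot

# Live copy into newSnap (backed by s1): every cluster allocated in
# s2 or s3 must be copied into the new image.
new_snap = {}
new_snap.update(s2)   # oldest first so newer data wins
new_snap.update(s3)
copy_io = len(set(s2) | set(s3))

# In-place merge of s3 into s2: only s3's clusters are written.
merged = dict(s2)
merged.update(s3)
inplace_io = len(s3)

# Both resulting chains read back identically to the original...
assert all(read([base, s1, new_snap], c) == read([base, s1, s2, s3], c)
           for c in range(4))
assert all(read([base, s1, merged], c) == read([base, s1, s2, s3], c)
           for c in range(4))
# ...but the copy scheme moves 3 clusters where in-place moves only 1,
# and it needs temporary storage for newSnap on top of that.
```

The gap widens with the ratio of s2's allocated data to s3's, which is
exactly the frequent-snapshot backup situation discussed below.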
>> >>
>> >> My approach was to collapse s3 into s2 and erase s3 eventually:
>> >>
>> >> before: base <-- s1 <-- s2 <-- s3
>> >> after:  base <-- s1 <-- s2
>> >>
>> >> If we use live block copy using the mirror driver it should be safe as
>> >> long as we keep the ordering of new writes into s3 during the
>> >> execution.
>> >> Even a failure in the middle won't cause harm since the
>> >> management will keep using s3 until it gets a success event.
>> >
>> > Well, it is more complicated than simply streaming into a new
>> > image. I'm not entirely sure it is necessary. The common case is:
>> >
>> > base -> sn-1 -> sn-2 -> ... -> sn-n
>> >
>> > When n reaches a limit, you do:
>> >
>> > base -> merge-1
>> >
>> > You're potentially copying a similar amount of data when merging back into
>> > a single image (and you can't easily merge multiple snapshots).
>> >
>> > If the amount of data that's not in 'base' is large, you can leave a new
>> > external file around:
>> >
>> > base -> merge-1 -> sn-1 -> sn-2 ... -> sn-n
>> > to
>> > base -> merge-1 -> merge-2
>> >
>> >> >
>> >> >>It seems like snapshot merge will require dedicated code that reads
>> >> >>the allocated clusters from the COW file and writes them back into the
>> >> >>base image.
>> >> >>
>> >> >>A very inefficient alternative would be to create a third image, the
>> >> >>"merge" image file, which has the COW file as its backing file:
>> >> >>snapshot (base) -> cow -> merge
>> >
>> > Remember there is a 'base' before snapshot, you don't copy the entire
>> > image.
>>
>> One use case I have in mind is the Live Backup approach that Jagane
>> has been developing.  Here the backup solution only creates a snapshot
>> for the period of time needed to read out the dirty blocks.  Then the
>> snapshot is deleted again and probably contains very little new data
>> relative to the base image.  The backup solution does this operation
>> every day.
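That daily cycle can be sketched as another toy Python model (invented
numbers and names, not Jagane's implementation): the snapshot overlay lives
only for the backup window, collects the few writes made during it, and is
then merged back in place.

```python
# Toy model of the daily live-backup cycle: a short-lived COW overlay
# absorbs guest writes while the backup reads dirty data, then the
# overlay is merged back into the base image in place.

base = {c: "day0-%d" % c for c in range(1000)}   # large base image

def backup_cycle(base, writes_during_backup):
    """Snapshot, read out the dirty clusters, merge the snapshot back."""
    cow = dict(writes_during_backup)   # overlay exists only briefly
    backed_up = len(cow)               # backup reads only dirty clusters
    base.update(cow)                   # in-place merge: tiny I/O
    return backed_up

copied = backup_cycle(base, {5: "day1-5", 42: "day1-42"})
# Only 2 clusters moved; any scheme that recreates the base in a new
# file would rewrite all 1000 clusters every single day.
```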
>>
>> This is the pathological case for any approach that copies the entire
>> base into a new file.  We could have avoided a lot of I/O by doing an
>> in-place update.
>>
>> I want to make sure this works well.
>
> This use case does not fit the streaming scheme that has come up. It's a
> completely different operation.
>
> IMO it should be implemented separately.

Okay, not everything can fit into this one grand unified block
copy/image streaming mechanism :).

Stefan