From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from [140.186.70.92] (port=33754 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1PseQs-0000j1-5c
	for qemu-devel@nongnu.org; Thu, 24 Feb 2011 11:52:52 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1PseOJ-00005Y-GZ
	for qemu-devel@nongnu.org; Thu, 24 Feb 2011 11:50:12 -0500
Received: from mx1.redhat.com ([209.132.183.28]:54393)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from ) id 1PseOJ-00005M-59
	for qemu-devel@nongnu.org; Thu, 24 Feb 2011 11:50:11 -0500
Date: Thu, 24 Feb 2011 13:39:33 -0300
From: Marcelo Tosatti
Subject: Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy
Message-ID: <20110224163933.GA10165@amt.cnet>
References: <4D642181.4080509@codemonkey.ws>
	<20110222210735.GA9372@amt.cnet>
	<4D64266A.3060106@codemonkey.ws>
	<20110222230935.GA11082@amt.cnet>
	<4D644343.4050800@codemonkey.ws>
	<4D65051A.6070707@redhat.com>
	<20110223174917.GA4630@amt.cnet>
	<4D661DA2.2050102@redhat.com>
	<20110224151411.GA3794@amt.cnet>
	<4D667914.4090602@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4D667914.4090602@redhat.com>
List-Id: qemu-devel.nongnu.org
To: Avi Kivity
Cc: Jes.Sorensen@redhat.com, qemu-devel@nongnu.org

On Thu, Feb 24, 2011 at 05:28:20PM +0200, Avi Kivity wrote:
> On 02/24/2011 05:14 PM, Marcelo Tosatti wrote:
> >> >> The problem with qemu config files is that it splits the
> >> >> authoritative source of where images are stored into two.  Is it in
> >> >> the management tool's database or is it in qemu's config file?
> >> >>
> >> >> For the problem at hand, one solution is to make qemu stop after the
> >> >> copy, and then management can issue an additional command to
> >> >> rearrange the disk and resume the guest.
> >> >> A drawback here is that if management dies, the guest is stopped
> >> >> until it restarts.  We also make management latency guest visible,
> >> >> even if it doesn't die at an inconvenient place.
> >> >>
> >> >> An alternative approach is to have the copy be performed by a new
> >> >> layered block format driver:
> >> >>
> >> >> - create a new image, type = live-copy, containing three pieces of
> >> >>   information
> >> >>    - source image
> >> >>    - destination image
> >> >>    - copy state (initially nothing is copied)
> >> >> - tell qemu to switch to the new image
> >
> >There is a similar situation with atomicity here. Mgmt app requests a
> >switch and dies immediately, before receiving the command reply. Qemu
> >crashes. Which image is the up-to-date one, source or live-copy?
>
> live-copy (or its new name, RAID-1).  Once you've created it, it is
> equivalent to source.  Once it switches to state=synced you can
> switch back to either source or destination (I guess by telling qemu
> to detach the one you don't want first, so it falls back to
> state=degraded).
>
> >
> >> You could hot-unplug the image and hot-plug it later (continuing the
> >> copy with qemu-img),
> >
> >Then there's no need for live copy. qemu-img does that already.
>
> It will start from the beginning.
>
> >> or live migrate it.
> >
> >You can live migrate (but not live migrate with live block migration)
> >with live copy in progress, it's just that it's not supported yet.
>
> A RAID-1 driver will work with block live migration too.

Nobody cares about that one (block copy and block live migration in
progress). In fact, I doubt anybody cares about parallel block migration
and block copy either (mgmt can easily cope with that limitation: stop
live copy if migration is needed).

> >> In fact I think a qemu RAID-1 driver
> >> removes the restriction that you can't live-migrate and live-copy
> >> simultaneously.
> >
> >As mentioned, it's just an implementation detail.

I meant the restriction of live-migrate and live-copy in parallel.

> I think it's an important one.  It moves the code from the generic
> layer to a driver.

Well, it is a nice idea, but the devil is in the details:

- Guest writes must invalidate in-progress live copy reads and live
  copy writes, so you have to maintain a queue for live copy AIO.
- Live copy writes must be aware of in-progress guest AIO writes, so
  you have to maintain a queue for guest AIO.
- Guest writes must be mirrored to source and destination.
- qemu-img must handle this new format.

So my view ATM is that this is overengineering.

> It allows generic RAID-1 functionality (for high
> availability).

Isn't HA the responsibility of the host filesystem?
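
To illustrate the bookkeeping the list above asks for, here is a toy model in Python. It is only a sketch under stated assumptions, not qemu code: the class `LiveCopy`, its method names, and the "copying" state are all hypothetical, images are plain lists of whole blocks, and guest writes are treated as instantaneous, so only the live copy queue (first item) and the mirroring (third item) are modelled. The guest AIO queue from the second item is not represented.

```python
from dataclasses import dataclass, field


@dataclass
class LiveCopy:
    """Toy model of a mirrored live copy over whole blocks (hypothetical)."""
    source: list
    dest: list
    copied: set = field(default_factory=set)      # blocks known to be in sync
    inflight: dict = field(default_factory=dict)  # block -> data read by the copy job
    stale: set = field(default_factory=set)       # in-flight copies hit by a guest write

    def copy_read(self, block):
        # First half of a background copy: read the block from the source.
        self.inflight[block] = self.source[block]
        self.stale.discard(block)

    def guest_write(self, block, data):
        # Guest writes are mirrored to source and destination.
        self.source[block] = data
        self.dest[block] = data
        self.copied.add(block)
        # A copy of this block that is already in flight now holds old data.
        if block in self.inflight:
            self.stale.add(block)

    def copy_write(self, block):
        # Second half: write what was read earlier, unless a guest write
        # invalidated it in the meantime.
        data = self.inflight.pop(block)
        if block in self.stale:
            self.stale.discard(block)
            return False  # dropped; the mirrored guest write already synced it
        self.dest[block] = data
        self.copied.add(block)
        return True

    @property
    def state(self):
        # "synced" follows the thread; "copying" is an assumed name.
        return "synced" if len(self.copied) == len(self.source) else "copying"


if __name__ == "__main__":
    lc = LiveCopy(source=["a", "b"], dest=[None, None])
    lc.copy_read(0)
    lc.guest_write(0, "A")   # races with the in-flight copy of block 0
    lc.copy_write(0)         # stale copy is dropped, dest keeps "A"
    lc.copy_read(1)
    lc.copy_write(1)
    print(lc.dest, lc.state)
```

Even in this simplified form, the copy job needs per-block in-flight tracking just to avoid overwriting newer guest data with an older read. A real driver would need the same for asynchronous guest writes, which is the extra queue the second item refers to.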