Message-ID: <4E1C88C5.1010700@us.ibm.com>
Date: Tue, 12 Jul 2011 12:47:49 -0500
From: Adam Litke
References: <4E131D0D.307@redhat.com> <20110711125432.GA19686@stefanha-thinkpad.localdomain> <20110711163226.GA10924@amt.cnet> <4E1C009C.1010408@redhat.com>
Subject: Re: [Qemu-devel] live block copy/stream/snapshot discussion
To: Stefan Hajnoczi
Cc: Kevin Wolf, Anthony Liguori, Dor Laor, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel, Avi Kivity

On 07/12/2011 10:45 AM, Stefan Hajnoczi wrote:
> On Tue, Jul 12, 2011 at 9:06 AM, Kevin Wolf wrote:
>> On 11.07.2011 18:32, Marcelo Tosatti wrote:
>>> On Mon, Jul 11, 2011 at 03:47:15PM +0100, Stefan Hajnoczi wrote:
>>>> Kevin, Marcelo,
>>>> I'd like to reach agreement on the QMP/HMP APIs for live block copy
>>>> and image streaming. Libvirt has acked the image streaming APIs that
>>>> Adam proposed and I think they are a good fit for the feature. I have
>>>> described that API below for your review (it's exactly what the QED
>>>> Image Streaming patches provide).
>>>>
>>>> Marcelo: Are you happy with this API for live block copy? Also please
>>>> take a look at the switch command that I am proposing.
>>>>
>>>> Image streaming API
>>>> ===================
>>>>
>>>> For leaf images with copy-on-read semantics, the stream commands allow the user
>>>> to populate local blocks by manually streaming them from the backing image.
>>>> Once all blocks have been streamed, the dependency on the original backing
>>>> image can be removed. Therefore, stream commands can be used to implement
>>>> post-copy live block migration and rapid deployment.
>>>>
>>>> The block_stream command can be used to stream a single cluster, to
>>>> start streaming the entire device, and to cancel an active stream. It
>>>> is easiest to allow the block_stream command to manage streaming for the
>>>> entire device but a management tool could use single cluster mode to
>>>> throttle the I/O rate.
>>
>> As discussed earlier, having the management send requests for each
>> single cluster doesn't make any sense at all. It wouldn't only throttle
>> the I/O rate but bring it down to a level that makes it unusable. What
>> you really want is to allow the management to give us a range (offset +
>> length) that qemu should stream.
>
> I feel that an iteration interface is problematic whether the
> management tool or QEMU decides what to stream. Let's have just the
> background streaming operation.
>
> The problem with byte ranges is two-fold.
> The management tool doesn't know which regions of the image are
> allocated so it may do a lot of nop calls to already-allocated
> regions with no intelligence as to where the next sensible offset for
> streaming is. Secondly, because the progress and performance of image
> streaming depend largely on whether or not clusters are allocated (it
> is very fast when a cluster is already allocated and we have no work
> to do), offsets are bad indicators of progress to the user. I think
> it's best not to expose these details to the management tool at all.
>
> The only reason for the iteration interface was to punt I/O throttling
> to the management tool. I think it would be easier to just throttle
> inside the streaming function.
>
> Kevin: Are you happy with dropping the iteration interface?
> Adam: Is there a libvirt requirement for iteration or could we support
> background copy only?

There is no hard requirement for iteration in libvirt. However, I think
there is a requirement that we report some sort of progress to an end
user. These operations can easily take many minutes (even hours) and
such a long-running operation needs to report progress. I think the
current information returned by 'query-block-stream' is appropriate for
this purpose and should definitely be maintained.

-- 
Adam Litke
IBM Linux Technology Center
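
For illustration only, the two ideas in this thread (throttling inside
the streaming function, and reporting progress for a long-running
background copy) might be sketched roughly as below. This is not QEMU
code. The names stream_device, StreamProgress, copy_cluster and
max_bytes_per_sec are hypothetical, and the offset/length fields are an
assumption made for the sketch, not the actual output of
'query-block-stream'.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class StreamProgress:
    """Hypothetical progress record; field names are illustrative only."""
    offset: int  # bytes of the device processed so far
    length: int  # total device size in bytes


def stream_device(
    allocated: List[bool],
    cluster_size: int,
    max_bytes_per_sec: int,
    copy_cluster: Callable[[int], None],
    sleep: Callable[[float], None],
) -> Iterator[StreamProgress]:
    """Stream every unallocated cluster, throttling only real copy work."""
    total = len(allocated) * cluster_size
    for index, is_allocated in enumerate(allocated):
        if not is_allocated:
            # Pull the cluster from the backing image (assumed callback).
            copy_cluster(index)
            allocated[index] = True
            # Pause so that copied bytes stay under the rate limit.
            sleep(cluster_size / max_bytes_per_sec)
        # Already-allocated clusters need no work, so no delay is charged.
        yield StreamProgress(offset=(index + 1) * cluster_size, length=total)
```

A caller could pass time.sleep as the sleep argument and poll the
yielded records to show progress. Note that the reported offset moves
quickly through already-allocated regions and slowly through
unallocated ones, which is the reason given above for treating offsets
as a poor indicator of remaining time.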