From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:46869) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1R8Tbg-0000c0-Ff for qemu-devel@nongnu.org; Tue, 27 Sep 2011 05:05:44 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1R8Tbe-0004oZ-W2 for qemu-devel@nongnu.org; Tue, 27 Sep 2011 05:05:40 -0400
Received: from mail-wy0-f173.google.com ([74.125.82.173]:45335) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1R8Tbe-0004oK-Re for qemu-devel@nongnu.org; Tue, 27 Sep 2011 05:05:38 -0400
Received: by wyh22 with SMTP id 22so7739267wyh.4 for ; Tue, 27 Sep 2011 02:05:37 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20110923155726.GA23088@stefanha-thinkpad.localdomain>
References: <20110923155726.GA23088@stefanha-thinkpad.localdomain>
Date: Tue, 27 Sep 2011 17:05:37 +0800
From: Zhi Yong Wu
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [RFC] Generic image streaming
To: Stefan Hajnoczi
Cc: Kevin Wolf, Marcelo Tosatti, qemu-devel@nongnu.org, Zhi Yong Wu

On Fri, Sep 23, 2011 at 11:57 PM, Stefan Hajnoczi wrote:
> Here is my generic image streaming branch, which aims to provide a way
> to copy the contents of a backing file into an image file of a running
> guest without requiring specific support in the various block drivers
> (e.g. qcow2, qed, vmdk):
>
> http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/image-streaming-api
Sorry, I missed this support logic. Thanks.
>
> The tree does not provide full image streaming yet but I'd like to
> discuss the approach taken in the code. Here are the main points:
>
> The image streaming API is available through HMP and QMP commands. When
> streaming is started on a block device a coroutine is created to do the
> background I/O work.
> The coroutine can be cancelled.
>
> While the coroutine copies data from the backing file into the image
> file, the guest may be performing I/O to the image file. Guest reads do
> not conflict with streaming but guest writes require special handling.
> If the guest writes to a region of the image file that we are currently
> copying, then there is the potential to clobber the guest write with old
> data from the backing file.
>
> Previously I solved this in a QED-specific way by taking advantage of
> the serialization of allocating write requests. In order to do this
> generically we need to track in-flight requests and have the ability to
> queue I/O. Guest writes that affect an in-flight streaming copy
> operation must wait for that operation to complete before being issued.
> Streaming copy operations must skip overlapping regions of guest writes.
>
> One big difference to the QED image streaming implementation is that
> this generic implementation is not based on copy-on-read operations.
> Instead we do a sequence of bdrv_is_allocated() to find regions for
> streaming, followed by bdrv_co_read() and bdrv_co_write() in order to
> populate the image file.
>
> It turns out that generic copy-on-read is not an attractive operation
> because it requires using bounce buffers for every request. Kevin
> pointed out the case where a guest performs a read and pokes the data
> buffer before the read completes, copy-on-read would write out the
> modified memory into the image file unless we use a bounce buffer.
>
> There are a few pieces missing in my tree, which have mostly been solved
> in other places and just need to be reused:
> 1. Arbitration between guest and streaming requests (this is the only
>    real new thing).
> 2. Efficient zero handling (skip writing those regions or mark them as
>    zero clusters).
> 3. Queuing/dependencies when arbitration decides a request must wait.
>    I'm taking a look at reusing Zhi Yong's block queue.
> 4. Rate-limiting to ensure streaming I/O does not impact the guest.
>    Already exists in the QED-specific patches, it may make sense to
>    extract common code that both migration and the block layer can use.
>
> Ideas or questions?
>
> Stefan
>

-- 
Regards,

Zhi Yong Wu
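
As a rough illustration of the arbitration rule quoted above (a guest write that overlaps an in-flight streaming copy waits; a streaming copy skips a region the guest is writing), here is a toy Python model. It is not QEMU code: the Request and Tracker names, the sector-based ranges, and the "run"/"wait"/"skip" results are all assumptions made for this sketch, and it skips a whole copy request on overlap rather than only the overlapping part.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare requests by identity, not by field values
class Request:
    start: int   # first sector touched (hypothetical unit)
    length: int  # number of sectors
    kind: str    # "guest_write" or "stream_copy"

    def overlaps(self, other):
        """True if the two half-open sector ranges intersect."""
        return (self.start < other.start + other.length
                and other.start < self.start + self.length)


@dataclass
class Tracker:
    in_flight: list = field(default_factory=list)
    waiting: list = field(default_factory=list)

    def submit(self, req):
        """Return "run", "wait" or "skip" for a new request."""
        for other in self.in_flight:
            if not req.overlaps(other):
                continue
            if req.kind == "guest_write" and other.kind == "stream_copy":
                # The copy could clobber the new data, so the write waits.
                self.waiting.append(req)
                return "wait"
            if req.kind == "stream_copy" and other.kind == "guest_write":
                # The guest is writing newer data here; do not copy over it.
                return "skip"
        self.in_flight.append(req)
        return "run"

    def complete(self, req):
        """Retire a request and start any writes that no longer conflict."""
        self.in_flight.remove(req)
        copies = [o for o in self.in_flight if o.kind == "stream_copy"]
        ready = [w for w in self.waiting
                 if not any(w.overlaps(c) for c in copies)]
        for w in ready:
            self.waiting.remove(w)
            self.in_flight.append(w)
        return ready
```

In this model, submitting a stream_copy for sectors 0..127 and then a guest_write for sectors 64..71 queues the write; calling complete() on the copy returns the write as ready to issue. A real implementation would also need the queuing and rate-limiting pieces listed in the quoted message.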