Date: Mon, 26 Sep 2011 10:30:27 +0100
From: Stefan Hajnoczi
Message-ID: <20110926093027.GA8923@stefanha-thinkpad.localdomain>
References: <20110923155726.GA23088@stefanha-thinkpad.localdomain> <20110926075556.GB6455@stefanha-thinkpad.localdomain>
Subject: Re: [Qemu-devel] [RFC] Generic image streaming
To: Zhi Yong Wu
Cc: Kevin Wolf, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel@nongnu.org, Zhi Yong Wu

On Mon, Sep 26, 2011 at 05:11:00PM +0800, Zhi Yong Wu wrote:
> On Mon, Sep 26, 2011 at 3:55 PM, Stefan Hajnoczi wrote:
> > On Mon, Sep 26,
2011 at 01:32:34PM +0800, Zhi Yong Wu wrote:
> >> On Fri, Sep 23, 2011 at 11:57 PM, Stefan Hajnoczi
> >> wrote:
> >> > Here is my generic image streaming branch, which aims to provide a way
> >> > to copy the contents of a backing file into an image file of a running
> >> > guest without requiring specific support in the various block drivers
> >> > (e.g. qcow2, qed, vmdk):
> >> >
> >> > http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/image-streaming-api
> >> >
> >> > The tree does not provide full image streaming yet but I'd like to
> >> > discuss the approach taken in the code. Here are the main points:
> >> >
> >> > The image streaming API is available through HMP and QMP commands. When
> >> > streaming is started on a block device a coroutine is created to do the
> >> > background I/O work. The coroutine can be cancelled.
> >> >
> >> > While the coroutine copies data from the backing file into the image
> >> > file, the guest may be performing I/O to the image file. Guest reads do
> >> > not conflict with streaming but guest writes require special handling.
> >> > If the guest writes to a region of the image file that we are currently
> >> > copying, then there is the potential to clobber the guest write with old
> >> > data from the backing file.
> >> >
> >> > Previously I solved this in a QED-specific way by taking advantage of
> >> > the serialization of allocating write requests. In order to do this
> >> > generically we need to track in-flight requests and have the ability to
> >> > queue I/O. Guest writes that affect an in-flight streaming copy
> >> > operation must wait for that operation to complete before being issued.
> >> > Streaming copy operations must skip overlapping regions of guest writes.
> >> >
> >> > One big difference to the QED image streaming implementation is that
> >> > this generic implementation is not based on copy-on-read operations.
> >> > Instead we do a sequence of bdrv_is_allocated() to find regions for
> >> > streaming, followed by bdrv_co_read() and bdrv_co_write() in order to
> >> > populate the image file.
> >> >
> >> > It turns out that generic copy-on-read is not an attractive operation
> >> > because it requires using bounce buffers for every request. Kevin
> >> bounce buffers == buffer ring?
> >
> > A bounce buffer is a temporary buffer that is used because the actual
> > data buffer is not addressable or cannot be directly accessed for some
> > other reason. In this case it's because the guest should see read
> > semantics and not find that writes to its read data buffer result in
> > writes to disk.
> >
> >> > pointed out the case where a guest performs a read and pokes the data
> >> > buffer before the read completes, copy-on-read would write out the
> >> > modified memory into the image file unless we use a bounce buffer.
> Sorry, to be honest, i don't know which scenario will cause guest
> modified memory is written out into image file.

I showed the scenario in the steps posted below:

> >> Can you elaborate this?
> >
> > 1. Guest issues a read request.
> > 2. QEMU issues host read request as first step in copy-on-read.
> > 3. Host read request completes...
> > 4. Guest overwrites its data buffer before QEMU acknowledges request
> >    completion.
> > 5. ...QEMU issues host write request.
> > 6. Host completes write request and QEMU acknowledges guest read
> >    completion.
> Good, thanks.
> >
> > What happened is that we populated the image file with data from guest
> > memory that does not match what is in the backing file. The guest
> How to find that the two data don't match?

Reread what I posted and think about the case where a QEMU read buffer
(the "bounce buffer") is used in step 2. In that case the guest cannot
tamper with the data buffer while performing copy-on-read.

Stefan
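
The six steps quoted above can be modelled with a small simulation. This
is not QEMU code and uses none of its APIs: it is a minimal Python
sketch, and copy_on_read(), its parameters, and the byte strings are
hypothetical names chosen for illustration only. It assumes the host
read and host write act on whole buffers, and that the guest's "poke"
happens exactly between them, as in step 4.

```python
def copy_on_read(backing, image, guest_buf, use_bounce_buffer, guest_poke=None):
    """Simulate one copy-on-read request (steps 1-6 of the quoted scenario).

    All names here are hypothetical; this only models the buffer ownership.
    """
    # Steps 1-2: the guest issues a read. The host read lands either in the
    # guest's own data buffer or in a private bounce buffer.
    target = bytearray(len(backing)) if use_bounce_buffer else guest_buf

    # Step 3: the host read completes and fills the chosen buffer.
    target[:] = backing

    # Step 4: the guest overwrites its data buffer before completion is
    # acknowledged. This only touches guest_buf, never the bounce buffer.
    if guest_poke is not None:
        guest_buf[:len(guest_poke)] = guest_poke

    # Step 5: the host write populates the image file from the same buffer
    # that the host read used.
    image[:] = target

    # Step 6: completion. With a bounce buffer the read data is handed to
    # the guest only now, so the guest sees ordinary read semantics.
    if use_bounce_buffer:
        guest_buf[:] = target
    return image


backing = b"BACKING!"

# Without a bounce buffer the guest's poke ends up in the image file.
clobbered = copy_on_read(backing, bytearray(8), bytearray(8),
                         use_bounce_buffer=False, guest_poke=b"XX")

# With a bounce buffer the image file matches the backing file.
protected = copy_on_read(backing, bytearray(8), bytearray(8),
                         use_bounce_buffer=True, guest_poke=b"XX")

print(bytes(clobbered), bytes(protected))
```

In the first call the image file no longer matches the backing file,
which is the mismatch discussed above. In the second call the guest can
only modify its own buffer, so the data written to the image file is
exactly what the host read returned.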