Date: Thu, 3 Nov 2011 14:34:24 -0200
From: Marcelo Tosatti
Message-ID: <20111103163424.GA21743@amt.cnet>
Subject: Re: [Qemu-devel] [PATCH 3/8] block: add image streaming block job
To: Stefan Hajnoczi
Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, qemu-devel@nongnu.org

On Wed, Nov 02, 2011 at 03:43:49PM +0000, Stefan Hajnoczi wrote:
> On Tue, Nov 1, 2011 at 6:06 PM, Marcelo Tosatti wrote:
> > On Thu, Oct 27, 2011 at 04:22:50PM +0100, Stefan Hajnoczi wrote:
> >> +static int stream_one_iteration(StreamBlockJob *s, int64_t sector_num,
> >> +                                void *buf, int max_sectors, int *n)
> >> +{
> >> +    BlockDriverState *bs = s->common.bs;
> >> +    int ret;
> >> +
> >> +    trace_stream_one_iteration(s, sector_num, max_sectors);
> >> +
> >> +    ret = bdrv_is_allocated(bs, sector_num, max_sectors, n);
> >> +    if (ret < 0) {
> >> +        return ret;
> >> +    }
> >
> > bdrv_is_allocated is still synchronous? If so, there should be at least
> > a plan to make it asynchronous.
>
> Yes, that's a good discussion to have. My thoughts are that
> bdrv_is_allocated() should be executed in coroutine context. The
> semantics are a little tricky because of parallel requests:
>
> 1. If a write request is in progress when we do bdrv_is_allocated() we
>    might get back "unallocated" even though clusters are just being
>    allocated.
> 2. If a TRIM request is in progress when we do bdrv_is_allocated() we
>    might get back "allocated" even though clusters are just being
>    deallocated.
>
> In order to be reliable, the caller needs to be aware of parallel
> requests. I think it's correct to defer this problem to the caller.
>
> In the case of image streaming we're not TRIM-safe; I haven't really
> thought about that yet. But we are safe against parallel write requests
> because there is serialization to prevent copy-on-read requests from
> racing with write requests.
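
To make that caller-side requirement concrete, here is a minimal
sketch of a race-aware, coroutine-context query. The
wait_for_overlapping_requests() helper is hypothetical; it stands in
for whatever serialization mechanism the copy-on-read path uses and
is not an existing public API:

static int coroutine_fn stream_is_allocated(BlockDriverState *bs,
                                            int64_t sector_num,
                                            int nb_sectors, int *pnum)
{
    /* Drain overlapping writes/discards first; without this the
     * answer can already be stale by the time the caller acts on it
     * (races 1 and 2 above).  Hypothetical helper. */
    wait_for_overlapping_requests(bs, sector_num, nb_sectors);

    return bdrv_is_allocated(bs, sector_num, nb_sectors, pnum);
}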
>
> >> +    if (!ret) {
> >> +        ret = stream_populate(bs, sector_num, *n, buf);
> >> +    }
> >> +    return ret;
> >> +}
> >> +
> >> +static void coroutine_fn stream_run(void *opaque)
> >> +{
> >> +    StreamBlockJob *s = opaque;
> >> +    BlockDriverState *bs = s->common.bs;
> >> +    int64_t sector_num, end;
> >> +    int ret = 0;
> >> +    int n;
> >> +    void *buf;
> >> +
> >> +    buf = qemu_blockalign(bs, STREAM_BUFFER_SIZE);
> >> +    s->common.len = bdrv_getlength(bs);
> >> +    bdrv_get_geometry(bs, (uint64_t *)&end);
> >> +
> >> +    bdrv_set_zero_detection(bs, true);
> >> +    bdrv_set_copy_on_read(bs, true);
> >
> > Should distinguish between stream-initiated and user-initiated setting
> > of zero detection and copy-on-read (so that unsetting below does not
> > clear user settings).
>
> For zero detection I agree.
>
> For copy-on-read it is questionable, since once streaming is complete
> it does not make sense to have copy-on-read enabled.
>
> I will address this in the next revision and think more about the
> copy-on-read case.

(A counter-based sketch of what this could look like is at the end of
this mail.)

> >> +
> >> +    for (sector_num = 0; sector_num < end; sector_num += n) {
> >> +        if (block_job_is_cancelled(&s->common)) {
> >> +            break;
> >> +        }
> >
> > If cancellation is seen here in the last loop iteration,
> > bdrv_change_backing_file below should not be executed.
>
> I documented this case in the QMP API. I'm not sure if it's possible
> to guarantee that the operation isn't just completing as you cancel
> it. Any blocking point between completion of the last iteration and
> completion of the operation is vulnerable to missing the cancel. It's
> easier to explicitly say the operation might just have completed when
> you cancelled it, rather than trying to protect the completion path.
> Do you think it's a problem to have these loose semantics that I
> described?

No, that is OK. I'm referring to bdrv_change_backing_file() being
executed without the entire image having been streamed.

"if (sector_num == end && ret == 0)" covers both the case where all
sectors have been streamed and the case where all sectors except the
last iteration have been streamed (due to the break on job
cancellation).

> >> +
> >> +        /* TODO rate-limit */
> >> +        /* Note that even when no rate limit is applied we need to yield with
> >> +         * no pending I/O here so that qemu_aio_flush() is able to return.
> >> +         */
> >> +        co_sleep_ns(rt_clock, 0);
> >
> > How do you plan to implement rate limit?
>
> It was implemented in the QED-specific image streaming series:
>
> http://repo.or.cz/w/qemu/stefanha.git/commitdiff/22f2c09d2fcfe5e49ac4604fd23e4744f549a476
>
> That implementation works fine and is small, but I'd like to reuse the
> migration speed limit, if possible. That way we don't have 3
> different rate-limiting implementations in QEMU :).

One possibility would be to create a "virtual" block device for
streaming, sitting on top of the real block device. Block I/O limits
would then be enforced on the virtual block device, while the guest
continues to access the real block device.
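
For illustration, a minimal slice-based limiter in the spirit of the
migration speed limit could look like the following; the names, the
struct layout, and the 100 ms slice are assumptions for the sketch,
not existing QEMU code:

typedef struct {
    int64_t next_slice_time;   /* ns timestamp at which a new slice starts */
    uint64_t dispatched;       /* bytes accounted against the current slice */
    uint64_t slice_quota;      /* bytes allowed per slice */
    uint64_t slice_ns;         /* slice length, e.g. 100 ms */
} RateLimit;

static int64_t ratelimit_calculate_delay(RateLimit *limit, uint64_t n)
{
    int64_t now = qemu_get_clock_ns(rt_clock);

    if (now > limit->next_slice_time) {
        /* A new slice has begun: reset the accounting. */
        limit->next_slice_time = now + limit->slice_ns;
        limit->dispatched = 0;
    }
    limit->dispatched += n;
    if (limit->dispatched > limit->slice_quota) {
        /* Over budget: tell the caller to sleep until the next slice. */
        return limit->next_slice_time - now;
    }
    return 0;
}

stream_run() would then pass the returned delay to
co_sleep_ns(rt_clock, delay) instead of the unconditional
co_sleep_ns(rt_clock, 0); a zero delay still yields, so the
qemu_aio_flush() requirement in the comment above is preserved.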
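
And on the stream- vs. user-initiated copy-on-read question above:
one way to keep the job's disable from clobbering a user-initiated
enable is to make the flag a counter rather than a bool. A rough
sketch, where the counter field on BlockDriverState is an assumption
for illustration:

void bdrv_enable_copy_on_read(BlockDriverState *bs)
{
    /* Each enabler takes a reference... */
    bs->copy_on_read++;
}

void bdrv_disable_copy_on_read(BlockDriverState *bs)
{
    /* ...and releases only its own reference, so the stream job's
     * disable cannot clear a user-initiated enable. */
    assert(bs->copy_on_read > 0);
    bs->copy_on_read--;
}

Readers would then test bs->copy_on_read > 0 instead of a boolean;
the same pattern would work for zero detection.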