Date: Thu, 3 Nov 2011 14:34:24 -0200
From: Marcelo Tosatti
Message-ID: <20111103163424.GA21743@amt.cnet>
Subject: Re: [Qemu-devel] [PATCH 3/8] block: add image streaming block job
To: Stefan Hajnoczi
Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, qemu-devel@nongnu.org

On Wed, Nov 02, 2011 at 03:43:49PM +0000, Stefan Hajnoczi wrote:
> On Tue, Nov 1, 2011 at 6:06 PM, Marcelo Tosatti wrote:
> > On Thu, Oct 27, 2011 at 04:22:50PM +0100, Stefan Hajnoczi wrote:
> >> +static int stream_one_iteration(StreamBlockJob *s, int64_t sector_num,
> >> +                                void *buf, int max_sectors, int *n)
> >> +{
> >> +    BlockDriverState *bs = s->common.bs;
> >> +    int ret;
> >> +
> >> +    trace_stream_one_iteration(s, sector_num, max_sectors);
> >> +
> >> +    ret = bdrv_is_allocated(bs, sector_num, max_sectors, n);
> >> +    if (ret < 0) {
> >> +        return ret;
> >> +    }
> >
> > bdrv_is_allocated is still synchronous? If so, there should be at least
> > a plan to make it asynchronous.
>
> Yes, that's a good discussion to have. My thoughts are that
> bdrv_is_allocated() should be executed in coroutine context. The
> semantics are a little tricky because of parallel requests:
>
> 1. If a write request is in progress when we do bdrv_is_allocated() we
>    might get back "unallocated" even though clusters are just being
>    allocated.
> 2. If a TRIM request is in progress when we do bdrv_is_allocated() we
>    might get back "allocated" even though clusters are just being
>    deallocated.
>
> In order to be reliable, the caller needs to be aware of parallel
> requests. I think it's correct to defer this problem to the caller.
>
> In the case of image streaming we're not TRIM-safe; I haven't really
> thought about that yet. But we are safe against parallel write requests
> because there is serialization to prevent copy-on-read requests from
> racing with write requests.
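
To make that caller-side requirement concrete, here is a minimal
sketch of a race-aware, coroutine-context query. The
wait_for_overlapping_requests() helper is hypothetical; it stands in
for whatever serialization mechanism the copy-on-read path uses and
is not an existing public API:

static int coroutine_fn stream_is_allocated(BlockDriverState *bs,
                                            int64_t sector_num,
                                            int nb_sectors, int *pnum)
{
    /* Drain overlapping writes/discards first; without this the
     * answer can already be stale by the time the caller acts on it
     * (races 1 and 2 above).  Hypothetical helper. */
    wait_for_overlapping_requests(bs, sector_num, nb_sectors);

    return bdrv_is_allocated(bs, sector_num, nb_sectors, pnum);
}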
>
> >> +    if (!ret) {
> >> +        ret = stream_populate(bs, sector_num, *n, buf);
> >> +    }
> >> +    return ret;
> >> +}
> >> +
> >> +static void coroutine_fn stream_run(void *opaque)
> >> +{
> >> +    StreamBlockJob *s = opaque;
> >> +    BlockDriverState *bs = s->common.bs;
> >> +    int64_t sector_num, end;
> >> +    int ret = 0;
> >> +    int n;
> >> +    void *buf;
> >> +
> >> +    buf = qemu_blockalign(bs, STREAM_BUFFER_SIZE);
> >> +    s->common.len = bdrv_getlength(bs);
> >> +    bdrv_get_geometry(bs, (uint64_t *)&end);
> >> +
> >> +    bdrv_set_zero_detection(bs, true);
> >> +    bdrv_set_copy_on_read(bs, true);
> >
> > Should distinguish between stream-initiated and user-initiated setting
> > of zero detection and copy-on-read (so that unsetting below does not
> > clear user settings).
>
> For zero detection I agree.
>
> For copy-on-read it is questionable, since once streaming is complete
> it does not make sense to have copy-on-read enabled.
>
> I will address this in the next revision and think more about the
> copy-on-read case.

(A counter-based sketch of what this could look like is at the end of
this mail.)

> >> +
> >> +    for (sector_num = 0; sector_num < end; sector_num += n) {
> >> +        if (block_job_is_cancelled(&s->common)) {
> >> +            break;
> >> +        }
> >
> > If cancellation is seen here in the last loop iteration,
> > bdrv_change_backing_file below should not be executed.
>
> I documented this case in the QMP API. I'm not sure if it's possible
> to guarantee that the operation isn't just completing as you cancel
> it. Any blocking point between completion of the last iteration and
> completion of the operation is vulnerable to missing the cancel. It's
> easier to explicitly say the operation might just have completed when
> you cancelled it, rather than trying to protect the completion path.
> Do you think it's a problem to have these loose semantics that I
> described?

No, that is OK. I'm referring to bdrv_change_backing_file() being
executed without the entire image having been streamed.

"if (sector_num == end && ret == 0)" covers both the case where all
sectors have been streamed and the case where all sectors except the
last iteration have been streamed (due to the break on job
cancellation).

> >> +
> >> +        /* TODO rate-limit */
> >> +        /* Note that even when no rate limit is applied we need to yield with
> >> +         * no pending I/O here so that qemu_aio_flush() is able to return.
> >> +         */
> >> +        co_sleep_ns(rt_clock, 0);
> >
> > How do you plan to implement rate limit?
>
> It was implemented in the QED-specific image streaming series:
>
> http://repo.or.cz/w/qemu/stefanha.git/commitdiff/22f2c09d2fcfe5e49ac4604fd23e4744f549a476
>
> That implementation works fine and is small, but I'd like to reuse the
> migration speed limit, if possible. That way we don't have 3
> different rate-limiting implementations in QEMU :).

One possibility would be to create a "virtual" block device for
streaming, sitting on top of the real block device. Block I/O limits
would then be enforced on the virtual block device, while the guest
continues to access the real block device.
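
For illustration, a minimal slice-based limiter in the spirit of the
migration speed limit could look like the following; the names, the
struct layout, and the 100 ms slice are assumptions for the sketch,
not existing QEMU code:

typedef struct {
    int64_t next_slice_time;   /* ns timestamp at which a new slice starts */
    uint64_t dispatched;       /* bytes accounted against the current slice */
    uint64_t slice_quota;      /* bytes allowed per slice */
    uint64_t slice_ns;         /* slice length, e.g. 100 ms */
} RateLimit;

static int64_t ratelimit_calculate_delay(RateLimit *limit, uint64_t n)
{
    int64_t now = qemu_get_clock_ns(rt_clock);

    if (now > limit->next_slice_time) {
        /* A new slice has begun: reset the accounting. */
        limit->next_slice_time = now + limit->slice_ns;
        limit->dispatched = 0;
    }
    limit->dispatched += n;
    if (limit->dispatched > limit->slice_quota) {
        /* Over budget: tell the caller to sleep until the next slice. */
        return limit->next_slice_time - now;
    }
    return 0;
}

stream_run() would then pass the returned delay to
co_sleep_ns(rt_clock, delay) instead of the unconditional
co_sleep_ns(rt_clock, 0); a zero delay still yields, so the
qemu_aio_flush() requirement in the comment above is preserved.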
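
And on the stream- vs. user-initiated copy-on-read question above:
one way to keep the job's disable from clobbering a user-initiated
enable is to make the flag a counter rather than a bool. A rough
sketch, where the counter field on BlockDriverState is an assumption
for illustration:

void bdrv_enable_copy_on_read(BlockDriverState *bs)
{
    /* Each enabler takes a reference... */
    bs->copy_on_read++;
}

void bdrv_disable_copy_on_read(BlockDriverState *bs)
{
    /* ...and releases only its own reference, so the stream job's
     * disable cannot clear a user-initiated enable. */
    assert(bs->copy_on_read > 0);
    bs->copy_on_read--;
}

Readers would then test bs->copy_on_read > 0 instead of a boolean;
the same pattern would work for zero detection.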