From mboxrd@z Thu Jan 1 00:00:00 1970
From: Marcelo Tosatti
Subject: Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.
Date: Tue, 30 Nov 2010 08:25:38 -0200
Message-ID: <20101130102538.GA20921@amt.cnet>
References: <1290665220-26478-1-git-send-email-tamura.yoshiaki@lab.ntt.co.jp>
 <1290665220-26478-10-git-send-email-tamura.yoshiaki@lab.ntt.co.jp>
 <20101130011914.GA9015@amt.cnet>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: aliguori@us.ibm.com, ananth@in.ibm.com, kvm@vger.kernel.org,
 ohmura.kei@lab.ntt.co.jp, dlaor@redhat.com, qemu-devel@nongnu.org,
 vatsa@linux.vnet.ibm.com, avi@redhat.com, psuriset@linux.vnet.ibm.com,
 stefanha@linux.vnet.ibm.com
To: Yoshiaki Tamura
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:31686 "EHLO mx1.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1755010Ab0K3K0t (ORCPT ); Tue, 30 Nov 2010 05:26:49 -0500
Content-Disposition: inline
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

On Tue, Nov 30, 2010 at 06:28:55PM +0900, Yoshiaki Tamura wrote:
> 2010/11/30 Marcelo Tosatti :
> > On Thu, Nov 25, 2010 at 03:06:48PM +0900, Yoshiaki Tamura wrote:
> >> event-tap controls when to start FT transaction, and provides proxy
> >> functions to called from net/block devices.  While FT transaction, it
> >> queues up net/block requests, and flush them when the transaction gets
> >> completed.
> >>
> >> Signed-off-by: Yoshiaki Tamura
> >> Signed-off-by: OHMURA Kei
> >
> >> +static void event_tap_alloc_blk_req(EventTapBlkReq *blk_req,
> >> +                                    BlockDriverState *bs, BlockRequest *reqs,
> >> +                                    int num_reqs, BlockDriverCompletionFunc *cb,
> >> +                                    void *opaque, bool is_multiwrite)
> >> +{
> >> +    int i;
> >> +
> >> +    blk_req->num_reqs = num_reqs;
> >> +    blk_req->num_cbs = num_reqs;
> >> +    blk_req->device_name = qemu_strdup(bs->device_name);
> >> +    blk_req->is_multiwrite = is_multiwrite;
> >> +
> >> +    for (i = 0; i < num_reqs; i++) {
> >> +        blk_req->reqs[i].sector = reqs[i].sector;
> >> +        blk_req->reqs[i].nb_sectors = reqs[i].nb_sectors;
> >> +        blk_req->reqs[i].qiov = reqs[i].qiov;
> >> +        blk_req->reqs[i].cb = cb;
> >> +        blk_req->reqs[i].opaque = opaque;
> >> +        blk_req->cb[i] = reqs[i].cb;
> >> +        blk_req->opaque[i] = reqs[i].opaque;
> >> +    }
> >> +}
> >
> > bdrv_aio_flush should also be logged, so that guest initiated flush is
> > respected on replay.
>
> In the current implementation w/o flush logging, there might be
> order inversion after replay?
>
> Yoshi

Yes, since a vcpu is allowed to continue after synchronization is
scheduled via a bh.

For virtio-blk, for example:

1) bdrv_aio_write, event queued.
2) bdrv_aio_flush
3) bdrv_aio_write, event queued.

On replay, there is no flush between the two writes.

Why can't synchronization be done from event-tap itself, synchronously,
to avoid this kind of problem?

The way you hook synchronization into savevm seems unclean. Perhaps
better separation between standard savevm path and FT savevm would make
it cleaner.
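
To make the ordering problem concrete, here is a rough standalone model
of the write/flush/write sequence. It is only a sketch, not event-tap
code: EventLog, guest_write, guest_flush and
replay_has_flush_between are hypothetical names made up for this
illustration, and the log is a plain fixed-size array rather than the
real request queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event kinds, not the real event-tap types. */
typedef enum { EV_WRITE, EV_FLUSH } EventKind;

typedef struct {
    EventKind kind;
    long sector;            /* only meaningful for EV_WRITE */
} Event;

#define EVENT_LOG_MAX 16

typedef struct {
    Event ev[EVENT_LOG_MAX];
    size_t len;
    bool record_flush;      /* false models a log that ignores flushes */
} EventLog;

static void event_log_init(EventLog *elog, bool record_flush)
{
    elog->len = 0;
    elog->record_flush = record_flush;
}

static void event_log_append(EventLog *elog, EventKind kind, long sector)
{
    assert(elog->len < EVENT_LOG_MAX);
    elog->ev[elog->len].kind = kind;
    elog->ev[elog->len].sector = sector;
    elog->len++;
}

/* Guest issues a write: always queued. */
static void guest_write(EventLog *elog, long sector)
{
    event_log_append(elog, EV_WRITE, sector);
}

/* Guest issues a flush: queued only if the log records flushes. */
static void guest_flush(EventLog *elog)
{
    if (elog->record_flush) {
        event_log_append(elog, EV_FLUSH, 0);
    }
}

/* True if a replay of the log sees a flush after the write to sector
 * `first` and before the write to sector `second`. */
static bool replay_has_flush_between(const EventLog *elog,
                                     long first, long second)
{
    bool seen_first = false;
    bool seen_flush = false;
    size_t i;

    for (i = 0; i < elog->len; i++) {
        const Event *e = &elog->ev[i];

        if (e->kind == EV_FLUSH) {
            if (seen_first) {
                seen_flush = true;
            }
        } else if (e->sector == first) {
            seen_first = true;
        } else if (e->sector == second) {
            return seen_first && seen_flush;
        }
    }
    return false;
}

/* The virtio-blk sequence from above: write, flush, write. */
static bool sequence_keeps_barrier(bool record_flush)
{
    EventLog elog;

    event_log_init(&elog, record_flush);
    guest_write(&elog, 100);
    guest_flush(&elog);
    guest_write(&elog, 200);
    return replay_has_flush_between(&elog, 100, 200);
}
```

With record_flush set to false the replayed log holds only the two
writes, so sequence_keeps_barrier() returns false; with it set to true
the flush sits between them and the guest's ordering is preserved.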