From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:42237) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aOoIY-0003q9-3S for qemu-devel@nongnu.org; Thu, 28 Jan 2016 10:15:51 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aOoIT-0006ml-VY for qemu-devel@nongnu.org; Thu, 28 Jan 2016 10:15:50 -0500 Received: from mx1.redhat.com ([209.132.183.28]:47962) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aOoIT-0006mb-OU for qemu-devel@nongnu.org; Thu, 28 Jan 2016 10:15:45 -0500 Date: Thu, 28 Jan 2016 15:15:43 +0000 From: Stefan Hajnoczi Message-ID: <20160128151543.GH9825@stefanha-x1.localdomain> References: <1452676712-24239-1-git-send-email-xiecl.fnst@cn.fujitsu.com> <1452676712-24239-8-git-send-email-xiecl.fnst@cn.fujitsu.com> <20160127144644.GL26163@stefanha-x1.localdomain> <56A96B34.9090906@cn.fujitsu.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="kHRd/tpU31Zn62xO" Content-Disposition: inline In-Reply-To: <56A96B34.9090906@cn.fujitsu.com> Subject: Re: [Qemu-devel] [PATCH v14 7/8] Implement new driver for block replication List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Wen Congyang Cc: Kevin Wolf , Changlong Xie , Fam Zheng , zhanghailiang , fnstml-hwcolo@cn.fujitsu.com, qemu devel , Max Reitz , Gonglei , Paolo Bonzini --kHRd/tpU31Zn62xO Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Jan 28, 2016 at 09:13:24AM +0800, Wen Congyang wrote: > On 01/27/2016 10:46 PM, Stefan Hajnoczi wrote: > > On Wed, Jan 13, 2016 at 05:18:31PM +0800, Changlong Xie wrote: > >> +static void secondary_do_checkpoint(BDRVReplicationState *s, Error **= errp) > >> +{ > >> + Error *local_err =3D NULL; > >> + int ret; > >> + > >> + if (!s->secondary_disk->job) { > >> + error_setg(errp, "Backup job is cancelled unexpectedly"); > >> + return; > >> + } > >> + > >> + block_job_do_checkpoint(s->secondary_disk->job, &local_err); > >> + if (local_err) { > >> + error_propagate(errp, local_err); > >> + return; > >> + } > >> + > >> + ret =3D s->active_disk->drv->bdrv_make_empty(s->active_disk); > >=20 > > What happens to in-flight requests to the active and hidden disks? >=20 > we MUST call do_checkpoint() when the vm is stopped. Please document the environment under which the block replication callback functions run. I'm concerned that the bdrv_drain_all() in vm_stop() can take a long time if the disk is slow/failing. bdrv_drain_all() blocks until all in-flight I/O requests have completed. What does the Primary do if the Secondary becomes unresponsive? > >> + switch (s->mode) { > >> + case REPLICATION_MODE_PRIMARY: > >> + break; > >> + case REPLICATION_MODE_SECONDARY: > >> + s->active_disk =3D bs->file->bs; > >> + if (!bs->file->bs->backing) { > >> + error_setg(errp, "Active disk doesn't have backing file"); > >> + return; > >> + } > >> + > >> + s->hidden_disk =3D s->active_disk->backing->bs; > >> + if (!s->hidden_disk->backing) { > >> + error_setg(errp, "Hidden disk doesn't have backing file"); > >> + return; > >> + } > >> + > >> + s->secondary_disk =3D s->hidden_disk->backing->bs; > >> + if (!s->secondary_disk->blk) { > >> + error_setg(errp, "The secondary disk doesn't have block b= ackend"); > >> + return; > >> + } > >=20 > > Kevin: Is code allowed to stash away BlockDriverState pointers for > > convenience or should it keep the BdrvChild pointers instead? In order > > for replication to work as expected, the graph shouldn't change but for > > consistency maybe BdrvChild is best. I asked Kevin about this on IRC and he agreed that BdrvChild should be used instead of holding on to BlockDriverState * pointers. Although these pointers will not change during replication (if the op blockers are set up correctly), it's more consistent and certainly safer to go through BdrvChild. > >> + /* start backup job now */ > >> + error_setg(&s->blocker, > >> + "block device is in use by internal backup job"); > >> + bdrv_op_block_all(s->top_bs, s->blocker); > >> + bdrv_op_unblock(s->top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocke= r); > >> + bdrv_ref(s->hidden_disk); > >=20 > > Why is the explicit reference to hidden_disk (but not secondary_disk or > > active_disk) is necessary? >=20 > IIRC, we should reference the backup target before calling backup_start(), > and we will reference the backup source in backup_start(). I'm not sure why this is necessary since they are part of the backing chain. If it is necessary, please add a comment so it's clear why the reference is being taken. Stefan --kHRd/tpU31Zn62xO Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBAgAGBQJWqjCfAAoJEJykq7OBq3PIgZ0IAJ2nr7fLziiz+vN0stvB4+eu W8a9ANDF542cMxTFQpPkI3xOZY6isnUtjsPe/cdW7VlhGKOT8IEzbZAPH0HBw5aY HvB7DkSFUpzelG2V9pE2fmA49eHvgURTiol5gVxqbHnvmFQjfjIPEqZIpQR7ZCLu 9uin8xDHBpDidHg9Lav9VzzN1MwfEhVrTe83IlAaB0tD0+lJVoWwvv6zyfVFroa+ dS2kn6TGp2FHuVCd+PxepePaswG4oci3nrQa0T8N9Cf0B+htR5Th95rrAgRnBouP yufxLX4EHki4jK4tc34ZQt0zMOoDEwucnXBOF0vWNwJHW3MAO7Hj/axJSo6nn4s= =pFCc -----END PGP SIGNATURE----- --kHRd/tpU31Zn62xO--