Date: Wed, 5 Mar 2014 22:13:13 +0100
From: Benoît Canet
Message-ID: <20140305211313.GA5239@irqsave.net>
References: <1394032700-1642-1-git-send-email-benoit.canet@irqsave.net>
 <1394032700-1642-2-git-send-email-benoit.canet@irqsave.net>
 <53178F14.80402@redhat.com>
In-Reply-To: <53178F14.80402@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 1/2] block: Add node-name and to-replace-node-name arguments to drive-mirror.
To: Eric Blake
Cc: Benoît Canet, kwolf@redhat.com, Fam Zheng, qemu-devel@nongnu.org,
 mreitz@redhat.com, stefanha@redhat.com, pbonzini@redhat.com

On Wednesday 05 Mar 2014 at 13:54:44 (-0700), Eric Blake wrote:
> On 03/05/2014 08:18 AM, Benoît Canet wrote:
> > node-name give a name to the created BDS and register it in the node graph.
>
> s/give/gives/ s/register/registers/
>
> >
> > to-replace-node-name can be used when drive-mirror is called with sync=full.
> >
> > The purpose of these fields is to be able to reconstruct and replace a broken
> > quorum file.
>
> There may be other uses possible from this, but the idea makes sense.
>
> >
> > drive-mirror will bdrv_swap the new BDS named node-name with the one
> > pointed by to-replace-node-name when the mirroring is finished.
> >
> > Signed-off-by: Benoit Canet
> > ---
>
> > @@ -312,6 +313,10 @@ static void coroutine_fn mirror_run(void *opaque)
> >      s->common.len = bdrv_getlength(bs);
> >      if (s->common.len <= 0) {
> >          block_job_completed(&s->common, s->common.len);
> > +        /* Fam's new blocker API should be used here. */
> > +        if (s->to_replace) {
>
> Who is getting merged first? It seems like this should be fixed before
> taking this patch, if Fam's work is indeed closer to inclusion. At any
> rate, the comment seems odd - a year from now, Fam's work won't be new.

I would really like to get this merged first, before 2.0 reaches hard freeze,
since quorum is not very usable in its current state.

This particular comment was here to inform reviewers of the work I plan to do
once Fam's series is merged. I would do the work in 2.1.

>
> > +    BlockDriverState *to_replace;
> > +    /* if a to-replace-node-name was specified use it's bs */
>
> s/it's/its/ - the rule is anywhere that you see "it's", re-read the
> sentence with "it is" and see if it still makes sense; if not, you meant
> "its".

Thanks for the rule; I just used it above :)

>
>
> >
> >  static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
> > +                             BlockDriverState *to_replace,
> >                               int64_t speed, int64_t granularity,
>
> Pre-existing, but as long as you are touching this, you might as well
> fix indentation of the other lines in the same signature.
>
> > @@ -2158,19 +2195,33 @@ void qmp_drive_mirror(const char *device, const char *target,
> >          return;
> >      }
> >
> > +    /* if we are planning to replace a graph node name the code should do a full
> > +     * mirror of the source image
> > +     */
> > +    if (has_to_replace_node_name && sync != MIRROR_SYNC_MODE_FULL) {
> > +        error_setg(errp,
> > +                   "to-replace-node-name can only be used with sync=full");
> > +        return;
> > +    }
>
> I'm not sure I follow this restriction. What's to prevent me from doing
> a shallow mirror coupled with the mode of reusing an existing file that
> already points to a sane backing file, rather than forcing a full sync?
> That is, why not let this command be a fully-generic swap command,
> where the semantics are that as long as my old and new image have the
> same contents from the guest's perspective (or I'm replacing a broken
> file out of a quorum, and the new image has the same contents as the
> quorum majority), then we are just updating qemu to point to a new BDS.
>
> On the other hand, back around the 1.5 timeframe, downstream RHEL tried
> to add a 'drive-reopen' command that did just that - replaced the
> backing file of a guest's disk with an arbitrary other file. But it was
> so powerful and risky that at the time upstream finally added
> 'transaction' support, we decided to go with the simpler
> 'drive-mirror/block-job-complete' sequence as the only supported way to
> cause qemu to associate a different BDS with a guest image. Of course,
> things have advanced since then, so maybe we finally are at a point
> where we want to expose a generic reopen command that can swap out
> arbitrary named nodes without interrupting guest services, but now I'm
> starting to wonder if it should be a new command instead of adding
> optional arguments to the existing drive-mirror.

I chose to hook into drive-mirror because it is supposed to do the swap at
the very moment the two files converge.
I thought it would be harder to implement with a separate command, because new
writes could make the mirror obsolete after drive-mirror completes and before
the swap command is launched.

>
> > +++ b/qapi-schema.json
> > @@ -2140,6 +2140,14 @@
> >  # @format: #optional the format of the new destination, default is to
> >  #          probe if @mode is 'existing', else the format of the source
> >  #
> > +# @new-node-name: #optional the new block driver state node name in the graph
> > +#                 (Since 2.1)
>
> Ah, so you're not trying to get this in before 2.0 freeze - which means
> we have more time to think about the implications.

I remembered after sending the series that 2.0 was not in hard freeze yet and
that we have a small chance of shipping quorum in a usable state.

>
> > +#
> > +# @to-replace-node-name: #optional with sync=full graph node name to be
> > +#                        replaced by the new image when a whole image copy is
> > +#                        done. This can be used to repair broken Quorum files.
> > +#                        (Since 2.1)
>
> This naming feels long, but I'm not sure if I have a better suggestion.
> It looks like you only allow swapping out one quorum file per
> drive-mirror - but what if I have a 3/5 quorum and want to swap out two
> files at once? Also, how does this interact with the 'transaction' command?

I think that we should be able to launch multiple separate drive-mirror
operations.
I don't know how this interacts with the 'transaction' command.

>
> >  ##
> >  { 'command': 'drive-mirror',
> >    'data': { 'device': 'str', 'target': 'str', '*format': 'str',
> > -            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
> > -            '*speed': 'int', '*granularity': 'uint32',
> > -            '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
> > +            '*new-node-name': 'str', '*to-replace-node-name': 'str',
> > +            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode', '*speed': 'int',
> > +            '*granularity': 'uint32', '*buf-size': 'int',
> > +            '*on-source-error': 'BlockdevOnError',
>
> Why the reindent of existing options?
The first modified line exceeded the 80-character limit.

>
> --
> Eric Blake eblake redhat com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
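
To make the intended usage more concrete, here is a minimal client-side sketch
of how a management tool might build the command for repairing a quorum child.
It assumes the argument names proposed in this patch ('new-node-name' and
'to-replace-node-name'), which may still change during review. The helper
name, the device name, the node names and the target path are all hypothetical.
The sync check only mirrors the one added to qmp_drive_mirror() in the hunk
quoted above; it is not QEMU code.

```python
import json


def build_drive_mirror(device, target, sync,
                       new_node_name=None, to_replace_node_name=None):
    """Build a QMP 'drive-mirror' command as a dict (hypothetical helper).

    Uses the optional argument names proposed in this patch and applies the
    same restriction as the patch: to-replace-node-name requires sync=full.
    """
    if to_replace_node_name is not None and sync != "full":
        raise ValueError(
            "to-replace-node-name can only be used with sync=full")

    arguments = {"device": device, "target": target, "sync": sync}
    # Optional arguments are only sent when the caller provides them.
    if new_node_name is not None:
        arguments["new-node-name"] = new_node_name
    if to_replace_node_name is not None:
        arguments["to-replace-node-name"] = to_replace_node_name
    return {"execute": "drive-mirror", "arguments": arguments}


# Hypothetical example: replace a broken quorum child with a fresh full copy.
cmd = build_drive_mirror(
    device="quorum0",                  # assumed device name
    target="/tmp/new-child.qcow2",     # assumed target path
    sync="full",
    new_node_name="new-child",         # assumed node names
    to_replace_node_name="broken-child",
)
print(json.dumps(cmd, indent=2))
```

With these arguments the swap would happen inside the mirror job once source
and target converge, so there is no window for new writes between the end of
the copy and the replacement of the broken node.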