From: Benoît Canet
Date: Wed, 5 Mar 2014 22:28:56 +0100
To: Benoît Canet
Cc: kwolf@redhat.com, Fam Zheng, qemu-devel@nongnu.org, mreitz@redhat.com,
    stefanha@redhat.com, pbonzini@redhat.com
Subject: Re: [Qemu-devel] [PATCH 1/2] block: Add node-name and to-replace-node-name arguments to drive-mirror.
Message-ID: <20140305212856.GA5450@irqsave.net>
In-Reply-To: <20140305211313.GA5239@irqsave.net>
References: <1394032700-1642-1-git-send-email-benoit.canet@irqsave.net>
 <1394032700-1642-2-git-send-email-benoit.canet@irqsave.net>
 <53178F14.80402@redhat.com> <20140305211313.GA5239@irqsave.net>

On Wednesday 05 Mar 2014 at 22:13:13 (+0100), Benoît Canet wrote:
> On Wednesday 05 Mar 2014 at 13:54:44 (-0700), Eric Blake wrote:
> > On 03/05/2014 08:18 AM, Benoît Canet wrote:
> > > node-name give a name to the created BDS and register it in the node graph.
> >
> > s/give/gives/ s/register/registers/
> >
> > >
> > > to-replace-node-name can be used when drive-mirror is called with sync=full.
> > >
> > > The purpose of these fields is to be able to reconstruct and replace a broken
> > > quorum file.
> >
> > There may be other uses possible from this, but the idea makes sense.
> >
> > >
> > > drive-mirror will bdrv_swap the new BDS named node-name with the one
> > > pointed by to-replace-node-name when the mirroring is finished.
> > >
> > > Signed-off-by: Benoit Canet
> > > ---
> >
> > > @@ -312,6 +313,10 @@ static void coroutine_fn mirror_run(void *opaque)
> > >      s->common.len = bdrv_getlength(bs);
> > >      if (s->common.len <= 0) {
> > >          block_job_completed(&s->common, s->common.len);
> > > +        /* Fam's new blocker API should be used here. */
> > > +        if (s->to_replace) {
> >
> > Who is getting merged first? It seems like this should be fixed before
> > taking this patch, if Fam's work is indeed closer to inclusion. At any
> > rate, the comment seems odd - a year from now, Fam's work won't be new.
>
> I would really like to get this merged first, before 2.0 reaches hard freeze,
> since quorum is not very usable in its current state.
>
> This particular comment was here to inform reviewers of the work I plan to do
> once Fam's series is merged.
>
> I would do that work in 2.1.
>
> >
> > > +    BlockDriverState *to_replace;
> > > +    /* if a to-replace-node-name was specified use it's bs */
> >
> > s/it's/its/ - the rule is anywhere that you see "it's", re-read the
> > sentence with "it is" and see if it still makes sense; if not, you meant
> > "its".
>
> Thanks for the rule; I just used it above :)
>
> >
> >
> > >
> > >  static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
> > > +                             BlockDriverState *to_replace,
> > >                              int64_t speed, int64_t granularity,
> >
> > Pre-existing, but as long as you are touching this, you might as well
> > fix indentation of the other lines in the same signature.
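A toy C model of the flow the commit message describes may help: the new node is registered under node-name, the node named by to-replace-node-name is looked up, and the two are swapped when mirroring finishes. This is a sketch, not QEMU code. ToyNode, toy_graph_register(), toy_graph_find() and toy_swap() are hypothetical names, and the swap assumes that the effect of bdrv_swap is to exchange the contents of two BDS structures, so that a parent (such as a quorum) still pointing at the old node now sees the new image.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a BlockDriverState: just a node name and a file. */
typedef struct ToyNode {
    char node_name[32];
    char filename[64];
} ToyNode;

/* Hypothetical stand-in for the node graph: a small registry of nodes. */
#define TOY_GRAPH_MAX 8
static ToyNode *toy_graph[TOY_GRAPH_MAX];
static size_t toy_graph_len;

/* Register a node under its name, as node-name is meant to do. */
static int toy_graph_register(ToyNode *node)
{
    if (toy_graph_len == TOY_GRAPH_MAX || node->node_name[0] == '\0') {
        return -1;
    }
    toy_graph[toy_graph_len++] = node;
    return 0;
}

/* Resolve a name such as the one passed in to-replace-node-name. */
static ToyNode *toy_graph_find(const char *name)
{
    for (size_t i = 0; i < toy_graph_len; i++) {
        if (strcmp(toy_graph[i]->node_name, name) == 0) {
            return toy_graph[i];
        }
    }
    return NULL;
}

/* Exchange the contents of two nodes in place. Pointers held by parents
 * keep the same address but now reach the other node's data. */
static void toy_swap(ToyNode *new_node, ToyNode *old_node)
{
    assert(new_node != NULL && old_node != NULL);
    ToyNode tmp = *new_node;
    *new_node = *old_node;
    *old_node = tmp;
}
```

For example, with a quorum child registered as "child1" and the mirror target as "new0", calling toy_swap(toy_graph_find("new0"), toy_graph_find("child1")) leaves the quorum's existing child pointer reading from the new image, without the parent having to be updated.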
> >
> > > @@ -2158,19 +2195,33 @@ void qmp_drive_mirror(const char *device, const char *target,
> > >          return;
> > >      }
> > >
> > > +    /* if we are planning to replace a graph node name the code should do a full
> > > +     * mirror of the source image
> > > +     */
> > > +    if (has_to_replace_node_name && sync != MIRROR_SYNC_MODE_FULL) {
> > > +        error_setg(errp,
> > > +                   "to-replace-node-name can only be used with sync=full");
> > > +        return;
> > > +    }
> >
> > I'm not sure I follow this restriction. What's to prevent me from doing
> > a shallow mirror coupled with the mode of reusing an existing file that
> > already points to a sane backing file, rather than forcing a full sync?
> > That is, why not let this command be a fully-generic swap command,
> > where the semantics are that as long as my old and new image have the
> > same contents from the guest's perspective (or I'm replacing a broken
> > file out of a quorum, and the new image has the same contents as the
> > quorum majority), then we are just updating qemu to point to a new BDS.
> >
> > On the other hand, back around the 1.5 timeframe, downstream RHEL tried
> > to add a 'drive-reopen' command that did just that - replaced the
> > backing file of a guest's disk with an arbitrary other file. But it was
> > so powerful and risky that at the time upstream finally added
> > 'transaction' support, we decided to go with the simpler
> > 'drive-mirror/block-job-complete' sequence as the only supported way to
> > cause qemu to associate a different BDS with a guest image. Of course,
> > things have advanced since then, so maybe we finally are at a point
> > where we want to expose a generic reopen command that can swap out
> > arbitrary named nodes without interrupting guest services, but now I'm
> > starting to wonder if it should be a new command instead of adding
> > optional arguments to the existing drive-mirror.
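The quoted check, and the relaxation Eric asks about, can be written as a small pure function so the two policies are easy to compare. This is a hedged sketch: ToySyncMode, ToyNewImageMode and toy_check_replace_args() are hypothetical stand-ins, not QEMU's generated QAPI types. The has_to_replace_node_name flag follows the pattern visible in the hunk above, where an optional QAPI argument arrives as a has_ boolean next to its value. The relaxed policy accepts a shallow mirror only onto an existing image, on Eric's assumption (not checked by this code) that its backing file is already sane.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the MirrorSyncMode and NewImageMode enums. */
typedef enum { TOY_SYNC_TOP, TOY_SYNC_FULL, TOY_SYNC_NONE } ToySyncMode;
typedef enum { TOY_MODE_EXISTING, TOY_MODE_ABSOLUTE_PATHS } ToyNewImageMode;

/*
 * Returns NULL when the arguments are acceptable, else an error message.
 * strict == true:  the rule in the patch (replacing a node needs sync=full).
 * strict == false: also accept sync=top onto an existing image, assuming
 *                  the caller has already set up a sane backing file.
 */
static const char *toy_check_replace_args(bool has_to_replace_node_name,
                                          ToySyncMode sync,
                                          ToyNewImageMode mode,
                                          bool strict)
{
    if (!has_to_replace_node_name || sync == TOY_SYNC_FULL) {
        return NULL; /* nothing to replace, or a full copy: always fine */
    }
    if (strict) {
        return "to-replace-node-name can only be used with sync=full";
    }
    if (sync == TOY_SYNC_TOP && mode == TOY_MODE_EXISTING) {
        return NULL; /* shallow copy onto a pre-built image */
    }
    return "to-replace-node-name needs sync=full, or sync=top with mode=existing";
}
```

Keeping the policy in one side-effect-free function means that relaxing the restriction later would only change this check, not the job code that performs the swap.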
>
> I chose to hook into drive-mirror because it is supposed to do the swap at the
> very moment the two files converge.
>
> I thought it would be harder to implement with a separate command because new
> writes could obsolete the mirror after drive-mirror completes and before the
> swap command is launched.
>
> >
> > > +++ b/qapi-schema.json
> > > @@ -2140,6 +2140,14 @@
> > >  # @format: #optional the format of the new destination, default is to
> > >  #          probe if @mode is 'existing', else the format of the source
> > >  #
> > > +# @new-node-name: #optional the new block driver state node name in the graph
> > > +#                 (Since 2.1)
> >
> > Ah, so you're not trying to get this in before 2.0 freeze - which means
> > we have more time to think about the implications.
>
> I remembered after sending the series that 2.0 was not in hard freeze yet and
> that we have a small chance of shipping quorum in a usable state.
>
> >
> > > +#
> > > +# @to-replace-node-name: #optional with sync=full graph node name to be
> > > +#                        replaced by the new image when a whole image copy is
> > > +#                        done. This can be used to repair broken Quorum files.
> > > +#                        (Since 2.1)
> >
> > This naming feels long, but I'm not sure if I have a better suggestion.
> > It looks like you only allow swapping out one quorum file per
> > drive-mirror - but what if I have a 3/5 quorum and want to swap out two
> > files at once? Also, how does this interact with the 'transaction' command?
>
> I think that we should be able to launch multiple separate drive-mirror
> operations. I don't know about the transaction.

In fact, no. The first drive-mirror will set the quorum device as "in use",
so no parallelisation is possible.
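Benoît's two points, that the swap should happen at the exact moment the images converge and that a running job marks the quorum device as in use, can be illustrated with a toy model. Everything in it is hypothetical (ToyDisk, ToyJob and the toy_* functions are not QEMU APIs), and it deliberately reduces the separate-command design to a job that stops mirroring as soon as it converges: while the job is active, guest writes to the source are mirrored to the target, so a pivot done inside the job loses nothing, whereas a write landing between the end of the job and a later swap reaches only the old image.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_BLOCKS 4

/* Hypothetical disk image with an ownership flag. */
typedef struct ToyDisk {
    int blocks[TOY_BLOCKS];
    bool in_use; /* set while a mirror job owns this device */
} ToyDisk;

/* Hypothetical mirror job. */
typedef struct ToyJob {
    ToyDisk *source;
    ToyDisk *target;
    bool active; /* while true, writes to the source are mirrored */
} ToyJob;

/* Start a job; refuse if another job already owns the source. */
static bool toy_job_start(ToyJob *job, ToyDisk *source, ToyDisk *target)
{
    if (source->in_use) {
        return false; /* no parallel mirrors of the same device */
    }
    source->in_use = true;
    job->source = source;
    job->target = target;
    job->active = true;
    /* Bulk copy: after this the two images have converged. */
    memcpy(target->blocks, source->blocks, sizeof(source->blocks));
    return true;
}

/* Stop mirroring and release the source. */
static void toy_job_finish(ToyJob *job)
{
    job->active = false;
    job->source->in_use = false;
}

/* A guest write to whichever disk the guest currently uses. */
static void toy_guest_write(const ToyJob *job, ToyDisk *disk,
                            int block, int value)
{
    disk->blocks[block] = value;
    if (job->active && disk == job->source) {
        job->target->blocks[block] = value;
    }
}

/*
 * Run one scenario and return what the guest reads from block 2 afterwards.
 * swap_in_job == true:  pivot to the target while the images are in sync.
 * swap_in_job == false: the job ends first, a guest write happens, and a
 *                       separate step swaps later.
 */
static int toy_scenario(bool swap_in_job)
{
    ToyDisk source = { {1, 2, 3, 4}, false };
    ToyDisk target = { {0, 0, 0, 0}, false };
    ToyDisk *guest_disk = &source;
    ToyJob job;

    bool started = toy_job_start(&job, &source, &target);
    assert(started);
    (void)started;

    if (swap_in_job) {
        guest_disk = &target; /* pivot at the moment of convergence */
        toy_job_finish(&job);
        toy_guest_write(&job, guest_disk, 2, 42);
    } else {
        toy_job_finish(&job);                     /* mirroring stops here */
        toy_guest_write(&job, guest_disk, 2, 42); /* old image only */
        guest_disk = &target;                     /* the later, separate swap */
    }
    return guest_disk->blocks[2];
}
```

In this model toy_scenario(true) reads back 42 while toy_scenario(false) reads back the stale 3, and a second toy_job_start() on the same device fails until the first job has finished.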
Best regards

Benoît

>
> >
> > >  ##
> > >  { 'command': 'drive-mirror',
> > >    'data': { 'device': 'str', 'target': 'str', '*format': 'str',
> > > -            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
> > > -            '*speed': 'int', '*granularity': 'uint32',
> > > -            '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
> > > +            '*new-node-name': 'str', '*to-replace-node-name': 'str',
> > > +            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode', '*speed': 'int',
> > > +            '*granularity': 'uint32', '*buf-size': 'int',
> > > +            '*on-source-error': 'BlockdevOnError',
> >
> > Why the reindent of existing options?
>
> The first modified line exceeded the 80-character limit.
>
> >
> > -- 
> > Eric Blake   eblake redhat com    +1-919-301-3266
> > Libvirt virtualization library http://libvirt.org