From: "Benoît Canet" <benoit.canet@irqsave.net>
To: Eric Blake <eblake@redhat.com>
Cc: "Benoît Canet" <benoit.canet@irqsave.net>,
kwolf@redhat.com, "Fam Zheng" <famz@redhat.com>,
qemu-devel@nongnu.org, mreitz@redhat.com, stefanha@redhat.com,
pbonzini@redhat.com
Subject: Re: [Qemu-devel] [PATCH 1/2] block: Add node-name and to-replace-node-name arguments to drive-mirror.
Date: Wed, 5 Mar 2014 22:13:13 +0100 [thread overview]
Message-ID: <20140305211313.GA5239@irqsave.net> (raw)
In-Reply-To: <53178F14.80402@redhat.com>
The Wednesday 05 Mar 2014 à 13:54:44 (-0700), Eric Blake wrote :
> On 03/05/2014 08:18 AM, Benoît Canet wrote:
> > node-name give a name to the created BDS and register it in the node graph.
>
> s/give/gives/ s/register/registers/
>
> >
> > to-replace-node-name can be used when drive-mirror is called with sync=full.
> >
> > The purpose of these fields is to be able to reconstruct and replace a broken
> > quorum file.
>
> There may be other uses possible from this, but the idea makes sense.
>
> >
> > drive-mirror will bdrv_swap the new BDS named node-name with the one
> > pointed by to-replace-node-name when the mirroring is finished.
> >
> > Signed-off-by: Benoit Canet <benoit@irqsave.net>
> > ---
>
> > @@ -312,6 +313,10 @@ static void coroutine_fn mirror_run(void *opaque)
> > s->common.len = bdrv_getlength(bs);
> > if (s->common.len <= 0) {
> > block_job_completed(&s->common, s->common.len);
> > + /* Fam's new blocker API should be used here. */
> > + if (s->to_replace) {
>
> Who is getting merged first? It seems like this should be fixed before
> taking this patch, if Fam's work is indeed closer to inclusion. At any
> rate, the comment seems odd - a year from now, Fam's work won't be new.
I would really like to get this merged first before 2.0 reach hard freeze
since quorum is not very usable in its current state.
This particular comment was here to inform reviewer of the work I plan to do
once Fam's series is merged.
I would do the work in 2.1.
>
> > + BlockDriverState *to_replace;
> > + /* if a to-replace-node-name was specified use it's bs */
>
> s/it's/its/ - the rule is anywhere that you see "it's", re-read the
> sentence with "it is" and see if it still makes sense; if not, you meant
> "its".
Thanks for the rule I just used it above :)
>
>
> >
> > static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
> > + BlockDriverState *to_replace,
> > int64_t speed, int64_t granularity,
>
> Pre-existing, but as long as you are touching this, you might as well
> fix indentation of the other lines in the same signature.
>
> > @@ -2158,19 +2195,33 @@ void qmp_drive_mirror(const char *device, const char *target,
> > return;
> > }
> >
> > + /* if we are planning to replace a graph node name the code should do a full
> > + * mirror of the source image
> > + */
> > + if (has_to_replace_node_name && sync != MIRROR_SYNC_MODE_FULL) {
> > + error_setg(errp,
> > + "to-replace-node-name can only be used with sync=full");
> > + return;
> > + }
>
> I'm not sure I follow this restriction. What's to prevent me from doing
> a shallow mirror coupled with the mode of reusing an existing file that
> already points to a sane backing file, rather than forcing a full sync?
> That is, why not let this command be a fully-generic swap command,
> where the semantics are that as long as my old and new image have the
> same contents from the guest's perspective (or I'm replacing a broken
> file out of a quorum, and the new image has the same contents as the
> quorum majority), then we are just updating qemu to point to a new BDS.
>
> On the other hand, back around the 1.5 timeframe, downstream RHEL tried
> to add a 'drive-reopen' command that did just that - replaced the
> backing file of a guest's disk with an arbitrary other file. But it was
> so powerful and risky that at the time upstream finally added
> 'transaction' support, we decided to go with the simpler
> 'drive-mirror/block-job-complete' sequence as the only supported way to
> cause qemu to associate a different BDS with a guest image. Of course,
> things have advanced since then, so maybe we finally are at a point
> where we want to expose a generic reopen command that can swap out
> arbitrary named nodes without interrupting guest services, but now I'm
> starting to wonder if it should be a new command instead of adding
> optional arguments to the existing drive-mirror.
I choose to hook into drive-mirror because it is supposed to do the swap at the
very moment the two files converge.
I though it would be harder to implement with a separate command because new
writes could obsolete the mirror after drive-mirror complete and before the
swap command is launched.
>
> > +++ b/qapi-schema.json
> > @@ -2140,6 +2140,14 @@
> > # @format: #optional the format of the new destination, default is to
> > # probe if @mode is 'existing', else the format of the source
> > #
> > +# @new-node-name: #optional the new block driver state node name in the graph
> > +# (Since 2.1)
>
> Ah, so you're not trying to get this in before 2.0 freeze - which means
> we have more time to think about the implications.
I remembered after sending the series that 2.0 was not in hard freeze yet and
that we have a small chance of shipping quorum in an usable state.
>
> > +#
> > +# @to-replace-node-name: #optional with sync=full graph node name to be
> > +# replaced by the new image when a whole image copy is
> > +# done. This can be used to repair broken Quorum files.
> > +# (Since 2.1)
>
> This naming feels long, but I'm not sure if I have a better suggestion.
> It looks like you only allow swapping out one quorum file per
> drive-mirror - but what if I have a 3/5 quorum and want to swap out two
> files at once? Also, how does this interact with the 'transaction' command?
I think that we should be able to launch multiple separate drive-mirror
operation. I don't know about the transaction.
>
> > ##
> > { 'command': 'drive-mirror',
> > 'data': { 'device': 'str', 'target': 'str', '*format': 'str',
> > - 'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
> > - '*speed': 'int', '*granularity': 'uint32',
> > - '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
> > + '*new-node-name': 'str', '*to-replace-node-name': 'str',
> > + 'sync': 'MirrorSyncMode', '*mode': 'NewImageMode', '*speed': 'int',
> > + '*granularity': 'uint32', '*buf-size': 'int',
> > + '*on-source-error': 'BlockdevOnError',
>
> Why the reindent of existing options?
The first modified line was exceeding the 80 characters limit.
>
> --
> Eric Blake eblake redhat com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
next prev parent reply other threads:[~2014-03-05 21:13 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-03-05 15:18 [Qemu-devel] [PATCH 0/2] Allow to repair broken quorum files Benoît Canet
2014-03-05 15:18 ` [Qemu-devel] [PATCH 1/2] block: Add node-name and to-replace-node-name arguments to drive-mirror Benoît Canet
2014-03-05 20:54 ` Eric Blake
2014-03-05 21:13 ` Benoît Canet [this message]
2014-03-05 21:28 ` Benoît Canet
2014-03-07 22:28 ` Benoît Canet
2014-03-05 15:18 ` [Qemu-devel] [PATCH 2/2] qemu-iotests: Add TestRepairQuorum to 041 to test drive-mirror node-name mode Benoît Canet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140305211313.GA5239@irqsave.net \
--to=benoit.canet@irqsave.net \
--cc=eblake@redhat.com \
--cc=famz@redhat.com \
--cc=kwolf@redhat.com \
--cc=mreitz@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).