qemu-devel.nongnu.org archive mirror
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Zhang Chen <zhangckid@gmail.com>
Cc: Markus Armbruster <armbru@redhat.com>,
	zhanghailiang <zhang.zhanghailiang@huawei.com>,
	Li Zhijian <lizhijian@cn.fujitsu.com>,
	Juan Quintela <quintela@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	qemu-devel@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH V8 11/17] qapi: Add new command to query colo status
Date: Wed, 13 Jun 2018 17:50:32 +0100
Message-ID: <20180613165032.GO2676@work-vm>
In-Reply-To: <CAK3tnv+5i-C5vhkA6BNa=PWVgJY8ikonELB7EQ=nNz2LsbtmMA@mail.gmail.com>

* Zhang Chen (zhangckid@gmail.com) wrote:
> On Mon, Jun 11, 2018 at 2:48 PM, Markus Armbruster <armbru@redhat.com> wrote:
> 
> > Zhang Chen <zhangckid@gmail.com> writes:
> >
> > > On Thu, Jun 7, 2018 at 8:59 PM, Markus Armbruster <armbru@redhat.com> wrote:
> > >
> > >> Zhang Chen <zhangckid@gmail.com> writes:
> > >>
> > >> > Libvirt or other high-level software can use this command to query
> > >> > COLO status.
> > >> > You can test this command like this:
> > >> > {'execute':'query-colo-status'}
> > >> >
> > >> > Signed-off-by: Zhang Chen <zhangckid@gmail.com>
> > >> > ---
> > >> >  migration/colo.c    | 39 +++++++++++++++++++++++++++++++++++++++
> > >> >  qapi/migration.json | 34 ++++++++++++++++++++++++++++++++++
> > >> >  2 files changed, 73 insertions(+)
> > >> >
> > >> > diff --git a/migration/colo.c b/migration/colo.c
> > >> > index bedb677788..8c6b8e9a4e 100644
> > >> > --- a/migration/colo.c
> > >> > +++ b/migration/colo.c
> > >> > @@ -29,6 +29,7 @@
> > >> >  #include "net/colo.h"
> > >> >  #include "block/block.h"
> > >> >  #include "qapi/qapi-events-migration.h"
> > >> > +#include "qapi/qmp/qerror.h"
> > >> >
> > >> >  static bool vmstate_loading;
> > >> >  static Notifier packets_compare_notifier;
> > >> > @@ -237,6 +238,44 @@ void qmp_xen_colo_do_checkpoint(Error **errp)
> > >> >  #endif
> > >> >  }
> > >> >
> > >> > +COLOStatus *qmp_query_colo_status(Error **errp)
> > >> > +{
> > >> > +    int state;
> > >> > +    COLOStatus *s = g_new0(COLOStatus, 1);
> > >> > +
> > >> > +    s->mode = get_colo_mode();
> > >> > +
> > >> > +    switch (s->mode) {
> > >> > +    case COLO_MODE_UNKNOWN:
> > >> > +        error_setg(errp, "COLO is disabled");
> > >> > +        state = MIGRATION_STATUS_NONE;
> > >> > +        break;
> > >> > +    case COLO_MODE_PRIMARY:
> > >> > +        state = migrate_get_current()->state;
> > >> > +        break;
> > >> > +    case COLO_MODE_SECONDARY:
> > >> > +        state = migration_incoming_get_current()->state;
> > >> > +        break;
> > >> > +    default:
> > >> > +        abort();
> > >> > +    }
> > >> > +
> > >> > +    s->colo_running = state == MIGRATION_STATUS_COLO;
> > >> > +
> > >> > +    switch (failover_get_state()) {
> > >> > +    case FAILOVER_STATUS_NONE:
> > >> > +        s->reason = COLO_EXIT_REASON_NONE;
> > >> > +        break;
> > >> > +    case FAILOVER_STATUS_REQUIRE:
> > >> > +        s->reason = COLO_EXIT_REASON_REQUEST;
> > >> > +        break;
> > >> > +    default:
> > >> > +        s->reason = COLO_EXIT_REASON_ERROR;
> > >> > +    }
> > >> > +
> > >> > +    return s;
> > >> > +}
> > >> > +
> > >> >  static void colo_send_message(QEMUFile *f, COLOMessage msg,
> > >> >                                Error **errp)
> > >> >  {
> > >> > diff --git a/qapi/migration.json b/qapi/migration.json
> > >> > index 93136ce5a0..356a370949 100644
> > >> > --- a/qapi/migration.json
> > >> > +++ b/qapi/migration.json
> > >> > @@ -1231,6 +1231,40 @@
> > >> >  ##
> > >> >  { 'command': 'xen-colo-do-checkpoint' }
> > >> >
> > >> > +##
> > >> > +# @COLOStatus:
> > >> > +#
> > >> > +# The result format for 'query-colo-status'.
> > >> > +#
> > >> > # @mode: COLO running mode. If COLO is running, this field will return
> > >> > #        'primary' or 'secondary'.
> > >> > +#
> > >> > +# @colo-running: true if COLO is running.
> > >> > +#
> > >> > +# @reason: describes the reason for the COLO exit.
> > >>
> > >> What's the value of @reason before a "COLO exit"?
> > >>
> > >
> > > Before a "COLO exit", we just return 'none' in this field.
> >
> > Please add that to the documentation.
> >
> 
> OK.
> 
> 
> >
> > Please excuse my ignorance on COLO...  I'm still not sure I fully
> > understand how the three members are related, or even how the COLO state
> > machine works and how it's related to / embedded in RunState.  I searched
> > docs/ for a state diagram, but couldn't find one.
> >
> > According to runstate_transitions_def[], the part of the RunState state
> > machine that's directly connected to state "colo" looks like this:
> >
> >     inmigrate  -+
> >                 |
> >     paused  ----+
> >                 |
> >     migrate  ---+->  colo  <------>  running
> >                 |
> >     suspended  -+
> >                 |
> >     watchdog  --+
> >
> > For each of the seven state transitions: how is the state transition
> > triggered (e.g. by QMP command, spontaneously when a certain condition
> > is detected, ...), and what events (if any) are emitted then?
> >
> >
> When you start COLO, the VM always runs in "MIGRATION_STATUS_COLO" until
> failover occurs.
> In the flow diagram, you can think of COLO as always running in the migrate
> state, because once in COLO mode we control the VM state in the COLO code
> itself. For example:
> When we start COLO, it does the first migration as a normal live
> migration. After that we enter the COLO process; at that point COLO
> considers the primary VM state to be the same as the secondary VM state
> (the first checkpoint), so we use vm_start() to start the primary VM
> (unlike normal migration) and the secondary VM.
> From then on, the primary VM and secondary VM run in parallel, and if
> COLO finds that the two VM states are not the same, it triggers a
> checkpoint (like another migration). Finally, if some fault occurs, it
> triggers failover, after which the primary VM may return to normal running
> mode (secondary dead).
> So, if we just look at the primary VM state, it may be outside the RunState
> state machine, or it may still be in the migrate state.
> 
> 
> 
> 
> > How is @colo-running related to the run state?
> >
> 
> Not related, as I said above.

Right; this is a different type of 'running' - it might be better to say
'active' rather than running.

  COLO has a pair of VMs in sync with a constant stream of migrations
between them.
The 'mode' is whether it's the source (primary) or destination (secondary) VM.
(Also sometimes written PVM/SVM)

If COLO fails for some reason (e.g. the secondary host fails) then I think
this is saying that 'colo-running' would be false.

Some monitoring tool would be watching this to make sure you
really do have a redundant pair of VMs, and if one of them failed
you'd want to know and alert.
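
As a rough sketch of what such a monitoring tool might do with a decoded
'query-colo-status' reply (not part of the patch; the ColoStatusReply
struct, the ColoAlert levels and colo_classify() are made-up names, and
the strings assume the QAPI enums serialise as 'none'/'request'/'error'):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical alert levels for a monitoring tool; not a QEMU type. */
typedef enum {
    COLO_ALERT_OK,          /* the redundant pair is active */
    COLO_ALERT_NOT_ACTIVE,  /* COLO is not active, no exit reason recorded */
    COLO_ALERT_FAILOVER,    /* COLO exited because failover was requested */
    COLO_ALERT_ERROR        /* COLO exited because of an error */
} ColoAlert;

/*
 * Hypothetical holder for the three members of the reply, already
 * decoded from JSON by the caller.
 */
typedef struct {
    const char *mode;       /* assumed: "unknown", "primary", "secondary" */
    bool colo_running;      /* the 'colo-running' member */
    const char *reason;     /* assumed: "none", "request", "error" */
} ColoStatusReply;

/* Map a reply to an alert level; 'colo-running' takes precedence. */
ColoAlert colo_classify(const ColoStatusReply *r)
{
    assert(r != NULL);

    if (r->colo_running) {
        return COLO_ALERT_OK;
    }
    if (r->reason != NULL && strcmp(r->reason, "request") == 0) {
        return COLO_ALERT_FAILOVER;
    }
    if (r->reason != NULL && strcmp(r->reason, "error") == 0) {
        return COLO_ALERT_ERROR;
    }
    return COLO_ALERT_NOT_ACTIVE;
}
```

A tool polling the command would alert on anything other than
COLO_ALERT_OK; whether a requested failover deserves the same severity
as an error is a policy choice left to the tool.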

Dave

> > Which run states are considered to be "before a COLO exit"?  If "before
> > a COLO exit" doesn't map to run states, the state machine is too coarse
> > to fully describe COLO, and I'd like to see a suitably refined one.
> >
> >
> COLO is just a special case. Is it worth refining the state machine for it?
> CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Any comments?
> 
> 
> 
> > If @colo-running is true, then @mode is either "primary" or "secondary".
> > What are the possible values when @colo-running is false?
> >
> 
> The @mode will be in the "unknown" state.
> 
> 
> Thanks
> Zhang Chen
> 
> 
> 
> >
> > [...]
> >
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
