[Qemu-devel] [RFC/COLO: 0/3] Hybrid mode and parameterisation

* [Qemu-devel] [RFC/COLO:  0/3] Hybrid mode and parameterisation
@ 2015-08-04 19:26 Dr. David Alan Gilbert (git)
  2015-08-04 19:26 ` [Qemu-devel] [RFC/COLO: 1/3] COLO: Hybrid mode Dr. David Alan Gilbert (git)
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-08-04 19:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: arei.gonglei, yanghy, zhang.zhanghailiang

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hi,
  This is an experimental addition to COLO, based off the colo-v1.5-developing
branch.  It's not ready for inclusion.

The first patch adds a 'hybrid mode' where the SVM is sent checkpoints
from the primary but does not run, and is thus much like a normal
checkpoint setup.   This mode is entered dynamically based on the
checkpoint lengths; if the average checkpoint length drops below
the 'colo_passive_limit', it flips into this mode, running
checkpoints each of 'colo_passive_time' ms in length.  After
'colo_passive_count' checkpoints, it runs 5 COLO cycles again
and then decides what to do based on the same limit as before,
thus giving it a chance to return to COLO mode.
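
As a rough illustration of the switching rule described above, here is a
minimal standalone sketch in C.  It is not the patch code: the struct and
function names (hybrid_params, hybrid_state, next_is_passive) are made up
for this example, and the smoothed checkpoint length is assumed to be
maintained by the caller.  Only the 5-cycle wait and the two tunables come
from the description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the tunables named in the cover letter. */
struct hybrid_params {
    unsigned int passive_limit_ms; /* 'colo_passive_limit' */
    unsigned int passive_count;    /* 'colo_passive_count' */
};

/* Hypothetical per-connection state; the caller updates mean_ms/colo_cycles. */
struct hybrid_state {
    unsigned int passive_left; /* passive checkpoints still to run, 0 = COLO */
    unsigned int colo_cycles;  /* COLO checkpoints since the mean was reset */
    double mean_ms;            /* smoothed checkpoint length in ms */
};

/* COLO cycles to observe before the mean is trusted (5 in the description). */
#define COLO_CYCLES_BEFORE_DECISION 5

/* Returns true if the checkpoint about to start should be passive. */
static bool next_is_passive(struct hybrid_state *st,
                            const struct hybrid_params *p)
{
    assert(st && p);

    if (st->passive_left) {
        /* Already passive: count down, then give COLO another chance. */
        st->passive_left--;
        if (st->passive_left) {
            return true;
        }
        /* Back to COLO: forget the old history so it is judged afresh. */
        st->mean_ms = 0.0;
        st->colo_cycles = 0;
        return false;
    }

    if (st->colo_cycles < COLO_CYCLES_BEFORE_DECISION) {
        /* Not enough COLO cycles yet to evaluate the mean. */
        return false;
    }

    if (st->mean_ms < p->passive_limit_ms) {
        /* Checkpoints have been short on average: switch to passive. */
        st->passive_left = p->passive_count;
        return true;
    }

    return false;
}
```

With passive_count set to 3, the call that enters passive mode is followed
by two more passive answers and then a return to COLO, so three passive
checkpoints run in total.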

A simple demo of this is to:
  a) ssh into the guest
  b) Run 'top' on the SVM host
  c) start a heavy CPU load in the guest (e.g. md5sum /dev/zero &)
     You should see QEMU at 100% CPU in the host's top
  d) now run 'top' in the guest
     this causes checkpoint miscomparisons every time top redisplays.
  e) change the top redisplay time to something short, e.g. type
     d0.2  and hit return
  f) After a few seconds it flips into passive mode and you see
     the CPU load on the SVM drop.

(Watching the added trace-events also shows this happening).
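
The delay in step f) comes from the smoothing of checkpoint lengths:
patch 1/3 keeps a weighted mean (0.7 of the old value plus 0.3 of the
newest length) so that one short checkpoint does not trigger the switch.
The sketch below counts how many consecutive short checkpoints are needed
to pull the mean under a limit.  The helper names are hypothetical, and
the numbers in the note after it are only illustrative (a 1000 ms starting
mean and 200 ms checkpoints are assumptions; 400 ms is the default
colo_passive_limit in the patch).

```c
#include <assert.h>

/* Weights as used for the weighted mean in the colo_thread hunk of 1/3. */
#define OLD_WEIGHT 0.7
#define NEW_WEIGHT 0.3

/* One update of the weighted mean with a new checkpoint length. */
static double update_mean(double mean_ms, double sample_ms)
{
    return mean_ms * OLD_WEIGHT + NEW_WEIGHT * sample_ms;
}

/*
 * Number of consecutive checkpoints of length sample_ms needed before
 * the mean falls below limit_ms.  Returns 0 if it is already below.
 */
static unsigned int checkpoints_until_below(double mean_ms,
                                            double sample_ms,
                                            double limit_ms)
{
    unsigned int n = 0;

    /* The mean converges towards sample_ms, so this must be under the limit. */
    assert(sample_ms < limit_ms);

    while (mean_ms >= limit_ms) {
        mean_ms = update_mean(mean_ms, sample_ms);
        n++;
    }
    return n;
}
```

Starting from a mean of 1000 ms, 200 ms checkpoints give 760, 592, 474.4
and 392.08 ms, so the mean first drops below a 400 ms limit on the fourth
checkpoint; a single short checkpoint leaves it well above the limit.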

The other patches make all the COLO parameters changeable via
migrate_set_parameter (I wasn't sure what to call the time
after the miscompare - I called it 'relax time')

(This work has been partially funded by the EU Orbit project:
  see http://www.orbitproject.eu/about/ )

Dave

Dr. David Alan Gilbert (3):
  COLO: Hybrid mode
  Parameterise min/max/relax time
  COLO: Parameterise background RAM transfer limit

 hmp-commands.hx        |  15 -----
 hmp.c                  |  65 +++++++++++++++++--
 hmp.h                  |   1 -
 migration/colo.c       | 170 ++++++++++++++++++++++++++++++++++++-------------
 migration/migration.c  | 139 +++++++++++++++++++++++++++++++++++++++-
 qapi-schema.json       |  56 +++++++++++-----
 qmp-commands.hx        |  31 +++------
 stubs/migration-colo.c |   4 --
 trace-events           |   8 +++
 9 files changed, 378 insertions(+), 111 deletions(-)

-- 
2.4.3


* [Qemu-devel] [RFC/COLO:  1/3] COLO: Hybrid mode
  2015-08-04 19:26 [Qemu-devel] [RFC/COLO: 0/3] Hybrid mode and parameterisation Dr. David Alan Gilbert (git)
@ 2015-08-04 19:26 ` Dr. David Alan Gilbert (git)
  2015-08-05  3:23   ` zhanghailiang
  2015-08-04 19:26 ` [Qemu-devel] [RFC/COLO: 2/3] Parameterise min/max/relax time Dr. David Alan Gilbert (git)
  2015-08-04 19:26 ` [Qemu-devel] [RFC/COLO: 3/3] COLO: Parameterise background RAM transfer limit Dr. David Alan Gilbert (git)
  2 siblings, 1 reply; 6+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-08-04 19:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: arei.gonglei, yanghy, zhang.zhanghailiang

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Automatically switch into a passive checkpoint mode when checkpoints are
repeatedly short.  This saves CPU time on the SVM (since it's not running)
and the network traffic and PVM CPU time for the comparison processing.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hmp.c                 |  26 ++++++++++
 migration/colo.c      | 136 +++++++++++++++++++++++++++++++++++++++++++-------
 migration/migration.c |  65 +++++++++++++++++++++++-
 qapi-schema.json      |  22 ++++++--
 qmp-commands.hx       |   9 ++++
 trace-events          |   8 +++
 6 files changed, 244 insertions(+), 22 deletions(-)

diff --git a/hmp.c b/hmp.c
index f34e2c2..8828756 100644
--- a/hmp.c
+++ b/hmp.c
@@ -289,6 +289,16 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
         monitor_printf(mon, " %s: %" PRId64,
             MigrationParameter_lookup[MIGRATION_PARAMETER_DECOMPRESS_THREADS],
             params->decompress_threads);
+        monitor_printf(mon, " %s: %" PRId64,
+            MigrationParameter_lookup[MIGRATION_PARAMETER_COLO_PASSIVE_COUNT],
+            params->colo_passive_count);
+        monitor_printf(mon, " %s: %" PRId64,
+            MigrationParameter_lookup[MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT],
+            params->colo_passive_limit);
+        monitor_printf(mon, " %s: %" PRId64,
+            MigrationParameter_lookup[MIGRATION_PARAMETER_COLO_PASSIVE_TIME],
+            params->colo_passive_time);
+
         monitor_printf(mon, "\n");
     }
 
@@ -1238,6 +1248,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
     bool has_compress_level = false;
     bool has_compress_threads = false;
     bool has_decompress_threads = false;
+    bool has_colo_passive_count = false;
+    bool has_colo_passive_limit = false;
+    bool has_colo_passive_time = false;
+
     int i;
 
     for (i = 0; i < MIGRATION_PARAMETER_MAX; i++) {
@@ -1252,10 +1266,22 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
             case MIGRATION_PARAMETER_DECOMPRESS_THREADS:
                 has_decompress_threads = true;
                 break;
+            case MIGRATION_PARAMETER_COLO_PASSIVE_COUNT:
+                has_colo_passive_count = true;
+                break;
+            case MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT:
+                has_colo_passive_limit = true;
+                break;
+            case MIGRATION_PARAMETER_COLO_PASSIVE_TIME:
+                has_colo_passive_time = true;
+                break;
             }
             qmp_migrate_set_parameters(has_compress_level, value,
                                        has_compress_threads, value,
                                        has_decompress_threads, value,
+                                       has_colo_passive_count, value,
+                                       has_colo_passive_limit, value,
+                                       has_colo_passive_time, value,
                                        &err);
             break;
         }
diff --git a/migration/colo.c b/migration/colo.c
index d8ec283..37f63f2 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -21,6 +21,7 @@
 #include "net/colo-nic.h"
 #include "qmp-commands.h"
 #include "block/block_int.h"
+#include "trace.h"
 
 /*
 * We should not do checkpoint one after another without any time interval,
@@ -66,6 +67,7 @@ typedef enum COLOCommand {
     *    go forward a lot when this side just receives the sync-point.
     */
     COLO_CHECKPOINT_NEW,
+    COLO_CHECKPOINT_NEW_PASSIVE, /* Simple checkpoint mode, SVM doesn't run */
     COLO_CHECKPOINT_SUSPENDED,
     COLO_CHECKPOINT_SEND,
     COLO_CHECKPOINT_RECEIVED,
@@ -294,7 +296,8 @@ static int colo_ctl_get(QEMUFile *f, uint64_t require)
     return ret;
 }
 
-static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
+static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control,
+                                          bool passive)
 {
     int colo_shutdown, ret;
     size_t size;
@@ -302,7 +305,8 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
     int64_t start_time, end_time, down_time;
     Error *local_err = NULL;
 
-    ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
+    ret = colo_ctl_put(s->file, passive?COLO_CHECKPOINT_NEW_PASSIVE:
+                                        COLO_CHECKPOINT_NEW);
     if (ret < 0) {
         goto out;
     }
@@ -438,6 +442,71 @@ out:
     return ret;
 }
 
+/*
+ * Counter that is reset to 'n' when we enter passive mode and
+ * is decremented once per checkpoint; when it hits zero we flip
+ * back to COLO mode.
+ */
+static unsigned int passive_count;
+
+/*
+ * Weighted average of checkpoint lengths, used to decide on mode.
+ */
+static double colo_checkpoint_time_mean;
+/* Count of checkpoints since we reset colo_checkpoint_time_mean */
+static uint64_t colo_checkpoint_time_count;
+
+/* Decides whether the checkpoint that's about to start should be
+ * a COLO type (with the secondary running and packet comparison) or
+ * a 'passive' type (with the secondary idle and running for fixed time)
+ *
+ * Returns:
+ *   True: 'passive' type checkpoint
+ */
+static bool checkpoint_choice(MigrationState *s)
+{
+    trace_checkpoint_choice(passive_count,
+                            colo_checkpoint_time_count,
+                            colo_checkpoint_time_mean);
+    if (passive_count) {
+        /*
+         * The last checkpoint was passive; we stay in passive
+         * mode for a number of checkpoints before trying colo
+         * again.
+         */
+        passive_count--;
+        if (passive_count) {
+            /* Stay passive */
+            return true;
+        } else {
+            /* Transition back to COLO */
+            trace_checkpoint_choice_to_colo();
+            colo_checkpoint_time_mean = 0.0;
+            colo_checkpoint_time_count = 0;
+            return false;
+        }
+    } else {
+        /* The last checkpoint was COLO */
+        /* Could make that tunable, I'm not particularly worried about
+         * load behaviour for this, startup etc is probably more interesting.
+         */
+        if (colo_checkpoint_time_count < 5) {
+            /* Not done enough COLO cycles to evaluate times yet */
+            return false;
+        }
+        if (colo_checkpoint_time_mean <
+            s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT]) {
+            trace_checkpoint_choice_to_passive(colo_checkpoint_time_mean);
+            /* We've had a few short checkpoints, switch to passive */
+            passive_count = s->parameters[
+                                      MIGRATION_PARAMETER_COLO_PASSIVE_COUNT];
+            return true;
+        }
+        /* Keep going in COLO mode */
+        return false;
+    }
+}
+
 /* should be calculated by bandwidth and max downtime ? */
 #define THRESHOLD_PENDING_SIZE (10 * 1024 * 1024UL)
 
@@ -470,7 +539,7 @@ static void *colo_thread(void *opaque)
 {
     MigrationState *s = opaque;
     QEMUFile *colo_control = NULL;
-    int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+    int64_t current_time = 0, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     int i, ret;
     Error *local_err = NULL;
 
@@ -518,8 +587,12 @@ static void *colo_thread(void *opaque)
     qemu_mutex_unlock_iothread();
     trace_colo_vm_state_change("stop", "run");
 
+    passive_count = 0;
+    colo_checkpoint_time_mean = 0.0;
+    colo_checkpoint_time_count = 0;
     while (s->state == MIGRATION_STATUS_COLO) {
         int proxy_checkpoint_req;
+        unsigned int checkpoint_limit;
 
         if (failover_request_is_active()) {
             error_report("failover request");
@@ -546,13 +619,17 @@ static void *colo_thread(void *opaque)
             goto do_checkpoint;
         }
 
+        checkpoint_limit = passive_count ?
+            s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_TIME] :
+            colo_checkpoint_period;
+
         /*
          * No proxy checkpoint is request, wait for 100ms or
          * transfer some dirty ram page,
          * and then check if we need checkpoint again.
          */
         current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
-        if (current_time - checkpoint_time < colo_checkpoint_period) {
+        if (current_time - checkpoint_time < checkpoint_limit) {
             if (colo_need_live_migrate_ram(s)) {
                 ret = colo_ctl_put(s->file, COLO_RAM_LIVE_MIGRATE);
                 if (ret < 0) {
@@ -572,8 +649,16 @@ static void *colo_thread(void *opaque)
         }
 
 do_checkpoint:
+        /* Update a weighted mean of checkpoint lengths, weighted
+         * so that an occasional short checkpoint doesn't cause a switch
+         * to passive.
+         */
+        colo_checkpoint_time_mean = colo_checkpoint_time_mean * 0.7 +
+                                 0.3 * (current_time - checkpoint_time);
+        colo_checkpoint_time_count++;
         /* start a colo checkpoint */
-        if (colo_do_checkpoint_transaction(s, colo_control)) {
+        if (colo_do_checkpoint_transaction(s, colo_control,
+                                              checkpoint_choice(s))) {
             goto out;
         }
         checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
@@ -638,7 +723,10 @@ void colo_init_checkpointer(MigrationState *s)
 
 /*
  * return:
- * 0: start a checkpoint
+ * COLO_CHECKPOINT_NEW: Primary requests a COLO checkpoint cycle
+ * COLO_CHECKPOINT_NEW_PASSIVE: Primary requests a basic checkpoint
+ *                              cycle
+ *
  * -1: some error happened, exit colo restore
  */
 static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
@@ -654,8 +742,9 @@ static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
 
     switch (cmd) {
     case COLO_CHECKPOINT_NEW:
+    case COLO_CHECKPOINT_NEW_PASSIVE:
         *checkpoint_request = 1;
-        return 0;
+        return cmd;
     case COLO_GUEST_SHUTDOWN:
         qemu_mutex_lock_iothread();
         vm_stop_force_state(RUN_STATE_COLO);
@@ -695,6 +784,7 @@ void *colo_process_incoming_checkpoints(void *opaque)
     int fd = qemu_get_fd(f);
     QEMUFile *ctl = NULL, *fb = NULL;
     uint64_t total_size;
+    bool last_was_passive = false;
     int i, ret;
     Error *local_err = NULL;
 
@@ -750,9 +840,9 @@ void *colo_process_incoming_checkpoints(void *opaque)
 
     while (mis->state == MIGRATION_STATUS_COLO) {
         int request = 0;
-        int ret = colo_wait_handle_cmd(f, &request);
+        int mode = colo_wait_handle_cmd(f, &request);
 
-        if (ret < 0) {
+        if (mode < 0) {
             break;
         } else {
             if (!request) {
@@ -765,11 +855,13 @@ void *colo_process_incoming_checkpoints(void *opaque)
             goto out;
         }
 
-        /* suspend guest */
-        qemu_mutex_lock_iothread();
-        vm_stop_force_state(RUN_STATE_COLO);
-        qemu_mutex_unlock_iothread();
-        trace_colo_vm_state_change("run", "stop");
+        if (!last_was_passive) {
+            /* suspend guest */
+            qemu_mutex_lock_iothread();
+            vm_stop_force_state(RUN_STATE_COLO);
+            qemu_mutex_unlock_iothread();
+            trace_colo_vm_state_change("run", "stop");
+        }
 
         ret = colo_ctl_put(ctl, COLO_CHECKPOINT_SUSPENDED);
         if (ret < 0) {
@@ -848,10 +940,18 @@ void *colo_process_incoming_checkpoints(void *opaque)
         }
 
         /* resume guest */
-        qemu_mutex_lock_iothread();
-        vm_start();
-        qemu_mutex_unlock_iothread();
-        trace_colo_vm_state_change("stop", "start");
+        if (mode == COLO_CHECKPOINT_NEW_PASSIVE) {
+            last_was_passive = true;
+            trace_colo_process_incoming_checkpoints_passive();
+        } else {
+            qemu_mutex_lock_iothread();
+            vm_start();
+            qemu_mutex_unlock_iothread();
+            last_was_passive = false;
+            trace_colo_vm_state_change("stop", "start");
+            trace_colo_process_incoming_checkpoints_active();
+        }
+
 
         qemu_fclose(fb);
         fb = NULL;
diff --git a/migration/migration.c b/migration/migration.c
index f7f0884..f84c676 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -49,6 +49,14 @@
 /* Migration XBZRLE default cache size */
 #define DEFAULT_MIGRATE_CACHE_SIZE (64 * 1024 * 1024)
 
+/* COLO: The number of passive checkpoints to try before switching back
+ *       to COLO */
+#define DEFAULT_MIGRATE_COLO_PASSIVE_COUNT 100
+/* COLO Checkpoint time (ms) below which we switch into passive mode */
+#define DEFAULT_MIGRATE_COLO_PASSIVE_LIMIT 400
+/* COLO passive mode checkpoint time (ms) */
+#define DEFAULT_MIGRATE_COLO_PASSIVE_TIME 250
+
 static NotifierList migration_state_notifiers =
     NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
 
@@ -72,6 +80,12 @@ MigrationState *migrate_get_current(void)
                 DEFAULT_MIGRATE_COMPRESS_THREAD_COUNT,
         .parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS] =
                 DEFAULT_MIGRATE_DECOMPRESS_THREAD_COUNT,
+        .parameters[MIGRATION_PARAMETER_COLO_PASSIVE_COUNT] =
+                DEFAULT_MIGRATE_COLO_PASSIVE_COUNT,
+        .parameters[MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT] =
+                DEFAULT_MIGRATE_COLO_PASSIVE_LIMIT,
+        .parameters[MIGRATION_PARAMETER_COLO_PASSIVE_TIME] =
+                DEFAULT_MIGRATE_COLO_PASSIVE_TIME,
         .checkpoint_state.max_downtime = 0,
         .checkpoint_state.min_downtime = INT64_MAX
     };
@@ -389,6 +403,9 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
             s->parameters[MIGRATION_PARAMETER_COMPRESS_THREADS];
     params->decompress_threads =
             s->parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS];
+    params->colo_passive_count = s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_COUNT];
+    params->colo_passive_limit = s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT];
+    params->colo_passive_time = s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_TIME];
 
     return params;
 }
@@ -539,7 +556,14 @@ void qmp_migrate_set_parameters(bool has_compress_level,
                                 bool has_compress_threads,
                                 int64_t compress_threads,
                                 bool has_decompress_threads,
-                                int64_t decompress_threads, Error **errp)
+                                int64_t decompress_threads,
+                                bool has_colo_passive_count,
+                                int64_t colo_passive_count,
+                                bool has_colo_passive_limit,
+                                int64_t colo_passive_limit,
+                                bool has_colo_passive_time,
+                                int64_t colo_passive_time,
+                                Error **errp)
 {
     MigrationState *s = migrate_get_current();
 
@@ -562,6 +586,21 @@ void qmp_migrate_set_parameters(bool has_compress_level,
                    "is invalid, it should be in the range of 1 to 255");
         return;
     }
+    if (has_colo_passive_count && (colo_passive_count < 0)) {
+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
+                  "colo_passive_count",
+                  "is invalid, it must not be negative");
+    }
+    if (has_colo_passive_limit && (colo_passive_limit < 0)) {
+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
+                  "colo_passive_limit",
+                  "is invalid, it must not be negative");
+    }
+    if (has_colo_passive_time && (colo_passive_time < 0)) {
+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
+                  "colo_passive_time",
+                  "is invalid, it must not be negative");
+    }
 
     if (has_compress_level) {
         s->parameters[MIGRATION_PARAMETER_COMPRESS_LEVEL] = compress_level;
@@ -573,6 +612,18 @@ void qmp_migrate_set_parameters(bool has_compress_level,
         s->parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS] =
                                                     decompress_threads;
     }
+    if (has_colo_passive_count) {
+        s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_COUNT] =
+                                                    colo_passive_count;
+    }
+    if (has_colo_passive_limit) {
+        s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT] =
+                                                    colo_passive_limit;
+    }
+    if (has_colo_passive_time) {
+        s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_TIME] =
+                                                    colo_passive_time;
+    }
 }
 
 /* shared migration helpers */
@@ -689,6 +740,13 @@ static MigrationState *migrate_init(const MigrationParams *params)
             s->parameters[MIGRATION_PARAMETER_COMPRESS_THREADS];
     int decompress_thread_count =
             s->parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS];
+    int colo_passive_count = s->parameters[
+                            MIGRATION_PARAMETER_COLO_PASSIVE_COUNT];
+    int colo_passive_limit = s->parameters[
+                            MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT];
+    int colo_passive_time = s->parameters[
+                            MIGRATION_PARAMETER_COLO_PASSIVE_TIME];
+
 
     memcpy(enabled_capabilities, s->enabled_capabilities,
            sizeof(enabled_capabilities));
@@ -704,6 +762,11 @@ static MigrationState *migrate_init(const MigrationParams *params)
                compress_thread_count;
     s->parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS] =
                decompress_thread_count;
+    s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_COUNT] = colo_passive_count;
+    s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT] =
+                  colo_passive_limit;
+    s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_TIME] = colo_passive_time;
+
     s->bandwidth_limit = bandwidth_limit;
     migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
 
diff --git a/qapi-schema.json b/qapi-schema.json
index 9f094d2..9d2b6d4 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -630,10 +630,15 @@
 #          compression, so set the decompress-threads to the number about 1/4
 #          of compress-threads is adequate.
 #
+# @colo-passive-count: Number of passive checkpoint cycles before reverting to COLO
+# @colo-passive-limit: Time (in ms) below which we switch into passive mode
+# @colo-passive-time: Time (in ms) for a COLO passive mode checkpoint
+#
 # Since: 2.4
 ##
 { 'enum': 'MigrationParameter',
-  'data': ['compress-level', 'compress-threads', 'decompress-threads'] }
+  'data': ['compress-level', 'compress-threads', 'decompress-threads',
+           'colo-passive-count', 'colo-passive-limit', 'colo-passive-time'] }
 
 #
 # @migrate-set-parameters
@@ -646,12 +651,19 @@
 #
 # @decompress-threads: decompression thread count
 #
+# @colo-passive-count: Number of passive checkpoint cycles before reverting to COLO
+# @colo-passive-limit: Time (in ms) below which we switch into passive mode
+# @colo-passive-time: Time (in ms) for a COLO passive mode checkpoint
+#
 # Since: 2.4
 ##
 { 'command': 'migrate-set-parameters',
   'data': { '*compress-level': 'int',
             '*compress-threads': 'int',
-            '*decompress-threads': 'int'} }
+            '*decompress-threads': 'int',
+            '*colo-passive-count': 'int',
+            '*colo-passive-limit': 'int',
+            '*colo-passive-time': 'int' } }
 
 #
 # @MigrationParameters
@@ -667,7 +679,11 @@
 { 'struct': 'MigrationParameters',
   'data': { 'compress-level': 'int',
             'compress-threads': 'int',
-            'decompress-threads': 'int'} }
+            'decompress-threads': 'int',
+            'colo-passive-count': 'int',
+            'colo-passive-limit': 'int',
+            'colo-passive-time': 'int' } }
+
 ##
 # @query-migrate-parameters
 #
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 4fd01a7..a809710 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3501,6 +3501,10 @@ Set migration parameters
 - "compress-level": set compression level during migration (json-int)
 - "compress-threads": set compression thread count for migration (json-int)
 - "decompress-threads": set decompression thread count for migration (json-int)
+- "colo-passive-count": In COLO, the number of passive checkpoint cycles to
+                        try before reverting to COLO mode
+- "colo-passive-limit": Time (in ms) below which we switch into passive mode
+- "colo-passive-time": Time (in ms) for a COLO passive mode checkpoint
 
 Arguments:
 
@@ -3527,6 +3531,11 @@ Query current migration parameters
          - "compress-level" : compression level value (json-int)
          - "compress-threads" : compression thread count value (json-int)
          - "decompress-threads" : decompression thread count value (json-int)
+         - "colo-passive-count": In COLO, the number of passive checkpoint
+                                 cycles to try before reverting to COLO mode
+         - "colo-passive-limit": Time (in ms) below which we switch into
+                                 passive mode
+         - "colo-passive-time": Time (in ms) for a COLO passive mode checkpoint
 
 Arguments:
 
diff --git a/trace-events b/trace-events
index 03cd035..4d74166 100644
--- a/trace-events
+++ b/trace-events
@@ -1408,6 +1408,14 @@ migrate_state_too_big(void) ""
 migrate_global_state_post_load(const char *state) "loaded state: %s"
 migrate_global_state_pre_save(const char *state) "saved state: %s"
 
+# migration/colo.c
+checkpoint_choice_to_colo(void) ""
+checkpoint_choice_to_passive(double check_time) "weighted checkpoint time=%f"
+colo_process_incoming_checkpoints_new(int mode, int last_was_passive) "mode=%d last_was_passive=%d"
+colo_process_incoming_checkpoints_passive(void) ""
+colo_process_incoming_checkpoints_active(void) ""
+checkpoint_choice(uint64_t passive_count, uint64_t colo_checkpoint_time_count, double colo_checkpoint_time_mean) "%" PRIu64 "/%" PRIu64 " %f"
+
 # migration/rdma.c
 qemu_rdma_accept_incoming_migration(void) ""
 qemu_rdma_accept_incoming_migration_accepted(void) ""
-- 
2.4.3


* Re: [Qemu-devel] [RFC/COLO: 1/3] COLO: Hybrid mode
  2015-08-04 19:26 ` [Qemu-devel] [RFC/COLO: 1/3] COLO: Hybrid mode Dr. David Alan Gilbert (git)
@ 2015-08-05  3:23   ` zhanghailiang
  2015-08-24 17:54     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 6+ messages in thread
From: zhanghailiang @ 2015-08-05  3:23 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: Gonglei (Arei), Hongyang Yang, peter.huangpeng, Markus Armbruster

Seems pretty good overall.

As for the migration parameters command, we have discussed it before, and
Markus promised to rework this part in the QEMU 2.5 cycle. For now,
it is OK.

Cc: Markus Armbruster <armbru@redhat.com>

On 2015/8/5 3:26, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Automatically switch into a passive checkpoint mode when checkpoints are
> repeatedly short.  This saves CPU time on the SVM (since it's not running)
> and the network traffic and PVM CPU time for the comparison processing.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   hmp.c                 |  26 ++++++++++
>   migration/colo.c      | 136 +++++++++++++++++++++++++++++++++++++++++++-------
>   migration/migration.c |  65 +++++++++++++++++++++++-
>   qapi-schema.json      |  22 ++++++--
>   qmp-commands.hx       |   9 ++++
>   trace-events          |   8 +++
>   6 files changed, 244 insertions(+), 22 deletions(-)
>
> diff --git a/hmp.c b/hmp.c
> index f34e2c2..8828756 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -289,6 +289,16 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>           monitor_printf(mon, " %s: %" PRId64,
>               MigrationParameter_lookup[MIGRATION_PARAMETER_DECOMPRESS_THREADS],
>               params->decompress_threads);
> +        monitor_printf(mon, " %s: %" PRId64,
> +            MigrationParameter_lookup[MIGRATION_PARAMETER_COLO_PASSIVE_COUNT],
> +            params->colo_passive_count);
> +        monitor_printf(mon, " %s: %" PRId64,
> +            MigrationParameter_lookup[MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT],
> +            params->colo_passive_limit);
> +        monitor_printf(mon, " %s: %" PRId64,
> +            MigrationParameter_lookup[MIGRATION_PARAMETER_COLO_PASSIVE_TIME],
> +            params->colo_passive_time);
> +
>           monitor_printf(mon, "\n");
>       }
>
> @@ -1238,6 +1248,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>       bool has_compress_level = false;
>       bool has_compress_threads = false;
>       bool has_decompress_threads = false;
> +    bool has_colo_passive_count = false;
> +    bool has_colo_passive_limit = false;
> +    bool has_colo_passive_time = false;
> +
>       int i;
>
>       for (i = 0; i < MIGRATION_PARAMETER_MAX; i++) {
> @@ -1252,10 +1266,22 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>               case MIGRATION_PARAMETER_DECOMPRESS_THREADS:
>                   has_decompress_threads = true;
>                   break;
> +            case MIGRATION_PARAMETER_COLO_PASSIVE_COUNT:
> +                has_colo_passive_count = true;
> +                break;
> +            case MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT:
> +                has_colo_passive_limit = true;
> +                break;
> +            case MIGRATION_PARAMETER_COLO_PASSIVE_TIME:
> +                has_colo_passive_time = true;
> +                break;
>               }
>               qmp_migrate_set_parameters(has_compress_level, value,
>                                          has_compress_threads, value,
>                                          has_decompress_threads, value,
> +                                       has_colo_passive_count, value,
> +                                       has_colo_passive_limit, value,
> +                                       has_colo_passive_time, value,
>                                          &err);
>               break;
>           }
> diff --git a/migration/colo.c b/migration/colo.c
> index d8ec283..37f63f2 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -21,6 +21,7 @@
>   #include "net/colo-nic.h"
>   #include "qmp-commands.h"
>   #include "block/block_int.h"
> +#include "trace.h"
>
>   /*
>   * We should not do checkpoint one after another without any time interval,
> @@ -66,6 +67,7 @@ typedef enum COLOCommand {
>       *    go forward a lot when this side just receives the sync-point.
>       */
>       COLO_CHECKPOINT_NEW,
> +    COLO_CHECKPOINT_NEW_PASSIVE, /* Simple checkpoint mode, SVM doesn't run */
>       COLO_CHECKPOINT_SUSPENDED,
>       COLO_CHECKPOINT_SEND,
>       COLO_CHECKPOINT_RECEIVED,
> @@ -294,7 +296,8 @@ static int colo_ctl_get(QEMUFile *f, uint64_t require)
>       return ret;
>   }
>
> -static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
> +static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control,
> +                                          bool passive)
>   {
>       int colo_shutdown, ret;
>       size_t size;
> @@ -302,7 +305,8 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
>       int64_t start_time, end_time, down_time;
>       Error *local_err = NULL;
>
> -    ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
> +    ret = colo_ctl_put(s->file, passive?COLO_CHECKPOINT_NEW_PASSIVE:
                                           ^ Space

> +                                        COLO_CHECKPOINT_NEW);
>       if (ret < 0) {
>           goto out;
>       }
> @@ -438,6 +442,71 @@ out:
>       return ret;
>   }
>
> +/*
> + * Counter that is reset to 'n' when we enter passive mode and
> + * is decremented once per checkpoint; when it hits zero we flip
> + * back to COLO mode.
> + */
> +static unsigned int passive_count;
> +
> +/*
> + * Weighted average of checkpoint lengths, used to decide on mode.
> + */
> +static double colo_checkpoint_time_mean;
> +/* Count of checkpoints since we reset colo_checkpoint_time_mean */
> +static uint64_t colo_checkpoint_time_count;
> +
> +/* Decides whether the checkpoint that's about to start should be
> + * a COLO type (with the secondary running and packet comparison) or
> + * a 'passive' type (with the secondary idle and running for fixed time)
> + *
> + * Returns:
> + *   True: 'passive' type checkpoint
> + */
> +static bool checkpoint_choice(MigrationState *s)

Confusing name; maybe 'checkpoint_to_passive_mode()' would be better ~

> +{
> +    trace_checkpoint_choice(passive_count,
> +                            colo_checkpoint_time_count,
> +                            colo_checkpoint_time_mean);
> +    if (passive_count) {
> +        /*
> +         * The last checkpoint was passive; we stay in passive
> +         * mode for a number of checkpoints before trying colo
> +         * again.
> +         */
> +        passive_count--;
> +        if (passive_count) {
> +            /* Stay passive */
> +            return true;
> +        } else {
> +            /* Transition back to COLO */
> +            trace_checkpoint_choice_to_colo();
> +            colo_checkpoint_time_mean = 0.0;
> +            colo_checkpoint_time_count = 0;
> +            return false;
> +        }
> +    } else {
> +        /* The last checkpoint was COLO */
> +        /* Could make that tunable, I'm not particularly worried about
> +         * load behaviour for this, startup etc is probably more interesting.
> +         */
> +        if (colo_checkpoint_time_count < 5) {
> +            /* Not done enough COLO cycles to evaluate times yet */
> +            return false;
> +        }
> +        if (colo_checkpoint_time_mean <
> +            s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT]) {
> +            trace_checkpoint_choice_to_passive(colo_checkpoint_time_mean);
> +            /* We've had a few short checkpoints, switch to passive */
> +            passive_count = s->parameters[
> +                                      MIGRATION_PARAMETER_COLO_PASSIVE_COUNT];

		passive_count =
                            s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_COUNT];

> +            return true;
> +        }
> +        /* Keep going in COLO mode */
> +        return false;
> +    }
> +}
> +
>   /* should be calculated by bandwidth and max downtime ? */
>   #define THRESHOLD_PENDING_SIZE (10 * 1024 * 1024UL)
>
> @@ -470,7 +539,7 @@ static void *colo_thread(void *opaque)
>   {
>       MigrationState *s = opaque;
>       QEMUFile *colo_control = NULL;
> -    int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> +    int64_t current_time = 0, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>       int i, ret;
>       Error *local_err = NULL;
>
> @@ -518,8 +587,12 @@ static void *colo_thread(void *opaque)
>       qemu_mutex_unlock_iothread();
>       trace_colo_vm_state_change("stop", "run");
>
> +    passive_count = 0;
> +    colo_checkpoint_time_mean = 0.0;
> +    colo_checkpoint_time_count = 0;
>       while (s->state == MIGRATION_STATUS_COLO) {
>           int proxy_checkpoint_req;
> +        unsigned int checkpoint_limit;
>
>           if (failover_request_is_active()) {
>               error_report("failover request");
> @@ -546,13 +619,17 @@ static void *colo_thread(void *opaque)
>               goto do_checkpoint;
>           }
>
> +        checkpoint_limit = passive_count ?
> +            s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_TIME] :
> +            colo_checkpoint_period;
> +
>           /*
>            * No proxy checkpoint is request, wait for 100ms or
>            * transfer some dirty ram page,
>            * and then check if we need checkpoint again.
>            */
>           current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> -        if (current_time - checkpoint_time < colo_checkpoint_period) {
> +        if (current_time - checkpoint_time < checkpoint_limit) {
>               if (colo_need_live_migrate_ram(s)) {
>                   ret = colo_ctl_put(s->file, COLO_RAM_LIVE_MIGRATE);
>                   if (ret < 0) {
> @@ -572,8 +649,16 @@ static void *colo_thread(void *opaque)
>           }
>
>   do_checkpoint:
> +        /* Update a weighted mean of checkpoint lengths, weighted
> +         * so that an occasional short checkpoint doesn't cause a switch
> +         * to passive.
> +         */
> +        colo_checkpoint_time_mean = colo_checkpoint_time_mean * 0.7 +
> +                                 0.3 * (current_time - checkpoint_time);

Why are the weight values '0.7' and '0.3'?
Are they based on some tests?

> +        colo_checkpoint_time_count++;
>           /* start a colo checkpoint */
> -        if (colo_do_checkpoint_transaction(s, colo_control)) {
> +        if (colo_do_checkpoint_transaction(s, colo_control,
> +                                              checkpoint_choice(s))) {
>               goto out;
>           }
>           checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> @@ -638,7 +723,10 @@ void colo_init_checkpointer(MigrationState *s)
>
>   /*
>    * return:
> - * 0: start a checkpoint
> + * COLO_CHECKPOINT_NEW: Primary requests a COLO checkpoint cycle
> + * COLO_CHECKPOINT_NEW_PASSIVE: Primary requests a basic checkpoint
> + *                              cycle
> + *
>    * -1: some error happened, exit colo restore
>    */
>   static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
> @@ -654,8 +742,9 @@ static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
>
>       switch (cmd) {
>       case COLO_CHECKPOINT_NEW:
> +    case COLO_CHECKPOINT_NEW_PASSIVE:
>           *checkpoint_request = 1;
> -        return 0;
> +        return cmd;
>       case COLO_GUEST_SHUTDOWN:
>           qemu_mutex_lock_iothread();
>           vm_stop_force_state(RUN_STATE_COLO);
> @@ -695,6 +784,7 @@ void *colo_process_incoming_checkpoints(void *opaque)
>       int fd = qemu_get_fd(f);
>       QEMUFile *ctl = NULL, *fb = NULL;
>       uint64_t total_size;
> +    bool last_was_passive = false;
>       int i, ret;
>       Error *local_err = NULL;
>
> @@ -750,9 +840,9 @@ void *colo_process_incoming_checkpoints(void *opaque)
>
>       while (mis->state == MIGRATION_STATUS_COLO) {
>           int request = 0;
> -        int ret = colo_wait_handle_cmd(f, &request);
> +        int mode = colo_wait_handle_cmd(f, &request);
>
> -        if (ret < 0) {
> +        if (mode < 0) {
>               break;
>           } else {
>               if (!request) {
> @@ -765,11 +855,13 @@ void *colo_process_incoming_checkpoints(void *opaque)
>               goto out;
>           }
>
> -        /* suspend guest */
> -        qemu_mutex_lock_iothread();
> -        vm_stop_force_state(RUN_STATE_COLO);
> -        qemu_mutex_unlock_iothread();
> -        trace_colo_vm_state_change("run", "stop");
> +        if (!last_was_passive) {
> +            /* suspend guest */
> +            qemu_mutex_lock_iothread();
> +            vm_stop_force_state(RUN_STATE_COLO);
> +            qemu_mutex_unlock_iothread();
> +            trace_colo_vm_state_change("run", "stop");
> +        }
>
>           ret = colo_ctl_put(ctl, COLO_CHECKPOINT_SUSPENDED);
>           if (ret < 0) {
> @@ -848,10 +940,18 @@ void *colo_process_incoming_checkpoints(void *opaque)
>           }
>
>           /* resume guest */
> -        qemu_mutex_lock_iothread();
> -        vm_start();
> -        qemu_mutex_unlock_iothread();
> -        trace_colo_vm_state_change("stop", "start");
> +        if (mode == COLO_CHECKPOINT_NEW_PASSIVE) {
> +            last_was_passive = true;
> +            trace_colo_process_incoming_checkpoints_passive();
> +        } else {
> +            qemu_mutex_lock_iothread();
> +            vm_start();
> +            qemu_mutex_unlock_iothread();
> +            last_was_passive = false;
> +            trace_colo_vm_state_change("stop", "start");
> +            trace_colo_process_incoming_checkpoints_active();
> +        }
> +
>
>           qemu_fclose(fb);
>           fb = NULL;
> diff --git a/migration/migration.c b/migration/migration.c
> index f7f0884..f84c676 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -49,6 +49,14 @@
>   /* Migration XBZRLE default cache size */
>   #define DEFAULT_MIGRATE_CACHE_SIZE (64 * 1024 * 1024)
>
> +/* COLO: The number of passive checkpoints to try before switching back
> + *       to COLO */
> +#define DEFAULT_MIGRATE_COLO_PASSIVE_COUNT 100
> +/* COLO Checkpoint time (ms) below which we switch into passive mode */
> +#define DEFAULT_MIGRATE_COLO_PASSIVE_LIMIT 400
> +/* COLO passive mode checkpoint time (ms) */
> +#define DEFAULT_MIGRATE_COLO_PASSIVE_TIME 250
> +
>   static NotifierList migration_state_notifiers =
>       NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
>
> @@ -72,6 +80,12 @@ MigrationState *migrate_get_current(void)
>                   DEFAULT_MIGRATE_COMPRESS_THREAD_COUNT,
>           .parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS] =
>                   DEFAULT_MIGRATE_DECOMPRESS_THREAD_COUNT,
> +        .parameters[MIGRATION_PARAMETER_COLO_PASSIVE_COUNT] =
> +                DEFAULT_MIGRATE_COLO_PASSIVE_COUNT,
> +        .parameters[MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT] =
> +                DEFAULT_MIGRATE_COLO_PASSIVE_LIMIT,
> +        .parameters[MIGRATION_PARAMETER_COLO_PASSIVE_TIME] =
> +                DEFAULT_MIGRATE_COLO_PASSIVE_TIME,
>           .checkpoint_state.max_downtime = 0,
>           .checkpoint_state.min_downtime = INT64_MAX
>       };
> @@ -389,6 +403,9 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>               s->parameters[MIGRATION_PARAMETER_COMPRESS_THREADS];
>       params->decompress_threads =
>               s->parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS];
> +    params->colo_passive_count = s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_COUNT];
> +    params->colo_passive_limit = s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT];
> +    params->colo_passive_time = s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_TIME];
>
>       return params;
>   }
> @@ -539,7 +556,14 @@ void qmp_migrate_set_parameters(bool has_compress_level,
>                                   bool has_compress_threads,
>                                   int64_t compress_threads,
>                                   bool has_decompress_threads,
> -                                int64_t decompress_threads, Error **errp)
> +                                int64_t decompress_threads,
> +                                bool has_colo_passive_count,
> +                                int64_t colo_passive_count,
> +                                bool has_colo_passive_limit,
> +                                int64_t colo_passive_limit,
> +                                bool has_colo_passive_time,
> +                                int64_t colo_passive_time,
> +                                Error **errp)
>   {
>       MigrationState *s = migrate_get_current();
>
> @@ -562,6 +586,21 @@ void qmp_migrate_set_parameters(bool has_compress_level,
>                      "is invalid, it should be in the range of 1 to 255");
>           return;
>       }
> +    if (has_colo_passive_count && (colo_passive_count < 0)) {
> +        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
> +                  "colo_passive_count",
> +                  "is invalid, it must be positive");
> +    }
> +    if (has_colo_passive_limit && (colo_passive_limit < 0)) {
> +        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
> +                  "colo_passive_limit",
> +                  "is invalid, it must be positive");
> +    }
> +    if (has_colo_passive_time && (colo_passive_time < 0)) {
> +        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
> +                  "colo_passive_time",
> +                  "is invalid, it must be positive");
> +    }
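
Unlike the existing range check just above them, the three new checks do not
return after setting the error, and the message says "positive" although zero
is accepted. A minimal sketch of the validate-then-apply pattern with an
early return, assuming a hypothetical SketchError type in place of QEMU's
Error object (the helper names are also made up):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct {
    bool set;
    char msg[128];
} SketchError;

/* Returns false and fills in err if the supplied value is negative. */
static bool check_non_negative(bool has_value, int64_t value,
                               const char *name, SketchError *err)
{
    assert(name != NULL && err != NULL);
    if (has_value && value < 0) {
        err->set = true;
        snprintf(err->msg, sizeof(err->msg),
                 "Parameter '%s' is invalid, it must not be negative", name);
        return false;
    }
    return true;
}

/* Stores the value only if it was supplied and passed validation. */
static bool set_passive_time(bool has_value, int64_t value,
                             int64_t *stored, SketchError *err)
{
    if (!check_non_negative(has_value, value, "colo_passive_time", err)) {
        return false;   /* early return: the stored value is left untouched */
    }
    if (has_value) {
        *stored = value;
    }
    return true;
}
```
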
>
>       if (has_compress_level) {
>           s->parameters[MIGRATION_PARAMETER_COMPRESS_LEVEL] = compress_level;
> @@ -573,6 +612,18 @@ void qmp_migrate_set_parameters(bool has_compress_level,
>           s->parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS] =
>                                                       decompress_threads;
>       }
> +    if (has_colo_passive_count) {
> +        s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_COUNT] =
> +                                                    colo_passive_count;
> +    }
> +    if (has_colo_passive_limit) {
> +        s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT] =
> +                                                    colo_passive_limit;
> +    }
> +    if (has_colo_passive_time) {
> +        s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_TIME] =
> +                                                    colo_passive_time;
> +    }
>   }
>
>   /* shared migration helpers */
> @@ -689,6 +740,13 @@ static MigrationState *migrate_init(const MigrationParams *params)
>               s->parameters[MIGRATION_PARAMETER_COMPRESS_THREADS];
>       int decompress_thread_count =
>               s->parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS];
> +    int colo_passive_count = s->parameters[
> +                            MIGRATION_PARAMETER_COLO_PASSIVE_COUNT];
> +    int colo_passive_limit = s->parameters[
> +                            MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT];
> +    int colo_passive_time = s->parameters[
> +                            MIGRATION_PARAMETER_COLO_PASSIVE_TIME];
> +
>
>       memcpy(enabled_capabilities, s->enabled_capabilities,
>              sizeof(enabled_capabilities));
> @@ -704,6 +762,11 @@ static MigrationState *migrate_init(const MigrationParams *params)
>                  compress_thread_count;
>       s->parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS] =
>                  decompress_thread_count;
> +    s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_COUNT] = colo_passive_count;
> +    s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT] =
> +                  colo_passive_limit;
> +    s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_TIME] = colo_passive_time;
> +
>       s->bandwidth_limit = bandwidth_limit;
>       migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
>
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 9f094d2..9d2b6d4 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -630,10 +630,15 @@
>   #          compression, so set the decompress-threads to the number about 1/4
>   #          of compress-threads is adequate.
>   #
> +# @colo-passive-count: Passive checkpoints to try before reverting to COLO mode
> +# @colo-passive-limit: Time (in ms) below which we switch into passive mode
> +# @colo-passive-time: Time (in ms) for a COLO passive mode checkpoint
> +#
>   # Since: 2.4
>   ##
>   { 'enum': 'MigrationParameter',
> -  'data': ['compress-level', 'compress-threads', 'decompress-threads'] }
> +  'data': ['compress-level', 'compress-threads', 'decompress-threads',
> +           'colo-passive-count', 'colo-passive-limit', 'colo-passive-time'] }
>
>   #
>   # @migrate-set-parameters
> @@ -646,12 +651,19 @@
>   #
>   # @decompress-threads: decompression thread count
>   #
> +# @colo-passive-count: Passive checkpoints to try before reverting to COLO mode
> +# @colo-passive-limit: Time (in ms) below which we switch into passive mode
> +# @colo-passive-time: Time (in ms) for a COLO passive mode checkpoint
> +#
>   # Since: 2.4
>   ##
>   { 'command': 'migrate-set-parameters',
>     'data': { '*compress-level': 'int',
>               '*compress-threads': 'int',
> -            '*decompress-threads': 'int'} }
> +            '*decompress-threads': 'int',
> +            '*colo-passive-count': 'int',
> +            '*colo-passive-limit': 'int',
> +            '*colo-passive-time': 'int' } }
>
>   #
>   # @MigrationParameters
> @@ -667,7 +679,11 @@
>   { 'struct': 'MigrationParameters',
>     'data': { 'compress-level': 'int',
>               'compress-threads': 'int',
> -            'decompress-threads': 'int'} }
> +            'decompress-threads': 'int',
> +            'colo-passive-count': 'int',
> +            'colo-passive-limit': 'int',
> +            'colo-passive-time': 'int' } }
> +
>   ##
>   # @query-migrate-parameters
>   #
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 4fd01a7..a809710 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -3501,6 +3501,10 @@ Set migration parameters
>   - "compress-level": set compression level during migration (json-int)
>   - "compress-threads": set compression thread count for migration (json-int)
>   - "decompress-threads": set decompression thread count for migration (json-int)
> +- "colo-passive-count": In COLO, the number of passive checkpoint cycles to
> +                        try before reverting to COLO mode
> +- "colo-passive-limit": Time (in ms) below which we switch into passive mode
> +- "colo-passive-time": Time (in ms) for a COLO passive mode checkpoint
>
>   Arguments:
>
> @@ -3527,6 +3531,11 @@ Query current migration parameters
>            - "compress-level" : compression level value (json-int)
>            - "compress-threads" : compression thread count value (json-int)
>            - "decompress-threads" : decompression thread count value (json-int)
> +         - "colo-passive-count": In COLO, the number of passive checkpoints
> +                                 cycles to try before reverting to COLO mode
> +         - "colo-passive-limit": Time (in ms) below which we switch into
> +                                 passive mode
> +         - "colo-passive-time": Time (in ms) for a COLO passive mode checkpoint
>
>   Arguments:
>
> diff --git a/trace-events b/trace-events
> index 03cd035..4d74166 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1408,6 +1408,14 @@ migrate_state_too_big(void) ""
>   migrate_global_state_post_load(const char *state) "loaded state: %s"
>   migrate_global_state_pre_save(const char *state) "saved state: %s"
>
> +# migration/colo.c
> +checkpoint_choice_to_colo(void) ""
> +checkpoint_choice_to_passive(double check_time) "weighted checkpoint time=%f"
> +colo_process_incoming_checkpoints_new(int mode, int last_was_passive) "mode=%d last_was_passive=%d"
> +colo_process_incoming_checkpoints_passive(void) ""
> +colo_process_incoming_checkpoints_active(void) ""
> +checkpoint_choice(uint64_t passive_count, uint64_t colo_checkpoint_time_count, double colo_checkpoint_time_mean) "%" PRIu64 "/%" PRIu64 " %f"
> +
>   # migration/rdma.c
>   qemu_rdma_accept_incoming_migration(void) ""
>   qemu_rdma_accept_incoming_migration_accepted(void) ""
>

* Re: [Qemu-devel] [RFC/COLO: 1/3] COLO: Hybrid mode
  2015-08-05  3:23   ` zhanghailiang
@ 2015-08-24 17:54     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 6+ messages in thread
From: Dr. David Alan Gilbert @ 2015-08-24 17:54 UTC (permalink / raw)
  To: zhanghailiang
  Cc: Markus Armbruster, Gonglei (Arei), Hongyang Yang, qemu-devel,
	peter.huangpeng

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Seems pretty good overall~
> 
> As for the migration parameters command, we have discussed it before, and
> Markus promised to rework this part in the QEMU 2.5 cycle. For now,
> it is OK.

Thanks,

> Cc: Markus Armbruster <armbru@redhat.com>
> 
> On 2015/8/5 3:26, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> >Automatically switch into a passive checkpoint mode when checkpoints are
> >repeatedly short.  This saves CPU time on the SVM (since it's not running)
> >and the network traffic and PVM CPU time for the comparison processing.
> >
> >Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >---
> >  hmp.c                 |  26 ++++++++++
> >  migration/colo.c      | 136 +++++++++++++++++++++++++++++++++++++++++++-------
> >  migration/migration.c |  65 +++++++++++++++++++++++-
> >  qapi-schema.json      |  22 ++++++--
> >  qmp-commands.hx       |   9 ++++
> >  trace-events          |   8 +++
> >  6 files changed, 244 insertions(+), 22 deletions(-)
> >
> >diff --git a/hmp.c b/hmp.c
> >index f34e2c2..8828756 100644
> >--- a/hmp.c
> >+++ b/hmp.c
> >@@ -289,6 +289,16 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
> >          monitor_printf(mon, " %s: %" PRId64,
> >              MigrationParameter_lookup[MIGRATION_PARAMETER_DECOMPRESS_THREADS],
> >              params->decompress_threads);
> >+        monitor_printf(mon, " %s: %" PRId64,
> >+            MigrationParameter_lookup[MIGRATION_PARAMETER_COLO_PASSIVE_COUNT],
> >+            params->colo_passive_count);
> >+        monitor_printf(mon, " %s: %" PRId64,
> >+            MigrationParameter_lookup[MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT],
> >+            params->colo_passive_limit);
> >+        monitor_printf(mon, " %s: %" PRId64,
> >+            MigrationParameter_lookup[MIGRATION_PARAMETER_COLO_PASSIVE_TIME],
> >+            params->colo_passive_time);
> >+
> >          monitor_printf(mon, "\n");
> >      }
> >
> >@@ -1238,6 +1248,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
> >      bool has_compress_level = false;
> >      bool has_compress_threads = false;
> >      bool has_decompress_threads = false;
> >+    bool has_colo_passive_count = false;
> >+    bool has_colo_passive_limit = false;
> >+    bool has_colo_passive_time = false;
> >+
> >      int i;
> >
> >      for (i = 0; i < MIGRATION_PARAMETER_MAX; i++) {
> >@@ -1252,10 +1266,22 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
> >              case MIGRATION_PARAMETER_DECOMPRESS_THREADS:
> >                  has_decompress_threads = true;
> >                  break;
> >+            case MIGRATION_PARAMETER_COLO_PASSIVE_COUNT:
> >+                has_colo_passive_count = true;
> >+                break;
> >+            case MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT:
> >+                has_colo_passive_limit = true;
> >+                break;
> >+            case MIGRATION_PARAMETER_COLO_PASSIVE_TIME:
> >+                has_colo_passive_time = true;
> >+                break;
> >              }
> >              qmp_migrate_set_parameters(has_compress_level, value,
> >                                         has_compress_threads, value,
> >                                         has_decompress_threads, value,
> >+                                       has_colo_passive_count, value,
> >+                                       has_colo_passive_limit, value,
> >+                                       has_colo_passive_time, value,
> >                                         &err);
> >              break;
> >          }
> >diff --git a/migration/colo.c b/migration/colo.c
> >index d8ec283..37f63f2 100644
> >--- a/migration/colo.c
> >+++ b/migration/colo.c
> >@@ -21,6 +21,7 @@
> >  #include "net/colo-nic.h"
> >  #include "qmp-commands.h"
> >  #include "block/block_int.h"
> >+#include "trace.h"
> >
> >  /*
> >  * We should not do checkpoint one after another without any time interval,
> >@@ -66,6 +67,7 @@ typedef enum COLOCommand {
> >      *    go forward a lot when this side just receives the sync-point.
> >      */
> >      COLO_CHECKPOINT_NEW,
> >+    COLO_CHECKPOINT_NEW_PASSIVE, /* Simple checkpoint mode, SVM doesn't run */
> >      COLO_CHECKPOINT_SUSPENDED,
> >      COLO_CHECKPOINT_SEND,
> >      COLO_CHECKPOINT_RECEIVED,
> >@@ -294,7 +296,8 @@ static int colo_ctl_get(QEMUFile *f, uint64_t require)
> >      return ret;
> >  }
> >
> >-static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
> >+static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control,
> >+                                          bool passive)
> >  {
> >      int colo_shutdown, ret;
> >      size_t size;
> >@@ -302,7 +305,8 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
> >      int64_t start_time, end_time, down_time;
> >      Error *local_err = NULL;
> >
> >-    ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
> >+    ret = colo_ctl_put(s->file, passive?COLO_CHECKPOINT_NEW_PASSIVE:
>                                           ^ Space

Added.

> >+                                        COLO_CHECKPOINT_NEW);
> >      if (ret < 0) {
> >          goto out;
> >      }
> >@@ -438,6 +442,71 @@ out:
> >      return ret;
> >  }
> >
> >+/*
> >+ * Counter that is reset to 'n' when we enter passive mode and
> >+ * is decremented once per checkpoint; when it hits zero we flip
> >+ * back to COLO mode.
> >+ */
> >+static unsigned int passive_count;
> >+
> >+/*
> >+ * Weighted average of checkpoint lengths, used to decide on mode.
> >+ */
> >+static double colo_checkpoint_time_mean;
> >+/* Count of checkpoints since we reset colo_checkpoint_time_mean */
> >+static uint64_t colo_checkpoint_time_count;
> >+
> >+/* Decides whether the checkpoint that's about to start should be
> >+ * a COLO type (with the secondary running and packet comparison) or
> >+ * a 'passive' type (with the secondary idle and running for fixed time)
> >+ *
> >+ * Returns:
> >+ *   True: 'passive' type checkpoint
> >+ */
> >+static bool checkpoint_choice(MigrationState *s)
> 
> Confusing name; maybe 'checkpoint_to_passive_mode()' would be better ~

Changed.

> 
> >+{
> >+    trace_checkpoint_choice(passive_count,
> >+                            colo_checkpoint_time_count,
> >+                            colo_checkpoint_time_mean);
> >+    if (passive_count) {
> >+        /*
> >+         * The last checkpoint was passive; we stay in passive
> >+         * mode for a number of checkpoints before trying colo
> >+         * again.
> >+         */
> >+        passive_count--;
> >+        if (passive_count) {
> >+            /* Stay passive */
> >+            return true;
> >+        } else {
> >+            /* Transition back to COLO */
> >+            trace_checkpoint_choice_to_colo();
> >+            colo_checkpoint_time_mean = 0.0;
> >+            colo_checkpoint_time_count = 0;
> >+            return false;
> >+        }
> >+    } else {
> >+        /* The last checkpoint was COLO */
> >+        /* Could make that tunable, I'm not particularly worried about
> >+         * load behaviour for this, startup etc is probably more interesting.
> >+         */
> >+        if (colo_checkpoint_time_count < 5) {
> >+            /* Not done enough COLO cycles to evaluate times yet */
> >+            return false;
> >+        }
> >+        if (colo_checkpoint_time_mean <
> >+            s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT]) {
> >+            trace_checkpoint_choice_to_passive(colo_checkpoint_time_mean);
> >+            /* We've had a few short checkpoints, switch to passive */
> >+            passive_count = s->parameters[
> >+                                      MIGRATION_PARAMETER_COLO_PASSIVE_COUNT];
> 
> 		passive_count =
>                            s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_COUNT];

Done.

> 
> >+            return true;
> >+        }
> >+        /* Keep going in COLO mode */
> >+        return false;
> >+    }
> >+}
> >+
> >  /* should be calculated by bandwidth and max downtime ? */
> >  #define THRESHOLD_PENDING_SIZE (10 * 1024 * 1024UL)
> >
> >@@ -470,7 +539,7 @@ static void *colo_thread(void *opaque)
> >  {
> >      MigrationState *s = opaque;
> >      QEMUFile *colo_control = NULL;
> >-    int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> >+    int64_t current_time = 0, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> >      int i, ret;
> >      Error *local_err = NULL;
> >
> >@@ -518,8 +587,12 @@ static void *colo_thread(void *opaque)
> >      qemu_mutex_unlock_iothread();
> >      trace_colo_vm_state_change("stop", "run");
> >
> >+    passive_count = 0;
> >+    colo_checkpoint_time_mean = 0.0;
> >+    colo_checkpoint_time_count = 0;
> >      while (s->state == MIGRATION_STATUS_COLO) {
> >          int proxy_checkpoint_req;
> >+        unsigned int checkpoint_limit;
> >
> >          if (failover_request_is_active()) {
> >              error_report("failover request");
> >@@ -546,13 +619,17 @@ static void *colo_thread(void *opaque)
> >              goto do_checkpoint;
> >          }
> >
> >+        checkpoint_limit = passive_count ?
> >+            s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_TIME] :
> >+            colo_checkpoint_period;
> >+
> >          /*
> >           * No proxy checkpoint is request, wait for 100ms or
> >           * transfer some dirty ram page,
> >           * and then check if we need checkpoint again.
> >           */
> >          current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> >-        if (current_time - checkpoint_time < colo_checkpoint_period) {
> >+        if (current_time - checkpoint_time < checkpoint_limit) {
> >              if (colo_need_live_migrate_ram(s)) {
> >                  ret = colo_ctl_put(s->file, COLO_RAM_LIVE_MIGRATE);
> >                  if (ret < 0) {
> >@@ -572,8 +649,16 @@ static void *colo_thread(void *opaque)
> >          }
> >
> >  do_checkpoint:
> >+        /* Update a weighted mean of checkpoint lengths, weighted
> >+         * so that an occasional short checkpoint doesn't cause a switch
> >+         * to passive.
> >+         */
> >+        colo_checkpoint_time_mean = colo_checkpoint_time_mean * 0.7 +
> >+                                 0.3 * (current_time - checkpoint_time);
> 
> Why are the weight values '0.7' and '0.3'?
> Are they based on some tests?

Only simple tests. I'm sure it's possible to come up with a better, more
robust weighting mechanism, but this is simple and seems to work.  Tests on
real applications are probably the best way to fine-tune it.
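
To make the effect of the 0.7/0.3 weights concrete, here is a minimal
standalone sketch of the same weighted mean. The helper names are
hypothetical; only the update formula is taken from the patch.

```c
#include <assert.h>

/* One update step: new_mean = (1 - alpha) * mean + alpha * sample. */
static double ewma_update(double mean, double sample, double alpha)
{
    assert(alpha >= 0.0 && alpha <= 1.0);
    return mean * (1.0 - alpha) + alpha * sample;
}

/*
 * Number of consecutive checkpoints of length 'sample' (ms) needed to pull
 * the mean from 'start' to below 'limit'; -1 if not reached in max_steps.
 */
static int steps_until_below(double start, double sample, double limit,
                             double alpha, int max_steps)
{
    double mean = start;
    for (int i = 1; i <= max_steps; i++) {
        mean = ewma_update(mean, sample, alpha);
        if (mean < limit) {
            return i;
        }
    }
    return -1;
}
```

With alpha = 0.3, a single 100ms checkpoint only moves a 1000ms mean to
730ms, and it takes four in a row to get below the default 400ms limit.
Because the mean restarts from 0, the first five samples only carry a total
weight of 1 - 0.7^5 (about 0.83), so the mean is biased low at the point
where it is first evaluated.
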

Dave

> 
> >+        colo_checkpoint_time_count++;
> >          /* start a colo checkpoint */
> >-        if (colo_do_checkpoint_transaction(s, colo_control)) {
> >+        if (colo_do_checkpoint_transaction(s, colo_control,
> >+                                              checkpoint_choice(s))) {
> >              goto out;
> >          }
> >          checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> >@@ -638,7 +723,10 @@ void colo_init_checkpointer(MigrationState *s)
> >
> >  /*
> >   * return:
> >- * 0: start a checkpoint
> >+ * COLO_CHECKPOINT_NEW: Primary requests a COLO checkpoint cycle
> >+ * COLO_CHECKPOINT_NEW_PASSIVE: Primary requests a basic checkpoint
> >+ *                              cycle
> >+ *
> >   * -1: some error happened, exit colo restore
> >   */
> >  static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
> >@@ -654,8 +742,9 @@ static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
> >
> >      switch (cmd) {
> >      case COLO_CHECKPOINT_NEW:
> >+    case COLO_CHECKPOINT_NEW_PASSIVE:
> >          *checkpoint_request = 1;
> >-        return 0;
> >+        return cmd;
> >      case COLO_GUEST_SHUTDOWN:
> >          qemu_mutex_lock_iothread();
> >          vm_stop_force_state(RUN_STATE_COLO);
> >@@ -695,6 +784,7 @@ void *colo_process_incoming_checkpoints(void *opaque)
> >      int fd = qemu_get_fd(f);
> >      QEMUFile *ctl = NULL, *fb = NULL;
> >      uint64_t total_size;
> >+    bool last_was_passive = false;
> >      int i, ret;
> >      Error *local_err = NULL;
> >
> >@@ -750,9 +840,9 @@ void *colo_process_incoming_checkpoints(void *opaque)
> >
> >      while (mis->state == MIGRATION_STATUS_COLO) {
> >          int request = 0;
> >-        int ret = colo_wait_handle_cmd(f, &request);
> >+        int mode = colo_wait_handle_cmd(f, &request);
> >
> >-        if (ret < 0) {
> >+        if (mode < 0) {
> >              break;
> >          } else {
> >              if (!request) {
> >@@ -765,11 +855,13 @@ void *colo_process_incoming_checkpoints(void *opaque)
> >              goto out;
> >          }
> >
> >-        /* suspend guest */
> >-        qemu_mutex_lock_iothread();
> >-        vm_stop_force_state(RUN_STATE_COLO);
> >-        qemu_mutex_unlock_iothread();
> >-        trace_colo_vm_state_change("run", "stop");
> >+        if (!last_was_passive) {
> >+            /* suspend guest */
> >+            qemu_mutex_lock_iothread();
> >+            vm_stop_force_state(RUN_STATE_COLO);
> >+            qemu_mutex_unlock_iothread();
> >+            trace_colo_vm_state_change("run", "stop");
> >+        }
> >
> >          ret = colo_ctl_put(ctl, COLO_CHECKPOINT_SUSPENDED);
> >          if (ret < 0) {
> >@@ -848,10 +940,18 @@ void *colo_process_incoming_checkpoints(void *opaque)
> >          }
> >
> >          /* resume guest */
> >-        qemu_mutex_lock_iothread();
> >-        vm_start();
> >-        qemu_mutex_unlock_iothread();
> >-        trace_colo_vm_state_change("stop", "start");
> >+        if (mode == COLO_CHECKPOINT_NEW_PASSIVE) {
> >+            last_was_passive = true;
> >+            trace_colo_process_incoming_checkpoints_passive();
> >+        } else {
> >+            qemu_mutex_lock_iothread();
> >+            vm_start();
> >+            qemu_mutex_unlock_iothread();
> >+            last_was_passive = false;
> >+            trace_colo_vm_state_change("stop", "start");
> >+            trace_colo_process_incoming_checkpoints_active();
> >+        }
> >+
> >
> >          qemu_fclose(fb);
> >          fb = NULL;
> >diff --git a/migration/migration.c b/migration/migration.c
> >index f7f0884..f84c676 100644
> >--- a/migration/migration.c
> >+++ b/migration/migration.c
> >@@ -49,6 +49,14 @@
> >  /* Migration XBZRLE default cache size */
> >  #define DEFAULT_MIGRATE_CACHE_SIZE (64 * 1024 * 1024)
> >
> >+/* COLO: The number of passive checkpoints to try before switching back
> >+ *       to COLO */
> >+#define DEFAULT_MIGRATE_COLO_PASSIVE_COUNT 100
> >+/* COLO Checkpoint time (ms) below which we switch into passive mode */
> >+#define DEFAULT_MIGRATE_COLO_PASSIVE_LIMIT 400
> >+/* COLO passive mode checkpoint time (ms) */
> >+#define DEFAULT_MIGRATE_COLO_PASSIVE_TIME 250
> >+
> >  static NotifierList migration_state_notifiers =
> >      NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
> >
> >@@ -72,6 +80,12 @@ MigrationState *migrate_get_current(void)
> >                  DEFAULT_MIGRATE_COMPRESS_THREAD_COUNT,
> >          .parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS] =
> >                  DEFAULT_MIGRATE_DECOMPRESS_THREAD_COUNT,
> >+        .parameters[MIGRATION_PARAMETER_COLO_PASSIVE_COUNT] =
> >+                DEFAULT_MIGRATE_COLO_PASSIVE_COUNT,
> >+        .parameters[MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT] =
> >+                DEFAULT_MIGRATE_COLO_PASSIVE_LIMIT,
> >+        .parameters[MIGRATION_PARAMETER_COLO_PASSIVE_TIME] =
> >+                DEFAULT_MIGRATE_COLO_PASSIVE_TIME
> >          .checkpoint_state.max_downtime = 0,
> >          .checkpoint_state.min_downtime = INT64_MAX
> >      };
> >@@ -389,6 +403,9 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
> >              s->parameters[MIGRATION_PARAMETER_COMPRESS_THREADS];
> >      params->decompress_threads =
> >              s->parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS];
> >+    params->colo_passive_count = s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_COUNT];
> >+    params->colo_passive_limit = s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT];
> >+    params->colo_passive_time = s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_TIME];
> >
> >      return params;
> >  }
> >@@ -539,7 +556,14 @@ void qmp_migrate_set_parameters(bool has_compress_level,
> >                                  bool has_compress_threads,
> >                                  int64_t compress_threads,
> >                                  bool has_decompress_threads,
> >-                                int64_t decompress_threads, Error **errp)
> >+                                int64_t decompress_threads,
> >+                                bool has_colo_passive_count,
> >+                                int64_t colo_passive_count,
> >+                                bool has_colo_passive_limit,
> >+                                int64_t colo_passive_limit,
> >+                                bool has_colo_passive_time,
> >+                                int64_t colo_passive_time,
> >+                                Error **errp)
> >  {
> >      MigrationState *s = migrate_get_current();
> >
> >@@ -562,6 +586,21 @@ void qmp_migrate_set_parameters(bool has_compress_level,
> >                     "is invalid, it should be in the range of 1 to 255");
> >          return;
> >      }
> >+    if (has_colo_passive_count && (colo_passive_count < 0)) {
> >+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
> >+                  "colo_passive_count",
> >+                  "is invalid, it must be positive");
> >+    }
> >+    if (has_colo_passive_limit && (colo_passive_limit < 0)) {
> >+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
> >+                  "colo_passive_limit",
> >+                  "is invalid, it must be positive");
> >+    }
> >+    if (has_colo_passive_time && (colo_passive_time < 0)) {
> >+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
> >+                  "colo_passive_time",
> >+                  "is invalid, it must be positive");
> >+    }
> >
> >      if (has_compress_level) {
> >          s->parameters[MIGRATION_PARAMETER_COMPRESS_LEVEL] = compress_level;
> >@@ -573,6 +612,18 @@ void qmp_migrate_set_parameters(bool has_compress_level,
> >          s->parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS] =
> >                                                      decompress_threads;
> >      }
> >+    if (has_colo_passive_count) {
> >+        s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_COUNT] =
> >+                                                    colo_passive_count;
> >+    }
> >+    if (has_colo_passive_limit) {
> >+        s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT] =
> >+                                                    colo_passive_limit;
> >+    }
> >+    if (has_colo_passive_time) {
> >+        s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_TIME] =
> >+                                                    colo_passive_time;
> >+    }
> >  }
> >
> >  /* shared migration helpers */
> >@@ -689,6 +740,13 @@ static MigrationState *migrate_init(const MigrationParams *params)
> >              s->parameters[MIGRATION_PARAMETER_COMPRESS_THREADS];
> >      int decompress_thread_count =
> >              s->parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS];
> >+    int colo_passive_count = s->parameters[
> >+                            MIGRATION_PARAMETER_COLO_PASSIVE_COUNT];
> >+    int colo_passive_limit = s->parameters[
> >+                            MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT];
> >+    int colo_passive_time = s->parameters[
> >+                            MIGRATION_PARAMETER_COLO_PASSIVE_TIME];
> >+
> >
> >      memcpy(enabled_capabilities, s->enabled_capabilities,
> >             sizeof(enabled_capabilities));
> >@@ -704,6 +762,11 @@ static MigrationState *migrate_init(const MigrationParams *params)
> >                 compress_thread_count;
> >      s->parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS] =
> >                 decompress_thread_count;
> >+    s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_COUNT] = colo_passive_count;
> >+    s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT] =
> >+                  colo_passive_limit;
> >+    s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_TIME] = colo_passive_time;
> >+
> >      s->bandwidth_limit = bandwidth_limit;
> >      migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
> >
> >diff --git a/qapi-schema.json b/qapi-schema.json
> >index 9f094d2..9d2b6d4 100644
> >--- a/qapi-schema.json
> >+++ b/qapi-schema.json
> >@@ -630,10 +630,15 @@
> >  #          compression, so set the decompress-threads to the number about 1/4
> >  #          of compress-threads is adequate.
> >  #
> >+# @colo-passive-count: Passive checkpoints to run before retrying COLO mode
> >+# @colo-passive-limit: Time (in ms) below which we switch into passive mode
> >+# @colo-passive-time: Time (in ms) for a COLO passive mode checkpoint
> >+#
> >  # Since: 2.4
> >  ##
> >  { 'enum': 'MigrationParameter',
> >-  'data': ['compress-level', 'compress-threads', 'decompress-threads'] }
> >+  'data': ['compress-level', 'compress-threads', 'decompress-threads',
> >+           'colo-passive-count', 'colo-passive-limit', 'colo-passive-time'] }
> >
> >  #
> >  # @migrate-set-parameters
> >@@ -646,12 +651,19 @@
> >  #
> >  # @decompress-threads: decompression thread count
> >  #
> >+# @colo-passive-count: Passive checkpoints to run before retrying COLO mode
> >+# @colo-passive-limit: Time (in ms) below which we switch into passive mode
> >+# @colo-passive-time: Time (in ms) for a COLO passive mode checkpoint
> >+#
> >  # Since: 2.4
> >  ##
> >  { 'command': 'migrate-set-parameters',
> >    'data': { '*compress-level': 'int',
> >              '*compress-threads': 'int',
> >-            '*decompress-threads': 'int'} }
> >+            '*decompress-threads': 'int',
> >+            '*colo-passive-count': 'int',
> >+            '*colo-passive-limit': 'int',
> >+            '*colo-passive-time': 'int' } }
> >
> >  #
> >  # @MigrationParameters
> >@@ -667,7 +679,11 @@
> >  { 'struct': 'MigrationParameters',
> >    'data': { 'compress-level': 'int',
> >              'compress-threads': 'int',
> >-            'decompress-threads': 'int'} }
> >+            'decompress-threads': 'int',
> >+            'colo-passive-count': 'int',
> >+            'colo-passive-limit': 'int',
> >+            'colo-passive-time': 'int' } }
> >+
> >  ##
> >  # @query-migrate-parameters
> >  #
> >diff --git a/qmp-commands.hx b/qmp-commands.hx
> >index 4fd01a7..a809710 100644
> >--- a/qmp-commands.hx
> >+++ b/qmp-commands.hx
> >@@ -3501,6 +3501,10 @@ Set migration parameters
> >  - "compress-level": set compression level during migration (json-int)
> >  - "compress-threads": set compression thread count for migration (json-int)
> >  - "decompress-threads": set decompression thread count for migration (json-int)
> >+- "colo-passive-count": In COLO, the number of passive checkpoint cycles to
> >+                        try before reverting to COLO mode
> >+- "colo-passive-limit": Time (in ms) below which we switch into passive mode
> >+- "colo-passive-time": Time (in ms) for a COLO passive mode checkpoint
> >
> >  Arguments:
> >
> >@@ -3527,6 +3531,11 @@ Query current migration parameters
> >           - "compress-level" : compression level value (json-int)
> >           - "compress-threads" : compression thread count value (json-int)
> >           - "decompress-threads" : decompression thread count value (json-int)
> >+         - "colo-passive-count": In COLO, the number of passive checkpoint
> >+                                 cycles to try before reverting to COLO mode
> >+         - "colo-passive-limit": Time (in ms) below which we switch into
> >+                                 passive mode
> >+         - "colo-passive-time": Time (in ms) for a COLO passive mode checkpoint
> >
> >  Arguments:
> >
> >diff --git a/trace-events b/trace-events
> >index 03cd035..4d74166 100644
> >--- a/trace-events
> >+++ b/trace-events
> >@@ -1408,6 +1408,14 @@ migrate_state_too_big(void) ""
> >  migrate_global_state_post_load(const char *state) "loaded state: %s"
> >  migrate_global_state_pre_save(const char *state) "saved state: %s"
> >
> >+# migration/colo.c
> >+checkpoint_choice_to_colo(void) ""
> >+checkpoint_choice_to_passive(double check_time) "weighted checkpoint time=%f"
> >+colo_process_incoming_checkpoints_new(int mode, int last_was_passive) "mode=%d last_was_passive=%d"
> >+colo_process_incoming_checkpoints_passive(void) ""
> >+colo_process_incoming_checkpoints_active(void) ""
> >+checkpoint_choice(uint64_t passive_count, uint64_t colo_checkpoint_time_count, double colo_checkpoint_time_mean) "%" PRIu64 "/%" PRIu64 " %f"
> >+
> >  # migration/rdma.c
> >  qemu_rdma_accept_incoming_migration(void) ""
> >  qemu_rdma_accept_incoming_migration_accepted(void) ""
> >
> 
> 
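
A minimal standalone sketch of the secondary-side behaviour in the quoted
colo_process_incoming_checkpoints() hunks, for readers following the
last_was_passive logic. It is a model only: struct svm, handle_checkpoint()
and the CKPT_* values are hypothetical stand-ins, and the real code calls
vm_stop_force_state()/vm_start() under the iothread lock. The point it tries
to show is that a passive checkpoint leaves the guest stopped, so the next
checkpoint must not stop it again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for COLO_CHECKPOINT_NEW / _NEW_PASSIVE. */
enum { CKPT_NEW = 1, CKPT_NEW_PASSIVE = 2 };

/* Toy model of the secondary VM; not a QEMU structure. */
struct svm {
    bool running;           /* is the secondary guest executing? */
    bool last_was_passive;  /* did the previous checkpoint leave it stopped? */
    int stops;              /* number of modelled vm_stop_force_state() calls */
    int starts;             /* number of modelled vm_start() calls */
};

/* Model one checkpoint cycle as seen by the secondary. */
static void handle_checkpoint(struct svm *s, int mode)
{
    assert(mode == CKPT_NEW || mode == CKPT_NEW_PASSIVE);

    /* Only suspend the guest if the last cycle actually restarted it. */
    if (!s->last_was_passive) {
        s->running = false;
        s->stops++;
    }

    /* ... the checkpoint data would be received and loaded here ... */

    if (mode == CKPT_NEW_PASSIVE) {
        /* Passive cycle: stay stopped and remember that we did. */
        s->last_was_passive = true;
    } else {
        /* COLO cycle: resume the guest so its output can be compared. */
        s->running = true;
        s->starts++;
        s->last_was_passive = false;
    }
}
```

Feeding it the sequence NEW, PASSIVE, PASSIVE, NEW should give two stops and
two starts, with the guest idle across the two passive cycles.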
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Qemu-devel] [RFC/COLO:  2/3] Parameterise min/max/relax time
  2015-08-04 19:26 [Qemu-devel] [RFC/COLO: 0/3] Hybrid mode and parameterisation Dr. David Alan Gilbert (git)
  2015-08-04 19:26 ` [Qemu-devel] [RFC/COLO: 1/3] COLO: Hybrid mode Dr. David Alan Gilbert (git)
@ 2015-08-04 19:26 ` Dr. David Alan Gilbert (git)
  2015-08-04 19:26 ` [Qemu-devel] [RFC/COLO: 3/3] COLO: Parameterise background RAM transfer limit Dr. David Alan Gilbert (git)
  2 siblings, 0 replies; 6+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-08-04 19:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: arei.gonglei, yanghy, zhang.zhanghailiang

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

---
 hmp-commands.hx        | 15 -------------
 hmp.c                  | 31 ++++++++++++++++++++------
 hmp.h                  |  1 -
 migration/colo.c       | 32 ++++++---------------------
 migration/migration.c  | 59 ++++++++++++++++++++++++++++++++++++++++++++++++--
 qapi-schema.json       | 33 ++++++++++++++--------------
 qmp-commands.hx        | 22 -------------------
 stubs/migration-colo.c |  4 ----
 8 files changed, 105 insertions(+), 92 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 9164961..410637f 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1049,21 +1049,6 @@ Tell COLO that heartbeat is lost, a failover or takeover is needed.
 ETEXI
 
     {
-        .name       = "colo_set_checkpoint_period",
-        .args_type  = "value:i",
-        .params     = "value",
-        .help       = "set checkpoint period (in ms) for colo. "
-        "Defaults to 100ms",
-        .mhandler.cmd = hmp_colo_set_checkpoint_period,
-    },
-
-STEXI
-@item migrate_set_checkpoint_period @var{value}
-@findex migrate_set_checkpoint_period
-Set checkpoint period to @var{value} (in ms) for colo.
-ETEXI
-
-    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
diff --git a/hmp.c b/hmp.c
index 8828756..0e92d11 100644
--- a/hmp.c
+++ b/hmp.c
@@ -298,6 +298,15 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
         monitor_printf(mon, " %s: %" PRId64,
             MigrationParameter_lookup[MIGRATION_PARAMETER_COLO_PASSIVE_TIME],
             params->colo_passive_time);
+        monitor_printf(mon, " %s: %" PRId64,
+            MigrationParameter_lookup[MIGRATION_PARAMETER_COLO_MIN_TIME],
+            params->colo_min_time);
+        monitor_printf(mon, " %s: %" PRId64,
+            MigrationParameter_lookup[MIGRATION_PARAMETER_COLO_MAX_TIME],
+            params->colo_max_time);
+        monitor_printf(mon, " %s: %" PRId64,
+            MigrationParameter_lookup[MIGRATION_PARAMETER_COLO_RELAX_TIME],
+            params->colo_relax_time);
 
         monitor_printf(mon, "\n");
     }
@@ -1251,6 +1260,9 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
     bool has_colo_passive_count = false;
     bool has_colo_passive_limit = false;
     bool has_colo_passive_time = false;
+    bool has_colo_min_time = false;
+    bool has_colo_max_time = false;
+    bool has_colo_relax_time = false;
 
     int i;
 
@@ -1275,6 +1287,15 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
             case MIGRATION_PARAMETER_COLO_PASSIVE_TIME:
                 has_colo_passive_time = true;
                 break;
+            case MIGRATION_PARAMETER_COLO_MIN_TIME:
+                has_colo_min_time = true;
+                break;
+            case MIGRATION_PARAMETER_COLO_MAX_TIME:
+                has_colo_max_time = true;
+                break;
+            case MIGRATION_PARAMETER_COLO_RELAX_TIME:
+                has_colo_relax_time = true;
+                break;
             }
             qmp_migrate_set_parameters(has_compress_level, value,
                                        has_compress_threads, value,
@@ -1282,6 +1303,9 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
                                        has_colo_passive_count, value,
                                        has_colo_passive_limit, value,
                                        has_colo_passive_time, value,
+                                       has_colo_min_time, value,
+                                       has_colo_max_time, value,
+                                       has_colo_relax_time, value,
                                        &err);
             break;
         }
@@ -1323,13 +1347,6 @@ void hmp_colo_lost_heartbeat(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, &err);
 }
 
-void hmp_colo_set_checkpoint_period(Monitor *mon, const QDict *qdict)
-{
-    int64_t value = qdict_get_int(qdict, "value");
-
-    qmp_colo_set_checkpoint_period(value, NULL);
-}
-
 void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
     const char *protocol  = qdict_get_str(qdict, "protocol");
diff --git a/hmp.h b/hmp.h
index d66dc76..c36c99c 100644
--- a/hmp.h
+++ b/hmp.h
@@ -69,7 +69,6 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
 void hmp_client_migrate_info(Monitor *mon, const QDict *qdict);
 void hmp_colo_lost_heartbeat(Monitor *mon, const QDict *qdict);
-void hmp_colo_set_checkpoint_period(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/migration/colo.c b/migration/colo.c
index 37f63f2..5c8096d 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -23,20 +23,6 @@
 #include "block/block_int.h"
 #include "trace.h"
 
-/*
-* We should not do checkpoint one after another without any time interval,
-* Because this will lead continuous 'stop' status for VM.
-* CHECKPOINT_MIN_PERIOD is the min time limit between two checkpoint action.
-*/
-#define CHECKPOINT_MIN_PERIOD 100  /* unit: ms */
-
-/*
- * force checkpoint timer: unit ms
- * this is large because COLO checkpoint will mostly depend on
- * COLO compare module.
- */
-#define CHECKPOINT_MAX_PEROID 10000
-
 /* Fix me: Convert to use QAPI */
 typedef enum COLOCommand {
     COLO_CHECPOINT_READY = 0x46,
@@ -93,8 +79,6 @@ const char * const COLOCommand_lookup[] = {
 static QEMUBH *colo_bh;
 static bool vmstate_loading;
 
-int64_t colo_checkpoint_period = CHECKPOINT_MAX_PEROID;
-
 /* colo buffer */
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
 
@@ -117,11 +101,6 @@ bool migration_incoming_in_colo_state(void)
     return (mis && (mis->state == MIGRATION_STATUS_COLO));
 }
 
-void qmp_colo_set_checkpoint_period(int64_t value, Error **errp)
-{
-    colo_checkpoint_period = value;
-}
-
 static bool colo_runstate_is_stopped(void)
 {
     return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
@@ -611,9 +590,11 @@ static void *colo_thread(void *opaque)
 
             current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
             interval = current_time - checkpoint_time;
-            if (interval < CHECKPOINT_MIN_PERIOD) {
+            if (interval < s->parameters[MIGRATION_PARAMETER_COLO_MIN_TIME]) {
                 /* Limit the min time between two checkpoint */
-                g_usleep((1000*(CHECKPOINT_MIN_PERIOD - interval)));
+                g_usleep((1000 *
+                         (s->parameters[MIGRATION_PARAMETER_COLO_MIN_TIME] -
+                          interval)));
             }
             s->checkpoint_state.proxy_discompare_count++;
             goto do_checkpoint;
@@ -621,7 +602,7 @@ static void *colo_thread(void *opaque)
 
         checkpoint_limit = passive_count ?
             s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_TIME] :
-            colo_checkpoint_period;
+            s->parameters[MIGRATION_PARAMETER_COLO_MAX_TIME];
 
         /*
          * No proxy checkpoint is request, wait for 100ms or
@@ -641,7 +622,8 @@ static void *colo_thread(void *opaque)
                 }
                 s->checkpoint_state.live_transfer_pages += ret;
             } else {
-                g_usleep(100000);
+                g_usleep(1000 *
+                         s->parameters[MIGRATION_PARAMETER_COLO_RELAX_TIME]);
             }
             continue;
         } else {
diff --git a/migration/migration.c b/migration/migration.c
index f84c676..d41914c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -56,6 +56,11 @@
 #define DEFAULT_MIGRATE_COLO_PASSIVE_LIMIT 400
 /* COLO passive mode checkpoint time (ms) */
 #define DEFAULT_MIGRATE_COLO_PASSIVE_TIME 250
+/* COLO minimum/maximum times for normal comparative checkpoint (ms) */
+#define DEFAULT_MIGRATE_COLO_MIN_TIME 100
+#define DEFAULT_MIGRATE_COLO_MAX_TIME 10000
+/* Time after a miscompare before resuming comparison (ms) */
+#define DEFAULT_MIGRATE_COLO_RELAX_TIME 100
 
 static NotifierList migration_state_notifiers =
     NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
@@ -85,7 +90,13 @@ MigrationState *migrate_get_current(void)
         .parameters[MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT] =
                 DEFAULT_MIGRATE_COLO_PASSIVE_LIMIT,
         .parameters[MIGRATION_PARAMETER_COLO_PASSIVE_TIME] =
-                DEFAULT_MIGRATE_COLO_PASSIVE_TIME
+                DEFAULT_MIGRATE_COLO_PASSIVE_TIME,
+        .parameters[MIGRATION_PARAMETER_COLO_MIN_TIME] =
+                DEFAULT_MIGRATE_COLO_MIN_TIME,
+        .parameters[MIGRATION_PARAMETER_COLO_MAX_TIME] =
+                DEFAULT_MIGRATE_COLO_MAX_TIME,
+        .parameters[MIGRATION_PARAMETER_COLO_RELAX_TIME] =
+                DEFAULT_MIGRATE_COLO_RELAX_TIME,
         .checkpoint_state.max_downtime = 0,
         .checkpoint_state.min_downtime = INT64_MAX
     };
@@ -406,6 +417,9 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->colo_passive_count = s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_COUNT];
     params->colo_passive_limit = s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT];
     params->colo_passive_time = s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_TIME];
+    params->colo_min_time = s->parameters[MIGRATION_PARAMETER_COLO_MIN_TIME];
+    params->colo_max_time = s->parameters[MIGRATION_PARAMETER_COLO_MAX_TIME];
+    params->colo_relax_time = s->parameters[MIGRATION_PARAMETER_COLO_RELAX_TIME];
 
     return params;
 }
@@ -563,6 +577,12 @@ void qmp_migrate_set_parameters(bool has_compress_level,
                                 int64_t colo_passive_limit,
                                 bool has_colo_passive_time,
                                 int64_t colo_passive_time,
+                                bool has_colo_min_time,
+                                int64_t colo_min_time,
+                                bool has_colo_max_time,
+                                int64_t colo_max_time,
+                                bool has_colo_relax_time,
+                                int64_t colo_relax_time,
                                 Error **errp)
 {
     MigrationState *s = migrate_get_current();
@@ -601,6 +621,21 @@ void qmp_migrate_set_parameters(bool has_compress_level,
                   "colo_passive_time",
                   "is invalid, it must be positive");
     }
+    if (has_colo_min_time && (colo_min_time < 0)) {
+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "colo_min_time",
+                  "is invalid, it must not be negative");
+        return;
+    }
+    if (has_colo_max_time && (colo_max_time < 0)) {
+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "colo_max_time",
+                  "is invalid, it must not be negative");
+        return;
+    }
+    if (has_colo_relax_time && (colo_relax_time < 0)) {
+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "colo_relax_time",
+                  "is invalid, it must not be negative");
+        return;
+    }
 
     if (has_compress_level) {
         s->parameters[MIGRATION_PARAMETER_COMPRESS_LEVEL] = compress_level;
@@ -624,6 +659,18 @@ void qmp_migrate_set_parameters(bool has_compress_level,
         s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_TIME] =
                                                     colo_passive_time;
     }
+    if (has_colo_min_time) {
+        s->parameters[MIGRATION_PARAMETER_COLO_MIN_TIME] =
+                                                    colo_min_time;
+    }
+    if (has_colo_max_time) {
+        s->parameters[MIGRATION_PARAMETER_COLO_MAX_TIME] =
+                                                    colo_max_time;
+    }
+    if (has_colo_relax_time) {
+        s->parameters[MIGRATION_PARAMETER_COLO_RELAX_TIME] =
+                                                    colo_relax_time;
+    }
 }
 
 /* shared migration helpers */
@@ -746,7 +793,12 @@ static MigrationState *migrate_init(const MigrationParams *params)
                             MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT];
     int colo_passive_time = s->parameters[
                             MIGRATION_PARAMETER_COLO_PASSIVE_TIME];
-
+    int64_t colo_min_time = s->parameters[
+                            MIGRATION_PARAMETER_COLO_MIN_TIME];
+    int64_t colo_max_time = s->parameters[
+                            MIGRATION_PARAMETER_COLO_MAX_TIME];
+    int64_t colo_relax_time = s->parameters[
+                            MIGRATION_PARAMETER_COLO_RELAX_TIME];
 
     memcpy(enabled_capabilities, s->enabled_capabilities,
            sizeof(enabled_capabilities));
@@ -766,6 +818,9 @@ static MigrationState *migrate_init(const MigrationParams *params)
     s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_LIMIT] =
                   colo_passive_limit;
     s->parameters[MIGRATION_PARAMETER_COLO_PASSIVE_TIME] = colo_passive_time;
+    s->parameters[MIGRATION_PARAMETER_COLO_MIN_TIME] = colo_min_time;
+    s->parameters[MIGRATION_PARAMETER_COLO_MAX_TIME] = colo_max_time;
+    s->parameters[MIGRATION_PARAMETER_COLO_RELAX_TIME] = colo_relax_time;
 
     s->bandwidth_limit = bandwidth_limit;
     migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
diff --git a/qapi-schema.json b/qapi-schema.json
index 9d2b6d4..30113fc 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -633,12 +633,16 @@
 # @colo-passive-count: Time (in ms) for a COLO passive mode checkpoint
 # @colo-passive-limit: Time (in ms) below which we switch into passive mode
 # @colo-passive-time: Time (in ms) for a COLO passive mode checkpoint
+# @colo-min-time: Minimum Time (in ms) for a COLO comparative checkpoint
+# @colo-max-time: Maximum Time (in ms) for a COLO comparative checkpoint
+# @colo-relax-time: Time (in ms) after a miscompare before starting a new COLO checkpoint
 #
 # Since: 2.4
 ##
 { 'enum': 'MigrationParameter',
   'data': ['compress-level', 'compress-threads', 'decompress-threads',
-           'colo-passive-count', 'colo-passive-limit', 'colo-passive-time'] }
+           'colo-passive-count', 'colo-passive-limit', 'colo-passive-time',
+           'colo-min-time', 'colo-max-time', 'colo-relax-time' ] }
 
 #
 # @migrate-set-parameters
@@ -654,6 +658,9 @@
 # @colo-passive-count: Time (in ms) for a COLO passive mode checkpoint
 # @colo-passive-limit: Time (in ms) below which we switch into passive mode
 # @colo-passive-time: Time (in ms) for a COLO passive mode checkpoint
+# @colo-min-time: Minimum Time (in ms) for a COLO comparative checkpoint
+# @colo-max-time: Maximum Time (in ms) for a COLO comparative checkpoint
+# @colo-relax-time: Time (in ms) after a miscompare before starting a new COLO checkpoint
 #
 # Since: 2.4
 ##
@@ -663,7 +670,11 @@
             '*decompress-threads': 'int',
             '*colo-passive-count': 'int',
             '*colo-passive-limit': 'int',
-            '*colo-passive-time': 'int' } }
+            '*colo-passive-time': 'int',
+            '*colo-min-time': 'int',
+            '*colo-max-time': 'int',
+            '*colo-relax-time': 'int'
+          } }
 
 #
 # @MigrationParameters
@@ -682,7 +693,10 @@
             'decompress-threads': 'int',
             'colo-passive-count': 'int',
             'colo-passive-limit': 'int',
-            'colo-passive-time': 'int' } }
+            'colo-passive-time': 'int',
+            'colo-min-time': 'int',
+            'colo-max-time': 'int',
+            'colo-relax-time': 'int' } }
 
 ##
 # @query-migrate-parameters
@@ -741,19 +755,6 @@
 { 'command': 'colo-lost-heartbeat' }
 
 ##
-# @colo-set-checkpoint-period
-#
-# Set colo checkpoint period
-#
-# @value: period of colo checkpoint in ms
-#
-# Returns: nothing on success
-#
-# Since: 2.4
-##
-{ 'command': 'colo-set-checkpoint-period', 'data': {'value': 'int'} }
-
-##
 # @MouseInfo:
 #
 # Information about a mouse device.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index a809710..7dd67cf 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -800,28 +800,6 @@ Example:
 EQMP
 
     {
-         .name       = "colo-set-checkpoint-period",
-         .args_type  = "value:i",
-         .mhandler.cmd_new = qmp_marshal_input_colo_set_checkpoint_period,
-    },
-
-SQMP
-colo-set-checkpoint-period
---------------------------
-
-set checkpoint period
-
-Arguments:
-- "value": checkpoint period
-
-Example:
-
--> { "execute": "colo-set-checkpoint-period", "arguments": { "value": "1000" } }
-<- { "return": {} }
-
-EQMP
-
-    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index 9d3b9e7..0edc59c 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -52,7 +52,3 @@ void qmp_colo_lost_heartbeat(Error **errp)
                      " with --enable-colo option in order to support"
                      " COLO feature");
 }
-
-void qmp_colo_set_checkpoint_period(int64_t value, Error **errp)
-{
-}
-- 
2.4.3
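
A small sketch of the timing arithmetic that the colo_thread() hunks above
now take from the parameters, assuming all values are in milliseconds as the
schema comments say. The helper names min_time_sleep_us() and
checkpoint_limit_ms() are hypothetical and not part of the patch; in the
patch the values come from s->parameters[] and the sleep is a g_usleep().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Microseconds to sleep so that two checkpoints are at least
 * min_time_ms apart; 0 if enough time has already passed.
 */
static int64_t min_time_sleep_us(int64_t interval_ms, int64_t min_time_ms)
{
    assert(min_time_ms >= 0);
    if (interval_ms < min_time_ms) {
        return 1000 * (min_time_ms - interval_ms);
    }
    return 0;
}

/*
 * Forced-checkpoint limit: the passive time while passive cycles remain,
 * otherwise the COLO maximum time.
 */
static int64_t checkpoint_limit_ms(uint64_t passive_count,
                                   int64_t passive_time_ms,
                                   int64_t max_time_ms)
{
    return passive_count ? passive_time_ms : max_time_ms;
}
```

With the defaults in this series (min 100, max 10000, passive 250), a
miscompare 30ms after the previous checkpoint would wait a further 70ms,
and the forced-checkpoint limit would be 250ms in passive mode and 10s
otherwise.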

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [Qemu-devel] [RFC/COLO: 3/3] COLO: Parameterise background RAM transfer limit
  2015-08-04 19:26 [Qemu-devel] [RFC/COLO: 0/3] Hybrid mode and parameterisation Dr. David Alan Gilbert (git)
  2015-08-04 19:26 ` [Qemu-devel] [RFC/COLO: 1/3] COLO: Hybrid mode Dr. David Alan Gilbert (git)
  2015-08-04 19:26 ` [Qemu-devel] [RFC/COLO: 2/3] Parameterise min/max/relax time Dr. David Alan Gilbert (git)
@ 2015-08-04 19:26 ` Dr. David Alan Gilbert (git)
  2 siblings, 0 replies; 6+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-08-04 19:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: arei.gonglei, yanghy, zhang.zhanghailiang

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

COLO (experimentally) transfers RAM in the background when the amount
to transfer reaches a limit; allow this limit to be set via the
parameter mechanism.
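
As a rough illustration of the check this parameter feeds, here is a
standalone sketch, assuming the value is given in MiB (as the
DEFAULT_MIGRATE_COLO_RAM_LIVE comment states). The helper names are
hypothetical; in the patch the pending size comes from
qemu_savevm_state_pending() and the test lives in
colo_need_live_migrate_ram().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Convert the parameter value (MiB) into a byte threshold. */
static int64_t ram_live_threshold_bytes(int64_t ram_live_mib)
{
    assert(ram_live_mib >= 0);
    return ram_live_mib * 1024 * 1024;
}

/*
 * True when there is pending RAM and it has reached the threshold,
 * i.e. when a background transfer would be started.
 */
static bool need_live_ram_transfer(uint64_t pending_bytes,
                                   int64_t ram_live_mib)
{
    uint64_t max_size = (uint64_t)ram_live_threshold_bytes(ram_live_mib);

    return pending_bytes && pending_bytes >= max_size;
}
```

With the default of 10, the threshold is 10485760 bytes, matching the old
THRESHOLD_PENDING_SIZE.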

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hmp.c                 |  8 ++++++++
 migration/colo.c      |  6 ++----
 migration/migration.c | 19 +++++++++++++++++++
 qapi-schema.json      | 13 ++++++++++---
 4 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/hmp.c b/hmp.c
index 0e92d11..c177fbf 100644
--- a/hmp.c
+++ b/hmp.c
@@ -307,6 +307,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
         monitor_printf(mon, " %s: %" PRId64,
             MigrationParameter_lookup[MIGRATION_PARAMETER_COLO_RELAX_TIME],
             params->colo_relax_time);
+        monitor_printf(mon, " %s: %" PRId64,
+            MigrationParameter_lookup[MIGRATION_PARAMETER_COLO_RAM_LIVE],
+            params->colo_ram_live);
 
         monitor_printf(mon, "\n");
     }
@@ -1263,6 +1266,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
     bool has_colo_min_time = false;
     bool has_colo_max_time = false;
     bool has_colo_relax_time = false;
+    bool has_colo_ram_live = false;
 
     int i;
 
@@ -1296,6 +1300,9 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
             case MIGRATION_PARAMETER_COLO_RELAX_TIME:
                 has_colo_relax_time = true;
                 break;
+            case MIGRATION_PARAMETER_COLO_RAM_LIVE:
+                has_colo_ram_live = true;
+                break;
             }
             qmp_migrate_set_parameters(has_compress_level, value,
                                        has_compress_threads, value,
@@ -1306,6 +1313,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
                                        has_colo_min_time, value,
                                        has_colo_max_time, value,
                                        has_colo_relax_time, value,
+                                       has_colo_ram_live, value,
                                        &err);
             break;
         }
diff --git a/migration/colo.c b/migration/colo.c
index 5c8096d..9db9de1 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -486,13 +486,11 @@ static bool checkpoint_choice(MigrationState *s)
     }
 }
 
-/* should be calculated by bandwidth and max downtime ? */
-#define THRESHOLD_PENDING_SIZE (10 * 1024 * 1024UL)
-
 static int colo_need_live_migrate_ram(MigrationState *s)
 {
     uint64_t pending_size;
-    int64_t max_size = THRESHOLD_PENDING_SIZE;
+    int64_t max_size = s->parameters[MIGRATION_PARAMETER_COLO_RAM_LIVE] *
+                       1024 * 1024;
 
     pending_size = qemu_savevm_state_pending(s->file, max_size);
     return (pending_size && pending_size >= max_size);
diff --git a/migration/migration.c b/migration/migration.c
index d41914c..d4214be 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -61,6 +61,8 @@
 #define DEFAULT_MIGRATE_COLO_MAX_TIME 10000
 /* Time after a miscompare before resuming comparison (ms) */
 #define DEFAULT_MIGRATE_COLO_RELAX_TIME 100
+/* Amount of RAM changes to trigger background RAM transfer (MiB) */
+#define DEFAULT_MIGRATE_COLO_RAM_LIVE 10
 
 static NotifierList migration_state_notifiers =
     NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
@@ -97,6 +99,8 @@ MigrationState *migrate_get_current(void)
                 DEFAULT_MIGRATE_COLO_MAX_TIME,
         .parameters[MIGRATION_PARAMETER_COLO_RELAX_TIME] =
                 DEFAULT_MIGRATE_COLO_RELAX_TIME,
+        .parameters[MIGRATION_PARAMETER_COLO_RAM_LIVE] =
+                DEFAULT_MIGRATE_COLO_RAM_LIVE,
         .checkpoint_state.max_downtime = 0,
         .checkpoint_state.min_downtime = INT64_MAX
     };
@@ -420,6 +424,7 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->colo_min_time = s->parameters[MIGRATION_PARAMETER_COLO_MIN_TIME];
     params->colo_max_time = s->parameters[MIGRATION_PARAMETER_COLO_MAX_TIME];
     params->colo_relax_time = s->parameters[MIGRATION_PARAMETER_COLO_RELAX_TIME];
+    params->colo_ram_live = s->parameters[MIGRATION_PARAMETER_COLO_RAM_LIVE];
 
     return params;
 }
@@ -583,6 +588,8 @@ void qmp_migrate_set_parameters(bool has_compress_level,
                                 int64_t colo_max_time,
                                 bool has_colo_relax_time,
                                 int64_t colo_relax_time,
+                                bool has_colo_ram_live,
+                                int64_t colo_ram_live,
                                 Error **errp)
 {
     MigrationState *s = migrate_get_current();
@@ -636,6 +643,11 @@ void qmp_migrate_set_parameters(bool has_compress_level,
                   "colo_relax_time",
                   "is invalid, it must be positive");
     }
+    if (has_colo_ram_live && (colo_ram_live < 0)) {
+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
+                  "colo_ram_live",
+                  "is invalid, it must not be negative");
+    }
 
     if (has_compress_level) {
         s->parameters[MIGRATION_PARAMETER_COMPRESS_LEVEL] = compress_level;
@@ -671,6 +683,10 @@ void qmp_migrate_set_parameters(bool has_compress_level,
         s->parameters[MIGRATION_PARAMETER_COLO_RELAX_TIME] =
                                                     colo_relax_time;
     }
+    if (has_colo_ram_live) {
+        s->parameters[MIGRATION_PARAMETER_COLO_RAM_LIVE] =
+                                                    colo_ram_live;
+    }
 }
 
 /* shared migration helpers */
@@ -799,6 +815,8 @@ static MigrationState *migrate_init(const MigrationParams *params)
                             MIGRATION_PARAMETER_COLO_MAX_TIME];
     int64_t colo_relax_time = s->parameters[
                             MIGRATION_PARAMETER_COLO_RELAX_TIME];
+    int64_t colo_ram_live = s->parameters[
+                            MIGRATION_PARAMETER_COLO_RAM_LIVE];
 
     memcpy(enabled_capabilities, s->enabled_capabilities,
            sizeof(enabled_capabilities));
@@ -821,6 +839,7 @@ static MigrationState *migrate_init(const MigrationParams *params)
     s->parameters[MIGRATION_PARAMETER_COLO_MIN_TIME] = colo_min_time;
     s->parameters[MIGRATION_PARAMETER_COLO_MAX_TIME] = colo_max_time;
     s->parameters[MIGRATION_PARAMETER_COLO_RELAX_TIME] = colo_relax_time;
+    s->parameters[MIGRATION_PARAMETER_COLO_RAM_LIVE] = colo_ram_live;
 
     s->bandwidth_limit = bandwidth_limit;
     migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
diff --git a/qapi-schema.json b/qapi-schema.json
index 30113fc..bd264d0 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -636,13 +636,15 @@
 # @colo-min-time: Minimum Time (in ms) for a COLO comparative checkpoint
 # @colo-max-time: Maximum Time (in ms) for a COLO comparative checkpoint
 # @colo-relax-time: Time (in ms) after a miscompare before starting a new COLO checkpoint
+# @colo-ram-live: Amount of RAM changes to trigger background RAM transfer (MiB)
 #
 # Since: 2.4
 ##
 { 'enum': 'MigrationParameter',
   'data': ['compress-level', 'compress-threads', 'decompress-threads',
            'colo-passive-count', 'colo-passive-limit', 'colo-passive-time',
-           'colo-min-time', 'colo-max-time', 'colo-relax-time' ] }
+           'colo-min-time', 'colo-max-time', 'colo-relax-time',
+           'colo-ram-live' ] }
 
 #
 # @migrate-set-parameters
@@ -661,6 +663,7 @@
 # @colo-min-time: Minimum Time (in ms) for a COLO comparative checkpoint
 # @colo-max-time: Maximum Time (in ms) for a COLO comparative checkpoint
 # @colo-relax-time: Time (in ms) after a miscompare before starting a new COLO checkpoint
+# @colo-ram-live: Amount of RAM changes to trigger background RAM transfer (MiB)
 #
 # Since: 2.4
 ##
@@ -673,7 +676,8 @@
             '*colo-passive-time': 'int',
             '*colo-min-time': 'int',
             '*colo-max-time': 'int',
-            '*colo-relax-time': 'int'
+            '*colo-relax-time': 'int',
+            '*colo-ram-live': 'int'
           } }
 
 #
@@ -685,6 +689,8 @@
 #
 # @decompress-threads: decompression thread count
 #
+# @colo-ram-live: Amount of RAM changes to trigger background RAM transfer (MiB)
+#
 # Since: 2.4
 ##
 { 'struct': 'MigrationParameters',
@@ -696,7 +702,8 @@
             'colo-passive-time': 'int',
             'colo-min-time': 'int',
             'colo-max-time': 'int',
-            'colo-relax-time': 'int' } }
+            'colo-relax-time': 'int',
+            'colo-ram-live': 'int' } }
 
 ##
 # @query-migrate-parameters
-- 
2.4.3
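
The optional-parameter handling in the qmp_migrate_set_parameters() hunk
above follows a "has_X / X" pattern: a value is validated and stored only
when its has_ flag is set.  The following is a minimal sketch of that
pattern for the new parameter alone.  The sketch_* names and the
string-returning error convention are assumptions made for illustration;
QEMU itself reports errors through Error **errp.  Unlike the hunk as
shown, where no return follows error_setg(), the sketch returns early so
that a rejected value is never stored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down parameter table with a single entry. */
enum { SKETCH_PARAM_COLO_RAM_LIVE, SKETCH_PARAM__MAX };

struct sketch_state {
    int64_t parameters[SKETCH_PARAM__MAX];
};

/*
 * Returns NULL on success, or a static error string on failure.
 * The stored value is changed only if the caller supplied one
 * (has_colo_ram_live) and it passed validation.
 */
static const char *sketch_set_ram_live(struct sketch_state *s,
                                       bool has_colo_ram_live,
                                       int64_t colo_ram_live)
{
    if (has_colo_ram_live && colo_ram_live < 0) {
        /* Early return: leave the previous setting untouched. */
        return "colo_ram_live is invalid, it must not be negative";
    }

    if (has_colo_ram_live) {
        s->parameters[SKETCH_PARAM_COLO_RAM_LIVE] = colo_ram_live;
    }
    return NULL;
}
```

Omitting the parameter (has_colo_ram_live false) leaves the current
setting alone, which is what lets a single command update any subset of
the migration parameters.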

end of thread, other threads:[~2015-08-24 17:54 UTC | newest]

Thread overview: 6+ messages
2015-08-04 19:26 [Qemu-devel] [RFC/COLO: 0/3] Hybrid mode and parameterisation Dr. David Alan Gilbert (git)
2015-08-04 19:26 ` [Qemu-devel] [RFC/COLO: 1/3] COLO: Hybrid mode Dr. David Alan Gilbert (git)
2015-08-05  3:23   ` zhanghailiang
2015-08-24 17:54     ` Dr. David Alan Gilbert
2015-08-04 19:26 ` [Qemu-devel] [RFC/COLO: 2/3] Parameterise min/max/relax time Dr. David Alan Gilbert (git)
2015-08-04 19:26 ` [Qemu-devel] [RFC/COLO: 3/3] COLO: Parameterise background RAM transfer limit Dr. David Alan Gilbert (git)
