* [PATCH v2 0/3] migration: auto-converge refinements for huge VM
@ 2024-09-29 17:14 yong.huang
2024-09-29 17:14 ` [PATCH v2 1/3] migration: Support background ramblock dirty sync yong.huang
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: yong.huang @ 2024-09-29 17:14 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Xu, Fabiano Rosas, Eric Blake, Markus Armbruster,
Paolo Bonzini, yong.huang
From: Hyman Huang <yong.huang@smartx.com>
v2:
1. background sync - Throw out the idea of "not updating the bitmap"
when the RAMBlock of the RAM list is iterated during migration;
re-implement the background RAM dirty sync using Peter's updated
method.
2. responsive throttle - Rename the "cpu-responsive throttle" to
"cpu-throttle-responsive," as suggested by Fabiano.
For the background sync refinement, the new method achieves a much
shorter overall migration time and is much simpler than the previous
approach.
The test approach is the same as in version 1, but we used a new test
host with 1.93 Gbits/sec bandwidth. Please refer to the information
included in version 1 below for more details.
The new test results for background dirty sync are as follows; we didn't
test the responsive throttle feature because its logic is unchanged.
For each scenario, we tested twice and used 3500MB as the high memory
load.
ramsize: 150MB
|------------+-----------+----------+-----------+--------------|
| | totaltime | downtime | iteration | max throttle |
| | (ms) | (ms) | count | percent |
|------------+-----------+----------+-----------+--------------|
| original | 57579 | 577 | 107 | 90% |
| | 57434 | 466 | 110 | 90% |
|------------+-----------+----------+-----------+--------------|
| background | 57728 | 488 | 102 | 90% |
| sync | 60223 | 447 | 115 | 90% |
|------------+-----------+----------+-----------+--------------|
ramsize: 3000MB
|------------+-----------+----------+-----------+--------------|
| | totaltime | downtime | iteration | max throttle |
| | (ms) | (ms) | count | percent |
|------------+-----------+----------+-----------+--------------|
| original | 291275 | 416 | 26 | 99% |
| | 290907 | 567 | 27 | 99% |
|------------+-----------+----------+-----------+--------------|
| background | 210912 | 465 | 30 | 99% |
| sync | 203890 | 442 | 28 | 99% |
|------------+-----------+----------+-----------+--------------|
ramsize: 3500MB
|------------+-----------+----------+-----------+--------------|
| | totaltime | downtime | iteration | max throttle |
| | (ms) | (ms) | count | percent |
|------------+-----------+----------+-----------+--------------|
| original | 338445 | 491 | 28 | 99% |
| | 341168 | 496 | 28 | 99% |
|------------+-----------+----------+-----------+--------------|
| background | 209099 | 440 | 27 | 99% |
| sync | 206241 | 467 | 30 | 99% |
|------------+-----------+----------+-----------+--------------|
v1:
This is the first version for auto-converge refinements; refer to the
following link for details about the RFC version:
https://patchew.org/QEMU/cover.1725891841.git.yong.huang@smartx.com/
This series introduces two refinements called "background sync" and
"responsive throttle," respectively.
1. background sync:
The original auto-converge throttle logic doesn't look like it will
scale because migration_trigger_throttle() is only called for each
iteration, so it won't be invoked for a long time if one iteration
can take a long time.
Background sync fixes this issue by synchronizing the dirty bitmap in
the background and triggering the CPU throttle automatically once it
detects that an iteration is lasting too long; see the sketch below.
Background sync is implemented transparently; no new interface is
exposed to upper-layer applications.
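In short, the idea is a periodic timer that forces a dirty bitmap sync
when the current iteration has been running for too long. A minimal
sketch (iteration_too_long() is a placeholder for the real check; see
patch 1 for the actual code):
    static void bg_ram_dirty_sync_timer_tick(void *opaque)
    {
        /* iteration_too_long() stands in for the real condition */
        if (iteration_too_long()) {
            WITH_RCU_READ_LOCK_GUARD() {
                migration_bitmap_sync_precopy(ram_state, false);
            }
        }
        /* re-arm the timer for the next period */
        timer_mod(bg_ram_dirty_sync_timer, ...);
    }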
2. responsive throttle:
The original auto-converge throttle logic determines whether the
migration converges using a single criterion; only when that criterion
is met twice in a row is the CPU throttle started or its percentage
increased. As a result, the throttle also reacts slowly when one
iteration takes a long time.
The responsive throttle introduces one more criterion to assist in
detecting the convergence of the migration; if either of the two
criteria is met, the throttle is triggered. This makes it more likely
that the CPU throttle will be activated, thereby accelerating the
migration process; see the sketch below.
The responsive throttle provides the 'cpu-responsive-throttle' option
to enable this feature.
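Conceptually, the added check looks like this (a simplified sketch;
dirty_rate_high and prev_ratio_pct stand in for the real state kept in
RAMState, see patch 3 for the actual code):
    /* dirty ratio: bytes dirtied vs. bytes transferred in this period */
    uint64_t ratio_pct = 100 * bytes_dirty_period / bytes_xfer_period;
    /*
     * If the ratio stays above the threshold and does not drop compared
     * with the previous sync, that is one more vote for starting or
     * increasing the throttle.
     */
    bool ratio_high = ratio_pct > threshold && ratio_pct >= prev_ratio_pct;
    if (dirty_rate_high || ratio_high) {
        mig_throttle_guest_down(bytes_dirty_period, bytes_dirty_threshold);
    }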
We tested these two features in the following environment:
a. Test tool:
guestperf
Refer to the following link to see details:
https://github.com/qemu/qemu/tree/master/tests/migration/guestperf
b. Test VM scale:
CPU: 16; Memory: 100GB
c. Average bandwidth between source and destination for migration:
1.59 Gbits/sec
We use the stress tool contained in initrd-stress.img to dirty ramsize
MB of memory on every CPU in the guest; refer to the following link
for the source code:
https://github.com/qemu/qemu/blob/master/tests/migration/stress.c
The following command is executed to compare our refined QEMU with the
original QEMU:
# python3.6 guestperf.py --binary /path/to/qemu-kvm --cpus 16 \
--mem 100 --max-iters 200 --max-time 1200 --dst-host {dst_ip} \
--kernel /path/to/vmlinuz --initrd /path/to/initrd-stress.img \
--transport tcp --downtime 500 --auto-converge --auto-converge-step 10 \
--verbose --stress-mem {ramsize}
We set ramsize to 150MB to simulate a light load, 3000MB a moderate
load, and 5000MB a heavy load. Test cases were executed three times in
each scenario.
The following data shows the migration test results with an increase in
stress.
ramsize: 150MB
|------------+-----------+----------+-----------+--------------|
| | totaltime | downtime | iteration | max throttle |
| | (ms) | (ms) | count | percent |
|------------+-----------+----------+-----------+--------------|
| original | 123685 | 490 | 87 | 99% |
| | 116249 | 542 | 45 | 60% |
| | 107772 | 587 | 8 | 0% |
|------------+-----------+----------+-----------+--------------|
| background | 113744 | 1654 | 16 | 20% |
| sync | 122623 | 758 | 60 | 80% |
| | 112668 | 547 | 23 | 20% |
|------------+-----------+----------+-----------+--------------|
| background | 113660 | 573 | 5 | 0% |
| sync + | 109357 | 576 | 6 | 0% |
| responsive | 126792 | 494 | 37 | 99% |
| throttle | | | | |
|------------+-----------+----------+-----------+--------------|
ramsize: 3000MB
|------------+-----------+----------+-----------+--------------|
| | totaltime | downtime | iteration | max throttle |
| | (ms) | (ms) | count | percent |
|------------+-----------+----------+-----------+--------------|
| original | 404398 | 515 | 26 | 99% |
| | 392552 | 528 | 25 | 99% |
| | 400113 | 447 | 24 | 99% |
|------------+-----------+----------+-----------+--------------|
| background | 239151 | 681 | 25 | 99% |
| sync | 295047 | 587 | 41 | 99% |
| | 289936 | 681 | 34 | 99% |
|------------+-----------+----------+-----------+--------------|
| background | 212786 | 487 | 22 | 99% |
| sync + | 225246 | 666 | 23 | 99% |
| responsive | 244053 | 572 | 27 | 99% |
| throttle | | | | |
|------------+-----------+----------+-----------+--------------|
ramsize: 5000MB
|------------+-----------+----------+-----------+--------------|
| | totaltime | downtime | iteration | max throttle |
| | (ms) | (ms) | count | percent |
|------------+-----------+----------+-----------+--------------|
| original | 566357 | 644 | 22 | 99% |
| | 607471 | 320 | 23 | 99% |
| | 603136 | 417 | 22 | 99% |
|------------+-----------+----------+-----------+--------------|
| background | 284605 | 793 | 27 | 99% |
| sync | 272270 | 668 | 28 | 99% |
| | 267543 | 545 | 28 | 99% |
|------------+-----------+----------+-----------+--------------|
| background | 226446 | 413 | 22 | 99% |
| sync + | 232082 | 494 | 23 | 99% |
| responsive | 269863 | 533 | 23 | 99% |
| throttle | | | | |
|------------+-----------+----------+-----------+--------------|
To summarize the data above: no case shows a negative optimization;
the refinement significantly reduces the total migration time and
therefore shortens the duration of guest performance degradation.
Additionally, we examined the memory performance curves generated from
the test case results above; there is still no negative optimization,
although the performance degradation sets in more quickly. Since it is
inconvenient to display the graphic data here, one can verify it
independently.
Hyman Huang (3):
migration: Support background ramblock dirty sync
qapi/migration: Introduce cpu-throttle-responsive parameter
migration: Support responsive CPU throttle
include/migration/misc.h | 3 +
migration/migration-hmp-cmds.c | 8 +++
migration/migration.c | 11 +++
migration/options.c | 20 ++++++
migration/options.h | 1 +
migration/ram.c | 119 ++++++++++++++++++++++++++++++++-
migration/ram.h | 3 +
migration/trace-events | 2 +
qapi/migration.json | 16 ++++-
system/cpu-timers.c | 2 +
tests/qtest/migration-test.c | 30 +++++++++
11 files changed, 212 insertions(+), 3 deletions(-)
--
2.27.0
* [PATCH v2 1/3] migration: Support background ramblock dirty sync
2024-09-29 17:14 [PATCH v2 0/3] migration: auto-converge refinements for huge VM yong.huang
@ 2024-09-29 17:14 ` yong.huang
2024-09-30 20:41 ` Peter Xu
2024-09-29 17:14 ` [PATCH v2 2/3] qapi/migration: Introduce cpu-throttle-responsive parameter yong.huang
2024-09-29 17:14 ` [PATCH v2 3/3] migration: Support responsive CPU throttle yong.huang
2 siblings, 1 reply; 12+ messages in thread
From: yong.huang @ 2024-09-29 17:14 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Xu, Fabiano Rosas, Eric Blake, Markus Armbruster,
Paolo Bonzini, yong.huang
From: Hyman Huang <yong.huang@smartx.com>
When a VM is configured with huge memory, the current throttle logic
does not scale, because migration_trigger_throttle() is only called
once per iteration, so it won't be invoked for a long time if a single
iteration takes a long time.
The background dirty sync aims to fix the above issue by synchronizing
the ramblock dirty bitmap from the remote dirty bitmap and, when
necessary, triggering the CPU throttle multiple times during a long
iteration.
This is a trade-off between synchronization overhead and CPU throttle
impact.
Signed-off-by: Hyman Huang <yong.huang@smartx.com>
---
include/migration/misc.h | 3 ++
migration/migration.c | 11 +++++++
migration/ram.c | 64 ++++++++++++++++++++++++++++++++++++
migration/ram.h | 3 ++
migration/trace-events | 1 +
system/cpu-timers.c | 2 ++
tests/qtest/migration-test.c | 29 ++++++++++++++++
7 files changed, 113 insertions(+)
diff --git a/include/migration/misc.h b/include/migration/misc.h
index bfadc5613b..67c00d98f5 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -111,4 +111,7 @@ bool migration_in_bg_snapshot(void);
/* migration/block-dirty-bitmap.c */
void dirty_bitmap_mig_init(void);
+/* migration/ram.c */
+void bg_ram_dirty_sync_init(void);
+
#endif
diff --git a/migration/migration.c b/migration/migration.c
index 3dea06d577..224b5dfb4f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3285,6 +3285,9 @@ static void migration_iteration_finish(MigrationState *s)
{
/* If we enabled cpu throttling for auto-converge, turn it off. */
cpu_throttle_stop();
+ if (migrate_auto_converge()) {
+ bg_ram_dirty_sync_timer_enable(false);
+ }
bql_lock();
switch (s->state) {
@@ -3526,6 +3529,14 @@ static void *migration_thread(void *opaque)
trace_migration_thread_setup_complete();
+ /*
+ * Tick the background ramblock dirty sync timer after setup
+ * phase.
+ */
+ if (migrate_auto_converge()) {
+ bg_ram_dirty_sync_timer_enable(true);
+ }
+
while (migration_is_active()) {
if (urgent || !migration_rate_exceeded(s->to_dst_file)) {
MigIterateState iter_state = migration_iteration_run(s);
diff --git a/migration/ram.c b/migration/ram.c
index 67ca3d5d51..995bae1ac9 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -110,6 +110,12 @@
*/
#define MAPPED_RAM_LOAD_BUF_SIZE 0x100000
+/* Background ramblock dirty sync trigger every five seconds */
+#define BG_RAM_SYNC_TIMESLICE_MS 5000
+#define BG_RAM_SYNC_TIMER_INTERVAL_MS 1000
+
+static QEMUTimer *bg_ram_dirty_sync_timer;
+
XBZRLECacheStats xbzrle_counters;
/* used by the search for pages to send */
@@ -4543,6 +4549,64 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
}
}
+static void bg_ram_dirty_sync_timer_tick(void *opaque)
+{
+ static int prev_pct;
+ static uint64_t prev_sync_cnt = 2;
+ uint64_t sync_cnt = stat64_get(&mig_stats.dirty_sync_count);
+ int cur_pct = cpu_throttle_get_percentage();
+
+ if (prev_pct && !cur_pct) {
+ /* CPU throttle timer stopped, so do we */
+ return;
+ }
+
+ /*
+ * The first iteration copies all memory anyhow and has no
+ * effect on guest performance, therefore omit it to avoid
+ * paying extra for the sync penalty.
+ */
+ if (sync_cnt <= 1) {
+ goto end;
+ }
+
+ if (sync_cnt == prev_sync_cnt) {
+ int64_t curr_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+ assert(ram_state);
+ if ((curr_time - ram_state->time_last_bitmap_sync) >
+ BG_RAM_SYNC_TIMESLICE_MS) {
+ trace_bg_ram_dirty_sync();
+ WITH_RCU_READ_LOCK_GUARD() {
+ migration_bitmap_sync_precopy(ram_state, false);
+ }
+ }
+ }
+
+end:
+ prev_sync_cnt = stat64_get(&mig_stats.dirty_sync_count);
+ prev_pct = cpu_throttle_get_percentage();
+
+ timer_mod(bg_ram_dirty_sync_timer,
+ qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) +
+ BG_RAM_SYNC_TIMER_INTERVAL_MS);
+}
+
+void bg_ram_dirty_sync_timer_enable(bool enable)
+{
+ if (enable) {
+ bg_ram_dirty_sync_timer_tick(NULL);
+ } else {
+ timer_del(bg_ram_dirty_sync_timer);
+ }
+}
+
+void bg_ram_dirty_sync_init(void)
+{
+ bg_ram_dirty_sync_timer =
+ timer_new_ms(QEMU_CLOCK_VIRTUAL_RT,
+ bg_ram_dirty_sync_timer_tick, NULL);
+}
+
static RAMBlockNotifier ram_mig_ram_notifier = {
.ram_block_resized = ram_mig_ram_block_resized,
};
diff --git a/migration/ram.h b/migration/ram.h
index bc0318b834..9c1a2f30f1 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -93,4 +93,7 @@ void ram_write_tracking_prepare(void);
int ram_write_tracking_start(void);
void ram_write_tracking_stop(void);
+/* Background ramblock dirty sync */
+void bg_ram_dirty_sync_timer_enable(bool enable);
+
#endif
diff --git a/migration/trace-events b/migration/trace-events
index c65902f042..3f09e7f383 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -90,6 +90,7 @@ put_qlist_end(const char *field_name, const char *vmsd_name) "%s(%s)"
qemu_file_fclose(void) ""
# ram.c
+bg_ram_dirty_sync(void) ""
get_queued_page(const char *block_name, uint64_t tmp_offset, unsigned long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
get_queued_page_not_dirty(const char *block_name, uint64_t tmp_offset, unsigned long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
migration_bitmap_sync_start(void) ""
diff --git a/system/cpu-timers.c b/system/cpu-timers.c
index 0b31c9a1b6..64f0834be4 100644
--- a/system/cpu-timers.c
+++ b/system/cpu-timers.c
@@ -25,6 +25,7 @@
#include "qemu/osdep.h"
#include "qemu/cutils.h"
#include "migration/vmstate.h"
+#include "migration/misc.h"
#include "qapi/error.h"
#include "qemu/error-report.h"
#include "sysemu/cpus.h"
@@ -274,4 +275,5 @@ void cpu_timers_init(void)
vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
cpu_throttle_init();
+ bg_ram_dirty_sync_init();
}
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index d6768d5d71..3296f5244d 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -468,6 +468,12 @@ static void migrate_ensure_converge(QTestState *who)
migrate_set_parameter_int(who, "downtime-limit", 30 * 1000);
}
+static void migrate_ensure_iteration_last_long(QTestState *who)
+{
+ /* Set 10Byte/s bandwidth limit to make the iteration last long enough */
+ migrate_set_parameter_int(who, "max-bandwidth", 10);
+}
+
/*
* Our goal is to ensure that we run a single full migration
* iteration, and also dirty memory, ensuring that at least
@@ -2791,6 +2797,7 @@ static void test_migrate_auto_converge(void)
* so we need to decrease a bandwidth.
*/
const int64_t init_pct = 5, inc_pct = 25, max_pct = 95;
+ uint64_t prev_dirty_sync_cnt, dirty_sync_cnt;
if (test_migrate_start(&from, &to, uri, &args)) {
return;
@@ -2827,6 +2834,28 @@ static void test_migrate_auto_converge(void)
} while (true);
/* The first percentage of throttling should be at least init_pct */
g_assert_cmpint(percentage, >=, init_pct);
+
+    /* Make sure the iteration lasts long enough */
+ migrate_ensure_iteration_last_long(from);
+
+ /*
+     * End the loop when the dirty sync count is greater than 1.
+ */
+ while ((dirty_sync_cnt = get_migration_pass(from)) < 2) {
+ usleep(1000 * 1000);
+ }
+
+ prev_dirty_sync_cnt = dirty_sync_cnt;
+
+ /*
+     * The dirty sync count must change within 5 seconds; add 1 extra
+     * second as a margin of error.
+ */
+ sleep(5 + 1);
+
+ dirty_sync_cnt = get_migration_pass(from);
+ g_assert_cmpint(dirty_sync_cnt, != , prev_dirty_sync_cnt);
+
/* Now, when we tested that throttling works, let it converge */
migrate_ensure_converge(from);
--
2.27.0
* [PATCH v2 2/3] qapi/migration: Introduce cpu-throttle-responsive parameter
2024-09-29 17:14 [PATCH v2 0/3] migration: auto-converge refinements for huge VM yong.huang
2024-09-29 17:14 ` [PATCH v2 1/3] migration: Support background ramblock dirty sync yong.huang
@ 2024-09-29 17:14 ` yong.huang
2024-09-29 17:14 ` [PATCH v2 3/3] migration: Support responsive CPU throttle yong.huang
2 siblings, 0 replies; 12+ messages in thread
From: yong.huang @ 2024-09-29 17:14 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Xu, Fabiano Rosas, Eric Blake, Markus Armbruster,
Paolo Bonzini, yong.huang
From: Hyman Huang <yong.huang@smartx.com>
To enable the responsive throttle that will be implemented in the next
commit, introduce the cpu-throttle-responsive parameter.
Signed-off-by: Hyman Huang <yong.huang@smartx.com>
---
migration/migration-hmp-cmds.c | 8 ++++++++
migration/options.c | 20 ++++++++++++++++++++
migration/options.h | 1 +
qapi/migration.json | 16 +++++++++++++++-
4 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 28165cfc9e..e7c292fa51 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -264,6 +264,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
monitor_printf(mon, "%s: %s\n",
MigrationParameter_str(MIGRATION_PARAMETER_CPU_THROTTLE_TAILSLOW),
params->cpu_throttle_tailslow ? "on" : "off");
+ assert(params->has_cpu_throttle_responsive);
+ monitor_printf(mon, "%s: %s\n",
+ MigrationParameter_str(MIGRATION_PARAMETER_CPU_THROTTLE_RESPONSIVE),
+ params->cpu_throttle_responsive ? "on" : "off");
assert(params->has_max_cpu_throttle);
monitor_printf(mon, "%s: %u\n",
MigrationParameter_str(MIGRATION_PARAMETER_MAX_CPU_THROTTLE),
@@ -512,6 +516,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
p->has_cpu_throttle_tailslow = true;
visit_type_bool(v, param, &p->cpu_throttle_tailslow, &err);
break;
+ case MIGRATION_PARAMETER_CPU_THROTTLE_RESPONSIVE:
+ p->has_cpu_throttle_responsive = true;
+ visit_type_bool(v, param, &p->cpu_throttle_responsive, &err);
+ break;
case MIGRATION_PARAMETER_MAX_CPU_THROTTLE:
p->has_max_cpu_throttle = true;
visit_type_uint8(v, param, &p->max_cpu_throttle, &err);
diff --git a/migration/options.c b/migration/options.c
index 147cd2b8fd..568d5b1074 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -111,6 +111,8 @@ Property migration_properties[] = {
DEFAULT_MIGRATE_CPU_THROTTLE_INCREMENT),
DEFINE_PROP_BOOL("x-cpu-throttle-tailslow", MigrationState,
parameters.cpu_throttle_tailslow, false),
+ DEFINE_PROP_BOOL("x-cpu-throttle-responsive", MigrationState,
+ parameters.cpu_throttle_responsive, false),
DEFINE_PROP_SIZE("x-max-bandwidth", MigrationState,
parameters.max_bandwidth, MAX_THROTTLE),
DEFINE_PROP_SIZE("avail-switchover-bandwidth", MigrationState,
@@ -705,6 +707,13 @@ uint8_t migrate_cpu_throttle_initial(void)
return s->parameters.cpu_throttle_initial;
}
+bool migrate_cpu_throttle_responsive(void)
+{
+ MigrationState *s = migrate_get_current();
+
+ return s->parameters.cpu_throttle_responsive;
+}
+
bool migrate_cpu_throttle_tailslow(void)
{
MigrationState *s = migrate_get_current();
@@ -891,6 +900,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
params->cpu_throttle_increment = s->parameters.cpu_throttle_increment;
params->has_cpu_throttle_tailslow = true;
params->cpu_throttle_tailslow = s->parameters.cpu_throttle_tailslow;
+ params->has_cpu_throttle_responsive = true;
+ params->cpu_throttle_responsive = s->parameters.cpu_throttle_responsive;
params->tls_creds = g_strdup(s->parameters.tls_creds);
params->tls_hostname = g_strdup(s->parameters.tls_hostname);
params->tls_authz = g_strdup(s->parameters.tls_authz ?
@@ -959,6 +970,7 @@ void migrate_params_init(MigrationParameters *params)
params->has_cpu_throttle_initial = true;
params->has_cpu_throttle_increment = true;
params->has_cpu_throttle_tailslow = true;
+ params->has_cpu_throttle_responsive = true;
params->has_max_bandwidth = true;
params->has_downtime_limit = true;
params->has_x_checkpoint_delay = true;
@@ -1191,6 +1203,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
dest->cpu_throttle_tailslow = params->cpu_throttle_tailslow;
}
+ if (params->has_cpu_throttle_responsive) {
+ dest->cpu_throttle_responsive = params->cpu_throttle_responsive;
+ }
+
if (params->tls_creds) {
assert(params->tls_creds->type == QTYPE_QSTRING);
dest->tls_creds = params->tls_creds->u.s;
@@ -1302,6 +1318,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
s->parameters.cpu_throttle_tailslow = params->cpu_throttle_tailslow;
}
+ if (params->has_cpu_throttle_responsive) {
+ s->parameters.cpu_throttle_responsive = params->cpu_throttle_responsive;
+ }
+
if (params->tls_creds) {
g_free(s->parameters.tls_creds);
assert(params->tls_creds->type == QTYPE_QSTRING);
diff --git a/migration/options.h b/migration/options.h
index a0bd6edc06..28caab83cd 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -68,6 +68,7 @@ bool migrate_has_block_bitmap_mapping(void);
uint32_t migrate_checkpoint_delay(void);
uint8_t migrate_cpu_throttle_increment(void);
uint8_t migrate_cpu_throttle_initial(void);
+bool migrate_cpu_throttle_responsive(void);
bool migrate_cpu_throttle_tailslow(void);
bool migrate_direct_io(void);
uint64_t migrate_downtime_limit(void);
diff --git a/qapi/migration.json b/qapi/migration.json
index b66cccf107..7322bfdd39 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -728,6 +728,10 @@
# be excessive at tail stage. The default value is false. (Since
# 5.1)
#
+# @cpu-throttle-responsive: Make CPU throttling more responsive by
+#     introducing an extra detection metric of migration
+#     convergence. (Since 9.2)
+#
# @tls-creds: ID of the 'tls-creds' object that provides credentials
# for establishing a TLS connection over the migration data
# channel. On the outgoing side of the migration, the credentials
@@ -853,7 +857,7 @@
'announce-rounds', 'announce-step',
'throttle-trigger-threshold',
'cpu-throttle-initial', 'cpu-throttle-increment',
- 'cpu-throttle-tailslow',
+ 'cpu-throttle-tailslow', 'cpu-throttle-responsive',
'tls-creds', 'tls-hostname', 'tls-authz', 'max-bandwidth',
'avail-switchover-bandwidth', 'downtime-limit',
{ 'name': 'x-checkpoint-delay', 'features': [ 'unstable' ] },
@@ -909,6 +913,10 @@
# be excessive at tail stage. The default value is false. (Since
# 5.1)
#
+# @cpu-throttle-responsive: Make CPU throttling more responsive by
+#     introducing an extra detection metric of migration
+#     convergence. (Since 9.2)
+#
# @tls-creds: ID of the 'tls-creds' object that provides credentials
# for establishing a TLS connection over the migration data
# channel. On the outgoing side of the migration, the credentials
@@ -1041,6 +1049,7 @@
'*cpu-throttle-initial': 'uint8',
'*cpu-throttle-increment': 'uint8',
'*cpu-throttle-tailslow': 'bool',
+ '*cpu-throttle-responsive': 'bool',
'*tls-creds': 'StrOrNull',
'*tls-hostname': 'StrOrNull',
'*tls-authz': 'StrOrNull',
@@ -1123,6 +1132,10 @@
# be excessive at tail stage. The default value is false. (Since
# 5.1)
#
+# @cpu-throttle-responsive: Make CPU throttling more responsive by
+#     introducing an extra detection metric of migration
+#     convergence. (Since 9.2)
+#
# @tls-creds: ID of the 'tls-creds' object that provides credentials
# for establishing a TLS connection over the migration data
# channel. On the outgoing side of the migration, the credentials
@@ -1248,6 +1261,7 @@
'*cpu-throttle-initial': 'uint8',
'*cpu-throttle-increment': 'uint8',
'*cpu-throttle-tailslow': 'bool',
+ '*cpu-throttle-responsive': 'bool',
'*tls-creds': 'str',
'*tls-hostname': 'str',
'*tls-authz': 'str',
--
2.27.0
* [PATCH v2 3/3] migration: Support responsive CPU throttle
2024-09-29 17:14 [PATCH v2 0/3] migration: auto-converge refinements for huge VM yong.huang
2024-09-29 17:14 ` [PATCH v2 1/3] migration: Support background ramblock dirty sync yong.huang
2024-09-29 17:14 ` [PATCH v2 2/3] qapi/migration: Introduce cpu-throttle-responsive parameter yong.huang
@ 2024-09-29 17:14 ` yong.huang
2024-09-30 20:47 ` Peter Xu
2 siblings, 1 reply; 12+ messages in thread
From: yong.huang @ 2024-09-29 17:14 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Xu, Fabiano Rosas, Eric Blake, Markus Armbruster,
Paolo Bonzini, yong.huang
From: Hyman Huang <yong.huang@smartx.com>
Currently, the convergence algorithm determines that the migration
cannot converge according to the following principle: the dirty pages
generated in the current iteration exceed a specific percentage
(throttle-trigger-threshold, 50 by default) of the pages transmitted.
Let's refer to this criterion as the "dirty rate". If this criterion
is met at least twice (dirty_rate_high_cnt >= 2), the throttle
percentage is increased.
In most cases, the above implementation is appropriate. However, for a
VM under heavy memory load, each iteration is time-consuming. Due to
the repeated confirmation behavior, the VM's computing performance may
be throttled at a high percentage for a long time, which may be
intolerable for some computationally sensitive software in the VM.
As the comment in the migration_trigger_throttle function mentions,
the original algorithm confirms the criterion repeatedly in order to
avoid erroneous detection. Put differently, the criterion does not
need to be validated again once the detection is more reliable.
To make the detection more accurate, the refinement introduces another
criterion, called the "dirty ratio", to determine migration
convergence. The "dirty ratio" is the ratio of bytes_dirty_period to
bytes_xfer_period. When the algorithm repeatedly detects that the
"dirty ratio" of the current sync is no lower than that of the
previous one, it determines that the migration cannot converge. If
either of the two criteria, "dirty rate" or "dirty ratio", is met, the
penalty percentage is increased. This makes the CPU throttle respond
more quickly, shortening the entire iteration and thereby reducing the
duration of VM performance degradation.
In conclusion, this refinement significantly reduces the time required
for the throttle percentage to step up to its maximum while the VM is
under a high memory load.
Signed-off-by: Hyman Huang <yong.huang@smartx.com>
---
migration/ram.c | 55 ++++++++++++++++++++++++++++++++++--
migration/trace-events | 1 +
tests/qtest/migration-test.c | 1 +
3 files changed, 55 insertions(+), 2 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index 995bae1ac9..c36fed5135 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -420,6 +420,12 @@ struct RAMState {
* RAM migration.
*/
unsigned int postcopy_bmap_sync_requested;
+
+ /*
+     * Ratio of bytes_dirty_period to bytes_xfer_period in the
+ * previous sync.
+ */
+ uint64_t dirty_ratio_pct;
};
typedef struct RAMState RAMState;
@@ -1019,6 +1025,43 @@ static void migration_dirty_limit_guest(void)
trace_migration_dirty_limit_guest(quota_dirtyrate);
}
+static bool migration_dirty_ratio_high(RAMState *rs)
+{
+ static int dirty_ratio_high_cnt;
+ uint64_t threshold = migrate_throttle_trigger_threshold();
+ uint64_t bytes_xfer_period =
+ migration_transferred_bytes() - rs->bytes_xfer_prev;
+ uint64_t bytes_dirty_period = rs->num_dirty_pages_period * TARGET_PAGE_SIZE;
+ bool dirty_ratio_high = false;
+ uint64_t prev, curr;
+
+ /* Calculate the dirty ratio percentage */
+ curr = 100 * (bytes_dirty_period * 1.0 / bytes_xfer_period);
+
+ prev = rs->dirty_ratio_pct;
+ rs->dirty_ratio_pct = curr;
+
+ if (prev == 0) {
+ return false;
+ }
+
+ /*
+     * If the current dirty ratio exceeds the threshold and is not
+     * lower than the previous one, determine that the migration does
+     * not converge.
+ */
+ if (curr > threshold && curr >= prev) {
+ trace_migration_dirty_ratio_high(curr, prev);
+ dirty_ratio_high_cnt++;
+ }
+
+ if (dirty_ratio_high_cnt >= 2) {
+ dirty_ratio_high = true;
+ dirty_ratio_high_cnt = 0;
+ }
+
+ return dirty_ratio_high;
+}
+
static void migration_trigger_throttle(RAMState *rs)
{
uint64_t threshold = migrate_throttle_trigger_threshold();
@@ -1026,6 +1069,11 @@ static void migration_trigger_throttle(RAMState *rs)
migration_transferred_bytes() - rs->bytes_xfer_prev;
uint64_t bytes_dirty_period = rs->num_dirty_pages_period * TARGET_PAGE_SIZE;
uint64_t bytes_dirty_threshold = bytes_xfer_period * threshold / 100;
+ bool dirty_ratio_high = false;
+
+ if (migrate_cpu_throttle_responsive() && (bytes_xfer_period != 0)) {
+ dirty_ratio_high = migration_dirty_ratio_high(rs);
+ }
/*
* The following detection logic can be refined later. For now:
@@ -1035,8 +1083,11 @@ static void migration_trigger_throttle(RAMState *rs)
* twice, start or increase throttling.
*/
if ((bytes_dirty_period > bytes_dirty_threshold) &&
- (++rs->dirty_rate_high_cnt >= 2)) {
- rs->dirty_rate_high_cnt = 0;
+ ((++rs->dirty_rate_high_cnt >= 2) || dirty_ratio_high)) {
+
+ rs->dirty_rate_high_cnt =
+ rs->dirty_rate_high_cnt >= 2 ? 0 : rs->dirty_rate_high_cnt;
+
if (migrate_auto_converge()) {
trace_migration_throttle();
mig_throttle_guest_down(bytes_dirty_period,
diff --git a/migration/trace-events b/migration/trace-events
index 3f09e7f383..19a1ff7973 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -96,6 +96,7 @@ get_queued_page_not_dirty(const char *block_name, uint64_t tmp_offset, unsigned
migration_bitmap_sync_start(void) ""
migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64
migration_bitmap_clear_dirty(char *str, uint64_t start, uint64_t size, unsigned long page) "rb %s start 0x%"PRIx64" size 0x%"PRIx64" page 0x%lx"
+migration_dirty_ratio_high(uint64_t cur, uint64_t prev) "current ratio: %" PRIu64 " previous ratio: %" PRIu64
migration_throttle(void) ""
migration_dirty_limit_guest(int64_t dirtyrate) "guest dirty page rate limit %" PRIi64 " MB/s"
ram_discard_range(const char *rbname, uint64_t start, size_t len) "%s: start: %" PRIx64 " %zx"
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 3296f5244d..acdc1d6358 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2807,6 +2807,7 @@ static void test_migrate_auto_converge(void)
migrate_set_parameter_int(from, "cpu-throttle-initial", init_pct);
migrate_set_parameter_int(from, "cpu-throttle-increment", inc_pct);
migrate_set_parameter_int(from, "max-cpu-throttle", max_pct);
+ migrate_set_parameter_bool(from, "cpu-throttle-responsive", true);
/*
* Set the initial parameters so that the migration could not converge
--
2.27.0
* Re: [PATCH v2 1/3] migration: Support background ramblock dirty sync
2024-09-29 17:14 ` [PATCH v2 1/3] migration: Support background ramblock dirty sync yong.huang
@ 2024-09-30 20:41 ` Peter Xu
2024-10-01 2:02 ` Yong Huang
0 siblings, 1 reply; 12+ messages in thread
From: Peter Xu @ 2024-09-30 20:41 UTC (permalink / raw)
To: yong.huang
Cc: qemu-devel, Fabiano Rosas, Eric Blake, Markus Armbruster,
Paolo Bonzini
On Mon, Sep 30, 2024 at 01:14:26AM +0800, yong.huang@smartx.com wrote:
> From: Hyman Huang <yong.huang@smartx.com>
>
> When a VM is configured with huge memory, the current throttle logic
> does not scale, because migration_trigger_throttle() is only called
> once per iteration, so it won't be invoked for a long time if a single
> iteration takes a long time.
>
> The background dirty sync aims to fix the above issue by synchronizing
> the ramblock dirty bitmap from the remote dirty bitmap and, when
> necessary, triggering the CPU throttle multiple times during a long
> iteration.
>
> This is a trade-off between synchronization overhead and CPU throttle
> impact.
>
> Signed-off-by: Hyman Huang <yong.huang@smartx.com>
> ---
> include/migration/misc.h | 3 ++
> migration/migration.c | 11 +++++++
> migration/ram.c | 64 ++++++++++++++++++++++++++++++++++++
> migration/ram.h | 3 ++
> migration/trace-events | 1 +
> system/cpu-timers.c | 2 ++
> tests/qtest/migration-test.c | 29 ++++++++++++++++
> 7 files changed, 113 insertions(+)
>
> diff --git a/include/migration/misc.h b/include/migration/misc.h
> index bfadc5613b..67c00d98f5 100644
> --- a/include/migration/misc.h
> +++ b/include/migration/misc.h
> @@ -111,4 +111,7 @@ bool migration_in_bg_snapshot(void);
> /* migration/block-dirty-bitmap.c */
> void dirty_bitmap_mig_init(void);
>
> +/* migration/ram.c */
> +void bg_ram_dirty_sync_init(void);
IMO it's better we don't add this logic to ram.c as I mentioned. It's
closely relevant to auto converge so I think cpu-throttle.c is more
suitable.
> +
> #endif
> diff --git a/migration/migration.c b/migration/migration.c
> index 3dea06d577..224b5dfb4f 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3285,6 +3285,9 @@ static void migration_iteration_finish(MigrationState *s)
> {
> /* If we enabled cpu throttling for auto-converge, turn it off. */
> cpu_throttle_stop();
> + if (migrate_auto_converge()) {
> + bg_ram_dirty_sync_timer_enable(false);
> + }
Please avoid adding this code. When it's part of auto-converge,
cpu_throttle_stop() should already guarantee that the timer is
disabled altogether.
>
> bql_lock();
> switch (s->state) {
> @@ -3526,6 +3529,14 @@ static void *migration_thread(void *opaque)
>
> trace_migration_thread_setup_complete();
>
> + /*
> + * Tick the background ramblock dirty sync timer after setup
> + * phase.
> + */
> + if (migrate_auto_converge()) {
> + bg_ram_dirty_sync_timer_enable(true);
> + }
Might be good to still tie the enablement to auto-converge; the latter
was done in migration_trigger_throttle(). Maybe also enable the timer
only there?
> +
> while (migration_is_active()) {
> if (urgent || !migration_rate_exceeded(s->to_dst_file)) {
> MigIterateState iter_state = migration_iteration_run(s);
> diff --git a/migration/ram.c b/migration/ram.c
> index 67ca3d5d51..995bae1ac9 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -110,6 +110,12 @@
> */
> #define MAPPED_RAM_LOAD_BUF_SIZE 0x100000
>
> +/* Background ramblock dirty sync trigger every five seconds */
> +#define BG_RAM_SYNC_TIMESLICE_MS 5000
> +#define BG_RAM_SYNC_TIMER_INTERVAL_MS 1000
Why does this timer need to be invoked every 1 sec, if it's a 5-sec timer?
> +
> +static QEMUTimer *bg_ram_dirty_sync_timer;
> +
> XBZRLECacheStats xbzrle_counters;
>
> /* used by the search for pages to send */
> @@ -4543,6 +4549,64 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
> }
> }
>
> +static void bg_ram_dirty_sync_timer_tick(void *opaque)
Please consider moving this function to cpu-throttle.c.
Also please prefix the functions with cpu_throttle_*(), rather than bg_*().
It should be part of the auto-converge functionality.
> +{
> + static int prev_pct;
> + static uint64_t prev_sync_cnt = 2;
> + uint64_t sync_cnt = stat64_get(&mig_stats.dirty_sync_count);
> + int cur_pct = cpu_throttle_get_percentage();
> +
> + if (prev_pct && !cur_pct) {
> + /* CPU throttle timer stopped, so do we */
> + return;
> + }
Didn't follow why this is not needed if cpu throttle is 0.
It means the dirty rate is probably very low, but then shouldn't this
background sync still be working (just to notice when it grows again)?
> +
> + /*
> + * The first iteration copies all memory anyhow and has no
> + * effect on guest performance, therefore omit it to avoid
> + * paying extra for the sync penalty.
> + */
> + if (sync_cnt <= 1) {
> + goto end;
> + }
> +
> + if (sync_cnt == prev_sync_cnt) {
> + int64_t curr_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> + assert(ram_state);
> + if ((curr_time - ram_state->time_last_bitmap_sync) >
> + BG_RAM_SYNC_TIMESLICE_MS) {
> + trace_bg_ram_dirty_sync();
> + WITH_RCU_READ_LOCK_GUARD() {
> + migration_bitmap_sync_precopy(ram_state, false);
> + }
> + }
> + }
> +
> +end:
> + prev_sync_cnt = stat64_get(&mig_stats.dirty_sync_count);
> + prev_pct = cpu_throttle_get_percentage();
> +
> + timer_mod(bg_ram_dirty_sync_timer,
> + qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) +
> + BG_RAM_SYNC_TIMER_INTERVAL_MS);
IIUC we only need to fire per 5sec, not 1s?
> +}
> +
> +void bg_ram_dirty_sync_timer_enable(bool enable)
> +{
> + if (enable) {
> + bg_ram_dirty_sync_timer_tick(NULL);
> + } else {
> + timer_del(bg_ram_dirty_sync_timer);
> + }
> +}
> +
> +void bg_ram_dirty_sync_init(void)
> +{
> + bg_ram_dirty_sync_timer =
> + timer_new_ms(QEMU_CLOCK_VIRTUAL_RT,
> + bg_ram_dirty_sync_timer_tick, NULL);
> +}
IMHO all these functions should move to cpu-throttle.c.
> +
> static RAMBlockNotifier ram_mig_ram_notifier = {
> .ram_block_resized = ram_mig_ram_block_resized,
> };
> diff --git a/migration/ram.h b/migration/ram.h
> index bc0318b834..9c1a2f30f1 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -93,4 +93,7 @@ void ram_write_tracking_prepare(void);
> int ram_write_tracking_start(void);
> void ram_write_tracking_stop(void);
>
> +/* Background ramblock dirty sync */
> +void bg_ram_dirty_sync_timer_enable(bool enable);
> +
> #endif
> diff --git a/migration/trace-events b/migration/trace-events
> index c65902f042..3f09e7f383 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -90,6 +90,7 @@ put_qlist_end(const char *field_name, const char *vmsd_name) "%s(%s)"
> qemu_file_fclose(void) ""
>
> # ram.c
> +bg_ram_dirty_sync(void) ""
> get_queued_page(const char *block_name, uint64_t tmp_offset, unsigned long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
> get_queued_page_not_dirty(const char *block_name, uint64_t tmp_offset, unsigned long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
> migration_bitmap_sync_start(void) ""
> diff --git a/system/cpu-timers.c b/system/cpu-timers.c
> index 0b31c9a1b6..64f0834be4 100644
> --- a/system/cpu-timers.c
> +++ b/system/cpu-timers.c
> @@ -25,6 +25,7 @@
> #include "qemu/osdep.h"
> #include "qemu/cutils.h"
> #include "migration/vmstate.h"
> +#include "migration/misc.h"
> #include "qapi/error.h"
> #include "qemu/error-report.h"
> #include "sysemu/cpus.h"
> @@ -274,4 +275,5 @@ void cpu_timers_init(void)
> vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
>
> cpu_throttle_init();
> + bg_ram_dirty_sync_init();
> }
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index d6768d5d71..3296f5244d 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -468,6 +468,12 @@ static void migrate_ensure_converge(QTestState *who)
> migrate_set_parameter_int(who, "downtime-limit", 30 * 1000);
> }
>
> +static void migrate_ensure_iteration_last_long(QTestState *who)
> +{
> + /* Set 10Byte/s bandwidth limit to make the iteration last long enough */
> + migrate_set_parameter_int(who, "max-bandwidth", 10);
> +}
> +
> /*
> * Our goal is to ensure that we run a single full migration
> * iteration, and also dirty memory, ensuring that at least
> @@ -2791,6 +2797,7 @@ static void test_migrate_auto_converge(void)
> * so we need to decrease a bandwidth.
> */
> const int64_t init_pct = 5, inc_pct = 25, max_pct = 95;
> + uint64_t prev_dirty_sync_cnt, dirty_sync_cnt;
>
> if (test_migrate_start(&from, &to, uri, &args)) {
> return;
> @@ -2827,6 +2834,28 @@ static void test_migrate_auto_converge(void)
> } while (true);
> /* The first percentage of throttling should be at least init_pct */
> g_assert_cmpint(percentage, >=, init_pct);
> +
> +    /* Make sure the iteration lasts long enough */
> + migrate_ensure_iteration_last_long(from);
There's already migrate_ensure_non_converge(); why is this needed?
> +
> + /*
> +     * End the loop when the dirty sync count is greater than 1.
> + */
> + while ((dirty_sync_cnt = get_migration_pass(from)) < 2) {
> + usleep(1000 * 1000);
> + }
> +
> + prev_dirty_sync_cnt = dirty_sync_cnt;
> +
> + /*
> +     * The dirty sync count must change within 5 seconds; add 1 extra
> +     * second as a margin of error.
> + */
> + sleep(5 + 1);
> +
> + dirty_sync_cnt = get_migration_pass(from);
> + g_assert_cmpint(dirty_sync_cnt, != , prev_dirty_sync_cnt);
> +
> /* Now, when we tested that throttling works, let it converge */
> migrate_ensure_converge(from);
Please move the test change into a separate patch. I had a feeling 5+1 sec
might still easily fail on CIs (even though this test is not yet run).
Maybe it needs to still provide a loop so it waits for a few rounds just in
case.
Thanks,
--
Peter Xu
* Re: [PATCH v2 3/3] migration: Support responsive CPU throttle
2024-09-29 17:14 ` [PATCH v2 3/3] migration: Support responsive CPU throttle yong.huang
@ 2024-09-30 20:47 ` Peter Xu
2024-10-01 2:18 ` Yong Huang
0 siblings, 1 reply; 12+ messages in thread
From: Peter Xu @ 2024-09-30 20:47 UTC (permalink / raw)
To: yong.huang
Cc: qemu-devel, Fabiano Rosas, Eric Blake, Markus Armbruster,
Paolo Bonzini
On Mon, Sep 30, 2024 at 01:14:28AM +0800, yong.huang@smartx.com wrote:
> From: Hyman Huang <yong.huang@smartx.com>
>
> Currently, the convergence algorithm determines that the migration
> cannot converge according to the following principle: the dirty pages
> generated in the current iteration exceed a specific percentage
> (throttle-trigger-threshold, 50 by default) of the pages transmitted.
> Let's refer to this criterion as the "dirty rate". If this criterion
> is met at least twice (dirty_rate_high_cnt >= 2), the throttle
> percentage is increased.
>
> In most cases, the above implementation is appropriate. However, for a
> VM under heavy memory load, each iteration is time-consuming. Due to
> the repeated confirmation behavior, the VM's computing performance may
> be throttled at a high percentage for a long time, which may be
> intolerable for some computationally sensitive software in the VM.
>
> As the comment in the migration_trigger_throttle function mentions,
> the original algorithm confirms the criterion repeatedly in order to
> avoid erroneous detection. Put differently, the criterion does not
> need to be validated again once the detection is more reliable.
>
> To make the detection more accurate, the refinement introduces another
> criterion, called the "dirty ratio", to determine migration
> convergence. The "dirty ratio" is the ratio of bytes_dirty_period to
> bytes_xfer_period. When the algorithm repeatedly detects that the
> "dirty ratio" of the current sync is no lower than that of the
> previous one, it determines that the migration cannot converge. If
> either of the two criteria, "dirty rate" or "dirty ratio", is met, the
> penalty percentage is increased. This makes the CPU throttle respond
> more quickly, shortening the entire iteration and thereby reducing the
> duration of VM performance degradation.
>
> In conclusion, this refinement significantly reduces the time required
> for the throttle percentage to step up to its maximum while the VM is
> under a high memory load.
I'm a bit lost on why patches 2-3 are still needed if patch 1 works.
Wouldn't that already greatly increase the chance of the throttle code
being invoked? Why do we still need this?
Thanks,
--
Peter Xu
* Re: [PATCH v2 1/3] migration: Support background ramblock dirty sync
2024-09-30 20:41 ` Peter Xu
@ 2024-10-01 2:02 ` Yong Huang
2024-10-01 15:28 ` Peter Xu
0 siblings, 1 reply; 12+ messages in thread
From: Yong Huang @ 2024-10-01 2:02 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Fabiano Rosas, Eric Blake, Markus Armbruster,
Paolo Bonzini
On Tue, Oct 1, 2024 at 4:41 AM Peter Xu <peterx@redhat.com> wrote:
> On Mon, Sep 30, 2024 at 01:14:26AM +0800, yong.huang@smartx.com wrote:
> > From: Hyman Huang <yong.huang@smartx.com>
> >
> > When a VM is configured with huge memory, the current throttle logic
> > does not scale, because migration_trigger_throttle() is only called
> > once per iteration, so it won't be invoked for a long time if a single
> > iteration takes a long time.
> >
> > The background dirty sync aims to fix the above issue by synchronizing
> > the ramblock dirty bitmap from the remote dirty bitmap and, when
> > necessary, triggering the CPU throttle multiple times during a long
> > iteration.
> >
> > This is a trade-off between synchronization overhead and CPU throttle
> > impact.
> >
> > Signed-off-by: Hyman Huang <yong.huang@smartx.com>
> > ---
> > include/migration/misc.h | 3 ++
> > migration/migration.c | 11 +++++++
> > migration/ram.c | 64 ++++++++++++++++++++++++++++++++++++
> > migration/ram.h | 3 ++
> > migration/trace-events | 1 +
> > system/cpu-timers.c | 2 ++
> > tests/qtest/migration-test.c | 29 ++++++++++++++++
> > 7 files changed, 113 insertions(+)
> >
> > diff --git a/include/migration/misc.h b/include/migration/misc.h
> > index bfadc5613b..67c00d98f5 100644
> > --- a/include/migration/misc.h
> > +++ b/include/migration/misc.h
> > @@ -111,4 +111,7 @@ bool migration_in_bg_snapshot(void);
> > /* migration/block-dirty-bitmap.c */
> > void dirty_bitmap_mig_init(void);
> >
> > +/* migration/ram.c */
> > +void bg_ram_dirty_sync_init(void);
>
> IMO it's better we don't add this logic to ram.c as I mentioned. It's
> closely relevant to auto converge so I think cpu-throttle.c is more
> suitable.
> > +
> > #endif
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 3dea06d577..224b5dfb4f 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -3285,6 +3285,9 @@ static void migration_iteration_finish(MigrationState *s)
> > {
> > /* If we enabled cpu throttling for auto-converge, turn it off. */
> > cpu_throttle_stop();
> > + if (migrate_auto_converge()) {
> > + bg_ram_dirty_sync_timer_enable(false);
> > + }
>
> Please avoid adding this code. When it's part of auto-converge,
> cpu_throttle_stop() should already guarantee that the timer is
> disabled altogether.
> >
> > bql_lock();
> > switch (s->state) {
> > @@ -3526,6 +3529,14 @@ static void *migration_thread(void *opaque)
> >
> > trace_migration_thread_setup_complete();
> >
> > + /*
> > + * Tick the background ramblock dirty sync timer after setup
> > + * phase.
> > + */
> > + if (migrate_auto_converge()) {
> > + bg_ram_dirty_sync_timer_enable(true);
> > + }
>
> Might be good to still tie the enablement to auto-converge; the latter
> was done in migration_trigger_throttle(). Maybe also enable the timer
> only there?
>
Ok.
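E.g., roughly (just a sketch; exact placement to be worked out in v3):
    /* in migration_trigger_throttle(), inside the auto-converge branch */
    if (migrate_auto_converge()) {
        trace_migration_throttle();
        mig_throttle_guest_down(bytes_dirty_period, bytes_dirty_threshold);
        /* arm the background sync timer together with the throttle */
        bg_ram_dirty_sync_timer_enable(true);
    }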
> > +
> > while (migration_is_active()) {
> > if (urgent || !migration_rate_exceeded(s->to_dst_file)) {
> > MigIterateState iter_state = migration_iteration_run(s);
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 67ca3d5d51..995bae1ac9 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -110,6 +110,12 @@
> > */
> > #define MAPPED_RAM_LOAD_BUF_SIZE 0x100000
> >
> > +/* Background ramblock dirty sync trigger every five seconds */
> > +#define BG_RAM_SYNC_TIMESLICE_MS 5000
> > +#define BG_RAM_SYNC_TIMER_INTERVAL_MS 1000
>
> Why does this timer need to be invoked every 1 sec, if it's a 5-sec timer?
>
The logic is stupid :(, I'll fix that.
> > +
> > +static QEMUTimer *bg_ram_dirty_sync_timer;
> > +
> > XBZRLECacheStats xbzrle_counters;
> >
> > /* used by the search for pages to send */
> > @@ -4543,6 +4549,64 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
> > }
> > }
> >
> > +static void bg_ram_dirty_sync_timer_tick(void *opaque)
>
> Please consider moving this function to cpu-throttle.c.
>
Yes, I get your idea, but IMHO cpu-throttle.c only implements the CPU
throttle, not the ramblock dirty sync; migration_bitmap_sync_precopy()
arguably should not be called from cpu-throttle.c. Do we want to cross
that layering boundary?
Another way is to define bg_ram_dirty_sync_timer in cpu-throttle.c and
modify it in bg_ram_dirty_sync_timer_tick() as an extern variable in
ram.c. I prefer the latter; what do you think?
>
> Also please prefix the functions with cpu_throttle_*(), rather than bg_*().
> It should be part of the auto-converge functionality.
>
> > +{
> > + static int prev_pct;
> > + static uint64_t prev_sync_cnt = 2;
> > + uint64_t sync_cnt = stat64_get(&mig_stats.dirty_sync_count);
> > + int cur_pct = cpu_throttle_get_percentage();
> > +
> > + if (prev_pct && !cur_pct) {
> > + /* CPU throttle timer stopped, so do we */
> > + return;
> > + }
>
> Didn't follow why this is not needed if cpu throttle is 0.
>
The sequence in my head is:
    bg dirty sync -> mig_throttle_guest_down -> throttle -> throttle stop
    -> bg dirty sync stop
The bg dirty sync may be invoked before the throttle starts, when the
throttle percentage is still 0, so if we coded it like:
    if (!cpu_throttle_get_percentage()) {
        return;
    }
the bg dirty sync timer would tick only once and then stop.
If we changed the sequence as follows, we could ignore this case:
    mig_throttle_guest_down -> bg dirty sync -> throttle -> throttle stop
    -> bg dirty sync stop
But as the code in migration_trigger_throttle() indicates:
    if ((bytes_dirty_period > bytes_dirty_threshold) &&
        ++rs->dirty_rate_high_cnt >= 2) {
        ...
    }
since the 1st iteration usually cannot satisfy the condition
rs->dirty_rate_high_cnt >= 2, mig_throttle_guest_down() most likely
gets invoked in the 3rd iteration. If the 2nd iteration is very long,
the bg dirty sync cannot be invoked, and we may lose the chance to
trigger the CPU throttle, as I mentioned.
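To make the intended guard concrete, a rough sketch of the states it
distinguishes (same variables as in this patch):
    /*
     * prev_pct == 0, cur_pct == 0: throttle not started yet, keep ticking
     * prev_pct != 0, cur_pct != 0: throttle running, keep ticking
     * prev_pct != 0, cur_pct == 0: throttle just stopped, stop ticking too
     */
    if (prev_pct && !cur_pct) {
        return;    /* do not re-arm the timer */
    }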
> It means the dirty rate is probably very low, but then shouldn't this
> background sync still be working (just to notice when it grows again)?
>
> > +
> > + /*
> > + * The first iteration copies all memory anyhow and has no
> > + * effect on guest performance, therefore omit it to avoid
> > + * paying extra for the sync penalty.
> > + */
> > + if (sync_cnt <= 1) {
> > + goto end;
> > + }
> > +
> > + if (sync_cnt == prev_sync_cnt) {
> > + int64_t curr_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > + assert(ram_state);
> > + if ((curr_time - ram_state->time_last_bitmap_sync) >
> > + BG_RAM_SYNC_TIMESLICE_MS) {
> > + trace_bg_ram_dirty_sync();
> > + WITH_RCU_READ_LOCK_GUARD() {
> > + migration_bitmap_sync_precopy(ram_state, false);
> > + }
> > + }
> > + }
> > +
> > +end:
> > + prev_sync_cnt = stat64_get(&mig_stats.dirty_sync_count);
> > + prev_pct = cpu_throttle_get_percentage();
> > +
> > + timer_mod(bg_ram_dirty_sync_timer,
> > + qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) +
> > + BG_RAM_SYNC_TIMER_INTERVAL_MS);
>
> IIUC we only need to fire per 5sec, not 1s?
>
Thanks for pointing that out; I'll refine this logic.
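Something like this, I think (a rough sketch; the final code may
differ), dropping the 1s interval and re-arming at the 5s timeslice
directly:
    static void bg_ram_dirty_sync_timer_tick(void *opaque)
    {
        ...
        /* re-arm once per timeslice instead of every second */
        timer_mod(bg_ram_dirty_sync_timer,
                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) +
                  BG_RAM_SYNC_TIMESLICE_MS);
    }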
>
> > +}
> > +
> > +void bg_ram_dirty_sync_timer_enable(bool enable)
> > +{
> > + if (enable) {
> > + bg_ram_dirty_sync_timer_tick(NULL);
> > + } else {
> > + timer_del(bg_ram_dirty_sync_timer);
> > + }
> > +}
> > +
> > +void bg_ram_dirty_sync_init(void)
> > +{
> > + bg_ram_dirty_sync_timer =
> > + timer_new_ms(QEMU_CLOCK_VIRTUAL_RT,
> > + bg_ram_dirty_sync_timer_tick, NULL);
> > +}
>
> IMHO all these functions should move to cpu-throttle.c.
>
> > +
> > static RAMBlockNotifier ram_mig_ram_notifier = {
> > .ram_block_resized = ram_mig_ram_block_resized,
> > };
> > diff --git a/migration/ram.h b/migration/ram.h
> > index bc0318b834..9c1a2f30f1 100644
> > --- a/migration/ram.h
> > +++ b/migration/ram.h
> > @@ -93,4 +93,7 @@ void ram_write_tracking_prepare(void);
> > int ram_write_tracking_start(void);
> > void ram_write_tracking_stop(void);
> >
> > +/* Background ramblock dirty sync */
> > +void bg_ram_dirty_sync_timer_enable(bool enable);
> > +
> > #endif
> > diff --git a/migration/trace-events b/migration/trace-events
> > index c65902f042..3f09e7f383 100644
> > --- a/migration/trace-events
> > +++ b/migration/trace-events
> > @@ -90,6 +90,7 @@ put_qlist_end(const char *field_name, const char *vmsd_name) "%s(%s)"
> > qemu_file_fclose(void) ""
> >
> > # ram.c
> > +bg_ram_dirty_sync(void) ""
> > get_queued_page(const char *block_name, uint64_t tmp_offset, unsigned long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
> > get_queued_page_not_dirty(const char *block_name, uint64_t tmp_offset, unsigned long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
> > migration_bitmap_sync_start(void) ""
> > diff --git a/system/cpu-timers.c b/system/cpu-timers.c
> > index 0b31c9a1b6..64f0834be4 100644
> > --- a/system/cpu-timers.c
> > +++ b/system/cpu-timers.c
> > @@ -25,6 +25,7 @@
> > #include "qemu/osdep.h"
> > #include "qemu/cutils.h"
> > #include "migration/vmstate.h"
> > +#include "migration/misc.h"
> > #include "qapi/error.h"
> > #include "qemu/error-report.h"
> > #include "sysemu/cpus.h"
> > @@ -274,4 +275,5 @@ void cpu_timers_init(void)
> > vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
> >
> > cpu_throttle_init();
> > + bg_ram_dirty_sync_init();
> > }
> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> > index d6768d5d71..3296f5244d 100644
> > --- a/tests/qtest/migration-test.c
> > +++ b/tests/qtest/migration-test.c
> > @@ -468,6 +468,12 @@ static void migrate_ensure_converge(QTestState *who)
> > migrate_set_parameter_int(who, "downtime-limit", 30 * 1000);
> > }
> >
> > +static void migrate_ensure_iteration_last_long(QTestState *who)
> > +{
> > +    /* Set 10Byte/s bandwidth limit to make the iteration last long enough */
> > + migrate_set_parameter_int(who, "max-bandwidth", 10);
> > +}
> > +
> > /*
> > * Our goal is to ensure that we run a single full migration
> > * iteration, and also dirty memory, ensuring that at least
> > @@ -2791,6 +2797,7 @@ static void test_migrate_auto_converge(void)
> > * so we need to decrease a bandwidth.
> > */
> > const int64_t init_pct = 5, inc_pct = 25, max_pct = 95;
> > + uint64_t prev_dirty_sync_cnt, dirty_sync_cnt;
> >
> > if (test_migrate_start(&from, &to, uri, &args)) {
> > return;
> > @@ -2827,6 +2834,28 @@ static void test_migrate_auto_converge(void)
> > } while (true);
> > /* The first percentage of throttling should be at least init_pct */
> > g_assert_cmpint(percentage, >=, init_pct);
> > +
> > +    /* Make sure the iteration lasts long enough */
> > + migrate_ensure_iteration_last_long(from);
>
> There's already migrate_ensure_non_converge(), why this is needed?
>
I feel that migrate_ensure_non_converge() cannot ensure the iteration
lasts longer than 5s, so I added an extra utility function.
> > +
> > + /*
> > +     * End the loop when the dirty sync count is greater than 1.
> > + */
> > + while ((dirty_sync_cnt = get_migration_pass(from)) < 2) {
> > + usleep(1000 * 1000);
> > + }
> > +
> > + prev_dirty_sync_cnt = dirty_sync_cnt;
> > +
> > + /*
> > +     * The dirty sync count must change within 5 seconds; add 1 extra
> > +     * second as a margin of error.
> > + */
> > + sleep(5 + 1);
> > +
> > + dirty_sync_cnt = get_migration_pass(from);
> > + g_assert_cmpint(dirty_sync_cnt, != , prev_dirty_sync_cnt);
> > +
> > /* Now, when we tested that throttling works, let it converge */
> > migrate_ensure_converge(from);
>
> Please move the test change into a separate patch. I had a feeling 5+1 sec
> might still easily fail on CIs (even though this test is not yet run).
> Maybe it needs to still provide a loop so it waits for a few rounds just in
> case.
>
OK.
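A polling loop along these lines might be more robust than the fixed
sleep (a sketch; the 30s timeout is a guess):
    /* wait up to 30s for the background sync to bump the sync count */
    for (int i = 0; i < 30; i++) {
        dirty_sync_cnt = get_migration_pass(from);
        if (dirty_sync_cnt != prev_dirty_sync_cnt) {
            break;
        }
        sleep(1);
    }
    g_assert_cmpint(dirty_sync_cnt, !=, prev_dirty_sync_cnt);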
>
> Thanks,
>
> --
> Peter Xu
>
>
Thanks,
Yong
--
Best regards
* Re: [PATCH v2 3/3] migration: Support responsive CPU throttle
2024-09-30 20:47 ` Peter Xu
@ 2024-10-01 2:18 ` Yong Huang
2024-10-01 15:37 ` Peter Xu
0 siblings, 1 reply; 12+ messages in thread
From: Yong Huang @ 2024-10-01 2:18 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Fabiano Rosas, Eric Blake, Markus Armbruster,
Paolo Bonzini
On Tue, Oct 1, 2024 at 4:47 AM Peter Xu <peterx@redhat.com> wrote:
> On Mon, Sep 30, 2024 at 01:14:28AM +0800, yong.huang@smartx.com wrote:
> > From: Hyman Huang <yong.huang@smartx.com>
> >
> > Currently, the convergence algorithm determines that the migration
> > cannot converge according to the following principle:
> > the dirty pages generated in the current iteration exceed a specific
> > percentage (throttle-trigger-threshold, 50 by default) of the number
> > of transmissions. Let's refer to this criterion as the "dirty rate".
> > If this criterion is met at least twice (dirty_rate_high_cnt >= 2),
> > the throttle percentage is increased.
> >
> > In most cases, the above implementation is appropriate. However, for
> > a VM under high memory load, each iteration is time-consuming.
> > The VM's computing performance may be throttled at a high percentage
> > for a long time due to the repeated confirmation behavior, which may
> > be intolerable for some computationally sensitive software in the VM.
> >
> > As the comment in the migration_trigger_throttle function mentions,
> > the original algorithm confirms the criterion repeatedly in order to
> > avoid erroneous detection. Put differently, the criterion does not
> > need to be validated again once the detection is more reliable.
> >
> > In the refinement, in order to make the detection more accurate, we
> > introduce another criterion, called the "dirty ratio", to determine
> > migration convergence. The "dirty ratio" is the ratio of
> > bytes_xfer_period to bytes_dirty_period. When the algorithm
> > repeatedly detects that the "dirty ratio" of the current sync is
> > lower than the previous one, it determines that the migration cannot
> > converge. If either the "dirty rate" or the "dirty ratio" criterion
> > is met, the penalty percentage is increased. This makes the CPU
> > throttle respond more quickly, shortening the iterations and thus
> > reducing the time of VM performance degradation.
> >
> > In conclusion, this refinement significantly reduces the time needed
> > for the throttle percentage to step to its maximum while the VM is
> > under a high memory load.
>
> I'm a bit lost on why patches 2-3 are still needed if patch 1 works.
> Wouldn't that greatly increase the chance of the throttle code being
> invoked already? Why do we still need this?
>
Indeed, if we are only considering how to increase the chance of
throttling, patch 1 is sufficient, and I'm not insisting.

If we are talking about how to detect migration convergence, this
patch, IMHO, is still helpful. Anyway, it depends on your judgment. :)
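For reference, the "dirty ratio" criterion boils down to roughly this
(a sketch only; the variable names are approximate, not the exact patch
code):

    /* ratio of bytes transferred to bytes dirtied in this sync period */
    if (bytes_dirty_period) {
        uint64_t dirty_ratio = bytes_xfer_period * 100 / bytes_dirty_period;

        /* a shrinking ratio means the transfer keeps losing ground */
        if (dirty_ratio < prev_dirty_ratio && ++dirty_ratio_low_cnt >= 2) {
            mig_throttle_guest_down(bytes_dirty_period,
                                    bytes_dirty_threshold);
        }
        prev_dirty_ratio = dirty_ratio;
    }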
>
> Thanks,
>
> --
> Peter Xu
>
>
Yong
--
Best regards
* Re: [PATCH v2 1/3] migration: Support background ramblock dirty sync
2024-10-01 2:02 ` Yong Huang
@ 2024-10-01 15:28 ` Peter Xu
2024-10-08 2:27 ` Yong Huang
0 siblings, 1 reply; 12+ messages in thread
From: Peter Xu @ 2024-10-01 15:28 UTC (permalink / raw)
To: Yong Huang
Cc: qemu-devel, Fabiano Rosas, Eric Blake, Markus Armbruster,
Paolo Bonzini
On Tue, Oct 01, 2024 at 10:02:53AM +0800, Yong Huang wrote:
> On Tue, Oct 1, 2024 at 4:41 AM Peter Xu <peterx@redhat.com> wrote:
>
> > On Mon, Sep 30, 2024 at 01:14:26AM +0800, yong.huang@smartx.com wrote:
> > > From: Hyman Huang <yong.huang@smartx.com>
> > >
> > > When a VM is configured with huge memory, the current throttle logic
> > > doesn't look like it will scale, because migration_trigger_throttle()
> > > is only called once per iteration, so it won't be invoked for a long
> > > time if one iteration takes a long time.
> > >
> > > The background dirty sync aims to fix the above issue by synchronizing
> > > the ramblock from the remote dirty bitmap and, when necessary,
> > > triggering the CPU throttle multiple times during a long iteration.
> > >
> > > This is a trade-off between synchronization overhead and CPU throttle
> > > impact.
> > >
> > > Signed-off-by: Hyman Huang <yong.huang@smartx.com>
> > > ---
> > > include/migration/misc.h | 3 ++
> > > migration/migration.c | 11 +++++++
> > > migration/ram.c | 64 ++++++++++++++++++++++++++++++++++++
> > > migration/ram.h | 3 ++
> > > migration/trace-events | 1 +
> > > system/cpu-timers.c | 2 ++
> > > tests/qtest/migration-test.c | 29 ++++++++++++++++
> > > 7 files changed, 113 insertions(+)
> > >
> > > diff --git a/include/migration/misc.h b/include/migration/misc.h
> > > index bfadc5613b..67c00d98f5 100644
> > > --- a/include/migration/misc.h
> > > +++ b/include/migration/misc.h
> > > @@ -111,4 +111,7 @@ bool migration_in_bg_snapshot(void);
> > > /* migration/block-dirty-bitmap.c */
> > > void dirty_bitmap_mig_init(void);
> > >
> > > +/* migration/ram.c */
> > > +void bg_ram_dirty_sync_init(void);
> >
> > IMO it's better we don't add this logic to ram.c as I mentioned. It's
> > closely related to auto converge, so I think cpu-throttle.c is more
> > suitable.
>
>
> > > +
> > > #endif
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 3dea06d577..224b5dfb4f 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -3285,6 +3285,9 @@ static void migration_iteration_finish(MigrationState *s)
> > > {
> > > /* If we enabled cpu throttling for auto-converge, turn it off. */
> > > cpu_throttle_stop();
> > > + if (migrate_auto_converge()) {
> > > + bg_ram_dirty_sync_timer_enable(false);
> > > + }
> >
> > Please avoid adding this code.  When it's part of auto-converge,
> > cpu_throttle_stop() should already guarantee that the timer is
> > disabled altogether.
>
>
> > >
> > > bql_lock();
> > > switch (s->state) {
> > > @@ -3526,6 +3529,14 @@ static void *migration_thread(void *opaque)
> > >
> > > trace_migration_thread_setup_complete();
> > >
> > > + /*
> > > + * Tick the background ramblock dirty sync timer after setup
> > > + * phase.
> > > + */
> > > + if (migrate_auto_converge()) {
> > > + bg_ram_dirty_sync_timer_enable(true);
> > > + }
> >
> > Might be good to still stick the enablement with auto-converge; the latter
> > was done in migration_trigger_throttle(). Maybe also enable the timer only
> > there?
> >
>
> Ok.
>
>
> > > +
> > > while (migration_is_active()) {
> > > if (urgent || !migration_rate_exceeded(s->to_dst_file)) {
> > > MigIterateState iter_state = migration_iteration_run(s);
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index 67ca3d5d51..995bae1ac9 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -110,6 +110,12 @@
> > > */
> > > #define MAPPED_RAM_LOAD_BUF_SIZE 0x100000
> > >
> > > +/* Background ramblock dirty sync triggers every five seconds */
> > > +#define BG_RAM_SYNC_TIMESLICE_MS 5000
> > > +#define BG_RAM_SYNC_TIMER_INTERVAL_MS 1000
> >
> > Why does this timer need to be invoked every 1 sec, if it's a 5 sec timer?
> >
>
> The logic is stupid :(, I'll fix that.
>
>
> > > +
> > > +static QEMUTimer *bg_ram_dirty_sync_timer;
> > > +
> > > XBZRLECacheStats xbzrle_counters;
> > >
> > > /* used by the search for pages to send */
> > > @@ -4543,6 +4549,64 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
> > > }
> > > }
> > >
> > > +static void bg_ram_dirty_sync_timer_tick(void *opaque)
> >
> > Please consider moving this function to cpu-throttle.c.
> >
>
> Yes, I get your idea, but IMHO cpu-throttle.c only implements the CPU
> throttle, not the ramblock dirty sync; migration_bitmap_sync_precopy
> could not, or should not, be called in cpu-throttle.c. Do we want to
> change this layering?
>
> Another way is to define bg_ram_dirty_sync_timer in cpu-throttle.c and
> modify it in bg_ram_dirty_sync_timer_tick as an extern variable in
> ram.c. I prefer the latter; what do you think?
I think it's better that all the logic resides in cpu-throttle.c.

You can have one pre-requisite patch to remove the "rs" parameter from
migration_bitmap_sync_precopy(), then export it in migration/misc.h.

Maybe you also need to export time_last_bitmap_sync; you can make another
helper for that and put it in misc.h too.
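Roughly (untested, just to sketch the interface; the helper name below
is made up):

    /* migration/misc.h */
    void migration_bitmap_sync_precopy(bool last_stage);
    /* hypothetical helper wrapping rs->time_last_bitmap_sync */
    int64_t migration_time_since_last_bitmap_sync(void);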
That all said, I wonder whether we should move cpu-throttle.c under
migration/, as it's only used in migration; then we can avoid exporting
in misc.h and export them in migration.h instead (which is internal only).
>
>
> >
> > Also please prefix the functions with cpu_throttle_*(), rather than bg_*().
> > They should be part of the auto converge functionality.
> >
> > > +{
> > > + static int prev_pct;
> > > + static uint64_t prev_sync_cnt = 2;
> > > + uint64_t sync_cnt = stat64_get(&mig_stats.dirty_sync_count);
> > > + int cur_pct = cpu_throttle_get_percentage();
> > > +
> > > + if (prev_pct && !cur_pct) {
> > > +        /* CPU throttle timer stopped, so we stop too */
> > > + return;
> > > + }
> >
> > Didn't follow why this is not needed if cpu throttle is 0.
> >
>
> The sequence in my head is:
>
> bg dirty sync -> mig_throttle_guest_down -> throttle -> throttle stop ->
> bg dirty sync stop
>
> The bg dirty sync may be invoked before the throttle, and the throttle
> percentage is still 0 at that time; if we code it like:
>
> if (!cpu_throttle_get_percentage()) {
> return;
> }
>
> then the bg dirty sync timer will tick only once and stop.
>
> If we change the sequence as follows, we can avoid this case:
>
> mig_throttle_guest_down -> bg dirty sync -> throttle -> throttle stop-> bg
> dirty sync stop
>
> But as the code in migration_trigger_throttle indicates:
>
> if ((bytes_dirty_period > bytes_dirty_threshold) &&
> ++rs->dirty_rate_high_cnt >= 2) {
> ...
> }
>
> Since the 1st iteration usually cannot satisfy the condition
> rs->dirty_rate_high_cnt >= 2, mig_throttle_guest_down gets invoked in
> the 3rd iteration with high probability. If the 2nd iteration is very
> long, the bg dirty sync cannot be invoked, and we may lose the chance
> to trigger the CPU throttle, as I mentioned.
I wonder whether it would work if we simply put:
if (!migrate_auto_converge()) {
return;
}
I think this timer should kick even if !cpu_throttle_active(), because
!active doesn't mean the feature is off.  In this case the feature is
enabled as long as migrate_auto_converge() == true.
This adds another reason why maybe we want to move system/cpu-throttle.c
under migration/, because otherwise we'll need to export
migrate_auto_converge() once more.
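FWIW, the tick could then re-arm itself directly at the 5-sec period (a
sketch; the names follow the suggested cpu_throttle_*() prefix and are
not the patch's code):

    static void cpu_throttle_dirty_sync_timer_tick(void *opaque)
    {
        if (!migrate_auto_converge()) {
            return;
        }
        /* ... sync the dirty bitmap when the iteration runs long ... */
        timer_mod(dirty_sync_timer,
                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) +
                  CPU_THROTTLE_DIRTY_SYNC_TIMESLICE_MS);
    }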
>
>
> > It means the dirty rate is probably very low, but then shouldn't this
> > background sync still be working (just to notice when it grows again)?
> >
> > > +
> > > + /*
> > > + * The first iteration copies all memory anyhow and has no
> > > + * effect on guest performance, therefore omit it to avoid
> > > + * paying extra for the sync penalty.
> > > + */
> > > + if (sync_cnt <= 1) {
> > > + goto end;
> > > + }
> > > +
> > > + if (sync_cnt == prev_sync_cnt) {
> > > + int64_t curr_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > + assert(ram_state);
> > > + if ((curr_time - ram_state->time_last_bitmap_sync) >
> > > + BG_RAM_SYNC_TIMESLICE_MS) {
> > > + trace_bg_ram_dirty_sync();
> > > + WITH_RCU_READ_LOCK_GUARD() {
> > > + migration_bitmap_sync_precopy(ram_state, false);
> > > + }
> > > + }
> > > + }
> > > +
> > > +end:
> > > + prev_sync_cnt = stat64_get(&mig_stats.dirty_sync_count);
> > > + prev_pct = cpu_throttle_get_percentage();
> > > +
> > > + timer_mod(bg_ram_dirty_sync_timer,
> > > + qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) +
> > > + BG_RAM_SYNC_TIMER_INTERVAL_MS);
> >
> > IIUC we only need to fire every 5 sec, not every 1 sec?
> >
>
> Thanks for pointing that out; I'll refine this logic.
>
>
> >
> > > +}
> > > +
> > > +void bg_ram_dirty_sync_timer_enable(bool enable)
> > > +{
> > > + if (enable) {
> > > + bg_ram_dirty_sync_timer_tick(NULL);
> > > + } else {
> > > + timer_del(bg_ram_dirty_sync_timer);
> > > + }
> > > +}
> > > +
> > > +void bg_ram_dirty_sync_init(void)
> > > +{
> > > + bg_ram_dirty_sync_timer =
> > > + timer_new_ms(QEMU_CLOCK_VIRTUAL_RT,
> > > + bg_ram_dirty_sync_timer_tick, NULL);
> > > +}
> >
> > IMHO all these functions should move to cpu-throttle.c.
> >
> > > +
> > > static RAMBlockNotifier ram_mig_ram_notifier = {
> > > .ram_block_resized = ram_mig_ram_block_resized,
> > > };
> > > diff --git a/migration/ram.h b/migration/ram.h
> > > index bc0318b834..9c1a2f30f1 100644
> > > --- a/migration/ram.h
> > > +++ b/migration/ram.h
> > > @@ -93,4 +93,7 @@ void ram_write_tracking_prepare(void);
> > > int ram_write_tracking_start(void);
> > > void ram_write_tracking_stop(void);
> > >
> > > +/* Background ramblock dirty sync */
> > > +void bg_ram_dirty_sync_timer_enable(bool enable);
> > > +
> > > #endif
> > > diff --git a/migration/trace-events b/migration/trace-events
> > > index c65902f042..3f09e7f383 100644
> > > --- a/migration/trace-events
> > > +++ b/migration/trace-events
> > > @@ -90,6 +90,7 @@ put_qlist_end(const char *field_name, const char *vmsd_name) "%s(%s)"
> > > qemu_file_fclose(void) ""
> > >
> > > # ram.c
> > > +bg_ram_dirty_sync(void) ""
> > > get_queued_page(const char *block_name, uint64_t tmp_offset, unsigned long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
> > > get_queued_page_not_dirty(const char *block_name, uint64_t tmp_offset, unsigned long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
> > > migration_bitmap_sync_start(void) ""
> > > diff --git a/system/cpu-timers.c b/system/cpu-timers.c
> > > index 0b31c9a1b6..64f0834be4 100644
> > > --- a/system/cpu-timers.c
> > > +++ b/system/cpu-timers.c
> > > @@ -25,6 +25,7 @@
> > > #include "qemu/osdep.h"
> > > #include "qemu/cutils.h"
> > > #include "migration/vmstate.h"
> > > +#include "migration/misc.h"
> > > #include "qapi/error.h"
> > > #include "qemu/error-report.h"
> > > #include "sysemu/cpus.h"
> > > @@ -274,4 +275,5 @@ void cpu_timers_init(void)
> > > vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
> > >
> > > cpu_throttle_init();
> > > + bg_ram_dirty_sync_init();
> > > }
> > > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> > > index d6768d5d71..3296f5244d 100644
> > > --- a/tests/qtest/migration-test.c
> > > +++ b/tests/qtest/migration-test.c
> > > @@ -468,6 +468,12 @@ static void migrate_ensure_converge(QTestState *who)
> > > migrate_set_parameter_int(who, "downtime-limit", 30 * 1000);
> > > }
> > >
> > > +static void migrate_ensure_iteration_last_long(QTestState *who)
> > > +{
> > > +    /* Set 10Byte/s bandwidth limit to make the iteration last long enough */
> > > + migrate_set_parameter_int(who, "max-bandwidth", 10);
> > > +}
> > > +
> > > /*
> > > * Our goal is to ensure that we run a single full migration
> > > * iteration, and also dirty memory, ensuring that at least
> > > @@ -2791,6 +2797,7 @@ static void test_migrate_auto_converge(void)
> > > * so we need to decrease a bandwidth.
> > > */
> > > const int64_t init_pct = 5, inc_pct = 25, max_pct = 95;
> > > + uint64_t prev_dirty_sync_cnt, dirty_sync_cnt;
> > >
> > > if (test_migrate_start(&from, &to, uri, &args)) {
> > > return;
> > > @@ -2827,6 +2834,28 @@ static void test_migrate_auto_converge(void)
> > > } while (true);
> > > /* The first percentage of throttling should be at least init_pct */
> > > g_assert_cmpint(percentage, >=, init_pct);
> > > +
> > > +    /* Make sure the iteration lasts long enough */
> > > + migrate_ensure_iteration_last_long(from);
> >
> > There's already migrate_ensure_non_converge(); why is this needed?
> >
>
> I feel that migrate_ensure_non_converge cannot ensure the iteration
> lasts longer than 5s, so I added an extra util function.
non_converge sets it to 3MB/s, while all qtest mem should be >=100MB on
all archs, so it looks OK as of now.  Maybe adding a comment would suffice?
It's not extremely bad to even miss one here in the unit test, if we
treat this feature as pretty much optional on top of auto converge.  If
you want to provide a solid test, you can add a stat counter and check it
here with query-migrate.  But again, it'll be better to move cpu-throttle.c
over first, or it requires another export in misc.h.
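E.g. (a sketch; the "bg-dirty-sync-count" property below is hypothetical
and would need to be added to the RAM stats first):

    /* poll a background-sync counter exposed via query-migrate */
    uint64_t bg_syncs;
    do {
        usleep(100 * 1000);
        bg_syncs = read_ram_property_int(from, "bg-dirty-sync-count");
    } while (bg_syncs < 1);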
>
>
> > > +
> > > + /*
> > > +     * End the loop when the dirty sync count is greater than 1.
> > > + */
> > > + while ((dirty_sync_cnt = get_migration_pass(from)) < 2) {
> > > + usleep(1000 * 1000);
> > > + }
> > > +
> > > + prev_dirty_sync_cnt = dirty_sync_cnt;
> > > +
> > > + /*
> > > +     * The dirty sync count must change within 5 seconds; add
> > > +     * 1 second as a margin of error.
> > > + */
> > > + sleep(5 + 1);
> > > +
> > > + dirty_sync_cnt = get_migration_pass(from);
> > > +    g_assert_cmpint(dirty_sync_cnt, !=, prev_dirty_sync_cnt);
> > > +
> > > /* Now, when we tested that throttling works, let it converge */
> > > migrate_ensure_converge(from);
> >
> > Please move the test change into a separate patch. I had a feeling 5+1 sec
> > might still easily fail on CIs (even though this test is not yet run).
> > Maybe it needs to still provide a loop so it waits for a few rounds just in
> > case.
> >
>
> OK.
>
>
> >
> > Thanks,
> >
> > --
> > Peter Xu
> >
> >
> Thanks,
> Yong
>
> --
> Best regards
--
Peter Xu
* Re: [PATCH v2 3/3] migration: Support responsive CPU throttle
2024-10-01 2:18 ` Yong Huang
@ 2024-10-01 15:37 ` Peter Xu
2024-10-08 2:34 ` Yong Huang
0 siblings, 1 reply; 12+ messages in thread
From: Peter Xu @ 2024-10-01 15:37 UTC (permalink / raw)
To: Yong Huang
Cc: qemu-devel, Fabiano Rosas, Eric Blake, Markus Armbruster,
Paolo Bonzini
On Tue, Oct 01, 2024 at 10:18:54AM +0800, Yong Huang wrote:
> On Tue, Oct 1, 2024 at 4:47 AM Peter Xu <peterx@redhat.com> wrote:
>
> > On Mon, Sep 30, 2024 at 01:14:28AM +0800, yong.huang@smartx.com wrote:
> > > From: Hyman Huang <yong.huang@smartx.com>
> > >
> > > [...]
> >
> > I'm a bit lost on why patches 2-3 are still needed if patch 1 works.
> > Wouldn't that greatly increase the chance of the throttle code being
> > invoked already?  Why do we still need this?
> >
>
> Indeed, if we are only considering how to increase the chance of
> throttling, patch 1 is sufficient, and I'm not insisting.
>
> If we are talking about how to detect migration convergence, this
> patch, IMHO, is still helpful. Anyway, it depends on your judgment. :)
Thanks.  I really hope we can stick with patch 1 only for now, and leave
patches like 2-3 for the future, or probably never.

I want to avoid more magical tunables, and I want to avoid making the
code harder to read.  Unlike most other migration features, auto converge
is already pretty heavy on the "engineering" aspect of things.  More and
more people care about downtimes of 100ms or even less, so it makes
little sense that a throttle feature can easily stop a group of vCPUs
for longer than that.

I hope we can unite more dev/QE resources on postcopy across the QEMU
community for enterprise users.  PoCs are always good stuff for QEMU, as
it's a community project and people experiment on it, but I hope at
least at the design level, not with small tunables like this one.  We
could introduce 10 more tunables all over, feed them to AI, and train
some numbers so that migration improves 10%, but IMHO that doesn't
hugely help.
If you really care about convergence issues, I want to know whether you
agree that postcopy is a better way to go.  There are still plenty of
things we can do better in that area, either on postcopy in general or
on the downtime optimizations that lots of people are working on (e.g.
VFIO's), so again IMHO it'll be good to keep focused there.
Thanks,
--
Peter Xu
* Re: [PATCH v2 1/3] migration: Support background ramblock dirty sync
2024-10-01 15:28 ` Peter Xu
@ 2024-10-08 2:27 ` Yong Huang
0 siblings, 0 replies; 12+ messages in thread
From: Yong Huang @ 2024-10-08 2:27 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Fabiano Rosas, Eric Blake, Markus Armbruster,
Paolo Bonzini
On Tue, Oct 1, 2024 at 11:28 PM Peter Xu <peterx@redhat.com> wrote:
> On Tue, Oct 01, 2024 at 10:02:53AM +0800, Yong Huang wrote:
>
> [...]
>
> I think it's better that all the logic resides in cpu-throttle.c.
>
> You can have one pre-requisite patch to remove the "rs" parameter from
> migration_bitmap_sync_precopy(), then export it in migration/misc.h.
>
> Maybe you also need to export time_last_bitmap_sync; you can make another
> helper for that and put it in misc.h too.
>
> That all said, I wonder whether we should move cpu-throttle.c under
> migration/, as it's only used in migration; then we can avoid exporting
> in misc.h and export them in migration.h instead (which is internal only).
>
> [...]
>
> non_converge sets it to 3MB/s, while all qtest mem should be >=100MB on
> all archs, so it looks OK as of now.  Maybe adding a comment would suffice?
>
> It's not extremely bad to even miss one here in the unit test, if we
> treat this feature as pretty much optional on top of auto converge.  If
> you want to provide a solid test, you can add a stat counter and check it
> here with query-migrate.  But again, it'll be better to move cpu-throttle.c
> over first, or it requires another export in misc.h.
>
> [...]
Sorry for the late reply. The suggestions above are all OK for me, and
I'll try them in the next version.
> --
> Peter Xu
>
>
Thanks, Yong
--
Best regards
* Re: [PATCH v2 3/3] migration: Support responsive CPU throttle
2024-10-01 15:37 ` Peter Xu
@ 2024-10-08 2:34 ` Yong Huang
0 siblings, 0 replies; 12+ messages in thread
From: Yong Huang @ 2024-10-08 2:34 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Fabiano Rosas, Eric Blake, Markus Armbruster,
Paolo Bonzini
On Tue, Oct 1, 2024 at 11:37 PM Peter Xu <peterx@redhat.com> wrote:
> On Tue, Oct 01, 2024 at 10:18:54AM +0800, Yong Huang wrote:
> > On Tue, Oct 1, 2024 at 4:47 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > [...]
> >
> > Indeed, if we are only considering how to increase the chance of
> > throttling, patch 1 is sufficient, and I'm not insisting.
> >
> > If we are talking about how to detect migration convergence, this
> > patch, IMHO, is still helpful. Anyway, it depends on your judgment. :)
>
> Thanks.  I really hope we can stick with patch 1 only for now, and leave
> patches like 2-3 for the future, or probably never.
>
> I want to avoid more magical tunables, and I want to avoid making the
> code harder to read.  Unlike most other migration features, auto converge
> is already pretty heavy on the "engineering" aspect of things.  More and
> more people care about downtimes of 100ms or even less, so it makes
> little sense that a throttle feature can easily stop a group of vCPUs
> for longer than that.
>
> I hope we can unite more dev/QE resources on postcopy across the QEMU
> community for enterprise users.  PoCs are always good stuff for QEMU, as
> it's a community project and people experiment on it, but I hope at
> least at the design level, not with small tunables like this one.  We
> could introduce 10 more tunables all over, feed them to AI, and train
> some numbers so that migration improves 10%, but IMHO that doesn't
> hugely help.
>
> If you really care about convergence issues, I want to know whether you
> agree that postcopy is a better way to go.  There are still plenty of
>
Agreed, postcopy deserves more attention with respect to refining huge
VM migration.
> things we can do better in that area, either on postcopy in general or
> on the downtime optimizations that lots of people are working on (e.g.
> VFIO's), so again IMHO it'll be good to keep focused there.
>
> Thanks,
>
> --
> Peter Xu
>
>
Thanks for sharing your ideas; I'll drop these two patches in the next version.
Yong
--
Best regards
Thread overview: 12+ messages
2024-09-29 17:14 [PATCH v2 0/3] migration: auto-converge refinements for huge VM yong.huang
2024-09-29 17:14 ` [PATCH v2 1/3] migration: Support background ramblock dirty sync yong.huang
2024-09-30 20:41 ` Peter Xu
2024-10-01 2:02 ` Yong Huang
2024-10-01 15:28 ` Peter Xu
2024-10-08 2:27 ` Yong Huang
2024-09-29 17:14 ` [PATCH v2 2/3] qapi/migration: Introduce cpu-throttle-responsive parameter yong.huang
2024-09-29 17:14 ` [PATCH v2 3/3] migration: Support responsive CPU throttle yong.huang
2024-09-30 20:47 ` Peter Xu
2024-10-01 2:18 ` Yong Huang
2024-10-01 15:37 ` Peter Xu
2024-10-08 2:34 ` Yong Huang