* [PATCH v2 0/5] migration: Downtime tracepoints
@ 2023-10-30 16:33 Peter Xu
  2023-10-30 16:33 ` [PATCH v2 1/5] migration: Set downtime_start even for postcopy Peter Xu
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Peter Xu @ 2023-10-30 16:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, Fabiano Rosas, Joao Martins, Juan Quintela

v2:
- Added two more patches (patches 4 & 5) to add the checkpoints as well;
  this effectively merges Joao's series into tracepoints and extends that
  to the dest QEMU.
  - Patch 5: Prefixed checkpoints with "src-" and "dst-"

This small series aims to improve QEMU's downtime analysis abilities,
similar to what Joao previously proposed here:

  https://lore.kernel.org/r/20230926161841.98464-1-joao.m.martins@oracle.com

But with a few enhancements:

  - Nothing exported yet to qapi, all tracepoints so far

  - Besides the major checkpoints, finer granularity: downtime
    measurements for each vmstate (using microseconds as the unit, for
    accuracy) alongside.

  - Trace the dest QEMU too, for both the checkpoints and the vmsd load()s

For the last bullet: consider the case where a device's save() is super
fast, while its load() is actually super slow.  Both of them contribute to
the ultimate downtime, but not as a simple sum: while src QEMU is save()ing
device1, dst QEMU can logically be load()ing device2, so the two sides run
in parallel.  However, the only way to account for all components of the
downtime is to record both.
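
As a quick usage sketch (assuming the stock "-trace" pattern option; this
series does not add any new interface for enabling events), the new
tracepoints can be turned on on both sides with something like:

  qemu-system-x86_64 ... -trace 'vmstate_downtime*'

or toggled at runtime via HMP's "trace-event" command.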

Please have a look, thanks.

Peter Xu (5):
  migration: Set downtime_start even for postcopy
  migration: Add migration_downtime_start|end() helpers
  migration: Add per vmstate downtime tracepoints
  migration: migration_stop_vm() helper
  migration: Add tracepoints for downtime checkpoints

 migration/migration.h  |  2 ++
 migration/migration.c  | 63 +++++++++++++++++++++++++++++++-----------
 migration/savevm.c     | 63 ++++++++++++++++++++++++++++++++++++------
 migration/trace-events |  4 ++-
 4 files changed, 106 insertions(+), 26 deletions(-)

-- 
2.41.0




* [PATCH v2 1/5] migration: Set downtime_start even for postcopy
  2023-10-30 16:33 [PATCH v2 0/5] migration: Downtime tracepoints Peter Xu
@ 2023-10-30 16:33 ` Peter Xu
  2023-10-31 11:38   ` Juan Quintela
  2023-10-30 16:33 ` [PATCH v2 2/5] migration: Add migration_downtime_start|end() helpers Peter Xu
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Peter Xu @ 2023-10-30 16:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, Fabiano Rosas, Joao Martins, Juan Quintela

Postcopy calculates its downtime separately.  It always sets
MigrationState.downtime properly, but not MigrationState.downtime_start.

Make postcopy do the same as the other modes and properly record the
timestamp when the VM is about to be stopped.  Drop the temporary variable
in postcopy_start() along the way.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 67547eb6a1..f8a54ff4d1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2121,7 +2121,6 @@ static int postcopy_start(MigrationState *ms, Error **errp)
     int ret;
     QIOChannelBuffer *bioc;
     QEMUFile *fb;
-    int64_t time_at_stop = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     uint64_t bandwidth = migrate_max_postcopy_bandwidth();
     bool restart_block = false;
     int cur_state = MIGRATION_STATUS_ACTIVE;
@@ -2143,6 +2142,8 @@ static int postcopy_start(MigrationState *ms, Error **errp)
     qemu_mutex_lock_iothread();
     trace_postcopy_start_set_run();
 
+    ms->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
     qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
     global_state_store();
     ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
@@ -2245,7 +2246,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
     ms->postcopy_after_devices = true;
     migration_call_notifiers(ms);
 
-    ms->downtime =  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
+    ms->downtime = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - ms->downtime_start;
 
     qemu_mutex_unlock_iothread();
 
-- 
2.41.0




* [PATCH v2 2/5] migration: Add migration_downtime_start|end() helpers
  2023-10-30 16:33 [PATCH v2 0/5] migration: Downtime tracepoints Peter Xu
  2023-10-30 16:33 ` [PATCH v2 1/5] migration: Set downtime_start even for postcopy Peter Xu
@ 2023-10-30 16:33 ` Peter Xu
  2023-10-31 11:39   ` Juan Quintela
  2023-10-30 16:33 ` [PATCH v2 3/5] migration: Add per vmstate downtime tracepoints Peter Xu
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Peter Xu @ 2023-10-30 16:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, Fabiano Rosas, Joao Martins, Juan Quintela

Unify the three places that record downtime behind the same pair of helpers.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index f8a54ff4d1..70d775942a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -101,6 +101,24 @@ static int migration_maybe_pause(MigrationState *s,
 static void migrate_fd_cancel(MigrationState *s);
 static int close_return_path_on_source(MigrationState *s);
 
+static void migration_downtime_start(MigrationState *s)
+{
+    s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+}
+
+static void migration_downtime_end(MigrationState *s)
+{
+    int64_t now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+    /*
+     * If downtime already set, should mean that postcopy already set it,
+     * then that should be the real downtime already.
+     */
+    if (!s->downtime) {
+        s->downtime = now - s->downtime_start;
+    }
+}
+
 static bool migration_needs_multiple_sockets(void)
 {
     return migrate_multifd() || migrate_postcopy_preempt();
@@ -2142,7 +2160,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
     qemu_mutex_lock_iothread();
     trace_postcopy_start_set_run();
 
-    ms->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    migration_downtime_start(ms);
 
     qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
     global_state_store();
@@ -2246,7 +2264,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
     ms->postcopy_after_devices = true;
     migration_call_notifiers(ms);
 
-    ms->downtime = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - ms->downtime_start;
+    migration_downtime_end(ms);
 
     qemu_mutex_unlock_iothread();
 
@@ -2342,7 +2360,7 @@ static int migration_completion_precopy(MigrationState *s,
     int ret;
 
     qemu_mutex_lock_iothread();
-    s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    migration_downtime_start(s);
     qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
 
     s->vm_old_state = runstate_get();
@@ -2699,15 +2717,8 @@ static void migration_calculate_complete(MigrationState *s)
     int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     int64_t transfer_time;
 
+    migration_downtime_end(s);
     s->total_time = end_time - s->start_time;
-    if (!s->downtime) {
-        /*
-         * It's still not set, so we are precopy migration.  For
-         * postcopy, downtime is calculated during postcopy_start().
-         */
-        s->downtime = end_time - s->downtime_start;
-    }
-
     transfer_time = s->total_time - s->setup_time;
     if (transfer_time) {
         s->mbps = ((double) bytes * 8.0) / transfer_time / 1000;
@@ -3126,7 +3137,7 @@ static void bg_migration_vm_start_bh(void *opaque)
     s->vm_start_bh = NULL;
 
     vm_start();
-    s->downtime = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - s->downtime_start;
+    migration_downtime_end(s);
 }
 
 /**
@@ -3193,7 +3204,7 @@ static void *bg_migration_thread(void *opaque)
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
 
     trace_migration_thread_setup_complete();
-    s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    migration_downtime_start(s);
 
     qemu_mutex_lock_iothread();
 
-- 
2.41.0




* [PATCH v2 3/5] migration: Add per vmstate downtime tracepoints
  2023-10-30 16:33 [PATCH v2 0/5] migration: Downtime tracepoints Peter Xu
  2023-10-30 16:33 ` [PATCH v2 1/5] migration: Set downtime_start even for postcopy Peter Xu
  2023-10-30 16:33 ` [PATCH v2 2/5] migration: Add migration_downtime_start|end() helpers Peter Xu
@ 2023-10-30 16:33 ` Peter Xu
  2023-10-31 12:52   ` Juan Quintela
  2023-10-30 16:33 ` [PATCH v2 4/5] migration: migration_stop_vm() helper Peter Xu
  2023-10-30 16:33 ` [PATCH v2 5/5] migration: Add tracepoints for downtime checkpoints Peter Xu
  4 siblings, 1 reply; 11+ messages in thread
From: Peter Xu @ 2023-10-30 16:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, Fabiano Rosas, Joao Martins, Juan Quintela

We have a bunch of savevm_section* tracepoints; they're good for analyzing
the migration stream, but not always suitable for analyzing migration
downtime.  Two major problems:

  - savevm_section* tracepoints dump all sections, while we only care
    about the sections that contribute to the downtime

  - They don't have an identifier to show the section type, so there is no
    easy way to filter the downtime information either.

We could add the type into those tracepoints, but instead of doing so,
this patch keeps them untouched and adds a separate set of downtime
specific tracepoints, so one can enable "vmstate_downtime*" tracepoints
and get a full picture of how the downtime is distributed across iterative
and non-iterative vmstate save/load.

Note that both save() and load() need to be traced here, because both of
them may contribute to the downtime.  The contribution is not a simple
"add them together", though: the src can be doing a save() of device1
while the dst is load()ing device2, so they can happen concurrently.

Tracking both sides makes sense because device load() and save() can be
imbalanced: one device can save() super fast but load() super slow, or
vice versa.  We can't figure that out without tracing both.
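
For reference, the new events render per their trace-events format strings
roughly as below (idstr, instance_id and the microsecond values are
placeholders, not real measurements):

  vmstate_downtime_save type=non-iterable idstr=<idstr> instance_id=<n> downtime=<usecs>
  vmstate_downtime_load type=iterable idstr=<idstr> instance_id=<n> downtime=<usecs>

where "downtime" is the microseconds spent inside that vmstate's save() or
load().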

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/savevm.c     | 49 ++++++++++++++++++++++++++++++++++++++----
 migration/trace-events |  2 ++
 2 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 8622f229e5..cd6d6ba493 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1459,6 +1459,7 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
 static
 int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
 {
+    int64_t start_ts_each, end_ts_each;
     SaveStateEntry *se;
     int ret;
 
@@ -1475,6 +1476,8 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
                 continue;
             }
         }
+
+        start_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
         trace_savevm_section_start(se->idstr, se->section_id);
 
         save_section_header(f, se, QEMU_VM_SECTION_END);
@@ -1486,6 +1489,9 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
             qemu_file_set_error(f, ret);
             return -1;
         }
+        end_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
+        trace_vmstate_downtime_save("iterable", se->idstr, se->instance_id,
+                                    end_ts_each - start_ts_each);
     }
 
     return 0;
@@ -1496,6 +1502,7 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
                                                     bool inactivate_disks)
 {
     MigrationState *ms = migrate_get_current();
+    int64_t start_ts_each, end_ts_each;
     JSONWriter *vmdesc = ms->vmdesc;
     int vmdesc_len;
     SaveStateEntry *se;
@@ -1507,11 +1514,17 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
             continue;
         }
 
+        start_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
+
         ret = vmstate_save(f, se, vmdesc);
         if (ret) {
             qemu_file_set_error(f, ret);
             return ret;
         }
+
+        end_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
+        trace_vmstate_downtime_save("non-iterable", se->idstr, se->instance_id,
+                                    end_ts_each - start_ts_each);
     }
 
     if (inactivate_disks) {
@@ -2506,9 +2519,12 @@ static bool check_section_footer(QEMUFile *f, SaveStateEntry *se)
 }
 
 static int
-qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
+qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis,
+                               uint8_t type)
 {
+    bool trace_downtime = (type == QEMU_VM_SECTION_FULL);
     uint32_t instance_id, version_id, section_id;
+    int64_t start_ts, end_ts;
     SaveStateEntry *se;
     char idstr[256];
     int ret;
@@ -2557,12 +2573,23 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
         return -EINVAL;
     }
 
+    if (trace_downtime) {
+        start_ts = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
+    }
+
     ret = vmstate_load(f, se);
     if (ret < 0) {
         error_report("error while loading state for instance 0x%"PRIx32" of"
                      " device '%s'", instance_id, idstr);
         return ret;
     }
+
+    if (trace_downtime) {
+        end_ts = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
+        trace_vmstate_downtime_load("non-iterable", se->idstr,
+                                    se->instance_id, end_ts - start_ts);
+    }
+
     if (!check_section_footer(f, se)) {
         return -EINVAL;
     }
@@ -2571,8 +2598,11 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
 }
 
 static int
-qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
+qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis,
+                             uint8_t type)
 {
+    bool trace_downtime = (type == QEMU_VM_SECTION_END);
+    int64_t start_ts, end_ts;
     uint32_t section_id;
     SaveStateEntry *se;
     int ret;
@@ -2597,12 +2627,23 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
         return -EINVAL;
     }
 
+    if (trace_downtime) {
+        start_ts = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
+    }
+
     ret = vmstate_load(f, se);
     if (ret < 0) {
         error_report("error while loading state section id %d(%s)",
                      section_id, se->idstr);
         return ret;
     }
+
+    if (trace_downtime) {
+        end_ts = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
+        trace_vmstate_downtime_load("iterable", se->idstr,
+                                    se->instance_id, end_ts - start_ts);
+    }
+
     if (!check_section_footer(f, se)) {
         return -EINVAL;
     }
@@ -2791,14 +2832,14 @@ retry:
         switch (section_type) {
         case QEMU_VM_SECTION_START:
         case QEMU_VM_SECTION_FULL:
-            ret = qemu_loadvm_section_start_full(f, mis);
+            ret = qemu_loadvm_section_start_full(f, mis, section_type);
             if (ret < 0) {
                 goto out;
             }
             break;
         case QEMU_VM_SECTION_PART:
         case QEMU_VM_SECTION_END:
-            ret = qemu_loadvm_section_part_end(f, mis);
+            ret = qemu_loadvm_section_part_end(f, mis, section_type);
             if (ret < 0) {
                 goto out;
             }
diff --git a/migration/trace-events b/migration/trace-events
index fa9486dffe..5820add1f3 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -48,6 +48,8 @@ savevm_state_cleanup(void) ""
 savevm_state_complete_precopy(void) ""
 vmstate_save(const char *idstr, const char *vmsd_name) "%s, %s"
 vmstate_load(const char *idstr, const char *vmsd_name) "%s, %s"
+vmstate_downtime_save(const char *type, const char *idstr, uint32_t instance_id, int64_t downtime) "type=%s idstr=%s instance_id=%d downtime=%"PRIi64
+vmstate_downtime_load(const char *type, const char *idstr, uint32_t instance_id, int64_t downtime) "type=%s idstr=%s instance_id=%d downtime=%"PRIi64
 postcopy_pause_incoming(void) ""
 postcopy_pause_incoming_continued(void) ""
 postcopy_page_req_sync(void *host_addr) "sync page req %p"
-- 
2.41.0




* [PATCH v2 4/5] migration: migration_stop_vm() helper
  2023-10-30 16:33 [PATCH v2 0/5] migration: Downtime tracepoints Peter Xu
                   ` (2 preceding siblings ...)
  2023-10-30 16:33 ` [PATCH v2 3/5] migration: Add per vmstate downtime tracepoints Peter Xu
@ 2023-10-30 16:33 ` Peter Xu
  2023-10-31 12:53   ` Juan Quintela
  2023-10-30 16:33 ` [PATCH v2 5/5] migration: Add tracepoints for downtime checkpoints Peter Xu
  4 siblings, 1 reply; 11+ messages in thread
From: Peter Xu @ 2023-10-30 16:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, Fabiano Rosas, Joao Martins, Juan Quintela

Provide a helper for the non-COLO migration use cases to stop a VM.  This
prepares for adding some downtime-relevant tracepoints to migration, which
may or may not apply to COLO.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h |  2 ++
 migration/migration.c | 11 ++++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index ae82004892..5944107ad5 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -544,4 +544,6 @@ void migration_rp_wait(MigrationState *s);
  */
 void migration_rp_kick(MigrationState *s);
 
+int migration_stop_vm(RunState state);
+
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index 70d775942a..9013c1b500 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -149,6 +149,11 @@ static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
     return (a > b) - (a < b);
 }
 
+int migration_stop_vm(RunState state)
+{
+    return vm_stop_force_state(state);
+}
+
 void migration_object_init(void)
 {
     /* This can only be called once. */
@@ -2164,7 +2169,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
 
     qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
     global_state_store();
-    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+    ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
     if (ret < 0) {
         goto fail;
     }
@@ -2366,7 +2371,7 @@ static int migration_completion_precopy(MigrationState *s,
     s->vm_old_state = runstate_get();
     global_state_store();
 
-    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+    ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
     trace_migration_completion_vm_stop(ret);
     if (ret < 0) {
         goto out_unlock;
@@ -3217,7 +3222,7 @@ static void *bg_migration_thread(void *opaque)
 
     global_state_store();
     /* Forcibly stop VM before saving state of vCPUs and devices */
-    if (vm_stop_force_state(RUN_STATE_PAUSED)) {
+    if (migration_stop_vm(RUN_STATE_PAUSED)) {
         goto fail;
     }
     /*
-- 
2.41.0




* [PATCH v2 5/5] migration: Add tracepoints for downtime checkpoints
  2023-10-30 16:33 [PATCH v2 0/5] migration: Downtime tracepoints Peter Xu
                   ` (3 preceding siblings ...)
  2023-10-30 16:33 ` [PATCH v2 4/5] migration: migration_stop_vm() helper Peter Xu
@ 2023-10-30 16:33 ` Peter Xu
  2023-10-31 12:54   ` Juan Quintela
  4 siblings, 1 reply; 11+ messages in thread
From: Peter Xu @ 2023-10-30 16:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, Fabiano Rosas, Joao Martins, Juan Quintela

This patch is inspired by Joao Martins' patch here:

https://lore.kernel.org/r/20230926161841.98464-1-joao.m.martins@oracle.com

Add tracepoints for major downtime checkpoints on both src and dst.  They
share the same tracepoint, with a string identifying the stage.

Besides the checkpoints in the previous patch, this patch also adds
destination checkpoints.

On src, we have these checkpoints added:

  - src-downtime-start: right before the VM stops on src
  - src-vm-stopped: after the VM is fully stopped
  - src-iterable-saved: after all iterables are saved (END sections)
  - src-non-iterable-saved: after all non-iterables are saved (FULL sections)
  - src-downtime-end: migration fully completed

On dst, we have these checkpoints added:

  - dst-precopy-loadvm-completed: after loadvm is all done for precopy
  - dst-precopy-bh-*: record the BH steps to resume the VM for precopy
  - dst-postcopy-bh-*: record the BH steps to resume the VM for postcopy

On the dst side, we don't have a good way to trace the total time consumed
by iterable or non-iterable sections for now.  We could mark it by the
first time a FULL / END section is received, but rather than that, let's
just rely on the other per-vmstate tracepoints to back up the information.

With this patch, enabling the "vmstate_downtime*" tracepoints turns on all
the tracepoints necessary for downtime measurements.

Drop the loadvm_postcopy_handle_run_bh() tracepoint alongside, because the
new checkpoints serve the same purpose, and it only covered postcopy.  We
then have a unified prefix for all downtime-relevant tracepoints.
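
As an illustration (the ordering below just follows where the tracepoints
sit in the code; timestamps and trace backend formatting will vary), a
precopy migration is expected to emit roughly:

  src: vmstate_downtime_checkpoint src-downtime-start
       vmstate_downtime_checkpoint src-vm-stopped
       vmstate_downtime_checkpoint src-iterable-saved
       vmstate_downtime_checkpoint src-non-iterable-saved
       vmstate_downtime_checkpoint src-downtime-end

  dst: vmstate_downtime_checkpoint dst-precopy-loadvm-completed
       vmstate_downtime_checkpoint dst-precopy-bh-enter
       vmstate_downtime_checkpoint dst-precopy-bh-announced
       vmstate_downtime_checkpoint dst-precopy-bh-vm-started

with the per-vmstate vmstate_downtime_save / vmstate_downtime_load events
interleaved in between.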

Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  | 16 +++++++++++++++-
 migration/savevm.c     | 14 +++++++++-----
 migration/trace-events |  2 +-
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 9013c1b500..52f4ed41b2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -103,6 +103,7 @@ static int close_return_path_on_source(MigrationState *s);
 
 static void migration_downtime_start(MigrationState *s)
 {
+    trace_vmstate_downtime_checkpoint("src-downtime-start");
     s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 }
 
@@ -117,6 +118,8 @@ static void migration_downtime_end(MigrationState *s)
     if (!s->downtime) {
         s->downtime = now - s->downtime_start;
     }
+
+    trace_vmstate_downtime_checkpoint("src-downtime-end");
 }
 
 static bool migration_needs_multiple_sockets(void)
@@ -151,7 +154,11 @@ static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
 
 int migration_stop_vm(RunState state)
 {
-    return vm_stop_force_state(state);
+    int ret = vm_stop_force_state(state);
+
+    trace_vmstate_downtime_checkpoint("src-vm-stopped");
+
+    return ret;
 }
 
 void migration_object_init(void)
@@ -500,6 +507,8 @@ static void process_incoming_migration_bh(void *opaque)
     Error *local_err = NULL;
     MigrationIncomingState *mis = opaque;
 
+    trace_vmstate_downtime_checkpoint("dst-precopy-bh-enter");
+
     /* If capability late_block_activate is set:
      * Only fire up the block code now if we're going to restart the
      * VM, else 'cont' will do it.
@@ -525,6 +534,8 @@ static void process_incoming_migration_bh(void *opaque)
      */
     qemu_announce_self(&mis->announce_timer, migrate_announce_params());
 
+    trace_vmstate_downtime_checkpoint("dst-precopy-bh-announced");
+
     multifd_load_shutdown();
 
     dirty_bitmap_mig_before_vm_start();
@@ -542,6 +553,7 @@ static void process_incoming_migration_bh(void *opaque)
     } else {
         runstate_set(global_state_get_runstate());
     }
+    trace_vmstate_downtime_checkpoint("dst-precopy-bh-vm-started");
     /*
      * This must happen after any state changes since as soon as an external
      * observer sees this event they might start to prod at the VM assuming
@@ -576,6 +588,8 @@ process_incoming_migration_co(void *opaque)
     ret = qemu_loadvm_state(mis->from_src_file);
     mis->loadvm_co = NULL;
 
+    trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
+
     ps = postcopy_state_get();
     trace_process_incoming_migration_co_end(ret, ps);
     if (ps != POSTCOPY_INCOMING_NONE) {
diff --git a/migration/savevm.c b/migration/savevm.c
index cd6d6ba493..2578137ee7 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1494,6 +1494,8 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
                                     end_ts_each - start_ts_each);
     }
 
+    trace_vmstate_downtime_checkpoint("src-iterable-saved");
+
     return 0;
 }
 
@@ -1560,6 +1562,8 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
     json_writer_free(vmdesc);
     ms->vmdesc = NULL;
 
+    trace_vmstate_downtime_checkpoint("src-non-iterable-saved");
+
     return 0;
 }
 
@@ -2102,18 +2106,18 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
     Error *local_err = NULL;
     MigrationIncomingState *mis = opaque;
 
-    trace_loadvm_postcopy_handle_run_bh("enter");
+    trace_vmstate_downtime_checkpoint("dst-postcopy-bh-enter");
 
     /* TODO we should move all of this lot into postcopy_ram.c or a shared code
      * in migration.c
      */
     cpu_synchronize_all_post_init();
 
-    trace_loadvm_postcopy_handle_run_bh("after cpu sync");
+    trace_vmstate_downtime_checkpoint("dst-postcopy-bh-cpu-synced");
 
     qemu_announce_self(&mis->announce_timer, migrate_announce_params());
 
-    trace_loadvm_postcopy_handle_run_bh("after announce");
+    trace_vmstate_downtime_checkpoint("dst-postcopy-bh-announced");
 
     /* Make sure all file formats throw away their mutable metadata.
      * If we get an error here, just don't restart the VM yet. */
@@ -2124,7 +2128,7 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
         autostart = false;
     }
 
-    trace_loadvm_postcopy_handle_run_bh("after invalidate cache");
+    trace_vmstate_downtime_checkpoint("dst-postcopy-bh-cache-invalidated");
 
     dirty_bitmap_mig_before_vm_start();
 
@@ -2138,7 +2142,7 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
 
     qemu_bh_delete(mis->bh);
 
-    trace_loadvm_postcopy_handle_run_bh("return");
+    trace_vmstate_downtime_checkpoint("dst-postcopy-bh-vm-started");
 }
 
 /* After all discards we can start running and asking for pages */
diff --git a/migration/trace-events b/migration/trace-events
index 5820add1f3..e54f317e3b 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -17,7 +17,6 @@ loadvm_handle_recv_bitmap(char *s) "%s"
 loadvm_postcopy_handle_advise(void) ""
 loadvm_postcopy_handle_listen(const char *str) "%s"
 loadvm_postcopy_handle_run(void) ""
-loadvm_postcopy_handle_run_bh(const char *str) "%s"
 loadvm_postcopy_handle_resume(void) ""
 loadvm_postcopy_ram_handle_discard(void) ""
 loadvm_postcopy_ram_handle_discard_end(void) ""
@@ -50,6 +49,7 @@ vmstate_save(const char *idstr, const char *vmsd_name) "%s, %s"
 vmstate_load(const char *idstr, const char *vmsd_name) "%s, %s"
 vmstate_downtime_save(const char *type, const char *idstr, uint32_t instance_id, int64_t downtime) "type=%s idstr=%s instance_id=%d downtime=%"PRIi64
 vmstate_downtime_load(const char *type, const char *idstr, uint32_t instance_id, int64_t downtime) "type=%s idstr=%s instance_id=%d downtime=%"PRIi64
+vmstate_downtime_checkpoint(const char *checkpoint) "%s"
 postcopy_pause_incoming(void) ""
 postcopy_pause_incoming_continued(void) ""
 postcopy_page_req_sync(void *host_addr) "sync page req %p"
-- 
2.41.0




* Re: [PATCH v2 1/5] migration: Set downtime_start even for postcopy
  2023-10-30 16:33 ` [PATCH v2 1/5] migration: Set downtime_start even for postcopy Peter Xu
@ 2023-10-31 11:38   ` Juan Quintela
  0 siblings, 0 replies; 11+ messages in thread
From: Juan Quintela @ 2023-10-31 11:38 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Fabiano Rosas, Joao Martins

Peter Xu <peterx@redhat.com> wrote:
> Postcopy calculates its downtime separately.  It always sets
> MigrationState.downtime properly, but not MigrationState.downtime_start.
>
> Make postcopy do the same as other modes on properly recording the
> timestamp when the VM is going to be stopped.  Drop the temporary variable
> in postcopy_start() along the way.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

queued.




* Re: [PATCH v2 2/5] migration: Add migration_downtime_start|end() helpers
  2023-10-30 16:33 ` [PATCH v2 2/5] migration: Add migration_downtime_start|end() helpers Peter Xu
@ 2023-10-31 11:39   ` Juan Quintela
  0 siblings, 0 replies; 11+ messages in thread
From: Juan Quintela @ 2023-10-31 11:39 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Fabiano Rosas, Joao Martins

Peter Xu <peterx@redhat.com> wrote:
> Unify the three users on recording downtimes with the same pair of helpers.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

queued.




* Re: [PATCH v2 3/5] migration: Add per vmstate downtime tracepoints
  2023-10-30 16:33 ` [PATCH v2 3/5] migration: Add per vmstate downtime tracepoints Peter Xu
@ 2023-10-31 12:52   ` Juan Quintela
  0 siblings, 0 replies; 11+ messages in thread
From: Juan Quintela @ 2023-10-31 12:52 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Fabiano Rosas, Joao Martins

Peter Xu <peterx@redhat.com> wrote:
> We have a bunch of savevm_section* tracepoints, they're good to analyze
> migration stream, but not always suitable if someone would like to analyze
> the migration downtime.  Two major problems:
>
>   - savevm_section* tracepoints are dumping all sections, we only care
>     about the sections that contribute to the downtime
>
>   - They don't have an identifier to show the type of sections, so no way
>     to filter downtime information either easily.
>
> We can add type into the tracepoints, but instead of doing so, this patch
> kept them untouched, instead of adding a bunch of downtime specific
> tracepoints, so one can enable "vmstate_downtime*" tracepoints and get a
> full picture of how the downtime is distributed across iterative and
> non-iterative vmstate save/load.
>
> Note that here both save() and load() need to be traced, because both of
> them may contribute to the downtime.  The contribution is not a simple "add
> them together", though: consider when the src is doing a save() of device1
> while the dest can be load()ing for device2, so they can happen
> concurrently.
>
> Tracking both sides make sense because device load() and save() can be
> imbalanced, one device can save() super fast, but load() super slow, vice
> versa.  We can't figure that out without tracing both.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

queued.

>  static
>  int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
>  {
> +    int64_t start_ts_each, end_ts_each;
>      SaveStateEntry *se;
>      int ret;
>  
> @@ -1475,6 +1476,8 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
>                  continue;
>              }
>          }
> +
> +        start_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
I still think that:
           int64_t start_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);

>          trace_savevm_section_start(se->idstr, se->section_id);
>  
>          save_section_header(f, se, QEMU_VM_SECTION_END);
> @@ -1486,6 +1489,9 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
>              qemu_file_set_error(f, ret);
>              return -1;
>          }
> +        end_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);

and

           int64_t end_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);

is clearer.

Having to pass the type thing is not "pleasant", but I can't think of a
better way to do it.

Later, Juan.




* Re: [PATCH v2 4/5] migration: migration_stop_vm() helper
  2023-10-30 16:33 ` [PATCH v2 4/5] migration: migration_stop_vm() helper Peter Xu
@ 2023-10-31 12:53   ` Juan Quintela
  0 siblings, 0 replies; 11+ messages in thread
From: Juan Quintela @ 2023-10-31 12:53 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Fabiano Rosas, Joao Martins

Peter Xu <peterx@redhat.com> wrote:
> Provide a helper for non-COLO use case of migration to stop a VM.  This
> prepares for adding some downtime relevant tracepoints to migration, where
> they may or may not apply to COLO.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>




* Re: [PATCH v2 5/5] migration: Add tracepoints for downtime checkpoints
  2023-10-30 16:33 ` [PATCH v2 5/5] migration: Add tracepoints for downtime checkpoints Peter Xu
@ 2023-10-31 12:54   ` Juan Quintela
  0 siblings, 0 replies; 11+ messages in thread
From: Juan Quintela @ 2023-10-31 12:54 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Fabiano Rosas, Joao Martins

Peter Xu <peterx@redhat.com> wrote:
> This patch is inspired by Joao Martin's patch here:
>
> https://lore.kernel.org/r/20230926161841.98464-1-joao.m.martins@oracle.com
>
> Add tracepoints for major downtime checkpoints on both src and dst.  They
> share the same tracepoint with a string showing its stage.
>
> Besides the checkpoints in the previous patch, this patch also added
> destination checkpoints.
>
> On src, we have these checkpoints added:
>
>   - src-downtime-start: right before vm stops on src
>   - src-vm-stopped: after vm is fully stopped
>   - src-iterable-saved: after all iterables saved (END sections)
>   - src-non-iterable-saved: after all non-iterable saved (FULL sections)
>   - src-downtime-stop: migration fully completed
>
> On dst, we have these checkpoints added:
>
>   - dst-precopy-loadvm-completes: after loadvm all done for precopy
>   - dst-precopy-bh-*: record BH steps to resume VM for precopy
>   - dst-postcopy-bh-*: record BH steps to resume VM for postcopy
>
> On dst side, we don't have a good way to trace total time consumed by
> iterable or non-iterable for now.  We can mark it by 1st time receiving a
> FULL / END section, but rather than that let's just rely on the other
> tracepoints added for vmstates to back up the information.
>
> With this patch, one can enable "vmstate_downtime*" tracepoints and it'll
> enable all tracepoints for downtime measurements necessary.
>
> Drop loadvm_postcopy_handle_run_bh() tracepoint alongside, because they
> service the same purpose, which was only for postcopy.  We then have
> unified prefix for all downtime relevant tracepoints.
>
> Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



