From: Peter Xu <peterx@redhat.com>
To: qemu-devel@nongnu.org
Cc: peterx@redhat.com, Fabiano Rosas <farosas@suse.de>,
Joao Martins <joao.m.martins@oracle.com>,
Juan Quintela <quintela@redhat.com>
Subject: [PATCH v2 5/5] migration: Add tracepoints for downtime checkpoints
Date: Mon, 30 Oct 2023 12:33:46 -0400
Message-ID: <20231030163346.765724-6-peterx@redhat.com>
In-Reply-To: <20231030163346.765724-1-peterx@redhat.com>
This patch is inspired by Joao Martins's patch here:
https://lore.kernel.org/r/20230926161841.98464-1-joao.m.martins@oracle.com
Add tracepoints for major downtime checkpoints on both src and dst. They
share the same tracepoint with a string showing its stage.
Besides the checkpoints in the previous patch, this patch also adds
destination checkpoints.
On src, we have these checkpoints added:
- src-downtime-start: right before vm stops on src
- src-vm-stopped: after vm is fully stopped
- src-iterable-saved: after all iterables saved (END sections)
- src-non-iterable-saved: after all non-iterable saved (FULL sections)
- src-downtime-end: migration fully completed
On dst, we have these checkpoints added:
- dst-precopy-loadvm-completed: after loadvm all done for precopy
- dst-precopy-bh-*: record BH steps to resume VM for precopy
- dst-postcopy-bh-*: record BH steps to resume VM for postcopy
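As an illustration of how these checkpoints might be consumed, here is a
minimal sketch that computes the time between two named checkpoints. It
is not part of this patch or of QEMU: the Checkpoint type and
checkpoint_delta_us() are hypothetical, and the timestamps are assumed to
be collected by the caller (for example from a trace log) as
non-negative microsecond values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record type for one observed checkpoint; not a QEMU type. */
typedef struct {
    const char *stage;  /* checkpoint string, e.g. "src-vm-stopped" */
    int64_t ts_us;      /* caller-supplied timestamp, assumed >= 0 */
} Checkpoint;

/*
 * Return the time elapsed between the first occurrence of 'from' and the
 * first occurrence of 'to' that follows it.  Return -1 if either stage
 * is missing, if 'to' never appears after 'from', or if the timestamps
 * go backwards.
 */
static int64_t checkpoint_delta_us(const Checkpoint *cp, size_t n,
                                   const char *from, const char *to)
{
    int64_t t_from = -1;
    int64_t t_to = -1;

    assert(cp != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (t_from < 0 && strcmp(cp[i].stage, from) == 0) {
            t_from = cp[i].ts_us;
        } else if (t_from >= 0 && t_to < 0 &&
                   strcmp(cp[i].stage, to) == 0) {
            t_to = cp[i].ts_us;
        }
    }

    if (t_from < 0 || t_to < 0 || t_to < t_from) {
        return -1;
    }
    return t_to - t_from;
}
```

With records for "src-downtime-start" and "src-downtime-end" (the strings
used in the diff below), the result would be the total downtime as seen
from the source; pairs of intermediate checkpoints break it into stages.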
On the dst side, we don't yet have a good way to trace the total time
consumed by iterable or non-iterable state. We could mark it by the first
time a FULL / END section is received, but instead let's just rely on the
other tracepoints added for vmstates to back up the information.
With this patch, one can enable the "vmstate_downtime*" tracepoints to
turn on every tracepoint needed for downtime measurements.
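Since every checkpoint goes through one tracepoint whose only argument is
the stage string, a post-processing tool only needs to recognise a single
event name. A minimal sketch follows; checkpoint_stage() is a
hypothetical helper, not QEMU code, and it assumes a text log where each
line contains the event name followed by a space and the "%s" argument,
possibly preceded by a backend-specific prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Event name as declared in migration/trace-events, plus the separator. */
#define CHECKPOINT_EVENT "vmstate_downtime_checkpoint "

/*
 * Return a pointer to the stage string inside 'line', or NULL if the
 * line is not a checkpoint event.  The returned pointer aliases 'line';
 * any trailing newline is left for the caller to strip.
 */
static const char *checkpoint_stage(const char *line)
{
    const char *p;

    assert(line != NULL);

    p = strstr(line, CHECKPOINT_EVENT);
    return p ? p + strlen(CHECKPOINT_EVENT) : NULL;
}
```

Lines from the per-vmstate tracepoints (vmstate_downtime_save and
vmstate_downtime_load) do not match and yield NULL, so they can be
handled separately.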
Drop the loadvm_postcopy_handle_run_bh() tracepoint alongside, because it
serves the same purpose, but only for postcopy. We then have a unified
prefix for all downtime-relevant tracepoints.
Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 16 +++++++++++++++-
migration/savevm.c | 14 +++++++++-----
migration/trace-events | 2 +-
3 files changed, 25 insertions(+), 7 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 9013c1b500..52f4ed41b2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -103,6 +103,7 @@ static int close_return_path_on_source(MigrationState *s);
static void migration_downtime_start(MigrationState *s)
{
+ trace_vmstate_downtime_checkpoint("src-downtime-start");
s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
}
@@ -117,6 +118,8 @@ static void migration_downtime_end(MigrationState *s)
if (!s->downtime) {
s->downtime = now - s->downtime_start;
}
+
+ trace_vmstate_downtime_checkpoint("src-downtime-end");
}
static bool migration_needs_multiple_sockets(void)
@@ -151,7 +154,11 @@ static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
int migration_stop_vm(RunState state)
{
- return vm_stop_force_state(state);
+ int ret = vm_stop_force_state(state);
+
+ trace_vmstate_downtime_checkpoint("src-vm-stopped");
+
+ return ret;
}
void migration_object_init(void)
@@ -500,6 +507,8 @@ static void process_incoming_migration_bh(void *opaque)
Error *local_err = NULL;
MigrationIncomingState *mis = opaque;
+ trace_vmstate_downtime_checkpoint("dst-precopy-bh-enter");
+
/* If capability late_block_activate is set:
* Only fire up the block code now if we're going to restart the
* VM, else 'cont' will do it.
@@ -525,6 +534,8 @@ static void process_incoming_migration_bh(void *opaque)
*/
qemu_announce_self(&mis->announce_timer, migrate_announce_params());
+ trace_vmstate_downtime_checkpoint("dst-precopy-bh-announced");
+
multifd_load_shutdown();
dirty_bitmap_mig_before_vm_start();
@@ -542,6 +553,7 @@ static void process_incoming_migration_bh(void *opaque)
} else {
runstate_set(global_state_get_runstate());
}
+ trace_vmstate_downtime_checkpoint("dst-precopy-bh-vm-started");
/*
* This must happen after any state changes since as soon as an external
* observer sees this event they might start to prod at the VM assuming
@@ -576,6 +588,8 @@ process_incoming_migration_co(void *opaque)
ret = qemu_loadvm_state(mis->from_src_file);
mis->loadvm_co = NULL;
+ trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
+
ps = postcopy_state_get();
trace_process_incoming_migration_co_end(ret, ps);
if (ps != POSTCOPY_INCOMING_NONE) {
diff --git a/migration/savevm.c b/migration/savevm.c
index cd6d6ba493..2578137ee7 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1494,6 +1494,8 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
end_ts_each - start_ts_each);
}
+ trace_vmstate_downtime_checkpoint("src-iterable-saved");
+
return 0;
}
@@ -1560,6 +1562,8 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
json_writer_free(vmdesc);
ms->vmdesc = NULL;
+ trace_vmstate_downtime_checkpoint("src-non-iterable-saved");
+
return 0;
}
@@ -2102,18 +2106,18 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
Error *local_err = NULL;
MigrationIncomingState *mis = opaque;
- trace_loadvm_postcopy_handle_run_bh("enter");
+ trace_vmstate_downtime_checkpoint("dst-postcopy-bh-enter");
/* TODO we should move all of this lot into postcopy_ram.c or a shared code
* in migration.c
*/
cpu_synchronize_all_post_init();
- trace_loadvm_postcopy_handle_run_bh("after cpu sync");
+ trace_vmstate_downtime_checkpoint("dst-postcopy-bh-cpu-synced");
qemu_announce_self(&mis->announce_timer, migrate_announce_params());
- trace_loadvm_postcopy_handle_run_bh("after announce");
+ trace_vmstate_downtime_checkpoint("dst-postcopy-bh-announced");
/* Make sure all file formats throw away their mutable metadata.
* If we get an error here, just don't restart the VM yet. */
@@ -2124,7 +2128,7 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
autostart = false;
}
- trace_loadvm_postcopy_handle_run_bh("after invalidate cache");
+ trace_vmstate_downtime_checkpoint("dst-postcopy-bh-cache-invalidated");
dirty_bitmap_mig_before_vm_start();
@@ -2138,7 +2142,7 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
qemu_bh_delete(mis->bh);
- trace_loadvm_postcopy_handle_run_bh("return");
+ trace_vmstate_downtime_checkpoint("dst-postcopy-bh-vm-started");
}
/* After all discards we can start running and asking for pages */
diff --git a/migration/trace-events b/migration/trace-events
index 5820add1f3..e54f317e3b 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -17,7 +17,6 @@ loadvm_handle_recv_bitmap(char *s) "%s"
loadvm_postcopy_handle_advise(void) ""
loadvm_postcopy_handle_listen(const char *str) "%s"
loadvm_postcopy_handle_run(void) ""
-loadvm_postcopy_handle_run_bh(const char *str) "%s"
loadvm_postcopy_handle_resume(void) ""
loadvm_postcopy_ram_handle_discard(void) ""
loadvm_postcopy_ram_handle_discard_end(void) ""
@@ -50,6 +49,7 @@ vmstate_save(const char *idstr, const char *vmsd_name) "%s, %s"
vmstate_load(const char *idstr, const char *vmsd_name) "%s, %s"
vmstate_downtime_save(const char *type, const char *idstr, uint32_t instance_id, int64_t downtime) "type=%s idstr=%s instance_id=%d downtime=%"PRIi64
vmstate_downtime_load(const char *type, const char *idstr, uint32_t instance_id, int64_t downtime) "type=%s idstr=%s instance_id=%d downtime=%"PRIi64
+vmstate_downtime_checkpoint(const char *checkpoint) "%s"
postcopy_pause_incoming(void) ""
postcopy_pause_incoming_continued(void) ""
postcopy_page_req_sync(void *host_addr) "sync page req %p"
--
2.41.0