[PATCH v2 08/13] migration/postcopy: Report fault latencies in blocktime

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Peter Xu <peterx@redhat.com>
To: qemu-devel@nongnu.org
Cc: Alexey Perevalov <a.perevalov@samsung.com>,
	Juraj Marcin <jmarcin@redhat.com>,
	"Dr . David Alan Gilbert" <dave@treblig.org>,
	peterx@redhat.com, Fabiano Rosas <farosas@suse.de>,
	Markus Armbruster <armbru@redhat.com>
Subject: [PATCH v2 08/13] migration/postcopy: Report fault latencies in blocktime
Date: Mon,  9 Jun 2025 15:12:54 -0400	[thread overview]
Message-ID: <20250609191259.9053-9-peterx@redhat.com> (raw)
In-Reply-To: <20250609191259.9053-1-peterx@redhat.com>

Blocktime so far only cares about the time one vcpu (or the whole system)
got blocked.  It would be also be helpful if it can also report the latency
of page requests, which could be very sensitive during postcopy.

Blocktime itself is sometimes not very important, especially when one
thinks about KVM async PF support, which means vCPUs are literally almost
not blocked at all because the guest OS is smart enough to switch to
another task when a remote fault is needed.

However, latency is still sensitive and important because even if the guest
vCPU is running on threads that do not need a remote fault, the workload
that accesses some missing page is still affected.

Add two entries to the report, showing how long it takes to resolve a
remote fault.  Mention in the QAPI doc that this is not the real average
fault latency, but only the ones that was requested for a remote fault.

Unwrap get_vcpu_blocktime_list() so we don't need to walk the list twice,
meanwhile add the entry checks in qtests for all postcopy tests.

Cc: Markus Armbruster <armbru@redhat.com>
Cc: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 qapi/migration.json                   | 13 +++++
 migration/migration-hmp-cmds.c        | 68 ++++++++++++++++++---------
 migration/postcopy-ram.c              | 48 +++++++++++++------
 tests/qtest/migration/migration-qmp.c |  3 ++
 4 files changed, 96 insertions(+), 36 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 4963f6ca12..e95b7402cb 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -236,6 +236,17 @@
 #     This is only present when the postcopy-blocktime migration
 #     capability is enabled.  (Since 3.0)
 #
+# @postcopy-latency: average remote page fault latency (in us).  Note that
+#     this doesn't include all faults, but only the ones that require a
+#     remote page request.  So it should be always bigger than the real
+#     average page fault latency. This is only present when the
+#     postcopy-blocktime migration capability is enabled.  (Since 10.1)
+#
+# @postcopy-vcpu-latency: average remote page fault latency per vCPU (in
+#     us).  It has the same definition of @postcopy-latency, but instead
+#     this is the per-vCPU statistics.  This is only present when the
+#     postcopy-blocktime migration capability is enabled.  (Since 10.1)
+#
 # @socket-address: Only used for tcp, to know what the real port is
 #     (Since 4.0)
 #
@@ -275,6 +286,8 @@
            '*blocked-reasons': ['str'],
            '*postcopy-blocktime': 'uint32',
            '*postcopy-vcpu-blocktime': ['uint32'],
+           '*postcopy-latency': 'uint64',
+           '*postcopy-vcpu-latency': ['uint64'],
            '*socket-address': ['SocketAddress'],
            '*dirty-limit-throttle-time-per-round': 'uint64',
            '*dirty-limit-ring-full-time': 'uint64'} }
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 6c36e202a0..600b0f8071 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -52,6 +52,52 @@ static void migration_global_dump(Monitor *mon)
                    ms->clear_bitmap_shift);
 }
 
+static void migration_dump_blocktime(Monitor *mon, MigrationInfo *info)
+{
+    if (info->has_postcopy_blocktime) {
+        monitor_printf(mon, "Postcopy Blocktime (ms): %" PRIu32 "\n",
+                       info->postcopy_blocktime);
+    }
+
+    if (info->has_postcopy_vcpu_blocktime) {
+        uint32List *item = info->postcopy_vcpu_blocktime;
+        const char *sep = "";
+        int count = 0;
+
+        monitor_printf(mon, "Postcopy vCPU Blocktime (ms): \n [");
+
+        while (item) {
+            monitor_printf(mon, "%s%"PRIu32, sep, item->value);
+            item = item->next;
+            /* Each line 10 vcpu results, newline if there's more */
+            sep = ((++count % 10 == 0) && item) ? ",\n  " : ", ";
+        }
+        monitor_printf(mon, "]\n");
+    }
+
+    if (info->has_postcopy_latency) {
+        monitor_printf(mon, "Postcopy Latency (us): %" PRIu64 "\n",
+                       info->postcopy_latency);
+    }
+
+    if (info->has_postcopy_vcpu_latency) {
+        uint64List *item = info->postcopy_vcpu_latency;
+        int count = 0;
+
+        monitor_printf(mon, "Postcopy vCPU Latencies (us): \n [");
+
+        while (item) {
+            monitor_printf(mon, "%"PRIu64", ", item->value);
+            item = item->next;
+            /* Each line 10 vcpu results, newline if there's more */
+            if ((++count % 10 == 0) && item) {
+                monitor_printf(mon, "\n  ");
+            }
+        }
+        monitor_printf(mon, "\b\b]\n");
+    }
+}
+
 void hmp_info_migrate(Monitor *mon, const QDict *qdict)
 {
     bool show_all = qdict_get_try_bool(qdict, "all", false);
@@ -202,27 +248,7 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
                        info->dirty_limit_ring_full_time);
     }
 
-    if (info->has_postcopy_blocktime) {
-        monitor_printf(mon, "Postcopy Blocktime (ms): %" PRIu32 "\n",
-                       info->postcopy_blocktime);
-    }
-
-    if (info->has_postcopy_vcpu_blocktime) {
-        uint32List *item = info->postcopy_vcpu_blocktime;
-        const char *sep = "";
-        int count = 0;
-
-        monitor_printf(mon, "Postcopy vCPU Blocktime (ms): \n [");
-
-        while (item) {
-            monitor_printf(mon, "%s%"PRIu32, sep, item->value);
-            item = item->next;
-            /* Each line 10 vcpu results, newline if there's more */
-            sep = ((++count % 10 == 0) && item) ? ",\n  " : ", ";
-        }
-        monitor_printf(mon, "]\n");
-    }
-
+    migration_dump_blocktime(mon, info);
 out:
     qapi_free_MigrationInfo(info);
 }
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index f5c58a6ca7..98d4c29532 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -166,20 +166,6 @@ static struct PostcopyBlocktimeContext *blocktime_context_new(void)
     return ctx;
 }
 
-static uint32List *get_vcpu_blocktime_list(PostcopyBlocktimeContext *ctx)
-{
-    MachineState *ms = MACHINE(qdev_get_machine());
-    uint32List *list = NULL;
-    int i;
-
-    for (i = ms->smp.cpus - 1; i >= 0; i--) {
-        QAPI_LIST_PREPEND(
-            list, (uint32_t)(ctx->vcpu_blocktime_total[i] / 1000));
-    }
-
-    return list;
-}
-
 /*
  * This function just populates MigrationInfo from postcopy's
  * blocktime context. It will not populate MigrationInfo,
@@ -191,15 +177,47 @@ void fill_destination_postcopy_migration_info(MigrationInfo *info)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
     PostcopyBlocktimeContext *bc = mis->blocktime_ctx;
+    MachineState *ms = MACHINE(qdev_get_machine());
+    uint64_t latency_total = 0, faults = 0;
+    uint32List *list_blocktime = NULL;
+    uint64List *list_latency = NULL;
+    int i;
 
     if (!bc) {
         return;
     }
 
+    for (i = ms->smp.cpus - 1; i >= 0; i--) {
+        uint64_t latency, total, count;
+
+        /* This is in milliseconds */
+        QAPI_LIST_PREPEND(list_blocktime,
+                          (uint32_t)(bc->vcpu_blocktime_total[i] / 1000));
+
+        /* The rest in microseconds */
+        total = bc->vcpu_blocktime_total[i];
+        latency_total += total;
+        count = bc->vcpu_faults_count[i];
+        faults += count;
+
+        if (count) {
+            latency = total / count;
+        } else {
+            /* No fault detected */
+            latency = 0;
+        }
+
+        QAPI_LIST_PREPEND(list_latency, latency);
+    }
+
     info->has_postcopy_blocktime = true;
     info->postcopy_blocktime = (uint32_t)(bc->total_blocktime / 1000);
     info->has_postcopy_vcpu_blocktime = true;
-    info->postcopy_vcpu_blocktime = get_vcpu_blocktime_list(bc);
+    info->postcopy_vcpu_blocktime = list_blocktime;
+    info->has_postcopy_latency = true;
+    info->postcopy_latency = faults ? (latency_total / faults) : 0;
+    info->has_postcopy_vcpu_latency = true;
+    info->postcopy_vcpu_latency = list_latency;
 }
 
 static uint64_t get_postcopy_total_blocktime(void)
diff --git a/tests/qtest/migration/migration-qmp.c b/tests/qtest/migration/migration-qmp.c
index fb59741b2c..1a5ab2d229 100644
--- a/tests/qtest/migration/migration-qmp.c
+++ b/tests/qtest/migration/migration-qmp.c
@@ -358,6 +358,9 @@ void read_blocktime(QTestState *who)
 
     rsp_return = migrate_query_not_failed(who);
     g_assert(qdict_haskey(rsp_return, "postcopy-blocktime"));
+    g_assert(qdict_haskey(rsp_return, "postcopy-vcpu-blocktime"));
+    g_assert(qdict_haskey(rsp_return, "postcopy-latency"));
+    g_assert(qdict_haskey(rsp_return, "postcopy-vcpu-latency"));
     qobject_unref(rsp_return);
 }
 
-- 
2.49.0

next prev parent reply	other threads:[~2025-06-09 19:16 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-09 19:12 [PATCH v2 00/13] migration/postcopy: Blocktime tracking overhaul Peter Xu
2025-06-09 19:12 ` [PATCH v2 01/13] migration: Add option to set postcopy-blocktime Peter Xu
2025-06-09 19:12 ` [PATCH v2 02/13] migration/postcopy: Push blocktime start/end into page req mutex Peter Xu
2025-06-09 19:12 ` [PATCH v2 03/13] migration/postcopy: Drop all atomic ops in blocktime feature Peter Xu
2025-06-09 19:12 ` [PATCH v2 04/13] migration/postcopy: Make all blocktime vars 64bits Peter Xu
2025-06-09 19:12 ` [PATCH v2 05/13] migration/postcopy: Drop PostcopyBlocktimeContext.start_time Peter Xu
2025-06-09 19:12 ` [PATCH v2 06/13] migration/postcopy: Bring blocktime layer to us level Peter Xu
2025-06-09 19:12 ` [PATCH v2 07/13] migration/postcopy: Add blocktime fault counts per-vcpu Peter Xu
2025-06-09 19:12 ` Peter Xu [this message]
2025-06-09 22:05   ` [PATCH v2 08/13] migration/postcopy: Report fault latencies in blocktime Peter Xu
2025-06-09 22:25     ` Peter Xu
2025-06-10  0:08   ` Dr. David Alan Gilbert
2025-06-10 13:39     ` Peter Xu
2025-06-10 13:53       ` Dr. David Alan Gilbert
2025-06-10 14:08         ` Peter Xu
2025-06-09 19:12 ` [PATCH v2 09/13] migration/postcopy: Initialize blocktime context only until listen Peter Xu
2025-06-09 19:12 ` [PATCH v2 10/13] migration/postcopy: Cache the tid->vcpu mapping for blocktime Peter Xu
2025-06-09 19:12 ` [PATCH v2 11/13] migration/postcopy: Cleanup the total blocktime accounting Peter Xu
2025-06-09 19:12 ` [PATCH v2 12/13] migration/postcopy: Optimize blocktime fault tracking with hashtable Peter Xu
2025-06-09 19:12 ` [PATCH v2 13/13] migration/postcopy: blocktime allows track / report non-vCPU faults Peter Xu

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:4963f6ca1 dfblob:e95b7402c dfblob:6c36e202a dfblob:600b0f807
dfblob:f5c58a6ca dfblob:98d4c2953 dfblob:fb59741b2 dfblob:1a5ab2d22 )
 OR (
bs:"[PATCH v2 08/13] migration/postcopy: Report fault latencies in blocktime" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250609191259.9053-9-peterx@redhat.com \
    --to=peterx@redhat.com \
    --cc=a.perevalov@samsung.com \
    --cc=armbru@redhat.com \
    --cc=dave@treblig.org \
    --cc=farosas@suse.de \
    --cc=jmarcin@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).