[PATCH 13/13] migration/postcopy: blocktime allows track / report non-vCPU faults

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Peter Xu <peterx@redhat.com>
To: qemu-devel@nongnu.org
Cc: "Dr . David Alan Gilbert" <dave@treblig.org>,
	peterx@redhat.com, Alexey Perevalov <a.perevalov@samsung.com>,
	Fabiano Rosas <farosas@suse.de>,
	Juraj Marcin <jmarcin@redhat.com>,
	Markus Armbruster <armbru@redhat.com>
Subject: [PATCH 13/13] migration/postcopy: blocktime allows track / report non-vCPU faults
Date: Tue, 27 May 2025 19:12:48 -0400	[thread overview]
Message-ID: <20250527231248.1279174-14-peterx@redhat.com> (raw)
In-Reply-To: <20250527231248.1279174-1-peterx@redhat.com>

When used to report page fault latencies, the blocktime feature can be
almost useless when KVM async page fault is enabled, because in most cases
such remote fault will kickoff async page faults, then it's not trackable
from blocktime layer.

After all these recent rewrites to blocktime layer, it's finally so easy to
also support tracking non-vCPU faults.  It'll be even faster if we could
always index fault records with TIDs, unfortunately we need to maintain the
blocktime API which report things in vCPU indexes.

Of course this can work not only for kworkers, but also any guest accesses
that may reach a missing page, for example, very likely when in the QEMU
main thread too (and all other threads whenever applicable).

In this case, we don't care about "how long the threads are blocked", but
we only care about "how long the fault will be resolved".

Cc: Markus Armbruster <armbru@redhat.com>
Cc: Dr. David Alan Gilbert <dave@treblig.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 qapi/migration.json                   |  7 +++
 migration/migration-hmp-cmds.c        |  5 +++
 migration/postcopy-ram.c              | 64 +++++++++++++++++++++------
 tests/qtest/migration/migration-qmp.c |  1 +
 migration/trace-events                |  2 +-
 5 files changed, 64 insertions(+), 15 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 8b13cea169..f59c473842 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -247,6 +247,12 @@
 #     this is the per-vCPU statistics. This is only present when the
 #     postcopy-blocktime migration capability is enabled.  (Since 10.1)
 #
+# @postcopy-non-vcpu-latency: average remote page fault latency for all
+#     faults happend in non-vCPU threads (in us).  It has the same
+#     definition of @postcopy-latency but this only provides statistics to
+#     non-vCPU faults. This is only present when the postcopy-blocktime
+#     migration capability is enabled.  (Since 10.1)
+#
 # @socket-address: Only used for tcp, to know what the real port is
 #     (Since 4.0)
 #
@@ -288,6 +294,7 @@
            '*postcopy-vcpu-blocktime': ['uint32'],
            '*postcopy-latency': 'uint64',
            '*postcopy-vcpu-latency': ['uint64'],
+           '*postcopy-non-vcpu-latency': 'uint64',
            '*socket-address': ['SocketAddress'],
            '*dirty-limit-throttle-time-per-round': 'uint64',
            '*dirty-limit-ring-full-time': 'uint64'} }
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index a18049a7e8..bdfc754d70 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -81,6 +81,11 @@ static void migration_dump_blocktime(Monitor *mon, MigrationInfo *info)
                        info->postcopy_latency);
     }
 
+    if (info->has_postcopy_non_vcpu_latency) {
+        monitor_printf(mon, "Postcopy non-vCPU Latencies (us): %" PRIu64 "\n",
+                       info->postcopy_non_vcpu_latency);
+    }
+
     if (info->has_postcopy_vcpu_latency) {
         uint64List *item = info->postcopy_vcpu_latency;
         int count = 0;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 066dc9f3ce..bc6736f262 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -192,6 +192,8 @@ typedef struct PostcopyBlocktimeContext {
     GHashTable *tid_to_vcpu_hash;
     /* Count of non-vCPU faults.  This is only for debugging purpose. */
     uint64_t non_vcpu_faults;
+    /* total blocktime when a non-vCPU thread is stopped */
+    uint64_t non_vcpu_blocktime_total;
 
     /*
      * Handler for exit event, necessary for
@@ -203,7 +205,10 @@ typedef struct PostcopyBlocktimeContext {
 typedef struct {
     /* The time the fault was triggered */
     uint64_t fault_time;
-    /* The vCPU index that was blocked */
+    /*
+     * The vCPU index that was blocked, when cpu==-1, it means it's a
+     * fault from non-vCPU threads.
+     */
     int cpu;
 } BlocktimeVCPUEntry;
 
@@ -344,6 +349,12 @@ void fill_destination_postcopy_migration_info(MigrationInfo *info)
         QAPI_LIST_PREPEND(list_latency, latency);
     }
 
+    latency_total += bc->non_vcpu_blocktime_total;
+    faults += bc->non_vcpu_faults;
+
+    info->has_postcopy_non_vcpu_latency = true;
+    info->postcopy_non_vcpu_latency = bc->non_vcpu_faults ?
+        (bc->non_vcpu_blocktime_total / bc->non_vcpu_faults) : 0;
     info->has_postcopy_blocktime = true;
     info->postcopy_blocktime = (uint32_t)(bc->total_blocktime / 1000);
     info->has_postcopy_vcpu_blocktime = true;
@@ -982,7 +993,10 @@ static uint64_t get_current_us(void)
     return (uint64_t)qemu_clock_get_us(QEMU_CLOCK_REALTIME);
 }
 
-/* Inject an (cpu, fault_time) entry into the database, using addr as key */
+/*
+ * Inject an (cpu, fault_time) entry into the database, using addr as key.
+ * When cpu==-1, it means it's a non-vCPU fault.
+ */
 static void blocktime_fault_inject(PostcopyBlocktimeContext *ctx,
                                    uintptr_t addr, int cpu, uint64_t time)
 {
@@ -1065,9 +1079,17 @@ void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
         /* Making sure it won't overflow - it really should never! */
         assert(dc->vcpu_faults_current[cpu] <= 255);
     } else {
-        /* We do not support non-vCPU thread tracking yet */
+        /*
+         * For non-vCPU thread faults, we don't care about tid or cpu index
+         * or time the thread is blocked (e.g., a kworker trying to help
+         * KVM when async_pf=on is OK to be blocked and not affect guest
+         * responsiveness), but we care about latency.  Track it with
+         * cpu=-1.
+         *
+         * Note that this will NOT affect blocktime reports on vCPU being
+         * blocked, but only about system-wise latency reports.
+         */
         dc->non_vcpu_faults++;
-        return;
     }
 
     blocktime_fault_inject(dc, addr, cpu, current_us);
@@ -1077,6 +1099,7 @@ typedef struct {
     PostcopyBlocktimeContext *ctx;
     uint64_t current_us;
     int affected_cpus;
+    int affected_non_cpus;
 } BlockTimeVCPUIter;
 
 static void blocktime_cpu_list_iter_fn(gpointer data, gpointer user_data)
@@ -1084,6 +1107,7 @@ static void blocktime_cpu_list_iter_fn(gpointer data, gpointer user_data)
     BlockTimeVCPUIter *iter = user_data;
     PostcopyBlocktimeContext *ctx = iter->ctx;
     BlocktimeVCPUEntry *entry = data;
+    uint64_t time_passed;
     int cpu = entry->cpu;
 
     /*
@@ -1091,17 +1115,27 @@ static void blocktime_cpu_list_iter_fn(gpointer data, gpointer user_data)
      * later than when it was faulted.
      */
     assert(iter->current_us >= entry->fault_time);
+    time_passed = iter->current_us - entry->fault_time;
 
-    /*
-     * If we resolved all pending faults on one vCPU due to this page
-     * resolution, take a note.
-     */
-    if (--ctx->vcpu_faults_current[cpu] == 0) {
-        ctx->vcpu_blocktime_total[cpu] += iter->current_us - entry->fault_time;
-        iter->affected_cpus += 1;
+    if (cpu >= 0) {
+        /*
+         * If we resolved all pending faults on one vCPU due to this page
+         * resolution, take a note.
+         */
+        if (--ctx->vcpu_faults_current[cpu] == 0) {
+            ctx->vcpu_blocktime_total[cpu] += time_passed;
+            iter->affected_cpus += 1;
+        }
+        trace_postcopy_blocktime_end_one(cpu, ctx->vcpu_faults_current[cpu]);
+    } else {
+        iter->affected_non_cpus++;
+        ctx->non_vcpu_blocktime_total += time_passed;
+        /*
+         * We do not maintain how many pending non-vCPU faults because we
+         * do not care about blocktime, only latency.
+         */
+        trace_postcopy_blocktime_end_one(-1, 0);
     }
-
-    trace_postcopy_blocktime_end_one(cpu, ctx->vcpu_faults_current[cpu]);
 }
 
 /*
@@ -1140,6 +1174,7 @@ static void mark_postcopy_blocktime_end(uintptr_t addr)
     BlockTimeVCPUIter iter = {
         .current_us = get_current_us(),
         .affected_cpus = 0,
+        .affected_non_cpus = 0,
         .ctx = dc,
     };
     gpointer key = (gpointer)addr;
@@ -1173,7 +1208,8 @@ static void mark_postcopy_blocktime_end(uintptr_t addr)
     }
     dc->smp_cpus_down -= iter.affected_cpus;
 
-    trace_postcopy_blocktime_end(addr, iter.current_us, iter.affected_cpus);
+    trace_postcopy_blocktime_end(addr, iter.current_us, iter.affected_cpus,
+                                 iter.affected_non_cpus);
 }
 
 static void postcopy_pause_fault_thread(MigrationIncomingState *mis)
diff --git a/tests/qtest/migration/migration-qmp.c b/tests/qtest/migration/migration-qmp.c
index 1a5ab2d229..67a67d4bd6 100644
--- a/tests/qtest/migration/migration-qmp.c
+++ b/tests/qtest/migration/migration-qmp.c
@@ -361,6 +361,7 @@ void read_blocktime(QTestState *who)
     g_assert(qdict_haskey(rsp_return, "postcopy-vcpu-blocktime"));
     g_assert(qdict_haskey(rsp_return, "postcopy-latency"));
     g_assert(qdict_haskey(rsp_return, "postcopy-vcpu-latency"));
+    g_assert(qdict_haskey(rsp_return, "postcopy-non-vcpu-latency"));
     qobject_unref(rsp_return);
 }
 
diff --git a/migration/trace-events b/migration/trace-events
index a36a78f01a..706db97def 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -310,7 +310,7 @@ postcopy_preempt_thread_entry(void) ""
 postcopy_preempt_thread_exit(void) ""
 postcopy_blocktime_tid_cpu_map(int cpu, uint32_t tid) "cpu: %d, tid: %u"
 postcopy_blocktime_begin(uint64_t addr, uint64_t time, int cpu, bool exists) "addr: 0x%" PRIx64 ", time: %" PRIu64 ", cpu: %d, exist: %d"
-postcopy_blocktime_end(uint64_t addr, uint64_t time, int affected_cpu) "addr: 0x%" PRIx64 ", time: %" PRIu64 ", affected_cpus: %d"
+postcopy_blocktime_end(uint64_t addr, uint64_t time, int affected_cpu, int affected_non_cpus) "addr: 0x%" PRIx64 ", time: %" PRIu64 ", affected_cpus: %d, affected_non_cpus: %d"
 postcopy_blocktime_end_one(int cpu, uint8_t left_faults) "cpu: %d, left_faults: %" PRIu8
 
 # exec.c
-- 
2.49.0

next prev parent reply	other threads:[~2025-05-27 23:14 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-27 23:12 [PATCH 00/13] migration/postcopy: Blocktime tracking overhaul Peter Xu
2025-05-27 23:12 ` [PATCH 01/13] migration: Add option to set postcopy-blocktime Peter Xu
2025-06-02 16:50   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 02/13] migration/postcopy: Push blocktime start/end into page req mutex Peter Xu
2025-06-02 17:46   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 03/13] migration/postcopy: Drop all atomic ops in blocktime feature Peter Xu
2025-06-02 18:00   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 04/13] migration/postcopy: Make all blocktime vars 64bits Peter Xu
2025-06-02 20:42   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 05/13] migration/postcopy: Drop PostcopyBlocktimeContext.start_time Peter Xu
2025-06-03 15:52   ` Fabiano Rosas
2025-06-03 15:55     ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 06/13] migration/postcopy: Bring blocktime layer to us level Peter Xu
2025-06-03 15:54   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 07/13] migration/postcopy: Add blocktime fault counts per-vcpu Peter Xu
2025-06-03 15:59   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 08/13] migration/postcopy: Report fault latencies in blocktime Peter Xu
2025-06-02  9:26   ` Markus Armbruster
2025-06-02 16:29     ` Peter Xu
2025-06-03 16:07   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 09/13] migration/postcopy: Initialize blocktime context only until listen Peter Xu
2025-06-03 16:08   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 10/13] migration/postcopy: Cache the tid->vcpu mapping for blocktime Peter Xu
2025-06-03 16:20   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 11/13] migration/postcopy: Cleanup the total blocktime accounting Peter Xu
2025-06-03 16:22   ` Fabiano Rosas
2025-05-27 23:12 ` [PATCH 12/13] migration/postcopy: Optimize blocktime fault tracking with hashtable Peter Xu
2025-06-03 16:44   ` Fabiano Rosas
2025-05-27 23:12 ` Peter Xu [this message]
2025-06-03 16:50   ` [PATCH 13/13] migration/postcopy: blocktime allows track / report non-vCPU faults Fabiano Rosas

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:8b13cea16 dfblob:f59c47384 dfblob:a18049a7e dfblob:bdfc754d7
dfblob:066dc9f3c dfblob:bc6736f26 dfblob:1a5ab2d22 dfblob:67a67d4bd
dfblob:a36a78f01 dfblob:706db97de )
 OR (
bs:"[PATCH 13/13] migration/postcopy: blocktime allows track / report non-vCPU faults" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250527231248.1279174-14-peterx@redhat.com \
    --to=peterx@redhat.com \
    --cc=a.perevalov@samsung.com \
    --cc=armbru@redhat.com \
    --cc=dave@treblig.org \
    --cc=farosas@suse.de \
    --cc=jmarcin@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).