Re: [PATCH] migration/postcopy: Add latency distribution report for blocktime

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: "Dr. David Alan Gilbert" <dave@treblig.org>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
	Juraj Marcin <jmarcin@redhat.com>,
	Markus Armbruster <armbru@redhat.com>
Subject: Re: [PATCH] migration/postcopy: Add latency distribution report for blocktime
Date: Tue, 10 Jun 2025 00:27:19 +0000	[thread overview]
Message-ID: <aEd7526urkX56eq0@gallifrey> (raw)
In-Reply-To: <20250609223607.34387-1-peterx@redhat.com>

* Peter Xu (peterx@redhat.com) wrote:
> Add the latency distribution too for blocktime, using order-of-two buckets.
> It accounts for all the faults, from either vCPU or non-vCPU threads.  With
> prior rework, it's very easy to achieve by adding an array to account for
> faults in each buckets.
> 
> Sample output for HMP (while for QMP it's simply an array):
> 
> Postcopy Latency Distribution:
>   [     1 us -     2 us ]:          0
>   [     2 us -     4 us ]:          0
>   [     4 us -     8 us ]:          1
>   [     8 us -    16 us ]:          2
>   [    16 us -    32 us ]:          2
>   [    32 us -    64 us ]:          3
>   [    64 us -   128 us ]:      10169
>   [   128 us -   256 us ]:      50151
>   [   256 us -   512 us ]:      12876
>   [   512 us -     1 ms ]:         97
>   [     1 ms -     2 ms ]:         42
>   [     2 ms -     4 ms ]:         44
>   [     4 ms -     8 ms ]:         93
>   [     8 ms -    16 ms ]:        138

Nice.

>   [    16 ms -    32 ms ]:          0
>   [    32 ms -    65 ms ]:          0
>   [    65 ms -   131 ms ]:          0
>   [   131 ms -   262 ms ]:          0
>   [   262 ms -   524 ms ]:          0
>   [   524 ms -    1 sec ]:          0
>   [    1 sec -    2 sec ]:          0
>   [    2 sec -    4 sec ]:          0
>   [    4 sec -    8 sec ]:          0
>   [    8 sec -   16 sec ]:          0
> 
> Cc: Dr. David Alan Gilbert <dave@treblig.org>
> Cc: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> 
> This patch is based on:
> 
> [PATCH v2 00/13] migration/postcopy: Blocktime tracking overhaul
> Based-on: <20250609191259.9053-1-peterx@redhat.com>
> 
> It is the TODO mentioned in the other series, this patch implemented the
> buckets distribution for fault latency traps.
> ---
>  qapi/migration.json                   |  9 ++++++
>  migration/migration-hmp-cmds.c        | 32 +++++++++++++++++++
>  migration/postcopy-ram.c              | 45 +++++++++++++++++++++++++++
>  tests/qtest/migration/migration-qmp.c |  1 +
>  4 files changed, 87 insertions(+)
> 
> diff --git a/qapi/migration.json b/qapi/migration.json
> index cc680dda46..7d9524a8ea 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -242,6 +242,14 @@
>  #     average page fault latency. This is only present when the
>  #     postcopy-blocktime migration capability is enabled.  (Since 10.1)
>  #
> +# @postcopy-latency-dist: remote page fault latency distributions.  Each
> +#     element of the array is the number of faults that fall into the
> +#     bucket period.  For the N-th bucket (N>=0), the latency window is
> +#     [2^N, 2^(N+1)).  For example, the 8th element stores how many remote
> +#     faults got resolved within [256us, 512us) window. This is only
> +#     present when the postcopy-blocktime migration capability is enabled.
> +#     (Since 10.1)
> +#
>  # @postcopy-vcpu-latency: average remote page fault latency per vCPU (in
>  #     us).  It has the same definition of @postcopy-latency, but instead
>  #     this is the per-vCPU statistics.  This is only present when the
> @@ -293,6 +301,7 @@
>             '*postcopy-blocktime': 'uint32',
>             '*postcopy-vcpu-blocktime': ['uint32'],
>             '*postcopy-latency': 'uint64',
> +           '*postcopy-latency-dist': ['uint64'],
>             '*postcopy-vcpu-latency': ['uint64'],
>             '*postcopy-non-vcpu-latency': 'uint64',
>             '*socket-address': ['SocketAddress'],
> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> index b234eb6478..b83befd26c 100644
> --- a/migration/migration-hmp-cmds.c
> +++ b/migration/migration-hmp-cmds.c
> @@ -52,6 +52,21 @@ static void migration_global_dump(Monitor *mon)
>                     ms->clear_bitmap_shift);
>  }
>  
> +static const gchar *format_time_str(uint64_t us)
> +{
> +    const char *units[] = {"us", "ms", "sec"};
> +    int index = 0;
> +
> +    while (us > 1000) {
> +        us /= 1000;
> +        if (++index >= (sizeof(units) - 1)) {
> +            break;
> +        }
> +    }
> +
> +    return g_strdup_printf("%"PRIu64" %s", us, units[index]);
> +}

(It surprises me we don't have a function like that somewhere)

> +
>  static void migration_dump_blocktime(Monitor *mon, MigrationInfo *info)
>  {
>      if (info->has_postcopy_blocktime) {
> @@ -100,6 +115,23 @@ static void migration_dump_blocktime(Monitor *mon, MigrationInfo *info)
>          }
>          monitor_printf(mon, "]\n");
>      }
> +
> +    if (info->has_postcopy_latency_dist) {
> +        uint64List *item = info->postcopy_latency_dist;
> +        int count = 0;
> +
> +        monitor_printf(mon, "Postcopy Latency Distribution:\n");
> +
> +        while (item) {
> +            g_autofree const gchar *from = format_time_str(1UL << count);
> +            g_autofree const gchar *to = format_time_str(1UL << (count + 1));
> +
> +            monitor_printf(mon, "  [ %8s - %8s ]: %10"PRIu64"\n",
> +                           from, to, item->value);
> +            item = item->next;
> +            count++;
> +        }
> +    }
>  }

For HMP,

Acked-by: Dr. David Alan Gilbert <dave@treblig.org>

>  void hmp_info_migrate(Monitor *mon, const QDict *qdict)
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 23332ef3dd..0b40b79f64 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -110,6 +110,15 @@ void postcopy_thread_create(MigrationIncomingState *mis,
>  #include <sys/eventfd.h>
>  #include <linux/userfaultfd.h>
>  
> +/*
> + * Here we use 24 buckets, which means the last bucket will cover [2^24 us,
> + * 2^25 us) ~= [16, 32) seconds.  It should be far enough to record even
> + * extreme (perf-wise broken) 1G pages moving over, which can sometimes
> + * take a few seconds due to various reasons.  Anything more than that
> + * might be unsensible to account anymore.
> + */
> +#define  BLOCKTIME_LATENCY_BUCKET_N  (24)
> +
>  /* All the time records are in unit of microseconds (us) */
>  typedef struct PostcopyBlocktimeContext {
>      /* blocktime per vCPU */
> @@ -175,6 +184,11 @@ typedef struct PostcopyBlocktimeContext {
>       * that a fault was requested.
>       */
>      GHashTable *vcpu_addr_hash;
> +    /*
> +     * Each bucket stores the count of faults that were resolved within the
> +     * bucket window [2^N us, 2^(N+1) us).
> +     */
> +    uint64_t latency_buckets[BLOCKTIME_LATENCY_BUCKET_N];
>      /* total blocktime when all vCPUs are stopped */
>      uint64_t total_blocktime;
>      /* point in time when last page fault was initiated */
> @@ -283,6 +297,9 @@ static struct PostcopyBlocktimeContext *blocktime_context_new(void)
>      unsigned int smp_cpus = ms->smp.cpus;
>      PostcopyBlocktimeContext *ctx = g_new0(PostcopyBlocktimeContext, 1);
>  
> +    /* Initialize all counters to be zeros */
> +    memset(ctx->latency_buckets, 0, sizeof(ctx->latency_buckets));
> +
>      ctx->vcpu_blocktime_total = g_new0(uint64_t, smp_cpus);
>      ctx->vcpu_faults_count = g_new0(uint64_t, smp_cpus);
>      ctx->vcpu_faults_current = g_new0(uint8_t, smp_cpus);
> @@ -320,6 +337,7 @@ void fill_destination_postcopy_migration_info(MigrationInfo *info)
>      uint64_t latency_total = 0, faults = 0;
>      uint32List *list_blocktime = NULL;
>      uint64List *list_latency = NULL;
> +    uint64List *latency_buckets = NULL;
>      int i;
>  
>      if (!bc) {
> @@ -349,6 +367,10 @@ void fill_destination_postcopy_migration_info(MigrationInfo *info)
>          QAPI_LIST_PREPEND(list_latency, latency);
>      }
>  
> +    for (i = BLOCKTIME_LATENCY_BUCKET_N - 1; i >= 0; i--) {
> +        QAPI_LIST_PREPEND(latency_buckets, bc->latency_buckets[i]);
> +    }
> +
>      latency_total += bc->non_vcpu_blocktime_total;
>      faults += bc->non_vcpu_faults;
>  
> @@ -363,6 +385,8 @@ void fill_destination_postcopy_migration_info(MigrationInfo *info)
>      info->postcopy_latency = faults ? (latency_total / faults) : 0;
>      info->has_postcopy_vcpu_latency = true;
>      info->postcopy_vcpu_latency = list_latency;
> +    info->has_postcopy_latency_dist = true;
> +    info->postcopy_latency_dist = latency_buckets;
>  }
>  
>  static uint64_t get_postcopy_total_blocktime(void)
> @@ -1095,6 +1119,25 @@ void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
>      blocktime_fault_inject(dc, addr, cpu, current_us);
>  }
>  
> +static void blocktime_latency_account(PostcopyBlocktimeContext *ctx,
> +                                      uint64_t time_us)
> +{
> +    /*
> +     * Convert time (in us) to bucket index it belongs.  Take extra caution
> +     * of time_us==0 even if normally rare - when happens put into bucket 0.
> +     */
> +    int index = time_us ? (63 - clz64(time_us)) : 0;
> +
> +    assert(index >= 0);
> +
> +    /* If it's too large, put into top bucket */
> +    if (index >= BLOCKTIME_LATENCY_BUCKET_N) {
> +        index = BLOCKTIME_LATENCY_BUCKET_N - 1;
> +    }
> +
> +    ctx->latency_buckets[index]++;
> +}
> +
>  typedef struct {
>      PostcopyBlocktimeContext *ctx;
>      uint64_t current_us;
> @@ -1117,6 +1160,8 @@ static void blocktime_cpu_list_iter_fn(gpointer data, gpointer user_data)
>      assert(iter->current_us >= entry->fault_time);
>      time_passed = iter->current_us - entry->fault_time;
>  
> +    blocktime_latency_account(ctx, time_passed);
> +
>      if (cpu >= 0) {
>          /*
>           * If we resolved all pending faults on one vCPU due to this page
> diff --git a/tests/qtest/migration/migration-qmp.c b/tests/qtest/migration/migration-qmp.c
> index 67a67d4bd6..66dd369ba7 100644
> --- a/tests/qtest/migration/migration-qmp.c
> +++ b/tests/qtest/migration/migration-qmp.c
> @@ -360,6 +360,7 @@ void read_blocktime(QTestState *who)
>      g_assert(qdict_haskey(rsp_return, "postcopy-blocktime"));
>      g_assert(qdict_haskey(rsp_return, "postcopy-vcpu-blocktime"));
>      g_assert(qdict_haskey(rsp_return, "postcopy-latency"));
> +    g_assert(qdict_haskey(rsp_return, "postcopy-latency-dist"));
>      g_assert(qdict_haskey(rsp_return, "postcopy-vcpu-latency"));
>      g_assert(qdict_haskey(rsp_return, "postcopy-non-vcpu-latency"));
>      qobject_unref(rsp_return);
> -- 
> 2.49.0
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

next prev parent reply	other threads:[~2025-06-10  0:28 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-09 22:36 [PATCH] migration/postcopy: Add latency distribution report for blocktime Peter Xu
2025-06-10  0:27 ` Dr. David Alan Gilbert [this message]
2025-06-10 17:46 ` Fabiano Rosas
2025-06-13 14:25 ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aEd7526urkX56eq0@gallifrey \
    --to=dave@treblig.org \
    --cc=armbru@redhat.com \
    --cc=farosas@suse.de \
    --cc=jmarcin@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).