* [PATCH 0/5] migration: Downtime observability improvements
@ 2023-09-26 16:18 Joao Martins
2023-09-26 16:18 ` [PATCH 1/5] migration: Store downtime timestamps in an array Joao Martins
` (5 more replies)
0 siblings, 6 replies; 17+ messages in thread
From: Joao Martins @ 2023-09-26 16:18 UTC (permalink / raw)
To: qemu-devel
Cc: Juan Quintela, Peter Xu, Leonardo Bras, Eric Blake,
Markus Armbruster, Avihai Horon, Yishai Hadas,
Maciej S. Szmigiero, Joao Martins
Hey,
The cost of switchover is usually not accounted for in the migration
algorithm, which reduces everything to "pending bytes" fitting under a
"threshold" (derived from some available or proactively-measured link
bandwidth) as the rule of thumb for calculating downtime.
External latencies (from the OS, or from QEMU itself), as well as the
presence of VFs, may affect how long the switchover takes. Given the wide
range of possible configurations, there is no exactly deterministic or
predictable generic rule for calculating the cost of switchover.
This series is aimed at improving observability of what, in particular,
contributes to the switchover/downtime. The breakdown:
* The first 2 patches move the downtime timestamps into a dedicated
data structure, and then add a couple of key places where those
timestamps are measured.
* The next 2 patches put those timestamps to use, by
calculating the downtime breakdown when the data is asked for, as well as
adding a tracepoint.
* Finally, the last patch provides introspection into the
calculated expected-downtime (pending_bytes vs threshold_size) at the
point we decide to switch over, and prints that data when available to
give some comparison.
For now, mainly precopy data, and here I added both tracepoints and
QMP stats via query-migrate. Postcopy is still missing.
Thoughts, comments appreciated as usual.
Thanks!
Joao
Joao Martins (5):
migration: Store downtime timestamps in an array
migration: Collect more timestamps during switchover
migration: Add a tracepoint for the downtime stats
migration: Provide QMP access to downtime stats
migration: Print expected-downtime on completion
qapi/migration.json | 50 +++++++++++++++++++++++++
migration/migration.h | 7 +++-
migration/migration.c | 85 ++++++++++++++++++++++++++++++++++++++++--
migration/savevm.c | 2 +
migration/trace-events | 1 +
5 files changed, 139 insertions(+), 6 deletions(-)
--
2.39.3
^ permalink raw reply [flat|nested] 17+ messages in thread
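The convergence rule of thumb the cover letter describes can be modeled with a small standalone sketch. This is not QEMU code; the function names and the millisecond-based units are illustrative assumptions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Standalone model of the rule of thumb described above: migration
 * switches over once the remaining "pending bytes" fit under a
 * threshold derived from the measured link bandwidth and the
 * configured downtime limit.
 */
static uint64_t threshold_bytes(uint64_t bandwidth_bytes_per_ms,
                                uint64_t downtime_limit_ms)
{
    return bandwidth_bytes_per_ms * downtime_limit_ms;
}

static bool should_switchover(uint64_t pending_bytes, uint64_t threshold)
{
    /* Mirrors the "(!pending_size || pending_size < threshold_size)" test */
    return pending_bytes == 0 || pending_bytes < threshold;
}
```

The series' point is precisely that this estimate ignores the switchover cost itself (device state save, external latencies), which is what the added timestamps make observable.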
* [PATCH 1/5] migration: Store downtime timestamps in an array
2023-09-26 16:18 [PATCH 0/5] migration: Downtime observability improvements Joao Martins
@ 2023-09-26 16:18 ` Joao Martins
2023-09-28 1:55 ` Wang, Lei
2023-09-26 16:18 ` [PATCH 2/5] migration: Collect more timestamps during switchover Joao Martins
` (4 subsequent siblings)
5 siblings, 1 reply; 17+ messages in thread
From: Joao Martins @ 2023-09-26 16:18 UTC (permalink / raw)
To: qemu-devel
Cc: Juan Quintela, Peter Xu, Leonardo Bras, Eric Blake,
Markus Armbruster, Avihai Horon, Yishai Hadas,
Maciej S. Szmigiero, Joao Martins
Right now downtime_start is stored in MigrationState.
In preparation for having more downtime timestamps during
switchover, move downtime_start into an array, namely @timestamp.
Add a setter/getter pair around which timestamps to record,
to make it easier to spread them across various migration functions.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
qapi/migration.json | 14 ++++++++++++++
migration/migration.h | 7 +++++--
migration/migration.c | 24 ++++++++++++++++++++----
3 files changed, 39 insertions(+), 6 deletions(-)
diff --git a/qapi/migration.json b/qapi/migration.json
index 8843e74b59c7..b836cc881d33 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -190,6 +190,20 @@
{ 'struct': 'VfioStats',
'data': {'transferred': 'int' } }
+##
+# @MigrationDowntime:
+#
+# An enumeration of downtime timestamps for all
+# steps of the switchover.
+#
+# @start:Timestamp taken at the start of the switchover right before
+# we stop the VM.
+#
+# Since: 8.2
+##
+{ 'enum': 'MigrationDowntime',
+ 'data': [ 'start' ] }
+
##
# @MigrationInfo:
#
diff --git a/migration/migration.h b/migration/migration.h
index c390500604b6..180dc31c5306 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -319,8 +319,8 @@ struct MigrationState {
int64_t start_time;
/* Total time used by latest migration (ms) */
int64_t total_time;
- /* Timestamp when VM is down (ms) to migrate the last stuff */
- int64_t downtime_start;
+ /* Timestamps e.g. when VM is down (ms) to migrate the last stuff */
+ int64_t timestamp[MIGRATION_DOWNTIME__MAX];
int64_t downtime;
int64_t expected_downtime;
bool capabilities[MIGRATION_CAPABILITY__MAX];
@@ -516,4 +516,7 @@ void migration_populate_vfio_info(MigrationInfo *info);
void migration_reset_vfio_bytes_transferred(void);
void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
+void migration_set_timestamp(MigrationDowntime tm);
+int64_t migration_get_timestamp(MigrationDowntime tm);
+
#endif
diff --git a/migration/migration.c b/migration/migration.c
index d61e5727429a..dd955c61acc7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2312,6 +2312,21 @@ static int migration_maybe_pause(MigrationState *s,
return s->state == new_state ? 0 : -EINVAL;
}
+void migration_set_timestamp(MigrationDowntime type)
+{
+ MigrationState *s = migrate_get_current();
+
+ s->timestamp[type] = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+}
+
+int64_t migration_get_timestamp(MigrationDowntime type)
+{
+ MigrationState *s = migrate_get_current();
+
+ return s->timestamp[type];
+}
+
+
/**
* migration_completion: Used by migration_thread when there's not much left.
* The caller 'breaks' the loop when this returns.
@@ -2325,7 +2340,7 @@ static void migration_completion(MigrationState *s)
if (s->state == MIGRATION_STATUS_ACTIVE) {
qemu_mutex_lock_iothread();
- s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+ migration_set_timestamp(MIGRATION_DOWNTIME_START);
qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
s->vm_old_state = runstate_get();
@@ -2670,7 +2685,7 @@ static void migration_calculate_complete(MigrationState *s)
* It's still not set, so we are precopy migration. For
* postcopy, downtime is calculated during postcopy_start().
*/
- s->downtime = end_time - s->downtime_start;
+ s->downtime = end_time - migration_get_timestamp(MIGRATION_DOWNTIME_START);
}
transfer_time = s->total_time - s->setup_time;
@@ -3069,7 +3084,8 @@ static void bg_migration_vm_start_bh(void *opaque)
s->vm_start_bh = NULL;
vm_start();
- s->downtime = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - s->downtime_start;
+ s->downtime = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) -
+ migration_get_timestamp(MIGRATION_DOWNTIME_START);
}
/**
@@ -3134,7 +3150,7 @@ static void *bg_migration_thread(void *opaque)
s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
trace_migration_thread_setup_complete();
- s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+ migration_set_timestamp(MIGRATION_DOWNTIME_START);
qemu_mutex_lock_iothread();
--
2.39.3
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 2/5] migration: Collect more timestamps during switchover
2023-09-26 16:18 [PATCH 0/5] migration: Downtime observability improvements Joao Martins
2023-09-26 16:18 ` [PATCH 1/5] migration: Store downtime timestamps in an array Joao Martins
@ 2023-09-26 16:18 ` Joao Martins
2023-09-26 16:18 ` [PATCH 3/5] migration: Add a tracepoint for the downtime stats Joao Martins
` (3 subsequent siblings)
5 siblings, 0 replies; 17+ messages in thread
From: Joao Martins @ 2023-09-26 16:18 UTC (permalink / raw)
To: qemu-devel
Cc: Juan Quintela, Peter Xu, Leonardo Bras, Eric Blake,
Markus Armbruster, Avihai Horon, Yishai Hadas,
Maciej S. Szmigiero, Joao Martins
In addition to the start of switchover (previously @downtime_start),
add other timestamps taken during the saving of state and the return path.
For precopy migration, this should capture the principal
steps of switchover. Resume-side downtime is sadly only measurable when
the return path is enabled.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
qapi/migration.json | 16 +++++++++++++++-
migration/migration.c | 5 +++++
migration/savevm.c | 2 ++
3 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/qapi/migration.json b/qapi/migration.json
index b836cc881d33..2d91fbcb22ff 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -199,10 +199,24 @@
# @start:Timestamp taken at the start of the switchover right before
# we stop the VM.
#
+# @stop: Timestamp taken after stopping the VM.
+#
+# @precopy-iterable: Timestamp taken after copying all iterable state
+# in stop copy phase.
+#
+# @precopy-noniterable: Timestamp taken after copying all non iterable
+# state in stop copy phase.
+#
+# @resume-return-path: Timestamp taken after return path has finished
+# migration on destination. It is zero if
+# 'return-path' is not enabled.
+#
# Since: 8.2
##
{ 'enum': 'MigrationDowntime',
- 'data': [ 'start' ] }
+ 'data': [ 'start', 'stop', 'precopy-iterable', 'precopy-noniterable',
+ 'resume-return-path' ] }
+
##
# @MigrationInfo:
diff --git a/migration/migration.c b/migration/migration.c
index dd955c61acc7..d06840024a1e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2348,6 +2348,8 @@ static void migration_completion(MigrationState *s)
ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
trace_migration_completion_vm_stop(ret);
+ migration_set_timestamp(MIGRATION_DOWNTIME_STOP);
+
if (ret >= 0) {
ret = migration_maybe_pause(s, &current_active_state,
MIGRATION_STATUS_DEVICE);
@@ -2401,6 +2403,7 @@ static void migration_completion(MigrationState *s)
trace_migration_return_path_end_before();
rp_error = await_return_path_close_on_source(s);
trace_migration_return_path_end_after(rp_error);
+ migration_set_timestamp(MIGRATION_DOWNTIME_RESUME_RETURN_PATH);
if (rp_error) {
goto fail;
}
@@ -3166,6 +3169,8 @@ static void *bg_migration_thread(void *opaque)
if (vm_stop_force_state(RUN_STATE_PAUSED)) {
goto fail;
}
+ migration_set_timestamp(MIGRATION_DOWNTIME_STOP);
+
/*
* Put vCPUs in sync with shadow context structures, then
* save their state to channel-buffer along with devices.
diff --git a/migration/savevm.c b/migration/savevm.c
index bb3e99194c60..e93701ff2393 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1468,6 +1468,7 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
}
}
+ migration_set_timestamp(MIGRATION_DOWNTIME_PRECOPY_ITERABLE);
return 0;
}
@@ -1524,6 +1525,7 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
json_writer_free(vmdesc);
ms->vmdesc = NULL;
+ migration_set_timestamp(MIGRATION_DOWNTIME_PRECOPY_NONITERABLE);
return 0;
}
--
2.39.3
^ permalink raw reply related [flat|nested] 17+ messages in thread
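A hypothetical sanity check, not part of the patch, captures the invariant these timestamps obey: they are taken in the order start, stop, precopy-iterable, precopy-noniterable (then resume-return-path when enabled), so a recorded sequence must be monotonically non-decreasing:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true iff the n recorded timestamps never go backwards. */
static bool timestamps_ordered(const int64_t ts[], int n)
{
    for (int i = 1; i < n; i++) {
        if (ts[i] < ts[i - 1]) {
            return false;
        }
    }
    return true;
}
```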
* [PATCH 3/5] migration: Add a tracepoint for the downtime stats
2023-09-26 16:18 [PATCH 0/5] migration: Downtime observability improvements Joao Martins
2023-09-26 16:18 ` [PATCH 1/5] migration: Store downtime timestamps in an array Joao Martins
2023-09-26 16:18 ` [PATCH 2/5] migration: Collect more timestamps during switchover Joao Martins
@ 2023-09-26 16:18 ` Joao Martins
2023-09-26 16:18 ` [PATCH 4/5] migration: Provide QMP access to " Joao Martins
` (2 subsequent siblings)
5 siblings, 0 replies; 17+ messages in thread
From: Joao Martins @ 2023-09-26 16:18 UTC (permalink / raw)
To: qemu-devel
Cc: Juan Quintela, Peter Xu, Leonardo Bras, Eric Blake,
Markus Armbruster, Avihai Horon, Yishai Hadas,
Maciej S. Szmigiero, Joao Martins
To facilitate understanding of what constitutes downtime, add
a tracepoint that gives the downtime breakdown throughout all
steps of switchover.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
migration/migration.c | 34 ++++++++++++++++++++++++++++++++++
migration/trace-events | 1 +
2 files changed, 35 insertions(+)
diff --git a/migration/migration.c b/migration/migration.c
index d06840024a1e..4b5bed3eb09b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -893,6 +893,34 @@ static bool migrate_show_downtime(MigrationState *s)
return (s->state == MIGRATION_STATUS_COMPLETED) || migration_in_postcopy();
}
+static int64_t migrate_get_downtime_stop(MigrationState *s)
+{
+ return migration_get_timestamp(MIGRATION_DOWNTIME_STOP) -
+ migration_get_timestamp(MIGRATION_DOWNTIME_START);
+}
+
+static int64_t migrate_get_downtime_precopy_iterable(MigrationState *s)
+{
+ return migration_get_timestamp(MIGRATION_DOWNTIME_PRECOPY_ITERABLE) -
+ migration_get_timestamp(MIGRATION_DOWNTIME_STOP);
+}
+
+static int64_t migrate_get_downtime_precopy_noniterable(MigrationState *s)
+{
+ return migration_get_timestamp(MIGRATION_DOWNTIME_PRECOPY_NONITERABLE) -
+ migration_get_timestamp(MIGRATION_DOWNTIME_PRECOPY_ITERABLE);
+}
+
+static int64_t migrate_get_downtime_resume_rp(MigrationState *s)
+{
+ if (migrate_return_path()) {
+ return migration_get_timestamp(MIGRATION_DOWNTIME_RESUME_RETURN_PATH) -
+ migration_get_timestamp(MIGRATION_DOWNTIME_PRECOPY_NONITERABLE);
+ }
+
+ return 0;
+}
+
static void populate_time_info(MigrationInfo *info, MigrationState *s)
{
info->has_status = true;
@@ -2689,6 +2717,12 @@ static void migration_calculate_complete(MigrationState *s)
* postcopy, downtime is calculated during postcopy_start().
*/
s->downtime = end_time - migration_get_timestamp(MIGRATION_DOWNTIME_START);
+
+ trace_source_downtime_stats(s->downtime,
+ migrate_get_downtime_stop(s),
+ migrate_get_downtime_precopy_iterable(s),
+ migrate_get_downtime_precopy_noniterable(s),
+ migrate_get_downtime_resume_rp(s));
}
transfer_time = s->total_time - s->setup_time;
diff --git a/migration/trace-events b/migration/trace-events
index 4666f19325e7..163dc74bc49a 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -189,6 +189,7 @@ migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidt
process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
process_incoming_migration_co_postcopy_end_main(void) ""
postcopy_preempt_enabled(bool value) "%d"
+source_downtime_stats(int64_t downtime, int64_t stop, int64_t iterable, int64_t non_iterable, int64_t resume_rp) "downtime %"PRIi64": stop %"PRIi64" iterable %"PRIi64" non_iterable %"PRIi64" resume-rp %"PRIi64
# migration-stats
migration_transferred_bytes(uint64_t qemu_file, uint64_t multifd) "qemu_file %" PRIu64 " multifd %" PRIu64
--
2.39.3
^ permalink raw reply related [flat|nested] 17+ messages in thread
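The breakdown the tracepoint reports can be modeled standalone as deltas between adjacent switchover timestamps. Struct and field names here are illustrative, not QEMU code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Timestamps (ms) collected at each switchover step. */
typedef struct {
    int64_t start, stop, iterable, noniterable, resume_rp;
} Timestamps;

/* Per-step durations (ms), as traced by source_downtime_stats. */
typedef struct {
    int64_t stop, precopy_iterable, precopy_noniterable, resume_rp;
} Breakdown;

static Breakdown downtime_breakdown(const Timestamps *t, bool return_path)
{
    Breakdown b = {
        .stop = t->stop - t->start,
        .precopy_iterable = t->iterable - t->stop,
        .precopy_noniterable = t->noniterable - t->iterable,
        /* Zero when 'return-path' is not enabled, as in the patch. */
        .resume_rp = return_path ? t->resume_rp - t->noniterable : 0,
    };
    return b;
}
```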
* [PATCH 4/5] migration: Provide QMP access to downtime stats
2023-09-26 16:18 [PATCH 0/5] migration: Downtime observability improvements Joao Martins
` (2 preceding siblings ...)
2023-09-26 16:18 ` [PATCH 3/5] migration: Add a tracepoint for the downtime stats Joao Martins
@ 2023-09-26 16:18 ` Joao Martins
2023-10-04 17:10 ` Peter Xu
2023-09-26 16:18 ` [PATCH 5/5] migration: Print expected-downtime on completion Joao Martins
2023-10-04 17:19 ` [PATCH 0/5] migration: Downtime observability improvements Peter Xu
5 siblings, 1 reply; 17+ messages in thread
From: Joao Martins @ 2023-09-26 16:18 UTC (permalink / raw)
To: qemu-devel
Cc: Juan Quintela, Peter Xu, Leonardo Bras, Eric Blake,
Markus Armbruster, Avihai Horon, Yishai Hadas,
Maciej S. Szmigiero, Joao Martins
Deliver the downtime breakdown also via `query-migrate`
to allow users to understand what their downtime value
represents.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
qapi/migration.json | 22 ++++++++++++++++++++++
migration/migration.c | 14 ++++++++++++++
2 files changed, 36 insertions(+)
diff --git a/qapi/migration.json b/qapi/migration.json
index 2d91fbcb22ff..088e1b2bf440 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -217,6 +217,27 @@
'data': [ 'start', 'stop', 'precopy-iterable', 'precopy-noniterable',
'resume-return-path' ] }
+##
+# @DowntimeStats:
+#
+# Detailed migration downtime statistics.
+#
+# @stop: Time taken to stop the VM during switchover.
+#
+# @precopy: Time taken to save all precopy state during switchover.
+#
+# @precopy-iterable: Time taken to save all precopy iterable state.
+#
+# @precopy-noniterable: Time taken to save all precopy non iterable state.
+#
+# @resume-return-path: Time taken to resume if return path is enabled,
+# otherwise zero.
+#
+# Since: 8.2
+##
+{ 'struct': 'DowntimeStats',
+ 'data': {'stop': 'int64', 'precopy': 'int64', 'precopy-iterable': 'int64',
+ 'precopy-noniterable': 'int64', 'resume-return-path': 'int64' } }
##
# @MigrationInfo:
@@ -308,6 +329,7 @@
'*total-time': 'int',
'*expected-downtime': 'int',
'*downtime': 'int',
+ '*downtime-stats': 'DowntimeStats',
'*setup-time': 'int',
'*cpu-throttle-percentage': 'int',
'*error-desc': 'str',
diff --git a/migration/migration.c b/migration/migration.c
index 4b5bed3eb09b..dec6c88fbff9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -921,6 +921,19 @@ static int64_t migrate_get_downtime_resume_rp(MigrationState *s)
return 0;
}
+static void populate_downtime_info(MigrationInfo *info, MigrationState *s)
+{
+ DowntimeStats *stats;
+
+ info->downtime_stats = g_malloc0(sizeof(*info->downtime_stats));
+ stats = info->downtime_stats;
+ stats->stop = migrate_get_downtime_stop(s);
+ stats->precopy_iterable = migrate_get_downtime_precopy_iterable(s);
+ stats->precopy_noniterable = migrate_get_downtime_precopy_noniterable(s);
+ stats->precopy = stats->precopy_iterable + stats->precopy_noniterable;
+ stats->resume_return_path = migrate_get_downtime_resume_rp(s);
+}
+
static void populate_time_info(MigrationInfo *info, MigrationState *s)
{
info->has_status = true;
@@ -939,6 +952,7 @@ static void populate_time_info(MigrationInfo *info, MigrationState *s)
if (migrate_show_downtime(s)) {
info->has_downtime = true;
info->downtime = s->downtime;
+ populate_downtime_info(info, s);
} else {
info->has_expected_downtime = true;
info->expected_downtime = s->expected_downtime;
--
2.39.3
^ permalink raw reply related [flat|nested] 17+ messages in thread
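With this patch applied, `query-migrate` after completion would include the new object. A sketch of the shape, abridged, with illustrative values (ms) rather than real output:

```json
{
  "return": {
    "status": "completed",
    "total-time": 12345,
    "downtime": 133,
    "downtime-stats": {
      "stop": 9,
      "precopy": 121,
      "precopy-iterable": 98,
      "precopy-noniterable": 23,
      "resume-return-path": 3
    }
  }
}
```

Note that `precopy` is the sum of the iterable and non-iterable components.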
* [PATCH 5/5] migration: Print expected-downtime on completion
2023-09-26 16:18 [PATCH 0/5] migration: Downtime observability improvements Joao Martins
` (3 preceding siblings ...)
2023-09-26 16:18 ` [PATCH 4/5] migration: Provide QMP access to " Joao Martins
@ 2023-09-26 16:18 ` Joao Martins
2023-10-04 19:33 ` Peter Xu
2023-10-31 13:14 ` Juan Quintela
2023-10-04 17:19 ` [PATCH 0/5] migration: Downtime observability improvements Peter Xu
5 siblings, 2 replies; 17+ messages in thread
From: Joao Martins @ 2023-09-26 16:18 UTC (permalink / raw)
To: qemu-devel
Cc: Juan Quintela, Peter Xu, Leonardo Bras, Eric Blake,
Markus Armbruster, Avihai Horon, Yishai Hadas,
Maciej S. Szmigiero, Joao Martins
Right now, migration statistics print either downtime or expected
downtime, depending on whether migration has completed or is in
progress. Additionally, at the beginning of migration the downtime limit
is printed as the expected downtime, when no estimate is available yet.
The pending_size is private to the migration iteration and not necessarily
accessible outside it. Given the non-determinism of the switchover cost, it
can be useful to understand how far off the downtime was from the one
predicted by the migration algorithm, thus print the resultant downtime
alongside its estimate.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
migration/migration.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/migration/migration.c b/migration/migration.c
index dec6c88fbff9..f08f65b4b1c3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -943,6 +943,10 @@ static void populate_time_info(MigrationInfo *info, MigrationState *s)
if (s->state == MIGRATION_STATUS_COMPLETED) {
info->has_total_time = true;
info->total_time = s->total_time;
+ if (s->expected_downtime) {
+ info->has_expected_downtime = true;
+ info->expected_downtime = s->expected_downtime;
+ }
} else {
info->has_total_time = true;
info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) -
@@ -2844,6 +2848,10 @@ static MigIterateState migration_iteration_run(MigrationState *s)
if ((!pending_size || pending_size < s->threshold_size) && can_switchover) {
trace_migration_thread_low_pending(pending_size);
+ if (s->threshold_size) {
+ s->expected_downtime = (pending_size * s->parameters.downtime_limit) /
+ s->threshold_size;
+ }
migration_completion(s);
return MIG_ITERATE_BREAK;
}
--
2.39.3
^ permalink raw reply related [flat|nested] 17+ messages in thread
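The estimate recorded above can be modeled standalone. Since threshold_size corresponds to one downtime_limit's worth of transfer at the measured bandwidth, the patch scales the limit by how much of that budget the final pending_size actually consumed (function name here is illustrative, not QEMU code):

```c
#include <assert.h>
#include <stdint.h>

static int64_t expected_downtime_ms(uint64_t pending_size,
                                    uint64_t threshold_size,
                                    int64_t downtime_limit_ms)
{
    if (!threshold_size) {
        return 0; /* no estimate available, mirroring the guard above */
    }
    return (int64_t)(pending_size * (uint64_t)downtime_limit_ms /
                     threshold_size);
}
```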
* Re: [PATCH 1/5] migration: Store downtime timestamps in an array
2023-09-26 16:18 ` [PATCH 1/5] migration: Store downtime timestamps in an array Joao Martins
@ 2023-09-28 1:55 ` Wang, Lei
2023-09-28 13:31 ` Joao Martins
0 siblings, 1 reply; 17+ messages in thread
From: Wang, Lei @ 2023-09-28 1:55 UTC (permalink / raw)
To: Joao Martins, qemu-devel
Cc: Juan Quintela, Peter Xu, Leonardo Bras, Eric Blake,
Markus Armbruster, Avihai Horon, Yishai Hadas,
Maciej S. Szmigiero
On 9/27/2023 0:18, Joao Martins wrote:
> Right now downtime_start is stored in MigrationState.
>
> In preparation for having more downtime timestamps during
> switchover, move downtime_start into an array, namely @timestamp.
>
> Add a setter/getter pair around which timestamps to record,
> to make it easier to spread them across various migration functions.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> qapi/migration.json | 14 ++++++++++++++
> migration/migration.h | 7 +++++--
> migration/migration.c | 24 ++++++++++++++++++++----
> 3 files changed, 39 insertions(+), 6 deletions(-)
>
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 8843e74b59c7..b836cc881d33 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -190,6 +190,20 @@
> { 'struct': 'VfioStats',
> 'data': {'transferred': 'int' } }
>
> +##
> +# @MigrationDowntime:
> +#
> +# An enumeration of downtime timestamps for all
> +# steps of the switchover.
> +#
> +# @start:Timestamp taken at the start of the switchover right before
^
Suggest adding a space here to keep consistent with other attributes.
> +# we stop the VM.
> +#
> +# Since: 8.2
> +##
> +{ 'enum': 'MigrationDowntime',
> + 'data': [ 'start' ] }
> +
> ##
> # @MigrationInfo:
> #
> diff --git a/migration/migration.h b/migration/migration.h
> index c390500604b6..180dc31c5306 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -319,8 +319,8 @@ struct MigrationState {
> int64_t start_time;
> /* Total time used by latest migration (ms) */
> int64_t total_time;
> - /* Timestamp when VM is down (ms) to migrate the last stuff */
> - int64_t downtime_start;
> + /* Timestamps e.g. when VM is down (ms) to migrate the last stuff */
> + int64_t timestamp[MIGRATION_DOWNTIME__MAX];
> int64_t downtime;
> int64_t expected_downtime;
> bool capabilities[MIGRATION_CAPABILITY__MAX];
> @@ -516,4 +516,7 @@ void migration_populate_vfio_info(MigrationInfo *info);
> void migration_reset_vfio_bytes_transferred(void);
> void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
>
> +void migration_set_timestamp(MigrationDowntime tm);
> +int64_t migration_get_timestamp(MigrationDowntime tm);
> +
> #endif
> diff --git a/migration/migration.c b/migration/migration.c
> index d61e5727429a..dd955c61acc7 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2312,6 +2312,21 @@ static int migration_maybe_pause(MigrationState *s,
> return s->state == new_state ? 0 : -EINVAL;
> }
>
> +void migration_set_timestamp(MigrationDowntime type)
> +{
> + MigrationState *s = migrate_get_current();
> +
> + s->timestamp[type] = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +}
> +
> +int64_t migration_get_timestamp(MigrationDowntime type)
> +{
> + MigrationState *s = migrate_get_current();
> +
> + return s->timestamp[type];
> +}
> +
> +
> /**
> * migration_completion: Used by migration_thread when there's not much left.
> * The caller 'breaks' the loop when this returns.
> @@ -2325,7 +2340,7 @@ static void migration_completion(MigrationState *s)
>
> if (s->state == MIGRATION_STATUS_ACTIVE) {
> qemu_mutex_lock_iothread();
> - s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> + migration_set_timestamp(MIGRATION_DOWNTIME_START);
> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>
> s->vm_old_state = runstate_get();
> @@ -2670,7 +2685,7 @@ static void migration_calculate_complete(MigrationState *s)
> * It's still not set, so we are precopy migration. For
> * postcopy, downtime is calculated during postcopy_start().
> */
> - s->downtime = end_time - s->downtime_start;
> + s->downtime = end_time - migration_get_timestamp(MIGRATION_DOWNTIME_START);
> }
>
> transfer_time = s->total_time - s->setup_time;
> @@ -3069,7 +3084,8 @@ static void bg_migration_vm_start_bh(void *opaque)
> s->vm_start_bh = NULL;
>
> vm_start();
> - s->downtime = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - s->downtime_start;
> + s->downtime = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) -
> + migration_get_timestamp(MIGRATION_DOWNTIME_START);
> }
>
> /**
> @@ -3134,7 +3150,7 @@ static void *bg_migration_thread(void *opaque)
> s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
>
> trace_migration_thread_setup_complete();
> - s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> + migration_set_timestamp(MIGRATION_DOWNTIME_START);
>
> qemu_mutex_lock_iothread();
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/5] migration: Store downtime timestamps in an array
2023-09-28 1:55 ` Wang, Lei
@ 2023-09-28 13:31 ` Joao Martins
0 siblings, 0 replies; 17+ messages in thread
From: Joao Martins @ 2023-09-28 13:31 UTC (permalink / raw)
To: Wang, Lei, qemu-devel
Cc: Juan Quintela, Peter Xu, Leonardo Bras, Eric Blake,
Markus Armbruster, Avihai Horon, Yishai Hadas,
Maciej S. Szmigiero
On 28/09/2023 02:55, Wang, Lei wrote:
> On 9/27/2023 0:18, Joao Martins wrote:
>> Right now downtime_start is stored in MigrationState.
>>
>> In preparation for having more downtime timestamps during
>> switchover, move downtime_start into an array, namely @timestamp.
>>
>> Add a setter/getter pair around which timestamps to record,
>> to make it easier to spread them across various migration functions.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> qapi/migration.json | 14 ++++++++++++++
>> migration/migration.h | 7 +++++--
>> migration/migration.c | 24 ++++++++++++++++++++----
>> 3 files changed, 39 insertions(+), 6 deletions(-)
>>
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 8843e74b59c7..b836cc881d33 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -190,6 +190,20 @@
>> { 'struct': 'VfioStats',
>> 'data': {'transferred': 'int' } }
>>
>> +##
>> +# @MigrationDowntime:
>> +#
>> +# An enumeration of downtime timestamps for all
>> +# steps of the switchover.
>> +#
>> +# @start:Timestamp taken at the start of the switchover right before
> ^
> Suggest adding a space here to keep consistent with other attributes.
>
Yeap, will do for v2
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 4/5] migration: Provide QMP access to downtime stats
2023-09-26 16:18 ` [PATCH 4/5] migration: Provide QMP access to " Joao Martins
@ 2023-10-04 17:10 ` Peter Xu
2023-10-06 11:37 ` Joao Martins
0 siblings, 1 reply; 17+ messages in thread
From: Peter Xu @ 2023-10-04 17:10 UTC (permalink / raw)
To: Joao Martins
Cc: qemu-devel, Juan Quintela, Leonardo Bras, Eric Blake,
Markus Armbruster, Avihai Horon, Yishai Hadas,
Maciej S. Szmigiero
Hi, Joao,
On Tue, Sep 26, 2023 at 05:18:40PM +0100, Joao Martins wrote:
> Deliver the downtime breakdown also via `query-migrate`
> to allow users to understand what their downtime value
> represents.
I agree downtime is an area we definitely need to improve.. however do we
need to make it part of qapi? Or do we need them mostly for debugging
purpose?
Any introduction of motivation of this work, especially on exporting the
values to the mgmt apps?
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> qapi/migration.json | 22 ++++++++++++++++++++++
> migration/migration.c | 14 ++++++++++++++
> 2 files changed, 36 insertions(+)
>
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 2d91fbcb22ff..088e1b2bf440 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -217,6 +217,27 @@
> 'data': [ 'start', 'stop', 'precopy-iterable', 'precopy-noniterable',
> 'resume-return-path' ] }
>
> +##
> +# @DowntimeStats:
> +#
> +# Detailed migration downtime statistics.
> +#
> +# @stop: Time taken to stop the VM during switchover.
> +#
> +# @precopy: Time taken to save all precopy state during switchover.
> +#
> +# @precopy-iterable: Time taken to save all precopy iterable state.
> +#
> +# @precopy-noniterable: Time taken to save all precopy non iterable state.
> +#
> +# @resume-return-path: Time taken to resume if return path is enabled,
> +# otherwise zero.
All these fields will more or less duplicate the ones in the other
MigrationDowntime just introduced.
We suffer from duplicated fields for migration parameters, proof shows that
dropping the duplication is normally harder.. I'm trying to dedup the
existing Migration*Parameter* objects and still in progress, so even if we
want to expose downtime in qapi I hope we can expose only one and single
object.
IIUC we can already do that by keeping DowntimeStats, keeping an object in
MigrationState, and just dropping MigrationDowntime?
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/5] migration: Downtime observability improvements
2023-09-26 16:18 [PATCH 0/5] migration: Downtime observability improvements Joao Martins
` (4 preceding siblings ...)
2023-09-26 16:18 ` [PATCH 5/5] migration: Print expected-downtime on completion Joao Martins
@ 2023-10-04 17:19 ` Peter Xu
2023-10-06 11:39 ` Joao Martins
5 siblings, 1 reply; 17+ messages in thread
From: Peter Xu @ 2023-10-04 17:19 UTC (permalink / raw)
To: Joao Martins
Cc: qemu-devel, Juan Quintela, Leonardo Bras, Eric Blake,
Markus Armbruster, Avihai Horon, Yishai Hadas,
Maciej S. Szmigiero
On Tue, Sep 26, 2023 at 05:18:36PM +0100, Joao Martins wrote:
> For now, mainly precopy data, and here I added both tracepoints and
> QMP stats via query-migrate. Postcopy is still missing.
IIUC many of those will cover postcopy too, but not all?
I think the problem is postcopy didn't update downtime_start, however it
updates MigrationState.downtime, and probably we can start to keep it more
like precopy, e.g., in postcopy_start(), where downtime_start can be
time_at_stop (or it can be more accurate; now it's probably fetching the
timestamp too early).
Basically if we want to expose anything, especially some qapi object, IMHO
we'd better make it work for both pre/post copy because otherwise it'll be
hard for mgmt app to know which qemu supports precopy only, and which
support both (if we'll add that for postcopy too).
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 5/5] migration: Print expected-downtime on completion
2023-09-26 16:18 ` [PATCH 5/5] migration: Print expected-downtime on completion Joao Martins
@ 2023-10-04 19:33 ` Peter Xu
2023-10-06 11:45 ` Joao Martins
2023-10-31 13:14 ` Juan Quintela
1 sibling, 1 reply; 17+ messages in thread
From: Peter Xu @ 2023-10-04 19:33 UTC (permalink / raw)
To: Joao Martins
Cc: qemu-devel, Juan Quintela, Leonardo Bras, Eric Blake,
Markus Armbruster, Avihai Horon, Yishai Hadas,
Maciej S. Szmigiero
On Tue, Sep 26, 2023 at 05:18:41PM +0100, Joao Martins wrote:
> Right now, migration statistics print either downtime or expected
> downtime, depending on whether migration has completed or is in
> progress. Additionally, at the beginning of migration the downtime limit
> is printed as the expected downtime, when no estimate is available yet.
>
> The pending_size is private to the migration iteration and not necessarily
> accessible outside it. Given the non-determinism of the switchover cost, it
> can be useful to understand how far off the downtime was from the one
> predicted by the migration algorithm, thus print the resultant downtime
> alongside its estimate.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> migration/migration.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index dec6c88fbff9..f08f65b4b1c3 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -943,6 +943,10 @@ static void populate_time_info(MigrationInfo *info, MigrationState *s)
> if (s->state == MIGRATION_STATUS_COMPLETED) {
> info->has_total_time = true;
> info->total_time = s->total_time;
> + if (s->expected_downtime) {
> + info->has_expected_downtime = true;
> + info->expected_downtime = s->expected_downtime;
> + }
There's another chunk right below that will also show
expected_downtime.. How about we merge them to be clear?
IIUC the current patch will not display expected_downtime during postcopy,
which makes sense. But it'll pop up again after postcopy completes... so
not ideal either. If so sounds easier to just show it as long as we have a
value, and the user can ignore it.
@@ -913,7 +913,9 @@ static void populate_time_info(MigrationInfo *info, MigrationState *s)
if (migrate_show_downtime(s)) {
info->has_downtime = true;
info->downtime = s->downtime;
- } else {
+ }
+
+ if (s->expected_downtime) {
info->has_expected_downtime = true;
info->expected_downtime = s->expected_downtime;
}
IIUC currently expected_downtime for postcopy makes less sense. Maybe one
day we can make it reflect reality, by taking more things into account
(besides dirty RAM rate).
> } else {
> info->has_total_time = true;
> info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) -
> @@ -2844,6 +2848,10 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>
> if ((!pending_size || pending_size < s->threshold_size) && can_switchover) {
> trace_migration_thread_low_pending(pending_size);
> + if (s->threshold_size) {
> + s->expected_downtime = (pending_size * s->parameters.downtime_limit) /
> + s->threshold_size;
> + }
I had a feeling that you did the calculation to avoid accessing ->mbps. :)
I'd suggest we move this into migration_completion(), and use ->mbps
(before the other avail-switchover-bandwidth patch lands). It's just that
using the bandwidth value seems more straightforward. Or maybe I missed
something tricky?
> migration_completion(s);
> return MIG_ITERATE_BREAK;
> }
> --
> 2.39.3
>
Thanks,
--
Peter Xu
* Re: [PATCH 4/5] migration: Provide QMP access to downtime stats
2023-10-04 17:10 ` Peter Xu
@ 2023-10-06 11:37 ` Joao Martins
2023-10-06 14:27 ` Peter Xu
0 siblings, 1 reply; 17+ messages in thread
From: Joao Martins @ 2023-10-06 11:37 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Juan Quintela, Leonardo Bras, Eric Blake,
Markus Armbruster, Avihai Horon, Yishai Hadas,
Maciej S. Szmigiero
On 04/10/2023 18:10, Peter Xu wrote:
> Hi, Joao,
>
> On Tue, Sep 26, 2023 at 05:18:40PM +0100, Joao Martins wrote:
>> Deliver the downtime breakdown also via `query-migrate`
>> to allow users to understand what their downtime value
>> represents.
>
> I agree downtime is an area we definitely need to improve.. however do we
> need to make it part of qapi? Or do we need them mostly for debugging
> purpose?
>
> Any introduction of motivation of this work, especially on exporting the
> values to the mgmt apps?
>
I added the statistics mainly for observability (e.g. a non-developer could
grep the libvirt logs and understand how the downtime breaks down). I wasn't
specifically thinking about a management app using this, just broad access to
the metrics.
One can get the same level of observability with a BPF/dtrace/systemtap script,
albeit in a less obvious way.
With respect to motivation: I am doing migration with VFs and sometimes
vhost-net, and the downtime/switchover is the only thing that is either
non-deterministic or not captured in the migration math. There are some things
that aren't accounted for (e.g. vhost with enough queues will give you high
downtimes), and it is algorithmically not really possible to account for them,
as one would need to account for every possible instruction while we quiesce
the guest (or at least that's my understanding).
Just having these metrics helps the developer *and* user see why such downtime
is high, and maybe opens a window for fixes/bug-reports or shows where to improve.
Furthermore, hopefully these tracepoints or stats could be a starting point for
developers to understand how much downtime is spent in a particular device in
QEMU (as a follow-up to this series), or allow implementing bounds checks on
switchover limits in a way that doesn't violate downtime-limit SLAs (I have a
small set of patches for this).
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> qapi/migration.json | 22 ++++++++++++++++++++++
>> migration/migration.c | 14 ++++++++++++++
>> 2 files changed, 36 insertions(+)
>>
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 2d91fbcb22ff..088e1b2bf440 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -217,6 +217,27 @@
>> 'data': [ 'start', 'stop', 'precopy-iterable', 'precopy-noniterable',
>> 'resume-return-path' ] }
>>
>> +##
>> +# @DowntimeStats:
>> +#
>> +# Detailed migration downtime statistics.
>> +#
>> +# @stop: Time taken to stop the VM during switchover.
>> +#
>> +# @precopy: Time taken to save all precopy state during switchover.
>> +#
>> +# @precopy-iterable: Time taken to save all precopy iterable state.
>> +#
>> +# @precopy-noniterable: Time taken to save all precopy non iterable state.
>> +#
>> +# @resume-return-path: Time taken to resume if return path is enabled,
>> +# otherwise zero.
>
> All these fields will more or less duplicate the ones in the other
> MigrationDowntime just introduced.
>
> We suffer from duplicated fields for migration parameters, proof shows that
> dropping the duplication is normally harder.. I'm trying to dedup the
> existing Migration*Parameter* objects and still in progress, so even if we
> want to expose downtime in qapi I hope we can expose only one and single
> object.
>
Thanks for the background; I now recall your other series addressing this sort
of duplication[0]
[0] https://lore.kernel.org/qemu-devel/20230905162335.235619-5-peterx@redhat.com/
> IIUC we can already do that by keeping DowntimeStats, keeping an object in
> MigrationState, and just dropping MigrationDowntime?
>
I can try that, sounds way cleaner. I didn't like the duplication either.
* Re: [PATCH 0/5] migration: Downtime observability improvements
2023-10-04 17:19 ` [PATCH 0/5] migration: Downtime observability improvements Peter Xu
@ 2023-10-06 11:39 ` Joao Martins
0 siblings, 0 replies; 17+ messages in thread
From: Joao Martins @ 2023-10-06 11:39 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Juan Quintela, Leonardo Bras, Eric Blake,
Markus Armbruster, Avihai Horon, Yishai Hadas,
Maciej S. Szmigiero
On 04/10/2023 18:19, Peter Xu wrote:
> On Tue, Sep 26, 2023 at 05:18:36PM +0100, Joao Martins wrote:
>> For now, mainly precopy data, and here I added both tracepoints and
>> QMP stats via query-migrate. Postcopy is still missing.
>
> IIUC many of those will cover postcopy too, but not all?
>
> I think the problem is postcopy didn't update downtime_start, however it
> updates MigrationState.downtime, and probably we can start to keep it more
> like precopy, e.g., in postcopy_start(), where downtime_start can be
> time_at_stop (or it can be more accurate; now it's probably fetching the
> timestamp too early).
>
Good point!
> Basically if we want to expose anything, especially some qapi object, IMHO
> we'd better make it work for both pre/post copy because otherwise it'll be
> hard for mgmt app to know which qemu supports precopy only, and which
> support both (if we'll add that for postcopy too).
>
Yeap, I totally agree.
* Re: [PATCH 5/5] migration: Print expected-downtime on completion
2023-10-04 19:33 ` Peter Xu
@ 2023-10-06 11:45 ` Joao Martins
0 siblings, 0 replies; 17+ messages in thread
From: Joao Martins @ 2023-10-06 11:45 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Juan Quintela, Leonardo Bras, Eric Blake,
Markus Armbruster, Avihai Horon, Yishai Hadas,
Maciej S. Szmigiero
On 04/10/2023 20:33, Peter Xu wrote:
> On Tue, Sep 26, 2023 at 05:18:41PM +0100, Joao Martins wrote:
>> Right now, migration statistics print either downtime or expected
>> downtime, depending on whether migration has completed or is still in
>> progress. Also, at the beginning of migration, the downtime limit is
>> printed as expected downtime, when no estimation is available yet.
>>
>> The pending_size is private in migration iteration and not necessarily
>> accessible outside. Given the non-determinism of the switchover cost, it
>> can be useful to understand if the downtime was far off from the one
>> detected by the migration algorithm, thus print the resultant downtime
>> alongside its estimation.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> migration/migration.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index dec6c88fbff9..f08f65b4b1c3 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -943,6 +943,10 @@ static void populate_time_info(MigrationInfo *info, MigrationState *s)
>> if (s->state == MIGRATION_STATUS_COMPLETED) {
>> info->has_total_time = true;
>> info->total_time = s->total_time;
>> + if (s->expected_downtime) {
>> + info->has_expected_downtime = true;
>> + info->expected_downtime = s->expected_downtime;
>> + }
>
> There's another chunk right below that will also show
> expected_downtime.. How about we merge them to be clear?
>
Definitely
> IIUC the current patch will not display expected_downtime during postcopy,
> which makes sense. But it'll pop up again after postcopy completes... so
> not ideal either. If so sounds easier to just show it as long as we have a
> value, and the user can ignore it.
>
Yes.
> @@ -913,7 +913,9 @@ static void populate_time_info(MigrationInfo *info, MigrationState *s)
> if (migrate_show_downtime(s)) {
> info->has_downtime = true;
> info->downtime = s->downtime;
> - } else {
> + }
> +
> + if (s->expected_downtime) {
> info->has_expected_downtime = true;
> info->expected_downtime = s->expected_downtime;
> }
>
> IIUC currently expected_downtime for postcopy makes less sense.
I think it makes sense: you still need to switch over to the destination, and
knowing how far off you were is useful. Albeit, compared to the rest of the
algorithm, it is less critical than, say, in precopy.
> Maybe one
> day we can make it reflect reality, by taking more things into account
> (besides dirty RAM rate).
>
>> } else {
>> info->has_total_time = true;
>> info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) -
>> @@ -2844,6 +2848,10 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>>
>> if ((!pending_size || pending_size < s->threshold_size) && can_switchover) {
>> trace_migration_thread_low_pending(pending_size);
>> + if (s->threshold_size) {
>> + s->expected_downtime = (pending_size * s->parameters.downtime_limit) /
>> + s->threshold_size;
>> + }
>
> I had a feeling that you did the calculation to avoid accessing ->mbps. :)
>
It was an oversight on my end
> I'd suggest we move this into migration_completion(), and use ->mbps
> (before the other avail-switchover-bandwidth patch lands). It's just that
> using the bandwidth value seems more straightforward.
It is better that way, and things get consolidated.
> Or maybe I missed
> something tricky?
>
Nah, just oversight :)
* Re: [PATCH 4/5] migration: Provide QMP access to downtime stats
2023-10-06 11:37 ` Joao Martins
@ 2023-10-06 14:27 ` Peter Xu
0 siblings, 0 replies; 17+ messages in thread
From: Peter Xu @ 2023-10-06 14:27 UTC (permalink / raw)
To: Joao Martins
Cc: qemu-devel, Juan Quintela, Leonardo Bras, Eric Blake,
Markus Armbruster, Avihai Horon, Yishai Hadas,
Maciej S. Szmigiero
On Fri, Oct 06, 2023 at 12:37:15PM +0100, Joao Martins wrote:
> I added the statistics mainly for observability (e.g. a non-developer could
> grep the libvirt logs and understand how the downtime breaks down). I wasn't
> specifically thinking about a management app using this, just broad access to
> the metrics.
>
> One can get the same level of observability with a BPF/dtrace/systemtap script,
> albeit in a less obvious way.
Makes sense.
>
> With respect to motivation: I am doing migration with VFs and sometimes
> vhost-net, and the downtime/switchover is the only thing that is either
> non-deterministic or not captured in the migration math. There are some things
> that aren't accounted for (e.g. vhost with enough queues will give you high
> downtimes),
Will this be relevant to the loading of the queues? There used to be
work on greatly reducing downtime, especially for virtio scenarios with
multiple queues (and IIRC even 1 queue also benefits from it); it wasn't
merged, probably due to not enough review:
https://lore.kernel.org/r/20230317081904.24389-1-xuchuangxclwt@bytedance.com
Though personally I think that's a direction worth exploring at least;
maybe some slight enhancement to that series will work for us.
> and it is algorithmically not really possible to account for them, as one
> would need to account for every possible instruction while we quiesce the
> guest (or at least that's my understanding).
>
> Just having these metrics helps the developer *and* user see why such downtime
> is high, and maybe opens a window for fixes/bug-reports or shows where to improve.
>
> Furthermore, hopefully these tracepoints or stats could be a starting point for
> developers to understand how much downtime is spent in a particular device in
> QEMU (as a follow-up to this series),
Yes, I was actually expecting that when I read the cover letter. :) This also
makes sense. One thing worth mentioning is that the real downtime measured can,
IMHO, differ between src/dst because "pre_save" and "post_load" may not really
do similar things. IIUC it can happen that some device sends fast, but
loads slowly. I'm not sure whether there's a reversed use case. Maybe we want
to capture that on both sides in some metrics?
> or allow implementing bounds checks on switchover limits in a way that
> doesn't violate downtime-limit SLAs (I have a small set of patches
> for this).
I assume that decision will always be synchronized between src/dst in some
way, or guaranteed to be the same. But I can wait to read the series first.
Thanks,
--
Peter Xu
* Re: [PATCH 5/5] migration: Print expected-downtime on completion
2023-09-26 16:18 ` [PATCH 5/5] migration: Print expected-downtime on completion Joao Martins
2023-10-04 19:33 ` Peter Xu
@ 2023-10-31 13:14 ` Juan Quintela
2023-11-02 10:22 ` Joao Martins
1 sibling, 1 reply; 17+ messages in thread
From: Juan Quintela @ 2023-10-31 13:14 UTC (permalink / raw)
To: Joao Martins
Cc: qemu-devel, Peter Xu, Leonardo Bras, Eric Blake,
Markus Armbruster, Avihai Horon, Yishai Hadas,
Maciej S. Szmigiero
Joao Martins <joao.m.martins@oracle.com> wrote:
> Right now, migration statistics print either downtime or expected
> downtime, depending on whether migration has completed or is still in
> progress. Also, at the beginning of migration, the downtime limit is
> printed as expected downtime, when no estimation is available yet.
>
> The pending_size is private in migration iteration and not necessarily
> accessible outside. Given the non-determinism of the switchover cost, it
> can be useful to understand if the downtime was far off from the one
> detected by the migration algorithm, thus print the resultant downtime
> alongside its estimation.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
I see that "part" of this series is in the downtime series by Peter.
I have merged them (they are tracepoints, we can change them when
needed).
But this one is not in that series.
Should we continue and send a patch for it?
Later, Juan.
> ---
> migration/migration.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index dec6c88fbff9..f08f65b4b1c3 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -943,6 +943,10 @@ static void populate_time_info(MigrationInfo *info, MigrationState *s)
> if (s->state == MIGRATION_STATUS_COMPLETED) {
> info->has_total_time = true;
> info->total_time = s->total_time;
> + if (s->expected_downtime) {
> + info->has_expected_downtime = true;
> + info->expected_downtime = s->expected_downtime;
> + }
> } else {
> info->has_total_time = true;
> info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) -
> @@ -2844,6 +2848,10 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>
> if ((!pending_size || pending_size < s->threshold_size) && can_switchover) {
> trace_migration_thread_low_pending(pending_size);
> + if (s->threshold_size) {
> + s->expected_downtime = (pending_size * s->parameters.downtime_limit) /
> + s->threshold_size;
> + }
> migration_completion(s);
> return MIG_ITERATE_BREAK;
> }
* Re: [PATCH 5/5] migration: Print expected-downtime on completion
2023-10-31 13:14 ` Juan Quintela
@ 2023-11-02 10:22 ` Joao Martins
0 siblings, 0 replies; 17+ messages in thread
From: Joao Martins @ 2023-11-02 10:22 UTC (permalink / raw)
To: quintela
Cc: qemu-devel, Peter Xu, Leonardo Bras, Eric Blake,
Markus Armbruster, Avihai Horon, Yishai Hadas,
Maciej S. Szmigiero
On 31/10/2023 13:14, Juan Quintela wrote:
> Joao Martins <joao.m.martins@oracle.com> wrote:
>> Right now, migration statistics print either downtime or expected
>> downtime, depending on whether migration has completed or is still in
>> progress. Also, at the beginning of migration, the downtime limit is
>> printed as expected downtime, when no estimation is available yet.
>>
>> The pending_size is private in migration iteration and not necessarily
>> accessible outside. Given the non-determinism of the switchover cost, it
>> can be useful to understand if the downtime was far off from the one
>> detected by the migration algorithm, thus print the resultant downtime
>> alongside its estimation.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>
> I see that "part" of this series is in the downtime series by Peter.
> I have merged them (they are tracepoints, we can change them when
> needed).
>
> But this one is not in that series.
>
> Should we continue and send a patch for it?
>
Yeap, I will follow up (taking into account Peter's comments)
end of thread, other threads:[~2023-11-02 10:23 UTC | newest]
Thread overview: 17+ messages
2023-09-26 16:18 [PATCH 0/5] migration: Downtime observability improvements Joao Martins
2023-09-26 16:18 ` [PATCH 1/5] migration: Store downtime timestamps in an array Joao Martins
2023-09-28 1:55 ` Wang, Lei
2023-09-28 13:31 ` Joao Martins
2023-09-26 16:18 ` [PATCH 2/5] migration: Collect more timestamps during switchover Joao Martins
2023-09-26 16:18 ` [PATCH 3/5] migration: Add a tracepoint for the downtime stats Joao Martins
2023-09-26 16:18 ` [PATCH 4/5] migration: Provide QMP access to " Joao Martins
2023-10-04 17:10 ` Peter Xu
2023-10-06 11:37 ` Joao Martins
2023-10-06 14:27 ` Peter Xu
2023-09-26 16:18 ` [PATCH 5/5] migration: Print expected-downtime on completion Joao Martins
2023-10-04 19:33 ` Peter Xu
2023-10-06 11:45 ` Joao Martins
2023-10-31 13:14 ` Juan Quintela
2023-11-02 10:22 ` Joao Martins
2023-10-04 17:19 ` [PATCH 0/5] migration: Downtime observability improvements Peter Xu
2023-10-06 11:39 ` Joao Martins