qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/7] migration: Better error handling in return path thread
@ 2023-07-05 16:34 Peter Xu
  2023-07-05 16:34 ` [PATCH v2 1/7] migration: Display error in query-migrate irrelevant of status Peter Xu
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Peter Xu @ 2023-07-05 16:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Fabiano Rosas, Juan Quintela,
	Lukas Straub, Laszlo Ersek, peterx

v2:
- Patch "migration: Provide explicit error message for file shutdowns"
  - Touched up qapi doc [Fabiano]
  - Added Bugzilla link to commit which I didn't even notice that I was
    fixing a bug.. but rightfully pointed out by Laszlo.
  - Moved it to the 1st patch because it fixes a bug, please consider
    review and merge it earlier.

This is a small series that reworks error handling of postcopy return path
threads.

We used to contain a bunch of error_report(), converting them into
error_setg() properly and deliver any of those errors to migration generic
error reports (via migrate_set_error()).  Then these errors can also be
observed in query-migrate after postcopy is paused.

Dropped the return-path specific error reporting: mark_source_rp_bad(),
because it's a duplication if we can always use migrate_set_error().

Please have a look, thanks.

Peter Xu (7):
  migration: Display error in query-migrate irrelevant of status
  migration: Let migrate_set_error() take ownership
  migration: Introduce migrate_has_error()
  migration: Refactor error handling in source return path
  migration: Deliver return path file error to migrate state too
  qemufile: Always return a verbose error
  migration: Provide explicit error message for file shutdowns

 qapi/migration.json      |   5 +-
 migration/migration.h    |   8 +-
 migration/ram.h          |   5 +-
 migration/channel.c      |   1 -
 migration/migration.c    | 168 +++++++++++++++++++++++----------------
 migration/multifd.c      |  10 +--
 migration/postcopy-ram.c |   1 -
 migration/qemu-file.c    |  20 ++++-
 migration/ram.c          |  42 +++++-----
 migration/trace-events   |   2 +-
 10 files changed, 149 insertions(+), 113 deletions(-)

-- 
2.41.0



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 1/7] migration: Display error in query-migrate irrelevant of status
  2023-07-05 16:34 [PATCH v2 0/7] migration: Better error handling in return path thread Peter Xu
@ 2023-07-05 16:34 ` Peter Xu
  2023-07-05 16:34 ` [PATCH v2 2/7] migration: Let migrate_set_error() take ownership Peter Xu
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Peter Xu @ 2023-07-05 16:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Fabiano Rosas, Juan Quintela,
	Lukas Straub, Laszlo Ersek, peterx

Display it as long as being set, irrelevant of FAILED status.  E.g., it may
also be applicable to PAUSED stage of postcopy, to provide hint on what has
gone wrong.

The error_mutex seems to be overlooked when referencing the error, add it
to be very safe.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018404
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 qapi/migration.json   | 5 ++---
 migration/migration.c | 8 +++++---
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index c050081555..ade45d564d 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -222,9 +222,8 @@
 #     throttled during auto-converge.  This is only present when
 #     auto-converge has started throttling guest cpus.  (Since 2.7)
 #
-# @error-desc: the human readable error description string, when
-#     @status is 'failed'. Clients should not attempt to parse the
-#     error strings.  (Since 2.7)
+# @error-desc: the human readable error description string. Clients
+#     should not attempt to parse the error strings.  (Since 2.7)
 #
 # @postcopy-blocktime: total time when all vCPU were blocked during
 #     postcopy live migration.  This is only present when the
diff --git a/migration/migration.c b/migration/migration.c
index d75c2bd63c..6a4c245f74 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1039,9 +1039,6 @@ static void fill_source_migration_info(MigrationInfo *info)
         break;
     case MIGRATION_STATUS_FAILED:
         info->has_status = true;
-        if (s->error) {
-            info->error_desc = g_strdup(error_get_pretty(s->error));
-        }
         break;
     case MIGRATION_STATUS_CANCELLED:
         info->has_status = true;
@@ -1051,6 +1048,11 @@ static void fill_source_migration_info(MigrationInfo *info)
         break;
     }
     info->status = state;
+
+    QEMU_LOCK_GUARD(&s->error_mutex);
+    if (s->error) {
+        info->error_desc = g_strdup(error_get_pretty(s->error));
+    }
 }
 
 static void fill_destination_migration_info(MigrationInfo *info)
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 2/7] migration: Let migrate_set_error() take ownership
  2023-07-05 16:34 [PATCH v2 0/7] migration: Better error handling in return path thread Peter Xu
  2023-07-05 16:34 ` [PATCH v2 1/7] migration: Display error in query-migrate irrelevant of status Peter Xu
@ 2023-07-05 16:34 ` Peter Xu
  2023-07-05 20:53   ` Fabiano Rosas
  2023-07-05 16:34 ` [PATCH v2 3/7] migration: Introduce migrate_has_error() Peter Xu
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Peter Xu @ 2023-07-05 16:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Fabiano Rosas, Juan Quintela,
	Lukas Straub, Laszlo Ersek, peterx

migrate_set_error() used one error_copy() so it always copy an error.
However that's not the major use case - the major use case is one would
like to pass the error to migrate_set_error() without further touching the
error.

It can be proved if we see most of the callers are freeing the error
explicitly right afterwards.  There're a few outliers (only if when the
caller) where we can use error_copy() explicitly there.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h    |  6 +++---
 migration/channel.c      |  1 -
 migration/migration.c    | 20 ++++++++++++++------
 migration/multifd.c      | 10 ++++------
 migration/postcopy-ram.c |  1 -
 migration/ram.c          |  1 -
 6 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 6b31a4b371..507f2f111e 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -482,8 +482,8 @@ bool  migration_has_all_channels(void);
 
 uint64_t migrate_max_downtime(void);
 
-void migrate_set_error(MigrationState *s, const Error *error);
-void migrate_fd_error(MigrationState *s, const Error *error);
+void migrate_set_error(MigrationState *s, Error *error);
+void migrate_fd_error(MigrationState *s, Error *error);
 
 void migrate_fd_connect(MigrationState *s, Error *error_in);
 
@@ -528,7 +528,7 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
 void migration_make_urgent_request(void);
 void migration_consume_urgent_request(void);
 bool migration_rate_limit(void);
-void migration_cancel(const Error *error);
+void migration_cancel(Error *error);
 
 void populate_vfio_info(MigrationInfo *info);
 void reset_vfio_bytes_transferred(void);
diff --git a/migration/channel.c b/migration/channel.c
index ca3319a309..48b3f6abd6 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -90,7 +90,6 @@ void migration_channel_connect(MigrationState *s,
         }
     }
     migrate_fd_connect(s, error);
-    error_free(error);
 }
 
 
diff --git a/migration/migration.c b/migration/migration.c
index 6a4c245f74..e5d207699b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -161,7 +161,7 @@ void migration_object_init(void)
     dirty_bitmap_mig_init();
 }
 
-void migration_cancel(const Error *error)
+void migration_cancel(Error *error)
 {
     if (error) {
         migrate_set_error(current_migration, error);
@@ -1205,11 +1205,20 @@ static void migrate_fd_cleanup_bh(void *opaque)
     object_unref(OBJECT(s));
 }
 
-void migrate_set_error(MigrationState *s, const Error *error)
+/*
+ * Set error for current migration state.  The `error' ownership will be
+ * moved from the caller to MigrationState, so the caller doesn't need to
+ * free the error.
+ *
+ * If the caller still needs to reference the `error' passed in, one should
+ * use error_copy() explicitly.
+ */
+void migrate_set_error(MigrationState *s, Error *error)
 {
     QEMU_LOCK_GUARD(&s->error_mutex);
     if (!s->error) {
-        s->error = error_copy(error);
+        /* Record the first error triggered */
+        s->error = error;
     }
 }
 
@@ -1222,7 +1231,7 @@ static void migrate_error_free(MigrationState *s)
     }
 }
 
-void migrate_fd_error(MigrationState *s, const Error *error)
+void migrate_fd_error(MigrationState *s, Error *error)
 {
     trace_migrate_fd_error(error_get_pretty(error));
     assert(s->to_dst_file == NULL);
@@ -1688,7 +1697,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         if (!(has_resume && resume)) {
             yank_unregister_instance(MIGRATION_YANK_INSTANCE);
         }
-        migrate_fd_error(s, local_err);
+        migrate_fd_error(s, error_copy(local_err));
         error_propagate(errp, local_err);
         return;
     }
@@ -2611,7 +2620,6 @@ static MigThrError migration_detect_error(MigrationState *s)
 
     if (local_error) {
         migrate_set_error(s, local_error);
-        error_free(local_error);
     }
 
     if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret) {
diff --git a/migration/multifd.c b/migration/multifd.c
index 3387d8277f..62bc2dbf49 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -551,7 +551,6 @@ void multifd_save_cleanup(void)
         multifd_send_state->ops->send_cleanup(p, &local_err);
         if (local_err) {
             migrate_set_error(migrate_get_current(), local_err);
-            error_free(local_err);
         }
     }
     qemu_sem_destroy(&multifd_send_state->channels_ready);
@@ -750,7 +749,6 @@ out:
     if (local_err) {
         trace_multifd_send_error(p->id);
         multifd_send_terminate_threads(local_err);
-        error_free(local_err);
     }
 
     /*
@@ -883,7 +881,6 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
       */
      p->quit = true;
      object_unref(OBJECT(ioc));
-     error_free(err);
 }
 
 static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
@@ -1148,7 +1145,6 @@ static void *multifd_recv_thread(void *opaque)
 
     if (local_err) {
         multifd_recv_terminate_threads(local_err);
-        error_free(local_err);
     }
     qemu_mutex_lock(&p->mutex);
     p->running = false;
@@ -1240,7 +1236,8 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
 
     id = multifd_recv_initial_packet(ioc, &local_err);
     if (id < 0) {
-        multifd_recv_terminate_threads(local_err);
+        /* Copy local error because we'll also return it to caller */
+        multifd_recv_terminate_threads(error_copy(local_err));
         error_propagate_prepend(errp, local_err,
                                 "failed to receive packet"
                                 " via multifd channel %d: ",
@@ -1253,7 +1250,8 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
     if (p->c != NULL) {
         error_setg(&local_err, "multifd: received id '%d' already setup'",
                    id);
-        multifd_recv_terminate_threads(local_err);
+        /* Copy local error because we'll also return it to caller */
+        multifd_recv_terminate_threads(error_copy(local_err));
         error_propagate(errp, local_err);
         return;
     }
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 5615ec29eb..6f6fb52bf1 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1594,7 +1594,6 @@ postcopy_preempt_send_channel_done(MigrationState *s,
 {
     if (local_err) {
         migrate_set_error(s, local_err);
-        error_free(local_err);
     } else {
         migration_ioc_register_yank(ioc);
         s->postcopy_qemufile_src = qemu_file_new_output(ioc);
diff --git a/migration/ram.c b/migration/ram.c
index 5283a75f02..ba4890563d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4267,7 +4267,6 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
          */
         error_setg(&err, "RAM block '%s' resized during precopy.", rb->idstr);
         migration_cancel(err);
-        error_free(err);
     }
 
     switch (ps) {
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 3/7] migration: Introduce migrate_has_error()
  2023-07-05 16:34 [PATCH v2 0/7] migration: Better error handling in return path thread Peter Xu
  2023-07-05 16:34 ` [PATCH v2 1/7] migration: Display error in query-migrate irrelevant of status Peter Xu
  2023-07-05 16:34 ` [PATCH v2 2/7] migration: Let migrate_set_error() take ownership Peter Xu
@ 2023-07-05 16:34 ` Peter Xu
  2023-07-05 20:55   ` Fabiano Rosas
  2023-07-05 16:34 ` [PATCH v2 4/7] migration: Refactor error handling in source return path Peter Xu
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Peter Xu @ 2023-07-05 16:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Fabiano Rosas, Juan Quintela,
	Lukas Straub, Laszlo Ersek, peterx

Introduce a helper to detect whether MigrationState.error is set for
whatever reason.  It is intended to not taking the error_mutex here because
neither do we reference the pointer, nor do we modify the pointer.  State
why it's safe to do so.

This is preparation work for any thread (e.g. source return path thread) to
setup errors in an unified way to MigrationState, rather than relying on
its own way to set errors (mark_source_rp_bad()).

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h |  1 +
 migration/migration.c | 15 +++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/migration/migration.h b/migration/migration.h
index 507f2f111e..7d916e13e1 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -483,6 +483,7 @@ bool  migration_has_all_channels(void);
 uint64_t migrate_max_downtime(void);
 
 void migrate_set_error(MigrationState *s, Error *error);
+bool migrate_has_error(MigrationState *s);
 void migrate_fd_error(MigrationState *s, Error *error);
 
 void migrate_fd_connect(MigrationState *s, Error *error_in);
diff --git a/migration/migration.c b/migration/migration.c
index e5d207699b..c54c195603 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1222,6 +1222,21 @@ void migrate_set_error(MigrationState *s, Error *error)
     }
 }
 
+/*
+ * Whether the migration state has error set?
+ *
+ * Note this function explicitly didn't use error_mutex, because it only
+ * reads the error pointer for a boolean status.
+ *
+ * As long as the Error* is set, it shouldn't be freed before migration
+ * cleanup, so any thread can use this helper to safely detect whether
+ * there's anything wrong happened already.
+ */
+bool migrate_has_error(MigrationState *s)
+{
+    return qatomic_read(&s->error);
+}
+
 static void migrate_error_free(MigrationState *s)
 {
     QEMU_LOCK_GUARD(&s->error_mutex);
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 4/7] migration: Refactor error handling in source return path
  2023-07-05 16:34 [PATCH v2 0/7] migration: Better error handling in return path thread Peter Xu
                   ` (2 preceding siblings ...)
  2023-07-05 16:34 ` [PATCH v2 3/7] migration: Introduce migrate_has_error() Peter Xu
@ 2023-07-05 16:34 ` Peter Xu
  2023-07-05 16:35 ` [PATCH v2 5/7] migration: Deliver return path file error to migrate state too Peter Xu
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Peter Xu @ 2023-07-05 16:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Fabiano Rosas, Juan Quintela,
	Lukas Straub, Laszlo Ersek, peterx

rp_state.error was a boolean used to show error happened in return path
thread.  That's not only duplicating error reporting (migrate_set_error),
but also not good enough in that we only do error_report() and set it to
true, we never can keep a history of the exact error and show it in
query-migrate.

To make this better, a few things done:

  - Use error_setg() rather than error_report() across the whole lifecycle
    of return path thread, keeping the error in an Error*.

  - Use migrate_set_error() to apply that captured error to the global
    migration object when error occured in this thread.

  - With above, no need to have mark_source_rp_bad(), remove it, alongside
    with rp_state.error itself.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h  |   1 -
 migration/ram.h        |   5 +-
 migration/migration.c  | 118 ++++++++++++++++++++---------------------
 migration/ram.c        |  41 +++++++-------
 migration/trace-events |   2 +-
 5 files changed, 82 insertions(+), 85 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 7d916e13e1..0fb6355225 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -297,7 +297,6 @@ struct MigrationState {
         /* Protected by qemu_file_lock */
         QEMUFile     *from_dst_file;
         QemuThread    rp_thread;
-        bool          error;
         /*
          * We can also check non-zero of rp_thread, but there's no "official"
          * way to do this, so this bool makes it slightly more elegant.
diff --git a/migration/ram.h b/migration/ram.h
index ea1f3c25b5..02a4af2db6 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -52,7 +52,8 @@ uint64_t ram_bytes_total(void);
 void mig_throttle_counter_reset(void);
 
 uint64_t ram_pagesize_summary(void);
-int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);
+int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len,
+                         Error **errp);
 void ram_postcopy_migrated_memory_release(MigrationState *ms);
 /* For outgoing discard bitmap */
 void ram_postcopy_send_discard_bitmap(MigrationState *ms);
@@ -72,7 +73,7 @@ void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr);
 void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr);
 int64_t ramblock_recv_bitmap_send(QEMUFile *file,
                                   const char *block_name);
-int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
+int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb, Error **errp);
 bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start);
 void postcopy_preempt_shutdown_file(MigrationState *s);
 void *postcopy_preempt_thread(void *opaque);
diff --git a/migration/migration.c b/migration/migration.c
index c54c195603..46dbfb07c4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1417,7 +1417,6 @@ void migrate_init(MigrationState *s)
     s->to_dst_file = NULL;
     s->state = MIGRATION_STATUS_NONE;
     s->rp_state.from_dst_file = NULL;
-    s->rp_state.error = false;
     s->mbps = 0.0;
     s->pages_per_second = 0.0;
     s->downtime = 0;
@@ -1734,16 +1733,6 @@ void qmp_migrate_continue(MigrationStatus state, Error **errp)
     qemu_sem_post(&s->pause_sem);
 }
 
-/* migration thread support */
-/*
- * Something bad happened to the RP stream, mark an error
- * The caller shall print or trace something to indicate why
- */
-static void mark_source_rp_bad(MigrationState *s)
-{
-    s->rp_state.error = true;
-}
-
 static struct rp_cmd_args {
     ssize_t     len; /* -1 = variable */
     const char *name;
@@ -1765,7 +1754,7 @@ static struct rp_cmd_args {
  * and we don't need to send pages that have already been sent.
  */
 static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
-                                       ram_addr_t start, size_t len)
+                                        ram_addr_t start, size_t len, Error **errp)
 {
     long our_host_ps = qemu_real_host_page_size();
 
@@ -1777,15 +1766,12 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
      */
     if (!QEMU_IS_ALIGNED(start, our_host_ps) ||
         !QEMU_IS_ALIGNED(len, our_host_ps)) {
-        error_report("%s: Misaligned page request, start: " RAM_ADDR_FMT
-                     " len: %zd", __func__, start, len);
-        mark_source_rp_bad(ms);
+        error_setg(errp, "MIG_RP_MSG_REQ_PAGES: Misaligned page request, start:"
+                   RAM_ADDR_FMT " len: %zd", start, len);
         return;
     }
 
-    if (ram_save_queue_pages(rbname, start, len)) {
-        mark_source_rp_bad(ms);
-    }
+    ram_save_queue_pages(rbname, start, len, errp);
 }
 
 /* Return true to retry, false to quit */
@@ -1800,26 +1786,28 @@ static bool postcopy_pause_return_path_thread(MigrationState *s)
     return true;
 }
 
-static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
+static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name,
+                                         Error **errp)
 {
     RAMBlock *block = qemu_ram_block_by_name(block_name);
 
     if (!block) {
-        error_report("%s: invalid block name '%s'", __func__, block_name);
+        error_setg(errp, "MIG_RP_MSG_RECV_BITMAP has invalid block name '%s'",
+                   block_name);
         return -EINVAL;
     }
 
     /* Fetch the received bitmap and refresh the dirty bitmap */
-    return ram_dirty_bitmap_reload(s, block);
+    return ram_dirty_bitmap_reload(s, block, errp);
 }
 
-static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
+static int migrate_handle_rp_resume_ack(MigrationState *s,
+                                        uint32_t value, Error **errp)
 {
     trace_source_return_path_thread_resume_ack(value);
 
     if (value != MIGRATION_RESUME_ACK_VALUE) {
-        error_report("%s: illegal resume_ack value %"PRIu32,
-                     __func__, value);
+        error_setg(errp, "illegal resume_ack value %"PRIu32, value);
         return -1;
     }
 
@@ -1878,49 +1866,47 @@ static void *source_return_path_thread(void *opaque)
     uint32_t tmp32, sibling_error;
     ram_addr_t start = 0; /* =0 to silence warning */
     size_t  len = 0, expected_len;
+    Error *err = NULL;
     int res;
 
     trace_source_return_path_thread_entry();
     rcu_register_thread();
 
 retry:
-    while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
+    while (!migrate_has_error(ms) && !qemu_file_get_error(rp) &&
            migration_is_setup_or_active(ms->state)) {
         trace_source_return_path_thread_loop_top();
+
         header_type = qemu_get_be16(rp);
         header_len = qemu_get_be16(rp);
 
         if (qemu_file_get_error(rp)) {
-            mark_source_rp_bad(ms);
             goto out;
         }
 
         if (header_type >= MIG_RP_MSG_MAX ||
             header_type == MIG_RP_MSG_INVALID) {
-            error_report("RP: Received invalid message 0x%04x length 0x%04x",
-                         header_type, header_len);
-            mark_source_rp_bad(ms);
+            error_setg(&err, "Received invalid message 0x%04x length 0x%04x",
+                       header_type, header_len);
             goto out;
         }
 
         if ((rp_cmd_args[header_type].len != -1 &&
             header_len != rp_cmd_args[header_type].len) ||
             header_len > sizeof(buf)) {
-            error_report("RP: Received '%s' message (0x%04x) with"
-                         "incorrect length %d expecting %zu",
-                         rp_cmd_args[header_type].name, header_type, header_len,
-                         (size_t)rp_cmd_args[header_type].len);
-            mark_source_rp_bad(ms);
+            error_setg(&err, "Received '%s' message (0x%04x) with"
+                       "incorrect length %d expecting %zu",
+                       rp_cmd_args[header_type].name, header_type, header_len,
+                       (size_t)rp_cmd_args[header_type].len);
             goto out;
         }
 
         /* We know we've got a valid header by this point */
         res = qemu_get_buffer(rp, buf, header_len);
         if (res != header_len) {
-            error_report("RP: Failed reading data for message 0x%04x"
-                         " read %d expected %d",
-                         header_type, res, header_len);
-            mark_source_rp_bad(ms);
+            error_setg(&err, "Failed reading data for message 0x%04x"
+                       " read %d expected %d",
+                       header_type, res, header_len);
             goto out;
         }
 
@@ -1930,8 +1916,7 @@ retry:
             sibling_error = ldl_be_p(buf);
             trace_source_return_path_thread_shut(sibling_error);
             if (sibling_error) {
-                error_report("RP: Sibling indicated error %d", sibling_error);
-                mark_source_rp_bad(ms);
+                error_setg(&err, "Sibling indicated error %d", sibling_error);
             }
             /*
              * We'll let the main thread deal with closing the RP
@@ -1949,7 +1934,10 @@ retry:
         case MIG_RP_MSG_REQ_PAGES:
             start = ldq_be_p(buf);
             len = ldl_be_p(buf + 8);
-            migrate_handle_rp_req_pages(ms, NULL, start, len);
+            migrate_handle_rp_req_pages(ms, NULL, start, len, &err);
+            if (err) {
+                goto out;
+            }
             break;
 
         case MIG_RP_MSG_REQ_PAGES_ID:
@@ -1964,32 +1952,32 @@ retry:
                 expected_len += tmp32;
             }
             if (header_len != expected_len) {
-                error_report("RP: Req_Page_id with length %d expecting %zd",
-                             header_len, expected_len);
-                mark_source_rp_bad(ms);
+                error_setg(&err, "Req_Page_id with length %d expecting %zd",
+                           header_len, expected_len);
+                goto out;
+            }
+            migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len,
+                                        &err);
+            if (err) {
                 goto out;
             }
-            migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len);
             break;
 
         case MIG_RP_MSG_RECV_BITMAP:
             if (header_len < 1) {
-                error_report("%s: missing block name", __func__);
-                mark_source_rp_bad(ms);
+                error_setg(&err, "MIG_RP_MSG_RECV_BITMAP missing block name");
                 goto out;
             }
             /* Format: len (1B) + idstr (<255B). This ends the idstr. */
             buf[buf[0] + 1] = '\0';
-            if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1))) {
-                mark_source_rp_bad(ms);
+            if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1), &err)) {
                 goto out;
             }
             break;
 
         case MIG_RP_MSG_RESUME_ACK:
             tmp32 = ldl_be_p(buf);
-            if (migrate_handle_rp_resume_ack(ms, tmp32)) {
-                mark_source_rp_bad(ms);
+            if (migrate_handle_rp_resume_ack(ms, tmp32, &err)) {
                 goto out;
             }
             break;
@@ -2005,6 +1993,19 @@ retry:
     }
 
 out:
+    if (err) {
+        /*
+         * Collect any error in return-path thread and report it to the
+         * migration state object.
+         */
+        migrate_set_error(ms, err);
+        /*
+         * We lost ownership to Error*, clear it, prepared to capture the
+         * next error.
+         */
+        err = NULL;
+    }
+
     res = qemu_file_get_error(rp);
     if (res) {
         if (res && migration_in_postcopy()) {
@@ -2020,13 +2021,11 @@ out:
                  * it's reset only by us above, or when migration completes
                  */
                 rp = ms->rp_state.from_dst_file;
-                ms->rp_state.error = false;
                 goto retry;
             }
         }
 
         trace_source_return_path_thread_bad_end();
-        mark_source_rp_bad(ms);
     }
 
     trace_source_return_path_thread_end();
@@ -2059,8 +2058,7 @@ static int open_return_path_on_source(MigrationState *ms,
     return 0;
 }
 
-/* Returns 0 if the RP was ok, otherwise there was an error on the RP */
-static int await_return_path_close_on_source(MigrationState *ms)
+static void await_return_path_close_on_source(MigrationState *ms)
 {
     /*
      * If this is a normal exit then the destination will send a SHUT and the
@@ -2073,13 +2071,11 @@ static int await_return_path_close_on_source(MigrationState *ms)
          * waiting for the destination.
          */
         qemu_file_shutdown(ms->rp_state.from_dst_file);
-        mark_source_rp_bad(ms);
     }
     trace_await_return_path_close_on_source_joining();
     qemu_thread_join(&ms->rp_state.rp_thread);
     ms->rp_state.rp_thread_created = false;
     trace_await_return_path_close_on_source_close();
-    return ms->rp_state.error;
 }
 
 static inline void
@@ -2382,11 +2378,11 @@ static void migration_completion(MigrationState *s)
      * a SHUT command).
      */
     if (s->rp_state.rp_thread_created) {
-        int rp_error;
         trace_migration_return_path_end_before();
-        rp_error = await_return_path_close_on_source(s);
-        trace_migration_return_path_end_after(rp_error);
-        if (rp_error) {
+        await_return_path_close_on_source(s);
+        trace_migration_return_path_end_after();
+        /* If return path has error, should have been set here */
+        if (migrate_has_error(s)) {
             goto fail;
         }
     }
diff --git a/migration/ram.c b/migration/ram.c
index ba4890563d..567b179fb9 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1922,7 +1922,8 @@ static void migration_page_queue_free(RAMState *rs)
  * @start: starting address from the start of the RAMBlock
  * @len: length (in bytes) to send
  */
-int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
+int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len,
+                         Error **errp)
 {
     RAMBlock *ramblock;
     RAMState *rs = ram_state;
@@ -1939,7 +1940,7 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
              * Shouldn't happen, we can't reuse the last RAMBlock if
              * it's the 1st request.
              */
-            error_report("ram_save_queue_pages no previous block");
+            error_setg(errp, "MIG_RP_MSG_REQ_PAGES has no previous block");
             return -1;
         }
     } else {
@@ -1947,16 +1948,17 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
 
         if (!ramblock) {
             /* We shouldn't be asked for a non-existent RAMBlock */
-            error_report("ram_save_queue_pages no block '%s'", rbname);
+            error_setg(errp, "MIG_RP_MSG_REQ_PAGES has no block '%s'", rbname);
             return -1;
         }
         rs->last_req_rb = ramblock;
     }
     trace_ram_save_queue_pages(ramblock->idstr, start, len);
     if (!offset_in_ramblock(ramblock, start + len - 1)) {
-        error_report("%s request overrun start=" RAM_ADDR_FMT " len="
-                     RAM_ADDR_FMT " blocklen=" RAM_ADDR_FMT,
-                     __func__, start, len, ramblock->used_length);
+        error_setg(errp, "MIG_RP_MSG_REQ_PAGES request overrun, "
+                   "start=" RAM_ADDR_FMT " len="
+                   RAM_ADDR_FMT " blocklen=" RAM_ADDR_FMT,
+                   start, len, ramblock->used_length);
         return -1;
     }
 
@@ -1988,9 +1990,9 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
         assert(len % page_size == 0);
         while (len) {
             if (ram_save_host_page_urgent(pss)) {
-                error_report("%s: ram_save_host_page_urgent() failed: "
-                             "ramblock=%s, start_addr=0x"RAM_ADDR_FMT,
-                             __func__, ramblock->idstr, start);
+                error_setg(errp, "ram_save_host_page_urgent() failed: "
+                           "ramblock=%s, start_addr=0x"RAM_ADDR_FMT,
+                           ramblock->idstr, start);
                 ret = -1;
                 break;
             }
@@ -4124,7 +4126,7 @@ static void ram_dirty_bitmap_reload_notify(MigrationState *s)
  * This is only used when the postcopy migration is paused but wants
  * to resume from a middle point.
  */
-int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
+int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp)
 {
     int ret = -EINVAL;
     /* from_dst_file is always valid because we're within rp_thread */
@@ -4136,8 +4138,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
     trace_ram_dirty_bitmap_reload_begin(block->idstr);
 
     if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
-        error_report("%s: incorrect state %s", __func__,
-                     MigrationStatus_str(s->state));
+        error_setg(errp, "Reload bitmap in incorrect state %s",
+                   MigrationStatus_str(s->state));
         return -EINVAL;
     }
 
@@ -4154,9 +4156,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
 
     /* The size of the bitmap should match with our ramblock */
     if (size != local_size) {
-        error_report("%s: ramblock '%s' bitmap size mismatch "
-                     "(0x%"PRIx64" != 0x%"PRIx64")", __func__,
-                     block->idstr, size, local_size);
+        error_setg(errp, "ramblock '%s' bitmap size mismatch (0x%"PRIx64
+                   " != 0x%"PRIx64")", block->idstr, size, local_size);
         ret = -EINVAL;
         goto out;
     }
@@ -4166,16 +4167,16 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
 
     ret = qemu_file_get_error(file);
     if (ret || size != local_size) {
-        error_report("%s: read bitmap failed for ramblock '%s': %d"
-                     " (size 0x%"PRIx64", got: 0x%"PRIx64")",
-                     __func__, block->idstr, ret, local_size, size);
+        error_setg(errp, "read bitmap failed for ramblock '%s': %d"
+                   " (size 0x%"PRIx64", got: 0x%"PRIx64")",
+                   block->idstr, ret, local_size, size);
         ret = -EIO;
         goto out;
     }
 
     if (end_mark != RAMBLOCK_RECV_BITMAP_ENDING) {
-        error_report("%s: ramblock '%s' end mark incorrect: 0x%"PRIx64,
-                     __func__, block->idstr, end_mark);
+        error_setg(errp, "ramblock '%s' end mark incorrect: 0x%"PRIx64,
+                   block->idstr, end_mark);
         ret = -EINVAL;
         goto out;
     }
diff --git a/migration/trace-events b/migration/trace-events
index 5259c1044b..262eb73078 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -163,7 +163,7 @@ migration_completion_postcopy_end_after_complete(void) ""
 migration_rate_limit_pre(int ms) "%d ms"
 migration_rate_limit_post(int urgent) "urgent: %d"
 migration_return_path_end_before(void) ""
-migration_return_path_end_after(int rp_error) "%d"
+migration_return_path_end_after(void) ""
 migration_thread_after_loop(void) ""
 migration_thread_file_err(void) ""
 migration_thread_setup_complete(void) ""
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 5/7] migration: Deliver return path file error to migrate state too
  2023-07-05 16:34 [PATCH v2 0/7] migration: Better error handling in return path thread Peter Xu
                   ` (3 preceding siblings ...)
  2023-07-05 16:34 ` [PATCH v2 4/7] migration: Refactor error handling in source return path Peter Xu
@ 2023-07-05 16:35 ` Peter Xu
  2023-07-05 16:35 ` [PATCH v2 6/7] qemufile: Always return a verbose error Peter Xu
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Peter Xu @ 2023-07-05 16:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Fabiano Rosas, Juan Quintela,
	Lukas Straub, Laszlo Ersek, peterx

We've already did this for most of the return path thread errors, but not
yet for the IO errors happened on the return path qemufile.  Do that too.

Remember to reset "err" always, because the ownership is not us anymore,
otherwise we're prone to use-after-free later after recovered.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 46dbfb07c4..8f60de362f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2008,6 +2008,13 @@ out:
 
     res = qemu_file_get_error(rp);
     if (res) {
+        /* We have forwarded any error in "err" already, reuse "error" */
+        assert(err == NULL);
+        /* Try to deliver this file error to migration state */
+        qemu_file_get_error_obj(rp, &err);
+        migrate_set_error(ms, err);
+        err = NULL;
+
         if (res && migration_in_postcopy()) {
             /*
              * Maybe there is something we can do: it looks like a
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 6/7] qemufile: Always return a verbose error
  2023-07-05 16:34 [PATCH v2 0/7] migration: Better error handling in return path thread Peter Xu
                   ` (4 preceding siblings ...)
  2023-07-05 16:35 ` [PATCH v2 5/7] migration: Deliver return path file error to migrate state too Peter Xu
@ 2023-07-05 16:35 ` Peter Xu
  2023-07-05 21:54   ` Fabiano Rosas
  2023-07-05 16:35 ` [PATCH v2 7/7] migration: Provide explicit error message for file shutdowns Peter Xu
  2023-07-24 14:11 ` [PATCH v2 0/7] migration: Better error handling in return path thread Fabiano Rosas
  7 siblings, 1 reply; 23+ messages in thread
From: Peter Xu @ 2023-07-05 16:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Fabiano Rosas, Juan Quintela,
	Lukas Straub, Laszlo Ersek, peterx

There're a lot of cases where we only have an errno set in last_error but
without a detailed error description.  When this happens, try to generate
an error contains the errno as a descriptive error.

This will be helpful in cases where one relies on the Error*.  E.g.,
migration state only caches Error* in MigrationState.error.  With this,
we'll display correct error messages in e.g. query-migrate when the error
was only set by qemu_file_set_error().

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/qemu-file.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index acc282654a..419b4092e7 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -156,15 +156,24 @@ void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks)
  *
  * Return negative error value if there has been an error on previous
  * operations, return 0 if no error happened.
- * Optional, it returns Error* in errp, but it may be NULL even if return value
- * is not 0.
  *
+ * If errp is specified, a verbose error message will be copied over.
  */
 int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
 {
+    if (!f->last_error) {
+        return 0;
+    }
+
+    /* There is an error */
     if (errp) {
-        *errp = f->last_error_obj ? error_copy(f->last_error_obj) : NULL;
+        if (f->last_error_obj) {
+            *errp = error_copy(f->last_error_obj);
+        } else {
+            error_setg_errno(errp, -f->last_error, "Channel error");
+        }
     }
+
     return f->last_error;
 }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 7/7] migration: Provide explicit error message for file shutdowns
  2023-07-05 16:34 [PATCH v2 0/7] migration: Better error handling in return path thread Peter Xu
                   ` (5 preceding siblings ...)
  2023-07-05 16:35 ` [PATCH v2 6/7] qemufile: Always return a verbose error Peter Xu
@ 2023-07-05 16:35 ` Peter Xu
  2023-07-05 22:05   ` Fabiano Rosas
  2023-07-24 14:11 ` [PATCH v2 0/7] migration: Better error handling in return path thread Fabiano Rosas
  7 siblings, 1 reply; 23+ messages in thread
From: Peter Xu @ 2023-07-05 16:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Fabiano Rosas, Juan Quintela,
	Lukas Straub, Laszlo Ersek, peterx

Provide an explicit reason for qemu_file_shutdown()s, which can be
displayed in query-migrate when used.

This will make e.g. migrate-pause to display explicit error descriptions,
from:

"error-desc": "Channel error: Input/output error"

To:

"error-desc": "Channel is explicitly shutdown by the user"

in query-migrate.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/qemu-file.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 419b4092e7..ff605027de 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -87,7 +87,10 @@ int qemu_file_shutdown(QEMUFile *f)
      *      --> guest crash!
      */
     if (!f->last_error) {
-        qemu_file_set_error(f, -EIO);
+        Error *err = NULL;
+
+        error_setg(&err, "Channel is explicitly shutdown by the user");
+        qemu_file_set_error_obj(f, -EIO, err);
     }
 
     if (!qio_channel_has_feature(f->ioc,
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/7] migration: Let migrate_set_error() take ownership
  2023-07-05 16:34 ` [PATCH v2 2/7] migration: Let migrate_set_error() take ownership Peter Xu
@ 2023-07-05 20:53   ` Fabiano Rosas
  0 siblings, 0 replies; 23+ messages in thread
From: Fabiano Rosas @ 2023-07-05 20:53 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Leonardo Bras Soares Passos, Juan Quintela, Lukas Straub,
	Laszlo Ersek, peterx

Peter Xu <peterx@redhat.com> writes:

> migrate_set_error() used one error_copy() so it always copy an error.
> However that's not the major use case - the major use case is one would
> like to pass the error to migrate_set_error() without further touching the
> error.
>
> It can be proved if we see most of the callers are freeing the error
> explicitly right afterwards.  There're a few outliers (only if when the
> caller) where we can use error_copy() explicitly there.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/7] migration: Introduce migrate_has_error()
  2023-07-05 16:34 ` [PATCH v2 3/7] migration: Introduce migrate_has_error() Peter Xu
@ 2023-07-05 20:55   ` Fabiano Rosas
  0 siblings, 0 replies; 23+ messages in thread
From: Fabiano Rosas @ 2023-07-05 20:55 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Leonardo Bras Soares Passos, Juan Quintela, Lukas Straub,
	Laszlo Ersek, peterx

Peter Xu <peterx@redhat.com> writes:

> Introduce a helper to detect whether MigrationState.error is set for
> whatever reason.  It is intended to not taking the error_mutex here because
> neither do we reference the pointer, nor do we modify the pointer.  State
> why it's safe to do so.
>
> This is preparation work for any thread (e.g. source return path thread) to
> setup errors in an unified way to MigrationState, rather than relying on
> its own way to set errors (mark_source_rp_bad()).
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] qemufile: Always return a verbose error
  2023-07-05 16:35 ` [PATCH v2 6/7] qemufile: Always return a verbose error Peter Xu
@ 2023-07-05 21:54   ` Fabiano Rosas
  2023-07-05 22:24     ` Peter Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Fabiano Rosas @ 2023-07-05 21:54 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Leonardo Bras Soares Passos, Juan Quintela, Lukas Straub,
	Laszlo Ersek, peterx

Peter Xu <peterx@redhat.com> writes:

> There're a lot of cases where we only have an errno set in last_error but
> without a detailed error description.  When this happens, try to generate
> an error contains the errno as a descriptive error.
>
> This will be helpful in cases where one relies on the Error*.  E.g.,
> migration state only caches Error* in MigrationState.error.  With this,
> we'll display correct error messages in e.g. query-migrate when the error
> was only set by qemu_file_set_error().
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/qemu-file.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index acc282654a..419b4092e7 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -156,15 +156,24 @@ void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks)
>   *
>   * Return negative error value if there has been an error on previous
>   * operations, return 0 if no error happened.
> - * Optional, it returns Error* in errp, but it may be NULL even if return value
> - * is not 0.
>   *
> + * If errp is specified, a verbose error message will be copied over.
>   */
>  int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
>  {
> +    if (!f->last_error) {
> +        return 0;
> +    }
> +
> +    /* There is an error */
>      if (errp) {
> -        *errp = f->last_error_obj ? error_copy(f->last_error_obj) : NULL;
> +        if (f->last_error_obj) {
> +            *errp = error_copy(f->last_error_obj);
> +        } else {
> +            error_setg_errno(errp, -f->last_error, "Channel error");

There are a couple of places that do:

    ret = vmstate_save(f, se, ms->vmdesc);
    if (ret) {
        qemu_file_set_error(f, ret);
        break;
    }

and vmstate_save() can return > 0 on error. This would make this message
say "Unknown error". This is minor.

But take a look at qemu_fclose(). It can return f->last_error while the
function documentation says it should return negative on error.

Should we make qemu_file_set_error() check 'ret' and always set a
negative value for f->last_error?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 7/7] migration: Provide explicit error message for file shutdowns
  2023-07-05 16:35 ` [PATCH v2 7/7] migration: Provide explicit error message for file shutdowns Peter Xu
@ 2023-07-05 22:05   ` Fabiano Rosas
  2023-07-05 22:34     ` Peter Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Fabiano Rosas @ 2023-07-05 22:05 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Leonardo Bras Soares Passos, Juan Quintela, Lukas Straub,
	Laszlo Ersek, peterx

Peter Xu <peterx@redhat.com> writes:

> Provide an explicit reason for qemu_file_shutdown()s, which can be
> displayed in query-migrate when used.
>

Can we consider this to cover the TODO:

 * TODO: convert to propagate Error objects instead of squashing
 * to a fixed errno value

or would that need something fancier?

> This will make e.g. migrate-pause to display explicit error descriptions,
> from:
>
> "error-desc": "Channel error: Input/output error"
>
> To:
>
> "error-desc": "Channel is explicitly shutdown by the user"
>
> in query-migrate.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/qemu-file.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 419b4092e7..ff605027de 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -87,7 +87,10 @@ int qemu_file_shutdown(QEMUFile *f)
>       *      --> guest crash!
>       */
>      if (!f->last_error) {
> -        qemu_file_set_error(f, -EIO);
> +        Error *err = NULL;
> +
> +        error_setg(&err, "Channel is explicitly shutdown by the user");

It is good that we can grep this message. However, I'm confused about
who the "user" is meant to be here and how are they implicated in this
error.

> +        qemu_file_set_error_obj(f, -EIO, err);
>      }
>  
>      if (!qio_channel_has_feature(f->ioc,


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] qemufile: Always return a verbose error
  2023-07-05 21:54   ` Fabiano Rosas
@ 2023-07-05 22:24     ` Peter Xu
  2023-07-06 13:56       ` Fabiano Rosas
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Xu @ 2023-07-05 22:24 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, Leonardo Bras Soares Passos, Juan Quintela,
	Lukas Straub, Laszlo Ersek

On Wed, Jul 05, 2023 at 06:54:37PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > There're a lot of cases where we only have an errno set in last_error but
> > without a detailed error description.  When this happens, try to generate
> > an error contains the errno as a descriptive error.
> >
> > This will be helpful in cases where one relies on the Error*.  E.g.,
> > migration state only caches Error* in MigrationState.error.  With this,
> > we'll display correct error messages in e.g. query-migrate when the error
> > was only set by qemu_file_set_error().
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/qemu-file.c | 15 ++++++++++++---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> > index acc282654a..419b4092e7 100644
> > --- a/migration/qemu-file.c
> > +++ b/migration/qemu-file.c
> > @@ -156,15 +156,24 @@ void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks)
> >   *
> >   * Return negative error value if there has been an error on previous
> >   * operations, return 0 if no error happened.
> > - * Optional, it returns Error* in errp, but it may be NULL even if return value
> > - * is not 0.
> >   *
> > + * If errp is specified, a verbose error message will be copied over.
> >   */
> >  int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
> >  {
> > +    if (!f->last_error) {
> > +        return 0;
> > +    }
> > +
> > +    /* There is an error */
> >      if (errp) {
> > -        *errp = f->last_error_obj ? error_copy(f->last_error_obj) : NULL;
> > +        if (f->last_error_obj) {
> > +            *errp = error_copy(f->last_error_obj);
> > +        } else {
> > +            error_setg_errno(errp, -f->last_error, "Channel error");
> 
> There are a couple of places that do:
> 
>     ret = vmstate_save(f, se, ms->vmdesc);
>     if (ret) {
>         qemu_file_set_error(f, ret);
>         break;
>     }
> 
> and vmstate_save() can return > 0 on error. This would make this message
> say "Unknown error". This is minor.
> 
> But take a look at qemu_fclose(). It can return f->last_error while the
> function documentation says it should return negative on error.
> 
> Should we make qemu_file_set_error() check 'ret' and always set a
> negative value for f->last_error?

Yeah, maybe we can add a sanity check, but logically it's better we just
fix vmstate_save() to make sure it always returns a <0 error.

It seems to me there're so many hooks in vmstate_save_state_v() that it can
return random things.  What's the one you spot?  If it's an obvious issue
we can fix them.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 7/7] migration: Provide explicit error message for file shutdowns
  2023-07-05 22:05   ` Fabiano Rosas
@ 2023-07-05 22:34     ` Peter Xu
  2023-07-06 13:50       ` Fabiano Rosas
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Xu @ 2023-07-05 22:34 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, Leonardo Bras Soares Passos, Juan Quintela,
	Lukas Straub, Laszlo Ersek

On Wed, Jul 05, 2023 at 07:05:13PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > Provide an explicit reason for qemu_file_shutdown()s, which can be
> > displayed in query-migrate when used.
> >
> 
> Can we consider this to cover the TODO:
> 
>  * TODO: convert to propagate Error objects instead of squashing
>  * to a fixed errno value
> 
> or would that need something fancier?

The TODO seems to say we want to allow qemu_file_shutdown() to report an
Error* when anything wrong happened (e.g. shutdown() failed)?  While this
patch was trying to store a specific error string so when query migration
later it'll show up to the user.  If so, IMHO they're two things.

> 
> > This will make e.g. migrate-pause to display explicit error descriptions,
> > from:
> >
> > "error-desc": "Channel error: Input/output error"
> >
> > To:
> >
> > "error-desc": "Channel is explicitly shutdown by the user"
> >
> > in query-migrate.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/qemu-file.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> > index 419b4092e7..ff605027de 100644
> > --- a/migration/qemu-file.c
> > +++ b/migration/qemu-file.c
> > @@ -87,7 +87,10 @@ int qemu_file_shutdown(QEMUFile *f)
> >       *      --> guest crash!
> >       */
> >      if (!f->last_error) {
> > -        qemu_file_set_error(f, -EIO);
> > +        Error *err = NULL;
> > +
> > +        error_setg(&err, "Channel is explicitly shutdown by the user");
> 
> It is good that we can grep this message. However, I'm confused about
> who the "user" is meant to be here and how are they implicated in this
> error.

Ah, here the user is who sends the "migrate-pause" command, according to
the example of the commit message.

What I wanted to do is provide a clear message (besides -EIO) when
query-migrate, so we know more on how the migration is paused/stopped/...
Before that it shows that the same as e.g. any form of IO errors happened.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 7/7] migration: Provide explicit error message for file shutdowns
  2023-07-05 22:34     ` Peter Xu
@ 2023-07-06 13:50       ` Fabiano Rosas
  2023-07-06 16:27         ` Peter Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Fabiano Rosas @ 2023-07-06 13:50 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Leonardo Bras Soares Passos, Juan Quintela,
	Lukas Straub, Laszlo Ersek

Peter Xu <peterx@redhat.com> writes:

> On Wed, Jul 05, 2023 at 07:05:13PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > Provide an explicit reason for qemu_file_shutdown()s, which can be
>> > displayed in query-migrate when used.
>> >
>> 
>> Can we consider this to cover the TODO:
>> 
>>  * TODO: convert to propagate Error objects instead of squashing
>>  * to a fixed errno value
>> 
>> or would that need something fancier?
>
> The TODO seems to say we want to allow qemu_file_shutdown() to report an
> Error* when anything wrong happened (e.g. shutdown() failed)?  While this
> patch was trying to store a specific error string so when query migration
> later it'll show up to the user.  If so, IMHO they're two things.
>

Ok, just making sure.

>> 
>> > This will make e.g. migrate-pause to display explicit error descriptions,
>> > from:
>> >
>> > "error-desc": "Channel error: Input/output error"
>> >
>> > To:
>> >
>> > "error-desc": "Channel is explicitly shutdown by the user"
>> >
>> > in query-migrate.
>> >
>> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> > ---
>> >  migration/qemu-file.c | 5 ++++-
>> >  1 file changed, 4 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>> > index 419b4092e7..ff605027de 100644
>> > --- a/migration/qemu-file.c
>> > +++ b/migration/qemu-file.c
>> > @@ -87,7 +87,10 @@ int qemu_file_shutdown(QEMUFile *f)
>> >       *      --> guest crash!
>> >       */
>> >      if (!f->last_error) {
>> > -        qemu_file_set_error(f, -EIO);
>> > +        Error *err = NULL;
>> > +
>> > +        error_setg(&err, "Channel is explicitly shutdown by the user");
>> 
>> It is good that we can grep this message. However, I'm confused about
>> who the "user" is meant to be here and how are they implicated in this
>> error.
>
> Ah, here the user is who sends the "migrate-pause" command, according to
> the example of the commit message.
>

That's where I'm confused. There are 15 callsites for
qemu_file_shutdown(). Only 2 of them are from migrate-pause. So I'm
missing the logical step that links migrate-pause with this
error_setg(). Are you assuming that the race described will only happen
with migrate-pause and the other invocations would have set an error
already?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] qemufile: Always return a verbose error
  2023-07-05 22:24     ` Peter Xu
@ 2023-07-06 13:56       ` Fabiano Rosas
  0 siblings, 0 replies; 23+ messages in thread
From: Fabiano Rosas @ 2023-07-06 13:56 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Leonardo Bras Soares Passos, Juan Quintela,
	Lukas Straub, Laszlo Ersek

Peter Xu <peterx@redhat.com> writes:

> On Wed, Jul 05, 2023 at 06:54:37PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > There're a lot of cases where we only have an errno set in last_error but
>> > without a detailed error description.  When this happens, try to generate
>> > an error contains the errno as a descriptive error.
>> >
>> > This will be helpful in cases where one relies on the Error*.  E.g.,
>> > migration state only caches Error* in MigrationState.error.  With this,
>> > we'll display correct error messages in e.g. query-migrate when the error
>> > was only set by qemu_file_set_error().
>> >
>> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> > ---
>> >  migration/qemu-file.c | 15 ++++++++++++---
>> >  1 file changed, 12 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>> > index acc282654a..419b4092e7 100644
>> > --- a/migration/qemu-file.c
>> > +++ b/migration/qemu-file.c
>> > @@ -156,15 +156,24 @@ void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks)
>> >   *
>> >   * Return negative error value if there has been an error on previous
>> >   * operations, return 0 if no error happened.
>> > - * Optional, it returns Error* in errp, but it may be NULL even if return value
>> > - * is not 0.
>> >   *
>> > + * If errp is specified, a verbose error message will be copied over.
>> >   */
>> >  int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
>> >  {
>> > +    if (!f->last_error) {
>> > +        return 0;
>> > +    }
>> > +
>> > +    /* There is an error */
>> >      if (errp) {
>> > -        *errp = f->last_error_obj ? error_copy(f->last_error_obj) : NULL;
>> > +        if (f->last_error_obj) {
>> > +            *errp = error_copy(f->last_error_obj);
>> > +        } else {
>> > +            error_setg_errno(errp, -f->last_error, "Channel error");
>> 
>> There are a couple of places that do:
>> 
>>     ret = vmstate_save(f, se, ms->vmdesc);
>>     if (ret) {
>>         qemu_file_set_error(f, ret);
>>         break;
>>     }
>> 
>> and vmstate_save() can return > 0 on error. This would make this message
>> say "Unknown error". This is minor.
>> 
>> But take a look at qemu_fclose(). It can return f->last_error while the
>> function documentation says it should return negative on error.
>> 
>> Should we make qemu_file_set_error() check 'ret' and always set a
>> negative value for f->last_error?
>
> Yeah, maybe we can add a sanity check, but logically it's better we just
> fix vmstate_save() to make sure it always returns a <0 error.
>
> It seems to me there're so many hooks in vmstate_save_state_v() that it can
> return random things.  What's the one you spot?  If it's an obvious issue
> we can fix them.

I see at least:
    ret = field->info->put(f, curr_elem, size, field, vmdesc_loop);
with put_power() from target/arm/machine.c returning 1.

Since vmstate_save_state_v() is quite involved I don't think we should
block this series because of it. I can do a closer audit and send a
separate patch with it.

So:

Reviewed-by: Fabiano Rosas <farosas@suse.de>



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 7/7] migration: Provide explicit error message for file shutdowns
  2023-07-06 13:50       ` Fabiano Rosas
@ 2023-07-06 16:27         ` Peter Xu
  2023-07-06 17:33           ` Fabiano Rosas
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Xu @ 2023-07-06 16:27 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, Leonardo Bras Soares Passos, Juan Quintela,
	Lukas Straub, Laszlo Ersek

On Thu, Jul 06, 2023 at 10:50:34AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Wed, Jul 05, 2023 at 07:05:13PM -0300, Fabiano Rosas wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >> 
> >> > Provide an explicit reason for qemu_file_shutdown()s, which can be
> >> > displayed in query-migrate when used.
> >> >
> >> 
> >> Can we consider this to cover the TODO:
> >> 
> >>  * TODO: convert to propagate Error objects instead of squashing
> >>  * to a fixed errno value
> >> 
> >> or would that need something fancier?
> >
> > The TODO seems to say we want to allow qemu_file_shutdown() to report an
> > Error* when anything wrong happened (e.g. shutdown() failed)?  While this
> > patch was trying to store a specific error string so when query migration
> > later it'll show up to the user.  If so, IMHO they're two things.
> >
> 
> Ok, just making sure.
> 
> >> 
> >> > This will make e.g. migrate-pause to display explicit error descriptions,
> >> > from:
> >> >
> >> > "error-desc": "Channel error: Input/output error"
> >> >
> >> > To:
> >> >
> >> > "error-desc": "Channel is explicitly shutdown by the user"
> >> >
> >> > in query-migrate.
> >> >
> >> > Signed-off-by: Peter Xu <peterx@redhat.com>
> >> > ---
> >> >  migration/qemu-file.c | 5 ++++-
> >> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> >> > index 419b4092e7..ff605027de 100644
> >> > --- a/migration/qemu-file.c
> >> > +++ b/migration/qemu-file.c
> >> > @@ -87,7 +87,10 @@ int qemu_file_shutdown(QEMUFile *f)
> >> >       *      --> guest crash!
> >> >       */
> >> >      if (!f->last_error) {
> >> > -        qemu_file_set_error(f, -EIO);
> >> > +        Error *err = NULL;
> >> > +
> >> > +        error_setg(&err, "Channel is explicitly shutdown by the user");
> >> 
> >> It is good that we can grep this message. However, I'm confused about
> >> who the "user" is meant to be here and how are they implicated in this
> >> error.
> >
> > Ah, here the user is who sends the "migrate-pause" command, according to
> > the example of the commit message.
> >
> 
> That's where I'm confused. There are 15 callsites for
> qemu_file_shutdown(). Only 2 of them are from migrate-pause. So I'm
> missing the logical step that links migrate-pause with this
> error_setg().
> Are you assuming that the race described will only happen
> with migrate-pause and the other invocations would have set an error
> already?

It's not a race, but I think you're right. I thought it was always the case
to shut but actually not: we do shutdown() also in a few places where we
don't really fail, either for COLO or for completion of migration.  With
the 1st patch, it'll even show in query-migrate.  Thanks for spotting it -
I could have done better.

Let's drop this patch.. sorry for the noise.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 7/7] migration: Provide explicit error message for file shutdowns
  2023-07-06 16:27         ` Peter Xu
@ 2023-07-06 17:33           ` Fabiano Rosas
  2023-07-06 18:08             ` Peter Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Fabiano Rosas @ 2023-07-06 17:33 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Leonardo Bras Soares Passos, Juan Quintela,
	Lukas Straub, Laszlo Ersek

Peter Xu <peterx@redhat.com> writes:

> On Thu, Jul 06, 2023 at 10:50:34AM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Wed, Jul 05, 2023 at 07:05:13PM -0300, Fabiano Rosas wrote:
>> >> Peter Xu <peterx@redhat.com> writes:
>> >> 
>> >> > Provide an explicit reason for qemu_file_shutdown()s, which can be
>> >> > displayed in query-migrate when used.
>> >> >
>> >> 
>> >> Can we consider this to cover the TODO:
>> >> 
>> >>  * TODO: convert to propagate Error objects instead of squashing
>> >>  * to a fixed errno value
>> >> 
>> >> or would that need something fancier?
>> >
>> > The TODO seems to say we want to allow qemu_file_shutdown() to report an
>> > Error* when anything wrong happened (e.g. shutdown() failed)?  While this
>> > patch was trying to store a specific error string so when query migration
>> > later it'll show up to the user.  If so, IMHO they're two things.
>> >
>> 
>> Ok, just making sure.
>> 
>> >> 
>> >> > This will make e.g. migrate-pause to display explicit error descriptions,
>> >> > from:
>> >> >
>> >> > "error-desc": "Channel error: Input/output error"
>> >> >
>> >> > To:
>> >> >
>> >> > "error-desc": "Channel is explicitly shutdown by the user"
>> >> >
>> >> > in query-migrate.
>> >> >
>> >> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> >> > ---
>> >> >  migration/qemu-file.c | 5 ++++-
>> >> >  1 file changed, 4 insertions(+), 1 deletion(-)
>> >> >
>> >> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>> >> > index 419b4092e7..ff605027de 100644
>> >> > --- a/migration/qemu-file.c
>> >> > +++ b/migration/qemu-file.c
>> >> > @@ -87,7 +87,10 @@ int qemu_file_shutdown(QEMUFile *f)
>> >> >       *      --> guest crash!
>> >> >       */
>> >> >      if (!f->last_error) {
>> >> > -        qemu_file_set_error(f, -EIO);
>> >> > +        Error *err = NULL;
>> >> > +
>> >> > +        error_setg(&err, "Channel is explicitly shutdown by the user");
>> >> 
>> >> It is good that we can grep this message. However, I'm confused about
>> >> who the "user" is meant to be here and how are they implicated in this
>> >> error.
>> >
>> > Ah, here the user is who sends the "migrate-pause" command, according to
>> > the example of the commit message.
>> >
>> 
>> That's where I'm confused. There are 15 callsites for
>> qemu_file_shutdown(). Only 2 of them are from migrate-pause. So I'm
>> missing the logical step that links migrate-pause with this
>> error_setg().
>> Are you assuming that the race described will only happen
>> with migrate-pause and the other invocations would have set an error
>> already?
>
> It's not a race, but I think you're right. I thought it was always the case

I'm talking about the race with another thread checking f->last_error
and this thread setting it. Described in commit f5816b5c86ed
("migration: Fix race on qemu_file_shutdown()").

> to shut but actually not: we do shutdown() also in a few places where we
> don't really fail, either for COLO or for completion of migration.  With
> the 1st patch, it'll even show in query-migrate.  Thanks for spotting it -
> I could have done better.
>

The idea is that we avoid doing IO after the file has been shutdown, so
we preload this -EIO error. We could just alter the message to "Channel
has been explicitly shutdown" or "Tried to do IO after channel
shutdown". It would still be better than the generic EIO message.

But up to you.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 7/7] migration: Provide explicit error message for file shutdowns
  2023-07-06 17:33           ` Fabiano Rosas
@ 2023-07-06 18:08             ` Peter Xu
  2023-07-06 18:47               ` Fabiano Rosas
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Xu @ 2023-07-06 18:08 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, Leonardo Bras Soares Passos, Juan Quintela,
	Lukas Straub, Laszlo Ersek

On Thu, Jul 06, 2023 at 02:33:42PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Thu, Jul 06, 2023 at 10:50:34AM -0300, Fabiano Rosas wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >> 
> >> > On Wed, Jul 05, 2023 at 07:05:13PM -0300, Fabiano Rosas wrote:
> >> >> Peter Xu <peterx@redhat.com> writes:
> >> >> 
> >> >> > Provide an explicit reason for qemu_file_shutdown()s, which can be
> >> >> > displayed in query-migrate when used.
> >> >> >
> >> >> 
> >> >> Can we consider this to cover the TODO:
> >> >> 
> >> >>  * TODO: convert to propagate Error objects instead of squashing
> >> >>  * to a fixed errno value
> >> >> 
> >> >> or would that need something fancier?
> >> >
> >> > The TODO seems to say we want to allow qemu_file_shutdown() to report an
> >> > Error* when anything wrong happened (e.g. shutdown() failed)?  While this
> >> > patch was trying to store a specific error string so when query migration
> >> > later it'll show up to the user.  If so, IMHO they're two things.
> >> >
> >> 
> >> Ok, just making sure.
> >> 
> >> >> 
> >> >> > This will make e.g. migrate-pause to display explicit error descriptions,
> >> >> > from:
> >> >> >
> >> >> > "error-desc": "Channel error: Input/output error"
> >> >> >
> >> >> > To:
> >> >> >
> >> >> > "error-desc": "Channel is explicitly shutdown by the user"
> >> >> >
> >> >> > in query-migrate.
> >> >> >
> >> >> > Signed-off-by: Peter Xu <peterx@redhat.com>
> >> >> > ---
> >> >> >  migration/qemu-file.c | 5 ++++-
> >> >> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >> >> >
> >> >> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> >> >> > index 419b4092e7..ff605027de 100644
> >> >> > --- a/migration/qemu-file.c
> >> >> > +++ b/migration/qemu-file.c
> >> >> > @@ -87,7 +87,10 @@ int qemu_file_shutdown(QEMUFile *f)
> >> >> >       *      --> guest crash!
> >> >> >       */
> >> >> >      if (!f->last_error) {
> >> >> > -        qemu_file_set_error(f, -EIO);
> >> >> > +        Error *err = NULL;
> >> >> > +
> >> >> > +        error_setg(&err, "Channel is explicitly shutdown by the user");
> >> >> 
> >> >> It is good that we can grep this message. However, I'm confused about
> >> >> who the "user" is meant to be here and how are they implicated in this
> >> >> error.
> >> >
> >> > Ah, here the user is who sends the "migrate-pause" command, according to
> >> > the example of the commit message.
> >> >
> >> 
> >> That's where I'm confused. There are 15 callsites for
> >> qemu_file_shutdown(). Only 2 of them are from migrate-pause. So I'm
> >> missing the logical step that links migrate-pause with this
> >> error_setg().
> >> Are you assuming that the race described will only happen
> >> with migrate-pause and the other invocations would have set an error
> >> already?
> >
> > It's not a race, but I think you're right. I thought it was always the case
> 
> I'm talking about the race with another thread checking f->last_error
> and this thread setting it. Described in commit f5816b5c86ed
> ("migration: Fix race on qemu_file_shutdown()").

I don't yet catch your point, sorry.  I thought f5816b5c86ed closed that
race.  What's still missing?

> 
> > to shut but actually not: we do shutdown() also in a few places where we
> > don't really fail, either for COLO or for completion of migration.  With
> > the 1st patch, it'll even show in query-migrate.  Thanks for spotting it -
> > I could have done better.
> >
> 
> The idea is that we avoid doing IO after the file has been shutdown, so
> we preload this -EIO error. We could just alter the message to "Channel
> has been explicitly shutdown" or "Tried to do IO after channel
> shutdown". It would still be better than the generic EIO message.

My point is I'm afraid (I thought after you pointed out, but maybe I just
misread what you said..) we'll call qemu_file_shutdown() even in normal
paths, so we can see an error poped up in query-migrate even if nothing
wrong happened. I think that's unwanted.

We can still improve that msg by only setting that specific error in e.g.
qmp_migrate_pause|cancel() or paths where we know we want to set the error,
but I'd rather drop the patch first so the rest patches can be reviewed and
merged first; that'll be a cosmetic change.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 7/7] migration: Provide explicit error message for file shutdowns
  2023-07-06 18:08             ` Peter Xu
@ 2023-07-06 18:47               ` Fabiano Rosas
  0 siblings, 0 replies; 23+ messages in thread
From: Fabiano Rosas @ 2023-07-06 18:47 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Leonardo Bras Soares Passos, Juan Quintela,
	Lukas Straub, Laszlo Ersek

Peter Xu <peterx@redhat.com> writes:

> On Thu, Jul 06, 2023 at 02:33:42PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Thu, Jul 06, 2023 at 10:50:34AM -0300, Fabiano Rosas wrote:
>> >> Peter Xu <peterx@redhat.com> writes:
>> >> 
>> >> > On Wed, Jul 05, 2023 at 07:05:13PM -0300, Fabiano Rosas wrote:
>> >> >> Peter Xu <peterx@redhat.com> writes:
>> >> >> 
>> >> >> > Provide an explicit reason for qemu_file_shutdown()s, which can be
>> >> >> > displayed in query-migrate when used.
>> >> >> >
>> >> >> 
>> >> >> Can we consider this to cover the TODO:
>> >> >> 
>> >> >>  * TODO: convert to propagate Error objects instead of squashing
>> >> >>  * to a fixed errno value
>> >> >> 
>> >> >> or would that need something fancier?
>> >> >
>> >> > The TODO seems to say we want to allow qemu_file_shutdown() to report an
>> >> > Error* when anything wrong happened (e.g. shutdown() failed)?  While this
>> >> > patch was trying to store a specific error string so when query migration
>> >> > later it'll show up to the user.  If so, IMHO they're two things.
>> >> >
>> >> 
>> >> Ok, just making sure.
>> >> 
>> >> >> 
>> >> >> > This will make e.g. migrate-pause to display explicit error descriptions,
>> >> >> > from:
>> >> >> >
>> >> >> > "error-desc": "Channel error: Input/output error"
>> >> >> >
>> >> >> > To:
>> >> >> >
>> >> >> > "error-desc": "Channel is explicitly shutdown by the user"
>> >> >> >
>> >> >> > in query-migrate.
>> >> >> >
>> >> >> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> >> >> > ---
>> >> >> >  migration/qemu-file.c | 5 ++++-
>> >> >> >  1 file changed, 4 insertions(+), 1 deletion(-)
>> >> >> >
>> >> >> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>> >> >> > index 419b4092e7..ff605027de 100644
>> >> >> > --- a/migration/qemu-file.c
>> >> >> > +++ b/migration/qemu-file.c
>> >> >> > @@ -87,7 +87,10 @@ int qemu_file_shutdown(QEMUFile *f)
>> >> >> >       *      --> guest crash!
>> >> >> >       */
>> >> >> >      if (!f->last_error) {
>> >> >> > -        qemu_file_set_error(f, -EIO);
>> >> >> > +        Error *err = NULL;
>> >> >> > +
>> >> >> > +        error_setg(&err, "Channel is explicitly shutdown by the user");
>> >> >> 
>> >> >> It is good that we can grep this message. However, I'm confused about
>> >> >> who the "user" is meant to be here and how are they implicated in this
>> >> >> error.
>> >> >
>> >> > Ah, here the user is who sends the "migrate-pause" command, according to
>> >> > the example of the commit message.
>> >> >
>> >> 
>> >> That's where I'm confused. There are 15 callsites for
>> >> qemu_file_shutdown(). Only 2 of them are from migrate-pause. So I'm
>> >> missing the logical step that links migrate-pause with this
>> >> error_setg().
>> >> Are you assuming that the race described will only happen
>> >> with migrate-pause and the other invocations would have set an error
>> >> already?
>> >
>> > It's not a race, but I think you're right. I thought it was always the case
>> 
>> I'm talking about the race with another thread checking f->last_error
>> and this thread setting it. Described in commit f5816b5c86ed
>> ("migration: Fix race on qemu_file_shutdown()").
>
> I don't yet catch your point, sorry.  I thought f5816b5c86ed closed that
> race.  What's still missing?
>

I was initially trying to ask if your previous knowledge about the
situation that caused the race could allow you to infer that the error
message would only be relevant in the migrate-pause scenario. But I now
understand that is not the case.

>> 
>> > to shut but actually not: we do shutdown() also in a few places where we
>> > don't really fail, either for COLO or for completion of migration.  With
>> > the 1st patch, it'll even show in query-migrate.  Thanks for spotting it -
>> > I could have done better.
>> >
>> 
>> The idea is that we avoid doing IO after the file has been shutdown, so
>> we preload this -EIO error. We could just alter the message to "Channel
>> has been explicitly shutdown" or "Tried to do IO after channel
>> shutdown". It would still be better than the generic EIO message.
>
> My point is I'm afraid (I thought after you pointed out, but maybe I just
> misread what you said..) we'll call qemu_file_shutdown() even in normal
> paths, so we can see an error poped up in query-migrate even if nothing
> wrong happened. I think that's unwanted.
>

I see. My point was that the error message wouldn't always match the
situation in which qemu_file_shutdown() was called. The fact that we
might not even want the error message at all had not crossed my mind.

> We can still improve that msg by only setting that specific error in e.g.
> qmp_migrate_pause|cancel() or paths where we know we want to set the error,
> but I'd rather drop the patch first so the rest patches can be reviewed and
> merged first; that'll be a cosmetic change.

Ok, I agree. Thanks for the clarification.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/7] migration: Better error handling in return path thread
  2023-07-05 16:34 [PATCH v2 0/7] migration: Better error handling in return path thread Peter Xu
                   ` (6 preceding siblings ...)
  2023-07-05 16:35 ` [PATCH v2 7/7] migration: Provide explicit error message for file shutdowns Peter Xu
@ 2023-07-24 14:11 ` Fabiano Rosas
  2023-07-25 18:24   ` Fabiano Rosas
  7 siblings, 1 reply; 23+ messages in thread
From: Fabiano Rosas @ 2023-07-24 14:11 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Leonardo Bras Soares Passos, Juan Quintela, Lukas Straub,
	Laszlo Ersek, peterx

Peter Xu <peterx@redhat.com> writes:

> v2:
> - Patch "migration: Provide explicit error message for file shutdowns"
>   - Touched up qapi doc [Fabiano]
>   - Added Bugzilla link to commit which I didn't even notice that I was
>     fixing a bug.. but rightfully pointed out by Laszlo.
>   - Moved it to the 1st patch because it fixes a bug, please consider
>     review and merge it earlier.
>
> This is a small series that reworks error handling of postcopy return path
> threads.
>
> We used to contain a bunch of error_report(), converting them into
> error_setg() properly and deliver any of those errors to migration generic
> error reports (via migrate_set_error()).  Then these errors can also be
> observed in query-migrate after postcopy is paused.
>
> Dropped the return-path specific error reporting: mark_source_rp_bad(),
> because it's a duplication if we can always use migrate_set_error().
>
> Please have a look, thanks.
>
> Peter Xu (7):
>   migration: Display error in query-migrate irrelevant of status
>   migration: Let migrate_set_error() take ownership
>   migration: Introduce migrate_has_error()
>   migration: Refactor error handling in source return path
>   migration: Deliver return path file error to migrate state too
>   qemufile: Always return a verbose error
>   migration: Provide explicit error message for file shutdowns
>
>  qapi/migration.json      |   5 +-
>  migration/migration.h    |   8 +-
>  migration/ram.h          |   5 +-
>  migration/channel.c      |   1 -
>  migration/migration.c    | 168 +++++++++++++++++++++++----------------
>  migration/multifd.c      |  10 +--
>  migration/postcopy-ram.c |   1 -
>  migration/qemu-file.c    |  20 ++++-
>  migration/ram.c          |  42 +++++-----
>  migration/trace-events   |   2 +-
>  10 files changed, 149 insertions(+), 113 deletions(-)

Hi Peter,

Were you aiming at solving any specific bug with this series? I'm seeing
a bug on master (361d5397355) with the
/x86_64/migration/postcopy/preempt/recovery/plain test around the areas
that this series touches.

It happens very rarely and I'm still investigating, but in case you have
any thoughts:

====
It seems there's a race condition between postcopy resume and the return
path cleanup.

It is possible for open_return_path_on_source() to setup the new
QEMUFile *before* the cleanup path at source_return_path_thread() has
had a chance to run, so we end up calling migration_release_dst_files()
on the new file and ms->rp_state.from_dst_file gets set to NULL again,
leading to a SIGSEGV at qemu_file_get_error(rp) due to rp being NULL.

Here's a trace from when it works (both src and dst):

open_return_path_on_source
open_return_path_on_source_continue
postcopy_pause_incoming
postcopy_pause_fast_load
qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
postcopy_pause_fault_thread
postcopy_pause_return_path   <---
qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
open_return_path_on_source   <--- OK
postcopy_pause_continued     <---
postcopy_pause_return_path_continued
postcopy_pause_incoming_continued
postcopy_pause_fault_thread_continued
postcopy_pause_fast_load_continued

versus when it fails:

open_return_path_on_source
open_return_path_on_source_continue
postcopy_pause_incoming
postcopy_pause_fast_load
qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
postcopy_pause_fault_thread
qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
postcopy_pause_incoming_continued
open_return_path_on_source   <--- NOK
postcopy_pause_continued
postcopy_pause_return_path   <---
postcopy_pause_return_path_continued <---
postcopy_pause_incoming
qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
postcopy_pause_incoming_continued


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/7] migration: Better error handling in return path thread
  2023-07-24 14:11 ` [PATCH v2 0/7] migration: Better error handling in return path thread Fabiano Rosas
@ 2023-07-25 18:24   ` Fabiano Rosas
  2023-07-26 16:36     ` Peter Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Fabiano Rosas @ 2023-07-25 18:24 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Leonardo Bras Soares Passos, Juan Quintela, Lukas Straub,
	Laszlo Ersek, peterx

Fabiano Rosas <farosas@suse.de> writes:

> Peter Xu <peterx@redhat.com> writes:
>
>> v2:
>> - Patch "migration: Provide explicit error message for file shutdowns"
>>   - Touched up qapi doc [Fabiano]
>>   - Added Bugzilla link to commit which I didn't even notice that I was
>>     fixing a bug.. but rightfully pointed out by Laszlo.
>>   - Moved it to the 1st patch because it fixes a bug, please consider
>>     review and merge it earlier.
>>
>> This is a small series that reworks error handling of postcopy return path
>> threads.
>>
>> We used to contain a bunch of error_report(), converting them into
>> error_setg() properly and deliver any of those errors to migration generic
>> error reports (via migrate_set_error()).  Then these errors can also be
>> observed in query-migrate after postcopy is paused.
>>
>> Dropped the return-path specific error reporting: mark_source_rp_bad(),
>> because it's a duplication if we can always use migrate_set_error().
>>
>> Please have a look, thanks.
>>
>> Peter Xu (7):
>>   migration: Display error in query-migrate irrelevant of status
>>   migration: Let migrate_set_error() take ownership
>>   migration: Introduce migrate_has_error()
>>   migration: Refactor error handling in source return path
>>   migration: Deliver return path file error to migrate state too
>>   qemufile: Always return a verbose error
>>   migration: Provide explicit error message for file shutdowns
>>
>>  qapi/migration.json      |   5 +-
>>  migration/migration.h    |   8 +-
>>  migration/ram.h          |   5 +-
>>  migration/channel.c      |   1 -
>>  migration/migration.c    | 168 +++++++++++++++++++++++----------------
>>  migration/multifd.c      |  10 +--
>>  migration/postcopy-ram.c |   1 -
>>  migration/qemu-file.c    |  20 ++++-
>>  migration/ram.c          |  42 +++++-----
>>  migration/trace-events   |   2 +-
>>  10 files changed, 149 insertions(+), 113 deletions(-)
>
> Hi Peter,
>
> Were you aiming at solving any specific bug with this series? I'm seeing
> a bug on master (361d5397355) with the
> /x86_64/migration/postcopy/preempt/recovery/plain test around the areas
> that this series touches.
>
> It happens very rarely and I'm still investigating, but in case you have
> any thoughts:
>
> ====
> It seems there's a race condition between postcopy resume and the return
> path cleanup.
>
> It is possible for open_return_path_on_source() to setup the new
> QEMUFile *before* the cleanup path at source_return_path_thread() has
> had a chance to run, so we end up calling migration_release_dst_files()
> on the new file and ms->rp_state.from_dst_file gets set to NULL again,
> leading to a SIGSEGV at qemu_file_get_error(rp) due to rp being NULL.

I did some more digging and this is indeed what happens. When we pause
on the incoming side, the to_src_file is closed and the source return
path sees an error (EBADFD) which leads to the cleanup (from_dst_file =
NULL). This happens independently and without any synchronization with a
potential concurrent resume operation.

Is there a reason for not closing the return path thread and starting a
new one for resume? The from_dst_file is the only thing being changed
anyway. It would allow us to remove the retry logic along with the
problematic cleanup path and not need another synchronization point
between qmp_migrate() and the return path.

Here's the race (important bit is open_return_path happening before
migration_release_dst_files):

migration                 | qmp                         | return path
--------------------------+-----------------------------+---------------------------------
                            qmp_migrate_pause()
                             shutdown(ms->to_dst_file)
                              f->last_error = -EIO
migrate_detect_error()
 postcopy_pause()
  set_state(PAUSED)
  wait(postcopy_pause_sem)
                            qmp_migrate(resume)
                            migrate_fd_connect()
                             resume = state == PAUSED
                             open_return_path <-- TOO SOON!
                             set_state(RECOVER)
                             post(postcopy_pause_sem)
                                                        (incoming closes to_src_file)
                                                        res = qemu_file_get_error(rp)
                                                        migration_release_dst_files()
                                                        ms->rp_state.from_dst_file = NULL
  post(postcopy_pause_rp_sem)
                                                        postcopy_pause_return_path_thread()
                                                          wait(postcopy_pause_rp_sem)
                                                        rp = ms->rp_state.from_dst_file
                                                        goto retry
                                                        qemu_file_get_error(rp)
                                                        SIGSEGV
-------------------------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/7] migration: Better error handling in return path thread
  2023-07-25 18:24   ` Fabiano Rosas
@ 2023-07-26 16:36     ` Peter Xu
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Xu @ 2023-07-26 16:36 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, Leonardo Bras Soares Passos, Juan Quintela,
	Lukas Straub, Laszlo Ersek

Hi, Fabiano,

Sorry to be late on responding.

On Tue, Jul 25, 2023 at 03:24:26PM -0300, Fabiano Rosas wrote:
> Fabiano Rosas <farosas@suse.de> writes:
> 
> > Peter Xu <peterx@redhat.com> writes:
> >
> >> v2:
> >> - Patch "migration: Provide explicit error message for file shutdowns"
> >>   - Touched up qapi doc [Fabiano]
> >>   - Added Bugzilla link to commit which I didn't even notice that I was
> >>     fixing a bug.. but rightfully pointed out by Laszlo.
> >>   - Moved it to the 1st patch because it fixes a bug, please consider
> >>     review and merge it earlier.
> >>
> >> This is a small series that reworks error handling of postcopy return path
> >> threads.
> >>
> >> We used to contain a bunch of error_report(), converting them into
> >> error_setg() properly and deliver any of those errors to migration generic
> >> error reports (via migrate_set_error()).  Then these errors can also be
> >> observed in query-migrate after postcopy is paused.
> >>
> >> Dropped the return-path specific error reporting: mark_source_rp_bad(),
> >> because it's a duplication if we can always use migrate_set_error().
> >>
> >> Please have a look, thanks.
> >>
> >> Peter Xu (7):
> >>   migration: Display error in query-migrate irrelevant of status
> >>   migration: Let migrate_set_error() take ownership
> >>   migration: Introduce migrate_has_error()
> >>   migration: Refactor error handling in source return path
> >>   migration: Deliver return path file error to migrate state too
> >>   qemufile: Always return a verbose error
> >>   migration: Provide explicit error message for file shutdowns
> >>
> >>  qapi/migration.json      |   5 +-
> >>  migration/migration.h    |   8 +-
> >>  migration/ram.h          |   5 +-
> >>  migration/channel.c      |   1 -
> >>  migration/migration.c    | 168 +++++++++++++++++++++++----------------
> >>  migration/multifd.c      |  10 +--
> >>  migration/postcopy-ram.c |   1 -
> >>  migration/qemu-file.c    |  20 ++++-
> >>  migration/ram.c          |  42 +++++-----
> >>  migration/trace-events   |   2 +-
> >>  10 files changed, 149 insertions(+), 113 deletions(-)
> >
> > Hi Peter,
> >
> > Were you aiming at solving any specific bug with this series?

Nop.  I simply noticed that the error in return path cannot be proxied to
migration object and thought maybe we should do that.

Thanks for looing into this problem.

> > I'm seeing
> > a bug on master (361d5397355) with the
> > /x86_64/migration/postcopy/preempt/recovery/plain test around the areas
> > that this series touches.
> >
> > It happens very rarely and I'm still investigating, but in case you have
> > any thoughts:
> >
> > ====
> > It seems there's a race condition between postcopy resume and the return
> > path cleanup.
> >
> > It is possible for open_return_path_on_source() to setup the new
> > QEMUFile *before* the cleanup path at source_return_path_thread() has
> > had a chance to run, so we end up calling migration_release_dst_files()
> > on the new file and ms->rp_state.from_dst_file gets set to NULL again,
> > leading to a SIGSEGV at qemu_file_get_error(rp) due to rp being NULL.
> 
> I did some more digging and this is indeed what happens. When we pause
> on the incoming side, the to_src_file is closed and the source return
> path sees an error (EBADFD) which leads to the cleanup (from_dst_file =
> NULL). This happens independently and without any synchronization with a
> potential concurrent resume operation.
> 
> Is there a reason for not closing the return path thread and starting a
> new one for resume?

Not anything special.  When I initially worked on that (quite a few years
ago) I thought it would be the simplest we keep hold as much things as
possible, including the threads.  But maybe it's not too hard either to
just reinitiate the thread when resume indeed.

> The from_dst_file is the only thing being changed
> anyway. It would allow us to remove the retry logic along with the
> problematic cleanup path and not need another synchronization point
> between qmp_migrate() and the return path.

I'm fine if you want to remove the return path thread when a pause happens,
as long as we can cleanly do that; if you already drafted something and
looks all good from your side, happy to read it.  We may somewhere need
another call to await_return_path_close_on_source() when pause from
migration thread.

The other way is the main thread can stupidly wait for the two files to be
released, namely, from_dst_file and postcopy_qemufile_src.

> 
> Here's the race (important bit is open_return_path happening before
> migration_release_dst_files):
> 
> migration                 | qmp                         | return path
> --------------------------+-----------------------------+---------------------------------
>                             qmp_migrate_pause()
>                              shutdown(ms->to_dst_file)
>                               f->last_error = -EIO
> migrate_detect_error()
>  postcopy_pause()
>   set_state(PAUSED)
>   wait(postcopy_pause_sem)
>                             qmp_migrate(resume)
>                             migrate_fd_connect()
>                              resume = state == PAUSED
>                              open_return_path <-- TOO SOON!
>                              set_state(RECOVER)
>                              post(postcopy_pause_sem)
>                                                         (incoming closes to_src_file)
>                                                         res = qemu_file_get_error(rp)
>                                                         migration_release_dst_files()
>                                                         ms->rp_state.from_dst_file = NULL
>   post(postcopy_pause_rp_sem)
>                                                         postcopy_pause_return_path_thread()
>                                                           wait(postcopy_pause_rp_sem)
>                                                         rp = ms->rp_state.from_dst_file
>                                                         goto retry
>                                                         qemu_file_get_error(rp)
>                                                         SIGSEGV
> -------------------------------------------------------------------------------------------
> 

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2023-07-26 16:39 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-05 16:34 [PATCH v2 0/7] migration: Better error handling in return path thread Peter Xu
2023-07-05 16:34 ` [PATCH v2 1/7] migration: Display error in query-migrate irrelevant of status Peter Xu
2023-07-05 16:34 ` [PATCH v2 2/7] migration: Let migrate_set_error() take ownership Peter Xu
2023-07-05 20:53   ` Fabiano Rosas
2023-07-05 16:34 ` [PATCH v2 3/7] migration: Introduce migrate_has_error() Peter Xu
2023-07-05 20:55   ` Fabiano Rosas
2023-07-05 16:34 ` [PATCH v2 4/7] migration: Refactor error handling in source return path Peter Xu
2023-07-05 16:35 ` [PATCH v2 5/7] migration: Deliver return path file error to migrate state too Peter Xu
2023-07-05 16:35 ` [PATCH v2 6/7] qemufile: Always return a verbose error Peter Xu
2023-07-05 21:54   ` Fabiano Rosas
2023-07-05 22:24     ` Peter Xu
2023-07-06 13:56       ` Fabiano Rosas
2023-07-05 16:35 ` [PATCH v2 7/7] migration: Provide explicit error message for file shutdowns Peter Xu
2023-07-05 22:05   ` Fabiano Rosas
2023-07-05 22:34     ` Peter Xu
2023-07-06 13:50       ` Fabiano Rosas
2023-07-06 16:27         ` Peter Xu
2023-07-06 17:33           ` Fabiano Rosas
2023-07-06 18:08             ` Peter Xu
2023-07-06 18:47               ` Fabiano Rosas
2023-07-24 14:11 ` [PATCH v2 0/7] migration: Better error handling in return path thread Fabiano Rosas
2023-07-25 18:24   ` Fabiano Rosas
2023-07-26 16:36     ` Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).