qemu-devel.nongnu.org archive mirror
* [PATCH v5 0/5] migration: do not exit on incoming failure
@ 2024-04-29 19:14 Vladimir Sementsov-Ogievskiy
  2024-04-29 19:14 ` [PATCH v5 1/5] migration: move trace-point from migrate_fd_error to migrate_set_error Vladimir Sementsov-Ogievskiy
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 UTC (permalink / raw)
  To: peterx, farosas; +Cc: eblake, armbru, pbonzini, qemu-devel, vsementsov, yc-core

Hi all!

This series adds an option not to exit immediately on incoming
migration failure, giving the orchestrator a chance to get the error
through QAPI and shut down QEMU with "quit".

v5:
- add "migration: process_incoming_migration_co(): fix reporting s->error"

v4:
- add r-b and a-b by Fabiano and Markus
- improve wording in 04 as Markus suggested

v3:
- don't refactor the whole code around setting migration error, it seems
  too much and unnecessary for the new feature itself
- add constant
- change behavior for HMP command
- split some things to separate patches
- and more, by Peter's suggestions


New behavior can be demonstrated like this:

bash:

(
cat <<EOF
{'execute': 'qmp_capabilities'}
{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [{'capability': 'events', 'state': true}]}}
{'execute': 'migrate-incoming', 'arguments': {'uri': 'exec:echo x', 'exit-on-error': false}}
EOF
sleep 1
cat <<EOF
{'execute': 'query-migrate'}
{'execute': 'quit'}
EOF
) | ./build/qemu-system-x86_64 -incoming 'defer' -qmp stdio -nographic -nodefaults

output:

{"QMP": {"version": {"qemu": {"micro": 50, "minor": 0, "major": 9}, "package": "v9.0.0-149-gb6295ad58c"}, "capabilities": ["oob"]}}
{"return": {}}
{"return": {}}
{"timestamp": {"seconds": 1714068847, "microseconds": 263907}, "event": "MIGRATION", "data": {"status": "setup"}}
{"return": {}}
{"timestamp": {"seconds": 1714068847, "microseconds": 266696}, "event": "MIGRATION", "data": {"status": "active"}}
qemu-system-x86_64: Not a migration stream
{"timestamp": {"seconds": 1714068847, "microseconds": 266766}, "event": "MIGRATION", "data": {"status": "failed"}}
{"return": {"status": "failed", "error-desc": "load of migration failed: Invalid argument"}}
{"timestamp": {"seconds": 1714068848, "microseconds": 237187}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}
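
The choice this series introduces can be modelled in a few lines of
standalone C. This is only a sketch under stated assumptions: every
demo_* name is hypothetical and none of it is QEMU code; only the
"has_x ? x : default" shape follows what patch 5/5 does for the
optional exit-on-error argument.

```c
/* Standalone model of the exit-on-error decision; not QEMU code. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the default used when the argument is absent. */
#define DEMO_DEFAULT_EXIT_ON_ERROR true

typedef enum {
    DEMO_ACTION_EXIT,          /* report the error and terminate the process */
    DEMO_ACTION_KEEP_RUNNING,  /* store the error so it can be queried later */
} DemoAction;

/* An optional boolean arrives as a (present, value) pair. */
static bool demo_resolve_exit_on_error(bool has_exit_on_error,
                                       bool exit_on_error)
{
    return has_exit_on_error ? exit_on_error : DEMO_DEFAULT_EXIT_ON_ERROR;
}

/* What the failure path does once the flag has been resolved. */
static DemoAction demo_on_incoming_failure(bool exit_on_error)
{
    return exit_on_error ? DEMO_ACTION_EXIT : DEMO_ACTION_KEEP_RUNNING;
}
```

With the argument omitted the model exits, as before; only an explicit
'exit-on-error': false keeps the process alive for a later query.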

Vladimir Sementsov-Ogievskiy (5):
  migration: move trace-point from migrate_fd_error to migrate_set_error
  migration: process_incoming_migration_co(): complete cleanup on
    failure
  migration: process_incoming_migration_co(): fix reporting s->error
  migration: process_incoming_migration_co(): rework error reporting
  qapi: introduce exit-on-error parameter for migrate-incoming

 migration/migration-hmp-cmds.c |  2 +-
 migration/migration.c          | 76 +++++++++++++++++++++++-----------
 migration/migration.h          |  3 ++
 migration/trace-events         |  2 +-
 qapi/migration.json            |  7 +++-
 system/vl.c                    |  3 +-
 6 files changed, 64 insertions(+), 29 deletions(-)

-- 
2.34.1




* [PATCH v5 1/5] migration: move trace-point from migrate_fd_error to migrate_set_error
  2024-04-29 19:14 [PATCH v5 0/5] migration: do not exit on incoming failure Vladimir Sementsov-Ogievskiy
@ 2024-04-29 19:14 ` Vladimir Sementsov-Ogievskiy
  2024-04-29 19:14 ` [PATCH v5 2/5] migration: process_incoming_migration_co(): complete cleanup on failure Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 UTC (permalink / raw)
  To: peterx, farosas; +Cc: eblake, armbru, pbonzini, qemu-devel, vsementsov, yc-core

Cover more cases by trace-point.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c  | 4 +++-
 migration/trace-events | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index b5af6b5105..2dc6a063e9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1421,6 +1421,9 @@ static void migrate_fd_cleanup_bh(void *opaque)
 void migrate_set_error(MigrationState *s, const Error *error)
 {
     QEMU_LOCK_GUARD(&s->error_mutex);
+
+    trace_migrate_error(error_get_pretty(error));
+
     if (!s->error) {
         s->error = error_copy(error);
     }
@@ -1444,7 +1447,6 @@ static void migrate_error_free(MigrationState *s)
 
 static void migrate_fd_error(MigrationState *s, const Error *error)
 {
-    trace_migrate_fd_error(error_get_pretty(error));
     assert(s->to_dst_file == NULL);
     migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
                       MIGRATION_STATUS_FAILED);
diff --git a/migration/trace-events b/migration/trace-events
index f0e1cb80c7..d0c44c3853 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -152,7 +152,7 @@ multifd_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostnam
 # migration.c
 migrate_set_state(const char *new_state) "new state %s"
 migrate_fd_cleanup(void) ""
-migrate_fd_error(const char *error_desc) "error=%s"
+migrate_error(const char *error_desc) "error=%s"
 migrate_fd_cancel(void) ""
 migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx"
 migrate_pending_exact(uint64_t size, uint64_t pre, uint64_t post) "exact pending size %" PRIu64 " (pre = %" PRIu64 " post=%" PRIu64 ")"
-- 
2.34.1




* [PATCH v5 2/5] migration: process_incoming_migration_co(): complete cleanup on failure
  2024-04-29 19:14 [PATCH v5 0/5] migration: do not exit on incoming failure Vladimir Sementsov-Ogievskiy
  2024-04-29 19:14 ` [PATCH v5 1/5] migration: move trace-point from migrate_fd_error to migrate_set_error Vladimir Sementsov-Ogievskiy
@ 2024-04-29 19:14 ` Vladimir Sementsov-Ogievskiy
  2024-04-29 19:22   ` Peter Xu
  2024-04-29 19:14 ` [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 UTC (permalink / raw)
  To: peterx, farosas; +Cc: eblake, armbru, pbonzini, qemu-devel, vsementsov, yc-core

Call migration_incoming_state_destroy() instead of doing only part of
its cleanup.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 2dc6a063e9..0d26db47f7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -799,10 +799,7 @@ process_incoming_migration_co(void *opaque)
 fail:
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_FAILED);
-    qemu_fclose(mis->from_src_file);
-
-    multifd_recv_cleanup();
-    compress_threads_load_cleanup();
+    migration_incoming_state_destroy();
 
     exit(EXIT_FAILURE);
 }
-- 
2.34.1




* [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error
  2024-04-29 19:14 [PATCH v5 0/5] migration: do not exit on incoming failure Vladimir Sementsov-Ogievskiy
  2024-04-29 19:14 ` [PATCH v5 1/5] migration: move trace-point from migrate_fd_error to migrate_set_error Vladimir Sementsov-Ogievskiy
  2024-04-29 19:14 ` [PATCH v5 2/5] migration: process_incoming_migration_co(): complete cleanup on failure Vladimir Sementsov-Ogievskiy
@ 2024-04-29 19:14 ` Vladimir Sementsov-Ogievskiy
  2024-04-29 19:32   ` Peter Xu
  2024-04-29 19:14 ` [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting Vladimir Sementsov-Ogievskiy
  2024-04-29 19:14 ` [PATCH v5 5/5] qapi: introduce exit-on-error parameter for migrate-incoming Vladimir Sementsov-Ogievskiy
  4 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 UTC (permalink / raw)
  To: peterx, farosas; +Cc: eblake, armbru, pbonzini, qemu-devel, vsementsov, yc-core

It's a bad idea to leave the critical section with the error object
freed but s->error still set: this may theoretically lead to a
use-after-free crash. Let's avoid it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
 migration/migration.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 0d26db47f7..58fd5819bc 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque)
     migration_incoming_state_destroy();
 }
 
+static void migrate_error_free(MigrationState *s)
+{
+    QEMU_LOCK_GUARD(&s->error_mutex);
+    if (s->error) {
+        error_free(s->error);
+        s->error = NULL;
+    }
+}
+
 static void coroutine_fn
 process_incoming_migration_co(void *opaque)
 {
+    MigrationState *s = migrate_get_current();
     MigrationIncomingState *mis = migration_incoming_get_current();
     PostcopyState ps;
     int ret;
@@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque)
     }
 
     if (ret < 0) {
-        MigrationState *s = migrate_get_current();
-
         if (migrate_has_error(s)) {
             WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
-                error_report_err(s->error);
+                error_report_err(error_copy(s->error));
             }
         }
         error_report("load of migration failed: %s", strerror(-ret));
@@ -801,6 +809,7 @@ fail:
                       MIGRATION_STATUS_FAILED);
     migration_incoming_state_destroy();
 
+    migrate_error_free(s);
     exit(EXIT_FAILURE);
 }
 
@@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s)
     return qatomic_read(&s->error);
 }
 
-static void migrate_error_free(MigrationState *s)
-{
-    QEMU_LOCK_GUARD(&s->error_mutex);
-    if (s->error) {
-        error_free(s->error);
-        s->error = NULL;
-    }
-}
-
 static void migrate_fd_error(MigrationState *s, const Error *error)
 {
     assert(s->to_dst_file == NULL);
-- 
2.34.1




* [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting
  2024-04-29 19:14 [PATCH v5 0/5] migration: do not exit on incoming failure Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2024-04-29 19:14 ` [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error Vladimir Sementsov-Ogievskiy
@ 2024-04-29 19:14 ` Vladimir Sementsov-Ogievskiy
  2024-04-29 19:39   ` Peter Xu
  2024-04-29 19:14 ` [PATCH v5 5/5] qapi: introduce exit-on-error parameter for migrate-incoming Vladimir Sementsov-Ogievskiy
  4 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 UTC (permalink / raw)
  To: peterx, farosas; +Cc: eblake, armbru, pbonzini, qemu-devel, vsementsov, yc-core

Unify error reporting in the function. This simplifies the following
commit, which will add a not-exit-on-error behavior variant to the
function.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 58fd5819bc..5489ff96df 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -748,11 +748,12 @@ process_incoming_migration_co(void *opaque)
     MigrationIncomingState *mis = migration_incoming_get_current();
     PostcopyState ps;
     int ret;
+    Error *local_err = NULL;
 
     assert(mis->from_src_file);
 
     if (compress_threads_load_setup(mis->from_src_file)) {
-        error_report("Failed to setup decompress threads");
+        error_setg(&local_err, "Failed to setup decompress threads");
         goto fail;
     }
 
@@ -789,16 +790,12 @@ process_incoming_migration_co(void *opaque)
     }
 
     if (ret < 0) {
-        if (migrate_has_error(s)) {
-            WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
-                error_report_err(error_copy(s->error));
-            }
-        }
-        error_report("load of migration failed: %s", strerror(-ret));
+        error_setg(&local_err, "load of migration failed: %s", strerror(-ret));
         goto fail;
     }
 
     if (colo_incoming_co() < 0) {
+        error_setg(&local_err, "colo incoming failed");
         goto fail;
     }
 
@@ -809,6 +806,12 @@ fail:
                       MIGRATION_STATUS_FAILED);
     migration_incoming_state_destroy();
 
+    if (migrate_has_error(s)) {
+        WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
+            error_report_err(error_copy(s->error));
+        }
+    }
+    error_report_err(local_err);
     migrate_error_free(s);
     exit(EXIT_FAILURE);
 }
-- 
2.34.1




* [PATCH v5 5/5] qapi: introduce exit-on-error parameter for migrate-incoming
  2024-04-29 19:14 [PATCH v5 0/5] migration: do not exit on incoming failure Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2024-04-29 19:14 ` [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting Vladimir Sementsov-Ogievskiy
@ 2024-04-29 19:14 ` Vladimir Sementsov-Ogievskiy
  2024-04-29 19:41   ` Peter Xu
  4 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-29 19:14 UTC (permalink / raw)
  To: peterx, farosas; +Cc: eblake, armbru, pbonzini, qemu-devel, vsementsov, yc-core

Currently we set the MIGRATION_STATUS_FAILED state, but don't give the
orchestrator a chance to query the migration state and get the error.

Let's make it possible for QMP-based orchestrators to get the error,
as with outgoing migration.

For hmp_migrate_incoming(), let's enable the new behavior: HMP is not
an ABI, it's mostly intended for use by developers, and it makes sense
not to stop the process.

For x-exit-preconfig, let's keep the old behavior:
 - it's called from init(), so here we want to keep current behavior by
   default
 - it does exit on error by itself as well
So, if we want to change the behavior of x-exit-preconfig, it should be
another patch.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Acked-by: Markus Armbruster <armbru@redhat.com>
---
 migration/migration-hmp-cmds.c |  2 +-
 migration/migration.c          | 38 +++++++++++++++++++++++++++-------
 migration/migration.h          |  3 +++
 qapi/migration.json            |  7 ++++++-
 system/vl.c                    |  3 ++-
 5 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 7e96ae6ffd..23181bbee1 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -466,7 +466,7 @@ void hmp_migrate_incoming(Monitor *mon, const QDict *qdict)
     }
     QAPI_LIST_PREPEND(caps, g_steal_pointer(&channel));
 
-    qmp_migrate_incoming(NULL, true, caps, &err);
+    qmp_migrate_incoming(NULL, true, caps, true, false, &err);
     qapi_free_MigrationChannelList(caps);
 
 end:
diff --git a/migration/migration.c b/migration/migration.c
index 5489ff96df..09f762c99e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -72,6 +72,8 @@
 #define NOTIFIER_ELEM_INIT(array, elem)    \
     [elem] = NOTIFIER_WITH_RETURN_LIST_INITIALIZER((array)[elem])
 
+#define INMIGRATE_DEFAULT_EXIT_ON_ERROR true
+
 static NotifierWithReturnList migration_state_notifiers[] = {
     NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_NORMAL),
     NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_REBOOT),
@@ -234,6 +236,8 @@ void migration_object_init(void)
     qemu_cond_init(&current_incoming->page_request_cond);
     current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
 
+    current_incoming->exit_on_error = INMIGRATE_DEFAULT_EXIT_ON_ERROR;
+
     migration_object_check(current_migration, &error_fatal);
 
     blk_mig_init();
@@ -806,14 +810,19 @@ fail:
                       MIGRATION_STATUS_FAILED);
     migration_incoming_state_destroy();
 
-    if (migrate_has_error(s)) {
-        WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
-            error_report_err(error_copy(s->error));
+    if (mis->exit_on_error) {
+        if (migrate_has_error(s)) {
+            WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
+                error_report_err(error_copy(s->error));
+            }
         }
+        error_report_err(local_err);
+        migrate_error_free(s);
+        exit(EXIT_FAILURE);
+    } else {
+        migrate_set_error(s, local_err);
+        error_free(local_err);
     }
-    error_report_err(local_err);
-    migrate_error_free(s);
-    exit(EXIT_FAILURE);
 }
 
 /**
@@ -1322,6 +1331,15 @@ static void fill_destination_migration_info(MigrationInfo *info)
         break;
     }
     info->status = mis->state;
+
+    if (!info->error_desc) {
+        MigrationState *s = migrate_get_current();
+        QEMU_LOCK_GUARD(&s->error_mutex);
+
+        if (s->error) {
+            info->error_desc = g_strdup(error_get_pretty(s->error));
+        }
+    }
 }
 
 MigrationInfo *qmp_query_migrate(Error **errp)
@@ -1796,10 +1814,13 @@ void migrate_del_blocker(Error **reasonp)
 }
 
 void qmp_migrate_incoming(const char *uri, bool has_channels,
-                          MigrationChannelList *channels, Error **errp)
+                          MigrationChannelList *channels,
+                          bool has_exit_on_error, bool exit_on_error,
+                          Error **errp)
 {
     Error *local_err = NULL;
     static bool once = true;
+    MigrationIncomingState *mis = migration_incoming_get_current();
 
     if (!once) {
         error_setg(errp, "The incoming migration has already been started");
@@ -1814,6 +1835,9 @@ void qmp_migrate_incoming(const char *uri, bool has_channels,
         return;
     }
 
+    mis->exit_on_error =
+        has_exit_on_error ? exit_on_error : INMIGRATE_DEFAULT_EXIT_ON_ERROR;
+
     qemu_start_incoming_migration(uri, has_channels, channels, &local_err);
 
     if (local_err) {
diff --git a/migration/migration.h b/migration/migration.h
index 8045e39c26..95995a818e 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -227,6 +227,9 @@ struct MigrationIncomingState {
      * is needed as this field is updated serially.
      */
     unsigned int switchover_ack_pending_num;
+
+    /* Do exit on incoming migration failure */
+    bool exit_on_error;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/qapi/migration.json b/qapi/migration.json
index 8c65b90328..9feed413b5 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1837,6 +1837,10 @@
 # @channels: list of migration stream channels with each stream in the
 #     list connected to a destination interface endpoint.
 #
+# @exit-on-error: Exit on incoming migration failure.  Default true.
+#     When set to false, the failure triggers a MIGRATION event, and
+#     error details could be retrieved with query-migrate.  (since 9.1)
+#
 # Since: 2.3
 #
 # Notes:
@@ -1889,7 +1893,8 @@
 ##
 { 'command': 'migrate-incoming',
              'data': {'*uri': 'str',
-                      '*channels': [ 'MigrationChannel' ] } }
+                      '*channels': [ 'MigrationChannel' ],
+                      '*exit-on-error': 'bool' } }
 
 ##
 # @xen-save-devices-state:
diff --git a/system/vl.c b/system/vl.c
index 7756eac81e..79cd498395 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -2723,7 +2723,8 @@ void qmp_x_exit_preconfig(Error **errp)
     if (incoming) {
         Error *local_err = NULL;
         if (strcmp(incoming, "defer") != 0) {
-            qmp_migrate_incoming(incoming, false, NULL, &local_err);
+            qmp_migrate_incoming(incoming, false, NULL, true, true,
+                                 &local_err);
             if (local_err) {
                 error_reportf_err(local_err, "-incoming %s: ", incoming);
                 exit(1);
-- 
2.34.1




* Re: [PATCH v5 2/5] migration: process_incoming_migration_co(): complete cleanup on failure
  2024-04-29 19:14 ` [PATCH v5 2/5] migration: process_incoming_migration_co(): complete cleanup on failure Vladimir Sementsov-Ogievskiy
@ 2024-04-29 19:22   ` Peter Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2024-04-29 19:22 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core

On Mon, Apr 29, 2024 at 10:14:23PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Call migration_incoming_state_destroy() instead of doing only part of
> its cleanup.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error
  2024-04-29 19:14 ` [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error Vladimir Sementsov-Ogievskiy
@ 2024-04-29 19:32   ` Peter Xu
  2024-04-30  8:06     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2024-04-29 19:32 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core

On Mon, Apr 29, 2024 at 10:14:24PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> It's a bad idea to leave the critical section with the error object
> freed but s->error still set: this may theoretically lead to a
> use-after-free crash. Let's avoid it.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> ---
>  migration/migration.c | 24 ++++++++++++------------
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 0d26db47f7..58fd5819bc 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque)
>      migration_incoming_state_destroy();
>  }
>  
> +static void migrate_error_free(MigrationState *s)
> +{
> +    QEMU_LOCK_GUARD(&s->error_mutex);
> +    if (s->error) {
> +        error_free(s->error);
> +        s->error = NULL;
> +    }
> +}
> +
>  static void coroutine_fn
>  process_incoming_migration_co(void *opaque)
>  {
> +    MigrationState *s = migrate_get_current();
>      MigrationIncomingState *mis = migration_incoming_get_current();
>      PostcopyState ps;
>      int ret;
> @@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque)
>      }
>  
>      if (ret < 0) {
> -        MigrationState *s = migrate_get_current();
> -
>          if (migrate_has_error(s)) {
>              WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> -                error_report_err(s->error);
> +                error_report_err(error_copy(s->error));

This looks like a bugfix, agreed.

>              }
>          }
>          error_report("load of migration failed: %s", strerror(-ret));
> @@ -801,6 +809,7 @@ fail:
>                        MIGRATION_STATUS_FAILED);
>      migration_incoming_state_destroy();
>  
> +    migrate_error_free(s);

Would migration_incoming_state_destroy() be a better place?

One weird thing is that we actually reuse MigrationState's error for incoming
too, but so far it looks ok as long as QEMU can't be both src & dst.  Then
calling migrate_error_free() even in migration_incoming_state_destroy()
looks ok.

This patch still looks like it contains two changes.  Better to split them
(or just fix the bug only)?

Thanks,

>      exit(EXIT_FAILURE);
>  }
>  
> @@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s)
>      return qatomic_read(&s->error);
>  }
>  
> -static void migrate_error_free(MigrationState *s)
> -{
> -    QEMU_LOCK_GUARD(&s->error_mutex);
> -    if (s->error) {
> -        error_free(s->error);
> -        s->error = NULL;
> -    }
> -}
> -
>  static void migrate_fd_error(MigrationState *s, const Error *error)
>  {
>      assert(s->to_dst_file == NULL);
> -- 
> 2.34.1
> 

-- 
Peter Xu




* Re: [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting
  2024-04-29 19:14 ` [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting Vladimir Sementsov-Ogievskiy
@ 2024-04-29 19:39   ` Peter Xu
  2024-04-30  7:12     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2024-04-29 19:39 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core

On Mon, Apr 29, 2024 at 10:14:25PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Unify error reporting in the function. This simplifies the following
> commit, which will add a not-exit-on-error behavior variant to the
> function.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/migration.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 58fd5819bc..5489ff96df 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -748,11 +748,12 @@ process_incoming_migration_co(void *opaque)
>      MigrationIncomingState *mis = migration_incoming_get_current();
>      PostcopyState ps;
>      int ret;
> +    Error *local_err = NULL;
>  
>      assert(mis->from_src_file);
>  
>      if (compress_threads_load_setup(mis->from_src_file)) {
> -        error_report("Failed to setup decompress threads");
> +        error_setg(&local_err, "Failed to setup decompress threads");
>          goto fail;
>      }
>  
> @@ -789,16 +790,12 @@ process_incoming_migration_co(void *opaque)
>      }
>  
>      if (ret < 0) {
> -        if (migrate_has_error(s)) {
> -            WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> -                error_report_err(error_copy(s->error));
> -            }
> -        }
> -        error_report("load of migration failed: %s", strerror(-ret));
> +        error_setg(&local_err, "load of migration failed: %s", strerror(-ret));
>          goto fail;
>      }
>  
>      if (colo_incoming_co() < 0) {
> +        error_setg(&local_err, "colo incoming failed");
>          goto fail;
>      }
>  
> @@ -809,6 +806,12 @@ fail:
>                        MIGRATION_STATUS_FAILED);
>      migration_incoming_state_destroy();
>  
> +    if (migrate_has_error(s)) {
> +        WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> +            error_report_err(error_copy(s->error));
> +        }
> +    }
> +    error_report_err(local_err);

Will migrate_has_error() always return true here?  If so, we can drop it.

Meanwhile, IMHO it's easier to simply always keep the earliest error we see
and report only that; local_err is just one of the errors, and whichever
arrives first will be reported.  Something like:

  migrate_set_error(s, local_err);
  WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
      error_report_err(error_copy(s->error));
  }
  exit(EXIT_FAILURE);

Then, with the exit-on-error thing:

  migrate_set_error(s, local_err);
  if (exit_on_error) {
    WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
      error_report_err(error_copy(s->error));
    }
    exit(EXIT_FAILURE);
  }

Would this look slightly cleaner?
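
For readers less familiar with the "keep the earliest error" idea, here
is a minimal standalone model of it. It is a sketch under stated
assumptions, not QEMU code: DemoState and the demo_* helpers are
hypothetical, a plain C string stands in for an Error object, and a
pthread mutex stands in for error_mutex.

```c
/* Standalone model of "first error wins"; not QEMU code. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    pthread_mutex_t error_mutex;
    char *error;                /* NULL until the first error is recorded */
} DemoState;

static void demo_state_init(DemoState *s)
{
    pthread_mutex_init(&s->error_mutex, NULL);
    s->error = NULL;
}

/* Duplicate a string on the heap; the caller owns the result. */
static char *demo_strdup(const char *msg)
{
    size_t len = strlen(msg) + 1;
    char *copy = malloc(len);

    assert(copy);
    memcpy(copy, msg, len);
    return copy;
}

/* Store a copy of msg only if no earlier error is already stored. */
static void demo_set_error(DemoState *s, const char *msg)
{
    pthread_mutex_lock(&s->error_mutex);
    if (!s->error) {
        s->error = demo_strdup(msg);
    }
    pthread_mutex_unlock(&s->error_mutex);
}

/* Return a heap copy of the stored error, or NULL; the caller frees it. */
static char *demo_get_error_copy(DemoState *s)
{
    char *copy = NULL;

    pthread_mutex_lock(&s->error_mutex);
    if (s->error) {
        copy = demo_strdup(s->error);
    }
    pthread_mutex_unlock(&s->error_mutex);
    return copy;
}
```

Handing out a copy under the lock means a reader can print or free its
string without leaving the stored pointer dangling, and a second
demo_set_error() call is simply ignored.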

-- 
Peter Xu




* Re: [PATCH v5 5/5] qapi: introduce exit-on-error parameter for migrate-incoming
  2024-04-29 19:14 ` [PATCH v5 5/5] qapi: introduce exit-on-error parameter for migrate-incoming Vladimir Sementsov-Ogievskiy
@ 2024-04-29 19:41   ` Peter Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2024-04-29 19:41 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core

On Mon, Apr 29, 2024 at 10:14:26PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Currently we set the MIGRATION_STATUS_FAILED state, but don't give the
> orchestrator a chance to query the migration state and get the error.
> 
> Let's make it possible for QMP-based orchestrators to get the error,
> as with outgoing migration.
> 
> For hmp_migrate_incoming(), let's enable the new behavior: HMP is not
> an ABI, it's mostly intended for use by developers, and it makes sense
> not to stop the process.
> 
> For x-exit-preconfig, let's keep the old behavior:
>  - it's called from init(), so here we want to keep current behavior by
>    default
>  - it does exit on error by itself as well
> So, if we want to change the behavior of x-exit-preconfig, it should be
> another patch.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> Acked-by: Markus Armbruster <armbru@redhat.com>
> ---
>  migration/migration-hmp-cmds.c |  2 +-
>  migration/migration.c          | 38 +++++++++++++++++++++++++++-------
>  migration/migration.h          |  3 +++
>  qapi/migration.json            |  7 ++++++-
>  system/vl.c                    |  3 ++-
>  5 files changed, 43 insertions(+), 10 deletions(-)
> 
> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> index 7e96ae6ffd..23181bbee1 100644
> --- a/migration/migration-hmp-cmds.c
> +++ b/migration/migration-hmp-cmds.c
> @@ -466,7 +466,7 @@ void hmp_migrate_incoming(Monitor *mon, const QDict *qdict)
>      }
>      QAPI_LIST_PREPEND(caps, g_steal_pointer(&channel));
>  
> -    qmp_migrate_incoming(NULL, true, caps, &err);
> +    qmp_migrate_incoming(NULL, true, caps, true, false, &err);
>      qapi_free_MigrationChannelList(caps);
>  
>  end:
> diff --git a/migration/migration.c b/migration/migration.c
> index 5489ff96df..09f762c99e 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -72,6 +72,8 @@
>  #define NOTIFIER_ELEM_INIT(array, elem)    \
>      [elem] = NOTIFIER_WITH_RETURN_LIST_INITIALIZER((array)[elem])
>  
> +#define INMIGRATE_DEFAULT_EXIT_ON_ERROR true
> +
>  static NotifierWithReturnList migration_state_notifiers[] = {
>      NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_NORMAL),
>      NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_REBOOT),
> @@ -234,6 +236,8 @@ void migration_object_init(void)
>      qemu_cond_init(&current_incoming->page_request_cond);
>      current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
>  
> +    current_incoming->exit_on_error = INMIGRATE_DEFAULT_EXIT_ON_ERROR;
> +
>      migration_object_check(current_migration, &error_fatal);
>  
>      blk_mig_init();
> @@ -806,14 +810,19 @@ fail:
>                        MIGRATION_STATUS_FAILED);
>      migration_incoming_state_destroy();
>  
> -    if (migrate_has_error(s)) {
> -        WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> -            error_report_err(error_copy(s->error));
> +    if (mis->exit_on_error) {
> +        if (migrate_has_error(s)) {
> +            WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> +                error_report_err(error_copy(s->error));
> +            }
>          }
> +        error_report_err(local_err);
> +        migrate_error_free(s);
> +        exit(EXIT_FAILURE);
> +    } else {
> +        migrate_set_error(s, local_err);
> +        error_free(local_err);
>      }
> -    error_report_err(local_err);
> -    migrate_error_free(s);
> -    exit(EXIT_FAILURE);
>  }

(Please see my reply in previous patch)

>  
>  /**
> @@ -1322,6 +1331,15 @@ static void fill_destination_migration_info(MigrationInfo *info)
>          break;
>      }
>      info->status = mis->state;
> +
> +    if (!info->error_desc) {

Is this condition also guaranteed to hold?  If so, the check can be dropped too.

Other than that looks good, thanks.

> +        MigrationState *s = migrate_get_current();
> +        QEMU_LOCK_GUARD(&s->error_mutex);
> +
> +        if (s->error) {
> +            info->error_desc = g_strdup(error_get_pretty(s->error));
> +        }
> +    }
>  }
>  
>  MigrationInfo *qmp_query_migrate(Error **errp)
> @@ -1796,10 +1814,13 @@ void migrate_del_blocker(Error **reasonp)
>  }
>  
>  void qmp_migrate_incoming(const char *uri, bool has_channels,
> -                          MigrationChannelList *channels, Error **errp)
> +                          MigrationChannelList *channels,
> +                          bool has_exit_on_error, bool exit_on_error,
> +                          Error **errp)
>  {
>      Error *local_err = NULL;
>      static bool once = true;
> +    MigrationIncomingState *mis = migration_incoming_get_current();
>  
>      if (!once) {
>          error_setg(errp, "The incoming migration has already been started");
> @@ -1814,6 +1835,9 @@ void qmp_migrate_incoming(const char *uri, bool has_channels,
>          return;
>      }
>  
> +    mis->exit_on_error =
> +        has_exit_on_error ? exit_on_error : INMIGRATE_DEFAULT_EXIT_ON_ERROR;
> +
>      qemu_start_incoming_migration(uri, has_channels, channels, &local_err);
>  
>      if (local_err) {
> diff --git a/migration/migration.h b/migration/migration.h
> index 8045e39c26..95995a818e 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -227,6 +227,9 @@ struct MigrationIncomingState {
>       * is needed as this field is updated serially.
>       */
>      unsigned int switchover_ack_pending_num;
> +
> +    /* Do exit on incoming migration failure */
> +    bool exit_on_error;
>  };
>  
>  MigrationIncomingState *migration_incoming_get_current(void);
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 8c65b90328..9feed413b5 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1837,6 +1837,10 @@
>  # @channels: list of migration stream channels with each stream in the
>  #     list connected to a destination interface endpoint.
>  #
> +# @exit-on-error: Exit on incoming migration failure.  Default true.
> +#     When set to false, the failure triggers a MIGRATION event, and
> +#     error details could be retrieved with query-migrate.  (since 9.1)
> +#
>  # Since: 2.3
>  #
>  # Notes:
> @@ -1889,7 +1893,8 @@
>  ##
>  { 'command': 'migrate-incoming',
>               'data': {'*uri': 'str',
> -                      '*channels': [ 'MigrationChannel' ] } }
> +                      '*channels': [ 'MigrationChannel' ],
> +                      '*exit-on-error': 'bool' } }
>  
>  ##
>  # @xen-save-devices-state:
> diff --git a/system/vl.c b/system/vl.c
> index 7756eac81e..79cd498395 100644
> --- a/system/vl.c
> +++ b/system/vl.c
> @@ -2723,7 +2723,8 @@ void qmp_x_exit_preconfig(Error **errp)
>      if (incoming) {
>          Error *local_err = NULL;
>          if (strcmp(incoming, "defer") != 0) {
> -            qmp_migrate_incoming(incoming, false, NULL, &local_err);
> +            qmp_migrate_incoming(incoming, false, NULL, true, true,
> +                                 &local_err);
>              if (local_err) {
>                  error_reportf_err(local_err, "-incoming %s: ", incoming);
>                  exit(1);
> -- 
> 2.34.1
> 

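For anyone following the has_exit_on_error / exit_on_error pair in the
qmp_migrate_incoming() hunk above: an optional QAPI boolean reaches the
C handler as a presence flag plus a value, and the handler falls back to
the default when the flag is false.  A minimal standalone sketch of that
resolution (resolve_exit_on_error and DEFAULT_EXIT_ON_ERROR are
hypothetical names for illustration, not part of the patch):

```c
#include <stdbool.h>

/* Hypothetical default, mirroring the role of INMIGRATE_DEFAULT_EXIT_ON_ERROR. */
#define DEFAULT_EXIT_ON_ERROR true

/*
 * Resolve an optional boolean: use the caller's value only when it was
 * actually supplied, otherwise fall back to the default.
 */
static bool resolve_exit_on_error(bool has_value, bool value)
{
    return has_value ? value : DEFAULT_EXIT_ON_ERROR;
}
```

With this shape, a caller that omits the argument gets the old
exit-on-failure behaviour, and only an explicit false turns it off.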
-- 
Peter Xu




* Re: [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting
  2024-04-29 19:39   ` Peter Xu
@ 2024-04-30  7:12     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-30  7:12 UTC (permalink / raw)
  To: Peter Xu; +Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core

On 29.04.24 22:39, Peter Xu wrote:
> On Mon, Apr 29, 2024 at 10:14:25PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> Unify error reporting in the function. This simplifies the following
>> commit, which will not-exit-on-error behavior variant to the function.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> Reviewed-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>   migration/migration.c | 17 ++++++++++-------
>>   1 file changed, 10 insertions(+), 7 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 58fd5819bc..5489ff96df 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -748,11 +748,12 @@ process_incoming_migration_co(void *opaque)
>>       MigrationIncomingState *mis = migration_incoming_get_current();
>>       PostcopyState ps;
>>       int ret;
>> +    Error *local_err = NULL;
>>   
>>       assert(mis->from_src_file);
>>   
>>       if (compress_threads_load_setup(mis->from_src_file)) {
>> -        error_report("Failed to setup decompress threads");
>> +        error_setg(&local_err, "Failed to setup decompress threads");
>>           goto fail;
>>       }
>>   
>> @@ -789,16 +790,12 @@ process_incoming_migration_co(void *opaque)
>>       }
>>   
>>       if (ret < 0) {
>> -        if (migrate_has_error(s)) {
>> -            WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>> -                error_report_err(error_copy(s->error));
>> -            }
>> -        }
>> -        error_report("load of migration failed: %s", strerror(-ret));
>> +        error_setg(&local_err, "load of migration failed: %s", strerror(-ret));
>>           goto fail;
>>       }
>>   
>>       if (colo_incoming_co() < 0) {
>> +        error_setg(&local_err, "colo incoming failed");
>>           goto fail;
>>       }
>>   
>> @@ -809,6 +806,12 @@ fail:
>>                         MIGRATION_STATUS_FAILED);
>>       migration_incoming_state_destroy();
>>   
>> +    if (migrate_has_error(s)) {
>> +        WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>> +            error_report_err(error_copy(s->error));
>> +        }
>> +    }
>> +    error_report_err(local_err);
> 
> Here migrate_has_error() will always return true?  If so we can drop it.
> 
> Meanwhile, IMHO it's easier we simply always keep the earliest error we see
> and report that only, local_err is just one of the errors and whoever
> reaches first will be reported.  Something like:
> 
>    migrate_set_error(local_err);
>    WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>        error_report_err(error_copy(s->error));
>    }
>    exit(EXIT_FAILURE);
> 
> Then when with the exit-on-error thing:
> 
>    migrate_set_error(local_err);
>    if (exit_on_error) {
>      WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>        error_report_err(error_copy(s->error));
>      }
>      exit(EXIT_FAILURE);
>    }
> 
> Would this looks slightly cleaner?
> 

Yes, looks better, will do so
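
To make the "keep the earliest error and report only that" idea concrete,
here is a minimal standalone sketch.  It is not QEMU code: err_slot,
slot_set_error and fail_path are hypothetical names, the slot is a plain
struct rather than an Error object, and locking is left out for brevity.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the shared error slot; not QEMU's Error type. */
struct err_slot {
    bool set;
    char msg[128];
};

/* Store msg only if nothing is stored yet, so the earliest error wins. */
static void slot_set_error(struct err_slot *slot, const char *msg)
{
    if (!slot->set) {
        snprintf(slot->msg, sizeof(slot->msg), "%s", msg);
        slot->set = true;
    }
}

/*
 * Failure path: funnel the local error through the slot, then report
 * whatever the slot holds.  Returns true where the caller would exit.
 */
static bool fail_path(struct err_slot *slot, const char *local_msg,
                      bool exit_on_error)
{
    slot_set_error(slot, local_msg);
    if (exit_on_error) {
        fprintf(stderr, "%s\n", slot->msg);
        return true;    /* caller would exit(EXIT_FAILURE) here */
    }
    return false;       /* error stays in the slot for a later query */
}
```

The point of the shape is that the failure path never needs to ask
whether an error is already stored: setting is a no-op when one is, and
the slot is always non-empty afterwards.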

-- 
Best regards,
Vladimir




* Re: [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error
  2024-04-29 19:32   ` Peter Xu
@ 2024-04-30  8:06     ` Vladimir Sementsov-Ogievskiy
  2024-04-30  8:09       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-30  8:06 UTC (permalink / raw)
  To: Peter Xu; +Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core

On 29.04.24 22:32, Peter Xu wrote:
> On Mon, Apr 29, 2024 at 10:14:24PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> It's bad idea to leave critical section with error object freed, but
>> s->error still set, this theoretically may lead to use-after-free
>> crash. Let's avoid it.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> ---
>>   migration/migration.c | 24 ++++++++++++------------
>>   1 file changed, 12 insertions(+), 12 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 0d26db47f7..58fd5819bc 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque)
>>       migration_incoming_state_destroy();
>>   }
>>   
>> +static void migrate_error_free(MigrationState *s)
>> +{
>> +    QEMU_LOCK_GUARD(&s->error_mutex);
>> +    if (s->error) {
>> +        error_free(s->error);
>> +        s->error = NULL;
>> +    }
>> +}
>> +
>>   static void coroutine_fn
>>   process_incoming_migration_co(void *opaque)
>>   {
>> +    MigrationState *s = migrate_get_current();
>>       MigrationIncomingState *mis = migration_incoming_get_current();
>>       PostcopyState ps;
>>       int ret;
>> @@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque)
>>       }
>>   
>>       if (ret < 0) {
>> -        MigrationState *s = migrate_get_current();
>> -
>>           if (migrate_has_error(s)) {
>>               WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>> -                error_report_err(s->error);
>> +                error_report_err(error_copy(s->error));
> 
> This looks like a bugfix, agreed.
> 
>>               }
>>           }
>>           error_report("load of migration failed: %s", strerror(-ret));
>> @@ -801,6 +809,7 @@ fail:
>>                         MIGRATION_STATUS_FAILED);
>>       migration_incoming_state_destroy();
>>   
>> +    migrate_error_free(s);
> 
> Would migration_incoming_state_destroy() be a better place?

Hmm. But we also want to call migration_incoming_state_destroy() when exit-on-error=false, and in that case we want to keep the error for later query-migrate commands.

> 
> One thing weird is we actually reuses MigrationState*'s error for incoming
> too, but so far it looks ok as long as QEMU can't be both src & dst.  Then
> calling migrate_error_free even in incoming_state_destroy() looks ok.
> 
> This patch still looks like containing two changes.  Better split them (or
> just fix the bug only)?
> 
> Thanks,
> 
>>       exit(EXIT_FAILURE);
>>   }
>>   
>> @@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s)
>>       return qatomic_read(&s->error);
>>   }
>>   
>> -static void migrate_error_free(MigrationState *s)
>> -{
>> -    QEMU_LOCK_GUARD(&s->error_mutex);
>> -    if (s->error) {
>> -        error_free(s->error);
>> -        s->error = NULL;
>> -    }
>> -}
>> -
>>   static void migrate_fd_error(MigrationState *s, const Error *error)
>>   {
>>       assert(s->to_dst_file == NULL);
>> -- 
>> 2.34.1
>>
> 

-- 
Best regards,
Vladimir




* Re: [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error
  2024-04-30  8:06     ` Vladimir Sementsov-Ogievskiy
@ 2024-04-30  8:09       ` Vladimir Sementsov-Ogievskiy
  2024-04-30  8:47         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-30  8:09 UTC (permalink / raw)
  To: Peter Xu; +Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core

On 30.04.24 11:06, Vladimir Sementsov-Ogievskiy wrote:
> On 29.04.24 22:32, Peter Xu wrote:
>> On Mon, Apr 29, 2024 at 10:14:24PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>> It's bad idea to leave critical section with error object freed, but
>>> s->error still set, this theoretically may lead to use-after-free
>>> crash. Let's avoid it.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>> ---
>>>   migration/migration.c | 24 ++++++++++++------------
>>>   1 file changed, 12 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 0d26db47f7..58fd5819bc 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque)
>>>       migration_incoming_state_destroy();
>>>   }
>>> +static void migrate_error_free(MigrationState *s)
>>> +{
>>> +    QEMU_LOCK_GUARD(&s->error_mutex);
>>> +    if (s->error) {
>>> +        error_free(s->error);
>>> +        s->error = NULL;
>>> +    }
>>> +}
>>> +
>>>   static void coroutine_fn
>>>   process_incoming_migration_co(void *opaque)
>>>   {
>>> +    MigrationState *s = migrate_get_current();
>>>       MigrationIncomingState *mis = migration_incoming_get_current();
>>>       PostcopyState ps;
>>>       int ret;
>>> @@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque)
>>>       }
>>>       if (ret < 0) {
>>> -        MigrationState *s = migrate_get_current();
>>> -
>>>           if (migrate_has_error(s)) {
>>>               WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>>> -                error_report_err(s->error);
>>> +                error_report_err(error_copy(s->error));
>>
>> This looks like a bugfix, agreed.
>>
>>>               }
>>>           }
>>>           error_report("load of migration failed: %s", strerror(-ret));
>>> @@ -801,6 +809,7 @@ fail:
>>>                         MIGRATION_STATUS_FAILED);
>>>       migration_incoming_state_destroy();
>>> +    migrate_error_free(s);
>>
>> Would migration_incoming_state_destroy() be a better place?
> 
> Hmm. But we want to call migration_incoming_state_destroy() in case when exit-on-error=false too. And in this case we want to keep the error for further query-migrate commands.

Actually, I think we shouldn't care about freeing the error in the exit() case. This exit() already skips a lot of other cleanup that we normally do in main().

> 
>>
>> One thing weird is we actually reuses MigrationState*'s error for incoming
>> too, but so far it looks ok as long as QEMU can't be both src & dst.  Then
>> calling migrate_error_free even in incoming_state_destroy() looks ok.
>>
>> This patch still looks like containing two changes.  Better split them (or
>> just fix the bug only)?
>>
>> Thanks,
>>
>>>       exit(EXIT_FAILURE);
>>>   }
>>> @@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s)
>>>       return qatomic_read(&s->error);
>>>   }
>>> -static void migrate_error_free(MigrationState *s)
>>> -{
>>> -    QEMU_LOCK_GUARD(&s->error_mutex);
>>> -    if (s->error) {
>>> -        error_free(s->error);
>>> -        s->error = NULL;
>>> -    }
>>> -}
>>> -
>>>   static void migrate_fd_error(MigrationState *s, const Error *error)
>>>   {
>>>       assert(s->to_dst_file == NULL);
>>> -- 
>>> 2.34.1
>>>
>>
> 

-- 
Best regards,
Vladimir




* Re: [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error
  2024-04-30  8:09       ` Vladimir Sementsov-Ogievskiy
@ 2024-04-30  8:47         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-04-30  8:47 UTC (permalink / raw)
  To: Peter Xu; +Cc: farosas, eblake, armbru, pbonzini, qemu-devel, yc-core

On 30.04.24 11:09, Vladimir Sementsov-Ogievskiy wrote:
> On 30.04.24 11:06, Vladimir Sementsov-Ogievskiy wrote:
>> On 29.04.24 22:32, Peter Xu wrote:
>>> On Mon, Apr 29, 2024 at 10:14:24PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> It's bad idea to leave critical section with error object freed, but
>>>> s->error still set, this theoretically may lead to use-after-free
>>>> crash. Let's avoid it.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>> ---
>>>>   migration/migration.c | 24 ++++++++++++------------
>>>>   1 file changed, 12 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>> index 0d26db47f7..58fd5819bc 100644
>>>> --- a/migration/migration.c
>>>> +++ b/migration/migration.c
>>>> @@ -732,9 +732,19 @@ static void process_incoming_migration_bh(void *opaque)
>>>>       migration_incoming_state_destroy();
>>>>   }
>>>> +static void migrate_error_free(MigrationState *s)
>>>> +{
>>>> +    QEMU_LOCK_GUARD(&s->error_mutex);
>>>> +    if (s->error) {
>>>> +        error_free(s->error);
>>>> +        s->error = NULL;
>>>> +    }
>>>> +}
>>>> +
>>>>   static void coroutine_fn
>>>>   process_incoming_migration_co(void *opaque)
>>>>   {
>>>> +    MigrationState *s = migrate_get_current();
>>>>       MigrationIncomingState *mis = migration_incoming_get_current();
>>>>       PostcopyState ps;
>>>>       int ret;
>>>> @@ -779,11 +789,9 @@ process_incoming_migration_co(void *opaque)
>>>>       }
>>>>       if (ret < 0) {
>>>> -        MigrationState *s = migrate_get_current();
>>>> -
>>>>           if (migrate_has_error(s)) {
>>>>               WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>>>> -                error_report_err(s->error);
>>>> +                error_report_err(error_copy(s->error));
>>>
>>> This looks like a bugfix, agreed.
>>>
>>>>               }
>>>>           }
>>>>           error_report("load of migration failed: %s", strerror(-ret));
>>>> @@ -801,6 +809,7 @@ fail:
>>>>                         MIGRATION_STATUS_FAILED);
>>>>       migration_incoming_state_destroy();
>>>> +    migrate_error_free(s);
>>>
>>> Would migration_incoming_state_destroy() be a better place?
>>
>> Hmm. But we want to call migration_incoming_state_destroy() in case when exit-on-error=false too. And in this case we want to keep the error for further query-migrate commands.
> 
> Actually, I think we shouldn't care about freeing the error for exit() case. I think, we skip a lot of other cleanups which we normally do in main() in case of this exit().
> 

Or just do the simplest fix for the potential use-after-free and not worry about the rest:

WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
    error_report_err(s->error);
    s->error = NULL;
}

I'll send v6 with this.
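
The general pattern behind that fix, consuming the object and clearing the
pointer inside the same critical section, can be sketched outside QEMU as
below.  This is only an illustration under assumptions: it uses a plain
pthread mutex and a heap string instead of error_mutex and Error, and
struct state / report_and_clear are hypothetical names.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical state with a lock-protected, heap-allocated error message. */
struct state {
    pthread_mutex_t lock;
    char *error;    /* NULL when no error is stored */
};

/*
 * Report and free the stored error, clearing the pointer before the lock
 * is released, so no other thread can observe a dangling pointer.
 * Returns 1 if an error was reported, 0 if there was none.
 */
static int report_and_clear(struct state *s)
{
    int reported = 0;

    pthread_mutex_lock(&s->lock);
    if (s->error != NULL) {
        fprintf(stderr, "%s\n", s->error);
        free(s->error);
        s->error = NULL;
        reported = 1;
    }
    pthread_mutex_unlock(&s->lock);

    return reported;
}
```

A second call finds the pointer already NULL and does nothing, which is
the property the freed-but-still-set pointer was missing.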

>>
>>>
>>> One thing weird is we actually reuses MigrationState*'s error for incoming
>>> too, but so far it looks ok as long as QEMU can't be both src & dst.  Then
>>> calling migrate_error_free even in incoming_state_destroy() looks ok.
>>>
>>> This patch still looks like containing two changes.  Better split them (or
>>> just fix the bug only)?
>>>
>>> Thanks,
>>>
>>>>       exit(EXIT_FAILURE);
>>>>   }
>>>> @@ -1433,15 +1442,6 @@ bool migrate_has_error(MigrationState *s)
>>>>       return qatomic_read(&s->error);
>>>>   }
>>>> -static void migrate_error_free(MigrationState *s)
>>>> -{
>>>> -    QEMU_LOCK_GUARD(&s->error_mutex);
>>>> -    if (s->error) {
>>>> -        error_free(s->error);
>>>> -        s->error = NULL;
>>>> -    }
>>>> -}
>>>> -
>>>>   static void migrate_fd_error(MigrationState *s, const Error *error)
>>>>   {
>>>>       assert(s->to_dst_file == NULL);
>>>> -- 
>>>> 2.34.1
>>>>
>>>
>>
> 

-- 
Best regards,
Vladimir




end of thread, other threads:[~2024-04-30  8:48 UTC | newest]

Thread overview: 14+ messages
2024-04-29 19:14 [PATCH v5 0/5] migration: do not exit on incoming failure Vladimir Sementsov-Ogievskiy
2024-04-29 19:14 ` [PATCH v5 1/5] migration: move trace-point from migrate_fd_error to migrate_set_error Vladimir Sementsov-Ogievskiy
2024-04-29 19:14 ` [PATCH v5 2/5] migration: process_incoming_migration_co(): complete cleanup on failure Vladimir Sementsov-Ogievskiy
2024-04-29 19:22   ` Peter Xu
2024-04-29 19:14 ` [PATCH v5 3/5] migration: process_incoming_migration_co(): fix reporting s->error Vladimir Sementsov-Ogievskiy
2024-04-29 19:32   ` Peter Xu
2024-04-30  8:06     ` Vladimir Sementsov-Ogievskiy
2024-04-30  8:09       ` Vladimir Sementsov-Ogievskiy
2024-04-30  8:47         ` Vladimir Sementsov-Ogievskiy
2024-04-29 19:14 ` [PATCH v5 4/5] migration: process_incoming_migration_co(): rework error reporting Vladimir Sementsov-Ogievskiy
2024-04-29 19:39   ` Peter Xu
2024-04-30  7:12     ` Vladimir Sementsov-Ogievskiy
2024-04-29 19:14 ` [PATCH v5 5/5] qapi: introduce exit-on-error parameter for migrate-incoming Vladimir Sementsov-Ogievskiy
2024-04-29 19:41   ` Peter Xu
