* [Qemu-devel] [PATCH RFC v2 0/5] fix some segmentation faults and migration issues
@ 2018-11-29 10:03 Fei Li
2018-11-29 10:03 ` [Qemu-devel] [PATCH RFC v2 1/5] Fix segmentation fault when qemu_signal_init fails Fei Li
` (5 more replies)
0 siblings, 6 replies; 18+ messages in thread
From: Fei Li @ 2018-11-29 10:03 UTC (permalink / raw)
To: qemu-devel
These five patches almost get the Reviewed-by and are extracted from
previous "[PATCH RFC v7 0/9] qemu_thread_create: propagate errors to
callers to check." The mentioned patch series have waited on one
multifd issue for a while and still needs a further discussion.
Thus separate(send) these five almost-done patches and hope they can
be merged for the next tag. Thanks for the review. :)
v2:
- Update the commit message for patch 1/5, and get one more
Reviewed-by.
- Get one Reviewed-by for patch 3/5.
Fei Li (5):
Fix segmentation fault when qemu_signal_init fails
qemu_thread_join: fix segmentation fault
migration: fix the multifd code when receiving less channels
migration: remove unused &local_err parameter in multifd_save_cleanup
migration: add more error handling for postcopy_ram_enable_notify
migration/channel.c | 11 ++++++-----
migration/migration.c | 14 ++++++++------
migration/migration.h | 2 +-
migration/postcopy-ram.c | 1 +
migration/ram.c | 18 ++++++++++--------
migration/ram.h | 4 ++--
migration/savevm.c | 1 +
util/main-loop.c | 8 ++++----
util/qemu-thread-posix.c | 3 +++
util/qemu-thread-win32.c | 2 +-
10 files changed, 37 insertions(+), 27 deletions(-)
--
2.13.7
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Qemu-devel] [PATCH RFC v2 1/5] Fix segmentation fault when qemu_signal_init fails
2018-11-29 10:03 [Qemu-devel] [PATCH RFC v2 0/5] fix some segmentation faults and migration issues Fei Li
@ 2018-11-29 10:03 ` Fei Li
2018-11-29 12:49 ` Markus Armbruster
2018-11-29 10:03 ` [Qemu-devel] [PATCH RFC v2 2/5] qemu_thread_join: fix segmentation fault Fei Li
` (4 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Fei Li @ 2018-11-29 10:03 UTC (permalink / raw)
To: qemu-devel; +Cc: Markus Armbruster, Fam Zheng
When qemu_signal_init() fails in qemu_init_main_loop(), we return
without setting an error. Its callers crash then when they try to
report the error with error_report_err().
To avoid such segmentation fault, add a new Error parameter to make
the call trace to propagate the err to the final caller.
Fixes: 2f78e491d7b46542158ce0b8132ee4e05bc0ade4
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Fam Zheng <famz@redhat.com>
Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
---
util/main-loop.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/util/main-loop.c b/util/main-loop.c
index affe0403c5..443cb4cfe8 100644
--- a/util/main-loop.c
+++ b/util/main-loop.c
@@ -71,7 +71,7 @@ static void sigfd_handler(void *opaque)
}
}
-static int qemu_signal_init(void)
+static int qemu_signal_init(Error **errp)
{
int sigfd;
sigset_t set;
@@ -96,7 +96,7 @@ static int qemu_signal_init(void)
sigdelset(&set, SIG_IPI);
sigfd = qemu_signalfd(&set);
if (sigfd == -1) {
- fprintf(stderr, "failed to create signalfd\n");
+ error_setg_errno(errp, errno, "failed to create signalfd");
return -errno;
}
@@ -109,7 +109,7 @@ static int qemu_signal_init(void)
#else /* _WIN32 */
-static int qemu_signal_init(void)
+static int qemu_signal_init(Error **errp)
{
return 0;
}
@@ -148,7 +148,7 @@ int qemu_init_main_loop(Error **errp)
init_clocks(qemu_timer_notify_cb);
- ret = qemu_signal_init();
+ ret = qemu_signal_init(errp);
if (ret) {
return ret;
}
--
2.13.7
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [Qemu-devel] [PATCH RFC v2 2/5] qemu_thread_join: fix segmentation fault
2018-11-29 10:03 [Qemu-devel] [PATCH RFC v2 0/5] fix some segmentation faults and migration issues Fei Li
2018-11-29 10:03 ` [Qemu-devel] [PATCH RFC v2 1/5] Fix segmentation fault when qemu_signal_init fails Fei Li
@ 2018-11-29 10:03 ` Fei Li
2018-11-29 10:03 ` [Qemu-devel] [PATCH RFC v2 3/5] migration: fix the multifd code when receiving less channels Fei Li
` (3 subsequent siblings)
5 siblings, 0 replies; 18+ messages in thread
From: Fei Li @ 2018-11-29 10:03 UTC (permalink / raw)
To: qemu-devel
To avoid the segmentation fault in qemu_thread_join(), just directly
return when the QemuThread *thread failed to be created in either
qemu-thread-posix.c or qemu-thread-win32.c.
Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
util/qemu-thread-posix.c | 3 +++
util/qemu-thread-win32.c | 2 +-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index 865e476df5..b9ab5a4711 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -558,6 +558,9 @@ void *qemu_thread_join(QemuThread *thread)
int err;
void *ret;
+ if (!thread->thread) {
+ return NULL;
+ }
err = pthread_join(thread->thread, &ret);
if (err) {
error_exit(err, __func__);
diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
index 4a363ca675..1a27e1cf6f 100644
--- a/util/qemu-thread-win32.c
+++ b/util/qemu-thread-win32.c
@@ -366,7 +366,7 @@ void *qemu_thread_join(QemuThread *thread)
HANDLE handle;
data = thread->data;
- if (data->mode == QEMU_THREAD_DETACHED) {
+ if (data == NULL || data->mode == QEMU_THREAD_DETACHED) {
return NULL;
}
--
2.13.7
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [Qemu-devel] [PATCH RFC v2 3/5] migration: fix the multifd code when receiving less channels
2018-11-29 10:03 [Qemu-devel] [PATCH RFC v2 0/5] fix some segmentation faults and migration issues Fei Li
2018-11-29 10:03 ` [Qemu-devel] [PATCH RFC v2 1/5] Fix segmentation fault when qemu_signal_init fails Fei Li
2018-11-29 10:03 ` [Qemu-devel] [PATCH RFC v2 2/5] qemu_thread_join: fix segmentation fault Fei Li
@ 2018-11-29 10:03 ` Fei Li
2018-11-29 14:46 ` Philippe Mathieu-Daudé
2018-11-29 10:03 ` [Qemu-devel] [PATCH RFC v2 4/5] migration: remove unused &local_err parameter in multifd_save_cleanup Fei Li
` (2 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Fei Li @ 2018-11-29 10:03 UTC (permalink / raw)
To: qemu-devel
In our current code, when multifd is used during migration, if there
is an error before the destination receives all new channels, the
source keeps running, however the destination does not exit but keeps
waiting until the source is killed deliberately.
Fix this by dumping the specific error and let users decide whether
to quit from the destination side when failing to receive packet via
some channel.
Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
migration/channel.c | 11 ++++++-----
migration/migration.c | 9 +++++++--
migration/migration.h | 2 +-
migration/ram.c | 7 ++++++-
migration/ram.h | 2 +-
5 files changed, 21 insertions(+), 10 deletions(-)
diff --git a/migration/channel.c b/migration/channel.c
index 33e0e9b82f..20e4c8e2dc 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -30,6 +30,7 @@
void migration_channel_process_incoming(QIOChannel *ioc)
{
MigrationState *s = migrate_get_current();
+ Error *local_err = NULL;
trace_migration_set_incoming_channel(
ioc, object_get_typename(OBJECT(ioc)));
@@ -38,13 +39,13 @@ void migration_channel_process_incoming(QIOChannel *ioc)
*s->parameters.tls_creds &&
!object_dynamic_cast(OBJECT(ioc),
TYPE_QIO_CHANNEL_TLS)) {
- Error *local_err = NULL;
migration_tls_channel_process_incoming(s, ioc, &local_err);
- if (local_err) {
- error_report_err(local_err);
- }
} else {
- migration_ioc_process_incoming(ioc);
+ migration_ioc_process_incoming(ioc, &local_err);
+ }
+
+ if (local_err) {
+ error_report_err(local_err);
}
}
diff --git a/migration/migration.c b/migration/migration.c
index 49ffb9997a..72106bddf0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -541,7 +541,7 @@ void migration_fd_process_incoming(QEMUFile *f)
migration_incoming_process();
}
-void migration_ioc_process_incoming(QIOChannel *ioc)
+void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
{
MigrationIncomingState *mis = migration_incoming_get_current();
bool start_migration;
@@ -563,9 +563,14 @@ void migration_ioc_process_incoming(QIOChannel *ioc)
*/
start_migration = !migrate_use_multifd();
} else {
+ Error *local_err = NULL;
/* Multiple connections */
assert(migrate_use_multifd());
- start_migration = multifd_recv_new_channel(ioc);
+ start_migration = multifd_recv_new_channel(ioc, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ return;
+ }
}
if (start_migration) {
diff --git a/migration/migration.h b/migration/migration.h
index e413d4d8b6..02b7304610 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -229,7 +229,7 @@ struct MigrationState
void migrate_set_state(int *state, int old_state, int new_state);
void migration_fd_process_incoming(QEMUFile *f);
-void migration_ioc_process_incoming(QIOChannel *ioc);
+void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
void migration_incoming_process(void);
bool migration_has_all_channels(void);
diff --git a/migration/ram.c b/migration/ram.c
index 7e7deec4d8..e13b9349d0 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1323,7 +1323,7 @@ bool multifd_recv_all_channels_created(void)
}
/* Return true if multifd is ready for the migration, otherwise false */
-bool multifd_recv_new_channel(QIOChannel *ioc)
+bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
{
MultiFDRecvParams *p;
Error *local_err = NULL;
@@ -1331,6 +1331,10 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
id = multifd_recv_initial_packet(ioc, &local_err);
if (id < 0) {
+ error_propagate_prepend(errp, local_err,
+ "failed to receive packet"
+ " via multifd channel %d: ",
+ multifd_recv_state->count);
multifd_recv_terminate_threads(local_err);
return false;
}
@@ -1340,6 +1344,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
error_setg(&local_err, "multifd: received id '%d' already setup'",
id);
multifd_recv_terminate_threads(local_err);
+ error_propagate(errp, local_err);
return false;
}
p->c = ioc;
diff --git a/migration/ram.h b/migration/ram.h
index 83ff1bc11a..046d3074be 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -47,7 +47,7 @@ int multifd_save_cleanup(Error **errp);
int multifd_load_setup(void);
int multifd_load_cleanup(Error **errp);
bool multifd_recv_all_channels_created(void);
-bool multifd_recv_new_channel(QIOChannel *ioc);
+bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
uint64_t ram_pagesize_summary(void);
int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);
--
2.13.7
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [Qemu-devel] [PATCH RFC v2 4/5] migration: remove unused &local_err parameter in multifd_save_cleanup
2018-11-29 10:03 [Qemu-devel] [PATCH RFC v2 0/5] fix some segmentation faults and migration issues Fei Li
` (2 preceding siblings ...)
2018-11-29 10:03 ` [Qemu-devel] [PATCH RFC v2 3/5] migration: fix the multifd code when receiving less channels Fei Li
@ 2018-11-29 10:03 ` Fei Li
2018-11-29 14:50 ` Philippe Mathieu-Daudé
2018-11-29 10:03 ` [Qemu-devel] [PATCH RFC v2 5/5] migration: add more error handling for postcopy_ram_enable_notify Fei Li
2018-11-29 14:20 ` [Qemu-devel] [PATCH for-3.1? RFC v2 0/5] fix some segmentation faults and migration issues Eric Blake
5 siblings, 1 reply; 18+ messages in thread
From: Fei Li @ 2018-11-29 10:03 UTC (permalink / raw)
To: qemu-devel
Always call migrate_set_error() to set the error state without relying
on whether multifd_save_cleanup() succeeds. As the passed &local_err
is never used in multifd_save_cleanup(), remove it. And make the
function be: void multifd_save_cleanup(void).
Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
migration/migration.c | 5 +----
migration/ram.c | 11 ++++-------
migration/ram.h | 2 +-
3 files changed, 6 insertions(+), 12 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 72106bddf0..0537fc0c26 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1386,7 +1386,6 @@ static void migrate_fd_cleanup(void *opaque)
qemu_savevm_state_cleanup();
if (s->to_dst_file) {
- Error *local_err = NULL;
QEMUFile *tmp;
trace_migrate_fd_cleanup();
@@ -1397,9 +1396,7 @@ static void migrate_fd_cleanup(void *opaque)
}
qemu_mutex_lock_iothread();
- if (multifd_save_cleanup(&local_err) != 0) {
- error_report_err(local_err);
- }
+ multifd_save_cleanup();
qemu_mutex_lock(&s->qemu_file_lock);
tmp = s->to_dst_file;
s->to_dst_file = NULL;
diff --git a/migration/ram.c b/migration/ram.c
index e13b9349d0..c44cb6f56d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -917,13 +917,12 @@ static void multifd_send_terminate_threads(Error *err)
}
}
-int multifd_save_cleanup(Error **errp)
+void multifd_save_cleanup(void)
{
int i;
- int ret = 0;
if (!migrate_use_multifd()) {
- return 0;
+ return;
}
multifd_send_terminate_threads(NULL);
for (i = 0; i < migrate_multifd_channels(); i++) {
@@ -953,7 +952,6 @@ int multifd_save_cleanup(Error **errp)
multifd_send_state->pages = NULL;
g_free(multifd_send_state);
multifd_send_state = NULL;
- return ret;
}
static void multifd_send_sync_main(void)
@@ -1071,9 +1069,8 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
Error *local_err = NULL;
if (qio_task_propagate_error(task, &local_err)) {
- if (multifd_save_cleanup(&local_err) != 0) {
- migrate_set_error(migrate_get_current(), local_err);
- }
+ migrate_set_error(migrate_get_current(), local_err);
+ multifd_save_cleanup();
} else {
p->c = QIO_CHANNEL(sioc);
qio_channel_set_delay(p->c, false);
diff --git a/migration/ram.h b/migration/ram.h
index 046d3074be..936177b3e9 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -43,7 +43,7 @@ uint64_t ram_bytes_remaining(void);
uint64_t ram_bytes_total(void);
int multifd_save_setup(void);
-int multifd_save_cleanup(Error **errp);
+void multifd_save_cleanup(void);
int multifd_load_setup(void);
int multifd_load_cleanup(Error **errp);
bool multifd_recv_all_channels_created(void);
--
2.13.7
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [Qemu-devel] [PATCH RFC v2 5/5] migration: add more error handling for postcopy_ram_enable_notify
2018-11-29 10:03 [Qemu-devel] [PATCH RFC v2 0/5] fix some segmentation faults and migration issues Fei Li
` (3 preceding siblings ...)
2018-11-29 10:03 ` [Qemu-devel] [PATCH RFC v2 4/5] migration: remove unused &local_err parameter in multifd_save_cleanup Fei Li
@ 2018-11-29 10:03 ` Fei Li
2018-11-30 9:49 ` Dr. David Alan Gilbert
2018-11-29 14:20 ` [Qemu-devel] [PATCH for-3.1? RFC v2 0/5] fix some segmentation faults and migration issues Eric Blake
5 siblings, 1 reply; 18+ messages in thread
From: Fei Li @ 2018-11-29 10:03 UTC (permalink / raw)
To: qemu-devel; +Cc: Dr . David Alan Gilbert
Call postcopy_ram_incoming_cleanup() to do the cleanup when
postcopy_ram_enable_notify fails. Besides, report the error
message when qemu_ram_foreach_migratable_block() fails.
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Fei Li <fli@suse.com>
---
migration/postcopy-ram.c | 1 +
migration/savevm.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index e5c02a32c5..fa09dba534 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1117,6 +1117,7 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
/* Mark so that we get notified of accesses to unwritten areas */
if (qemu_ram_foreach_migratable_block(ram_block_enable_notify, mis)) {
+ error_report("ram_block_enable_notify failed");
return -1;
}
diff --git a/migration/savevm.c b/migration/savevm.c
index 9e45fb4f3f..d784e8aa40 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1729,6 +1729,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
*/
if (migrate_postcopy_ram()) {
if (postcopy_ram_enable_notify(mis)) {
+ postcopy_ram_incoming_cleanup(mis);
return -1;
}
}
--
2.13.7
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v2 1/5] Fix segmentation fault when qemu_signal_init fails
2018-11-29 10:03 ` [Qemu-devel] [PATCH RFC v2 1/5] Fix segmentation fault when qemu_signal_init fails Fei Li
@ 2018-11-29 12:49 ` Markus Armbruster
2018-11-30 3:29 ` Fei Li
0 siblings, 1 reply; 18+ messages in thread
From: Markus Armbruster @ 2018-11-29 12:49 UTC (permalink / raw)
To: Fei Li; +Cc: qemu-devel, Fam Zheng, Paolo Bonzini
You neglected to cc the maintainer. I'm doing that for you now. Cc'ing
maintainers is important to maximize your chances at getting your
patches picked up. Use scripts/get_maintainer.pl to find them.
Fei Li <fli@suse.com> writes:
> When qemu_signal_init() fails in qemu_init_main_loop(), we return
> without setting an error. Its callers crash then when they try to
> report the error with error_report_err().
>
> To avoid such segmentation fault, add a new Error parameter to make
> the call trace to propagate the err to the final caller.
>
> Fixes: 2f78e491d7b46542158ce0b8132ee4e05bc0ade4
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Fam Zheng <famz@redhat.com>
> Signed-off-by: Fei Li <fli@suse.com>
> Reviewed-by: Fam Zheng <famz@redhat.com>
> Reviewed-by: Markus Armbruster <armbru@redhat.com>
> ---
> util/main-loop.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/util/main-loop.c b/util/main-loop.c
> index affe0403c5..443cb4cfe8 100644
> --- a/util/main-loop.c
> +++ b/util/main-loop.c
> @@ -71,7 +71,7 @@ static void sigfd_handler(void *opaque)
> }
> }
>
> -static int qemu_signal_init(void)
> +static int qemu_signal_init(Error **errp)
> {
> int sigfd;
> sigset_t set;
> @@ -96,7 +96,7 @@ static int qemu_signal_init(void)
> sigdelset(&set, SIG_IPI);
> sigfd = qemu_signalfd(&set);
> if (sigfd == -1) {
> - fprintf(stderr, "failed to create signalfd\n");
> + error_setg_errno(errp, errno, "failed to create signalfd");
> return -errno;
> }
>
> @@ -109,7 +109,7 @@ static int qemu_signal_init(void)
>
> #else /* _WIN32 */
>
> -static int qemu_signal_init(void)
> +static int qemu_signal_init(Error **errp)
> {
> return 0;
> }
> @@ -148,7 +148,7 @@ int qemu_init_main_loop(Error **errp)
>
> init_clocks(qemu_timer_notify_cb);
>
> - ret = qemu_signal_init();
> + ret = qemu_signal_init(errp);
> if (ret) {
> return ret;
> }
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH for-3.1? RFC v2 0/5] fix some segmentation faults and migration issues
2018-11-29 10:03 [Qemu-devel] [PATCH RFC v2 0/5] fix some segmentation faults and migration issues Fei Li
` (4 preceding siblings ...)
2018-11-29 10:03 ` [Qemu-devel] [PATCH RFC v2 5/5] migration: add more error handling for postcopy_ram_enable_notify Fei Li
@ 2018-11-29 14:20 ` Eric Blake
2018-11-30 6:15 ` Fei Li
5 siblings, 1 reply; 18+ messages in thread
From: Eric Blake @ 2018-11-29 14:20 UTC (permalink / raw)
To: Fei Li, qemu-devel, Peter Maydell
On 11/29/18 4:03 AM, Fei Li wrote:
> These five patches almost get the Reviewed-by and are extracted from
> previous "[PATCH RFC v7 0/9] qemu_thread_create: propagate errors to
> callers to check." The mentioned patch series have waited on one
> multifd issue for a while and still needs a further discussion.
>
> Thus separate(send) these five almost-done patches and hope they can
> be merged for the next tag. Thanks for the review. :)
How likely are any of these crashers to affect an end user? Are any of
them regressions over 3.0? I'm trying to gauge if any of this is
serious enough to warrant a -rc4, or if we are okay just documenting
them as known corner-case bugs and deferring the fix to 4.0 and
qemu-stable. The fact that the series is still titled RFC is also an
argument in favor of deferral.
>
> v2:
> - Update the commit message for patch 1/5, and get one more
> Reviewed-by.
> - Get one Reviewed-by for patch 3/5.
>
>
> Fei Li (5):
> Fix segmentation fault when qemu_signal_init fails
> qemu_thread_join: fix segmentation fault
> migration: fix the multifd code when receiving less channels
> migration: remove unused &local_err parameter in multifd_save_cleanup
> migration: add more error handling for postcopy_ram_enable_notify
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v2 3/5] migration: fix the multifd code when receiving less channels
2018-11-29 10:03 ` [Qemu-devel] [PATCH RFC v2 3/5] migration: fix the multifd code when receiving less channels Fei Li
@ 2018-11-29 14:46 ` Philippe Mathieu-Daudé
2018-11-30 3:45 ` Fei Li
0 siblings, 1 reply; 18+ messages in thread
From: Philippe Mathieu-Daudé @ 2018-11-29 14:46 UTC (permalink / raw)
To: Fei Li, qemu-devel
Hi Fei,
On 29/11/18 11:03, Fei Li wrote:
> In our current code, when multifd is used during migration, if there
> is an error before the destination receives all new channels, the
> source keeps running, however the destination does not exit but keeps
> waiting until the source is killed deliberately.
>
> Fix this by dumping the specific error and let users decide whether
> to quit from the destination side when failing to receive packet via
> some channel.
>
> Signed-off-by: Fei Li <fli@suse.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> ---
> migration/channel.c | 11 ++++++-----
> migration/migration.c | 9 +++++++--
> migration/migration.h | 2 +-
> migration/ram.c | 7 ++++++-
> migration/ram.h | 2 +-
> 5 files changed, 21 insertions(+), 10 deletions(-)
>
> diff --git a/migration/channel.c b/migration/channel.c
> index 33e0e9b82f..20e4c8e2dc 100644
> --- a/migration/channel.c
> +++ b/migration/channel.c
> @@ -30,6 +30,7 @@
> void migration_channel_process_incoming(QIOChannel *ioc)
> {
> MigrationState *s = migrate_get_current();
> + Error *local_err = NULL;
>
> trace_migration_set_incoming_channel(
> ioc, object_get_typename(OBJECT(ioc)));
> @@ -38,13 +39,13 @@ void migration_channel_process_incoming(QIOChannel *ioc)
> *s->parameters.tls_creds &&
> !object_dynamic_cast(OBJECT(ioc),
> TYPE_QIO_CHANNEL_TLS)) {
> - Error *local_err = NULL;
> migration_tls_channel_process_incoming(s, ioc, &local_err);
> - if (local_err) {
> - error_report_err(local_err);
> - }
> } else {
> - migration_ioc_process_incoming(ioc);
> + migration_ioc_process_incoming(ioc, &local_err);
> + }
> +
> + if (local_err) {
> + error_report_err(local_err);
> }
> }
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 49ffb9997a..72106bddf0 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -541,7 +541,7 @@ void migration_fd_process_incoming(QEMUFile *f)
> migration_incoming_process();
> }
>
> -void migration_ioc_process_incoming(QIOChannel *ioc)
> +void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
> {
> MigrationIncomingState *mis = migration_incoming_get_current();
> bool start_migration;
> @@ -563,9 +563,14 @@ void migration_ioc_process_incoming(QIOChannel *ioc)
> */
> start_migration = !migrate_use_multifd();
> } else {
> + Error *local_err = NULL;
> /* Multiple connections */
> assert(migrate_use_multifd());
> - start_migration = multifd_recv_new_channel(ioc);
> + start_migration = multifd_recv_new_channel(ioc, &local_err);
> + if (local_err) {
> + error_propagate(errp, local_err);
> + return;
> + }
> }
>
> if (start_migration) {
> diff --git a/migration/migration.h b/migration/migration.h
> index e413d4d8b6..02b7304610 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -229,7 +229,7 @@ struct MigrationState
> void migrate_set_state(int *state, int old_state, int new_state);
>
> void migration_fd_process_incoming(QEMUFile *f);
> -void migration_ioc_process_incoming(QIOChannel *ioc);
> +void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
> void migration_incoming_process(void);
>
> bool migration_has_all_channels(void);
> diff --git a/migration/ram.c b/migration/ram.c
> index 7e7deec4d8..e13b9349d0 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1323,7 +1323,7 @@ bool multifd_recv_all_channels_created(void)
> }
>
> /* Return true if multifd is ready for the migration, otherwise false */
> -bool multifd_recv_new_channel(QIOChannel *ioc)
> +bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
> {
> MultiFDRecvParams *p;
> Error *local_err = NULL;
> @@ -1331,6 +1331,10 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>
> id = multifd_recv_initial_packet(ioc, &local_err);
> if (id < 0) {
> + error_propagate_prepend(errp, local_err,
> + "failed to receive packet"
> + " via multifd channel %d: ",
> + multifd_recv_state->count);
Shouldn't we use atomic_read(&multifd_recv_state->count) here?
Patch looks good otherwise.
Regards,
Phil.
> multifd_recv_terminate_threads(local_err);
> return false;
> }
> @@ -1340,6 +1344,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
> error_setg(&local_err, "multifd: received id '%d' already setup'",
> id);
> multifd_recv_terminate_threads(local_err);
> + error_propagate(errp, local_err);
> return false;
> }
> p->c = ioc;
> diff --git a/migration/ram.h b/migration/ram.h
> index 83ff1bc11a..046d3074be 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -47,7 +47,7 @@ int multifd_save_cleanup(Error **errp);
> int multifd_load_setup(void);
> int multifd_load_cleanup(Error **errp);
> bool multifd_recv_all_channels_created(void);
> -bool multifd_recv_new_channel(QIOChannel *ioc);
> +bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
>
> uint64_t ram_pagesize_summary(void);
> int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v2 4/5] migration: remove unused &local_err parameter in multifd_save_cleanup
2018-11-29 10:03 ` [Qemu-devel] [PATCH RFC v2 4/5] migration: remove unused &local_err parameter in multifd_save_cleanup Fei Li
@ 2018-11-29 14:50 ` Philippe Mathieu-Daudé
2018-11-29 14:52 ` Philippe Mathieu-Daudé
2018-11-30 5:12 ` Fei Li
0 siblings, 2 replies; 18+ messages in thread
From: Philippe Mathieu-Daudé @ 2018-11-29 14:50 UTC (permalink / raw)
To: Fei Li, qemu-devel
On 29/11/18 11:03, Fei Li wrote:
> Always call migrate_set_error() to set the error state without relying
> on whether multifd_save_cleanup() succeeds. As the passed &local_err
> is never used in multifd_save_cleanup(), remove it. And make the
> function be: void multifd_save_cleanup(void).
Reading this after your patch 1/5, maybe multifd_save_cleanup() lacks
error reporting on failure. So the actual prototype is correct, we just
need to use **errp in that function (like reporting invalid thread
instead of calling qemu_thread_join()).
>
> Signed-off-by: Fei Li <fli@suse.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> ---
> migration/migration.c | 5 +----
> migration/ram.c | 11 ++++-------
> migration/ram.h | 2 +-
> 3 files changed, 6 insertions(+), 12 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 72106bddf0..0537fc0c26 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1386,7 +1386,6 @@ static void migrate_fd_cleanup(void *opaque)
> qemu_savevm_state_cleanup();
>
> if (s->to_dst_file) {
> - Error *local_err = NULL;
> QEMUFile *tmp;
>
> trace_migrate_fd_cleanup();
> @@ -1397,9 +1396,7 @@ static void migrate_fd_cleanup(void *opaque)
> }
> qemu_mutex_lock_iothread();
>
> - if (multifd_save_cleanup(&local_err) != 0) {
> - error_report_err(local_err);
> - }
> + multifd_save_cleanup();
> qemu_mutex_lock(&s->qemu_file_lock);
> tmp = s->to_dst_file;
> s->to_dst_file = NULL;
> diff --git a/migration/ram.c b/migration/ram.c
> index e13b9349d0..c44cb6f56d 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -917,13 +917,12 @@ static void multifd_send_terminate_threads(Error *err)
> }
> }
>
> -int multifd_save_cleanup(Error **errp)
> +void multifd_save_cleanup(void)
> {
> int i;
> - int ret = 0;
>
> if (!migrate_use_multifd()) {
> - return 0;
> + return;
> }
> multifd_send_terminate_threads(NULL);
> for (i = 0; i < migrate_multifd_channels(); i++) {
> @@ -953,7 +952,6 @@ int multifd_save_cleanup(Error **errp)
> multifd_send_state->pages = NULL;
> g_free(multifd_send_state);
> multifd_send_state = NULL;
> - return ret;
> }
>
> static void multifd_send_sync_main(void)
> @@ -1071,9 +1069,8 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> Error *local_err = NULL;
>
> if (qio_task_propagate_error(task, &local_err)) {
> - if (multifd_save_cleanup(&local_err) != 0) {
> - migrate_set_error(migrate_get_current(), local_err);
> - }
> + migrate_set_error(migrate_get_current(), local_err);
> + multifd_save_cleanup();
> } else {
> p->c = QIO_CHANNEL(sioc);
> qio_channel_set_delay(p->c, false);
> diff --git a/migration/ram.h b/migration/ram.h
> index 046d3074be..936177b3e9 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -43,7 +43,7 @@ uint64_t ram_bytes_remaining(void);
> uint64_t ram_bytes_total(void);
>
> int multifd_save_setup(void);
> -int multifd_save_cleanup(Error **errp);
> +void multifd_save_cleanup(void);
> int multifd_load_setup(void);
> int multifd_load_cleanup(Error **errp);
> bool multifd_recv_all_channels_created(void);
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v2 4/5] migration: remove unused &local_err parameter in multifd_save_cleanup
2018-11-29 14:50 ` Philippe Mathieu-Daudé
@ 2018-11-29 14:52 ` Philippe Mathieu-Daudé
2018-11-30 5:12 ` Fei Li
1 sibling, 0 replies; 18+ messages in thread
From: Philippe Mathieu-Daudé @ 2018-11-29 14:52 UTC (permalink / raw)
To: Fei Li, QEMU Developers
On Thu, Nov 29, 2018 at 3:50 PM Philippe Mathieu-Daudé
<philmd@redhat.com> wrote:
> On 29/11/18 11:03, Fei Li wrote:
> > Always call migrate_set_error() to set the error state without relying
> > on whether multifd_save_cleanup() succeeds. As the passed &local_err
> > is never used in multifd_save_cleanup(), remove it. And make the
> > function be: void multifd_save_cleanup(void).
>
> Reading this after your patch 1/5, maybe multifd_save_cleanup() lacks
I wanted to write "2/5: qemu_thread_join: fix segmentation fault"
> error reporting on failure. So the actual prototype is correct, we just
> need to use **errp in that function (like reporting invalid thread
> instead of calling qemu_thread_join()).
>
> >
> > Signed-off-by: Fei Li <fli@suse.com>
> > Reviewed-by: Juan Quintela <quintela@redhat.com>
> > ---
> > migration/migration.c | 5 +----
> > migration/ram.c | 11 ++++-------
> > migration/ram.h | 2 +-
> > 3 files changed, 6 insertions(+), 12 deletions(-)
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 72106bddf0..0537fc0c26 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1386,7 +1386,6 @@ static void migrate_fd_cleanup(void *opaque)
> > qemu_savevm_state_cleanup();
> >
> > if (s->to_dst_file) {
> > - Error *local_err = NULL;
> > QEMUFile *tmp;
> >
> > trace_migrate_fd_cleanup();
> > @@ -1397,9 +1396,7 @@ static void migrate_fd_cleanup(void *opaque)
> > }
> > qemu_mutex_lock_iothread();
> >
> > - if (multifd_save_cleanup(&local_err) != 0) {
> > - error_report_err(local_err);
> > - }
> > + multifd_save_cleanup();
> > qemu_mutex_lock(&s->qemu_file_lock);
> > tmp = s->to_dst_file;
> > s->to_dst_file = NULL;
> > diff --git a/migration/ram.c b/migration/ram.c
> > index e13b9349d0..c44cb6f56d 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -917,13 +917,12 @@ static void multifd_send_terminate_threads(Error *err)
> > }
> > }
> >
> > -int multifd_save_cleanup(Error **errp)
> > +void multifd_save_cleanup(void)
> > {
> > int i;
> > - int ret = 0;
> >
> > if (!migrate_use_multifd()) {
> > - return 0;
> > + return;
> > }
> > multifd_send_terminate_threads(NULL);
> > for (i = 0; i < migrate_multifd_channels(); i++) {
> > @@ -953,7 +952,6 @@ int multifd_save_cleanup(Error **errp)
> > multifd_send_state->pages = NULL;
> > g_free(multifd_send_state);
> > multifd_send_state = NULL;
> > - return ret;
> > }
> >
> > static void multifd_send_sync_main(void)
> > @@ -1071,9 +1069,8 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> > Error *local_err = NULL;
> >
> > if (qio_task_propagate_error(task, &local_err)) {
> > - if (multifd_save_cleanup(&local_err) != 0) {
> > - migrate_set_error(migrate_get_current(), local_err);
> > - }
> > + migrate_set_error(migrate_get_current(), local_err);
> > + multifd_save_cleanup();
> > } else {
> > p->c = QIO_CHANNEL(sioc);
> > qio_channel_set_delay(p->c, false);
> > diff --git a/migration/ram.h b/migration/ram.h
> > index 046d3074be..936177b3e9 100644
> > --- a/migration/ram.h
> > +++ b/migration/ram.h
> > @@ -43,7 +43,7 @@ uint64_t ram_bytes_remaining(void);
> > uint64_t ram_bytes_total(void);
> >
> > int multifd_save_setup(void);
> > -int multifd_save_cleanup(Error **errp);
> > +void multifd_save_cleanup(void);
> > int multifd_load_setup(void);
> > int multifd_load_cleanup(Error **errp);
> > bool multifd_recv_all_channels_created(void);
> >
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v2 1/5] Fix segmentation fault when qemu_signal_init fails
2018-11-29 12:49 ` Markus Armbruster
@ 2018-11-30 3:29 ` Fei Li
0 siblings, 0 replies; 18+ messages in thread
From: Fei Li @ 2018-11-30 3:29 UTC (permalink / raw)
To: Markus Armbruster; +Cc: qemu-devel, Fam Zheng, Paolo Bonzini
On 11/29/2018 08:49 PM, Markus Armbruster wrote:
> You neglected to cc the maintainer. I'm doing that for you now. Cc'ing
> maintainers is important to maximize your chances at getting your
> patches picked up. Use scripts/get_maintainer.pl to find them.
Got it, thanks so much for this tip! :)
Have a nice day
Fei
>
> Fei Li <fli@suse.com> writes:
>
>> When qemu_signal_init() fails in qemu_init_main_loop(), we return
>> without setting an error. Its callers crash then when they try to
>> report the error with error_report_err().
>>
>> To avoid such segmentation fault, add a new Error parameter to make
>> the call trace to propagate the err to the final caller.
>>
>> Fixes: 2f78e491d7b46542158ce0b8132ee4e05bc0ade4
>> Cc: Markus Armbruster <armbru@redhat.com>
>> Cc: Fam Zheng <famz@redhat.com>
>> Signed-off-by: Fei Li <fli@suse.com>
>> Reviewed-by: Fam Zheng <famz@redhat.com>
>> Reviewed-by: Markus Armbruster <armbru@redhat.com>
>> ---
>> util/main-loop.c | 8 ++++----
>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/util/main-loop.c b/util/main-loop.c
>> index affe0403c5..443cb4cfe8 100644
>> --- a/util/main-loop.c
>> +++ b/util/main-loop.c
>> @@ -71,7 +71,7 @@ static void sigfd_handler(void *opaque)
>> }
>> }
>>
>> -static int qemu_signal_init(void)
>> +static int qemu_signal_init(Error **errp)
>> {
>> int sigfd;
>> sigset_t set;
>> @@ -96,7 +96,7 @@ static int qemu_signal_init(void)
>> sigdelset(&set, SIG_IPI);
>> sigfd = qemu_signalfd(&set);
>> if (sigfd == -1) {
>> - fprintf(stderr, "failed to create signalfd\n");
>> + error_setg_errno(errp, errno, "failed to create signalfd");
>> return -errno;
>> }
>>
>> @@ -109,7 +109,7 @@ static int qemu_signal_init(void)
>>
>> #else /* _WIN32 */
>>
>> -static int qemu_signal_init(void)
>> +static int qemu_signal_init(Error **errp)
>> {
>> return 0;
>> }
>> @@ -148,7 +148,7 @@ int qemu_init_main_loop(Error **errp)
>>
>> init_clocks(qemu_timer_notify_cb);
>>
>> - ret = qemu_signal_init();
>> + ret = qemu_signal_init(errp);
>> if (ret) {
>> return ret;
>> }
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v2 3/5] migration: fix the multifd code when receiving less channels
2018-11-29 14:46 ` Philippe Mathieu-Daudé
@ 2018-11-30 3:45 ` Fei Li
2018-12-06 6:31 ` Fei Li
0 siblings, 1 reply; 18+ messages in thread
From: Fei Li @ 2018-11-30 3:45 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, qemu-devel
On 11/29/2018 10:46 PM, Philippe Mathieu-Daudé wrote:
> Hi Fei,
>
> On 29/11/18 11:03, Fei Li wrote:
>> In our current code, when multifd is used during migration, if there
>> is an error before the destination receives all new channels, the
>> source keeps running, however the destination does not exit but keeps
>> waiting until the source is killed deliberately.
>>
>> Fix this by dumping the specific error and let users decide whether
>> to quit from the destination side when failing to receive packet via
>> some channel.
>>
>> Signed-off-by: Fei Li <fli@suse.com>
>> Reviewed-by: Peter Xu <peterx@redhat.com>
>> ---
>> migration/channel.c | 11 ++++++-----
>> migration/migration.c | 9 +++++++--
>> migration/migration.h | 2 +-
>> migration/ram.c | 7 ++++++-
>> migration/ram.h | 2 +-
>> 5 files changed, 21 insertions(+), 10 deletions(-)
>>
>> diff --git a/migration/channel.c b/migration/channel.c
>> index 33e0e9b82f..20e4c8e2dc 100644
>> --- a/migration/channel.c
>> +++ b/migration/channel.c
>> @@ -30,6 +30,7 @@
>> void migration_channel_process_incoming(QIOChannel *ioc)
>> {
>> MigrationState *s = migrate_get_current();
>> + Error *local_err = NULL;
>>
>> trace_migration_set_incoming_channel(
>> ioc, object_get_typename(OBJECT(ioc)));
>> @@ -38,13 +39,13 @@ void migration_channel_process_incoming(QIOChannel *ioc)
>> *s->parameters.tls_creds &&
>> !object_dynamic_cast(OBJECT(ioc),
>> TYPE_QIO_CHANNEL_TLS)) {
>> - Error *local_err = NULL;
>> migration_tls_channel_process_incoming(s, ioc, &local_err);
>> - if (local_err) {
>> - error_report_err(local_err);
>> - }
>> } else {
>> - migration_ioc_process_incoming(ioc);
>> + migration_ioc_process_incoming(ioc, &local_err);
>> + }
>> +
>> + if (local_err) {
>> + error_report_err(local_err);
>> }
>> }
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 49ffb9997a..72106bddf0 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -541,7 +541,7 @@ void migration_fd_process_incoming(QEMUFile *f)
>> migration_incoming_process();
>> }
>>
>> -void migration_ioc_process_incoming(QIOChannel *ioc)
>> +void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
>> {
>> MigrationIncomingState *mis = migration_incoming_get_current();
>> bool start_migration;
>> @@ -563,9 +563,14 @@ void migration_ioc_process_incoming(QIOChannel *ioc)
>> */
>> start_migration = !migrate_use_multifd();
>> } else {
>> + Error *local_err = NULL;
>> /* Multiple connections */
>> assert(migrate_use_multifd());
>> - start_migration = multifd_recv_new_channel(ioc);
>> + start_migration = multifd_recv_new_channel(ioc, &local_err);
>> + if (local_err) {
>> + error_propagate(errp, local_err);
>> + return;
>> + }
>> }
>>
>> if (start_migration) {
>> diff --git a/migration/migration.h b/migration/migration.h
>> index e413d4d8b6..02b7304610 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -229,7 +229,7 @@ struct MigrationState
>> void migrate_set_state(int *state, int old_state, int new_state);
>>
>> void migration_fd_process_incoming(QEMUFile *f);
>> -void migration_ioc_process_incoming(QIOChannel *ioc);
>> +void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
>> void migration_incoming_process(void);
>>
>> bool migration_has_all_channels(void);
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 7e7deec4d8..e13b9349d0 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1323,7 +1323,7 @@ bool multifd_recv_all_channels_created(void)
>> }
>>
>> /* Return true if multifd is ready for the migration, otherwise false */
>> -bool multifd_recv_new_channel(QIOChannel *ioc)
>> +bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
>> {
>> MultiFDRecvParams *p;
>> Error *local_err = NULL;
>> @@ -1331,6 +1331,10 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>>
>> id = multifd_recv_initial_packet(ioc, &local_err);
>> if (id < 0) {
>> + error_propagate_prepend(errp, local_err,
>> + "failed to receive packet"
>> + " via multifd channel %d: ",
>> + multifd_recv_state->count);
> Shouldn't we use atomic_read(&multifd_recv_state->count) here?
Right, will update this in next version. Thanks for pointing it out. :)
BTW, should we do the same update for the below sentence:
` return multifd_recv_state->count == migrate_multifd_channels();`?
Have a nice day
Fei
>
> Patch looks good otherwise.
>
> Regards,
>
> Phil.
>
>> multifd_recv_terminate_threads(local_err);
>> return false;
>> }
>> @@ -1340,6 +1344,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>> error_setg(&local_err, "multifd: received id '%d' already setup'",
>> id);
>> multifd_recv_terminate_threads(local_err);
>> + error_propagate(errp, local_err);
>> return false;
>> }
>> p->c = ioc;
>> diff --git a/migration/ram.h b/migration/ram.h
>> index 83ff1bc11a..046d3074be 100644
>> --- a/migration/ram.h
>> +++ b/migration/ram.h
>> @@ -47,7 +47,7 @@ int multifd_save_cleanup(Error **errp);
>> int multifd_load_setup(void);
>> int multifd_load_cleanup(Error **errp);
>> bool multifd_recv_all_channels_created(void);
>> -bool multifd_recv_new_channel(QIOChannel *ioc);
>> +bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
>>
>> uint64_t ram_pagesize_summary(void);
>> int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);
>>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v2 4/5] migration: remove unused &local_err parameter in multifd_save_cleanup
2018-11-29 14:50 ` Philippe Mathieu-Daudé
2018-11-29 14:52 ` Philippe Mathieu-Daudé
@ 2018-11-30 5:12 ` Fei Li
1 sibling, 0 replies; 18+ messages in thread
From: Fei Li @ 2018-11-30 5:12 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, qemu-devel
On 11/29/2018 10:50 PM, Philippe Mathieu-Daudé wrote:
> On 29/11/18 11:03, Fei Li wrote:
>> Always call migrate_set_error() to set the error state without relying
>> on whether multifd_save_cleanup() succeeds. As the passed &local_err
>> is never used in multifd_save_cleanup(), remove it. And make the
>> function be: void multifd_save_cleanup(void).
> Reading this after your patch 1/5, maybe multifd_save_cleanup() lacks
> error reporting on failure. So the actual prototype is correct, we just
> need to use **errp in that function (like reporting invalid thread
> instead of calling qemu_thread_join()).
Sorry that maybe the patch sequences causethe mis-understanding.
The patch "2/5: qemu_thread_join: fix segmentation fault" is added to
fix another problem when qemu_thread_create() fails.
The qemu_thread_join() => pthread_join in multifd_save_cleanup() is
used to terminate the already created/running thread, so I do not
think it should be replaced. Besides, do we need to report the
invalid thread for users?
But for multifd_save_cleanup() itself, it is called when
1. fails to send one multifd channel. In this way, the error reason has
been stored in MigrationState->error, but not be printed. I think we
can add a `error_report_err(local_err)` in
multifd_new_send_channel_async()
just as follows:
@@ -1070,6 +1070,7 @@ static void multifd_new_send_channel_async(QIOTask
*task, gpointer opaque)
if (qio_task_propagate_error(task, &local_err)) {
migrate_set_error(migrate_get_current(), local_err);
+ error_report_err(local_err);
multifd_save_cleanup();
} else {
BTW, we also need to notify the main thread that the migrate_state
is failed in this case, but currently this issue is still under discussion.
Please see patch 5/9 in"[PATCH RFC v7 0/9] qemu_thread_create:
propagate errors to callers to check." for more details. :)
2. migrate_fd_cleanup() is called to do the cleanup (In this way, the real
error reason is stored and also will be printed inside in
migrate_fd_cleanup()
or before migrate_fd_cleanup() is called. Thus IMO no update is
needed here.)
Have a nice day, thanks
Fei
>> Signed-off-by: Fei Li <fli@suse.com>
>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>> ---
>> migration/migration.c | 5 +----
>> migration/ram.c | 11 ++++-------
>> migration/ram.h | 2 +-
>> 3 files changed, 6 insertions(+), 12 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 72106bddf0..0537fc0c26 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1386,7 +1386,6 @@ static void migrate_fd_cleanup(void *opaque)
>> qemu_savevm_state_cleanup();
>>
>> if (s->to_dst_file) {
>> - Error *local_err = NULL;
>> QEMUFile *tmp;
>>
>> trace_migrate_fd_cleanup();
>> @@ -1397,9 +1396,7 @@ static void migrate_fd_cleanup(void *opaque)
>> }
>> qemu_mutex_lock_iothread();
>>
>> - if (multifd_save_cleanup(&local_err) != 0) {
>> - error_report_err(local_err);
>> - }
>> + multifd_save_cleanup();
>> qemu_mutex_lock(&s->qemu_file_lock);
>> tmp = s->to_dst_file;
>> s->to_dst_file = NULL;
>> diff --git a/migration/ram.c b/migration/ram.c
>> index e13b9349d0..c44cb6f56d 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -917,13 +917,12 @@ static void multifd_send_terminate_threads(Error *err)
>> }
>> }
>>
>> -int multifd_save_cleanup(Error **errp)
>> +void multifd_save_cleanup(void)
>> {
>> int i;
>> - int ret = 0;
>>
>> if (!migrate_use_multifd()) {
>> - return 0;
>> + return;
>> }
>> multifd_send_terminate_threads(NULL);
>> for (i = 0; i < migrate_multifd_channels(); i++) {
>> @@ -953,7 +952,6 @@ int multifd_save_cleanup(Error **errp)
>> multifd_send_state->pages = NULL;
>> g_free(multifd_send_state);
>> multifd_send_state = NULL;
>> - return ret;
>> }
>>
>> static void multifd_send_sync_main(void)
>> @@ -1071,9 +1069,8 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>> Error *local_err = NULL;
>>
>> if (qio_task_propagate_error(task, &local_err)) {
>> - if (multifd_save_cleanup(&local_err) != 0) {
>> - migrate_set_error(migrate_get_current(), local_err);
>> - }
>> + migrate_set_error(migrate_get_current(), local_err);
>> + multifd_save_cleanup();
>> } else {
>> p->c = QIO_CHANNEL(sioc);
>> qio_channel_set_delay(p->c, false);
>> diff --git a/migration/ram.h b/migration/ram.h
>> index 046d3074be..936177b3e9 100644
>> --- a/migration/ram.h
>> +++ b/migration/ram.h
>> @@ -43,7 +43,7 @@ uint64_t ram_bytes_remaining(void);
>> uint64_t ram_bytes_total(void);
>>
>> int multifd_save_setup(void);
>> -int multifd_save_cleanup(Error **errp);
>> +void multifd_save_cleanup(void);
>> int multifd_load_setup(void);
>> int multifd_load_cleanup(Error **errp);
>> bool multifd_recv_all_channels_created(void);
>>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH for-3.1? RFC v2 0/5] fix some segmentation faults and migration issues
2018-11-29 14:20 ` [Qemu-devel] [PATCH for-3.1? RFC v2 0/5] fix some segmentation faults and migration issues Eric Blake
@ 2018-11-30 6:15 ` Fei Li
2018-11-30 15:57 ` Eric Blake
0 siblings, 1 reply; 18+ messages in thread
From: Fei Li @ 2018-11-30 6:15 UTC (permalink / raw)
To: Eric Blake, qemu-devel, Peter Maydell
On 11/29/2018 10:20 PM, Eric Blake wrote:
> On 11/29/18 4:03 AM, Fei Li wrote:
>> These five patches almost get the Reviewed-by and are extracted from
>> previous "[PATCH RFC v7 0/9] qemu_thread_create: propagate errors to
>> callers to check." The mentioned patch series have waited on one
>> multifd issue for a while and still needs a further discussion.
>>
>> Thus separate(send) these five almost-done patches and hope they can
>> be merged for the next tag. Thanks for the review. :)
>
> How likely are any of these crashers to affect an end user?
IMHO, they are not easily triggered.
> Are any of them regressions over 3.0?
I do not think so.
> I'm trying to gauge if any of this is serious enough to warrant a
> -rc4, or if we are okay just documenting them as known corner-case
> bugs and deferring the fix to 4.0 and qemu-stable.
Emm, actually not so emergency to be included in -rc4.
And I think it is ok to wait for maintainers to do the pick for the
appropriate release.
BTW, why 4.0, but not 3.2 or 3.3 (I mean 3.minor version)?
> The fact that the series is still titled RFC is also an argument in
> favor of deferral.
Sorry that I forgot to remove the RFC..
Have a nice day, thanks :)
Fei
>
>>
>> v2:
>> - Update the commit message for patch 1/5, and get one more
>> Reviewed-by.
>> - Get one Reviewed-by for patch 3/5.
>>
>>
>> Fei Li (5):
>> Fix segmentation fault when qemu_signal_init fails
>> qemu_thread_join: fix segmentation fault
>> migration: fix the multifd code when receiving less channels
>> migration: remove unused &local_err parameter in multifd_save_cleanup
>> migration: add more error handling for postcopy_ram_enable_notify
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v2 5/5] migration: add more error handling for postcopy_ram_enable_notify
2018-11-29 10:03 ` [Qemu-devel] [PATCH RFC v2 5/5] migration: add more error handling for postcopy_ram_enable_notify Fei Li
@ 2018-11-30 9:49 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 18+ messages in thread
From: Dr. David Alan Gilbert @ 2018-11-30 9:49 UTC (permalink / raw)
To: Fei Li; +Cc: qemu-devel
* Fei Li (fli@suse.com) wrote:
> Call postcopy_ram_incoming_cleanup() to do the cleanup when
> postcopy_ram_enable_notify fails. Besides, report the error
> message when qemu_ram_foreach_migratable_block() fails.
>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
> migration/postcopy-ram.c | 1 +
> migration/savevm.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index e5c02a32c5..fa09dba534 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -1117,6 +1117,7 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
>
> /* Mark so that we get notified of accesses to unwritten areas */
> if (qemu_ram_foreach_migratable_block(ram_block_enable_notify, mis)) {
> + error_report("ram_block_enable_notify failed");
> return -1;
> }
>
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 9e45fb4f3f..d784e8aa40 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1729,6 +1729,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
> */
> if (migrate_postcopy_ram()) {
> if (postcopy_ram_enable_notify(mis)) {
> + postcopy_ram_incoming_cleanup(mis);
> return -1;
> }
> }
> --
> 2.13.7
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH for-3.1? RFC v2 0/5] fix some segmentation faults and migration issues
2018-11-30 6:15 ` Fei Li
@ 2018-11-30 15:57 ` Eric Blake
0 siblings, 0 replies; 18+ messages in thread
From: Eric Blake @ 2018-11-30 15:57 UTC (permalink / raw)
To: Fei Li, qemu-devel, Peter Maydell
On 11/30/18 12:15 AM, Fei Li wrote:
>
>
> On 11/29/2018 10:20 PM, Eric Blake wrote:
>> On 11/29/18 4:03 AM, Fei Li wrote:
>>> These five patches almost get the Reviewed-by and are extracted from
>>> previous "[PATCH RFC v7 0/9] qemu_thread_create: propagate errors to
>>> callers to check." The mentioned patch series have waited on one
>>> multifd issue for a while and still needs a further discussion.
>>>
>>> Thus separate(send) these five almost-done patches and hope they can
>>> be merged for the next tag. Thanks for the review. :)
>>
>> How likely are any of these crashers to affect an end user?
> IMHO, they are not easily triggered.
A crash at the command line is annoying, but not too bad (because you
didn't start the guest after all). A crash after the guest has been
running for some time, though, is worth considering (data loss should be
avoided). I won't make the final determination on this series, but hope
that my questions are helpful to the maintainers for this code in
ranking the severity of these crashes.
>> Are any of them regressions over 3.0?
> I do not think so.
>> I'm trying to gauge if any of this is serious enough to warrant a
>> -rc4, or if we are okay just documenting them as known corner-case
>> bugs and deferring the fix to 4.0 and qemu-stable.
> Emm, actually not so emergency to be included in -rc4.
> And I think it is ok to wait for maintainers to do the pick for the
> appropriate release.
> BTW, why 4.0, but not 3.2 or 3.3 (I mean 3.minor version)?
See https://www.qemu.org/download/ for the explanation on version numbering
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v2 3/5] migration: fix the multifd code when receiving less channels
2018-11-30 3:45 ` Fei Li
@ 2018-12-06 6:31 ` Fei Li
0 siblings, 0 replies; 18+ messages in thread
From: Fei Li @ 2018-12-06 6:31 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, qemu-devel
On 11/30/2018 11:45 AM, Fei Li wrote:
>
>
> On 11/29/2018 10:46 PM, Philippe Mathieu-Daudé wrote:
>> Hi Fei,
>>
>> On 29/11/18 11:03, Fei Li wrote:
>>> In our current code, when multifd is used during migration, if there
>>> is an error before the destination receives all new channels, the
>>> source keeps running, however the destination does not exit but keeps
>>> waiting until the source is killed deliberately.
>>>
>>> Fix this by dumping the specific error and let users decide whether
>>> to quit from the destination side when failing to receive packet via
>>> some channel.
>>>
>>> Signed-off-by: Fei Li <fli@suse.com>
>>> Reviewed-by: Peter Xu <peterx@redhat.com>
>>> ---
>>> migration/channel.c | 11 ++++++-----
>>> migration/migration.c | 9 +++++++--
>>> migration/migration.h | 2 +-
>>> migration/ram.c | 7 ++++++-
>>> migration/ram.h | 2 +-
>>> 5 files changed, 21 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/migration/channel.c b/migration/channel.c
>>> index 33e0e9b82f..20e4c8e2dc 100644
>>> --- a/migration/channel.c
>>> +++ b/migration/channel.c
>>> @@ -30,6 +30,7 @@
>>> void migration_channel_process_incoming(QIOChannel *ioc)
>>> {
>>> MigrationState *s = migrate_get_current();
>>> + Error *local_err = NULL;
>>> trace_migration_set_incoming_channel(
>>> ioc, object_get_typename(OBJECT(ioc)));
>>> @@ -38,13 +39,13 @@ void
>>> migration_channel_process_incoming(QIOChannel *ioc)
>>> *s->parameters.tls_creds &&
>>> !object_dynamic_cast(OBJECT(ioc),
>>> TYPE_QIO_CHANNEL_TLS)) {
>>> - Error *local_err = NULL;
>>> migration_tls_channel_process_incoming(s, ioc, &local_err);
>>> - if (local_err) {
>>> - error_report_err(local_err);
>>> - }
>>> } else {
>>> - migration_ioc_process_incoming(ioc);
>>> + migration_ioc_process_incoming(ioc, &local_err);
>>> + }
>>> +
>>> + if (local_err) {
>>> + error_report_err(local_err);
>>> }
>>> }
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 49ffb9997a..72106bddf0 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -541,7 +541,7 @@ void migration_fd_process_incoming(QEMUFile *f)
>>> migration_incoming_process();
>>> }
>>> -void migration_ioc_process_incoming(QIOChannel *ioc)
>>> +void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
>>> {
>>> MigrationIncomingState *mis = migration_incoming_get_current();
>>> bool start_migration;
>>> @@ -563,9 +563,14 @@ void migration_ioc_process_incoming(QIOChannel
>>> *ioc)
>>> */
>>> start_migration = !migrate_use_multifd();
>>> } else {
>>> + Error *local_err = NULL;
>>> /* Multiple connections */
>>> assert(migrate_use_multifd());
>>> - start_migration = multifd_recv_new_channel(ioc);
>>> + start_migration = multifd_recv_new_channel(ioc, &local_err);
>>> + if (local_err) {
>>> + error_propagate(errp, local_err);
>>> + return;
>>> + }
>>> }
>>> if (start_migration) {
>>> diff --git a/migration/migration.h b/migration/migration.h
>>> index e413d4d8b6..02b7304610 100644
>>> --- a/migration/migration.h
>>> +++ b/migration/migration.h
>>> @@ -229,7 +229,7 @@ struct MigrationState
>>> void migrate_set_state(int *state, int old_state, int new_state);
>>> void migration_fd_process_incoming(QEMUFile *f);
>>> -void migration_ioc_process_incoming(QIOChannel *ioc);
>>> +void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
>>> void migration_incoming_process(void);
>>> bool migration_has_all_channels(void);
>>> diff --git a/migration/ram.c b/migration/ram.c
>>> index 7e7deec4d8..e13b9349d0 100644
>>> --- a/migration/ram.c
>>> +++ b/migration/ram.c
>>> @@ -1323,7 +1323,7 @@ bool multifd_recv_all_channels_created(void)
>>> }
>>> /* Return true if multifd is ready for the migration, otherwise
>>> false */
>>> -bool multifd_recv_new_channel(QIOChannel *ioc)
>>> +bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
>>> {
>>> MultiFDRecvParams *p;
>>> Error *local_err = NULL;
>>> @@ -1331,6 +1331,10 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>>> id = multifd_recv_initial_packet(ioc, &local_err);
>>> if (id < 0) {
>>> + error_propagate_prepend(errp, local_err,
>>> + "failed to receive packet"
>>> + " via multifd channel %d: ",
>>> + multifd_recv_state->count);
>> Shouldn't we use atomic_read(&multifd_recv_state->count) here?
> Right, will update this in next version. Thanks for pointing it out. :)
> BTW, should we do the same update for the below sentence:
> ` return multifd_recv_state->count == migrate_multifd_channels();`?
>
> Have a nice day
> Fei
Kindly ping. :)
Thanks in advance.
>>
>> Patch looks good otherwise.
>>
>> Regards,
>>
>> Phil.
>>
>>> multifd_recv_terminate_threads(local_err);
>>> return false;
>>> }
>>> @@ -1340,6 +1344,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>>> error_setg(&local_err, "multifd: received id '%d' already
>>> setup'",
>>> id);
>>> multifd_recv_terminate_threads(local_err);
>>> + error_propagate(errp, local_err);
>>> return false;
>>> }
>>> p->c = ioc;
>>> diff --git a/migration/ram.h b/migration/ram.h
>>> index 83ff1bc11a..046d3074be 100644
>>> --- a/migration/ram.h
>>> +++ b/migration/ram.h
>>> @@ -47,7 +47,7 @@ int multifd_save_cleanup(Error **errp);
>>> int multifd_load_setup(void);
>>> int multifd_load_cleanup(Error **errp);
>>> bool multifd_recv_all_channels_created(void);
>>> -bool multifd_recv_new_channel(QIOChannel *ioc);
>>> +bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
>>> uint64_t ram_pagesize_summary(void);
>>> int ram_save_queue_pages(const char *rbname, ram_addr_t start,
>>> ram_addr_t len);
>>>
>>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2018-12-06 6:31 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-11-29 10:03 [Qemu-devel] [PATCH RFC v2 0/5] fix some segmentation faults and migration issues Fei Li
2018-11-29 10:03 ` [Qemu-devel] [PATCH RFC v2 1/5] Fix segmentation fault when qemu_signal_init fails Fei Li
2018-11-29 12:49 ` Markus Armbruster
2018-11-30 3:29 ` Fei Li
2018-11-29 10:03 ` [Qemu-devel] [PATCH RFC v2 2/5] qemu_thread_join: fix segmentation fault Fei Li
2018-11-29 10:03 ` [Qemu-devel] [PATCH RFC v2 3/5] migration: fix the multifd code when receiving less channels Fei Li
2018-11-29 14:46 ` Philippe Mathieu-Daudé
2018-11-30 3:45 ` Fei Li
2018-12-06 6:31 ` Fei Li
2018-11-29 10:03 ` [Qemu-devel] [PATCH RFC v2 4/5] migration: remove unused &local_err parameter in multifd_save_cleanup Fei Li
2018-11-29 14:50 ` Philippe Mathieu-Daudé
2018-11-29 14:52 ` Philippe Mathieu-Daudé
2018-11-30 5:12 ` Fei Li
2018-11-29 10:03 ` [Qemu-devel] [PATCH RFC v2 5/5] migration: add more error handling for postcopy_ram_enable_notify Fei Li
2018-11-30 9:49 ` Dr. David Alan Gilbert
2018-11-29 14:20 ` [Qemu-devel] [PATCH for-3.1? RFC v2 0/5] fix some segmentation faults and migration issues Eric Blake
2018-11-30 6:15 ` Fei Li
2018-11-30 15:57 ` Eric Blake
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).