From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Markus Armbruster <armbru@redhat.com>, quintela@redhat.com
Cc: Fei Li <lifei1214@126.com>,
qemu-devel@nongnu.org, shirley17fei@gmail.com
Subject: Re: [Qemu-devel] [PATCH v11 for-4.0 09/11] qemu_thread: supplement error handling for migration
Date: Mon, 4 Feb 2019 16:34:52 +0000 [thread overview]
Message-ID: <20190204163451.GE2680@work-vm> (raw)
In-Reply-To: <878syz4j5y.fsf@dusky.pond.sub.org>
* Markus Armbruster (armbru@redhat.com) wrote:
> Dave, I tried to review the error paths, in particular resource cleanup,
> but there's a lot going on, and I'm not feeling confident. Please have
> a close look.
>
> Fei Li <lifei1214@126.com> writes:
>
> > From: Fei Li <fli@suse.com>
> >
> > Update qemu_thread_create()'s callers by
> > - setting an error on qemu_thread_create() failure for callers that
> > set an error on failure;
> > - reporting the error and returning failure for callers that return
> > an error code on failure;
> > - reporting the error and setting some state for callers that just
> > report errors and choose not to continue on.
> >
> > Besides, make compress_threads_save_cleanup() cope with partially
> > initialized comp_param[i] to adapt to the new qemu_thread_create()
> > failure case.
> >
> > Cc: Markus Armbruster <armbru@redhat.com>
> > Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Signed-off-by: Fei Li <fli@suse.com>
> > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> > migration/migration.c | 35 +++++++++++++-------
> > migration/postcopy-ram.c | 16 ++++++---
> > migration/ram.c | 70 ++++++++++++++++++++++++++--------------
> > migration/savevm.c | 12 ++++---
> > 4 files changed, 89 insertions(+), 44 deletions(-)
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 1da71211c8..0034ca1334 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -447,10 +447,13 @@ static void process_incoming_migration_co(void *opaque)
> > goto fail;
> > }
> >
> > - /* TODO: let the further caller handle the error instead of abort() */
> > - qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
> > - colo_process_incoming_thread, mis,
> > - QEMU_THREAD_JOINABLE, &error_abort);
> > + if (qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
> > + colo_process_incoming_thread, mis,
> > + QEMU_THREAD_JOINABLE, &local_err) < 0) {
> > + error_reportf_err(local_err, "failed to create "
> > + "colo_process_incoming_thread: ");
> > + goto fail;
> > + }
> > mis->have_colo_incoming_thread = true;
> > qemu_coroutine_yield();
> >
> > @@ -2349,6 +2352,7 @@ out:
> > static int open_return_path_on_source(MigrationState *ms,
> > bool create_thread)
> > {
> > + Error *local_err = NULL;
> >
> > ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
> > if (!ms->rp_state.from_dst_file) {
> > @@ -2362,10 +2366,15 @@ static int open_return_path_on_source(MigrationState *ms,
> > return 0;
> > }
> >
> > - /* TODO: let the further caller handle the error instead of abort() here */
> > - qemu_thread_create(&ms->rp_state.rp_thread, "return path",
> > - source_return_path_thread, ms,
> > - QEMU_THREAD_JOINABLE, &error_abort);
> > + if (qemu_thread_create(&ms->rp_state.rp_thread, "return path",
> > + source_return_path_thread, ms,
> > + QEMU_THREAD_JOINABLE, &local_err) < 0) {
> > + error_reportf_err(local_err,
> > + "failed to create source_return_path_thread: ");
> > + qemu_fclose(ms->rp_state.from_dst_file);
> > + ms->rp_state.from_dst_file = NULL;
> > + return -1;
> > + }
> >
> > trace_open_return_path_on_source_continue();
> >
> > @@ -3201,9 +3210,13 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
> if (multifd_save_setup() != 0) {
> migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> MIGRATION_STATUS_FAILED);
> > migrate_fd_cleanup(s);
> > return;
> > }
> > - /* TODO: let the further caller handle the error instead of abort() here */
> > - qemu_thread_create(&s->thread, "live_migration", migration_thread, s,
> > - QEMU_THREAD_JOINABLE, &error_abort);
> > + if (qemu_thread_create(&s->thread, "live_migration", migration_thread, s,
> > + QEMU_THREAD_JOINABLE, &error_in) < 0) {
> > + error_reportf_err(error_in, "failed to create migration_thread: ");
> > + migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
> > + migrate_fd_cleanup(s);
>
> Is there anything to clean up for multifd_save_setup()? Dave?
I need to bounce that one to Juan; he knows the multifd stuff; cc'd
> > + return;
> > + }
> > s->migration_thread_running = true;
> > }
> >
> > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > index 221ea24919..0934a1403a 100644
> > --- a/migration/postcopy-ram.c
> > +++ b/migration/postcopy-ram.c
> > @@ -1083,6 +1083,8 @@ retry:
> >
> > int postcopy_ram_enable_notify(MigrationIncomingState *mis)
> > {
> > + Error *local_err = NULL;
> > +
> > /* Open the fd for the kernel to give us userfaults */
> > mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
> > if (mis->userfault_fd == -1) {
> > @@ -1109,10 +1111,16 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
> > }
> >
> > qemu_sem_init(&mis->fault_thread_sem, 0);
> > - /* TODO: let the further caller handle the error instead of abort() here */
> > - qemu_thread_create(&mis->fault_thread, "postcopy/fault",
> > - postcopy_ram_fault_thread, mis,
> > - QEMU_THREAD_JOINABLE, &error_abort);
> > + if (qemu_thread_create(&mis->fault_thread, "postcopy/fault",
> > + postcopy_ram_fault_thread, mis,
> > + QEMU_THREAD_JOINABLE, &local_err) < 0) {
> > + error_reportf_err(local_err,
> > + "failed to create postcopy_ram_fault_thread: ");
> > + close(mis->userfault_event_fd);
> > + close(mis->userfault_fd);
> > + qemu_sem_destroy(&mis->fault_thread_sem);
> > + return -1;
> > + }
> > qemu_sem_wait(&mis->fault_thread_sem);
> > qemu_sem_destroy(&mis->fault_thread_sem);
> > mis->have_fault_thread = true;
>
> /* Mark so that we get notified of accesses to unwritten areas */
> if (qemu_ram_foreach_migratable_block(ram_block_enable_notify, mis)) {
> error_report("ram_block_enable_notify failed");
> return -1;
>
> Not this patch's problem, but here goes anyway: where are
> mis->userfault_event_fd and mis->userfault_fd closed?
As Fei replied, see postcopy_ram_incoming_cleanup; once the fault thread
is running it does a bigger cleanup.
Dave
> }
>
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 700ea229e0..66b8b764f1 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -441,6 +441,14 @@ static void compress_threads_save_cleanup(void)
> >
> > thread_count = migrate_compress_threads();
> > for (i = 0; i < thread_count; i++) {
> > + qemu_mutex_lock(&comp_param[i].mutex);
> > + comp_param[i].quit = true;
> > + qemu_cond_signal(&comp_param[i].cond);
> > + qemu_mutex_unlock(&comp_param[i].mutex);
> > +
> > + qemu_mutex_destroy(&comp_param[i].mutex);
> > + qemu_cond_destroy(&comp_param[i].cond);
> > +
> > /*
> > * we use it as a indicator which shows if the thread is
> > * properly init'd or not
> > @@ -448,15 +456,7 @@ static void compress_threads_save_cleanup(void)
> > if (!comp_param[i].file) {
> > break;
> > }
> > -
> > - qemu_mutex_lock(&comp_param[i].mutex);
> > - comp_param[i].quit = true;
> > - qemu_cond_signal(&comp_param[i].cond);
> > - qemu_mutex_unlock(&comp_param[i].mutex);
> > -
> > qemu_thread_join(compress_threads + i);
> > - qemu_mutex_destroy(&comp_param[i].mutex);
> > - qemu_cond_destroy(&comp_param[i].cond);
> > deflateEnd(&comp_param[i].stream);
> > g_free(comp_param[i].originbuf);
> > qemu_fclose(comp_param[i].file);
> > @@ -473,6 +473,7 @@ static void compress_threads_save_cleanup(void)
> > static int compress_threads_save_setup(void)
> > {
> > int i, thread_count;
> > + Error *local_err = NULL;
> >
> > if (!migrate_use_compression()) {
> > return 0;
> > @@ -483,6 +484,9 @@ static int compress_threads_save_setup(void)
> > qemu_cond_init(&comp_done_cond);
> > qemu_mutex_init(&comp_done_lock);
> > for (i = 0; i < thread_count; i++) {
> > + qemu_mutex_init(&comp_param[i].mutex);
> > + qemu_cond_init(&comp_param[i].cond);
> > + comp_param[i].quit = false;
> > comp_param[i].originbuf = g_try_malloc(TARGET_PAGE_SIZE);
> > if (!comp_param[i].originbuf) {
> > goto exit;
> > @@ -499,13 +503,16 @@ static int compress_threads_save_setup(void)
> > */
> > comp_param[i].file = qemu_fopen_ops(NULL, &empty_ops);
> > comp_param[i].done = true;
> > - comp_param[i].quit = false;
> > - qemu_mutex_init(&comp_param[i].mutex);
> > - qemu_cond_init(&comp_param[i].cond);
> > - /* TODO: let the further caller handle the error instead of abort() */
> > - qemu_thread_create(compress_threads + i, "compress",
> > - do_data_compress, comp_param + i,
> > - QEMU_THREAD_JOINABLE, &error_abort);
> > + if (qemu_thread_create(compress_threads + i, "compress",
> > + do_data_compress, comp_param + i,
> > + QEMU_THREAD_JOINABLE, &local_err) < 0) {
> > + error_reportf_err(local_err, "failed to create do_data_compress: ");
> > + deflateEnd(&comp_param[i].stream);
> > + g_free(comp_param[i].originbuf);
> > + qemu_fclose(comp_param[i].file);
> > + comp_param[i].file = NULL;
> > + goto exit;
> > + }
> > }
> > return 0;
> >
> > @@ -1076,9 +1083,14 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> > p->c = QIO_CHANNEL(sioc);
> > qio_channel_set_delay(p->c, false);
> > p->running = true;
> > - /* TODO: let the further caller handle the error instead of abort() */
> > - qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
> > - QEMU_THREAD_JOINABLE, &error_abort);
> > + if (qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
> > + QEMU_THREAD_JOINABLE, &local_err) < 0) {
> > + migrate_set_error(migrate_get_current(), local_err);
> > + error_reportf_err(local_err,
> > + "failed to create multifd_send_thread: ");
> > + multifd_save_cleanup();
> > + return;
> > + }
> >
> > atomic_inc(&multifd_send_state->count);
> > }
> > @@ -1357,9 +1369,13 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
> > p->num_packets = 1;
> >
> > p->running = true;
> > - /* TODO: let the further caller handle the error instead of abort() here */
> > - qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p,
> > - QEMU_THREAD_JOINABLE, &error_abort);
> > + if (qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p,
> > + QEMU_THREAD_JOINABLE, &local_err) < 0) {
> > + error_propagate_prepend(errp, local_err,
> > + "failed to create multifd_recv_thread: ");
>
> This prepends only if @errp isn't null. Fine from the caller's point of
> view. But it also affects multifd_recv_thread():
>
> > + multifd_recv_terminate_threads(local_err);
>
> That's at least unclean.
>
> I think you should use error_prepend() and error_propagate() here.
>
> > + return false;
> > + }
> > atomic_inc(&multifd_recv_state->count);
> > return atomic_read(&multifd_recv_state->count) ==
> > migrate_multifd_channels();
> > @@ -3631,6 +3647,7 @@ static void compress_threads_load_cleanup(void)
> > static int compress_threads_load_setup(QEMUFile *f)
> > {
> > int i, thread_count;
> > + Error *local_err = NULL;
> >
> > if (!migrate_use_compression()) {
> > return 0;
> > @@ -3652,10 +3669,13 @@ static int compress_threads_load_setup(QEMUFile *f)
> > qemu_cond_init(&decomp_param[i].cond);
> > decomp_param[i].done = true;
> > decomp_param[i].quit = false;
> > - /* TODO: let the further caller handle the error instead of abort() */
> > - qemu_thread_create(decompress_threads + i, "decompress",
> > - do_data_decompress, decomp_param + i,
> > - QEMU_THREAD_JOINABLE, &error_abort);
> > + if (qemu_thread_create(decompress_threads + i, "decompress",
> > + do_data_decompress, decomp_param + i,
> > + QEMU_THREAD_JOINABLE, &local_err) < 0) {
> > + error_reportf_err(local_err,
> > + "failed to create do_data_decompress: ");
> > + goto exit;
> > + }
> > }
> > return 0;
> > exit:
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index d5b45843b6..310cecbf8f 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -1747,10 +1747,14 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
> > mis->have_listen_thread = true;
> > /* Start up the listening thread and wait for it to signal ready */
> > qemu_sem_init(&mis->listen_thread_sem, 0);
> > - /* TODO: let the further caller handle the error instead of abort() here */
> > - qemu_thread_create(&mis->listen_thread, "postcopy/listen",
> > - postcopy_ram_listen_thread, NULL,
> > - QEMU_THREAD_DETACHED, &error_abort);
> > + if (qemu_thread_create(&mis->listen_thread, "postcopy/listen",
> > + postcopy_ram_listen_thread, NULL,
> > + QEMU_THREAD_DETACHED, &local_err) < 0) {
> > + error_reportf_err(local_err,
> > + "failed to create postcopy_ram_listen_thread: ");
> > + qemu_sem_destroy(&mis->listen_thread_sem);
> > + return -1;
> > + }
> > qemu_sem_wait(&mis->listen_thread_sem);
> > qemu_sem_destroy(&mis->listen_thread_sem);
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
next prev parent reply other threads:[~2019-02-04 16:35 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-02-01 5:17 [Qemu-devel] [PATCH v11 for-4.0 00/11] qemu_thread_create: propagate the error to callers to handle Fei Li
2019-02-01 5:17 ` [Qemu-devel] [PATCH v11 for-4.0 01/11] qemu_thread: make qemu_thread_create() take Error ** argument Fei Li
2019-02-01 10:02 ` Markus Armbruster
2019-02-01 5:17 ` [Qemu-devel] [PATCH v11 for-4.0 02/11] qemu_thread: supplement error handling for qemu_X_start_vcpu Fei Li
2019-02-01 12:33 ` Markus Armbruster
2019-02-02 4:47 ` fei
2019-02-01 5:17 ` [Qemu-devel] [PATCH v11 for-4.0 03/11] qemu_thread: supplement error handling for qmp_dump_guest_memory Fei Li
2019-02-01 12:34 ` Markus Armbruster
2019-02-01 5:17 ` [Qemu-devel] [PATCH v11 for-4.0 04/11] qemu_thread: supplement error handling for pci_edu_realize Fei Li
2019-02-01 12:58 ` Markus Armbruster
2019-02-02 4:44 ` fei
2019-02-01 5:18 ` [Qemu-devel] [PATCH v11 for-4.0 05/11] qemu_thread: supplement error handling for h_resize_hpt_prepare Fei Li
2019-02-01 13:01 ` Markus Armbruster
2019-02-01 5:18 ` [Qemu-devel] [PATCH v11 for-4.0 06/11] qemu_thread: supplement error handling for emulated_realize Fei Li
2019-02-01 13:04 ` Markus Armbruster
2019-02-03 7:17 ` Fei Li
2019-02-04 13:30 ` Markus Armbruster
2019-02-15 12:35 ` Fei Li
2019-02-01 5:18 ` [Qemu-devel] [PATCH v11 for-4.0 07/11] qemu_thread: supplement error handling for iothread_complete Fei Li
2019-02-01 14:03 ` Markus Armbruster
2019-02-02 4:51 ` fei
2019-02-01 5:18 ` [Qemu-devel] [PATCH v11 for-4.0 08/11] qemu_thread: supplement error handling for qemu_signalfd_compat Fei Li
2019-02-01 14:13 ` Markus Armbruster
2019-02-02 4:52 ` fei
2019-02-01 5:18 ` [Qemu-devel] [PATCH v11 for-4.0 09/11] qemu_thread: supplement error handling for migration Fei Li
2019-02-01 5:35 ` Fei Li
2019-02-01 15:34 ` Markus Armbruster
2019-02-03 7:55 ` Fei Li
2019-02-04 16:34 ` Dr. David Alan Gilbert [this message]
2019-02-15 13:23 ` Fei Li
2019-02-01 5:18 ` [Qemu-devel] [PATCH v11 for-4.0 10/11] qemu_thread: supplement error handling for vnc_start_worker_thread Fei Li
2019-02-01 14:16 ` Markus Armbruster
2019-07-12 5:57 ` [Qemu-devel] [PATCH v11 for-4.0 00/11] qemu_thread_create: propagate the error to callers to handle Markus Armbruster
2019-07-16 12:25 ` fei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190204163451.GE2680@work-vm \
--to=dgilbert@redhat.com \
--cc=armbru@redhat.com \
--cc=lifei1214@126.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
--cc=shirley17fei@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.