From: Fabiano Rosas <farosas@suse.de>
To: peterx@redhat.com, qemu-devel@nongnu.org
Cc: Bryan Zhang <bryan.zhang@bytedance.com>,
Prasad Pandit <ppandit@redhat.com>,
peterx@redhat.com, Yuan Liu <yuan1.liu@intel.com>,
Avihai Horon <avihaih@nvidia.com>,
Hao Xiang <hao.xiang@bytedance.com>
Subject: Re: [PATCH 03/14] migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths
Date: Wed, 31 Jan 2024 12:05:08 -0300 [thread overview]
Message-ID: <87zfwlk0gr.fsf@suse.de> (raw)
In-Reply-To: <20240131103111.306523-4-peterx@redhat.com>
peterx@redhat.com writes:
> From: Peter Xu <peterx@redhat.com>
>
> Multifd send side has two fields to indicate error quits:
>
> - MultiFDSendParams.quit
> - &multifd_send_state->exiting
>
> Merge them into the global one. The replacement is done by changing all
> p->quit checks into the global var check. The global check doesn't need
> any lock.
>
> A few more things done on top of this altogether:
>
> - multifd_send_terminate_threads()
>
> Moving the xchg() of &multifd_send_state->exiting upper, so as to cover
> the tracepoint, migrate_set_error() and migrate_set_state().
Good.
>
> - multifd_send_sync_main()
>
> In the 2nd loop, add one more check over the global var to make sure we
> don't keep the looping if QEMU already decided to quit.
Yes, also because we don't necessarily enter at multifd_send_page()
every time.
>
> - multifd_tls_outgoing_handshake()
>
> Use multifd_send_terminate_threads() to set the error state. That has
> a benefit of updating MigrationState.error to that error too, so we can
> persist that 1st error we hit in that specific channel.
Makes sense.
>
> - multifd_new_send_channel_async()
>
> Take similar approach like above, drop the migrate_set_error() because
> multifd_send_terminate_threads() already covers that. Unwrap the helper
> multifd_new_send_channel_cleanup() along the way; not really needed.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> migration/multifd.h | 2 --
> migration/multifd.c | 85 ++++++++++++++++++---------------------------
> 2 files changed, 33 insertions(+), 54 deletions(-)
>
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 35d11f103c..7c040cb85a 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -95,8 +95,6 @@ typedef struct {
> QemuMutex mutex;
> /* is this channel thread running */
> bool running;
> - /* should this thread finish */
> - bool quit;
> /* multifd flags for each packet */
> uint32_t flags;
> /* global number of generated multifd packets */
> diff --git a/migration/multifd.c b/migration/multifd.c
> index b8d2c96533..2c98023d67 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -372,6 +372,11 @@ struct {
> MultiFDMethods *ops;
> } *multifd_send_state;
>
> +static bool multifd_send_should_exit(void)
> +{
> + return qatomic_read(&multifd_send_state->exiting);
> +}
> +
> /*
> * The migration thread can wait on either of the two semaphores. This
> * function can be used to kick the main thread out of waiting on either of
> @@ -409,7 +414,7 @@ static int multifd_send_pages(void)
> MultiFDSendParams *p = NULL; /* make happy gcc */
> MultiFDPages_t *pages = multifd_send_state->pages;
>
> - if (qatomic_read(&multifd_send_state->exiting)) {
> + if (multifd_send_should_exit()) {
> return -1;
> }
>
> @@ -421,14 +426,11 @@ static int multifd_send_pages(void)
> */
> next_channel %= migrate_multifd_channels();
> for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
> - p = &multifd_send_state->params[i];
> -
> - qemu_mutex_lock(&p->mutex);
> - if (p->quit) {
> - error_report("%s: channel %d has already quit!", __func__, i);
> - qemu_mutex_unlock(&p->mutex);
> + if (multifd_send_should_exit()) {
> return -1;
> }
> + p = &multifd_send_state->params[i];
> + qemu_mutex_lock(&p->mutex);
> if (!p->pending_job) {
> p->pending_job++;
> next_channel = (i + 1) % migrate_multifd_channels();
Hm, I'm not sure it's correct to check 'exiting' outside of the
lock. While it is an atomic operation, it is not atomic in relation to
pending_job...
... looking closer, it seems that we can do what you suggest because
p->pending_job is not touched by the multifd_send_thread in case of
error, which means this function will indeed miss the 'exiting' flag,
but pending_job > 0 means it will loop to the next channel and _then_ it
will see the 'exiting' flag.
> @@ -483,6 +485,16 @@ static void multifd_send_terminate_threads(Error *err)
> {
> int i;
>
> + /*
> + * We don't want to exit each threads twice. Depending on where
> + * we get the error, or if there are two independent errors in two
> + * threads at the same time, we can end calling this function
> + * twice.
> + */
> + if (qatomic_xchg(&multifd_send_state->exiting, 1)) {
> + return;
> + }
> +
> trace_multifd_send_terminate_threads(err != NULL);
>
> if (err) {
> @@ -497,26 +509,13 @@ static void multifd_send_terminate_threads(Error *err)
> }
> }
>
> - /*
> - * We don't want to exit each threads twice. Depending on where
> - * we get the error, or if there are two independent errors in two
> - * threads at the same time, we can end calling this function
> - * twice.
> - */
> - if (qatomic_xchg(&multifd_send_state->exiting, 1)) {
> - return;
> - }
> -
> for (i = 0; i < migrate_multifd_channels(); i++) {
> MultiFDSendParams *p = &multifd_send_state->params[i];
>
> - qemu_mutex_lock(&p->mutex);
> - p->quit = true;
Now that you removed this, we decoupled kicking the threads from setting
the exit/error, so this function could be split in two.
We could set the exiting flag at the places the error occurred (multifd
threads, thread creation, etc) and "terminate the threads" at
multifd_save_cleanup(). That second part we already do actually:
void multifd_save_cleanup(void) {
...
multifd_send_terminate_threads(NULL);
^see?
for (i = 0; i < migrate_multifd_channels(); i++) {
MultiFDSendParams *p = &multifd_send_state->params[i];
if (p->running) {
qemu_thread_join(&p->thread);
}
}
...
}
I think there's no reason anymore for the channels to kick each
other. They would all be waiting at p->sem and multifd_send_cleanup()
would kick + join them.
> qemu_sem_post(&p->sem);
> if (p->c) {
> qio_channel_shutdown(p->c, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> }
> - qemu_mutex_unlock(&p->mutex);
> }
> }
>
> @@ -615,16 +614,13 @@ int multifd_send_sync_main(void)
> for (i = 0; i < migrate_multifd_channels(); i++) {
> MultiFDSendParams *p = &multifd_send_state->params[i];
>
> - trace_multifd_send_sync_main_signal(p->id);
> -
> - qemu_mutex_lock(&p->mutex);
> -
> - if (p->quit) {
> - error_report("%s: channel %d has already quit", __func__, i);
> - qemu_mutex_unlock(&p->mutex);
> + if (multifd_send_should_exit()) {
> return -1;
> }
>
> + trace_multifd_send_sync_main_signal(p->id);
> +
> + qemu_mutex_lock(&p->mutex);
> p->packet_num = multifd_send_state->packet_num++;
> p->flags |= MULTIFD_FLAG_SYNC;
> p->pending_job++;
> @@ -634,6 +630,10 @@ int multifd_send_sync_main(void)
> for (i = 0; i < migrate_multifd_channels(); i++) {
> MultiFDSendParams *p = &multifd_send_state->params[i];
>
> + if (multifd_send_should_exit()) {
> + return -1;
> + }
> +
> qemu_sem_wait(&multifd_send_state->channels_ready);
> trace_multifd_send_sync_main_wait(p->id);
> qemu_sem_wait(&p->sem_sync);
> @@ -671,7 +671,7 @@ static void *multifd_send_thread(void *opaque)
> qemu_sem_post(&multifd_send_state->channels_ready);
> qemu_sem_wait(&p->sem);
>
> - if (qatomic_read(&multifd_send_state->exiting)) {
> + if (multifd_send_should_exit()) {
> break;
> }
> qemu_mutex_lock(&p->mutex);
> @@ -786,12 +786,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task,
>
> trace_multifd_tls_outgoing_handshake_error(ioc, error_get_pretty(err));
>
> - migrate_set_error(migrate_get_current(), err);
> - /*
> - * Error happen, mark multifd_send_thread status as 'quit' although it
> - * is not created, and then tell who pay attention to me.
> - */
> - p->quit = true;
> + multifd_send_terminate_threads(err);
> multifd_send_kick_main(p);
> error_free(err);
> }
> @@ -857,22 +852,6 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
> return true;
> }
>
> -static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
> - QIOChannel *ioc, Error *err)
> -{
> - migrate_set_error(migrate_get_current(), err);
> - /* Error happen, we need to tell who pay attention to me */
> - multifd_send_kick_main(p);
> - /*
> - * Although multifd_send_thread is not created, but main migration
> - * thread need to judge whether it is running, so we need to mark
> - * its status.
> - */
> - p->quit = true;
> - object_unref(OBJECT(ioc));
> - error_free(err);
> -}
> -
> static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> {
> MultiFDSendParams *p = opaque;
> @@ -889,7 +868,10 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> }
>
> trace_multifd_new_send_channel_async_error(p->id, local_err);
> - multifd_new_send_channel_cleanup(p, ioc, local_err);
> + multifd_send_terminate_threads(local_err);
> + multifd_send_kick_main(p);
> + object_unref(OBJECT(ioc));
> + error_free(local_err);
> }
>
> static void multifd_new_send_channel_create(gpointer opaque)
> @@ -921,7 +903,6 @@ int multifd_save_setup(Error **errp)
> qemu_mutex_init(&p->mutex);
> qemu_sem_init(&p->sem, 0);
> qemu_sem_init(&p->sem_sync, 0);
> - p->quit = false;
> p->pending_job = 0;
> p->id = i;
> p->pages = multifd_pages_init(page_count);
next prev parent reply other threads:[~2024-01-31 15:05 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-31 10:30 [PATCH 00/14] migration/multifd: Refactor ->send_prepare() and cleanups peterx
2024-01-31 10:30 ` [PATCH 01/14] migration/multifd: Drop stale comment for multifd zero copy peterx
2024-01-31 10:30 ` [PATCH 02/14] migration/multifd: multifd_send_kick_main() peterx
2024-01-31 10:31 ` [PATCH 03/14] migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths peterx
2024-01-31 15:05 ` Fabiano Rosas [this message]
2024-02-01 9:28 ` Peter Xu
2024-02-01 13:30 ` Fabiano Rosas
2024-02-02 0:21 ` Peter Xu
2024-01-31 10:31 ` [PATCH 04/14] migration/multifd: Postpone reset of MultiFDPages_t peterx
2024-01-31 15:27 ` Fabiano Rosas
2024-02-01 10:01 ` Peter Xu
2024-02-01 15:21 ` Fabiano Rosas
2024-02-02 0:28 ` Peter Xu
2024-02-02 0:37 ` Peter Xu
2024-02-02 12:15 ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 05/14] migration/multifd: Drop MultiFDSendParams.normal[] array peterx
2024-01-31 16:02 ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 06/14] migration/multifd: Separate SYNC request with normal jobs peterx
2024-01-31 18:45 ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 07/14] migration/multifd: Simplify locking in sender thread peterx
2024-01-31 20:21 ` Fabiano Rosas
2024-02-01 10:37 ` Peter Xu
2024-01-31 10:31 ` [PATCH 08/14] migration/multifd: Drop pages->num check " peterx
2024-01-31 21:19 ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 09/14] migration/multifd: Rename p->num_packets and clean it up peterx
2024-01-31 21:24 ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 10/14] migration/multifd: Move total_normal_pages accounting peterx
2024-01-31 21:26 ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 11/14] migration/multifd: Move trace_multifd_send|recv() peterx
2024-01-31 21:26 ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 12/14] migration/multifd: multifd_send_prepare_header() peterx
2024-01-31 21:32 ` Fabiano Rosas
2024-02-01 10:02 ` Peter Xu
2024-01-31 10:31 ` [PATCH 13/14] migration/multifd: Move header prepare/fill into send_prepare() peterx
2024-01-31 21:42 ` Fabiano Rosas
2024-02-01 10:15 ` Peter Xu
2024-02-02 3:57 ` Peter Xu
2024-01-31 10:31 ` [PATCH 14/14] migration/multifd: Forbid spurious wakeups peterx
2024-01-31 21:43 ` Fabiano Rosas
2024-02-01 6:01 ` Peter Xu
2024-01-31 22:49 ` [PATCH 00/14] migration/multifd: Refactor ->send_prepare() and cleanups Fabiano Rosas
2024-02-01 5:47 ` Peter Xu
2024-02-01 12:51 ` Avihai Horon
2024-02-01 21:46 ` Fabiano Rosas
2024-02-02 2:12 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87zfwlk0gr.fsf@suse.de \
--to=farosas@suse.de \
--cc=avihaih@nvidia.com \
--cc=bryan.zhang@bytedance.com \
--cc=hao.xiang@bytedance.com \
--cc=peterx@redhat.com \
--cc=ppandit@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=yuan1.liu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.