All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: qemu-devel@nongnu.org, Bryan Zhang <bryan.zhang@bytedance.com>,
	Prasad Pandit <ppandit@redhat.com>,
	Yuan Liu <yuan1.liu@intel.com>, Avihai Horon <avihaih@nvidia.com>,
	Hao Xiang <hao.xiang@bytedance.com>
Subject: Re: [PATCH 03/14] migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths
Date: Thu, 1 Feb 2024 17:28:32 +0800	[thread overview]
Message-ID: <ZbtkQLnPJDmXK912@x1n> (raw)
In-Reply-To: <87zfwlk0gr.fsf@suse.de>

On Wed, Jan 31, 2024 at 12:05:08PM -0300, Fabiano Rosas wrote:
> peterx@redhat.com writes:
> 
> > From: Peter Xu <peterx@redhat.com>
> >
> > Multifd send side has two fields to indicate error quits:
> >
> >   - MultiFDSendParams.quit
> >   - &multifd_send_state->exiting
> >
> > Merge them into the global one.  The replacement is done by changing all
> > p->quit checks into the global var check.  The global check doesn't need
> > any lock.
> >
> > A few more things done on top of this altogether:
> >
> >   - multifd_send_terminate_threads()
> >
> >     Moving the xchg() of &multifd_send_state->exiting upper, so as to cover
> >     the tracepoint, migrate_set_error() and migrate_set_state().
> 
> Good.
> 
> >
> >   - multifd_send_sync_main()
> >
> >     In the 2nd loop, add one more check over the global var to make sure we
> >     don't keep the looping if QEMU already decided to quit.
> 
> Yes, also because we don't necessarily enter at multifd_send_page()
> every time.
> 
> >
> >   - multifd_tls_outgoing_handshake()
> >
> >     Use multifd_send_terminate_threads() to set the error state.  That has
> >     a benefit of updating MigrationState.error to that error too, so we can
> >     persist that 1st error we hit in that specific channel.
> 
> Makes sense.
> 
> >
> >   - multifd_new_send_channel_async()
> >
> >     Take similar approach like above, drop the migrate_set_error() because
> >     multifd_send_terminate_threads() already covers that.  Unwrap the helper
> >     multifd_new_send_channel_cleanup() along the way; not really needed.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/multifd.h |  2 --
> >  migration/multifd.c | 85 ++++++++++++++++++---------------------------
> >  2 files changed, 33 insertions(+), 54 deletions(-)
> >
> > diff --git a/migration/multifd.h b/migration/multifd.h
> > index 35d11f103c..7c040cb85a 100644
> > --- a/migration/multifd.h
> > +++ b/migration/multifd.h
> > @@ -95,8 +95,6 @@ typedef struct {
> >      QemuMutex mutex;
> >      /* is this channel thread running */
> >      bool running;
> > -    /* should this thread finish */
> > -    bool quit;
> >      /* multifd flags for each packet */
> >      uint32_t flags;
> >      /* global number of generated multifd packets */
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index b8d2c96533..2c98023d67 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -372,6 +372,11 @@ struct {
> >      MultiFDMethods *ops;
> >  } *multifd_send_state;
> >  
> > +static bool multifd_send_should_exit(void)
> > +{
> > +    return qatomic_read(&multifd_send_state->exiting);
> > +}
> > +
> >  /*
> >   * The migration thread can wait on either of the two semaphores.  This
> >   * function can be used to kick the main thread out of waiting on either of
> > @@ -409,7 +414,7 @@ static int multifd_send_pages(void)
> >      MultiFDSendParams *p = NULL; /* make happy gcc */
> >      MultiFDPages_t *pages = multifd_send_state->pages;
> >  
> > -    if (qatomic_read(&multifd_send_state->exiting)) {
> > +    if (multifd_send_should_exit()) {
> >          return -1;
> >      }
> >  
> > @@ -421,14 +426,11 @@ static int multifd_send_pages(void)
> >       */
> >      next_channel %= migrate_multifd_channels();
> >      for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
> > -        p = &multifd_send_state->params[i];
> > -
> > -        qemu_mutex_lock(&p->mutex);
> > -        if (p->quit) {
> > -            error_report("%s: channel %d has already quit!", __func__, i);
> > -            qemu_mutex_unlock(&p->mutex);
> > +        if (multifd_send_should_exit()) {
> >              return -1;
> >          }
> > +        p = &multifd_send_state->params[i];
> > +        qemu_mutex_lock(&p->mutex);
> >          if (!p->pending_job) {
> >              p->pending_job++;
> >              next_channel = (i + 1) % migrate_multifd_channels();
> 
> Hm, I'm not sure it's correct to check 'exiting' outside of the
> lock. While it is an atomic operation, it is not atomic in relation to
> pending_job...
> 
> ... looking closer, it seems that we can do what you suggest because
> p->pending_job is not touched by the multifd_send_thread in case of
> error, which means this function will indeed miss the 'exiting' flag,
> but pending_job > 0 means it will loop to the next channel and _then_ it
> will see the 'exiting' flag.

It could still be the last channel we iterate, then IIUC we can still try
to assign a job to a thread even if a concurrent error is set there.

However IMHO it's okay; the error in the sender thread should ultimately
set migrate_set_error() and the main thread should detect that in the
migration loop, then we'll still quit.  The extra queued job shouldn't
matter, IMHO.

> 
> > @@ -483,6 +485,16 @@ static void multifd_send_terminate_threads(Error *err)
> >  {
> >      int i;
> >  
> > +    /*
> > +     * We don't want to exit each threads twice.  Depending on where
> > +     * we get the error, or if there are two independent errors in two
> > +     * threads at the same time, we can end calling this function
> > +     * twice.
> > +     */
> > +    if (qatomic_xchg(&multifd_send_state->exiting, 1)) {
> > +        return;
> > +    }
> > +
> >      trace_multifd_send_terminate_threads(err != NULL);
> >  
> >      if (err) {
> > @@ -497,26 +509,13 @@ static void multifd_send_terminate_threads(Error *err)
> >          }
> >      }
> >  
> > -    /*
> > -     * We don't want to exit each threads twice.  Depending on where
> > -     * we get the error, or if there are two independent errors in two
> > -     * threads at the same time, we can end calling this function
> > -     * twice.
> > -     */
> > -    if (qatomic_xchg(&multifd_send_state->exiting, 1)) {
> > -        return;
> > -    }
> > -
> >      for (i = 0; i < migrate_multifd_channels(); i++) {
> >          MultiFDSendParams *p = &multifd_send_state->params[i];
> >  
> > -        qemu_mutex_lock(&p->mutex);
> > -        p->quit = true;
> 
> Now that you removed this, we decoupled kicking the threads from setting
> the exit/error, so this function could be split in two.
> 
> We could set the exiting flag at the places the error occurred (multifd
> threads, thread creation, etc) and "terminate the threads" at
> multifd_save_cleanup(). That second part we already do actually:
> 
> void multifd_save_cleanup(void) {
> ...
>     multifd_send_terminate_threads(NULL);
>                                    ^see?
>     for (i = 0; i < migrate_multifd_channels(); i++) {
>         MultiFDSendParams *p = &multifd_send_state->params[i];
> 
>         if (p->running) {
>             qemu_thread_join(&p->thread);
>         }
>     }
> ...
> }
> 
> I think there's no reason anymore for the channels to kick each
> other. They would all be waiting at p->sem and multifd_send_cleanup()
> would kick + join them.

Sounds good here.

I'll attach one patch like this, feel free to have an early look:

=====

From f9a3d63d5cca0068daaea4c72392803f4b29dcb5 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Thu, 1 Feb 2024 17:01:54 +0800
Subject: [PATCH] migration/multifd: Split multifd_send_terminate_threads()

Split multifd_send_terminate_threads() into two functions:

  - multifd_send_set_error(): used when an error happened on the sender
    side, set error and quit state only

  - multifd_send_terminate_threads(): used only by the main thread to kick
    all multifd send threads out of sleep, for the last recycling.

Use multifd_send_set_error() in the three old call sites where only the
error will be set.

Use multifd_send_terminate_threads() in the last one where the main thread
will kick the multifd threads at last in multifd_save_cleanup().

Both helpers will need to set quitting=1.

Suggested-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/multifd.c    | 27 ++++++++++++++++++---------
 migration/trace-events |  2 +-
 2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index c71e74b101..95dc29c8c7 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -536,10 +536,9 @@ int multifd_queue_page(RAMBlock *block, ram_addr_t offset)
     return 1;
 }
 
-static void multifd_send_terminate_threads(Error *err)
+/* Multifd send side hit an error; remember it and prepare to quit */
+static void multifd_send_set_error(Error *err)
 {
-    int i;
-
     /*
      * We don't want to exit each threads twice.  Depending on where
      * we get the error, or if there are two independent errors in two
@@ -550,8 +549,6 @@ static void multifd_send_terminate_threads(Error *err)
         return;
     }
 
-    trace_multifd_send_terminate_threads(err != NULL);
-
     if (err) {
         MigrationState *s = migrate_get_current();
         migrate_set_error(s, err);
@@ -563,7 +560,19 @@ static void multifd_send_terminate_threads(Error *err)
                               MIGRATION_STATUS_FAILED);
         }
     }
+}
+
+static void multifd_send_terminate_threads(void)
+{
+    int i;
+
+    trace_multifd_send_terminate_threads();
 
+    /*
+     * Tell everyone we're quitting.  No xchg() needed here; we simply
+     * always set it.
+     */
+    qatomic_set(&multifd_send_state->exiting, 1);
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
 
@@ -586,7 +595,7 @@ void multifd_save_cleanup(void)
     if (!migrate_multifd()) {
         return;
     }
-    multifd_send_terminate_threads(NULL);
+    multifd_send_terminate_threads();
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
 
@@ -778,7 +787,7 @@ out:
     if (ret) {
         assert(local_err);
         trace_multifd_send_error(p->id);
-        multifd_send_terminate_threads(local_err);
+        multifd_send_set_error(local_err);
         multifd_send_kick_main(p);
         error_free(local_err);
     }
@@ -814,7 +823,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task,
 
     trace_multifd_tls_outgoing_handshake_error(ioc, error_get_pretty(err));
 
-    multifd_send_terminate_threads(err);
+    multifd_send_set_error(err);
     multifd_send_kick_main(p);
     error_free(err);
 }
@@ -896,7 +905,7 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
     }
 
     trace_multifd_new_send_channel_async_error(p->id, local_err);
-    multifd_send_terminate_threads(local_err);
+    multifd_send_set_error(local_err);
     multifd_send_kick_main(p);
     object_unref(OBJECT(ioc));
     error_free(local_err);
diff --git a/migration/trace-events b/migration/trace-events
index de4a743c8a..298ad2b0dd 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -141,7 +141,7 @@ multifd_send_error(uint8_t id) "channel %u"
 multifd_send_sync_main(long packet_num) "packet num %ld"
 multifd_send_sync_main_signal(uint8_t id) "channel %u"
 multifd_send_sync_main_wait(uint8_t id) "channel %u"
-multifd_send_terminate_threads(bool error) "error %d"
+multifd_send_terminate_threads(void) ""
 multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages) "channel %u packets %" PRIu64 " normal pages %"  PRIu64
 multifd_send_thread_start(uint8_t id) "%u"
 multifd_tls_outgoing_handshake_start(void *ioc, void *tioc, const char *hostname) "ioc=%p tioc=%p hostname=%s"
-- 
2.43.0


-- 
Peter Xu



  reply	other threads:[~2024-02-01  9:30 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-31 10:30 [PATCH 00/14] migration/multifd: Refactor ->send_prepare() and cleanups peterx
2024-01-31 10:30 ` [PATCH 01/14] migration/multifd: Drop stale comment for multifd zero copy peterx
2024-01-31 10:30 ` [PATCH 02/14] migration/multifd: multifd_send_kick_main() peterx
2024-01-31 10:31 ` [PATCH 03/14] migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths peterx
2024-01-31 15:05   ` Fabiano Rosas
2024-02-01  9:28     ` Peter Xu [this message]
2024-02-01 13:30       ` Fabiano Rosas
2024-02-02  0:21         ` Peter Xu
2024-01-31 10:31 ` [PATCH 04/14] migration/multifd: Postpone reset of MultiFDPages_t peterx
2024-01-31 15:27   ` Fabiano Rosas
2024-02-01 10:01     ` Peter Xu
2024-02-01 15:21       ` Fabiano Rosas
2024-02-02  0:28         ` Peter Xu
2024-02-02  0:37           ` Peter Xu
2024-02-02 12:15             ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 05/14] migration/multifd: Drop MultiFDSendParams.normal[] array peterx
2024-01-31 16:02   ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 06/14] migration/multifd: Separate SYNC request with normal jobs peterx
2024-01-31 18:45   ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 07/14] migration/multifd: Simplify locking in sender thread peterx
2024-01-31 20:21   ` Fabiano Rosas
2024-02-01 10:37     ` Peter Xu
2024-01-31 10:31 ` [PATCH 08/14] migration/multifd: Drop pages->num check " peterx
2024-01-31 21:19   ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 09/14] migration/multifd: Rename p->num_packets and clean it up peterx
2024-01-31 21:24   ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 10/14] migration/multifd: Move total_normal_pages accounting peterx
2024-01-31 21:26   ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 11/14] migration/multifd: Move trace_multifd_send|recv() peterx
2024-01-31 21:26   ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 12/14] migration/multifd: multifd_send_prepare_header() peterx
2024-01-31 21:32   ` Fabiano Rosas
2024-02-01 10:02     ` Peter Xu
2024-01-31 10:31 ` [PATCH 13/14] migration/multifd: Move header prepare/fill into send_prepare() peterx
2024-01-31 21:42   ` Fabiano Rosas
2024-02-01 10:15     ` Peter Xu
2024-02-02  3:57   ` Peter Xu
2024-01-31 10:31 ` [PATCH 14/14] migration/multifd: Forbid spurious wakeups peterx
2024-01-31 21:43   ` Fabiano Rosas
2024-02-01  6:01   ` Peter Xu
2024-01-31 22:49 ` [PATCH 00/14] migration/multifd: Refactor ->send_prepare() and cleanups Fabiano Rosas
2024-02-01  5:47   ` Peter Xu
2024-02-01 12:51     ` Avihai Horon
2024-02-01 21:46       ` Fabiano Rosas
2024-02-02  2:12         ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZbtkQLnPJDmXK912@x1n \
    --to=peterx@redhat.com \
    --cc=avihaih@nvidia.com \
    --cc=bryan.zhang@bytedance.com \
    --cc=farosas@suse.de \
    --cc=hao.xiang@bytedance.com \
    --cc=ppandit@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=yuan1.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.