qemu-devel.nongnu.org archive mirror
From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: qemu-devel@nongnu.org, Peter Maydell <peter.maydell@linaro.org>,
	qemu-stable@nongnu.org
Subject: Re: [PATCH 2/2] migration/multifd: Fix rb->receivedmap cleanup race
Date: Tue, 17 Sep 2024 13:02:36 -0400
Message-ID: <Zum2LOpaIRVDDEo9@x1n>
In-Reply-To: <20240913220542.18305-3-farosas@suse.de>

On Fri, Sep 13, 2024 at 07:05:42PM -0300, Fabiano Rosas wrote:
> Fix a segmentation fault in multifd when rb->receivedmap is cleared
> too early.
> 
> After commit 5ef7e26bdb ("migration/multifd: solve zero page causing
> multiple page faults"), multifd started using the rb->receivedmap
> bitmap, which belongs to ram.c and is initialized and *freed* from the
> ram SaveVMHandlers.
> 
> Multifd threads are live until migration_incoming_state_destroy(),
> which is called after qemu_loadvm_state_cleanup(), leading to a crash
> when accessing rb->receivedmap.
> 
> process_incoming_migration_co()        ...
>   qemu_loadvm_state()                  multifd_nocomp_recv()
>     qemu_loadvm_state_cleanup()          ramblock_recv_bitmap_set_offset()
>       rb->receivedmap = NULL               set_bit_atomic(..., rb->receivedmap)
>   ...
>   migration_incoming_state_destroy()
>     multifd_recv_cleanup()
>       multifd_recv_terminate_threads(NULL)
> 
> Move the loadvm cleanup into migration_incoming_state_destroy(), after
> multifd_recv_cleanup(), to ensure the multifd threads have already exited
> when rb->receivedmap is cleared.
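
The ordering described above (stop the receive threads first, free the
bitmap second) can be sketched with plain pthreads.  This is only an
illustration, not QEMU code: DemoIncoming, demo_recv_thread() and
demo_run_and_destroy() are made-up names, and the bitmap merely stands in
for rb->receivedmap.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_NBITS  1024
#define DEMO_NWORDS (DEMO_NBITS / 64)

/* Hypothetical stand-in for the incoming state; not a QEMU structure. */
typedef struct {
    _Atomic uint64_t *receivedmap;  /* plays the role of rb->receivedmap */
    pthread_t worker;               /* plays the role of a multifd recv thread */
} DemoIncoming;

/* Worker: marks every page as received, one atomic bit-set at a time. */
static void *demo_recv_thread(void *opaque)
{
    DemoIncoming *s = opaque;

    for (size_t bit = 0; bit < DEMO_NBITS; bit++) {
        atomic_fetch_or(&s->receivedmap[bit / 64], UINT64_C(1) << (bit % 64));
    }
    return NULL;
}

/*
 * Runs one worker, then tears down in the safe order.
 * Returns the number of bits that were set when the worker had exited.
 */
static size_t demo_run_and_destroy(void)
{
    DemoIncoming s;
    size_t count = 0;
    int rc;

    s.receivedmap = calloc(DEMO_NWORDS, sizeof(*s.receivedmap));
    assert(s.receivedmap != NULL);

    rc = pthread_create(&s.worker, NULL, demo_recv_thread, &s);
    assert(rc == 0);

    /* Step 1: wait for the worker to exit (analogue of multifd_recv_cleanup()). */
    rc = pthread_join(s.worker, NULL);
    assert(rc == 0);

    for (size_t bit = 0; bit < DEMO_NBITS; bit++) {
        if (atomic_load(&s.receivedmap[bit / 64]) & (UINT64_C(1) << (bit % 64))) {
            count++;
        }
    }

    /* Step 2: only now free the bitmap (analogue of the loadvm cleanup). */
    free(s.receivedmap);
    s.receivedmap = NULL;

    return count;
}
```

Swapping step 1 and step 2 in this sketch would let the worker write into
freed memory, which is the shape of the crash in the commit message.  Build
with -pthread.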
> 
> The have_listen_thread logic can now be removed because its purpose
> was to delay cleanup until postcopy_ram_listen_thread() had finished.
> 
> CC: qemu-stable@nongnu.org
> Fixes: 5ef7e26bdb ("migration/multifd: solve zero page causing multiple page faults")
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/migration.c | 1 +
>  migration/migration.h | 1 -
>  migration/savevm.c    | 9 ---------
>  3 files changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 3dea06d577..b190a574b1 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -378,6 +378,7 @@ void migration_incoming_state_destroy(void)
>      struct MigrationIncomingState *mis = migration_incoming_get_current();
>  
>      multifd_recv_cleanup();
> +    qemu_loadvm_state_cleanup();
>  
>      if (mis->to_src_file) {
>          /* Tell source that we are done */
> diff --git a/migration/migration.h b/migration/migration.h
> index 38aa1402d5..20b0a5b66e 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -101,7 +101,6 @@ struct MigrationIncomingState {
>      /* Set this when we want the fault thread to quit */
>      bool           fault_thread_quit;
>  
> -    bool           have_listen_thread;
>      QemuThread     listen_thread;
>  
>      /* For the kernel to send us notifications */
> diff --git a/migration/savevm.c b/migration/savevm.c
> index d0759694fd..532ee5e4b0 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2076,10 +2076,8 @@ static void *postcopy_ram_listen_thread(void *opaque)
>       * got a bad migration state).
>       */
>      migration_incoming_state_destroy();
> -    qemu_loadvm_state_cleanup();
>  
>      rcu_unregister_thread();
> -    mis->have_listen_thread = false;
>      postcopy_state_set(POSTCOPY_INCOMING_END);
>  
>      object_unref(OBJECT(migr));
> @@ -2130,7 +2128,6 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>          return -1;
>      }
>  
> -    mis->have_listen_thread = true;
>      postcopy_thread_create(mis, &mis->listen_thread, "mig/dst/listen",
>                             postcopy_ram_listen_thread, QEMU_THREAD_DETACHED);
>      trace_loadvm_postcopy_handle_listen("return");
> @@ -2978,11 +2975,6 @@ int qemu_loadvm_state(QEMUFile *f)
>  
>      trace_qemu_loadvm_state_post_main(ret);
>  
> -    if (mis->have_listen_thread) {
> -        /* Listen thread still going, can't clean up yet */
> -        return ret;
> -    }

Hmm, I wonder whether we would still need this.  IIUC it's not only about
cleanup, but also that when postcopy is involved, dst QEMU postpones doing
the rest of the work that follows the qemu_loadvm_state_main() call.

E.g. cpu put, aka, cpu_synchronize_all_post_init(), is also done in
loadvm_postcopy_handle_run_bh() later.

IOW, I'd then expect that with this patch applied we'll put cpu twice?

I think the should_send_vmdesc() part is fine, as it returns false for
postcopy anyway.  However, I'm not sure about the cpu post_init above.
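
To make the concern concrete, here is a toy model of the control flow.  It
is not QEMU code: DemoState and every demo_*() function are invented for
illustration, and it assumes the post-init step is reached both from the
load path and from the later postcopy run step.  Whether that actually
happens in QEMU is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state; only models the flag removed by the patch. */
typedef struct {
    bool have_listen_thread;
    int post_init_calls;    /* how many times the "cpu put" step ran */
} DemoState;

/* Stand-in for cpu_synchronize_all_post_init(). */
static void demo_post_init(DemoState *s)
{
    assert(s != NULL);
    s->post_init_calls++;
}

/* Stand-in for the tail of qemu_loadvm_state(). */
static void demo_loadvm_state(DemoState *s, bool keep_guard)
{
    if (keep_guard && s->have_listen_thread) {
        /* Postcopy: leave the remaining work to the later run step. */
        return;
    }
    demo_post_init(s);
}

/* Stand-in for loadvm_postcopy_handle_run_bh(). */
static void demo_postcopy_run_bh(DemoState *s)
{
    demo_post_init(s);
}

/* Counts post-init calls for a postcopy load, with or without the guard. */
static int demo_count_post_init(bool keep_guard)
{
    DemoState s = { .have_listen_thread = true, .post_init_calls = 0 };

    demo_loadvm_state(&s, keep_guard);
    demo_postcopy_run_bh(&s);
    return s.post_init_calls;
}
```

In this model, demo_count_post_init(true) counts one call and
demo_count_post_init(false) counts two, which is the double "cpu put" I'm
worried about.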

> -
>      if (ret == 0) {
>          ret = qemu_file_get_error(f);
>      }
> @@ -3022,7 +3014,6 @@ int qemu_loadvm_state(QEMUFile *f)
>          }
>      }
>  
> -    qemu_loadvm_state_cleanup();
>      cpu_synchronize_all_post_init();
>  
>      return ret;
> -- 
> 2.35.3
> 

-- 
Peter Xu
