qemu-devel.nongnu.org archive mirror
From: Peter Xu <peterx@redhat.com>
To: Shivam Kumar <shivam.kumar1@nutanix.com>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"farosas@suse.de" <farosas@suse.de>
Subject: Re: [PATCH] Use multifd state to determine if multifd cleanup is needed
Date: Tue, 8 Oct 2024 10:00:39 -0400
Message-ID: <ZwU7B8O3GgPnKf5S@x1n>
In-Reply-To: <EF62F9C3-322E-4478-B985-65FDD794B3D2@nutanix.com>

On Tue, Oct 08, 2024 at 12:09:03PM +0000, Shivam Kumar wrote:
> 
> 
> On 7 Oct 2024, at 9:56 PM, Peter Xu <peterx@redhat.com> wrote:
> 
> > On Mon, Oct 07, 2024 at 03:44:51PM +0000, Shivam Kumar wrote:
> > > If the client calls the QMP command to reset the migration
> > > capabilities after the migration status is set to failed or cancelled
> >
> > Is cancelled ok?
> >
> > Asking because I think migrate_fd_cleanup() still runs in the CANCELLING
> > state there, so no one can disable the multifd capability before that; the
> > QMP command should fail.
>
> I meant CANCELLED, but I can see that currently CANCELLED is only possible
> after migrate_fd_cleanup is called. So, you are right. We won’t have a problem
> in that path at least.
>
> > But FAILED indeed looks problematic.
> >
> > IIUC this is not specific to multifd. Is it a race condition that
> > migrate_fd_cleanup() can be invoked when migration_is_running() is no
> > longer true?  Then I wonder what happens if a concurrent QMP "migrate"
> > happens together with migrate_fd_cleanup(), even with multifd always off.
> >
> > Do we perhaps need to clean up everything before the state changes to
> > FAILED?
>
> I tried calling migrate_fd_cleanup before (and just after) setting the status
> to failed. The migration thread gets stuck in close_return_path_on_source
> trying to join rp_thread.

I am guessing it's because the new rp thread is created before the previous
one has been cleaned up in this case, so the join() will hang forever.

In that case, I guess the change below might not be enough, as I discussed
above.

We may need to postpone setting the FAILED status until everything has been
cleaned up, just like what we do with CANCELLING. Maybe we don't need a
FAILING state, since we have migrate_set_error()/migrate_has_error(): we can
use those for whatever we used to do (set/check) with FAILED, and then set
FAILED last, in the BH, like CANCELLED.
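
To make the ordering concrete, here is a rough toy model of the idea, not QEMU
code. All toy_* names and the ToyMigration struct are made up for
illustration; only the state names follow the discussion. The assumption is
that a failure is first recorded as an error while the status stays in a
"running" state, and that FAILED is published only after cleanup has finished.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, not a QEMU API. */
typedef enum { TOY_ACTIVE, TOY_CANCELLING, TOY_CANCELLED, TOY_FAILED } ToyStatus;

typedef struct {
    ToyStatus status;
    bool has_error;      /* stands in for what migrate_has_error() reports */
    bool resources_live; /* threads, channels, yank functions, ... */
} ToyMigration;

/* While this returns true, capability changes and new migrations are refused. */
static bool toy_is_running(const ToyMigration *m)
{
    return m->status == TOY_ACTIVE || m->status == TOY_CANCELLING;
}

/* Record the failure without leaving the "running" set of states. */
static void toy_set_error(ToyMigration *m)
{
    m->has_error = true;
}

/* Models a QMP request that changes capabilities. */
static bool toy_set_caps_allowed(const ToyMigration *m)
{
    return !toy_is_running(m);
}

/* Models the cleanup BH: release everything first, publish the final state last. */
static void toy_cleanup_bh(ToyMigration *m)
{
    assert(toy_is_running(m));
    m->resources_live = false;
    if (m->status == TOY_CANCELLING) {
        m->status = TOY_CANCELLED;
    } else if (m->has_error) {
        m->status = TOY_FAILED;
    }
}
```

In this model, calling toy_set_error() on an active migration leaves
toy_set_caps_allowed() false until toy_cleanup_bh() has run, so there is no
window where the state is already FAILED but resources are still live.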

> > > but before multifd cleanup starts, multifd cleanup can be skipped as
> > > it will falsely assume that multifd was not used for migration. This
> > > will eventually lead to source QEMU crashing due to the following
> > > assertion failure:
> > >
> > > yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)`
> > > failed
> > >
> > > Check multifd state to determine whether multifd was used or not for
> > > the migration rather than checking the state of multifd migration
> > > capability.
> > >
> > > Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
> > > ---
> > > migration/multifd.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/migration/multifd.c b/migration/multifd.c
> > > index 9b200f4ad9..427c9a7956 100644
> > > --- a/migration/multifd.c
> > > +++ b/migration/multifd.c
> > > @@ -487,7 +487,7 @@ void multifd_send_shutdown(void)
> > > {
> > >     int i;
> > >
> > > -    if (!migrate_multifd()) {
> > > +    if (!multifd_send_state) {
> > >         return;
> > >     }
> > >
> > > --
> > > 2.22.3
> > >
> >
> > --
> > Peter Xu
>
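
For reference, the symptom described in the quoted patch can be reproduced in
a small standalone model. This is only a sketch: the toy_* names, the counter
standing in for the yank function list, and the setup/teardown helpers are all
assumptions made for illustration, not the real multifd code. It shows why a
guard on a user-settable capability can skip cleanup that a guard on the
actual state would perform; it does not address the wider ordering question
raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy model only: every name here is hypothetical, not a QEMU API. */
typedef struct { int channels; } ToySendState;

static bool toy_multifd_cap;         /* capability: can be flipped at any time */
static ToySendState *toy_send_state; /* non-NULL only if setup actually ran */
static int toy_yank_fns;             /* stands in for the registered yank functions */

static void toy_setup(int channels)
{
    toy_send_state = malloc(sizeof(*toy_send_state));
    assert(toy_send_state != NULL);
    toy_send_state->channels = channels;
    toy_yank_fns += channels;        /* one registration per channel */
}

static void toy_teardown(void)
{
    if (!toy_send_state) {
        return;
    }
    toy_yank_fns -= toy_send_state->channels;
    free(toy_send_state);
    toy_send_state = NULL;
}

/* Guard on the capability: skips cleanup if the capability was reset early. */
static void toy_shutdown_by_capability(void)
{
    if (!toy_multifd_cap) {
        return;
    }
    toy_teardown();
}

/* Guard on the state: cleans up whenever setup really happened. */
static void toy_shutdown_by_state(void)
{
    if (!toy_send_state) {
        return;
    }
    toy_teardown();
}
```

If the capability is cleared between toy_setup() and
toy_shutdown_by_capability(), the registrations are left behind, which is the
situation the yank assertion complains about; toy_shutdown_by_state() still
releases them.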

-- 
Peter Xu


