qemu-devel.nongnu.org archive mirror
From: Juan Quintela <quintela@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	qemu-devel@nongnu.org,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [PATCH 2/2] failover: don't allow to migrate a paused VM that needs PCI unplug
Date: Tue, 02 Nov 2021 16:28:13 +0100
Message-ID: <87lf26inle.fsf@secure.mitica>
In-Reply-To: <20211102110043-mutt-send-email-mst@kernel.org> (Michael S. Tsirkin's message of "Tue, 2 Nov 2021 11:04:55 -0400")

"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Sep 29, 2021 at 04:43:11PM +0200, Laurent Vivier wrote:
>> As the guest OS is paused, we will never receive the unplug event
>> from the kernel and the migration cannot continue.
>> 
>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>
> Well ... what if user previously did
>
> pause
> start migration
> unpause
>
> we are breaking it now for no good reason.

No.  We are canceling the migration.  Migration cannot finish in that
state.  We are inside the test:

      if (migration_in_setup(s) && !should_be_hidden) {

Unless you have a really weird setup[1], migration setup takes just
milliseconds (single digits for small guests, and 200-300ms for really
huge ones).

So I still think this is right.


1: Weird here means things like RDMA, where locking all the memory of a
   guest can take forever.  To give an idea: until we introduced RDMA,
   we didn't measure the setup stage time at all, because it was so
   small that it didn't matter.

Unplug from the guest is another operation that can take quite a long
time, because it depends on guest cooperation.
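
A minimal sketch of the behaviour being defended here, not the actual
patch: if we are in migration setup, the primary device still has to be
unplugged, and the guest is paused, the unplug can never complete, so the
migration is cancelled with an error.  The types, the function name and
the error string below are hypothetical stand-ins; only the shape of the
test comes from the line quoted above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU state; not the real structures. */
typedef enum {
    MIG_PHASE_NONE,
    MIG_PHASE_SETUP,
    MIG_PHASE_ACTIVE
} MigPhase;

typedef struct {
    MigPhase phase;
    bool vm_running;      /* false when the guest is paused */
    bool primary_hidden;  /* plays the role of should_be_hidden */
} FailoverModel;

/*
 * Returns NULL if migration may proceed, or an (illustrative) error
 * string if it should be cancelled because a paused guest can never
 * acknowledge the PCI unplug request.
 */
static const char *failover_setup_check(const FailoverModel *m)
{
    if (m->phase == MIG_PHASE_SETUP && !m->primary_hidden) {
        if (!m->vm_running) {
            return "cannot migrate a paused VM that needs PCI unplug";
        }
    }
    return NULL;
}
```

With this model, "pause; start migration" hits the error path, while a
running guest, or a primary device that is already hidden, does not.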

> Further, how about
>
> start migration
> pause
>
> are we going to break this too? by failing pause?

I haven't thought about this one, but it shouldn't matter (famous last
words), because there are two cases:

- migration has started and unplug has already finished, no problem.

- migration has started but we haven't yet arrived at
  virtio_net_handle_migration_primary().  We are paused, and we give the
  guest a good error message about why we are failing.  Notice that
  migration can't finish anyway; it would be stuck there forever waiting
  for the (stopped) guest to unplug the device.
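
The "start migration; pause" cases can be written down as a small
decision function.  This is only a model of the reasoning in this mail;
the enum and function names are made up for illustration, and the third
outcome corresponds to the window described in PD1 below, where the
behaviour is a guess.

```c
#include <stdbool.h>

/* Hypothetical outcomes of pausing after migration has started. */
typedef enum {
    PAUSE_HARMLESS,         /* unplug already finished */
    PAUSE_FAILS_MIGRATION,  /* unplug not yet requested: error out */
    PAUSE_MAY_STALL         /* unplug requested, guest has not finished */
} PauseOutcome;

static PauseOutcome pause_after_migrate_start(bool unplug_requested,
                                              bool unplug_finished)
{
    if (unplug_finished) {
        return PAUSE_HARMLESS;
    }
    if (!unplug_requested) {
        /* The handler will later see a stopped guest and fail cleanly. */
        return PAUSE_FAILS_MIGRATION;
    }
    /* Event sent, guest stopped before completing the unplug. */
    return PAUSE_MAY_STALL;
}
```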

So the only case that I can see that *could* matter is:

- start migration
- pause the guest
   this implies pausing the migration
- unpause
   at this point we can continue the migration

Do we really care about this scenario?

I think not, because the migration has advanced so little that starting
from zero would be the best option anyway.

Later, Juan.

PD1: No, I am not sure what happens if you run "pause" after the event
     is sent to the guest but before the guest finishes the unplug (I
     guess it would stall).  But in that case, we are doing something at
     least fishy.  On the other hand, we know that "pause; migration"
     will never really work.

PD2: Perhaps we could "invent" another state that means
     IN_SETUP_AND_WE_CANT_BE_PAUSED, and stay in it from the moment we
     ask for the device to unplug until it actually unplugs.  But it
     looks really complicated.
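
A rough sketch of what such a state could look like, purely to show the
idea.  None of these names exist in QEMU; they are invented here, and
going back to plain setup once the unplug completes is an assumption.

```c
#include <stdbool.h>

/* Hypothetical migration states for the idea in PD2. */
typedef enum {
    SKETCH_SETUP,
    SKETCH_SETUP_NO_PAUSE,  /* IN_SETUP_AND_WE_CANT_BE_PAUSED */
    SKETCH_ACTIVE
} SketchState;

/* Enter the no-pause window when the unplug request is sent. */
static SketchState on_unplug_requested(SketchState s)
{
    return s == SKETCH_SETUP ? SKETCH_SETUP_NO_PAUSE : s;
}

/* Leave the window once the guest has completed the unplug. */
static SketchState on_unplug_completed(SketchState s)
{
    return s == SKETCH_SETUP_NO_PAUSE ? SKETCH_SETUP : s;
}

/* A "pause" request would be refused only inside the window. */
static bool pause_allowed(SketchState s)
{
    return s != SKETCH_SETUP_NO_PAUSE;
}
```

Even in this toy form, every path that pauses the guest would have to
consult pause_allowed(), which is where the complexity comes from.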




Thread overview: 15+ messages
2021-09-29 14:43 [PATCH 0/2] failover: don't allow to migrate a paused VM that needs PCI unplug Laurent Vivier
2021-09-29 14:43 ` [PATCH 1/2] migration: provide an error message to migration_cancel() Laurent Vivier
2021-11-02  9:04   ` Juan Quintela
2021-09-29 14:43 ` [PATCH 2/2] failover: don't allow to migrate a paused VM that needs PCI unplug Laurent Vivier
2021-11-02  9:04   ` Juan Quintela
2021-11-02 15:04   ` Michael S. Tsirkin
2021-11-02 15:28     ` Juan Quintela [this message]
2021-11-02 17:06     ` Laurent Vivier
2021-11-02 17:08       ` Michael S. Tsirkin
2021-11-02 17:26         ` Juan Quintela
2021-11-02 17:47           ` Laurent Vivier
2021-11-02 18:09             ` Juan Quintela
2021-11-02 17:43         ` Laurent Vivier
2021-10-06  9:22 ` [PATCH 0/2] " Laurent Vivier
2021-10-21  8:49 ` Laurent Vivier
