From: "Daniel P. Berrangé" <berrange@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>,
Juan Quintela <quintela@redhat.com>,
Jason Wang <jasowang@redhat.com>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
qemu-devel@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH v2 01/27] migration: Network Failover can't work with a paused guest
Date: Wed, 2 Dec 2020 11:26:39 +0000 [thread overview]
Message-ID: <20201202112639.GE2360260@redhat.com> (raw)
In-Reply-To: <20201202061641-mutt-send-email-mst@kernel.org>
On Wed, Dec 02, 2020 at 06:19:29AM -0500, Michael S. Tsirkin wrote:
> On Wed, Dec 02, 2020 at 10:55:15AM +0000, Daniel P. Berrangé wrote:
> > On Wed, Dec 02, 2020 at 11:51:05AM +0100, Juan Quintela wrote:
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Wed, Dec 02, 2020 at 05:31:53AM -0500, Michael S. Tsirkin wrote:
> > > >> On Wed, Dec 02, 2020 at 10:27:18AM +0000, Daniel P. Berrangé wrote:
> > > >> > On Wed, Dec 02, 2020 at 05:13:18AM -0500, Michael S. Tsirkin wrote:
> > > >> > > On Wed, Nov 18, 2020 at 09:37:22AM +0100, Juan Quintela wrote:
> > > >> > > > If we have a paused guest, it can't unplug the network VF device, so
> > > >> > > > we wait there forever. Just change the code to give one error on that
> > > >> > > > case.
> > > >> > > >
> > > >> > > > Signed-off-by: Juan Quintela <quintela@redhat.com>
> > > >> > >
> > > >> > > It's certainly possible but it's management that created
> > > >> > > this situation after all - why do we bother to enforce
> > > >> > > a policy? It is possible that management will unpause immediately
> > > >> > > afterwards and everything will proceed smoothly.
> > > >> > >
> > > >> > > Yes migration will not happen until guest is
> > > >> > > unpaused but the same it true of e.g. a guest that is stuck
> > > >> > > because of a bug.
> > > >> >
> > > >> > That's pretty different behaviour from how migration normally handles
> > > >> > a paused guest, which is that it is guaranteed to complete the migration
> > > >> > in as short a time as network bandwidth allows.
> > > >> >
> > > >> > Just ignoring the situation I think will lead to surprise apps / admins,
> > > >> > because the person/entity invoking the migration is not likely to have
> > > >> > checked wether this particular guest uses net failover or not before
> > > >> > invoking - they'll just be expecting a paused migration to run fast and
> > > >> > be guaranteed to complete.
> > > >> >
> > > >> > Regards,
> > > >> > Daniel
> > > >>
> > > >> Okay I guess. But then shouldn't we handle the reverse situation too:
> > > >> pausing guest after migration started but before device was
> > > >> unplugged?
> > > >>
> > > >
> > > > Thinking of which, I have no idea how we'd handle it - fail
> > > > pausing guest until migration is cancelled?
> > > >
> > > > All this seems heavy handed to me ...
> > >
> > > This is the minimal fix that I can think of.
> > >
> > > Further solution would be:
> > > - Add a new migration parameter: migrate-paused
> > > - change libvirt to use the new parameter if it exist
> > > - in qemu, when we do start migration (but after we wait for the unplug
> > > device) paused the guest before starting migration and resume it after
> > > migration finish.
> >
> > It would also have to handle issuing of paused after migration has
> > been started - delay the pause request until the nuplug is complete
> > is one answer.
>
> Hmm my worry would be that pausing is one way to give cpu
> resources back to host. It's problematic if guest can delay
> that indefinitely.
hmm, yes, that is awkward. Perhaps we should just report an explicit
error then. In normal cases this won't happen, as unplug will have
easily completed before the mgmt app pauses the running migration.
In broken/malicious guest cases, this at least ives mgmt a heads up
that something is wrong and they might then decide to cancel the
migration.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
next prev parent reply other threads:[~2020-12-02 11:52 UTC|newest]
Thread overview: 58+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-18 8:37 [PATCH v2 00/27] Virtio net failover fixes Juan Quintela
2020-11-18 8:37 ` [PATCH v2 01/27] migration: Network Failover can't work with a paused guest Juan Quintela
2020-11-25 12:05 ` Dr. David Alan Gilbert
2020-12-02 10:13 ` Michael S. Tsirkin
2020-12-02 10:27 ` Daniel P. Berrangé
2020-12-02 10:31 ` Michael S. Tsirkin
2020-12-02 10:33 ` Michael S. Tsirkin
2020-12-02 10:51 ` Juan Quintela
2020-12-02 10:55 ` Daniel P. Berrangé
2020-12-02 11:19 ` Michael S. Tsirkin
2020-12-02 11:26 ` Daniel P. Berrangé [this message]
2020-12-02 11:37 ` Michael S. Tsirkin
2020-12-02 12:01 ` Daniel P. Berrangé
2020-12-03 11:21 ` Michael S. Tsirkin
2020-12-03 11:32 ` Daniel P. Berrangé
2020-12-03 11:40 ` Michael S. Tsirkin
2020-12-03 11:43 ` Dr. David Alan Gilbert
2020-12-03 12:02 ` Michael S. Tsirkin
2020-12-03 12:04 ` Dr. David Alan Gilbert
2020-12-03 12:11 ` Michael S. Tsirkin
2020-12-03 12:16 ` Daniel P. Berrangé
2020-12-08 18:48 ` Michael S. Tsirkin
2020-12-03 11:45 ` Daniel P. Berrangé
2020-12-03 12:01 ` Michael S. Tsirkin
2020-12-03 12:06 ` Daniel P. Berrangé
2020-12-03 12:13 ` Michael S. Tsirkin
2020-12-08 18:32 ` Michael S. Tsirkin
2020-12-02 10:34 ` Daniel P. Berrangé
2020-11-18 8:37 ` [PATCH v2 02/27] failover: fix indentantion Juan Quintela
2020-11-18 8:37 ` [PATCH v2 03/27] failover: Use always atomics for primary_should_be_hidden Juan Quintela
2020-11-18 8:37 ` [PATCH v2 04/27] failover: primary bus is only used once, and where it is set Juan Quintela
2020-11-18 8:37 ` [PATCH v2 05/27] failover: Remove unused parameter Juan Quintela
2020-11-18 8:37 ` [PATCH v2 06/27] failover: Remove external partially_hotplugged property Juan Quintela
2020-11-18 8:37 ` [PATCH v2 07/27] failover: qdev_device_add() returns err or dev set Juan Quintela
2020-11-18 8:37 ` [PATCH v2 08/27] failover: Rename bool to failover_primary_hidden Juan Quintela
2020-11-18 8:37 ` [PATCH v2 09/27] failover: g_strcmp0() knows how to handle NULL Juan Quintela
2020-11-18 8:37 ` [PATCH v2 10/27] failover: Remove primary_device_opts Juan Quintela
2020-11-18 8:37 ` [PATCH v2 11/27] failover: remove standby_id variable Juan Quintela
2020-11-18 8:37 ` [PATCH v2 12/27] failover: Remove primary_device_dict Juan Quintela
2020-11-18 8:37 ` [PATCH v2 13/27] failover: Remove memory leak Juan Quintela
2020-11-18 8:37 ` [PATCH v2 14/27] failover: simplify virtio_net_find_primary() Juan Quintela
2020-11-18 8:37 ` [PATCH v2 15/27] failover: should_be_hidden() should take a bool Juan Quintela
2020-11-18 8:37 ` [PATCH v2 16/27] failover: Rename function to hide_device() Juan Quintela
2020-11-18 8:37 ` [PATCH v2 17/27] failover: virtio_net_connect_failover_devices() does nothing Juan Quintela
2020-11-18 8:37 ` [PATCH v2 18/27] failover: Rename to failover_find_primary_device() Juan Quintela
2020-11-18 8:37 ` [PATCH v2 19/27] failover: simplify qdev_device_add() failover case Juan Quintela
2020-11-18 8:37 ` [PATCH v2 20/27] failover: simplify qdev_device_add() Juan Quintela
2020-11-18 8:37 ` [PATCH v2 21/27] failover: make sure that id always exist Juan Quintela
2020-11-18 8:37 ` [PATCH v2 22/27] failover: remove failover_find_primary_device() error parameter Juan Quintela
2020-11-18 8:37 ` [PATCH v2 23/27] failover: split failover_find_primary_device_id() Juan Quintela
2020-11-18 8:37 ` [PATCH v2 24/27] failover: We don't need to cache primary_device_id anymore Juan Quintela
2020-11-18 8:37 ` [PATCH v2 25/27] failover: Caller of this two functions already have primary_dev Juan Quintela
2020-11-18 8:37 ` [PATCH v2 26/27] failover: simplify failover_unplug_primary Juan Quintela
2020-11-18 8:37 ` [PATCH v2 27/27] failover: Remove primary_dev member Juan Quintela
2020-11-18 8:53 ` [PATCH v2 00/27] Virtio net failover fixes Michael S. Tsirkin
2020-12-02 10:16 ` Juan Quintela
2020-12-02 10:30 ` Michael S. Tsirkin
2020-12-02 9:55 ` Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201202112639.GE2360260@redhat.com \
--to=berrange@redhat.com \
--cc=dgilbert@redhat.com \
--cc=ehabkost@redhat.com \
--cc=jasowang@redhat.com \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).