From: Stefan Hajnoczi <stefanha@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtio-dev@lists.oasis-open.org,
	Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Matt Benjamin <mbenjamin@redhat.com>,
	virtualization@lists.linux-foundation.org,
	Christoffer Dall <christoffer.dall@linaro.org>
Subject: Re: virtio-vsock live migration
Date: Tue, 15 Mar 2016 15:10:37 +0000
Message-ID: <20160315151037.GA26263@stefanha-x1.localdomain>
In-Reply-To: <20160311014147-mutt-send-email-mst@redhat.com>


On Fri, Mar 11, 2016 at 01:56:05AM +0200, Michael S. Tsirkin wrote:
> On Thu, Mar 03, 2016 at 03:37:37PM +0000, Stefan Hajnoczi wrote:
> > Michael pointed out that the virtio-vsock draft specification does not
> > address live migration and in fact currently precludes migration.
> > 
> > Migration is fundamental so the device specification at least mustn't
> > preclude it.  Having brainstormed migration with Matthew Benjamin and
> > Michael Tsirkin, I am now summarizing the approach that I want to
> > include in the next draft specification.
> > 
> > Feedback and comments welcome!  In the meantime I will implement this in
> > code and update the draft specification.
> > 
> > 1. Requirements
> > 
> > Virtio-vsock is a new AF_VSOCK transport.  As such, it should provide at
> > least the same guarantees as the existing AF_VSOCK VMCI transport.  This
> > is for consistency and to allow code reuse across any AF_VSOCK
> > transport.
> > 
> > Virtio-vsock aims to replace virtio-serial by providing the same
> > guest/host communication ability but with sockets API semantics that are
> > more popular and convenient for application developers.  Therefore
> > virtio-vsock migration should provide at least the same level of
> > migration functionality as virtio-serial.
> > 
> > Ideally it should be possible to migrate applications using AF_VSOCK
> > together with the virtual machine so that guest<->host communication is
> > not interrupted.  Neither AF_VSOCK VMCI nor virtio-serial supports this
> > today.
> 
> I'm not sure why you say this about virtio-serial.
> It appears that if the host pre-connected to the destination
> qemu before migration, the backend reconnects transparently
> on the destination.

You are right, virtio-serial supports keeping active ports open across
migration (as well as closing active ports across migration).  In
virtio-vsock the equivalent would be CRIU-style socket migration using
getsockopt()/setsockopt(), which is not implemented today.

> > 2. Basic disruptive migration flow
> > 
> > When the virtual machine migrates from the source host to the
> > destination host, the guest's CID may change.  The CID namespace is
> > host-wide
> 
> 
> BTW, I think CIDs would have to become per network namespace.

Yes, I agree.

> > so other hosts may have CID collisions and allocate a new CID
> > for incoming migration VMs.
> 
> I guess all this is so that guest can retrieve its CID and
> send it to host using some side-channel?

Yes.

> > The device notifies the guest that the CID has changed.  Guest sockets
> > are affected as follows:
> > 
> >  * Established connections are reset (ECONNRESET) and the guest
> >    application will have to reconnect.
> > 
> >  * Listen sockets remain open.  The only thing to note is that
> >    connections from the host are now made to the new CID.  This means
> >    the local address of the listen socket is automatically updated to
> >    the new CID.
> > 
> >  * Sockets in other states are unchanged.
> > 
> > Applications must handle disruptive migration by reconnecting if
> > necessary after ECONNRESET.
> > 
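To make the rules in section 2 concrete, here is a minimal sketch of how
a guest could apply them when the CID change notification arrives.  It is
an illustration only, not part of the draft specification: the
GuestSocket model, the State names (including RESET) and on_cid_change()
are hypothetical and do not correspond to kernel structures.

```python
import errno
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class State(Enum):
    LISTEN = auto()
    ESTABLISHED = auto()
    RESET = auto()   # hypothetical: connection torn down by migration
    OTHER = auto()   # any other state, e.g. unbound or connecting


@dataclass
class GuestSocket:
    state: State
    local_cid: int
    local_port: int
    # errno that the application's next syscall on this socket would see
    pending_error: Optional[int] = None


def on_cid_change(sockets: List[GuestSocket], new_cid: int) -> None:
    """Apply the disruptive migration rules to every guest socket."""
    for sock in sockets:
        if sock.state is State.ESTABLISHED:
            # Established connections are reset; the application gets
            # ECONNRESET and has to reconnect.
            sock.state = State.RESET
            sock.pending_error = errno.ECONNRESET
        elif sock.state is State.LISTEN:
            # Listen sockets stay open; only the local address changes.
            sock.local_cid = new_cid
        # Sockets in other states are left unchanged.
```

With this model, a guest that held one established connection and one
listen socket on CID 3 ends up, after on_cid_change(sockets, 7), with the
connection marked ECONNRESET and the listen socket bound to CID 7.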
> > 3. Checkpoint/restore for seamless migration
> > 
> > Applications that wish to communicate across live migration can do so
> > but this requires extra application-specific checkpoint/restore code.
> > 
> > This is similar to the approach taken by the CRIU project where
> > getsockopt()/setsockopt() is used to migrate socket state.  The
> > difference is that the application process is not automatically migrated
> > from the source host to the destination host.  Therefore, the
> > application needs to migrate its own state somehow.
> > 
> > The flow is as follows:
> > 
> > The application on the source host must quiesce (stop sending/receiving)
> > and use getsockopt() to extract socket state information from the host
> > kernel.
> > 
> > A new instance of the application is started on the destination host and
> > given the state so it can restore the connection.  The setsockopt()
> > syscall is used to restore socket state information.
> > 
> > The guest is given a list of <host_old_cid, host_new_cid, host_port,
> > guest_port> tuples for established connections that must not be reset
> > when the guest CID update notification is received.  These connections
> > will carry on as if nothing changed.
> > 
> > Note that the connection's remote address is updated from host_old_cid
> > to host_new_cid.  This allows remapping of CIDs (if necessary).
> > Typically this will be unused because the host always has well-known CID
> > 2.  In a guest<->guest scenario it may be used to remap CIDs.
> > 
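The tuple list in section 3 can be sketched as a lookup performed when
the guest CID update notification is processed.  This is only an
illustration under assumptions: the Preserve and Connection types and
apply_preserve_list() are hypothetical names, and how the list reaches
the guest is not modelled here.

```python
from typing import List, NamedTuple, Tuple


class Preserve(NamedTuple):
    """One <host_old_cid, host_new_cid, host_port, guest_port> tuple."""
    host_old_cid: int
    host_new_cid: int
    host_port: int
    guest_port: int


class Connection(NamedTuple):
    """An established guest connection, seen from the guest side."""
    remote_cid: int
    remote_port: int
    local_port: int


def apply_preserve_list(
    connections: List[Connection], preserve: List[Preserve]
) -> Tuple[List[Connection], List[Connection]]:
    """Split connections into (kept, reset) on a CID update."""
    # Map each protected connection to the CID its peer will have afterwards.
    remap = {
        (p.host_old_cid, p.host_port, p.guest_port): p.host_new_cid
        for p in preserve
    }
    kept, reset = [], []
    for conn in connections:
        key = (conn.remote_cid, conn.remote_port, conn.local_port)
        if key in remap:
            # Listed connections carry on, with the remote CID remapped.
            kept.append(conn._replace(remote_cid=remap[key]))
        else:
            # Everything else follows the disruptive flow (ECONNRESET).
            reset.append(conn)
    return kept, reset
```

In the common guest<->host case host_old_cid and host_new_cid are both 2,
so the remap is a no-op and the tuple only serves to exempt the
connection from being reset.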
> > 
> > For the time being I am focussing on the basic disruptive migration flow
> > only.  Checkpoint/restore can be added with a feature bit in the future.
> > It is a lot more complex and I'm not sure whether there will be any
> > users yet.
> > 
> > Stefan
> 
> This makes some things harder. For example, imagine a guest
> reboot mixed with migration. We don't know why the connection
> died, so we'll retry connections until when?
> 
> Could you please describe some user of vsock and show how
> it recovers from destructive migration?

qemu-guest-agent runs inside the guest with an AF_VSOCK listen socket.

libvirt arbitrates the qemu-guest-agent connection and provides an API
for applications to send commands.

When an application sends a command, libvirt checks if the connection to
qemu-guest-agent is established.  If there is no connection libvirt will
attempt to connect.

The command is sent to qemu-guest-agent and the response is handed back
to the application.  libvirt arbitrates access so commands from
multiple applications are serialized.

Live migration resets the established connection between
qemu-guest-agent and the source host's libvirt daemon.  When an
application issues the next qemu-guest-agent command the libvirt daemon
on the destination host notices there is no established connection yet
and starts a new one.

Libvirt refuses to send qemu-guest-agent commands while live migration
is in progress.
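
The connect-on-demand pattern described above can be sketched roughly as
follows.  This is not libvirt code: AgentChannel, its migrating flag and
the single recv() used as message framing are assumptions made for the
example.  Only socket.AF_VSOCK and the (cid, port) address form come from
the Python standard library on Linux.

```python
import socket
import threading


def vsock_connect(cid: int, port: int) -> socket.socket:
    """Open an AF_VSOCK stream connection (Linux only)."""
    sock = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    sock.connect((cid, port))
    return sock


class AgentChannel:
    """Serializes commands and reconnects lazily after a reset."""

    def __init__(self, connect):
        self._connect = connect   # callable returning a connected socket
        self._sock = None         # no connection until the first command
        self._lock = threading.Lock()
        self.migrating = False    # hypothetical flag set by the manager

    def command(self, request: bytes) -> bytes:
        with self._lock:          # one command at a time
            if self.migrating:
                raise RuntimeError("migration in progress")
            if self._sock is None:
                # No established connection yet: start a new one.
                self._sock = self._connect()
            try:
                self._sock.sendall(request)
                reply = self._sock.recv(4096)
            except ConnectionResetError:
                reply = b""
            if not reply:
                # Connection was reset or closed; drop it so that the
                # next command reconnects.
                self._sock.close()
                self._sock = None
                raise ConnectionResetError("agent connection lost")
            return reply
```

A caller would create the channel with something like
AgentChannel(lambda: vsock_connect(guest_cid, agent_port)), where
guest_cid and agent_port are placeholders, and simply reissue a command
that failed with ConnectionResetError.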

Stefan
