From: "Daniel P. Berrangé" <berrange@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Leonardo Bras Soares Passos <lsoaresp@redhat.com>,
	qemu-devel@nongnu.org, Peter Xu <peterx@redhat.com>,
	Juan Quintela <quintela@redhat.com>
Subject: Re: Time to introduce a migration protocol negotiation (Re: [PATCH v2 00/25] migration: Postcopy Preemption)
Date: Tue, 15 Mar 2022 11:05:51 +0000
Message-ID: <YjBzD4V3iG4EMjTU@redhat.com>
In-Reply-To: <YjBt4XqD1bg/JJx1@work-vm>

On Tue, Mar 15, 2022 at 10:43:45AM +0000, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > Almost every time we add a new feature to migration, we end up
> > having to define at least one new migration parameter, then wire
> > it up in libvirt, and then the mgmt app too, often needing to
> > ensure it is turned on for both client and server at the same time.
> > 
> > 
> > For some features, requiring an explicit opt-in could make sense,
> > because we don't know for sure that the feature is always a benefit.
> > These are things that can be thought of as workload sensitive
> > tunables.
> > 
> > 
> > For other features though, it feels like we would be better off if
> > we could turn it on by default with no config. These are things
> > that can be thought of as migration infrastructure / transport
> > architectural designs.
> > 
> > 
> > eg it would be nice to be able to use multifd by default for
> > migration. We would still want a tunable to control the number
> > of channels, but we ought to be able to just start with a default
> > number of channels automatically, so the tunable is only needed
> > for special cases.
> 
> Right, I agree in part - but we do need those tunables to exist; we rely
> on being able to turn things on or off, or play with the tunables
> to debug and get performance.  We need libvirt to enumerate the tunables
> from qemu rather than having to add code to libvirt every time.
> They're all in QAPI definitions anyway - libvirt really shouldn't be
> adding code each time.   Then we could have a  virsh migrate --tunable
> rather than having loads of extra options which all have different names
> from qemu's name for the same feature.

Provided tunables are strictly just tunables, that would be viable.
Right now our tunables are a mixture of tunables and low level
data transport architectural knobs.

> > This post-copy is another case.  We should start off knowing
> > we can switch to post-copy at any time. We should further be
> > able to add pre-emption if we find it available. IOW, we should
> > not have required anything more than 'switch to post-copy' to
> > be exposed to mgmt apps.
> 
> Some of these things are tricky; for example knowing whether or not you
> can do postcopy depends on your exact memory configuration; some of that
> is tricky to probe.

I'm just referring to the postcopy capability that we need to
set upfront before starting the migration on both sides.  IIUC
that should be possible for QEMU to automatically figure out,
if it could negotiate with dst QEMU.

Whether we ever switch from precopy to postcopy mode once
running can remain under mgmt app control.

> > Or enabling zero copy on either send or receive side.
> > 
> > Or enabling kernel-TLS offload
> 
> Will kernel-TLS be something you'd want to automatically turn on?
> We don't know yet whether it's a good idea if you don't have hardware
> support.

I'm pretty sure kTLS will always be a benefit, because even without
hardware offload you still benefit from getting the TLS encryption
onto a separate CPU core from QEMU's migration thread. We've measured
this already with NBD and I've no reason to suspect it will differ
for migration. 


> > Now define a protocol handshake. A 5 minute thought experiment
> > starts off with something simple:
> > 
> >    dst -> src:  Greeting Message:
> >                   Magic: "QEMU-MIGRATE"  12 bytes
> >                   Num Versions: 1 byte
> >                   Version list: 1 byte * num versions
> >                   Num features: 4 bytes
> >                   Feature list: string * num features
> > 
> >    src -> dst:  Greeting Reply:
> >                   Magic: "QEMU-MIGRATE" 12 bytes
> >                   Select version: 1 byte
> >                   Num select features: 4 bytes
> >                   Selected features: string * num features   
> > 
> >    .... possibly more src <-> dst messages depending on
> >         features negotiated....
> > 
> >    src -> dst:  start migration
> >  
> >     ...traditional migration stream runs now for the remainder
> >        of this connection ...
> 
> Don't worry about designing the bytes; we already have a command
> structure; we just need to add a MIG_CMD_FEATURES and a 
> MIG_RP_MSG_FEATURES
> (I'm not sure what we need to do for RDMA; or what we do for exec: or
> savevm)
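
For anyone who wants to play with the greeting exchange quoted above,
here is a minimal Python sketch of it. The field order and sizes follow
the thought experiment; everything else is an assumption on my part:
big-endian integers, each feature "string" encoded as a 1-byte length
plus UTF-8 bytes, and the source picking the highest common version.
The helper names and the feature names in the usage note are purely
illustrative, not anything QEMU defines.

```python
import struct

MAGIC = b"QEMU-MIGRATE"  # 12 bytes, as in the quoted layout


def _pack_strings(items):
    # Assumption: each string is a 1-byte length followed by UTF-8 bytes.
    out = b""
    for s in items:
        raw = s.encode("utf-8")
        out += struct.pack("!B", len(raw)) + raw
    return out


def _unpack_strings(buf, off, count):
    items = []
    for _ in range(count):
        (n,) = struct.unpack_from("!B", buf, off)
        off += 1
        items.append(buf[off:off + n].decode("utf-8"))
        off += n
    return items, off


def pack_greeting(versions, features):
    """dst -> src: magic, version list, feature list."""
    return (MAGIC
            + struct.pack("!B", len(versions)) + bytes(versions)
            + struct.pack("!I", len(features)) + _pack_strings(features))


def unpack_greeting(buf):
    if buf[:12] != MAGIC:
        raise ValueError("bad magic")
    off = 12
    (nver,) = struct.unpack_from("!B", buf, off)
    off += 1
    versions = list(buf[off:off + nver])
    off += nver
    (nfeat,) = struct.unpack_from("!I", buf, off)
    off += 4
    features, _ = _unpack_strings(buf, off, nfeat)
    return versions, features


def choose_reply(src_versions, src_features, greeting):
    """src -> dst: select a version and the features both sides support."""
    dst_versions, dst_features = unpack_greeting(greeting)
    common = set(src_versions) & set(dst_versions)
    if not common:
        raise ValueError("no common protocol version")
    version = max(common)  # assumption: prefer the highest shared version
    selected = [f for f in src_features if f in dst_features]
    reply = (MAGIC + struct.pack("!B", version)
             + struct.pack("!I", len(selected)) + _pack_strings(selected))
    return version, selected, reply
```

With a destination greeting of pack_greeting([1, 2], ["multifd",
"postcopy"]) and a source that supports version 1 and the features
"postcopy" and "ktls", choose_reply() selects version 1 and only
"postcopy", so neither side needs the mgmt app to pre-agree on it.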

For RDMA there are two options

 - Drop RDMA support (preferred ;-)

 - Use a regular TCP channel for the migration protocol
   handshake to do all the feature negotiation.  Open a
   second channel using RDMA just for the migration payload

Before considering "exec", let's think about "fd" as that's more
critical.

How can we get an arbitrary number of bi-directional channels
open when the user is passing in pre-opened FDs individually and
does not know upfront how many QEMU wants?

We could have an event that QEMU emits whenever it wants to be
given a new "fd" channel. The mgmt app would watch for that and
pass in more pre-opened FDs in response. Not too difficult.
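
A rough sketch of the mgmt app side of that, in Python. The event name
MIGRATION_CHANNEL_NEEDED is hypothetical (no such event exists in QEMU
today), as is the helper name; the only real piece is the existing QMP
'getfd' command, which takes an 'fdname' argument and expects the FD
itself to arrive as SCM_RIGHTS ancillary data on the monitor socket.

```python
import json

# Hypothetical event name, used only for illustration.
CHANNEL_EVENT = "MIGRATION_CHANNEL_NEEDED"


def handle_qmp_message(line, next_fd_name):
    """Return the QMP command to send in response, or None.

    `line` is one JSON message read from the QMP monitor socket.
    """
    msg = json.loads(line)
    if msg.get("event") != CHANNEL_EVENT:
        return None
    # 'getfd' is an existing QMP command; the pre-opened FD must be sent
    # as SCM_RIGHTS ancillary data together with this command.
    return {"execute": "getfd", "arguments": {"fdname": next_fd_name}}
```

The mgmt app would call this for every message it reads from the
monitor and, when it gets a command back, send it along with a freshly
opened FD.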

Back to "exec", we have two options

 - Drop exec support, and just let the user spawn the
   program externally and pass in pre-opened socket
   FDs for talking to it

 - Keep exec and make it use a socketpair instead of
   pipe FDs. Connect the socketpair to both stdin+stdout.
   Exec the program many times if needing many channels.
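
To illustrate the second option, here is a small Python sketch (POSIX
only) of spawning a program with one end of a socketpair wired to both
its stdin and stdout, once per channel. This is not how QEMU's exec:
transport is implemented; spawn_channel() and spawn_channels() are
names made up for the example.

```python
import socket
import subprocess
import sys


def spawn_channel(argv):
    """Exec `argv` with one end of a socketpair as both stdin and stdout.

    Returns (process, parent_socket); the parent end is bi-directional,
    unlike a pair of pipe FDs.
    """
    parent, child = socket.socketpair()
    proc = subprocess.Popen(argv, stdin=child, stdout=child)
    child.close()  # the spawned program holds its own copy
    return proc, parent


def spawn_channels(argv, count):
    # One exec per channel, for the case where many channels are needed.
    return [spawn_channel(argv) for _ in range(count)]
```

The caller can then read and write each returned socket exactly as it
would a TCP channel, which is what makes a handshake possible over exec.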


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


