From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:60844)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1axwDT-0003ok-15
	for qemu-devel@nongnu.org; Wed, 04 May 2016 08:47:53 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1axwD9-00086K-7o
	for qemu-devel@nongnu.org; Wed, 04 May 2016 08:47:41 -0400
Received: from mx1.redhat.com ([209.132.183.28]:44430)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1axwD9-00083V-1J
	for qemu-devel@nongnu.org; Wed, 04 May 2016 08:47:27 -0400
Received: from int-mx10.intmail.prod.int.phx2.redhat.com
	(int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id AB7F0804E5
	for <qemu-devel@nongnu.org>; Wed,  4 May 2016 12:47:15 +0000 (UTC)
Date: Wed, 4 May 2016 13:47:12 +0100
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Message-ID: <20160504124711.GG2302@work-vm>
References: <87oa8mf4sj.fsf@emacs.mitica>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87oa8mf4sj.fsf@emacs.mitica>
Subject: Re: [Qemu-devel] Migration ToDo list (a.k.a. Rant)
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Juan Quintela <quintela@redhat.com>
Cc: QEMU Developer <qemu-devel@nongnu.org>

* Juan Quintela (quintela@redhat.com) wrote:
> 
> Hi
> 
> I am often asked what the ToDo list for migration is.  It has lived in
> my head and on random notes over my desk, so here is an attempt at some
> organization (yes, I will put this in the wiki).

Let me add:
  Getting everything to use VMState;  I intend to try and fix virtio to use
VMState as much as possible.

And yes, a wiki entry would be good; then people might notice it and fix things
for us :-)

> - migration thread on reception
>   would make it trivial to do other things while receiving, and would
>   also make postcopy easier (I was going to write "much easier", but
>   postcopy is never easy).

I don't think it makes much difference to postcopy.

> - migration capabilities and parameters
>   This is a mess.  No, it is worse than that.  I don't know who is to
>   blame here, but something needs to be done:
> 
>      void qmp_migrate_set_parameters(bool has_compress_level,
>                                 int64_t compress_level,
>                                 bool has_compress_threads,
>                                 int64_t compress_threads,
>                                 bool has_decompress_threads,
>                                 int64_t decompress_threads,
>                                 bool has_x_cpu_throttle_initial,
>                                 int64_t x_cpu_throttle_initial,
>                                 bool has_x_cpu_throttle_increment,
>                                 int64_t x_cpu_throttle_increment,
>                                 bool has_multifd_threads,
>                                 int64_t multifd_threads,
>                                 Error **errp)
> 
> 
> 
>     Can we move this to an array of structs, please, pretty please?
>     I think that for this one, the blame is on QMP.

Yes; zhanghailiang had a patch to try and help with that, and there was
some discussion at about the same time (June last year?!).
That function is VERY delicate; the arguments are all bool/int64_t pairs,
so if you screw up and get them in the wrong order the compiler won't
complain and everything will appear to be just fine...
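A rough sketch of the array-of-structs idea, to show why it removes the
ordering hazard: each parameter is looked up by name, so nothing depends
on argument position.  The MigParam type and mig_param_* helpers are
hypothetical, and the bounds and defaults below are placeholders, not
QEMU's real limits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry per tunable; replaces a has_foo/foo argument pair. */
typedef struct {
    const char *name;   /* user-visible parameter name */
    int64_t min, max;   /* placeholder bounds, NOT QEMU's real limits */
    int64_t value;      /* current value (placeholder default) */
    bool set;           /* plays the role of the has_* flag */
} MigParam;

static MigParam mig_params[] = {
    { "compress-level",           0,   9,  1, false },
    { "compress-threads",         1, 255,  8, false },
    { "decompress-threads",       1, 255,  2, false },
    { "x-cpu-throttle-initial",   1,  99, 20, false },
    { "x-cpu-throttle-increment", 1,  99, 10, false },
    { "multifd-threads",          1, 255,  2, false },
};

#define N_MIG_PARAMS (sizeof(mig_params) / sizeof(mig_params[0]))

/* Find a parameter by name; NULL if there is no such parameter. */
static MigParam *mig_param_find(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < N_MIG_PARAMS; i++) {
        if (strcmp(mig_params[i].name, name) == 0) {
            return &mig_params[i];
        }
    }
    return NULL;
}

/* Set one parameter; 0 on success, -1 if unknown or out of range. */
static int mig_param_set(const char *name, int64_t value)
{
    MigParam *p = mig_param_find(name);

    if (!p || value < p->min || value > p->max) {
        return -1;
    }
    p->value = value;
    p->set = true;
    return 0;
}
```

Adding a parameter then means adding one table row rather than two more
positional arguments to every caller.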

> - info migrate
>   This deserves its own item.  Let's look at a typical output:
> 
> (qemu) info migrate
> 
> capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-multifd: on 
> 
>    Aha, we have the capabilities, but not the parameters.  This is
>    historical, I know, but they don't belong here.

Well, for the HMP version we can fix any of this without a problem, IMHO;
let's add more detail/fix names/etc.

> And we still have more optional information that appears if we are doing
> block migration, xbzrle, compression, rdma, etc, etc.
> 
> We also need to decide on units internally.  Some things are in bytes,
> some are in kilobytes, some are in pages.  Some are in host pages, or
> guest pages, or who knows :-(

I don't know either; every time I look at some of it I end up going back
to the source.
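One possible convention, sketched under the assumption that every internal
counter is kept in bytes and the page size is always passed explicitly at
the conversion point.  The Bytes/Pages wrapper structs are hypothetical;
their only job is to make the compiler reject accidental mixing of units.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper types: distinct structs cannot be mixed silently. */
typedef struct { uint64_t n; } Bytes;  /* a size in bytes */
typedef struct { uint64_t n; } Pages;  /* a count of pages of an explicit size */

static int is_pow2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* The caller decides whether page_size is the host or the guest page size. */
static Bytes pages_to_bytes(Pages p, uint64_t page_size)
{
    assert(is_pow2(page_size));
    return (Bytes){ p.n * page_size };
}

/* Round up: a partial page still has to be sent. */
static Pages bytes_to_pages(Bytes b, uint64_t page_size)
{
    assert(is_pow2(page_size));
    return (Pages){ (b.n + page_size - 1) / page_size };
}

/* Convert to KiB only at the edge, e.g. when printing for the user. */
static uint64_t bytes_to_kib(Bytes b)
{
    return b.n / 1024;
}
```

With this, "3 host pages" on a 4 KiB host and "3 guest pages" on a 64 KiB
guest can only become bytes through a call that names the page size.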

> - Block migration (the migration/block.c one).  This is the bastard
>   child of migration.  It is much less tested, and we should decide
>   whether to keep it alive or deprecate it.  Things needed, from memory:
>      - functions should return the same values as ram.c;
>        some functions don't return "exact" values, and return 1 when
>        more than one block is dirty, etc, etc
>      - if we continue maintaining it, should we allow it to have _some_
>        shared devices and some non-shared ones, instead of all-or-nothing?

My vague understanding was that there were still configurations that were
only usable with block migration; mostly those things that only wanted
a single socket because they wanted to tunnel it;  this might change with
Dan's TLS setup.
Having said that, I don't understand all of the block migration alternatives.

> - RDMA: Another stepchild
> 
>   This is really, really weird.  We don't use the normal infrastructure
>   for RDMA, we use the ram_control_* stuff.  We should really move to
>   use the normal stuff here.

I'm not sure that's possible - while the RDMA code is huge and horribly
complex, some of that is just down to the kernel APIs and standards it
has to deal with; it might be possible to glue it into ram.c better
but I wouldn't bet on it.

> - autoconverge code:  This could be used outside of migration (i.e. just
>   to slow down a guest).  We should really do some measurement here to
>   see how useful it is for migration.  If the guest is dirtying lots of
>   memory, we end up having to throttle the guest by 90% or so :-(

Dan is doing some, I think.  The other question is how it compares to using
an external, cgroup-based way of forcing convergence (which I think is what
oVirt does).
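For anyone measuring this, a small model of the escalation driven by the
x_cpu_throttle_initial and x_cpu_throttle_increment parameters: start at
the initial percentage, then add the increment on every round that still
fails to converge.  The clamp at 99% and the function name are assumptions
made for the sketch, not a copy of QEMU's code.

```c
#include <assert.h>
#include <stdint.h>

#define THROTTLE_PCT_MAX 99  /* assumed upper clamp */

/*
 * Throttle percentage for the next round.  current == 0 means throttling
 * has not started yet, so we jump straight to the initial value.
 */
static int64_t next_throttle_pct(int64_t current, int64_t initial,
                                 int64_t increment)
{
    int64_t next = (current == 0) ? initial : current + increment;

    return next > THROTTLE_PCT_MAX ? THROTTLE_PCT_MAX : next;
}
```

With initial=20 and increment=10, a guest that keeps dirtying memory
reaches 90% after eight rounds, which matches the "90% or so" above;
logging this value per round next to the dirty rate would be a cheap
first measurement.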

> - xbzrle.  We only have one cache, we should decide how to work with
>   this for multithread/compression.
> 
> - When we do migration, we have spaghetti code to decide if:
>   * it is a zero page
>   * it is a duplicated page
>   * it is a xbzrle page
>   * it is a compressed page
>   And as the code is written, it is not trivial to add new "options".  I
>   think that we should "re-think" which combinations are allowed and which
>   ones make no sense.

Yeh, and find a way to express to libvirt what combinations are legal.
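A sketch of what a single decision point could look like, assuming the
precedence zero page, then xbzrle (only if an old copy is cached), then
compression, then a plain page.  That ordering, the enum and the
EncodingCaps struct are all hypothetical; a duplicate-page check would slot
in as another early branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    PAGE_ZERO,        /* send only a marker */
    PAGE_XBZRLE,      /* delta against the cached copy */
    PAGE_COMPRESSED,  /* compressed full page */
    PAGE_NORMAL       /* raw full page */
} PageEncoding;

/* Which capabilities are enabled (hypothetical). */
typedef struct {
    bool xbzrle;
    bool compress;
} EncodingCaps;

static bool page_is_zero(const uint8_t *page, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/* One place that decides how a page goes on the wire. */
static PageEncoding classify_page(const uint8_t *page, size_t len,
                                  EncodingCaps caps, bool in_xbzrle_cache)
{
    assert(page != NULL);
    if (page_is_zero(page, len)) {
        return PAGE_ZERO;
    }
    if (caps.xbzrle && in_xbzrle_cache) {
        return PAGE_XBZRLE;
    }
    if (caps.compress) {
        return PAGE_COMPRESSED;
    }
    return PAGE_NORMAL;
}
```

Having the precedence in one function would also give an obvious place to
document, and report to libvirt, which capability combinations are legal.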

> - savevm and migration: they use two different paths for no really good
>   reason.  We should really abstract this into a single code path.
>   We always forget the savevm one when we make changes.
> 
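The shape of a single path, sketched under the assumption that savevm and
migration can be made to differ only in where the bytes end up: one save
routine, parameterised by a sink.  The Sink, MemBuf and save_state names
are hypothetical and much simpler than QEMUFile and the channel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sink: the only part that differs between the two users. */
typedef struct Sink {
    size_t (*write)(struct Sink *s, const void *buf, size_t len);
    void *opaque;
} Sink;

/* In-memory sink, handy for tests. */
typedef struct {
    char data[256];
    size_t used;
} MemBuf;

static size_t mem_write(Sink *s, const void *buf, size_t len)
{
    MemBuf *m = s->opaque;

    if (len > sizeof(m->data) - m->used) {
        return 0;  /* no room: report a short write */
    }
    memcpy(m->data + m->used, buf, len);
    m->used += len;
    return len;
}

/* stdio sink, standing in for a savevm file or a migration stream. */
static size_t file_write(Sink *s, const void *buf, size_t len)
{
    return fwrite(buf, 1, len, (FILE *)s->opaque);
}

/* The single save path: 0 on success, -1 on a short write. */
static int save_state(Sink *s, const char *const *sections, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(sections[i]);

        if (s->write(s, sections[i], len) != len) {
            return -1;
        }
    }
    return 0;
}
```

A change to save_state() then reaches both users automatically, which is
the point of the item above.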
> - error handling.  Every function should return an error.  Every
>   function should return an error.

Yeh.
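The pattern, roughly: every function returns a status and fills in an
error object for the caller instead of carrying on.  Err and err_set()
below are a minimal stand-in written for this sketch, not QEMU's Error API.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for an error object (NOT QEMU's Error). */
typedef struct {
    char msg[128];
} Err;

/* Record an error unless the caller passed NULL or one is already set. */
static void err_set(Err **errp, const char *fmt, ...)
{
    if (!errp || *errp) {
        return;  /* caller doesn't care, or the first error wins */
    }
    Err *e = malloc(sizeof(*e));
    if (!e) {
        return;
    }
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(e->msg, sizeof(e->msg), fmt, ap);
    va_end(ap);
    *errp = e;
}

static int load_section(int id, Err **errp)
{
    if (id < 0) {
        err_set(errp, "invalid section id %d", id);
        return -1;
    }
    return 0;
}

/* Stop at the first failure and propagate it; never continue silently. */
static int load_all(const int *ids, int n, Err **errp)
{
    for (int i = 0; i < n; i++) {
        if (load_section(ids[i], errp) < 0) {
            return -1;
        }
    }
    return 0;
}
```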

> - qemu_get_buffer() doesn't return an error if there is nothing to read,
>   sniff.
> 
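What the caller actually wants is "read exactly this much or fail".  A
sketch on top of plain read(2), not QEMUFile; read_exact() and its return
codes are hypothetical, with premature EOF reported separately from a
read error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/*
 * Read exactly len bytes from fd.
 * Returns 0 on success, -1 on a read error (errno set by read),
 * -2 on EOF before len bytes arrived.
 */
static int read_exact(int fd, void *buf, size_t len)
{
    uint8_t *p = buf;

    while (len > 0) {
        ssize_t r = read(fd, p, len);

        if (r < 0) {
            if (errno == EINTR) {
                continue;  /* interrupted: retry */
            }
            return -1;
        }
        if (r == 0) {
            return -2;     /* stream ended early: an error, not "done" */
        }
        p += r;
        len -= (size_t)r;
    }
    return 0;
}
```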
> - Multipage support: Welcome to the 21st century.  Now almost all
>   architectures have HugePages.  And others have different sized pages
>   (on PPC it is not unusual for the host and guest page sizes to differ).
>   We have work to do here.  For starters, sending huge pages as one chunk
>   will make TransparentHugePages happier.

Yeh, Andrea has pushed me about this a bit; the only problem I have
here is with postcopy, where a page request getting stuck behind a huge
page request would do nasty things to the latency - but your multifd might
fix that.
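Sending a huge page as one chunk mostly comes down to aligning the
requested offset down to the backing page size.  A sketch, assuming the
backing page size is known per block and is a power of two; Chunk and
chunk_for_offset() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t start;  /* offset of the first byte to send */
    uint64_t len;    /* bytes to send as one unit */
} Chunk;

/* The whole host page containing 'offset'. */
static Chunk chunk_for_offset(uint64_t offset, uint64_t host_page_size)
{
    assert(host_page_size != 0 &&
           (host_page_size & (host_page_size - 1)) == 0);
    Chunk c = { offset & ~(host_page_size - 1), host_page_size };
    return c;
}
```

It also puts a number on the postcopy worry: a 2 MiB chunk is about 16.8
Mbit, so a 4 KiB request queued behind it waits roughly 1.7 ms on a
10 Gbit/s link, before any protocol overhead.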

> - Bitmaps.  Related to the previous one.  We should really be better about
>   walking them and about synchronising them between qemu/kernel.

Oh yes, they're a nightmare on things with different page sizes; especially
when people worry that the source and destination might have different host
page sizes.
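One piece of the different-page-size problem is converting a dirty bitmap
between granularities, e.g. 4 KiB on one side and 64 KiB on the other.
Going to the coarser size has to be conservative: a large page is dirty if
any small page inside it is.  The bit_* helpers and fold_dirty_bitmap()
are hypothetical, not QEMU's bitops.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

static int bit_test(const unsigned long *map, size_t nr)
{
    return (int)((map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL);
}

static void bit_set(unsigned long *map, size_t nr)
{
    map[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

/*
 * Fold a bitmap of src_pages bits at 'small' granularity into 'dst' at
 * 'large' granularity.  'large' must be a multiple of 'small'; dst must be
 * zeroed by the caller.
 */
static void fold_dirty_bitmap(const unsigned long *src, size_t src_pages,
                              size_t small, size_t large, unsigned long *dst)
{
    assert(small != 0 && large % small == 0);
    size_t ratio = large / small;

    for (size_t i = 0; i < src_pages; i++) {
        if (bit_test(src, i)) {
            bit_set(dst, i / ratio);
        }
    }
}
```

The opposite direction is simpler but more wasteful: one dirty large page
marks every small page inside it.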

> - COLO: We need to integrate it.
> 
> I will continue the rant at some other point O:-)  Just now I need to
> leave for the bar.

One that's related to that is the big lock around the last stage of migration;
we really could do with being able to recover from a migrate that hangs during
the final stage due to a block-IO or network issue.

> Thanks for your attention, Juan.
> 
> PS.  While writing this I looked at Daniel's channel code; it is a
> step in the right direction.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK