From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Juan Quintela <quintela@redhat.com>
Cc: QEMU Developer <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] Migration ToDo list (a.k.a. Rant)
Date: Wed, 4 May 2016 13:47:12 +0100
Message-ID: <20160504124711.GG2302@work-vm>
In-Reply-To: <87oa8mf4sj.fsf@emacs.mitica>
* Juan Quintela (quintela@redhat.com) wrote:
>
> Hi
>
> I am often asked what the ToDo list for migration is. It was in my head
> and in random notes on my desk, so I am trying some organization (yes, I
> will put this in the wiki).
Let me add:
Getting everything to use VMState; I intend to try and fix virtio to use
VMState as much as possible.
And yes, a wiki entry would be good; then people might notice it and fix things
for us :-)
> - migration thread on reception
> would make it trivial to do other things while receiving, and would make
> postcopy easier too (I was going to put much easier, but postcopy is
> never easy).
I don't think it makes much difference to postcopy.
> - migration capabilities and parameters
> this is a mess. No, it is worse than that. I don't know who is to
> blame here, but something needs to be done:
>
> void qmp_migrate_set_parameters(bool has_compress_level,
>                                 int64_t compress_level,
>                                 bool has_compress_threads,
>                                 int64_t compress_threads,
>                                 bool has_decompress_threads,
>                                 int64_t decompress_threads,
>                                 bool has_x_cpu_throttle_initial,
>                                 int64_t x_cpu_throttle_initial,
>                                 bool has_x_cpu_throttle_increment,
>                                 int64_t x_cpu_throttle_increment,
>                                 bool has_multifd_threads,
>                                 int64_t multifd_threads,
>                                 Error **errp)
>
>
>
> Can we move this to an array of structs, please, pretty please?
> I think that for this one, the blame is on qmp
Yes; zhanghailiang had a patch to try and help that and there was
some discussion at about the same time (June last year?!)
That function is VERY delicate; if you screw up and get those in the
wrong order then everything will appear to be just fine....
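To illustrate the hazard: every value in that signature is an int64_t and every
flag is a bool, so swapping two arguments still compiles. Below is a minimal
sketch of the "array of structs" idea, where each update names its parameter.
All names here (ParamId, ParamUpdate, ParamState, param_apply) are hypothetical
and are not QEMU's actual types or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter IDs; NOT QEMU's real enum. */
typedef enum {
    PARAM_COMPRESS_LEVEL,
    PARAM_COMPRESS_THREADS,
    PARAM_DECOMPRESS_THREADS,
    PARAM__MAX
} ParamId;

/* One named update: a single element of the "array of structs". */
typedef struct {
    ParamId id;
    int64_t value;
} ParamUpdate;

/* Hypothetical current settings, indexed by ParamId. */
typedef struct {
    int64_t values[PARAM__MAX];
} ParamState;

/*
 * Apply n updates. Returns 0 on success, or -1 if any id is out of
 * range, in which case *st is left unchanged.
 */
static int param_apply(ParamState *st, const ParamUpdate *upd, size_t n)
{
    ParamState tmp;

    assert(st != NULL && (upd != NULL || n == 0));
    tmp = *st;
    for (size_t i = 0; i < n; i++) {
        if ((unsigned)upd[i].id >= (unsigned)PARAM__MAX) {
            return -1;
        }
        tmp.values[upd[i].id] = upd[i].value;
    }
    *st = tmp;
    return 0;
}
```

A caller would write { .id = PARAM_COMPRESS_LEVEL, .value = 9 }, so the order
of the updates no longer matters and absent parameters are simply not listed.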
> - info migrate
> This deserves its own item. Let's see a typical output:
>
> (qemu)info migrate
>
> capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-multifd: on
>
> Aha, we have the capabilities, but not the parameters. This is
> historical, I know, but they don't belong here.
Well, for the HMP version we can fix any of this IMHO without a problem;
let's add more detail/fix names/etc.
> And we still have more optional information that appears if we are doing
> block migration, xbzrle, compression, rdma, etc, etc.
>
> We also need to decide on some internal units. Some things are in bytes,
> some are in kilobytes, some are in pages. Some are in host pages, or
> guest pages, or who knows :-(
I don't - every time I look at some of it I end up going back to the source.
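One way to stop having to go back to the source is to carry the unit in the
type. This is only a sketch under my own assumptions: ByteCount, PageCount and
the helpers are hypothetical names, not anything in the tree, and a page count
always carries the page size it was measured in.

```c
#include <assert.h>
#include <stdint.h>

/* A size in bytes; hypothetical wrapper so it cannot be mixed with pages. */
typedef struct {
    uint64_t bytes;
} ByteCount;

/* A number of pages together with the page size they were counted in. */
typedef struct {
    uint64_t pages;
    uint64_t page_size; /* bytes per page, e.g. guest or host page size */
} PageCount;

/* Convert a page count to bytes using its own page size. */
static ByteCount pages_to_bytes(PageCount p)
{
    assert(p.page_size != 0);
    return (ByteCount){ p.pages * p.page_size };
}

/* Whole kibibytes, rounding down. */
static uint64_t bytes_to_kib(ByteCount b)
{
    return b.bytes / 1024;
}

/* Pages of the given size needed to cover b, rounding up. */
static PageCount bytes_to_pages(ByteCount b, uint64_t page_size)
{
    assert(page_size != 0);
    return (PageCount){ (b.bytes + page_size - 1) / page_size, page_size };
}
```

With this, passing a guest page count where a byte count is expected is a
compile error rather than a silently wrong statistic.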
> - Block migration (the migration/block.c one). This is the bastard
> child of migration. Much less tested, we should make a decision
> about letting it live or deprecating it. Things needed from memory:
> - functions should return the same values as ram.c
> some functions don't have "exact" values, and return 1 when there
> is more than one block dirty, etc, etc
> - if we continue maintaining it, allow it to have _some_ shared
> devices and some non-shared ones, instead of everything?
My vague understanding was that there were still configurations that were
only useable with block migration; mostly those things that only wanted
a single socket because they wanted to tunnel it; this might change with
Dan's TLS setup.
Having said that, I don't understand all of the block migration alternatives.
> - RDMA: Another step child
>
> This is really, really weird. We don't use the normal infrastructure
> for RDMA, we use the ram_control_* stuff. We should really move to
> use the normal stuff here.
I'm not sure that's possible - while the RDMA code is huge and horribly
complex, some of that is just down to the kernel APIs and standards it
has to deal with; it might be possible to glue it into ram.c better
but I wouldn't bet on it.
> - autoconverge code: This could be used outside of migration (i.e. just
> to slow down a guest). We should really do some measurement here to
> see how useful it is for migration. If the guest is doing lots of
> memory dirtying, we end up having to throttle the guest 90% or so :-(
Dan's doing some, I think. The other question is how it compares to using
an external cgroup-based converge (which I think is what oVirt does).
> - xbzrle. We only have one cache, we should decide how to work with
> this for multithread/compression.
>
> - When we do migration, we have spaghetti code to decide if:
> * it is a zero page
> * it is a duplicated page
> * it is a xbzrle page
> * it is a compressed page
> And as the code is written, it is not trivial to add new "options". I
> think that we should "re-think" what combinations are allowed and which
> ones make no sense.
Yeh, and find a way to express to libvirt what combinations are legal.
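As a sketch of what "expressing the legal combinations" could look like: keep
the conflicts in one table and have a single check walk it, so the same table
could be exported to management tools. The capability bits and, above all, the
conflict pairs below are made up for illustration; I am not claiming these
particular combinations are illegal in QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability bits, named loosely after the list above. */
enum {
    CAP_XBZRLE   = 1u << 0,
    CAP_COMPRESS = 1u << 1,
    CAP_POSTCOPY = 1u << 2,
    CAP_MULTIFD  = 1u << 3,
    CAP_ALL      = (1u << 4) - 1
};

/*
 * ASSUMPTION: example conflict pairs only. Which combinations really
 * make no sense is exactly the open question in this thread.
 */
static const unsigned cap_conflicts[][2] = {
    { CAP_XBZRLE,   CAP_COMPRESS },
    { CAP_POSTCOPY, CAP_COMPRESS },
};

/* Return true if no pair in the conflict table is fully enabled in caps. */
static bool caps_legal(unsigned caps)
{
    size_t n = sizeof(cap_conflicts) / sizeof(cap_conflicts[0]);

    assert((caps & ~(unsigned)CAP_ALL) == 0);
    for (size_t i = 0; i < n; i++) {
        if ((caps & cap_conflicts[i][0]) && (caps & cap_conflicts[i][1])) {
            return false;
        }
    }
    return true;
}
```

Adding a new "option" then means adding a bit and, if needed, rows to the
table, rather than another branch in the page-sending code.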
> - savevm and migration: they use two different paths for no really good
> reason. We should really abstract this to a single code path.
> We always forget the savevm one when we do changes.
>
> - error handling. Every function should return an error. Every
> function should return an error.
Yeh.
> - qemu_get_buffer() doesn't give an error if there is nothing to read,
> sniff.
>
> - Multipage support: Welcome to the XXI century. Now almost all
> architectures have HugePages. And others have different sized pages
> (on PPC it is not unusual for host and guest page sizes to differ). We
> have work to do here. For starters, sending Huge pages as one chunk
> will make TransparentHugePages happier.
Yeh, Andrea has pushed me about this a bit; the only problem I have
here is with postcopy where getting a page request stuck behind a huge
page request would do nasty things to the latency - but your multifd might
fix that.
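To put a rough number on the latency worry, here is a back-of-envelope helper
for the time a single page occupies the wire. The link speed is an assumption
you plug in; it ignores protocol overhead, so treat the result as a lower
bound. wire_time_us is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Microseconds needed to push `bytes` through a link of `gbit_per_s`
 * gigabits per second (1 Gbit/s = 1000 bits per microsecond).
 * Ignores headers, queuing and any compression.
 */
static double wire_time_us(uint64_t bytes, double gbit_per_s)
{
    assert(gbit_per_s > 0.0);
    return ((double)bytes * 8.0) / (gbit_per_s * 1000.0);
}
```

On an assumed 10 Gbit/s link a 4 KiB page takes about 3.3 microseconds, while
a 2 MiB huge page takes about 1.7 milliseconds, so a postcopy page request
queued behind one huge page on the same stream waits roughly 500 times longer.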
> - Bitmaps. Related with previous one. We should really be better about
> walking them and about synchronising them between qemu/kernel.
Oh yes, they're a nightmare on things with different page sizes; especially
when people worry that the source and destination might have different host
page sizes.
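The basic operation that keeps coming up is converting a dirty bitmap from one
page granularity to a coarser one, where a large page is dirty if any small
page inside it is. A minimal sketch, assuming a plain byte-array bitmap with
bit i stored in byte i/8 (QEMU's real bitmaps use unsigned long words and its
own helpers; the names below are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Test bit i of a byte-array bitmap. */
static bool bit_test(const uint8_t *map, size_t i)
{
    return (map[i / 8] >> (i % 8)) & 1u;
}

/* Set bit i of a byte-array bitmap. */
static void bit_set(uint8_t *map, size_t i)
{
    map[i / 8] |= (uint8_t)(1u << (i % 8));
}

/*
 * Fold a fine-grained dirty bitmap of nfine bits into a coarse one.
 * ratio = coarse page size / fine page size (e.g. 64K / 4K = 16).
 * The caller must zero `coarse` and size it for
 * (nfine + ratio - 1) / ratio bits.
 */
static void bitmap_coarsen(const uint8_t *fine, size_t nfine,
                           size_t ratio, uint8_t *coarse)
{
    assert(ratio > 0);
    for (size_t i = 0; i < nfine; i++) {
        if (bit_test(fine, i)) {
            bit_set(coarse, i / ratio);
        }
    }
}
```

Going the other way (coarse to fine) has to mark all `ratio` small pages
dirty, which is where mismatched source and destination host page sizes start
to cost extra traffic.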
> - COLO: We need to integrate it.
>
> I will continue the rant at some other point O:-) Just now I need to
> leave for the bar.
One that's related to that is the big lock around the last stage of migrate;
we really could do with being able to recover from a migrate that hangs during
the final stage due to a block-IO or network issue.
> Thanks for your attention, Juan.
>
> PS. While writing this I looked at the channel code from Daniel; it is a
> step in the right direction.
Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK