From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Juan Quintela <quintela@redhat.com>
Cc: QEMU Developer <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] Migration ToDo list (a.k.a. Rant)
Date: Wed, 4 May 2016 13:47:12 +0100
Message-ID: <20160504124711.GG2302@work-vm>
In-Reply-To: <87oa8mf4sj.fsf@emacs.mitica>
* Juan Quintela (quintela@redhat.com) wrote:
>
> Hi
>
> I am often asked what the ToDo list for migration is. It was in my
> head and in random notes on my desk, so I am trying some
> organization (yes, I will put this in the wiki).
Let me add:
Getting everything to use VMState; I intend to try and fix virtio to use
VMState as much as possible.
And yes, a wiki entry would be good; then people might notice it and fix things
for us :-)
> - migration thread on reception
> would make it trivial to do other things while receiving, and would
> also make postcopy easier (I was going to put much easier, but postcopy
> is never easy).
I don't think it makes much difference to postcopy.
> - migration capabilities and parameters
> This is a mess. No, it is worse than that. I don't know who is to
> blame here, but something needs to be done:
>
> void qmp_migrate_set_parameters(bool has_compress_level,
>                                 int64_t compress_level,
>                                 bool has_compress_threads,
>                                 int64_t compress_threads,
>                                 bool has_decompress_threads,
>                                 int64_t decompress_threads,
>                                 bool has_x_cpu_throttle_initial,
>                                 int64_t x_cpu_throttle_initial,
>                                 bool has_x_cpu_throttle_increment,
>                                 int64_t x_cpu_throttle_increment,
>                                 bool has_multifd_threads,
>                                 int64_t multifd_threads,
>                                 Error **errp)
>
>
>
> Can we move this to an array of structs, please, pretty please?
> I think that for this one, the blame is on qmp
Yes; zhanghailiang had a patch to try and help that and there was
some discussion at about the same time (June last year?!)
That function is VERY delicate; if you screw up and get those in the
wrong order then everything will appear to be just fine....
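To illustrate the ordering hazard and the struct alternative, here is a minimal sketch. It is not QEMU's actual API: `MigParamOpt`, `MigParams` and `mig_params_apply` are hypothetical names, and only the field names are taken from the prototype quoted above. Each flag is stored with its value, so a caller can no longer swap two adjacent `int64_t` arguments without the compiler noticing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: one optional integer parameter, flag and value together. */
typedef struct {
    bool has;
    int64_t value;
} MigParamOpt;

/* Hypothetical container; field names mirror the quoted prototype. */
typedef struct {
    MigParamOpt compress_level;
    MigParamOpt compress_threads;
    MigParamOpt decompress_threads;
    MigParamOpt x_cpu_throttle_initial;
    MigParamOpt x_cpu_throttle_increment;
    MigParamOpt multifd_threads;
} MigParams;

/* Copy into dst only the parameters that the caller set in src. */
static void mig_params_apply(MigParams *dst, const MigParams *src)
{
#define APPLY(field) \
    do { if (src->field.has) { dst->field = src->field; } } while (0)
    APPLY(compress_level);
    APPLY(compress_threads);
    APPLY(decompress_threads);
    APPLY(x_cpu_throttle_initial);
    APPLY(x_cpu_throttle_increment);
    APPLY(multifd_threads);
#undef APPLY
}
```

A caller would name the field explicitly, for example `MigParams req = { .compress_threads = { true, 4 } };`, so adding a new parameter does not shift the position of any existing one.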
> - info migrate
> This deserves its own item. Let's see a typical output:
>
> (qemu)info migrate
>
> capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-multifd: on
>
> Aha, we have the capabilities, but not the parameters. This is
> historical, I know, but they don't belong here.
Well, for the HMP version we can fix any of this IMHO without a problem;
let's add more detail/fix names/etc.
> And we still have more optional information that appears if we are doing
> block migration, xbzrle, compression, rdma, etc, etc.
>
> We also need to decide on some internal units. Some things are in
> bytes, some are in kilobytes, some are in pages. Some are in host
> pages, or guest pages, or who knows :-(
I don't - every time I look at some of it I end up going back to the source.
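One possible way to make the units visible in the code, sketched under assumptions: wrap each unit in its own struct type so that passing a page count where a byte count is expected fails to compile. `MigBytes`, `MigPages`, `pages_to_bytes` and `bytes_to_kib` are hypothetical names for illustration, not existing QEMU types.

```c
#include <stdint.h>

/* Hypothetical distinct types: mixing them up is a compile error. */
typedef struct { uint64_t n; } MigBytes;  /* a count of bytes */
typedef struct { uint64_t n; } MigPages;  /* a count of pages */

/* Convert a page count to bytes; the caller must say which page size
 * (host or guest) the count refers to. */
static inline MigBytes pages_to_bytes(MigPages pages, uint64_t page_size)
{
    MigBytes bytes = { pages.n * page_size };
    return bytes;
}

/* Convert bytes to whole KiB (rounding down) for display only. */
static inline uint64_t bytes_to_kib(MigBytes bytes)
{
    return bytes.n / 1024;
}
```

The point of the design is that the page size has to be named at every conversion, which is where the host/guest confusion tends to hide.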
> - Block migration (the migration/block.c one). This is the bastard
> child of migration. Much less tested, we should make a decision
> about letting it live or deprecating it. Things needed from memory:
> - functions should return the same values as ram.c
> some functions don't have "exact" values, and return 1 when there
> are more than one block dirty, etc, etc
> - if we continue maintaining it, allowing it to have _some_ shared
> devices and some non shared ones, instead of everything?
My vague understanding was that there were still configurations that were
only useable with block migration; mostly those things that only wanted
a single socket because they wanted to tunnel it; this might change with
Dan's TLS setup.
Having said that, I don't understand all of the block migration alternatives.
> - RDMA: Another step child
>
> This is really, really weird. We don't use the normal infrastructure
> for RDMA, we use the ram_control_* stuff. We should really move to
> use the normal stuff here.
I'm not sure that's possible - while the RDMA code is huge and horribly
complex, some of that is just down to the kernel APIs and standards it
has to deal with; it might be possible to glue it into ram.c better
but I wouldn't bet on it.
> - autoconverge code: This could be used outside of migration (i.e. just
> to slow down a guest). We should really do some measurement here to
> see how useful it is for migration. If the guest is using lots of
> memory dirtying, we end up having to throttle the guest 90% or so :-(
Dan's doing some I think. The other question is how it compares to using
an external cgroup based converge (which I think is what oVirt does).
> - xbzrle. We only have one cache, we should decide how to work with
> this for multithread/compression.
>
> - When we do migration, we have spaghetti code to decide if:
> * it is a zero page
> * it is a duplicated page
> * it is a xbzrle page
> * it is a compressed page
> And as the code is written, it is not trivial to add new "options". I
> think that we should "re-think" what combinations are allowed and which
> ones make no sense.
Yeh, and find a way to express to libvirt what combinations are legal.
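As a rough sketch of what a less tangled decision could look like (an illustration, not the current ram.c logic): an ordered table of page handlers, where the first one that matches wins and a new "option" is a new table row. All names here (`PageKind`, `PageHandler`, `classify_page`) are hypothetical, and only the zero-page test is implemented; duplicate, xbzrle and compressed checks would be further rows.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page kinds; only two are implemented in this sketch. */
typedef enum {
    PAGE_KIND_ZERO,
    PAGE_KIND_NORMAL
} PageKind;

/* Hypothetical handler entry: a kind and a predicate that recognises it. */
typedef struct {
    PageKind kind;
    bool (*matches)(const uint8_t *page, size_t len);
} PageHandler;

static bool page_is_zero(const uint8_t *page, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Fallback that accepts every page; must stay last in the table. */
static bool page_is_anything(const uint8_t *page, size_t len)
{
    (void)page;
    (void)len;
    return true;
}

/* Handlers are tried in order; the order is the policy. */
static const PageHandler page_handlers[] = {
    { PAGE_KIND_ZERO,   page_is_zero },
    { PAGE_KIND_NORMAL, page_is_anything },
};

static PageKind classify_page(const uint8_t *page, size_t len)
{
    size_t count = sizeof(page_handlers) / sizeof(page_handlers[0]);

    for (size_t i = 0; i < count; i++) {
        if (page_handlers[i].matches(page, len)) {
            return page_handlers[i].kind;
        }
    }
    return PAGE_KIND_NORMAL; /* not reached while the fallback row exists */
}
```

With the policy held in one table, the set of enabled rows is also something that could be reported to a management layer, though how to express that to libvirt is left open here.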
> - savevm and migration: they use two different paths for no really
> good reason. We should really abstract this to a single code path.
> We always forget the savevm one when we do changes.
>
> - error handling. Every function should return an error. Every
> function should return an error.
Yeh.
> - qemu_get_buffer() doesn't return an error if there is nothing to
> read, sniff.
>
> - Multipage support: Welcome to the XXI century. Now almost all
> architectures have HugePages. And others have different sized pages
> (on PPC it is not unusual for host and guest page sizes to differ).
> We have work to do here. For starters, sending Huge pages as one chunk
> will make TransparentHugePages happier.
Yeh, Andrea has pushed me about this a bit; the only problem I have
here is with postcopy where getting a page request stuck behind a huge
page request would do nasty things to the latency - but your multifd might
fix that.
> - Bitmaps. Related to the previous one. We should really be better
> about walking them and about synchronising them between qemu/kernel.
Oh yes, they're a nightmare on things with different page sizes; especially
when people worry that the source and destination might have different host
page sizes.
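To make the page-size problem concrete, here is a minimal sketch, assuming a dirty bitmap with one bit per small page that has to be folded into one bit per larger page (a large page is dirty if any small page inside it is dirty). The helper names and the `ratio` parameter are hypothetical and not taken from QEMU's bitmap code.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

static bool bit_test(const unsigned long *map, size_t nr)
{
    return (map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}

static void bit_set(unsigned long *map, size_t nr)
{
    map[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

/*
 * Fold a bitmap with one bit per small page (src, nsmall bits) into a
 * bitmap with one bit per large page (dst).
 * ratio = large_page_size / small_page_size, and must be non-zero.
 * dst must be zeroed by the caller and hold at least
 * (nsmall + ratio - 1) / ratio bits.
 */
static void bitmap_coarsen(unsigned long *dst, const unsigned long *src,
                           size_t nsmall, size_t ratio)
{
    for (size_t i = 0; i < nsmall; i++) {
        if (bit_test(src, i)) {
            bit_set(dst, i / ratio);
        }
    }
}
```

Going the other way (large to small) is lossy in the opposite sense: every small page inside a dirty large page has to be treated as dirty, which is one reason mismatched host page sizes on source and destination are painful.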
> - COLO: We need to integrate it.
>
> I will continue the rant at some other point O:-) Right now I need to
> leave for the bar.
One that's related to that is the big lock around the last stage of migrate;
we really could do with being able to recover from a migrate that hangs during
the final stage due to a block-IO or network issue.
> Thanks for your attention, Juan.
>
> PS. While writing this I just looked at the channel code from Daniel;
> it is a step in the right direction.
Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK