* [Qemu-devel] Migration ToDo list (a.k.a. Rant)
@ 2016-05-04 11:20 Juan Quintela
2016-05-04 12:47 ` Dr. David Alan Gilbert
` (4 more replies)
0 siblings, 5 replies; 9+ messages in thread
From: Juan Quintela @ 2016-05-04 11:20 UTC (permalink / raw)
To: QEMU Developer
Hi
I am often asked what the ToDo list for migration is. Until now it has
lived in my head and in random notes on my desk, so here is an attempt at
some organization (yes, I will put this in the wiki).
- migration thread on reception
would make it trivial to do other things while receiving, and would also
make postcopy easier (I was going to write much easier, but postcopy is
never easy).
- migration capabilities and parameters
this is a mess. No, it is worse than that. I don't know who is to
blame here, but something needs to be done:
void qmp_migrate_set_parameters(bool has_compress_level,
int64_t compress_level,
bool has_compress_threads,
int64_t compress_threads,
bool has_decompress_threads,
int64_t decompress_threads,
bool has_x_cpu_throttle_initial,
int64_t x_cpu_throttle_initial,
bool has_x_cpu_throttle_increment,
int64_t x_cpu_throttle_increment,
bool has_multifd_threads,
int64_t multifd_threads,
Error **errp)
Can we move this to an array of structs, please, pretty please?
I think that for this one the blame is on QMP,
but we can continue:
migrate
migrate_cancel
migrate_incoming
migrate_start_postcopy
Not a lot to do up to here
migrate_set_capability
Minor nitpick: if it only allows booleans, "migrate_set_capability x-multifd"
should be equivalent to "migrate_set_capability x-multifd on"
migrate_set_cache_size
migrate_set_downtime
migrate_set_speed
These three should be declared obsolete, deprecated, whatever, and
implemented on top of the next one
migrate_set_parameter
Now to read the migration information:
migrate_capabilities
good
migrate_parameters
good
migrate_cache_size
good, but we are missing migrate_speed and migrate_downtime; see
why I want them inside migrate_set_parameters
migrate
now, this is ..... weird? We put lots of information here, and
this is basically the only way to get information out. To make
things more interesting, the values change meaning during
migration, and the fields shown also change over time.
- info migrate
This deserves its own item. Let's see a typical output:
(qemu)info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-multifd: on
Aha, we have the capabilities, but not the parameters. This is
historical, I know, but they don't belong here.
Migration status: completed
ok
total time: 1621 milliseconds
ok
downtime: 208 milliseconds
ok
setup: 9 milliseconds
ok
transferred ram: 609708 kbytes
kilobytes, not pages
throughput: 27.64 mbps
but we measure bandwidth in megabytes per second;
the previous one was in kilobytes
remaining ram: 0 kbytes
total ram: 2106180 kbytes
this amount doesn't change. I can understand why it was here.
duplicate: 452528 pages
the name is historical. It really means pages filled with the same
character, although in practice it means zero pages
skipped: 0 pages
Even I don't remember what this means.
normal: 151064 pages
These are the normal pages that we have sent, i.e. pages that are neither
zero pages nor skipped pages. Notice that the unit here is pages: not
bytes, not kilobytes, but pages.
normal bytes: 604256 kbytes
Don't worry, we also give you the same number in kilobytes.
dirty sync count: 11
Number of iterations over the full ram. Yes, I know, we are very,
very bad at naming.
And we still have more optional information that appears if we are doing
block migration, xbzrle, compression, rdma, etc, etc.
We also need to settle on units internally. Some things are in bytes,
some are in kilobytes, some are in pages. Some are in host pages, or
guest pages, or who knows :-(
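One way to make the unit mix-ups harder to write is to give each unit its own type. The following is only a sketch: the `sketch_*` names are made up for illustration, and the 4 KiB page size is an assumption taken from the sample output above (151064 pages correspond to 604256 kbytes).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB target pages, which is what the sample output implies. */
#define SKETCH_PAGE_SIZE 4096u

/* Wrapper types (hypothetical) so that a page count cannot be passed
 * by accident where a byte count is expected. */
typedef struct { uint64_t n; } sketch_pages_t;
typedef struct { uint64_t n; } sketch_bytes_t;

/* Convert a page count to bytes. */
static inline sketch_bytes_t sketch_pages_to_bytes(sketch_pages_t p)
{
    /* Guard against overflow of the multiplication. */
    assert(p.n <= UINT64_MAX / SKETCH_PAGE_SIZE);
    return (sketch_bytes_t){ p.n * SKETCH_PAGE_SIZE };
}

/* Convert bytes to kilobytes (1024 bytes), rounding down. */
static inline uint64_t sketch_bytes_to_kbytes(sketch_bytes_t b)
{
    return b.n / 1024;
}
```

With types like these, passing the "normal" page count to something that expects bytes is a compile error instead of a silently wrong statistic.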
- Block migration (the migration/block.c one). This is the bastard
child of migration. It is much less tested, and we should make a decision
about letting it live or deprecating it. Things needed, from memory:
- functions should return the same values as ram.c
some functions don't have "exact" values, and return 1 when there
is more than one dirty block, etc, etc
- if we continue maintaining it, should we allow it to have _some_ shared
devices and some non shared ones, instead of everything?
- RDMA: Another step child
This is really, really weird. We don't use the normal infrastructure
for RDMA, we use the ram_control_* stuff. We should really move to
use the normal stuff here.
- autoconverge code: This could be used outside of migration (i.e. just
to slow down a guest). We should really do some measurements here to
see how useful it is for migration. If the guest is dirtying lots of
memory, we end up having to throttle the guest by 90% or so :-(
- xbzrle. We only have one cache, we should decide how to work with
this for multithread/compression.
- When we do migration, we have spaghetti code to decide if:
* it is a zero page
* it is a duplicated page
* it is a xbzrle page
* it is a compressed page
And as the code is written, it is not trivial to add new "options". I
think that we should "re-think" what combinations are allowed and which
ones make no sense.
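To make the idea concrete, the decision could be pulled into one function that returns a page kind, with the legal feature combinations checked in a single place. This is a sketch only: all names are hypothetical, zero and duplicated pages are folded into one case, and the rule that xbzrle and compression exclude each other is an assumption made for illustration, not a statement about what the code allows today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page kinds; zero and "duplicate" pages are one case here. */
typedef enum {
    SKETCH_PAGE_ZERO,
    SKETCH_PAGE_XBZRLE,
    SKETCH_PAGE_COMPRESSED,
    SKETCH_PAGE_NORMAL,
} SketchPageKind;

/* Hypothetical feature switches. */
typedef struct {
    bool xbzrle;
    bool compress;
} SketchFeatures;

/* True if every byte of the page is zero. */
static bool sketch_is_zero(const uint8_t *page, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Single place that says which combinations are legal.
 * Assumption for this sketch: xbzrle and compress exclude each other. */
static bool sketch_features_valid(SketchFeatures f)
{
    return !(f.xbzrle && f.compress);
}

/* Decide how one page would be sent. in_cache says whether an older copy
 * of the page is available for delta encoding. */
static SketchPageKind sketch_classify(const uint8_t *page, size_t len,
                                      SketchFeatures f, bool in_cache)
{
    assert(sketch_features_valid(f));
    if (sketch_is_zero(page, len)) {
        return SKETCH_PAGE_ZERO;
    }
    if (f.xbzrle && in_cache) {
        return SKETCH_PAGE_XBZRLE;
    }
    if (f.compress) {
        return SKETCH_PAGE_COMPRESSED;
    }
    return SKETCH_PAGE_NORMAL;
}
```

Adding a new "option" then means one more enum value and one more branch, and the validity check is the natural place to describe the legal combinations.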
- savevm and migration: they use two different paths for no really good
reason. We should really abstract this into a single code path.
We always forget the savevm one when we make changes.
- error handling. Every function should return an error. Every
function should return an error.
- qemu_get_buffer() doesn't report an error if there is nothing to read,
sniff.
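For illustration, this is the kind of contract a read helper could offer: either all requested bytes arrive, or the caller gets an error. The sketch works on a plain POSIX file descriptor and is not the QEMUFile API; `sketch_read_exact` and the choice of -EIO for a premature end of stream are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: read exactly len bytes from fd.
 * Returns 0 on success, -errno on a read error, and -EIO if the stream
 * ends before len bytes have arrived. */
static int sketch_read_exact(int fd, void *buf, size_t len)
{
    uint8_t *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted, just retry */
            }
            return -errno;          /* real I/O error */
        }
        if (n == 0) {
            return -EIO;            /* nothing left to read: report it */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

The point is only that "nothing to read" becomes visible to the caller instead of looking like success.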
- Multipage support: Welcome to the XXI century. Now almost all
architectures have HugePages. And others have differently sized pages
(on PPC it is not unusual for the host and guest page sizes to differ). We
have work to do here. For starters, sending Huge pages as one chunk
will make TransparentHugePages happier.
- Bitmaps. Related to the previous one. We should really get better at
walking them and at synchronising them between qemu and the kernel.
- COLO: We need to integrate it.
I will continue the rant at some other point O:-) Right now I need to
leave for the bar.
Thanks for your attention, Juan.
PS. While writing this I just looked at the channel code from Daniel: a
step in the right direction.
* Re: [Qemu-devel] Migration ToDo list (a.k.a. Rant)
2016-05-04 11:20 [Qemu-devel] Migration ToDo list (a.k.a. Rant) Juan Quintela
@ 2016-05-04 12:47 ` Dr. David Alan Gilbert
2016-05-04 16:35 ` Greg Kurz
2016-05-04 13:08 ` Denis V. Lunev
` (3 subsequent siblings)
4 siblings, 1 reply; 9+ messages in thread
From: Dr. David Alan Gilbert @ 2016-05-04 12:47 UTC (permalink / raw)
To: Juan Quintela; +Cc: QEMU Developer
* Juan Quintela (quintela@redhat.com) wrote:
>
> Hi
>
> I am lots of times asked about what is the ToDo list for migration, that
> was on my head, and random notes over my desk, so, trying some
> organization (Yes, I would put this in the wiki).
Let me add:
Getting everything to use VMState; I intend to try and fix virtio to use
VMState as much as possible.
And yes, a wiki entry would be good; then people might notice it and fix things
for us :-)
> - migration thread on reception
> would make trivial to do other things while receiving, and would make
> postcopy easier also (I was going to put much easier, but postcopy is
> never easy).
I don't think it makes much difference to postcopy.
> - migration capabilities and parameters
> this is a mess. Not, is worse than that. I don't know who is to
> blame here, but something needs to be done:
>
> void qmp_migrate_set_parameters(bool has_compress_level,
> int64_t compress_level,
> bool has_compress_threads,
> int64_t compress_threads,
> bool has_decompress_threads,
> int64_t decompress_threads,
> bool has_x_cpu_throttle_initial,
> int64_t x_cpu_throttle_initial,
> bool has_x_cpu_throttle_increment,
> int64_t x_cpu_throttle_increment,
> bool has_multifd_threads,
> int64_t multifd_threads,
> Error **errp)
>
>
>
> Can we move this to an array of structs, please, pretty please?
> I think that for this one, the blame is on qmp
Yes; zhanghailiang had a patch to try and help that and there was
some discussion at about the same time (June last year?!)
That function is VERY delicate; if you screw up and get those in the
wrong order then everything will appear to be just fine....
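A reduced sketch of why the exploded signature is so delicate: every optional value is a (bool, int64_t) pair, so two values passed in each other's slots still type-check. The names below are hypothetical and cut down to two parameters; this is not the real function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the migration state; not a QEMU type. */
typedef struct {
    int64_t compress_threads;
    int64_t decompress_threads;
} SketchThreads;

/* Exploded style: every optional value is a (bool, int64_t) pair. */
static void sketch_set_exploded(SketchThreads *s,
                                bool has_compress_threads,
                                int64_t compress_threads,
                                bool has_decompress_threads,
                                int64_t decompress_threads)
{
    assert(s != NULL);
    if (has_compress_threads) {
        s->compress_threads = compress_threads;
    }
    if (has_decompress_threads) {
        s->decompress_threads = decompress_threads;
    }
}

/* The caller wants compress=8 and decompress=2 but passes the values in
 * the wrong slots. Both are int64_t, so the compiler cannot complain. */
static void sketch_buggy_caller(SketchThreads *s)
{
    sketch_set_exploded(s, true, 2, true, 8);
}
```

After `sketch_buggy_caller()` the state holds compress_threads = 2 and decompress_threads = 8: it builds cleanly and runs, it is just wrong.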
> - info migrate
> This deserves its own item. Lets see a typical output
>
> (qemu)info migrate
>
> capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-multifd: on
>
> Aha, we have the capabilities, but not the parameters. This is
> historical, I know, but don't belong here.
Well, for the HMP version we can fix any of this IMHO without a problem;
let's add more detail/fix names/etc.
> And we still have more optional information that appears if we are doing
> block migration, xbzrle, compression, rdma, etc, etc.
>
> We need to decide some units also internal. Some things are in bytes,
> some are in kilobytes, some are in pages. Some are in host pages, or
> guest pages, or who knows :-(
I don't - every time I look at some of it I end up going back to the source.
> - Block migration (the migration/block.c one). This is the bastard
> child of migration. Much less tested, we should make a decision
> about letting it live or deprecating it. Things needed from memory:
> - functions should return the same values than ram.c
> some functions don't have "exact" values, and return 1 when there
> are more than one block dirty, etc, etc
> - if we continue maintaing it, allowing it to have _some_ shared
> devices and some non shared ones, insntead of everything?
My vague understanding was that there were still configurations that were
only useable with block migration; mostly those things that only wanted
a single socket because they wanted to tunnel it; this might change with
Dan's TLS setup.
Having said that, I don't understand all of the block migration alternatives.
> - RDMA: Another step child
>
> This is really, really weird. We don't use the normal infrastructure
> for RDMA, we use the ram_control_* stuff. We should really move to
> use the normal stuff here.
I'm not sure that's possible - while the RDMA code is huge and horribly
complex, some of that is just down to the kernel APIs and standards it
has to deal with; it might be possible to glue it into ram.c better
but I wouldn't bet on it.
> - autoconverge code: This could be used outside of migration (i.e. just
> to slow down a guess). We should really do some measurement here to
> see how useful it is for migration. If the guest is using lots of
> memory dirtying, we end having to throttle the guest 90% or so :-(
Dan's doing some I think. The other question is how it compares to using
an external cgroup based converge (which I think is what oVirt does).
> - xbzrle. We only have one cache, we should decide how to work with
> this for multithread/compression.
>
> - When we do migration, we have spaguetti code to decide if:
> * it is a zero page
> * it is a duplicated page
> * it is a xbzrle page
> * it is a compressed page
> And as the code is written, it is not trivial to add new "options". I
> think that we should "re-think" what combinations are allowed an which
> ones make nosense.
Yeh, and find a way to express to libvirt what combinations are legal.
> - savevm and migration: they use two different paths for not really good
> reason. We should really abstract this to a single code path.
> We always forget the savevm one when we do changes.
>
> - error handling. Every function should return an error. Every
> function should return an error.
Yeh.
> - qemu_get_buffer() don't give one error if there is nothing to read,
> sniff.
>
> - Multipage support: Welcome to the XXI century. Now almost all
> architectures have HugePages. And other have different sized pages
> (in PPC is not strange that page size of host and guest differ). We
> have work to do here. For starters, sending Huge pages as one chunk
> will make TransparentHugePages happier.
Yeh, Andrea has pushed me about this a bit; the only problem I have
here is with postcopy where getting a page request stuck behind a huge
page request would do nasty things to the latency - but your multifd might
fix that.
> - Bitmaps. Related with previous one. We should really be better about
> walking them and about synchronising them between qemu/kernel.
Oh yes, they're a nightmare on things with different page sizes; especially
when people worry that the source and destination might have different host
page sizes.
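As a small illustration of the page-size problem: if dirty bits are tracked at one granularity and the other side works in larger pages, a large page has to be treated as dirty as soon as any small page inside it is dirty. The sketch below uses made-up helper names and a plain array of unsigned long, not the QEMU bitmap API.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Return 1 if bit nr is set in map, 0 otherwise. */
static int sketch_test_bit(const unsigned long *map, size_t nr)
{
    return (int)((map[nr / SKETCH_BITS_PER_LONG] >>
                  (nr % SKETCH_BITS_PER_LONG)) & 1UL);
}

/* Set bit nr in map. */
static void sketch_set_bit(unsigned long *map, size_t nr)
{
    map[nr / SKETCH_BITS_PER_LONG] |= 1UL << (nr % SKETCH_BITS_PER_LONG);
}

/* Fold a dirty bitmap with one bit per small page into a bitmap with one
 * bit per large page. ratio = large page size / small page size.
 * dst must be large enough and is only ever OR-ed into. */
static void sketch_coarsen_bitmap(const unsigned long *src, size_t src_bits,
                                  unsigned long *dst, size_t ratio)
{
    assert(ratio > 0);
    for (size_t i = 0; i < src_bits; i++) {
        if (sketch_test_bit(src, i)) {
            sketch_set_bit(dst, i / ratio);
        }
    }
}
```

For example, with 4 KiB small pages and 64 KiB large pages the ratio is 16, so a single dirty small page number 20 marks large page 1 as dirty, and all 64 KiB of it would have to be resent.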
> - COLO: We need to integrate it.
>
> I will continue the rant at some other point O:-) Just now I need to
> left for the bar.
One that's related to that, is the big-lock around the last stage of migrate;
we really could do with being able to recover from a migrate that hangs during
the final stage due to a block-IO or network issue.
> Thanks for your attention, Juan.
>
> PD. I just looked while I wrote this to the channel code from Daniel, a
> step on the right direction.
Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] Migration ToDo list (a.k.a. Rant)
2016-05-04 11:20 [Qemu-devel] Migration ToDo list (a.k.a. Rant) Juan Quintela
2016-05-04 12:47 ` Dr. David Alan Gilbert
@ 2016-05-04 13:08 ` Denis V. Lunev
2016-05-04 13:38 ` Eric Blake
` (2 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Denis V. Lunev @ 2016-05-04 13:08 UTC (permalink / raw)
To: quintela, QEMU Developer
On 05/04/2016 02:20 PM, Juan Quintela wrote:
> Hi
>
> I am lots of times asked about what is the ToDo list for migration, that
> was on my head, and random notes over my desk, so, trying some
> organization (Yes, I would put this in the wiki).
>
>
> - migration thread on reception
> would make trivial to do other things while receiving, and would make
> postcopy easier also (I was going to put much easier, but postcopy is
> never easy).
>
> - migration capabilities and parameters
> this is a mess. Not, is worse than that. I don't know who is to
> blame here, but something needs to be done:
>
> void qmp_migrate_set_parameters(bool has_compress_level,
> int64_t compress_level,
> bool has_compress_threads,
> int64_t compress_threads,
> bool has_decompress_threads,
> int64_t decompress_threads,
> bool has_x_cpu_throttle_initial,
> int64_t x_cpu_throttle_initial,
> bool has_x_cpu_throttle_increment,
> int64_t x_cpu_throttle_increment,
> bool has_multifd_threads,
> int64_t multifd_threads,
> Error **errp)
>
>
>
> Can we move this to an array of structs, please, pretty please?
> I think that for this one, the blame is on qmp
>
> but we can continue:
>
> migrate
> migrate_cancel
> migrate_incoming
> migrate_start_postcopy
>
> Not a lot to do until here
>
> migrate_set_capability
> Minor nickpit, if it only allow booleans, "migrate_set_capability x-multifd",
> should be an equivalent of "migrate_set_capability x-multifd on"
>
> migrate_set_cache_size
> migrate_set_downtime
> migrate_set_speed
> This three should be claimed obsolete, deprecated, whatever, and
> make it on top of next one
>
> migrate_set_parameter
>
> Now to read the migration information:
>
> migrate_capabilities
> good
> migrate_parameters
> good
> migrate_cache_size
> good, but we are missing migrate_speed and migrate_downtime, see
> why I want it be inside migrate_set_parameters
>
> migrate
> now, this is ..... weird? We put here lots of information, and
> this is basically the only way to put information out. To make
> things more interesting, the values change meaning during
> migration, and the fields it shows change also over time.
>
> - info migrate
> This deserves its own item. Lets see a typical output
>
> (qemu)info migrate
>
> capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-multifd: on
>
> Aha, we have the capabilities, but not the parameters. This is
> historical, I know, but don't belong here.
>
> Migration status: completed
> ok
> total time: 1621 milliseconds
> ok
> downtime: 208 milliseconds
> ok
> setup: 9 milliseconds
> ok
>
> transferred ram: 609708 kbytes
> kilo bytes, not pages
>
> throughput: 27.64 mbps
> but we measure bandwidth is megabytes by second
> previous one was kylobytes
>
> remaining ram: 0 kbytes
>
> total ram: 2106180 kbytes
> this amount don't change. I can understand why it was here.
>
> duplicate: 452528 pages
> name is historical. It really means pages filled with the same
> characeter. Althought in practical effects it means zero pages
>
> skipped: 0 pages
> Even I don't remember what this means.
> normal: 151064 pages
> This is normal pages that we have sent, i.e. pages that are not zero
> pages nor skipped pages. Notice that we have put here pages, not
> bytes, not kilobytes, but pages.
>
> normal bytes: 604256 kbytes
> Don't worry, we put for you the same number as kilobytes.
>
> dirty sync count: 11
> Number of iterations over the full ram. Yes, I know, we are very,
> very bad at naming.
>
> And we still have more optional information that appears if we are doing
> block migration, xbzrle, compression, rdma, etc, etc.
>
> We need to decide some units also internal. Some things are in bytes,
> some are in kilobytes, some are in pages. Some are in host pages, or
> guest pages, or who knows :-(
>
>
> - Block migration (the migration/block.c one). This is the bastard
> child of migration. Much less tested, we should make a decision
> about letting it live or deprecating it. Things needed from memory:
> - functions should return the same values than ram.c
> some functions don't have "exact" values, and return 1 when there
> are more than one block dirty, etc, etc
> - if we continue maintaing it, allowing it to have _some_ shared
> devices and some non shared ones, insntead of everything?
>
> - RDMA: Another step child
>
> This is really, really weird. We don't use the normal infrastructure
> for RDMA, we use the ram_control_* stuff. We should really move to
> use the normal stuff here.
>
> - autoconverge code: This could be used outside of migration (i.e. just
> to slow down a guess). We should really do some measurement here to
> see how useful it is for migration. If the guest is using lots of
> memory dirtying, we end having to throttle the guest 90% or so :-(
>
> - xbzrle. We only have one cache, we should decide how to work with
> this for multithread/compression.
>
> - When we do migration, we have spaguetti code to decide if:
> * it is a zero page
> * it is a duplicated page
> * it is a xbzrle page
> * it is a compressed page
> And as the code is written, it is not trivial to add new "options". I
> think that we should "re-think" what combinations are allowed an which
> ones make nosense.
>
> - savevm and migration: they use two different paths for not really good
> reason. We should really abstract this to a single code path.
> We always forget the savevm one when we do changes.
>
> - error handling. Every function should return an error. Every
> function should return an error.
>
> - qemu_get_buffer() don't give one error if there is nothing to read,
> sniff.
>
> - Multipage support: Welcome to the XXI century. Now almost all
> architectures have HugePages. And other have different sized pages
> (in PPC is not strange that page size of host and guest differ). We
> have work to do here. For starters, sending Huge pages as one chunk
> will make TransparentHugePages happier.
>
> - Bitmaps. Related with previous one. We should really be better about
> walking them and about synchronising them between qemu/kernel.
>
> - COLO: We need to integrate it.
>
> I will continue the rant at some other point O:-) Just now I need to
> left for the bar.
>
> Thanks for your attention, Juan.
>
> PD. I just looked while I wrote this to the channel code from Daniel, a
> step on the right direction.
>
>
Let me add too.
- snapshot management (savevm/loadvm) via QMP interface
Den
* Re: [Qemu-devel] Migration ToDo list (a.k.a. Rant)
2016-05-04 11:20 [Qemu-devel] Migration ToDo list (a.k.a. Rant) Juan Quintela
2016-05-04 12:47 ` Dr. David Alan Gilbert
2016-05-04 13:08 ` Denis V. Lunev
@ 2016-05-04 13:38 ` Eric Blake
2016-05-04 14:30 ` Juan Quintela
2016-05-05 6:19 ` Hailiang Zhang
2016-05-05 8:01 ` Li, Liang Z
4 siblings, 1 reply; 9+ messages in thread
From: Eric Blake @ 2016-05-04 13:38 UTC (permalink / raw)
To: quintela, QEMU Developer
On 05/04/2016 05:20 AM, Juan Quintela wrote:
> - migration capabilities and parameters
> this is a mess. Not, is worse than that. I don't know who is to
> blame here, but something needs to be done:
>
> void qmp_migrate_set_parameters(bool has_compress_level,
> int64_t compress_level,
> bool has_compress_threads,
> int64_t compress_threads,
> bool has_decompress_threads,
> int64_t decompress_threads,
> bool has_x_cpu_throttle_initial,
> int64_t x_cpu_throttle_initial,
> bool has_x_cpu_throttle_increment,
> int64_t x_cpu_throttle_increment,
> bool has_multifd_threads,
> int64_t multifd_threads,
> Error **errp)
I've got a QAPI patch in the pipeline that makes this MUCH simpler, by
boxing everything through a single MigrationParameter* pointer rather
than an exploded list of parameters.
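Roughly, boxing means the handler receives one struct in which each optional member travels together with its has_ flag. The sketch below is hand-written for illustration with hypothetical names; it is not the generated QAPI code and not the patch mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical boxed parameter set: each optional value sits next to
 * its has_ flag, so the two cannot drift apart at a call site. */
typedef struct {
    bool has_compress_level;
    int64_t compress_level;
    bool has_compress_threads;
    int64_t compress_threads;
    bool has_decompress_threads;
    int64_t decompress_threads;
} SketchMigrationParams;

/* Copy only the members that the request actually carries. */
static void sketch_apply_params(SketchMigrationParams *cur,
                                const SketchMigrationParams *req)
{
    assert(cur != NULL && req != NULL);
    if (req->has_compress_level) {
        cur->has_compress_level = true;
        cur->compress_level = req->compress_level;
    }
    if (req->has_compress_threads) {
        cur->has_compress_threads = true;
        cur->compress_threads = req->compress_threads;
    }
    if (req->has_decompress_threads) {
        cur->has_decompress_threads = true;
        cur->decompress_threads = req->decompress_threads;
    }
}
```

A caller can then fill in a request with designated initializers, e.g. `{ .has_compress_threads = true, .compress_threads = 8 }`, and adding a parameter no longer changes the function signature.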
> migrate_set_capability
> Minor nickpit, if it only allow booleans, "migrate_set_capability x-multifd",
> should be an equivalent of "migrate_set_capability x-multifd on"
That's HMP - you can make HMP do whatever you want without breaking
back-compat.
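On the HMP side the change could be as small as treating a missing state argument as "on". A sketch with a hypothetical helper name; it does not use the real HMP argument parsing:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: parse the optional on/off word of a command.
 * A missing argument (NULL) is treated as "on".
 * Returns 0 and stores the state in *out, or -1 for an unknown word. */
static int sketch_parse_state(const char *arg, bool *out)
{
    assert(out != NULL);
    if (arg == NULL || strcmp(arg, "on") == 0) {
        *out = true;
        return 0;
    }
    if (strcmp(arg, "off") == 0) {
        *out = false;
        return 0;
    }
    return -1;
}
```

An existing "migrate_set_capability x-multifd on" keeps working unchanged; only the previously invalid short form gains a meaning.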
>
> migrate_set_cache_size
> migrate_set_downtime
> migrate_set_speed
> This three should be claimed obsolete, deprecated, whatever, and
> make it on top of next one
Again, HMP can make this change easy, even if it has to call out to
different QMP under the hood.
>
> migrate_set_parameter
>
> Now to read the migration information:
>
> migrate_capabilities
> good
> migrate_parameters
> good
Why we need two commands is beyond me - one command that lists
everything (capabilities AND parameters) should be sufficient.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
* Re: [Qemu-devel] Migration ToDo list (a.k.a. Rant)
2016-05-04 13:38 ` Eric Blake
@ 2016-05-04 14:30 ` Juan Quintela
2016-06-16 9:05 ` Markus Armbruster
0 siblings, 1 reply; 9+ messages in thread
From: Juan Quintela @ 2016-05-04 14:30 UTC (permalink / raw)
To: Eric Blake; +Cc: QEMU Developer
Eric Blake <eblake@redhat.com> wrote:
> On 05/04/2016 05:20 AM, Juan Quintela wrote:
>> - migration capabilities and parameters
>> this is a mess. Not, is worse than that. I don't know who is to
>> blame here, but something needs to be done:
>>
>> void qmp_migrate_set_parameters(bool has_compress_level,
>> int64_t compress_level,
>> bool has_compress_threads,
>> int64_t compress_threads,
>> bool has_decompress_threads,
>> int64_t decompress_threads,
>> bool has_x_cpu_throttle_initial,
>> int64_t x_cpu_throttle_initial,
>> bool has_x_cpu_throttle_increment,
>> int64_t x_cpu_throttle_increment,
>> bool has_multifd_threads,
>> int64_t multifd_threads,
>> Error **errp)
>
> I've got a QAPI patch in the pipeline that makes this MUCH simpler, by
> boxing everything through a single MigrationParameter* pointer rather
> than an exploded list of parameters.
NICE!!!
>> migrate_set_capability
>> Minor nickpit, if it only allow booleans,
>> "migrate_set_capability x-multifd",
>> should be an equivalent of "migrate_set_capability x-multifd on"
>
> That's HMP - you can make HMP do whatever you want without breaking
> back-compat.
I would like to structure it as: Use the other way, this is deprecated
and only here for backwards compatibility.
>> migrate_set_cache_size
>> migrate_set_downtime
>> migrate_set_speed
>> This three should be claimed obsolete, deprecated, whatever, and
>> make it on top of next one
>
> Again, HMP can make this change easy, even if it has to call out to
> different QMP under the hood.
Yep, but I prefer to have both consistent.
>>
>> Now to read the migration information:
>>
>> migrate_capabilities
>> good
>> migrate_parameters
>> good
>
> Why we need two commands is beyond me - one command that lists
> everything (capabilities AND parameters) should be sufficient.
I don't care how we do it, but we need an easy way to be sure that when
we add a new parameter/capability we also list it. Just now there are
things that we can't get back.
Later, Juan.
* Re: [Qemu-devel] Migration ToDo list (a.k.a. Rant)
2016-05-04 12:47 ` Dr. David Alan Gilbert
@ 2016-05-04 16:35 ` Greg Kurz
0 siblings, 0 replies; 9+ messages in thread
From: Greg Kurz @ 2016-05-04 16:35 UTC (permalink / raw)
To: Dr. David Alan Gilbert; +Cc: Juan Quintela, QEMU Developer
On Wed, 4 May 2016 13:47:12 +0100
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Juan Quintela (quintela@redhat.com) wrote:
> >
> > Hi
> >
> > I am lots of times asked about what is the ToDo list for migration, that
> > was on my head, and random notes over my desk, so, trying some
> > organization (Yes, I would put this in the wiki).
>
> Let me add:
> Getting everything to use VMState; I intend to try and fix virtio to use
> VMState as much as possible.
>
I had tried to revive Juan's 41-patch series from 2009 some time ago but it
was really tedious and the virtio 1.0 work started and I gave up...
> And yes, a wiki entry would be good; then people might notice it and fix things
> for us :-)
>
But I'm still willing to help if I can. :)
> > - migration thread on reception
> > would make trivial to do other things while receiving, and would make
> > postcopy easier also (I was going to put much easier, but postcopy is
> > never easy).
>
> I don't think it makes much difference to postcopy.
>
> > - migration capabilities and parameters
> > this is a mess. Not, is worse than that. I don't know who is to
> > blame here, but something needs to be done:
> >
> > void qmp_migrate_set_parameters(bool has_compress_level,
> > int64_t compress_level,
> > bool has_compress_threads,
> > int64_t compress_threads,
> > bool has_decompress_threads,
> > int64_t decompress_threads,
> > bool has_x_cpu_throttle_initial,
> > int64_t x_cpu_throttle_initial,
> > bool has_x_cpu_throttle_increment,
> > int64_t x_cpu_throttle_increment,
> > bool has_multifd_threads,
> > int64_t multifd_threads,
> > Error **errp)
> >
> >
> >
> > Can we move this to an array of structs, please, pretty please?
> > I think that for this one, the blame is on qmp
>
> Yes; zhanghailiang had a patch to try and help that and there was
> some discussion at about the same time (June last year?!)
> That function is VERY delicate; if you screw up and get those in the
> wrong order then everything will appear to be just fine....
>
> > - info migrate
> > This deserves its own item. Lets see a typical output
> >
> > (qemu)info migrate
> >
> > capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-multifd: on
> >
> > Aha, we have the capabilities, but not the parameters. This is
> > historical, I know, but don't belong here.
>
> Well, for the HMP version we can fix any of this IMHO without a problem;
> lets add more detail/fix names/etc.
>
> > And we still have more optional information that appears if we are doing
> > block migration, xbzrle, compression, rdma, etc, etc.
> >
> > We need to decide some units also internal. Some things are in bytes,
> > some are in kilobytes, some are in pages. Some are in host pages, or
> > guest pages, or who knows :-(
>
> I don't - every time I look at some of it I end up going back to the source.
>
> > - Block migration (the migration/block.c one). This is the bastard
> > child of migration. Much less tested, we should make a decision
> > about letting it live or deprecating it. Things needed from memory:
> > - functions should return the same values than ram.c
> > some functions don't have "exact" values, and return 1 when there
> > are more than one block dirty, etc, etc
> > - if we continue maintaing it, allowing it to have _some_ shared
> > devices and some non shared ones, insntead of everything?
>
> My vague understanding was that there were still configurations that were
> only useable with block migration; mostly those things that only wanted
> a single socket because they wanted to tunnel it; this might change with
> Dan's TLS setup.
> Having said that, I don't understand all of the block migration alternatives.
>
> > - RDMA: Another step child
> >
> > This is really, really weird. We don't use the normal infrastructure
> > for RDMA, we use the ram_control_* stuff. We should really move to
> > use the normal stuff here.
>
> I'm not sure that's possible - while the RDMA code is huge and horribly
> complex, some of that is just down to the kernel APIs and standards it
> has to deal with; it might be possibl to glue it into ram.c better
> but I wouldn't bet on it.
>
> > - autoconverge code: This could be used outside of migration (i.e. just
> > to slow down a guess). We should really do some measurement here to
> > see how useful it is for migration. If the guest is using lots of
> > memory dirtying, we end having to throttle the guest 90% or so :-(
>
> Dan's doing some I think. The other question is how it compares to using
> an external cgroup based converge (which I think is what oVirt does).
>
> > - xbzrle. We only have one cache, we should decide how to work with
> > this for multithread/compression.
> >
> > - When we do migration, we have spaguetti code to decide if:
> > * it is a zero page
> > * it is a duplicated page
> > * it is a xbzrle page
> > * it is a compressed page
> > And as the code is written, it is not trivial to add new "options". I
> > think that we should "re-think" what combinations are allowed an which
> > ones make nosense.
>
> Yeh, and find a way to express to libvirt what combinations are legal.
>
> > - savevm and migration: they use two different paths for not really good
> > reason. We should really abstract this to a single code path.
> > We always forget the savevm one when we do changes.
> >
> > - error handling. Every function should return an error. Every
> > function should return an error.
>
> Yeh.
>
> > - qemu_get_buffer() don't give one error if there is nothing to read,
> > sniff.
> >
> > - Multipage support: Welcome to the XXI century. Now almost all
> > architectures have HugePages. And other have different sized pages
> > (in PPC is not strange that page size of host and guest differ). We
> > have work to do here. For starters, sending Huge pages as one chunk
> > will make TransparentHugePages happier.
>
> Yeh, Andrea has pushed me about this a bit; the only problem I have
> here is with postcopy where getting a page request stuck behind a huge
> page request would do nasty things to the latency - but your multifd might
> fix that.
>
> > - Bitmaps. Related with previous one. We should really be better about
> > walking them and about synchronising them between qemu/kernel.
>
> Oh yes, they're a nightmare on things with different page sizes; especially
> when people worry that the source and destination might have different host
> page sizes.
>
> > - COLO: We need to integrate it.
> >
> > I will continue the rant at some other point O:-) Just now I need to
> > left for the bar.
>
> One that's related to that, is the big-lock around the last stage of migrate;
> we really could do with being able to recover from a migrate that hangs during
> the final stage due to a block-IO or network issue.
>
> > Thanks for your attention, Juan.
> >
> > P.S. While writing this I just looked at the channel code from Daniel; it is a
> > step in the right direction.
>
> Dave
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
* Re: [Qemu-devel] Migration ToDo list (a.k.a. Rant)
2016-05-04 11:20 [Qemu-devel] Migration ToDo list (a.k.a. Rant) Juan Quintela
` (2 preceding siblings ...)
2016-05-04 13:38 ` Eric Blake
@ 2016-05-05 6:19 ` Hailiang Zhang
2016-05-05 8:01 ` Li, Liang Z
4 siblings, 0 replies; 9+ messages in thread
From: Hailiang Zhang @ 2016-05-05 6:19 UTC (permalink / raw)
To: quintela, QEMU Developer; +Cc: peter.huangpeng
Hi Juan,
On 2016/5/4 19:20, Juan Quintela wrote:
> - COLO: We need to integrate it.
>
Since you mentioned COLO here, could you please help review the COLO series?
Dave has reviewed most of the patches, and we hope you can give some feedback on
it. Any comments are welcome. :)
Thanks,
Hailiang
* Re: [Qemu-devel] Migration ToDo list (a.k.a. Rant)
2016-05-04 11:20 [Qemu-devel] Migration ToDo list (a.k.a. Rant) Juan Quintela
` (3 preceding siblings ...)
2016-05-05 6:19 ` Hailiang Zhang
@ 2016-05-05 8:01 ` Li, Liang Z
4 siblings, 0 replies; 9+ messages in thread
From: Li, Liang Z @ 2016-05-05 8:01 UTC (permalink / raw)
To: quintela@redhat.com, QEMU Developer
> From: Qemu-devel [mailto:qemu-devel-
> bounces+liang.z.li=intel.com@nongnu.org] On Behalf Of Juan Quintela
> Sent: Wednesday, May 04, 2016 7:20 PM
> To: QEMU Developer
> Subject: [Qemu-devel] Migration ToDo list (a.k.a. Rant)
>
>
> Hi
>
> I am often asked what the ToDo list for migration is. It was in
> my head and in random notes on my desk, so I am trying to organize it (yes,
> I will put this in the wiki).
>
>
Would it be appropriate to add 'speed up live migration by skipping free pages'?
Liang
* Re: [Qemu-devel] Migration ToDo list (a.k.a. Rant)
2016-05-04 14:30 ` Juan Quintela
@ 2016-06-16 9:05 ` Markus Armbruster
0 siblings, 0 replies; 9+ messages in thread
From: Markus Armbruster @ 2016-06-16 9:05 UTC (permalink / raw)
To: Juan Quintela; +Cc: Eric Blake, QEMU Developer
Juan Quintela <quintela@redhat.com> writes:
> Eric Blake <eblake@redhat.com> wrote:
>> On 05/04/2016 05:20 AM, Juan Quintela wrote:
>>> - migration capabilities and parameters
>>> this is a mess. No, it is worse than that. I don't know who is to
>>> blame here, but something needs to be done:
>>>
>>> void qmp_migrate_set_parameters(bool has_compress_level,
>>> int64_t compress_level,
>>> bool has_compress_threads,
>>> int64_t compress_threads,
>>> bool has_decompress_threads,
>>> int64_t decompress_threads,
>>> bool has_x_cpu_throttle_initial,
>>> int64_t x_cpu_throttle_initial,
>>> bool has_x_cpu_throttle_increment,
>>> int64_t x_cpu_throttle_increment,
>>> bool has_multifd_threads,
>>> int64_t multifd_threads,
>>> Error **errp)
>>
>> I've got a QAPI patch in the pipeline that makes this MUCH simpler, by
>> boxing everything through a single MigrationParameter* pointer rather
>> than an exploded list of parameters.
>
> NICE!!!
It has a pretty good chance to land in 2.7.
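For readers who have not seen the boxing idea, a minimal sketch follows. It is not the QAPI-generated code: DemoMigrationParameters and demo_set_parameters() are hypothetical names, only three of the quoted fields are shown, and the value ranges are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical boxed parameter set; field names mirror the quoted signature. */
typedef struct DemoMigrationParameters {
    bool has_compress_level;
    int64_t compress_level;
    bool has_compress_threads;
    int64_t compress_threads;
    bool has_decompress_threads;
    int64_t decompress_threads;
} DemoMigrationParameters;

/*
 * Apply only the fields marked present in `req` to `cur`.
 * Every present field is validated before any is applied, so a failed
 * call leaves `cur` untouched. The ranges are illustrative assumptions.
 * Returns 0 on success, -1 on an out-of-range value.
 */
static int demo_set_parameters(DemoMigrationParameters *cur,
                               const DemoMigrationParameters *req)
{
    if (req->has_compress_level &&
        (req->compress_level < 0 || req->compress_level > 9)) {
        return -1;
    }
    if (req->has_compress_threads && req->compress_threads < 1) {
        return -1;
    }
    if (req->has_decompress_threads && req->decompress_threads < 1) {
        return -1;
    }

    if (req->has_compress_level) {
        cur->compress_level = req->compress_level;
    }
    if (req->has_compress_threads) {
        cur->compress_threads = req->compress_threads;
    }
    if (req->has_decompress_threads) {
        cur->decompress_threads = req->decompress_threads;
    }
    return 0;
}
```

Adding a parameter then means adding a field to the struct, and the function signature stays the same.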
>>> migrate_set_capability
>>> Minor nitpick: if it only allows booleans,
>>> "migrate_set_capability x-multifd"
>>> should be equivalent to "migrate_set_capability x-multifd on"
>>
>> That's HMP - you can make HMP do whatever you want without breaking
>> back-compat.
>
> I would like to structure it as: Use the other way, this is deprecated
> and only here for backwards compatibility.
Deprecate away; HMP is not a stable interface.
>>> migrate_set_cache_size
>>> migrate_set_downtime
>>> migrate_set_speed
>>> These three should be declared obsolete, deprecated, whatever, and
>>> reimplemented on top of the next one
>>
>> Again, HMP can make this change easy, even if it has to call out to
>> different QMP under the hood.
>
> Yep, but I prefer to have both consistent.
QMP really, really wants you to use the plainest possible unit. For
sizes, that's bytes. For times, it should be seconds, but often isn't
(because people are deadly afraid of fractions, I guess).
HMP can and should use whatever unit(s) are needed for legible input and
output.
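As an illustration of that split, the sketch below turns a human-friendly size as one might type it at the HMP prompt into the plain byte count QMP wants. It is a sketch only, not QEMU's own parser: demo_parse_size() is a hypothetical name, and accepting only k/M/G suffixes with powers of 1024 is an assumption.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a human-friendly size such as "512", "64k", "32M" or "1G" into
 * bytes (suffixes are powers of 1024). Returns 0 on success and stores
 * the result in *bytes; returns -EINVAL on malformed input and -ERANGE
 * on overflow.
 */
static int demo_parse_size(const char *str, uint64_t *bytes)
{
    char *end;
    unsigned long long val;
    unsigned shift = 0;

    /* Require a leading digit: rejects "", "-1", "+1" and leading spaces. */
    if (!str || !bytes || *str < '0' || *str > '9') {
        return -EINVAL;
    }
    errno = 0;
    val = strtoull(str, &end, 10);
    if (errno == ERANGE) {
        return -ERANGE;
    }
    switch (*end) {
    case '\0':
        break;
    case 'k': case 'K':
        shift = 10; end++;
        break;
    case 'm': case 'M':
        shift = 20; end++;
        break;
    case 'g': case 'G':
        shift = 30; end++;
        break;
    default:
        return -EINVAL;
    }
    if (*end != '\0') {
        return -EINVAL;     /* trailing junk after the suffix */
    }
    if (val > (UINT64_MAX >> shift)) {
        return -ERANGE;
    }
    *bytes = (uint64_t)val << shift;
    return 0;
}
```

The human-facing layer does the unit handling once, and everything below it sees bytes only.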
>>>
>>> Now to read the migration information:
>>>
>>> migrate_capabilities
>>> good
>>> migrate_parameters
>>> good
>>
>> Why we need two commands is beyond me - one command that lists
>> everything (capabilities AND parameters) should be sufficient.
>
> I don't care how we do it, but we need an easy way to be sure that when
> we add a new parameter/capability we also list it. Right now there are
> things that we can't read back.
Needs fixing.
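One possible shape for such a fix, sketched under assumptions: a single descriptor table drives both the setter and the getter, so a parameter that can be set can always be read back. DemoParams, demo_param_table, demo_set() and demo_get() are hypothetical names, not existing QEMU interfaces.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter storage; names are for illustration only. */
typedef struct {
    int64_t compress_level;
    int64_t compress_threads;
    int64_t decompress_threads;
} DemoParams;

typedef struct {
    const char *name;
    size_t offset;      /* offset of the int64_t field inside DemoParams */
} DemoParamDesc;

/* The only place a new parameter has to be registered. */
static const DemoParamDesc demo_param_table[] = {
    { "compress-level",     offsetof(DemoParams, compress_level) },
    { "compress-threads",   offsetof(DemoParams, compress_threads) },
    { "decompress-threads", offsetof(DemoParams, decompress_threads) },
};

#define DEMO_NPARAMS (sizeof(demo_param_table) / sizeof(demo_param_table[0]))

static const DemoParamDesc *demo_find(const char *name)
{
    for (size_t i = 0; i < DEMO_NPARAMS; i++) {
        if (strcmp(demo_param_table[i].name, name) == 0) {
            return &demo_param_table[i];
        }
    }
    return NULL;
}

/* Set a parameter by name; returns 0 on success, -1 if the name is unknown. */
static int demo_set(DemoParams *p, const char *name, int64_t value)
{
    const DemoParamDesc *d = demo_find(name);

    if (!d) {
        return -1;
    }
    *(int64_t *)((char *)p + d->offset) = value;
    return 0;
}

/* Read a parameter by name; returns 0 on success, -1 if the name is unknown. */
static int demo_get(const DemoParams *p, const char *name, int64_t *value)
{
    const DemoParamDesc *d = demo_find(name);

    if (!d) {
        return -1;
    }
    *value = *(const int64_t *)((const char *)p + d->offset);
    return 0;
}
```

A "list everything" command would simply walk demo_param_table, which is why nothing registered there can be forgotten in the output.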