* [Qemu-devel] Migration ToDo list
@ 2012-11-13 16:18 Juan Quintela
2012-11-13 16:28 ` Paolo Bonzini
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Juan Quintela @ 2012-11-13 16:18 UTC (permalink / raw)
To: qemu-devel qemu-devel, Orit Wasserman, chegu_vinod, benoit.hudzia,
Isaku Yamahata, Michael Roth
Hi
If you have anything else to add, please do.
Migration Thread
* Plan is to integrate it as one of the first things in December (me)
* Remove copies with buffered file (me)
Bitmap Optimization
* Finish moving to individual bitmaps for migration/vga/code
* Make sure we don't copy things around
* Shared memory bitmap with kvm?
* Move to 2MB pages bitmap and then fine grain?
QIDL
* Review the patches (me)
PostCopy
* Review patches?
* See what we can already integrate?
I remember from last year that we could integrate the first third or so
RDMA
* Send RDMA/tcp/.... library they already have (Benoit)
* This is required for postcopy
* This can be used for precopy
General
* Change protocol to:
a) always being 16-byte aligned (Paolo said that is faster)
b) do scatter/gather of the pages?
Fault Tolerance
* That is built on top of migration code, but I have nothing to add.
Any more ideas?
Later, Juan.
* Re: [Qemu-devel] Migration ToDo list
2012-11-13 16:18 [Qemu-devel] Migration ToDo list Juan Quintela
@ 2012-11-13 16:28 ` Paolo Bonzini
2012-11-13 17:09 ` Orit Wasserman
2012-11-13 16:40 ` Orit Wasserman
2012-11-13 16:48 ` Chegu Vinod
2 siblings, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2012-11-13 16:28 UTC (permalink / raw)
To: quintela
Cc: Isaku Yamahata, Michael Roth, qemu-devel qemu-devel,
Orit Wasserman, benoit.hudzia, chegu_vinod
On 13/11/2012 17:18, Juan Quintela wrote:
> Migration Thread
> * Plan is integrate it as one of first thing in December (me)
Please make sure to take a look at the latest reviews I sent.
> * Remove copies with buffered file (me)
I also have some prototype of this.
> RDMA
> * Send RDMA/tcp/.... library they already have (Benoit)
> * This is required for postcopy
> * This can be used for precopy
* Investigate RDS (Reliable Datagram Sockets), which works on top of both
TCP and InfiniBand/RDMA.
> General
> * Change protocol to:
> a) being always 16byte aligned (paolo said that is faster)
Well, it's faster with the buffers. Hopefully they go away and we do
not have the problem.
> b) do scatter/gather of the pages?
c) Remove compression of non-zero repetitive pages.
All of the above, I'd say.
Paolo
* Re: [Qemu-devel] Migration ToDo list
2012-11-13 16:18 [Qemu-devel] Migration ToDo list Juan Quintela
2012-11-13 16:28 ` Paolo Bonzini
@ 2012-11-13 16:40 ` Orit Wasserman
2012-11-13 16:48 ` Chegu Vinod
2 siblings, 0 replies; 10+ messages in thread
From: Orit Wasserman @ 2012-11-13 16:40 UTC (permalink / raw)
To: quintela
Cc: Michael Roth, benoit.hudzia, chegu_vinod, qemu-devel qemu-devel,
Isaku Yamahata
On 11/13/2012 06:18 PM, Juan Quintela wrote:
>
> Hi
>
> If you have anything else to put, please add.
>
> Migration Thread
> * Plan is integrate it as one of first thing in December (me)
> * Remove copies with buffered file (me)
>
> Bitmap Optimization
> * Finish moving to individual bitmaps for migration/vga/code
> * Make sure we don't copy things around
> * Shared memory bitmap with kvm?
> * Move to 2MB pages bitmap and then fine grain?
>
> QIDL
> * Review the patches (me)
>
> PostCopy
> * Review patches?
> * See what we can already integrate?
> I remember for last year that we could integrate the 1st third or so
>
> RDMA
> * Send RDMA/tcp/.... library they already have (Benoit)
Use RDS (Reliable Datagram Sockets), which allows us to use the same API
when using tcp or RDMA (me)
> * This is required for postcopy
> * This can be used for precopy
>
> General
> * Change protocol to:
> a) being always 16byte aligned (paolo said that is faster)
> b) do scatter/gather of the pages?
>
> Fault Tolerance
> * That is built on top of migration code, but I have nothing to add.
>
> Any more ideas?
copyless networking - maybe use virtio zero copy mechanism? (me)
EPT/NPT dirty bits (will make the sync more expensive but will improve guest performance)
Regards,
Orit
>
> Later, Juan.
>
* Re: [Qemu-devel] Migration ToDo list
2012-11-13 16:18 [Qemu-devel] Migration ToDo list Juan Quintela
2012-11-13 16:28 ` Paolo Bonzini
2012-11-13 16:40 ` Orit Wasserman
@ 2012-11-13 16:48 ` Chegu Vinod
2012-11-13 16:57 ` Orit Wasserman
2 siblings, 1 reply; 10+ messages in thread
From: Chegu Vinod @ 2012-11-13 16:48 UTC (permalink / raw)
To: quintela
Cc: Michael Roth, Orit Wasserman, benoit.hudzia,
qemu-devel qemu-devel, Isaku Yamahata
On 11/13/2012 8:18 AM, Juan Quintela wrote:
> Hi
>
> If you have anything else to put, please add.
>
> Migration Thread
> * Plan is integrate it as one of first thing in December (me)
> * Remove copies with buffered file (me)
>
> Bitmap Optimization
> * Finish moving to individual bitmaps for migration/vga/code
> * Make sure we don't copy things around
> * Shared memory bitmap with kvm?
> * Move to 2MB pages bitmap and then fine grain?
If it's not already implied in the above: the long freezes observed
at the start of the migration need to be addressed (it's most likely
related to the BQL?).
>
> QIDL
> * Review the patches (me)
>
> PostCopy
> * Review patches?
> * See what we can already integrate?
> I remember for last year that we could integrate the 1st third or so
>
> RDMA
> * Send RDMA/tcp/.... library they already have (Benoit)
> * This is required for postcopy
> * This can be used for precopy
Not sure if what Benoit has can be directly used for pre-copy also.
As Paolo said, we need to look at the RDS APIs for pre-copy (I have just
started looking at the same). Would like to know if SDP can be used...
> General
> * Change protocol to:
> a) being always 16byte aligned (paolo said that is faster)
> b) do scatter/gather of the pages?
Control of where the migration thread(s) run...
--
BTW, has anyone tried migrating multiple guests from a host? Are
there limitations (enforced via higher-level management tools) on
how many guests can be migrated at once (in an attempt to quickly
evacuate a flaky host)?
Vinod
> Fault Tolerance
> * That is built on top of migration code, but I have nothing to add.
>
> Any more ideas?
>
> Later, Juan.
> .
>
* Re: [Qemu-devel] Migration ToDo list
2012-11-13 16:48 ` Chegu Vinod
@ 2012-11-13 16:57 ` Orit Wasserman
0 siblings, 0 replies; 10+ messages in thread
From: Orit Wasserman @ 2012-11-13 16:57 UTC (permalink / raw)
To: Chegu Vinod
Cc: qemu-devel qemu-devel, Isaku Yamahata, benoit.hudzia,
Michael Roth, quintela
On 11/13/2012 06:48 PM, Chegu Vinod wrote:
> On 11/13/2012 8:18 AM, Juan Quintela wrote:
>> Hi
>>
>> If you have anything else to put, please add.
>>
>> Migration Thread
>> * Plan is integrate it as one of first thing in December (me)
>> * Remove copies with buffered file (me)
>>
>> Bitmap Optimization
>> * Finish moving to individual bitmaps for migration/vga/code
>> * Make sure we don't copy things around
>> * Shared memory bitmap with kvm?
>> * Move to 2MB pages bitmap and then fine grain?
>
> If its not already implied in the above ... the long freezes observed at the start of the migration needs to be addressed (its most likely related to BQL ?).
>
>>
>> QIDL
>> * Review the patches (me)
>>
>> PostCopy
>> * Review patches?
>> * See what we can already integrate?
>> I remember for last year that we could integrate the 1st third or so
>>
>> RDMA
>> * Send RDMA/tcp/.... library they already have (Benoit)
>> * This is required for postcopy
>> * This can be used for precopy
>
> Not sure if what Benoit has can be directly used for pre-copy also.
>
> As Paolo said... we need to look at RDS API's for pre-copy. ('have just started looking at the same). Would like to know if SDP can be used...
>
>> General
>> * Change protocol to:
>> a) being always 16byte aligned (paolo said that is faster)
>> b) do scatter/gather of the pages?
>
> Control of where the migration thread(s) run...
>
> --
>
> BTW, has anyone tried doing multiple guest migration from a host ? Are there limitations (enforced via higher level management tools) as to how many guests can be migrated at once (in an attempt to quickly evacuate a flaky host) ?
libvirt supports concurrent migrations, but I have not tried it.
>
> Vinod
>
>> Fault Tolerance
>> * That is built on top of migration code, but I have nothing to add.
>>
>> Any more ideas?
>>
>> Later, Juan.
>> .
>>
>
>
* Re: [Qemu-devel] Migration ToDo list
2012-11-13 16:28 ` Paolo Bonzini
@ 2012-11-13 17:09 ` Orit Wasserman
2012-11-13 17:16 ` Paolo Bonzini
0 siblings, 1 reply; 10+ messages in thread
From: Orit Wasserman @ 2012-11-13 17:09 UTC (permalink / raw)
To: Paolo Bonzini
Cc: quintela, Michael Roth, qemu-devel qemu-devel, Isaku Yamahata,
benoit.hudzia, chegu_vinod
On 11/13/2012 06:28 PM, Paolo Bonzini wrote:
> On 13/11/2012 17:18, Juan Quintela wrote:
>> Migration Thread
>> * Plan is integrate it as one of first thing in December (me)
>
> Please make sure to take a look at the latest reviews I sent.
>
>> * Remove copies with buffered file (me)
>
> I also have some prototype of this.
>
>> RDMA
>> * Send RDMA/tcp/.... library they already have (Benoit)
>> * This is required for postcopy
>> * This can be used for precopy
>
> * Investigate RDS (Reliable Datagram Sockets), which works on top of both
> TCP and InfiniBand/RDMA.
>
>> General
>> * Change protocol to:
>> a) being always 16byte aligned (paolo said that is faster)
>
> Well, it's faster with the buffers. Hopefully they go away and we do
> not have the problem.
>
>> b) do scatter/gather of the pages?
I would prefer to postpone changing the protocol and start with using iov (writev)
for sending the pages (still sending the header and then the page). Later we can
move to scatter/gather; I'm not sure how large the performance gain will be.
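
As a rough illustration of the iov idea (sketched in Python rather than QEMU's C, and with a hypothetical 8-byte address header that is not the real migration stream format), the header and the page can be passed to one writev call as separate buffers, so they are never copied into a single staging buffer first:

```python
import os
import struct

PAGE_SIZE = 4096  # assumed page size for this sketch


def send_pages(fd, pages):
    """Write (address, page) pairs to fd with a single writev call.

    The 8-byte big-endian address header is hypothetical; it is not
    the real migration stream format.
    """
    iov = []
    for addr, page in pages:
        iov.append(struct.pack(">Q", addr))  # header buffer
        iov.append(page)                     # page buffer, passed as-is
    # writev gathers all buffers in order. It may write fewer bytes
    # than requested, so real code would loop on the return value.
    return os.writev(fd, iov)
```

The wire format stays "header, then page", which is why this step would not need a protocol change.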
>
> c) Remove compression of non-zero repetitive pages.
+1
We can look at identifying the zero pages without calling is_dup_page, which looks
expensive.
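
To make the distinction concrete, here is a small Python sketch (not QEMU code). It assumes that a "repetitive" page means a page made of one repeated byte; is_uniform_page below is only this sketch's stand-in for that general check, and is_zero_page is the narrower test that would remain if non-zero repetitive pages were no longer compressed:

```python
PAGE_SIZE = 4096              # assumed page size
ZERO_PAGE = bytes(PAGE_SIZE)  # one shared all-zero page to compare against


def is_zero_page(page):
    """True only if every byte of the page is zero."""
    return page == ZERO_PAGE


def is_uniform_page(page):
    """True if every byte equals the first byte, zero or not.

    Hypothetical stand-in for the general duplicate-byte check.
    """
    return page == page[:1] * len(page)
```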
Orit
>
> All of the above, I'd say.
>
> Paolo
>
* Re: [Qemu-devel] Migration ToDo list
2012-11-13 17:09 ` Orit Wasserman
@ 2012-11-13 17:16 ` Paolo Bonzini
2012-11-14 2:14 ` Isaku Yamahata
0 siblings, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2012-11-13 17:16 UTC (permalink / raw)
To: Orit Wasserman
Cc: quintela, qemu-devel qemu-devel, Michael Roth, Isaku Yamahata,
benoit.hudzia, chegu_vinod
On 13/11/2012 18:09, Orit Wasserman wrote:
>>
>>> b) do scatter/gather of the pages?
> I would prefer to postpone changing the protocol and start with using iov (writev)
> for sending the pages (still sending header and than the page). Later we can
> move to scatter/gather I'm not sure of how large the performance gain will be.
>>
>> c) Remove compression of non-zero repetitive pages.
> +1
> we can look of identify the zero pages without calling is_dup_page which looks
> expensive.
Identifying ballooned zero pages is useful, because those cause the
clear_page calls in the kernel even in a guest that has been running for
a while.
But a generic solution doesn't really matter, because is_dup_page and
clear_page shouldn't really be in the profile in practice, except in
microbenchmarks.
Paolo
* Re: [Qemu-devel] Migration ToDo list
2012-11-13 17:16 ` Paolo Bonzini
@ 2012-11-14 2:14 ` Isaku Yamahata
2012-11-14 2:20 ` Paolo Bonzini
0 siblings, 1 reply; 10+ messages in thread
From: Isaku Yamahata @ 2012-11-14 2:14 UTC (permalink / raw)
To: Paolo Bonzini
Cc: quintela, qemu-devel qemu-devel, Michael Roth, Orit Wasserman,
benoit.hudzia, chegu_vinod
On Tue, Nov 13, 2012 at 06:16:30PM +0100, Paolo Bonzini wrote:
> On 13/11/2012 18:09, Orit Wasserman wrote:
> >>
> >>> b) do scatter/gather of the pages?
> > I would prefer to postpone changing the protocol and start with using iov (writev)
> > for sending the pages (still sending header and than the page). Later we can
> > move to scatter/gather I'm not sure of how large the performance gain will be.
> >>
> >> c) Remove compression of non-zero repetitive pages.
> > +1
> > we can look of identify the zero pages without calling is_dup_page which looks
> > expensive.
>
> Identifying ballooned zero pages is useful, because those cause the
> clear_page calls in the kernel even in a guest that has been running for
> a while.
>
> But a generic solution doesn't really matter, because is_dup_page and
> clear_page shouldn't really be in the profile in practice, except in
> microbenchmarks.
I guess mincore(2) can be used as an easy way to detect non-mapped pages.
This is just an implementation detail anyway.
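
A minimal sketch of such a residency query, written in Python through ctypes rather than in C. The helper name resident_pages is made up for this example, and loading libc with CDLL(None) is an assumption that holds on Linux. Note that mincore only reports whether each page is resident in memory; it does not say whether a non-resident page was never touched or was swapped out.

```python
import ctypes
import mmap
import os

# Use the C library symbols already linked into the interpreter
# (works on Linux; an assumption on other platforms).
_libc = ctypes.CDLL(None, use_errno=True)
_libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                          ctypes.POINTER(ctypes.c_ubyte)]
_libc.mincore.restype = ctypes.c_int


def resident_pages(buf):
    """Return one bool per page of a writable mmap: True if resident."""
    npages = (len(buf) + mmap.PAGESIZE - 1) // mmap.PAGESIZE
    vec = (ctypes.c_ubyte * npages)()
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    if _libc.mincore(addr, len(buf), vec) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    # Only the least significant bit of each result byte is defined.
    return [bool(b & 1) for b in vec]
```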
--
yamahata
* Re: [Qemu-devel] Migration ToDo list
2012-11-14 2:14 ` Isaku Yamahata
@ 2012-11-14 2:20 ` Paolo Bonzini
2012-11-14 2:31 ` Isaku Yamahata
0 siblings, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2012-11-14 2:20 UTC (permalink / raw)
To: Isaku Yamahata
Cc: quintela, qemu-devel qemu-devel, Michael Roth, Orit Wasserman,
benoit.hudzia, chegu_vinod
On 14/11/2012 03:14, Isaku Yamahata wrote:
>> Identifying ballooned zero pages is useful, because those cause the
>> clear_page calls in the kernel even in a guest that has been running for
>> a while.
>>
>> But a generic solution doesn't really matter, because is_dup_page and
>> clear_page shouldn't really be in the profile in practice, except in
>> microbenchmarks.
> I guess mincore(2) can be used as easy way to detect non-mapped page.
> This is just implementation detail anyway.
That doesn't work if the page is swapped, does it?
But I wonder if the clear_page occurrences are because of the problem
described recently on LWN (http://lwn.net/Articles/517465/), and so
really more of a kernel bug than anything else.
Paolo
* Re: [Qemu-devel] Migration ToDo list
2012-11-14 2:20 ` Paolo Bonzini
@ 2012-11-14 2:31 ` Isaku Yamahata
0 siblings, 0 replies; 10+ messages in thread
From: Isaku Yamahata @ 2012-11-14 2:31 UTC (permalink / raw)
To: Paolo Bonzini
Cc: quintela, qemu-devel qemu-devel, Michael Roth, Orit Wasserman,
benoit.hudzia, chegu_vinod
On Wed, Nov 14, 2012 at 03:20:08AM +0100, Paolo Bonzini wrote:
> On 14/11/2012 03:14, Isaku Yamahata wrote:
> >> Identifying ballooned zero pages is useful, because those cause the
> >> clear_page calls in the kernel even in a guest that has been running for
> >> a while.
> >>
> >> But a generic solution doesn't really matter, because is_dup_page and
> >> clear_page shouldn't really be in the profile in practice, except in
> >> microbenchmarks.
> > I guess mincore(2) can be used as easy way to detect non-mapped page.
> > This is just implementation detail anyway.
>
> Doesn't work if the page is swapped, doesn't it?
Ah, I meant that if a page is in memory, we can skip is_dup_page for that page
and only check pages that are not mapped or are swapped.
If more detailed info is needed, /proc/<pid>/pagemap could be used.
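
For reference, a small Python sketch of reading one pagemap entry. The function names are made up for this example. The bit positions follow the kernel's pagemap documentation (bit 63: page present, bit 62: page swapped). Reading another process's pagemap may need extra privileges.

```python
import mmap
import struct

PM_PRESENT = 1 << 63  # page is present in RAM
PM_SWAPPED = 1 << 62  # page is in swap


def decode_pagemap_entry(entry):
    """Split a 64-bit pagemap entry into its present/swapped flags."""
    return {"present": bool(entry & PM_PRESENT),
            "swapped": bool(entry & PM_SWAPPED)}


def read_pagemap_entry(pid, vaddr):
    """Read the 64-bit pagemap entry covering virtual address vaddr."""
    offset = (vaddr // mmap.PAGESIZE) * 8  # one 8-byte entry per page
    with open(f"/proc/{pid}/pagemap", "rb", buffering=0) as f:
        f.seek(offset)
        raw = f.read(8)
    if len(raw) != 8:
        raise ValueError("no pagemap entry at this address")
    return struct.unpack("=Q", raw)[0]  # native byte order
```

Unlike mincore, this distinguishes a swapped page (swapped set) from one that is neither present nor swapped.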
> But I wonder if the clear_page occurrences are because of the problem
> described recently on LWN (http://lwn.net/Articles/517465/), and so
> really more of a kernel bug than anything else.
>
> Paolo
>
--
yamahata