[Qemu-devel] Stalls on Live Migration of VMs with a lot of memory

* [Qemu-devel] Stalls on Live Migration of VMs with a lot of memory
@ 2012-01-03 18:04 Peter Lieven
  2012-01-04  1:38 ` Shu Ming
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Lieven @ 2012-01-03 18:04 UTC (permalink / raw)
  To: qemu-devel, kvm

 Hi all,

 is there any known issue when migrating VMs with a lot of memory
 (e.g. 32GB)?
 It seems that there is some portion of the migration code which takes
 too much time when the number of memory pages is large.

 Symptoms are: unresponsive VNC connection, VM stalls and also an
 unresponsive QEMU Monitor (via TCP).

 The problem seems to be worse on 10G connections between 2 nodes (I
 already tried limiting the bandwidth with the migrate_set_speed
 command) than on 1G connections.

 The problem also seems to be worse in qemu-kvm-1.0 than in
 qemu-kvm-0.12.5.

 Any hints?

 Thanks,
 Peter


* Re: [Qemu-devel] Stalls on Live Migration of VMs with a lot of memory
  2012-01-03 18:04 [Qemu-devel] Stalls on Live Migration of VMs with a lot of memory Peter Lieven
@ 2012-01-04  1:38 ` Shu Ming
  2012-01-04  9:11   ` Peter Lieven
  2012-01-04 10:53   ` Peter Lieven
  0 siblings, 2 replies; 13+ messages in thread
From: Shu Ming @ 2012-01-04  1:38 UTC (permalink / raw)
  To: Peter Lieven; +Cc: qemu-devel, kvm

On 2012-1-4 2:04, Peter Lieven wrote:
> Hi all,
>
> is there any known issue when migrating VMs with a lot of (e.g. 32GB) 
> of memory.
> It seems that there is some portion in the migration code which takes 
> too much time when the number
> of memory pages is large.
>
> Symptoms are: Irresponsive VNC connection, VM stalls and also 
> irresponsive QEMU Monitor (via TCP).
>
> The problem seems to be worse on 10G connections between 2 Nodes (i 
> already tried limiting the
> bandwidth with the migrate_set_speed command) than on 1G connections.
Does the migration complete eventually? How long does it take? I did a
test on a VM with 4G and it took about two seconds.


>
> The problem also seems to be worse in qemu-kvm-1.0 than in 
> qemu-kvm-0.12.5.
>
> Any hints?
>
> Thanks,
> Peter
>
>


-- 
Shu Ming<shuming@linux.vnet.ibm.com>
IBM China Systems and Technology Laboratory


* Re: [Qemu-devel] Stalls on Live Migration of VMs with a lot of memory
  2012-01-04  1:38 ` Shu Ming
@ 2012-01-04  9:11   ` Peter Lieven
  2012-01-04 10:53   ` Peter Lieven
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Lieven @ 2012-01-04  9:11 UTC (permalink / raw)
  To: Shu Ming; +Cc: qemu-devel, kvm

On 04.01.2012 02:38, Shu Ming wrote:
> On 2012-1-4 2:04, Peter Lieven wrote:
>> Hi all,
>>
>> is there any known issue when migrating VMs with a lot of (e.g. 32GB) 
>> of memory.
>> It seems that there is some portion in the migration code which takes 
>> too much time when the number
>> of memory pages is large.
>>
>> Symptoms are: Irresponsive VNC connection, VM stalls and also 
>> irresponsive QEMU Monitor (via TCP).
>>
>> The problem seems to be worse on 10G connections between 2 Nodes (i 
>> already tried limiting the
>> bandwidth with the migrate_set_speed command) than on 1G connections.
> Is the migration  accomplished finally? How long will that be?  I did 
> a test on VM with 4G and it took me about two seconds.
maybe i should have been more precise. i use hugetlbfs and memory
preallocation, so all 32G are allocated and most of the pages are dup
pages. one problem seems to be an issue that was already observed in
2010 but not addressed.

the rate limiter only counts bytes transferred. if there are a lot of
dup pages we end up reading almost the whole ram in one cycle. i already
patched the source to exit the while loop in stage 2 if either the file
rate limiter kicks in *or* the number of pages read * PAGE_SIZE exceeds
what the rate limit allows.

this has improved the situation a bit, but i have a few other ideas
where time could be saved.

peter
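
The early-exit condition described above can be sketched in isolation.
This is only a minimal model, not the code from arch_init.c: the function
name, the 4096-byte page size and the 9-byte wire cost of a dup page are
assumptions made for illustration. It shows why a loop that only counts
bytes put on the wire keeps scanning when almost every page is a dup, and
how counting scanned bytes as well bounds the work per cycle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed guest page size */
#define SKETCH_DUP_COST  9u    /* assumed wire cost of a dup page (header + fill byte) */

/*
 * Hypothetical model of one stage 2 cycle. Returns how many pages were
 * processed before the loop exits. is_dup[i] != 0 marks a page whose
 * content is a single repeated byte.
 */
size_t stage2_pages_processed(const uint8_t *is_dup, size_t npages,
                              uint64_t byte_budget, int count_scanned)
{
    uint64_t bytes_sent = 0;    /* what a byte-counting rate limiter sees */
    uint64_t bytes_scanned = 0; /* guest memory actually read */
    size_t i;

    assert(is_dup != NULL || npages == 0);

    for (i = 0; i < npages; i++) {
        if (bytes_sent >= byte_budget) {
            break; /* the rate limiter kicks in */
        }
        if (count_scanned && bytes_scanned >= byte_budget) {
            break; /* additional exit: too much memory read this cycle */
        }
        bytes_sent += is_dup[i] ? SKETCH_DUP_COST : SKETCH_PAGE_SIZE;
        bytes_scanned += SKETCH_PAGE_SIZE;
    }
    return i;
}
```

With a budget of ten pages and a guest that consists only of dup pages,
the byte-counting variant walks the entire array, while the variant with
the extra check stops after ten pages.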

>
>
>>
>> The problem also seems to be worse in qemu-kvm-1.0 than in 
>> qemu-kvm-0.12.5.
>>
>> Any hints?
>>
>> Thanks,
>> Peter
>>
>>
>
>


* Re: [Qemu-devel] Stalls on Live Migration of VMs with a lot of memory
  2012-01-04  1:38 ` Shu Ming
  2012-01-04  9:11   ` Peter Lieven
@ 2012-01-04 10:53   ` Peter Lieven
  2012-01-04 11:05     ` Paolo Bonzini
  1 sibling, 1 reply; 13+ messages in thread
From: Peter Lieven @ 2012-01-04 10:53 UTC (permalink / raw)
  To: Shu Ming; +Cc: qemu-devel, kvm

On 04.01.2012 02:38, Shu Ming wrote:
> On 2012-1-4 2:04, Peter Lieven wrote:
>> Hi all,
>>
>> is there any known issue when migrating VMs with a lot of (e.g. 32GB) 
>> of memory.
>> It seems that there is some portion in the migration code which takes 
>> too much time when the number
>> of memory pages is large.
>>
>> Symptoms are: Irresponsive VNC connection, VM stalls and also 
>> irresponsive QEMU Monitor (via TCP).
>>
>> The problem seems to be worse on 10G connections between 2 Nodes (i 
>> already tried limiting the
>> bandwidth with the migrate_set_speed command) than on 1G connections.
> Is the migration  accomplished finally? How long will that be?  I did 
> a test on VM with 4G and it took me about two seconds.
it seems that the majority of the time (90%) is lost in:

cpu_physical_memory_reset_dirty(current_addr,
                                current_addr + TARGET_PAGE_SIZE,
                                MIGRATION_DIRTY_FLAG);

does anyone have an idea how to improve this?
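
To get a feeling for the scale: the call above is made once per page, so
the number of calls per full pass grows linearly with guest RAM. The
sketch below just does that arithmetic. The helper names are hypothetical,
the 4096-byte page size is an assumption (TARGET_PAGE_SIZE on x86), and
the per-call cost passed to the second helper is a made-up input, since
the thread only reports the 90% share and not an absolute cost.

```c
#include <assert.h>
#include <stdint.h>

/* Number of per-page reset calls needed for one pass over guest RAM. */
uint64_t reset_calls_per_pass(uint64_t ram_bytes, uint64_t page_size)
{
    assert(page_size != 0);
    return (ram_bytes + page_size - 1) / page_size; /* round up */
}

/* Total time for one pass, given a hypothetical cost per call in ns. */
double pass_overhead_seconds(uint64_t calls, double ns_per_call)
{
    return (double)calls * ns_per_call / 1e9;
}
```

For a 32 GiB guest with 4 KiB pages this gives 8388608 calls per pass, so
even a small per-call cost adds up quickly.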


* Re: [Qemu-devel] Stalls on Live Migration of VMs with a lot of memory
  2012-01-04 10:53   ` Peter Lieven
@ 2012-01-04 11:05     ` Paolo Bonzini
  2012-01-04 11:22       ` Peter Lieven
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2012-01-04 11:05 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Shu Ming, qemu-devel, kvm

On 01/04/2012 11:53 AM, Peter Lieven wrote:
> On 04.01.2012 02:38, Shu Ming wrote:
>> On 2012-1-4 2:04, Peter Lieven wrote:
>>> Hi all,
>>>
>>> is there any known issue when migrating VMs with a lot of (e.g. 32GB)
>>> of memory.
>>> It seems that there is some portion in the migration code which takes
>>> too much time when the number
>>> of memory pages is large.
>>>
>>> Symptoms are: Irresponsive VNC connection, VM stalls and also
>>> irresponsive QEMU Monitor (via TCP).
>>>
>>> The problem seems to be worse on 10G connections between 2 Nodes (i
>>> already tried limiting the
>>> bandwidth with the migrate_set_speed command) than on 1G connections.
>> Is the migration accomplished finally? How long will that be? I did a
>> test on VM with 4G and it took me about two seconds.
> it seems that the majority of time (90%) is lost in:
>
> cpu_physical_memory_reset_dirty(current_addr,
> current_addr + TARGET_PAGE_SIZE,
> MIGRATION_DIRTY_FLAG);
>
> anyone any idea, to improve this?

There were patches to move RAM migration to a separate thread.  The 
problem is that they broke block migration.

However, asynchronous NBD is in and streaming will follow suit soon.  As 
soon as we have those two features, we might as well remove the block 
migration code.

Paolo


* Re: [Qemu-devel] Stalls on Live Migration of VMs with a lot of memory
  2012-01-04 11:05     ` Paolo Bonzini
@ 2012-01-04 11:22       ` Peter Lieven
  2012-01-04 11:28         ` Paolo Bonzini
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Lieven @ 2012-01-04 11:22 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Shu Ming, qemu-devel, kvm

On 04.01.2012 12:05, Paolo Bonzini wrote:
> On 01/04/2012 11:53 AM, Peter Lieven wrote:
>> On 04.01.2012 02:38, Shu Ming wrote:
>>> On 2012-1-4 2:04, Peter Lieven wrote:
>>>> Hi all,
>>>>
>>>> is there any known issue when migrating VMs with a lot of (e.g. 32GB)
>>>> of memory.
>>>> It seems that there is some portion in the migration code which takes
>>>> too much time when the number
>>>> of memory pages is large.
>>>>
>>>> Symptoms are: Irresponsive VNC connection, VM stalls and also
>>>> irresponsive QEMU Monitor (via TCP).
>>>>
>>>> The problem seems to be worse on 10G connections between 2 Nodes (i
>>>> already tried limiting the
>>>> bandwidth with the migrate_set_speed command) than on 1G connections.
>>> Is the migration accomplished finally? How long will that be? I did a
>>> test on VM with 4G and it took me about two seconds.
>> it seems that the majority of time (90%) is lost in:
>>
>> cpu_physical_memory_reset_dirty(current_addr,
>> current_addr + TARGET_PAGE_SIZE,
>> MIGRATION_DIRTY_FLAG);
>>
>> anyone any idea, to improve this?
>
> There were patches to move RAM migration to a separate thread.  The 
> problem is that they broke block migration.
>
> However, asynchronous NBD is in and streaming will follow suit soon.  
> As soon as we have those two features, we might as well remove the 
> block migration code.

ok, so it's a matter of time, right?

would it make sense to patch ram_save_block to always process a full ram
block?
i am thinking of copying the dirty information for the whole block, then
resetting the dirty information for the complete block, and then
processing the pages that were dirty before the reset.

questions:
  - how big can ram blocks be?
  - is it possible that ram blocks differ in size?
  - in stage 3 the vm is stopped, right? so there can't be any more
    dirty blocks after scanning the whole memory once?

peter

>
> Paolo


* Re: [Qemu-devel] Stalls on Live Migration of VMs with a lot of memory
  2012-01-04 11:22       ` Peter Lieven
@ 2012-01-04 11:28         ` Paolo Bonzini
  2012-01-04 11:42           ` Peter Lieven
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2012-01-04 11:28 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Shu Ming, qemu-devel, kvm

On 01/04/2012 12:22 PM, Peter Lieven wrote:
>> There were patches to move RAM migration to a separate thread. The
>> problem is that they broke block migration.
>>
>> However, asynchronous NBD is in and streaming will follow suit soon.
>> As soon as we have those two features, we might as well remove the
>> block migration code.
>
> ok, so its a matter of time, right?

Well, there are other solutions of varying complexity in the works, that
might remove the need for the migration thread or at least reduce the
problem (post-copy migration, XBZRLE, vectorized hot loops).  But yes, we
are aware of the problem and we should solve it in one way or the other.

> would it make sense to patch ram_save_block to always process a full ram
> block?

If I understand the proposal, then migration would hardly be live 
anymore.  The biggest RAM block in a 32G machine is, well, 32G big. 
Other RAM blocks are for the VRAM and for some BIOS data, but they are 
very small in proportion.

> - in stage 3 the vm is stopped, right? so there can't be any more dirty
> blocks after scanning the whole memory once?

No, stage 3 is entered when there are very few dirty memory pages 
remaining.  This may happen after scanning the whole memory many times. 
  It may even never happen if migration does not converge because of low 
bandwidth or too strict downtime requirements.

Paolo
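
The condition for entering stage 3 described above can be sketched as a
simple estimate: the time needed to send what is still dirty, at the
bandwidth currently observed, must fit into the allowed downtime. This is
a hedged illustration rather than the actual QEMU code; the function name
and parameters are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical convergence check. Returns 1 if the remaining dirty pages
 * could be sent within max_downtime_ns at the measured bandwidth (given
 * in bytes per nanosecond), 0 otherwise.
 */
int ready_for_stage3(uint64_t dirty_pages, uint64_t page_size,
                     double bytes_per_ns, uint64_t max_downtime_ns)
{
    assert(bytes_per_ns > 0.0);

    double expected_ns =
        (double)dirty_pages * (double)page_size / bytes_per_ns;
    return expected_ns <= (double)max_downtime_ns;
}
```

If the guest dirties pages faster than they can be sent, dirty_pages never
gets small enough and the check never succeeds, which is the
non-converging case mentioned above.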


* Re: [Qemu-devel] Stalls on Live Migration of VMs with a lot of memory
  2012-01-04 11:28         ` Paolo Bonzini
@ 2012-01-04 11:42           ` Peter Lieven
  2012-01-04 12:28             ` Paolo Bonzini
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Lieven @ 2012-01-04 11:42 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Shu Ming, qemu-devel, kvm

On 04.01.2012 12:28, Paolo Bonzini wrote:
> On 01/04/2012 12:22 PM, Peter Lieven wrote:
>>> There were patches to move RAM migration to a separate thread. The
>>> problem is that they broke block migration.
>>>
>>> However, asynchronous NBD is in and streaming will follow suit soon.
>>> As soon as we have those two features, we might as well remove the
>>> block migration code.
>>
>> ok, so its a matter of time, right?
>
> Well, there are other solutions of varying complexity in the works, 
> that might remove the need for the migration thread or at least reduce 
> the problem (post-copy migration, XBRLE, vectorized hot loops).  But 
> yes, we are aware of the problem and we should solve it in one way or 
> the other.
i have read about all these approaches and they all seem promising.
>
>> would it make sense to patch ram_save_block to always process a full ram
>> block?
>
> If I understand the proposal, then migration would hardly be live 
> anymore.  The biggest RAM block in a 32G machine is, well, 32G big. 
> Other RAM blocks are for the VRAM and for some BIOS data, but they are 
> very small in proportion.
ok, then i misunderstood the ram blocks thing. i thought the guest ram
would consist of a collection of ram blocks.
then let me describe it differently. would it make sense to process
bigger portions of memory (e.g. 1M) in stage 2, so that
cpu_physical_memory_reset_dirty is called less often but on bigger
ranges? we might lose a few dirty pages, but they will be tracked in
the next iteration in stage 2 or in stage 3 at the latest. what would
be necessary is that nobody marks a page dirty while i copy the dirty
information for the portion of memory i want to process.
>
>> - in stage 3 the vm is stopped, right? so there can't be any more dirty
>> blocks after scanning the whole memory once?
>
> No, stage 3 is entered when there are very few dirty memory pages 
> remaining.  This may happen after scanning the whole memory many 
> times.  It may even never happen if migration does not converge 
> because of low bandwidth or too strict downtime requirements.
ok, is there a chance that i lose one final page if it is modified just
after i walked over it and i found no other page dirty (so bytes_sent = 0)?

Peter
>
> Paolo


* Re: [Qemu-devel] Stalls on Live Migration of VMs with a lot of memory
  2012-01-04 11:42           ` Peter Lieven
@ 2012-01-04 12:28             ` Paolo Bonzini
  2012-01-04 13:08               ` Peter Lieven
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2012-01-04 12:28 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Shu Ming, qemu-devel, kvm

On 01/04/2012 12:42 PM, Peter Lieven wrote:
>>
> ok, then i misunderstood the ram blocks thing. i thought the guest ram
> would consist of a collection of ram blocks.
> then let me describe it differntly. would it make sense to process
> bigger portions of memory (e.g. 1M) in stage 2 to reduce the number of
> calls to cpu_physical_memory_reset_dirty and instead run it on bigger
> portions of memory. we might loose a few dirty pages but they will be
> tracked in the next iteration in stage 2 or in stage 3 at least. what
> would be necessary is that nobody marks a page dirty
> while i copy the dirty information for the portion of memory i want to
> process.

Dirty memory tracking is done by the hypervisor and must be done at page 
granularity.

>>> - in stage 3 the vm is stopped, right? so there can't be any more dirty
>>> blocks after scanning the whole memory once?
>>
>> No, stage 3 is entered when there are very few dirty memory pages
>> remaining.  This may happen after scanning the whole memory many
>> times.  It may even never happen if migration does not converge
>> because of low bandwidth or too strict downtime requirements.
>>
> ok, is there a chance that i lose one final page if it is modified just
> after i walked over it and i found no other page dirty (so bytes_sent = 0).

No, of course not.  Stage 3 will send all missing pages while the VM is 
stopped.  There is a chance that the guest will go crazy and start 
touching lots of pages at exactly the wrong time, and thus the downtime 
will be longer than expected.  However, that's a necessary evil; if you 
cannot accept that, post-copy migration would provide a completely 
different set of tradeoffs.

(BTW, bytes_sent = 0 is very rare).

Paolo


* Re: [Qemu-devel] Stalls on Live Migration of VMs with a lot of memory
  2012-01-04 12:28             ` Paolo Bonzini
@ 2012-01-04 13:08               ` Peter Lieven
  2012-01-04 14:14                 ` Paolo Bonzini
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Lieven @ 2012-01-04 13:08 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Shu Ming, qemu-devel, kvm

On 04.01.2012 13:28, Paolo Bonzini wrote:
> On 01/04/2012 12:42 PM, Peter Lieven wrote:
>>>
>> ok, then i misunderstood the ram blocks thing. i thought the guest ram
>> would consist of a collection of ram blocks.
>> then let me describe it differntly. would it make sense to process
>> bigger portions of memory (e.g. 1M) in stage 2 to reduce the number of
>> calls to cpu_physical_memory_reset_dirty and instead run it on bigger
>> portions of memory. we might loose a few dirty pages but they will be
>> tracked in the next iteration in stage 2 or in stage 3 at least. what
>> would be necessary is that nobody marks a page dirty
>> while i copy the dirty information for the portion of memory i want to
>> process.
>
> Dirty memory tracking is done by the hypervisor and must be done at 
> page granularity.
ok, so this is unfortunately not an option.

thus my only option at the moment is to limit the runtime of the while
loop in stage 2. or are there any post-1.0 patches in git that might
already help?

i tried to limit it to migrate_max_downtime() and this at least resolves
the problem with the vm stalls. however, migration speed is very limited
with that (approx. 80MB/s on a 10G link).


>
>>>> - in stage 3 the vm is stopped, right? so there can't be any more 
>>>> dirty
>>>> blocks after scanning the whole memory once?
>>>
>>> No, stage 3 is entered when there are very few dirty memory pages
>>> remaining.  This may happen after scanning the whole memory many
>>> times.  It may even never happen if migration does not converge
>>> because of low bandwidth or too strict downtime requirements.
>>>
>> ok, is there a chance that i lose one final page if it is modified just
>> after i walked over it and i found no other page dirty (so bytes_sent 
>> = 0).
>
> No, of course not.  Stage 3 will send all missing pages while the VM 
> is stopped.  There is a chance that the guest will go crazy and start 
> touching lots of pages at exactly the wrong time, and thus the 
> downtime will be longer than expected.  However, that's a necessary 
> evil; if you cannot accept that, post-copy migration would provide a 
> completely different set of tradeoffs.
i don't suffer from long downtimes in stage 3. my issue is in stage 2.
>
> (BTW, bytes_sent = 0 is very rare).
i know, but when the vm is stopped there is no issue. i misunderstood
your "No, stage 3 is entered ..." ;-)

thanks for your help and explanations.

peter
>
> Paolo


* Re: [Qemu-devel] Stalls on Live Migration of VMs with a lot of memory
  2012-01-04 13:08               ` Peter Lieven
@ 2012-01-04 14:14                 ` Paolo Bonzini
  2012-01-04 14:17                   ` Peter Lieven
  2012-01-04 14:21                   ` Peter Lieven
  0 siblings, 2 replies; 13+ messages in thread
From: Paolo Bonzini @ 2012-01-04 14:14 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Shu Ming, qemu-devel, kvm

On 01/04/2012 02:08 PM, Peter Lieven wrote:
>
> thus my only option at the moment is to limit the runtime of the while
> loop in stage 2 or
> are there any post 1.0 patches in git that might already help?

No; even though (as I said) people are aware of the problems and do plan 
to fix them, don't hold your breath. :(

Paolo


* Re: [Qemu-devel] Stalls on Live Migration of VMs with a lot of memory
  2012-01-04 14:14                 ` Paolo Bonzini
@ 2012-01-04 14:17                   ` Peter Lieven
  2012-01-04 14:21                   ` Peter Lieven
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Lieven @ 2012-01-04 14:17 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Shu Ming, qemu-devel, kvm

On 04.01.2012 15:14, Paolo Bonzini wrote:
> don't hold your breath


* Re: [Qemu-devel] Stalls on Live Migration of VMs with a lot of memory
  2012-01-04 14:14                 ` Paolo Bonzini
  2012-01-04 14:17                   ` Peter Lieven
@ 2012-01-04 14:21                   ` Peter Lieven
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Lieven @ 2012-01-04 14:21 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Shu Ming, qemu-devel, kvm

On 04.01.2012 15:14, Paolo Bonzini wrote:
> On 01/04/2012 02:08 PM, Peter Lieven wrote:
>>
>> thus my only option at the moment is to limit the runtime of the while
>> loop in stage 2 or
>> are there any post 1.0 patches in git that might already help?
>
> No; even though (as I said) people are aware of the problems and do 
> plan to fix them, don't hold your breath. :(
ok, just for the record: if someone wants the time limit patch for the
while loop in stage 2, i attached it. it solves the problem for me and,
after some tweaking, provides a throughput of approx. 450MB/s in my case.
it also covers the case where, due to a lot of dups, the rate limit does
not kick in and end the while loop.

--- qemu-kvm-1.0/arch_init.c.orig    2012-01-04 14:21:02.000000000 +0100
+++ qemu-kvm-1.0/arch_init.c    2012-01-04 14:27:34.000000000 +0100
@@ -301,6 +301,8 @@
      bytes_transferred_last = bytes_transferred;
      bwidth = qemu_get_clock_ns(rt_clock);

+    int pages_read = 0;
+
      while ((ret = qemu_file_rate_limit(f)) == 0) {
          int bytes_sent;

@@ -309,6 +311,11 @@
          if (bytes_sent == 0) { /* no more blocks */
              break;
          }
+        if (!(++pages_read & 0xff)) {
+         if ((qemu_get_clock_ns(rt_clock) - bwidth) > migrate_max_downtime())
+          break; /* we have spent more than allowed downtime in this iteration */
+        }
      }

      if (ret < 0) {
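
The pattern used in the patch above, reading the clock only on every
256th iteration and leaving the loop once a time budget is exceeded, can
be tried outside of QEMU. The sketch below is a standalone illustration
under stated assumptions: run_time_bounded, clock_ns_fn and the fake clock
are hypothetical names, and the clock is injected as a callback so the
behaviour can be checked without real timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clock callback returning a timestamp in nanoseconds (hypothetical). */
typedef int64_t (*clock_ns_fn)(void *opaque);

/* Fake clock for experiments: advances by step_ns on every read. */
struct fake_clock {
    int64_t now_ns;
    int64_t step_ns;
};

int64_t fake_clock_ns(void *opaque)
{
    struct fake_clock *c = opaque;
    int64_t t = c->now_ns;

    c->now_ns += c->step_ns;
    return t;
}

/*
 * Process up to max_items items, but stop early once more than budget_ns
 * has elapsed. The clock is consulted only every 256 items to keep the
 * cost of reading it out of the hot path.
 */
size_t run_time_bounded(size_t max_items, int64_t budget_ns,
                        clock_ns_fn now, void *opaque)
{
    size_t done = 0;
    int64_t start;

    assert(now != NULL);
    start = now(opaque);

    while (done < max_items) {
        /* ... the per-item work would go here ... */
        done++;
        if ((done & 0xff) == 0 && now(opaque) - start > budget_ns) {
            break; /* time budget for this iteration is used up */
        }
    }
    return done;
}
```

The trade-off is that the loop can overshoot the budget by up to 255
items, which is acceptable as long as a single item is cheap compared to
the budget.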
