[Qemu-devel] [PATCH] fix migration with large mem

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH] fix migration with large mem
@ 2010-04-13  9:33 Izik Eidus
  2010-05-09 12:29 ` [Qemu-devel] " Izik Eidus
  2010-05-10 20:24 ` [Qemu-devel] " Anthony Liguori
  0 siblings, 2 replies; 7+ messages in thread
From: Izik Eidus @ 2010-04-13  9:33 UTC (permalink / raw)
  To: qemu-devel

From f881b371e08760a67bf1f5b992a586c3de600f7a Mon Sep 17 00:00:00 2001
From: Izik Eidus <ieidus@redhat.com>
Date: Tue, 13 Apr 2010 12:24:57 +0300
Subject: [PATCH] fix migration with large mem

In cases of guests with large mem that have pages
that all their bytes content are the same, we will
spend alot of time reading the memory from the guest
(is_dup_page())

It is happening beacuse ram_save_live() function have
limit of how much we can send to the dest but not how
much we read from it, and in cases we have many is_dup_page()
hits, we might read huge amount of data without updating important
stuff like the timers...

The guest lose all its repsonsibility and have many softlock ups
inside itself.

this patch add limit on the size we can read from the guest each
iteration.

    Thanks.

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 arch_init.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index cfc03ea..e27b1a0 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -88,6 +88,8 @@ const uint32_t arch_type = QEMU_ARCH;
 #define RAM_SAVE_FLAG_PAGE	0x08
 #define RAM_SAVE_FLAG_EOS	0x10
 
+#define MAX_SAVE_BLOCK_READ 10 * 1024 * 1024
+
 static int is_dup_page(uint8_t *page, uint8_t ch)
 {
     uint32_t val = ch << 24 | ch << 16 | ch << 8 | ch;
@@ -175,6 +177,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
     uint64_t bytes_transferred_last;
     double bwidth = 0;
     uint64_t expected_time = 0;
+    int data_read = 0;
 
     if (stage < 0) {
         cpu_physical_memory_set_dirty_tracking(0);
@@ -205,10 +208,11 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
     bytes_transferred_last = bytes_transferred;
     bwidth = qemu_get_clock_ns(rt_clock);
 
-    while (!qemu_file_rate_limit(f)) {
+    while (!qemu_file_rate_limit(f) && data_read < MAX_SAVE_BLOCK_READ) {
         int ret;
 
         ret = ram_save_block(f);
+        data_read += ret * TARGET_PAGE_SIZE;
         bytes_transferred += ret * TARGET_PAGE_SIZE;
         if (ret == 0) { /* no more blocks */
             break;
-- 
1.6.6.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [Qemu-devel] Re: [PATCH] fix migration with large mem
  2010-04-13  9:33 [Qemu-devel] [PATCH] fix migration with large mem Izik Eidus
@ 2010-05-09 12:29 ` Izik Eidus
  2010-05-09 13:57   ` Paolo Bonzini
  2010-05-10 20:24 ` [Qemu-devel] " Anthony Liguori
  1 sibling, 1 reply; 7+ messages in thread
From: Izik Eidus @ 2010-05-09 12:29 UTC (permalink / raw)
  To: Izik Eidus; +Cc: qemu-devel

On Tue, 13 Apr 2010 12:33:18 +0300
Izik Eidus <ieidus@redhat.com> wrote:

> From f881b371e08760a67bf1f5b992a586c3de600f7a Mon Sep 17 00:00:00 2001
> From: Izik Eidus <ieidus@redhat.com>
> Date: Tue, 13 Apr 2010 12:24:57 +0300
> Subject: [PATCH] fix migration with large mem

Anyone ???


> 
> In cases of guests with large mem that have pages
> that all their bytes content are the same, we will
> spend alot of time reading the memory from the guest
> (is_dup_page())
> 
> It is happening beacuse ram_save_live() function have
> limit of how much we can send to the dest but not how
> much we read from it, and in cases we have many is_dup_page()
> hits, we might read huge amount of data without updating important
> stuff like the timers...
> 
> The guest lose all its repsonsibility and have many softlock ups
> inside itself.
> 
> this patch add limit on the size we can read from the guest each
> iteration.
> 
>     Thanks.
> 
> Signed-off-by: Izik Eidus <ieidus@redhat.com>
> ---
>  arch_init.c |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index cfc03ea..e27b1a0 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -88,6 +88,8 @@ const uint32_t arch_type = QEMU_ARCH;
>  #define RAM_SAVE_FLAG_PAGE	0x08
>  #define RAM_SAVE_FLAG_EOS	0x10
>  
> +#define MAX_SAVE_BLOCK_READ 10 * 1024 * 1024
> +
>  static int is_dup_page(uint8_t *page, uint8_t ch)
>  {
>      uint32_t val = ch << 24 | ch << 16 | ch << 8 | ch;
> @@ -175,6 +177,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>      uint64_t bytes_transferred_last;
>      double bwidth = 0;
>      uint64_t expected_time = 0;
> +    int data_read = 0;
>  
>      if (stage < 0) {
>          cpu_physical_memory_set_dirty_tracking(0);
> @@ -205,10 +208,11 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>      bytes_transferred_last = bytes_transferred;
>      bwidth = qemu_get_clock_ns(rt_clock);
>  
> -    while (!qemu_file_rate_limit(f)) {
> +    while (!qemu_file_rate_limit(f) && data_read < MAX_SAVE_BLOCK_READ) {
>          int ret;
>  
>          ret = ram_save_block(f);
> +        data_read += ret * TARGET_PAGE_SIZE;
>          bytes_transferred += ret * TARGET_PAGE_SIZE;
>          if (ret == 0) { /* no more blocks */
>              break;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Qemu-devel] Re: [PATCH] fix migration with large mem
  2010-05-09 12:29 ` [Qemu-devel] " Izik Eidus
@ 2010-05-09 13:57   ` Paolo Bonzini
  2010-05-09 14:01     ` Avi Kivity
  0 siblings, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2010-05-09 13:57 UTC (permalink / raw)
  To: Izik Eidus; +Cc: Marcelo Tosatti, qemu-devel, Avi Kivity

On 05/09/2010 02:29 PM, Izik Eidus wrote:
> On Tue, 13 Apr 2010 12:33:18 +0300
> Izik Eidus<ieidus@redhat.com>  wrote:
>
>>  From f881b371e08760a67bf1f5b992a586c3de600f7a Mon Sep 17 00:00:00 2001
>> From: Izik Eidus<ieidus@redhat.com>
>> Date: Tue, 13 Apr 2010 12:24:57 +0300
>> Subject: [PATCH] fix migration with large mem
>
> Anyone ???

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Maybe this could be merged via uq/master.

Paolo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Qemu-devel] Re: [PATCH] fix migration with large mem
  2010-05-09 13:57   ` Paolo Bonzini
@ 2010-05-09 14:01     ` Avi Kivity
  0 siblings, 0 replies; 7+ messages in thread
From: Avi Kivity @ 2010-05-09 14:01 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Izik Eidus, qemu-devel, Marcelo Tosatti

On 05/09/2010 04:57 PM, Paolo Bonzini wrote:
> On 05/09/2010 02:29 PM, Izik Eidus wrote:
>> On Tue, 13 Apr 2010 12:33:18 +0300
>> Izik Eidus<ieidus@redhat.com>  wrote:
>>
>>>  From f881b371e08760a67bf1f5b992a586c3de600f7a Mon Sep 17 00:00:00 2001
>>> From: Izik Eidus<ieidus@redhat.com>
>>> Date: Tue, 13 Apr 2010 12:24:57 +0300
>>> Subject: [PATCH] fix migration with large mem
>>
>> Anyone ???
>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
>
> Maybe this could be merged via uq/master.

It's not intrinsically kvm related.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [PATCH] fix migration with large mem
  2010-04-13  9:33 [Qemu-devel] [PATCH] fix migration with large mem Izik Eidus
  2010-05-09 12:29 ` [Qemu-devel] " Izik Eidus
@ 2010-05-10 20:24 ` Anthony Liguori
  2010-05-10 21:45   ` Izik Eidus
  1 sibling, 1 reply; 7+ messages in thread
From: Anthony Liguori @ 2010-05-10 20:24 UTC (permalink / raw)
  To: Izik Eidus; +Cc: qemu-devel

On 04/13/2010 04:33 AM, Izik Eidus wrote:
>  From f881b371e08760a67bf1f5b992a586c3de600f7a Mon Sep 17 00:00:00 2001
> From: Izik Eidus<ieidus@redhat.com>
> Date: Tue, 13 Apr 2010 12:24:57 +0300
> Subject: [PATCH] fix migration with large mem
>
> In cases of guests with large mem that have pages
> that all their bytes content are the same, we will
> spend alot of time reading the memory from the guest
> (is_dup_page())
>
> It is happening beacuse ram_save_live() function have
> limit of how much we can send to the dest but not how
> much we read from it, and in cases we have many is_dup_page()
> hits, we might read huge amount of data without updating important
> stuff like the timers...
>
> The guest lose all its repsonsibility and have many softlock ups
> inside itself.
>
> this patch add limit on the size we can read from the guest each
> iteration.
>
>      Thanks.
>
> Signed-off-by: Izik Eidus<ieidus@redhat.com>
> ---
>   arch_init.c |    6 +++++-
>   1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index cfc03ea..e27b1a0 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -88,6 +88,8 @@ const uint32_t arch_type = QEMU_ARCH;
>   #define RAM_SAVE_FLAG_PAGE	0x08
>   #define RAM_SAVE_FLAG_EOS	0x10
>
> +#define MAX_SAVE_BLOCK_READ 10 * 1024 * 1024
> +
>   static int is_dup_page(uint8_t *page, uint8_t ch)
>   {
>       uint32_t val = ch<<  24 | ch<<  16 | ch<<  8 | ch;
> @@ -175,6 +177,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>       uint64_t bytes_transferred_last;
>       double bwidth = 0;
>       uint64_t expected_time = 0;
> +    int data_read = 0;
>
>       if (stage<  0) {
>           cpu_physical_memory_set_dirty_tracking(0);
> @@ -205,10 +208,11 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>       bytes_transferred_last = bytes_transferred;
>       bwidth = qemu_get_clock_ns(rt_clock);
>
> -    while (!qemu_file_rate_limit(f)) {
> +    while (!qemu_file_rate_limit(f)&&  data_read<  MAX_SAVE_BLOCK_READ) {
>    

The effect of this patch is that we'll never send more than 10mb/s 
during live migration?  If so, it's totally wrong as a fix to the problem.

It would be better to account the deduplicated pages as part of the rate 
limiting calculations.

Regards,

Anthony Liguori

>           int ret;
>
>           ret = ram_save_block(f);
> +        data_read += ret * TARGET_PAGE_SIZE;
>           bytes_transferred += ret * TARGET_PAGE_SIZE;
>           if (ret == 0) { /* no more blocks */
>               break;
>    

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [PATCH] fix migration with large mem
  2010-05-10 20:24 ` [Qemu-devel] " Anthony Liguori
@ 2010-05-10 21:45   ` Izik Eidus
  2010-05-10 21:55     ` Anthony Liguori
  0 siblings, 1 reply; 7+ messages in thread
From: Izik Eidus @ 2010-05-10 21:45 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel

On Mon, 10 May 2010 15:24:20 -0500
Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 04/13/2010 04:33 AM, Izik Eidus wrote:
> >  From f881b371e08760a67bf1f5b992a586c3de600f7a Mon Sep 17 00:00:00
> > 2001 From: Izik Eidus<ieidus@redhat.com>
> > Date: Tue, 13 Apr 2010 12:24:57 +0300
> > Subject: [PATCH] fix migration with large mem
> >
> > In cases of guests with large mem that have pages
> > that all their bytes content are the same, we will
> > spend alot of time reading the memory from the guest
> > (is_dup_page())
> >
> > It is happening beacuse ram_save_live() function have
> > limit of how much we can send to the dest but not how
> > much we read from it, and in cases we have many is_dup_page()
> > hits, we might read huge amount of data without updating important
> > stuff like the timers...
> >
> > The guest lose all its repsonsibility and have many softlock ups
> > inside itself.
> >
> > this patch add limit on the size we can read from the guest each
> > iteration.
> >
> >      Thanks.
> >
> > Signed-off-by: Izik Eidus<ieidus@redhat.com>
> > ---
> >   arch_init.c |    6 +++++-
> >   1 files changed, 5 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch_init.c b/arch_init.c
> > index cfc03ea..e27b1a0 100644
> > --- a/arch_init.c
> > +++ b/arch_init.c
> > @@ -88,6 +88,8 @@ const uint32_t arch_type = QEMU_ARCH;
> >   #define RAM_SAVE_FLAG_PAGE	0x08
> >   #define RAM_SAVE_FLAG_EOS	0x10
> >
> > +#define MAX_SAVE_BLOCK_READ 10 * 1024 * 1024
> > +
> >   static int is_dup_page(uint8_t *page, uint8_t ch)
> >   {
> >       uint32_t val = ch<<  24 | ch<<  16 | ch<<  8 | ch;
> > @@ -175,6 +177,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f,
> > int stage, void *opaque) uint64_t bytes_transferred_last;
> >       double bwidth = 0;
> >       uint64_t expected_time = 0;
> > +    int data_read = 0;
> >
> >       if (stage<  0) {
> >           cpu_physical_memory_set_dirty_tracking(0);
> > @@ -205,10 +208,11 @@ int ram_save_live(Monitor *mon, QEMUFile *f,
> > int stage, void *opaque) bytes_transferred_last = bytes_transferred;
> >       bwidth = qemu_get_clock_ns(rt_clock);
> >
> > -    while (!qemu_file_rate_limit(f)) {
> > +    while (!qemu_file_rate_limit(f)&&  data_read<
> > MAX_SAVE_BLOCK_READ) { 
> 
> The effect of this patch is that we'll never send more than 10mb/s 
> during live migration?  If so, it's totally wrong as a fix to the
> problem.

It is 100mb/s... (if I remember correct the migration code will run
this thing 10 times for each iteration)

My feeling is that limit it with the network 32mb/s limit is too low,
reading memory for 100mb/s is not such a problem as long as we don`t
read gigas out of memory every sec...

(Still we want to optimize the billion of zeros cases of windows guests)

Anyway if the above does not make sense to you, I will just change it
into what you suggested

So ?

> 
> It would be better to account the deduplicated pages as part of the
> rate limiting calculations.
> 
> Regards,
> 
> Anthony Liguori
> 
> >           int ret;
> >
> >           ret = ram_save_block(f);
> > +        data_read += ret * TARGET_PAGE_SIZE;
> >           bytes_transferred += ret * TARGET_PAGE_SIZE;
> >           if (ret == 0) { /* no more blocks */
> >               break;
> >    
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [PATCH] fix migration with large mem
  2010-05-10 21:45   ` Izik Eidus
@ 2010-05-10 21:55     ` Anthony Liguori
  0 siblings, 0 replies; 7+ messages in thread
From: Anthony Liguori @ 2010-05-10 21:55 UTC (permalink / raw)
  To: Izik Eidus; +Cc: qemu-devel

On 05/10/2010 04:45 PM, Izik Eidus wrote:
> On Mon, 10 May 2010 15:24:20 -0500
> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>
>    
>> On 04/13/2010 04:33 AM, Izik Eidus wrote:
>>      
>>>    From f881b371e08760a67bf1f5b992a586c3de600f7a Mon Sep 17 00:00:00
>>> 2001 From: Izik Eidus<ieidus@redhat.com>
>>> Date: Tue, 13 Apr 2010 12:24:57 +0300
>>> Subject: [PATCH] fix migration with large mem
>>>
>>> In cases of guests with large mem that have pages
>>> that all their bytes content are the same, we will
>>> spend alot of time reading the memory from the guest
>>> (is_dup_page())
>>>
>>> It is happening beacuse ram_save_live() function have
>>> limit of how much we can send to the dest but not how
>>> much we read from it, and in cases we have many is_dup_page()
>>> hits, we might read huge amount of data without updating important
>>> stuff like the timers...
>>>
>>> The guest lose all its repsonsibility and have many softlock ups
>>> inside itself.
>>>
>>> this patch add limit on the size we can read from the guest each
>>> iteration.
>>>
>>>       Thanks.
>>>
>>> Signed-off-by: Izik Eidus<ieidus@redhat.com>
>>> ---
>>>    arch_init.c |    6 +++++-
>>>    1 files changed, 5 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/arch_init.c b/arch_init.c
>>> index cfc03ea..e27b1a0 100644
>>> --- a/arch_init.c
>>> +++ b/arch_init.c
>>> @@ -88,6 +88,8 @@ const uint32_t arch_type = QEMU_ARCH;
>>>    #define RAM_SAVE_FLAG_PAGE	0x08
>>>    #define RAM_SAVE_FLAG_EOS	0x10
>>>
>>> +#define MAX_SAVE_BLOCK_READ 10 * 1024 * 1024
>>> +
>>>    static int is_dup_page(uint8_t *page, uint8_t ch)
>>>    {
>>>        uint32_t val = ch<<   24 | ch<<   16 | ch<<   8 | ch;
>>> @@ -175,6 +177,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f,
>>> int stage, void *opaque) uint64_t bytes_transferred_last;
>>>        double bwidth = 0;
>>>        uint64_t expected_time = 0;
>>> +    int data_read = 0;
>>>
>>>        if (stage<   0) {
>>>            cpu_physical_memory_set_dirty_tracking(0);
>>> @@ -205,10 +208,11 @@ int ram_save_live(Monitor *mon, QEMUFile *f,
>>> int stage, void *opaque) bytes_transferred_last = bytes_transferred;
>>>        bwidth = qemu_get_clock_ns(rt_clock);
>>>
>>> -    while (!qemu_file_rate_limit(f)) {
>>> +    while (!qemu_file_rate_limit(f)&&   data_read<
>>> MAX_SAVE_BLOCK_READ) {
>>>        
>> The effect of this patch is that we'll never send more than 10mb/s
>> during live migration?  If so, it's totally wrong as a fix to the
>> problem.
>>      
> It is 100mb/s... (if I remember correct the migration code will run
> this thing 10 times for each iteration)
>    

No, it only runs it once.

> My feeling is that limit it with the network 32mb/s limit is too low,
> reading memory for 100mb/s is not such a problem as long as we don`t
> read gigas out of memory every sec...
>    

You've limited bandwidth to 10 mb/sec.  Even if it was 100 mb/sec a 
fixed limit is wrong.  On a 10gbit (or 40gbit) link, 100 mb/sec is not 
enough.

> (Still we want to optimize the billion of zeros cases of windows guests)
>
> Anyway if the above does not make sense to you, I will just change it
> into what you suggested
>
> So ?
>    

That would work for me.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-05-10 21:55 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-13  9:33 [Qemu-devel] [PATCH] fix migration with large mem Izik Eidus
2010-05-09 12:29 ` [Qemu-devel] " Izik Eidus
2010-05-09 13:57   ` Paolo Bonzini
2010-05-09 14:01     ` Avi Kivity
2010-05-10 20:24 ` [Qemu-devel] " Anthony Liguori
2010-05-10 21:45   ` Izik Eidus
2010-05-10 21:55     ` Anthony Liguori

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).