* [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
From: Uri Lublin @ 2009-05-19 11:09 UTC
To: qemu-devel
Currently the live part (section QEMU_VM_SECTION_PART) of
ram_save_live has only one convergence rule: migration converges
when the number of dirty pages drops below a threshold.

When the guest keeps dirtying more pages than that threshold (e.g.
playing a movie, copying files, sending/receiving many packets),
it may take a very long time to converge according to this rule.

This patch (re)introduces a no-progress convergence rule, which
limits the number of times the migration process makes no progress
(or even regresses) with regard to the number of dirty pages.
No-progress means that more pages got dirty than were transferred
to the destination during the last iteration.
This rule applies only after the first round (in which most
memory pages are being transferred).

This patch also enlarges the number-of-dirty-pages threshold (of
the first convergence rule) from 10 to 50 pages.
Signed-off-by: Uri Lublin <uril@redhat.com>
---
vl.c | 25 +++++++++++++++++++++++--
1 files changed, 23 insertions(+), 2 deletions(-)
diff --git a/vl.c b/vl.c
index 40b1d8b..5f145a0 100644
--- a/vl.c
+++ b/vl.c
@@ -3181,6 +3181,11 @@ static void ram_decompress_close(RamDecompressState *s)
#define RAM_SAVE_FLAG_PAGE 0x08
#define RAM_SAVE_FLAG_EOS 0x10
+static ram_addr_t ram_save_threshold = 50;
+static unsigned ram_save_no_progress_max = 10;
+static unsigned ram_save_no_progress = 0;
+static ram_addr_t ram_save_rounds = 0;
+
static int is_dup_page(uint8_t *page, uint8_t ch)
{
uint32_t val = ch << 24 | ch << 16 | ch << 8 | ch;
@@ -3225,12 +3230,13 @@ static int ram_save_block(QEMUFile *f)
}
addr += TARGET_PAGE_SIZE;
current_addr = (saved_addr + addr) % last_ram_offset;
+ if (current_addr == 0)
+ ram_save_rounds++;
}
return found;
}
-static ram_addr_t ram_save_threshold = 10;
static ram_addr_t ram_save_remaining(void)
{
@@ -3245,11 +3251,26 @@ static ram_addr_t ram_save_remaining(void)
return count;
}
+static int ram_save_is_converged(void)
+{
+ const ram_addr_t count = ram_save_remaining();
+ static ram_addr_t last_count = 0;
+
+ if ((count > last_count) && (ram_save_rounds > 0))
+ ram_save_no_progress++;
+ last_count = count;
+
+ return ((count < ram_save_threshold) ||
+ (ram_save_no_progress > ram_save_no_progress_max));
+}
+
static int ram_save_live(QEMUFile *f, int stage, void *opaque)
{
ram_addr_t addr;
if (stage == 1) {
+ ram_save_rounds = 0;
+ ram_save_no_progress = 0;
/* Make sure all dirty bits are set */
for (addr = 0; addr < last_ram_offset; addr += TARGET_PAGE_SIZE) {
if (!cpu_physical_memory_get_dirty(addr, MIGRATION_DIRTY_FLAG))
@@ -3281,7 +3302,7 @@ static int ram_save_live(QEMUFile *f, int stage, void *opaque)
qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
- return (stage == 2) && (ram_save_remaining() < ram_save_threshold);
+ return (stage == 2) && ram_save_is_converged();
}
static int ram_load_dead(QEMUFile *f, void *opaque)
--
1.6.0.6
* Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
From: Anthony Liguori @ 2009-05-19 13:00 UTC
To: Uri Lublin; +Cc: qemu-devel
Uri Lublin wrote:
> [...]
The right place to do this is in a management tool. An arbitrary
convergence rule of 50 can do more damage than good.

For some set of users, it's better that live migration fail than that
it cause an arbitrarily long pause in the guest, which can result in
dropped TCP connections, soft lockups, and other badness.
A management tool can force convergence by issuing a "stop" command in
the monitor. I suspect a management tool cares more about wall-clock
time than the number of iterations too, so a valid metric would be
something along the lines of: if not converged after N seconds, issue
the stop monitor command, where N is calculated from available network
bandwidth and guest memory size.
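
(A minimal sketch of that policy on the management-tool side, assuming
the tool can reach the QEMU monitor over a UNIX socket; the socket
path, the memory/bandwidth figures and the safety factor below are
made up for illustration:)

  /* Stop the guest if migration has not converged after N seconds,
   * with N derived from guest memory size and available bandwidth. */
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/socket.h>
  #include <sys/un.h>

  #define MAX_FULL_PASSES 3  /* allow time for ~3 full copies of RAM */

  int main(void)
  {
      unsigned long long guest_mem = 4ULL << 30;    /* 4 GB guest */
      unsigned long long bandwidth = 100ULL << 20;  /* ~100 MB/s */
      unsigned int n = (guest_mem / bandwidth) * MAX_FULL_PASSES;

      sleep(n);  /* a real tool would poll "info migrate" instead */

      struct sockaddr_un addr = { .sun_family = AF_UNIX };
      strncpy(addr.sun_path, "/tmp/qemu-mon", sizeof(addr.sun_path) - 1);
      int fd = socket(AF_UNIX, SOCK_STREAM, 0);
      if (fd < 0 || connect(fd, (struct sockaddr *)&addr,
                            sizeof(addr)) < 0) {
          perror("monitor connect");
          return 1;
      }
      write(fd, "stop\n", 5);  /* pause guest: migration now converges */
      close(fd);
      return 0;
  }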
Regards,
Anthony Liguori
* Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
From: Glauber Costa @ 2009-05-19 14:41 UTC
To: Anthony Liguori; +Cc: Uri Lublin, qemu-devel
On Tue, May 19, 2009 at 08:00:48AM -0500, Anthony Liguori wrote:
> Uri Lublin wrote:
> > [...]
>
> The right place to do this is in a management tool. An arbitrary
> convergence rule of 50 can do more damage than good.
>
> For some set of users, it's better that live migration fail than that
> it cause an arbitrarily long pause in the guest, which can result in
> dropped TCP connections, soft lockups, and other badness.
>
> A management tool can force convergence by issuing a "stop" command in
> the monitor. I suspect a management tool cares more about wall-clock
> time than the number of iterations too, so a valid metric would be
> something along the lines of: if not converged after N seconds, issue
> the stop monitor command, where N is calculated from available network
> bandwidth and guest memory size.
Another possibility is for the management tool to increase the
bandwidth limit for short periods if it perceives that no progress is
being made.

Anyhow, I completely agree that we should not introduce this in qemu.

However, maybe we could augment our "info migrate" to provide more info
about the internal state of migration, so the mgmt tool can make a more
informed decision?
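
(Hypothetically, an augmented "info migrate" might look something like
the transcript below; these fields are invented for illustration and
are not an existing interface:)

  (qemu) info migrate
  Migration status: active
  transferred ram: 524288 kbytes
  remaining ram: 131072 kbytes
  total ram: 4194304 kbytes
  iteration: 7
  pages dirtied during last iteration: 40960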
* Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
From: Dor Laor @ 2009-05-19 14:59 UTC
To: Glauber Costa; +Cc: Uri Lublin, qemu-devel
Glauber Costa wrote:
> On Tue, May 19, 2009 at 08:00:48AM -0500, Anthony Liguori wrote:
>
>> Uri Lublin wrote:
>>> [...]
>>
>> The right place to do this is in a management tool. An arbitrary
>> convergence rule of 50 can do more damage than good.
>>
>> For some set of users, it's better that live migration fail than that
>> it cause an arbitrarily long pause in the guest, which can result in
>> dropped TCP connections, soft lockups, and other badness.
>>
>> A management tool can force convergence by issuing a "stop" command in
>> the monitor. I suspect a management tool cares more about wall-clock
>> time than the number of iterations too, so a valid metric would be
>> something along the lines of: if not converged after N seconds, issue
>> the stop monitor command, where N is calculated from available network
>> bandwidth and guest memory size.
>>
> Another possibility is for the management tool to increase the
> bandwidth limit for short periods if it perceives that no progress is
> being made.
>
> Anyhow, I completely agree that we should not introduce this in qemu.
>
> However, maybe we could augment our "info migrate" to provide more
> info about the internal state of migration, so the mgmt tool can make
> a more informed decision?
>
>
The problem is that if migration is not progressing because the guest
is dirtying pages faster than the migration protocol can send them,
then we just waste time and CPU.
The minimum is to notify the monitor interface in order to let the
mgmt daemon trap it.
We can easily see this issue while running iperf in the guest or in
any other high-load/dirty-pages scenario.
We can also make it configurable using the monitor migrate command.
For example:
migrate -d -no_progress -threshold=x tcp:....
* Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
From: Glauber Costa @ 2009-05-19 15:09 UTC
To: Dor Laor; +Cc: Uri Lublin, qemu-devel
On Tue, May 19, 2009 at 05:59:14PM +0300, Dor Laor wrote:
> Glauber Costa wrote:
> > [...]
>
> The problem is that if migration is not progressing because the guest
> is dirtying pages faster than the migration protocol can send them,
> then we just waste time and CPU.
> The minimum is to notify the monitor interface in order to let the
> mgmt daemon trap it.
> We can easily see this issue while running iperf in the guest or in
> any other high-load/dirty-pages scenario.
I know, I've seen it myself. What I believe and insist on is only that
qemu does not really have to possess the knowledge to deal with it.
Providing migration stats in "info migrate" seems to me the better
thing to do, rather than one single one-size-fits-all notification. The
mgmt tool can then take the appropriate action depending on the
scenario it has in mind.
> We can also make it configurable using the monitor migrate command. For
> example:
> migrate -d -no_progress -threshold=x tcp:....
It can be done, but it fits better as a different monitor command.

Anthony, do you have any strong opinions here, or is this scheme
acceptable?
* Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
From: Anthony Liguori @ 2009-05-19 18:09 UTC
To: Glauber Costa; +Cc: Uri Lublin, qemu-devel
Glauber Costa wrote:
> Another possibility is for the management tool to increase the
> bandwidth limit for short periods if it perceives that no progress is
> being made.
>
> Anyhow, I completely agree that we should not introduce this in qemu.
>
> However, maybe we could augment our "info migrate" to provide more
> info about the internal state of migration, so the mgmt tool can make
> a more informed decision?
>
Yes, I've also suggested this before. I'm willing to expose just about
any metric that makes sense. We need to be careful about not exposing
implementation details, but things like iteration count, last working
set size, average working set size, etc. should all be relatively stable
metrics even if the implementation changes.
Regards,
Anthony Liguori
* Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
From: Anthony Liguori @ 2009-05-19 18:15 UTC
To: dlaor; +Cc: Glauber Costa, Uri Lublin, qemu-devel
Dor Laor wrote:
> The problem is that if migration is not progressing because the guest
> is dirtying pages faster than the migration protocol can send them,
> then we just waste time and CPU.
> The minimum is to notify the monitor interface in order to let the
> mgmt daemon trap it.
> We can easily see this issue while running iperf in the guest or in
> any other high-load/dirty-pages scenario.
The problem is, what's the metric for determining the guest isn't
progressing? A raw iteration count is not a valid metric. It may be
expected that the migration take 50 iterations.
The management tool knows the guest isn't progressing when it decides
that a guest isn't progressing :-)
> We can also make it configurable using the monitor migrate command.
> For example:
> migrate -d -no_progress -threshold=x tcp:....
Threshold is really a bad metric to use. You have no idea how much data
has been passed in each iteration. If you only needed one more
iteration, then stopping the migration short was a really bad idea.
The only thing that this does is give a false sense of security.
Management tools have to deal with forcing migration convergence based
on policies. If a management tool isn't doing this today, it's broken IMHO.
Basically, threshold introduces a regression. If you run iperf and
migrate a guest with a very large memory size, after migration, you'll
get soft lockups because the guest hasn't been running for 10 seconds.
This is bad.
Regards,
Anthony Liguori
* Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
From: Anthony Liguori @ 2009-05-19 18:17 UTC
To: Glauber Costa; +Cc: Dor Laor, Uri Lublin, qemu-devel
Glauber Costa wrote:
> On Tue, May 19, 2009 at 05:59:14PM +0300, Dor Laor wrote:
>
>
>> We can also make it configurable using the monitor migrate command. For
>> example:
>> migrate -d -no_progress -threshold=x tcp:....
>>
> It can be done, but it fits better as a different monitor command.
>
> Anthony, do you have any strong opinions here, or is this scheme
> acceptable?
>
Threshold is a bad metric. There's no way to choose a right number. If
we were going to have a means to support metrics-based forced
convergence (and I really think this belongs in libvirt) I'd rather see
something based on bandwidth or wall clock time.
Let me put it this way, why 50? What were the guidelines for choosing
that number and how would you explain what number a user should choose?
Regards,
Anthony Liguori
* Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
From: Anthony Liguori @ 2009-05-19 18:19 UTC
To: Glauber Costa; +Cc: Dor Laor, Uri Lublin, qemu-devel
Glauber Costa wrote:
>> We can also make it configurable using the monitor migrate command. For
>> example:
>> migrate -d -no_progress -threshold=x tcp:....
>>
> It can be done, but it fits better as a different monitor command.
>
> Anthony, do you have any strong opinions here, or is this scheme
> acceptable?
>
Oh, and to add to my last post: the right action to take after a
certain time period is failing the migration. Forced convergence is a
bug.
Regards,
Anthony Liguori
* Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
From: Uri Lublin @ 2009-05-20 16:56 UTC
To: Anthony Liguori; +Cc: Glauber Costa, Dor Laor, qemu-devel
On 05/19/2009 09:17 PM, Anthony Liguori wrote:
> [...]
>
> Threshold is a bad metric. There's no way to choose a right number. If
> we were going to have a means to support metrics-based forced
> convergence (and I really think this belongs in libvirt) I'd rather see
> something based on bandwidth or wall clock time.
>
> Let me put it this way, why 50? What were the guidelines for choosing
> that number and how would you explain what number a user should choose?
I've changed the threshold of the first convergence rule to 50, from
10. Why 50 rather than 10? For this rule the threshold (number of
dirty pages) and the number of bytes left to transfer are equivalent:
50 pages is about 200K (50 x 4K pages), which can still be sent
quickly.
I've added debug messages and noticed we never hit a number smaller
than 10 (excluding 0). The truth is there were very few runs with
fewer than 50 dirty pages too. I don't mind leaving it at 10 (it
should be configurable too).

For the second convergence rule I've set the limit to 10 no-progress
iterations, which seems much larger than what I've needed (in all the
runs I've made, a limit of 2-4 no-progress iterations was good enough,
as the behavior seems to become repetitive later), but I've enlarged
it "just in case". No real research work was done here.

Note that a no-progress iteration depends on both network bandwidth
and guest actions.
Regards,
Uri.
* Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
From: Daniel P. Berrange @ 2009-05-20 17:15 UTC
To: Glauber Costa; +Cc: Uri Lublin, qemu-devel
On Tue, May 19, 2009 at 10:41:01AM -0400, Glauber Costa wrote:
> On Tue, May 19, 2009 at 08:00:48AM -0500, Anthony Liguori wrote:
> > [...]
> Another possibility is for the management tool to increase the
> bandwidth limit for short periods if it perceives that no progress is
> being made.
>
> Anyhow, I completely agree that we should not introduce this in qemu.
>
> However, maybe we could augment our "info migrate" to provide more
> info about the internal state of migration, so the mgmt tool can make
> a more informed decision?
Yes, I think an 'info migration' command is necessary regardless of
whether convergence is progressing successfully or not. If a migration
takes 60 seconds total in the normal case, it is useful for the mgmt
tool to get some indication of how far it has progressed, eg, report
pages sent, pages remaining, and total pages. NB, sent + remaining !=
total.
The mgmt tool knows how much wall clock time has elapsed, and can
present progress info to the admin. So after X seconds have elapsed
the admin can see whether it has nearly completed or is stuck, and
make a decision; or it can let the mgmt app apply policies of its own,
cancelling the migration, or pausing the guest to let it complete
non-live.
If you want to put a policy into QEMU too, go ahead, as long as it's
optional so a mgmt app can have full control if desired.
Regards,
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
* Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
From: Uri Lublin @ 2009-05-20 17:17 UTC
To: Anthony Liguori; +Cc: Glauber Costa, dlaor, qemu-devel
On 05/19/2009 09:15 PM, Anthony Liguori wrote:
> Dor Laor wrote:
>> The problem is that if migration is not progressing because the
>> guest is dirtying pages faster than the migration protocol can send
>> them, then we just waste time and CPU.
>> The minimum is to notify the monitor interface in order to let the
>> mgmt daemon trap it.
>> We can easily see this issue while running iperf in the guest or in
>> any other high-load/dirty-pages scenario.
>
> The problem is, what's the metric for determining the guest isn't
> progressing? A raw iteration count is not a valid metric. It may be
> expected that the migration take 50 iterations.
We've defined "no-progress" as a memory-transfer iteration in which
the number of pages that got dirty is larger than the number of pages
transferred. After such an iteration we have more data left to
transfer than we had before it.
Note that we did not limit the number of iterations (yet); we want to
limit the number of no-progress iterations. Migrations with many such
iterations just waste resources (cpu, network, etc).
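
(In code terms the rule amounts to the sketch below, which is
equivalent to the ram_save_is_converged() check in the patch; the
helper name is mine, not the patch's:)

  /* remaining_now = remaining_before - pages_sent + pages_dirtied
   * (roughly), so the remaining count grows exactly when the guest
   * dirtied more pages than we managed to send this iteration. */
  static int iteration_made_progress(ram_addr_t remaining_before,
                                     ram_addr_t remaining_now)
  {
      return remaining_now <= remaining_before;
  }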
>
> The management tool knows the guest isn't progressing when it decides
> that a guest isn't progressing :-)
Currently the management tool only knows the migration is still active.
>
>> We can also make it configurable using the monitor migrate command.
>> For example:
>> migrate -d -no_progress -threshold=x tcp:....
>
> Threshold is really a bad metric to use. You have no idea how much data
> has been passed in each iteration. If you only needed one more
> iteration, then stopping the migration short was a really bad idea.
You can never know there is only one more iteration needed, no matter
what metric you use.
Again, this threshold limits the number of no-progress iterations.
We can extend this rule (or add another flag/command) to raise the
bandwidth limit upon a no-progress iteration.
>
> The only thing that this does is give a false sense of security.
> Management tools have to deal with forcing migration convergence based
> on policies. If a management tool isn't doing this today, it's broken IMHO.
I agree migration convergence rules should be based on policies.
What Dor is suggesting is that the management tool do that by passing
parameters to the migrate command (or using other migrate_X monitor
commands).
I'm not sure management tools can have good policies for this today.
The only information they have is how much time has passed since the
migration started. The only actions they can take are stopping the
guest or cancelling the migration.
>
> Basically, threshold introduces a regression. If you run iperf and
> migrate a guest with a very large memory size, after migration, you'll
> get soft lockups because the guest hasn't been running for 10 seconds.
> This is bad.
Just keeping on resending pages that are constantly changing is bad
too, probably worse.
Regards,
Uri.
* Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
From: Uri Lublin @ 2009-05-20 17:25 UTC
To: Anthony Liguori; +Cc: Glauber Costa, Dor Laor, qemu-devel
On 05/19/2009 09:09 PM, Anthony Liguori wrote:
> Glauber Costa wrote:
>> Another possibility is for the management tool to increase the
>> bandwidth limit for short periods if it perceives that no progress
>> is being made.
>>
>> Anyhow, I completely agree that we should not introduce this in qemu.
>>
>> However, maybe we could augment our "info migrate" to provide more
>> info about the internal state of migration, so the mgmt tool can
>> make a more informed decision?
> Yes, I've also suggested this before. I'm willing to expose just about
> any metric that makes sense. We need to be careful about not exposing
> implementation details, but things like iteration count, last working
> set size, average working set size, etc. should all be relatively stable
> metrics even if the implementation changes.
>
I agree we need to provide more information via "info migration".
That's not enough though.
In addition to augmenting "info migration", we need to add more
monitor commands to set/change migration parameters (e.g. the current
bandwidth limit), and change the migration code to act according to
such parameters. These commands should affect the migration when used
both before and during migration.
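
(For the bandwidth limit there is already a migrate_set_speed monitor
command; the ask is for more knobs in that spirit, and for them to
take effect on an already-running migration too. A hypothetical
session, with a made-up destination:)

  (qemu) migrate_set_speed 100m
  (qemu) migrate -d tcp:dst-host:4444
  (qemu) migrate_set_speed 1g
          (the last command should raise the limit of the migration
           already in flight)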
Note that as a management tool most likely calls "info migration"
periodically, it may miss some (current/last) statistics.
Also the first iteration may bias some averages.
How would you recognize "stuck" migrations? By comparing average
working set size and average iteration transfer size? Counting the
number of no-progress iterations? The average "regression" of
no-progress iterations?
The no-progress convergence rule was fairly easy to implement and gave
a pretty good heuristic for recognizing that the migration is stuck.
Regards,
Uri.
* Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
From: Blue Swirl @ 2009-05-20 17:28 UTC
To: Uri Lublin; +Cc: Glauber Costa, Dor Laor, qemu-devel
On 5/20/09, Uri Lublin <uril@redhat.com> wrote:
> [...]
> Note that a no-progress iteration depends on both network bandwidth and
> guest actions.
Instead of freezing the guest or aborting the migration, the guest
could be throttled a bit by giving it less CPU time relative to
migration, or by incurring a small delay for each page dirtying write
access. Maybe this method would find the balance faster.
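
(A rough sketch of the second idea; qemu has no such hook today, and
both the callback and migration_active() below are invented, so this
only illustrates where a per-write delay could sit:)

  /* Hypothetical: called when a page goes from clean to dirty while a
   * migration is in flight.  Briefly delaying the dirtying vcpu
   * lowers the dirty rate until it falls below the transfer rate. */
  static void throttle_dirty_write(void)
  {
      if (migration_active() && ram_save_no_progress > 0) {
          usleep(5);  /* constant needs tuning vs. measured rate */
      }
  }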
* Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
From: Uri Lublin @ 2009-05-20 17:34 UTC
To: Blue Swirl; +Cc: Glauber Costa, Dor Laor, qemu-devel
On 05/20/2009 08:28 PM, Blue Swirl wrote:
> [...]
>
> Instead of freezing the guest or aborting the migration, the guest
> could be throttled a bit by giving it less CPU time relative to
> migration, or by incurring a small delay for each page dirtying write
> access. Maybe this method would find the balance faster.
Raising the bandwidth limit (while migration is active) is one way of
doing that (it gives more CPU to the migration code and less to the
guest). I think it's currently not possible, though.
I don't think I want to "play" with delaying write accesses.