* [Qemu-devel] [PATCH] A small patch to introduce stop conditions to the live migration.
@ 2011-09-14 13:18 Thomas Treutner
2011-09-14 15:22 ` Michael Roth
2011-09-14 15:45 ` Anthony Liguori
0 siblings, 2 replies; 6+ messages in thread
From: Thomas Treutner @ 2011-09-14 13:18 UTC
To: qemu-devel
Currently, it is possible that a live migration never finishes, when the dirty page rate is high compared to the scan/transfer rate. The exact values for MAX_MEMORY_ITERATIONS and MAX_TOTAL_MEMORY_TRANSFER_FACTOR are arguable, but there should be *some* limit to force the final iteration of a live migration that does not converge.
---
arch_init.c | 10 +++++++++-
1 files changed, 9 insertions(+), 1 deletions(-)
diff --git a/arch_init.c b/arch_init.c
index 4486925..57fcb1e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -89,6 +89,9 @@ const uint32_t arch_type = QEMU_ARCH;
#define RAM_SAVE_FLAG_EOS 0x10
#define RAM_SAVE_FLAG_CONTINUE 0x20
+#define MAX_MEMORY_ITERATIONS 10
+#define MAX_TOTAL_MEMORY_TRANSFER_FACTOR 3
+
static int is_dup_page(uint8_t *page, uint8_t ch)
{
uint32_t val = ch << 24 | ch << 16 | ch << 8 | ch;
@@ -107,6 +110,8 @@ static int is_dup_page(uint8_t *page, uint8_t ch)
static RAMBlock *last_block;
static ram_addr_t last_offset;
+static int numberFullMemoryIterations = 0;
+
static int ram_save_block(QEMUFile *f)
{
RAMBlock *block = last_block;
@@ -158,7 +163,10 @@ static int ram_save_block(QEMUFile *f)
offset = 0;
block = QLIST_NEXT(block, next);
if (!block)
+ {
+ numberFullMemoryIterations++;
block = QLIST_FIRST(&ram_list.blocks);
+ }
}
current_addr = block->offset + offset;
@@ -295,7 +303,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
- return (stage == 2) && (expected_time <= migrate_max_downtime());
+ return (stage == 2) && ((expected_time <= migrate_max_downtime() || (numberFullMemoryIterations == MAX_MEMORY_ITERATIONS) || (bytes_transferred > (MAX_TOTAL_MEMORY_TRANSFER_FACTOR*ram_bytes_total()))));
}
static inline void *host_from_stream_offset(QEMUFile *f,
--
1.7.0.4
* Re: [Qemu-devel] [PATCH] A small patch to introduce stop conditions to the live migration.
2011-09-14 13:18 [Qemu-devel] [PATCH] A small patch to introduce stop conditions to the live migration Thomas Treutner
@ 2011-09-14 15:22 ` Michael Roth
2011-09-14 15:36 ` Michael Roth
2011-09-14 15:45 ` Anthony Liguori
1 sibling, 1 reply; 6+ messages in thread
From: Michael Roth @ 2011-09-14 15:22 UTC
To: Thomas Treutner; +Cc: qemu-devel
On 09/14/2011 08:18 AM, Thomas Treutner wrote:
> Currently, it is possible that a live migration never finishes, when the dirty page rate is high compared to the scan/transfer rate. The exact values for MAX_MEMORY_ITERATIONS and MAX_TOTAL_MEMORY_TRANSFER_FACTOR are arguable, but there should be *some* limit to force the final iteration of a live migration that does not converge.
>
> ---
> arch_init.c | 10 +++++++++-
> 1 files changed, 9 insertions(+), 1 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index 4486925..57fcb1e 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -89,6 +89,9 @@ const uint32_t arch_type = QEMU_ARCH;
> #define RAM_SAVE_FLAG_EOS 0x10
> #define RAM_SAVE_FLAG_CONTINUE 0x20
>
> +#define MAX_MEMORY_ITERATIONS 10
> +#define MAX_TOTAL_MEMORY_TRANSFER_FACTOR 3
> +
> static int is_dup_page(uint8_t *page, uint8_t ch)
> {
> uint32_t val = ch << 24 | ch << 16 | ch << 8 | ch;
> @@ -107,6 +110,8 @@ static int is_dup_page(uint8_t *page, uint8_t ch)
> static RAMBlock *last_block;
> static ram_addr_t last_offset;
>
> +static int numberFullMemoryIterations = 0;
> +
> static int ram_save_block(QEMUFile *f)
> {
> RAMBlock *block = last_block;
> @@ -158,7 +163,10 @@ static int ram_save_block(QEMUFile *f)
> offset = 0;
> block = QLIST_NEXT(block, next);
> if (!block)
> + {
> + numberFullMemoryIterations++;
> block = QLIST_FIRST(&ram_list.blocks);
> + }
> }
>
> current_addr = block->offset + offset;
> @@ -295,7 +303,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>
> expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
>
> - return (stage == 2) && (expected_time <= migrate_max_downtime());
> + return (stage == 2) && ((expected_time <= migrate_max_downtime() || (numberFullMemoryIterations == MAX_MEMORY_ITERATIONS) || (bytes_transferred > (MAX_TOTAL_MEMORY_TRANSFER_FACTOR*ram_bytes_total()))));
> }
>
> static inline void *host_from_stream_offset(QEMUFile *f,
To me it seems like a simpler solution is to do something like:
return (stage == 2) && current_time() + expected_time < migrate_deadline()
where migrate_deadline() is the time that the migration began plus
migrate_max_downtime().
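Roughly, as a sketch only (migrate_start_time, migrate_deadline() and the wrapper below are made-up names, not existing QEMU symbols; migrate_start_time would be recorded once, when the migration begins):

/* Illustration only -- none of these symbols exist in QEMU as-is. */
static int64_t migrate_start_time;        /* ns, set when the migration starts */

static int64_t migrate_deadline(void)
{
    return migrate_start_time + migrate_max_downtime();
}

/* ram_save_live() stage 2 could then test the whole-migration deadline
 * instead of re-applying the downtime budget to every iteration: */
static int ram_save_within_deadline(int stage, int64_t expected_time)
{
    int64_t now = qemu_get_clock_ns(rt_clock);   /* or whatever clock ram_save_live() already uses */

    return (stage == 2) && (now + expected_time < migrate_deadline());
}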
Currently, it looks like migrate_max_downtime() is being applied on a
per-iteration basis rather than per-migration, which seems like a bug to
me. Block migration seems to suffer from this as well...
* Re: [Qemu-devel] [PATCH] A small patch to introduce stop conditions to the live migration.
2011-09-14 15:22 ` Michael Roth
@ 2011-09-14 15:36 ` Michael Roth
0 siblings, 0 replies; 6+ messages in thread
From: Michael Roth @ 2011-09-14 15:36 UTC
To: Thomas Treutner; +Cc: qemu-devel
On 09/14/2011 10:22 AM, Michael Roth wrote:
> On 09/14/2011 08:18 AM, Thomas Treutner wrote:
>> Currently, it is possible that a live migration never finishes, when
>> the dirty page rate is high compared to the scan/transfer rate. The
>> exact values for MAX_MEMORY_ITERATIONS and
>> MAX_TOTAL_MEMORY_TRANSFER_FACTOR are arguable, but there should be
>> *some* limit to force the final iteration of a live migration that
>> does not converge.
>>
>> ---
>> arch_init.c | 10 +++++++++-
>> 1 files changed, 9 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch_init.c b/arch_init.c
>> index 4486925..57fcb1e 100644
>> --- a/arch_init.c
>> +++ b/arch_init.c
>> @@ -89,6 +89,9 @@ const uint32_t arch_type = QEMU_ARCH;
>> #define RAM_SAVE_FLAG_EOS 0x10
>> #define RAM_SAVE_FLAG_CONTINUE 0x20
>>
>> +#define MAX_MEMORY_ITERATIONS 10
>> +#define MAX_TOTAL_MEMORY_TRANSFER_FACTOR 3
>> +
>> static int is_dup_page(uint8_t *page, uint8_t ch)
>> {
>> uint32_t val = ch << 24 | ch << 16 | ch << 8 | ch;
>> @@ -107,6 +110,8 @@ static int is_dup_page(uint8_t *page, uint8_t ch)
>> static RAMBlock *last_block;
>> static ram_addr_t last_offset;
>>
>> +static int numberFullMemoryIterations = 0;
>> +
>> static int ram_save_block(QEMUFile *f)
>> {
>> RAMBlock *block = last_block;
>> @@ -158,7 +163,10 @@ static int ram_save_block(QEMUFile *f)
>> offset = 0;
>> block = QLIST_NEXT(block, next);
>> if (!block)
>> + {
>> + numberFullMemoryIterations++;
>> block = QLIST_FIRST(&ram_list.blocks);
>> + }
>> }
>>
>> current_addr = block->offset + offset;
>> @@ -295,7 +303,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int
>> stage, void *opaque)
>>
>> expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
>>
>> - return (stage == 2) && (expected_time <= migrate_max_downtime());
>> + return (stage == 2) && ((expected_time <= migrate_max_downtime() ||
>> (numberFullMemoryIterations == MAX_MEMORY_ITERATIONS) ||
>> (bytes_transferred >
>> (MAX_TOTAL_MEMORY_TRANSFER_FACTOR*ram_bytes_total()))));
>> }
>>
>> static inline void *host_from_stream_offset(QEMUFile *f,
>
> To me it seems like a simpler solution is to do something like:
>
> return (stage == 2) && current_time() + expected_time < migrate_deadline()
>
> where migrate_deadline() is the time that the migration began plus
> migrate_max_downtime().
>
> Currently, it looks like migrate_max_downtime() is being applied on a
> per-iteration basis rather than per-migration, which seems like a bug to
> me. Block migration seems to suffer from this as well...
Sorry, ignore this, that calculation's just for stage 3.
* Re: [Qemu-devel] [PATCH] A small patch to introduce stop conditions to the live migration.
2011-09-14 13:18 [Qemu-devel] [PATCH] A small patch to introduce stop conditions to the live migration Thomas Treutner
2011-09-14 15:22 ` Michael Roth
@ 2011-09-14 15:45 ` Anthony Liguori
2011-09-15 8:27 ` Thomas Treutner
1 sibling, 1 reply; 6+ messages in thread
From: Anthony Liguori @ 2011-09-14 15:45 UTC
To: Thomas Treutner; +Cc: qemu-devel
On 09/14/2011 08:18 AM, Thomas Treutner wrote:
> Currently, it is possible that a live migration never finishes, when the dirty page rate is high compared to the scan/transfer rate. The exact values for MAX_MEMORY_ITERATIONS and MAX_TOTAL_MEMORY_TRANSFER_FACTOR are arguable, but there should be *some* limit to force the final iteration of a live migration that does not converge.
No, there shouldn't be.
A management app can always stop a guest to force convergence. If you
make migration have unbounded downtime by default then you're making
migration unsafe for smarter consumers.
You can already set things like maximum downtime to force convergence.
If you wanted to have some logic like an exponentially increasing
maximum downtime given a fixed timeout, that would be okay provided it
was an optional feature.
So for instance, you could do something like:
downtime: defaults to 30ms
(qemu) migrate_set_convergence_timeout 60   # begin to enforce convergence after 1 minute
At 1 minute, downtime goes to 60ms, at 2 minutes it goes to 120ms, at 3
minutes to 240ms, at 4 minutes to 480ms, at 5 minutes to 960ms, at 6
minutes to ~1.9s, etc.
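As a rough sketch of that schedule (migrate_set_convergence_timeout, convergence_timeout_s and effective_max_downtime() are all made up here; the downtime is in nanoseconds as in the existing code):

/* Sketch of the doubling schedule described above: leave the configured
 * downtime alone until the convergence timeout, then double it once per
 * minute past the timeout. */
static int64_t convergence_timeout_s;        /* 0 means: feature disabled */

static uint64_t effective_max_downtime(uint64_t base_downtime_ns,
                                        int64_t elapsed_s)
{
    int64_t minutes_over;

    if (convergence_timeout_s == 0 || elapsed_s < convergence_timeout_s) {
        return base_downtime_ns;             /* before the timeout: unchanged */
    }
    /* 30ms -> 60ms at the timeout, then 120ms, 240ms, ... each minute */
    minutes_over = (elapsed_s - convergence_timeout_s) / 60;
    if (minutes_over > 30) {
        minutes_over = 30;                   /* cap the shift, avoid overflow */
    }
    return base_downtime_ns << (minutes_over + 1);
}

The exact curve is secondary; the point is that the escalation is opt-in and bounded by a knob the user explicitly chose.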
Regards,
Anthony Liguori
> ---
> arch_init.c | 10 +++++++++-
> 1 files changed, 9 insertions(+), 1 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index 4486925..57fcb1e 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -89,6 +89,9 @@ const uint32_t arch_type = QEMU_ARCH;
> #define RAM_SAVE_FLAG_EOS 0x10
> #define RAM_SAVE_FLAG_CONTINUE 0x20
>
> +#define MAX_MEMORY_ITERATIONS 10
> +#define MAX_TOTAL_MEMORY_TRANSFER_FACTOR 3
> +
> static int is_dup_page(uint8_t *page, uint8_t ch)
> {
> uint32_t val = ch << 24 | ch << 16 | ch << 8 | ch;
> @@ -107,6 +110,8 @@ static int is_dup_page(uint8_t *page, uint8_t ch)
> static RAMBlock *last_block;
> static ram_addr_t last_offset;
>
> +static int numberFullMemoryIterations = 0;
> +
> static int ram_save_block(QEMUFile *f)
> {
> RAMBlock *block = last_block;
> @@ -158,7 +163,10 @@ static int ram_save_block(QEMUFile *f)
> offset = 0;
> block = QLIST_NEXT(block, next);
> if (!block)
> + {
> + numberFullMemoryIterations++;
> block = QLIST_FIRST(&ram_list.blocks);
> + }
> }
>
> current_addr = block->offset + offset;
> @@ -295,7 +303,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>
> expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
>
> - return (stage == 2) && (expected_time <= migrate_max_downtime());
> + return (stage == 2) && ((expected_time <= migrate_max_downtime() || (numberFullMemoryIterations == MAX_MEMORY_ITERATIONS) || (bytes_transferred > (MAX_TOTAL_MEMORY_TRANSFER_FACTOR*ram_bytes_total()))));
> }
>
> static inline void *host_from_stream_offset(QEMUFile *f,
* Re: [Qemu-devel] [PATCH] A small patch to introduce stop conditions to the live migration.
2011-09-14 15:45 ` Anthony Liguori
@ 2011-09-15 8:27 ` Thomas Treutner
2011-09-15 9:35 ` Paolo Bonzini
0 siblings, 1 reply; 6+ messages in thread
From: Thomas Treutner @ 2011-09-15 8:27 UTC
To: Anthony Liguori; +Cc: qemu-devel
Am 14.09.2011 17:45, schrieb Anthony Liguori:
> On 09/14/2011 08:18 AM, Thomas Treutner wrote:
>> Currently, it is possible that a live migration never finishes, when
>> the dirty page rate is high compared to the scan/transfer rate. The
>> exact values for MAX_MEMORY_ITERATIONS and
>> MAX_TOTAL_MEMORY_TRANSFER_FACTOR are arguable, but there should be
>> *some* limit to force the final iteration of a live migration that
>> does not converge.
>
> No, there shouldn't be.
I think there should be. The iterative pre-copy mechanism depends entirely
on the assumption of convergence. Currently, the very real possibility
that this assumption does not hold is simply ignored, which to me is
burying one's head in the sand.
> A management app
I do not know of any management app that takes care of this. Can you
give an example where management app developers actually knew about this
issue and handled it? I didn't see any big warning regarding migration;
I just stumbled upon it by coincidence. libvirt just seems to program
around MAX_THROTTLE nowadays, which is another PITA. As a user, I can
and have to assume that a function actually does what it promises, and
that if it can't for whatever reason, it raises an error. Would you be
happy with a function that promises to write a file, but if the location
given is not writable, just sits there and waits forever until you
somehow notice, manually, what went wrong and what the remedy is?
> can always stop a guest to force convergence.
What do you mean by stop exactly? Pausing the guest? Is it then
automatically unpaused by qemu again at the destination host?
> If you make migration have unbounded downtime by default
> then you're making migration unsafe for smarter consumers.
I'd prefer that to having the common case be unsafe. If migration
doesn't converge, it currently finishes only at some distant point in
time, and only because the VM's service suffers so severely from the
migration that it dirties fewer and fewer pages. In reality, users would
quickly stop using the service, as response times go through the roof
and they run into network timeouts. A single, longer downtime is better
than a potentially everlasting unresponsive VM.
> You can already set things like maximum downtime to force convergence.
The maximum downtime parameter seems like a nice switch, but it is
another example of surprise. The value you choose is not even within an
order of magnitude of what actually happens, as the "bandwidth" used in
the calculation seems to be a buffer bandwidth, not the real network
bandwidth. Even with extremely aggressive bridge timings, there is a
factor of ~20 between the default 30ms setting and the actual result.
I know the - arguable, in my view - policy is "just give progress info
when requested (even though our algorithm strictly requires steady
progress, we do not want to hear about it when things go wrong), and let
mgmt apps decide", but that is not implemented correctly either: first,
because of the bandwidth/downtime issue above, and second, because of
incorrect memory transfer amounts, where duplicate (unused?) pages are
accounted as 1 byte of transfer. That may be correct from a physical
point of view, but from a logical, management-app point of view the
migration has progressed by a full page, not just 1 byte. It is hard to
argue that mgmt apps should take care of things working out nicely when
the information given to them is inconsistent and the switches offered
do something, but not in any way what they claim to do.
> If you wanted to have some logic like an exponentially increasing
> maximum downtime given a fixed timeout, that would be okay provided it
> was an optional feature.
I'm already doing something similar using libvirt; I'm only coming back
to this because such an approach causes a lot of pain and cluttered
code, while the original issue can be solved with 3-4 changed lines of
code in qemu.
AFAIK, there is neither a way to synchronize on the actual start of the
migration (so you can start polling and set a custom downtime value)
nor on its end (so you know when to stop polling). As a result, one
plays around with crude sleeps, hoping that the migration, although
already triggered, has actually started, and then tries in vain not to
step on invalidated data structures while monitoring progress in a
second thread, because nobody knows when the main thread running the
blocking live migration will pull the rug out from under the monitoring
thread's feet. Lots of code is then needed to clean up this mess, and a
SEGV still happens regularly:
http://pastebin.com/jT6sXubu
I don't know of any way to reliably and cleanly solve this issue within
"a management app", as I don't see any mechanism by which the main
thread can signal a monitoring thread to stop monitoring *before* it
pulls the rug. Sending the signal directly after the migration call
unblocks is not enough; I've tried that, and the result is above. There
is still a window in which both threads are inside the critical section.
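For illustration, the kind of handshake I mean would look roughly like this (plain pthreads; every name below is made up, none of this is a libvirt or qemu API):

/* Illustration only: the migrating thread tells the monitor thread to stop
 * and waits for it to exit *before* anything the monitor touches is freed. */
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

static pthread_t       mon_tid;
static pthread_mutex_t mon_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  mon_cond = PTHREAD_COND_INITIALIZER;
static bool            stop_monitoring;

static void *monitor_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mon_lock);
    while (!stop_monitoring) {
        struct timespec ts = { .tv_sec = time(NULL) + 1, .tv_nsec = 0 };
        /* poll migration progress / adjust downtime here ... */
        pthread_cond_timedwait(&mon_cond, &mon_lock, &ts);  /* wakes early on stop */
    }
    pthread_mutex_unlock(&mon_lock);
    return NULL;
}

/* Called by the migrating thread right after the blocking migrate call
 * returns, before any shared state is torn down: */
static void stop_monitor(void)
{
    pthread_mutex_lock(&mon_lock);
    stop_monitoring = true;
    pthread_cond_signal(&mon_cond);
    pthread_mutex_unlock(&mon_lock);
    pthread_join(mon_tid, NULL);   /* no rug-pulling until the monitor is gone */
}

An event or callback from qemu/libvirt at migration start and completion would make the sleeps and the second-guessing unnecessary.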
regards,
thomas
* Re: [Qemu-devel] [PATCH] A small patch to introduce stop conditions to the live migration.
2011-09-15 8:27 ` Thomas Treutner
@ 2011-09-15 9:35 ` Paolo Bonzini
0 siblings, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2011-09-15 9:35 UTC
To: Thomas Treutner; +Cc: qemu-devel
On 09/15/2011 10:27 AM, Thomas Treutner wrote:
>
>> can always stop a guest to force convergence.
>
> What do you mean by stop exactly? Pausing the guest? Is it then
> automatically unpaused by qemu again at the destination host?
Whether the guest restarts on the destination depends on the -S
command-line option given on the destination.
libvirt in particular restarts the guest depending on the state *at the
beginning of migration*, so yes---pausing the guest will force
convergence and will get the guest running on the destination.
Paolo