From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Peter Lieven <pl@kamp.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
David Gibson <david@gibson.dropbear.id.au>,
"qemu-ppc@nongnu.org" <qemu-ppc@nongnu.org>,
Wenchao Xia <xiawenc@linux.vnet.ibm.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] broken incoming migration
Date: Mon, 10 Jun 2013 16:55:23 +1000
Message-ID: <51B5785B.6040704@ozlabs.ru>
In-Reply-To: <51B57727.9080903@kamp.de>
On 06/10/2013 04:50 PM, Peter Lieven wrote:
> On 10.06.2013 08:39, Alexey Kardashevskiy wrote:
>> On 06/09/2013 05:27 PM, Peter Lieven wrote:
>>> On 09.06.2013 at 05:09, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>
>>>> On 06/09/2013 01:01 PM, Wenchao Xia wrote:
>>>>> On 2013-6-9 10:34, Alexey Kardashevskiy wrote:
>>>>>> On 06/09/2013 12:16 PM, Wenchao Xia wrote:
>>>>>>> On 2013-6-8 16:30, Alexey Kardashevskiy wrote:
>>>>>>>> On 06/08/2013 06:27 PM, Wenchao Xia wrote:
>>>>>>>>>> On 04.06.2013 16:40, Paolo Bonzini wrote:
>>>>>>>>>>> On 04/06/2013 16:38, Peter Lieven wrote:
>>>>>>>>>>>> On 04.06.2013 16:14, Paolo Bonzini wrote:
>>>>>>>>>>>>> On 04/06/2013 15:52, Peter Lieven wrote:
>>>>>>>>>>>>>> On 30.05.2013 16:41, Paolo Bonzini wrote:
>>>>>>>>>>>>>>> On 30/05/2013 16:38, Peter Lieven wrote:
>>>>>>>>>>>>>>>>>> You could also scan the page for nonzero
>>>>>>>>>>>>>>>>>> values before writing it.
>>>>>>>>>>>>>>>> I had this in mind, but then chose the other
>>>>>>>>>>>>>>>> approach... which turned out to be a bad idea.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Alexey: I will prepare a patch later today;
>>>>>>>>>>>>>>>> could you then please verify that it fixes your
>>>>>>>>>>>>>>>> problem?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Paolo: would we still need the madvise, or is
>>>>>>>>>>>>>>>> it enough not to write the zeroes?
>>>>>>>>>>>>>>> It should be enough to not write them.
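For illustration only: a minimal C sketch of the destination-side check suggested above (scan the page and skip the write when the incoming page is all zeroes and the local page already is too). It assumes the source encodes such a page as a single fill byte; `PAGE_SIZE`, `page_is_zero()` and `load_filled_page()` are hypothetical names standing in for QEMU internals, not its actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096 /* assumption: stands in for TARGET_PAGE_SIZE */

/* Hypothetical helper: true if every byte of the page is zero. */
static bool page_is_zero(const unsigned char *page)
{
    for (size_t i = 0; i < PAGE_SIZE; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Hypothetical handler for a page the source sent as one fill byte.
 * The page is written only if its contents would actually change, so
 * a page that is already zero is read but never written.
 * Returns true if the page was written. */
static bool load_filled_page(unsigned char *host, unsigned char fill)
{
    assert(host != NULL);
    if (fill != 0 || !page_is_zero(host)) {
        memset(host, fill, PAGE_SIZE);
        return true;
    }
    return false; /* already zero: leave the page untouched */
}
```

The point of the check is simply to avoid writing to (and so dirtying) a destination page whose contents are already what the source sent.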
>>>>>>>>>>>>>> Problem: checking the pages for zero allocates
>>>>>>>>>>>>>> them, even at the source.
>>>>>>>>>>>>> It doesn't look like it. I tried this program and top
>>>>>>>>>>>>> doesn't show an increasing amount of reserved
>>>>>>>>>>>>> memory:
>>>>>>>>>>>>>
>>>>>>>>>>>>> #include <stdio.h>
>>>>>>>>>>>>> #include <stdlib.h>
>>>>>>>>>>>>> int main() {
>>>>>>>>>>>>>     char *x = malloc(500 << 20);
>>>>>>>>>>>>>     int i, j;
>>>>>>>>>>>>>     for (i = 0; i < 500; i += 10) {
>>>>>>>>>>>>>         for (j = 0; j < 10 << 20; j += 4096) {
>>>>>>>>>>>>>             *(volatile char*) (x + (i << 20) + j);
>>>>>>>>>>>>>         }
>>>>>>>>>>>>>         getchar();
>>>>>>>>>>>>>     }
>>>>>>>>>>>>> }
>>>>>>>>>>>> Strange. We are talking about RSS size, right?
>>>>>>>>>>> None of the three top values change, and only VIRT is
>>>>>>>>>>> 500 MB.
>>>>>>>>>>>> Is the malloc above using mmapped memory?
>>>>>>>>>>> Yes.
>>>>>>>>>>>
>>>>>>>>>>>> Which kernel version do you use?
>>>>>>>>>>> 3.9.
>>>>>>>>>>>
>>>>>>>>>>>> What avoids allocating the memory for me is the
>>>>>>>>>>>> following (with whatever side effects it has ;-))
>>>>>>>>>>> This would also fail to migrate any page that is swapped
>>>>>>>>>>> out, breaking overcommit in a more subtle way. :)
>>>>>>>>>>>
>>>>>>>>>>> Paolo
>>>>>>>>>> The following also does not allocate memory, but QEMU
>>>>>>>>>> does...
>>>>>>>>> Hi Peter, as the patch says:
>>>>>>>>>
>>>>>>>>> "not sending zero pages breaks migration if a page is zero
>>>>>>>>> at the source but not at the destination."
>>>>>>>>>
>>>>>>>>> I don't understand why that would be a problem; shouldn't all
>>>>>>>>> pages not received at the destination be treated as zero pages?
>>>>>>>>
>>>>>>>> How would the destination guest know if some page must be
>>>>>>>> cleared? The previous patch (which Peter reverted) did not
>>>>>>>> send anything for the pages which were zero on the source
>>>>>>>> side.
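A toy model of the failure described above, assuming a flat array of pages on each side: the destination has already written something (say, a firmware image) into page 0, the source's page 0 is zero, and the reverted behaviour skips zero pages entirely. `migrate_skipping_zero_pages()`, `PAGE_SIZE` and `NUM_PAGES` are hypothetical; this is not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096 /* assumption: illustrative page size */
#define NUM_PAGES 4    /* assumption: a tiny guest for illustration */

/* Hypothetical helper: true if every byte of the page is zero. */
static bool page_is_zero(const unsigned char *page)
{
    for (size_t i = 0; i < PAGE_SIZE; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Hypothetical model of the reverted behaviour: copy only the source
 * pages that are non-zero and assume the destination's copy of every
 * skipped page is already zero. Returns the number of pages sent. */
static size_t migrate_skipping_zero_pages(unsigned char *dst,
                                          const unsigned char *src,
                                          size_t npages)
{
    size_t sent = 0;
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < npages; i++) {
        const unsigned char *page = src + i * PAGE_SIZE;
        if (!page_is_zero(page)) {
            memcpy(dst + i * PAGE_SIZE, page, PAGE_SIZE);
            sent++;
        }
    }
    return sent;
}
```

If the destination wrote into a page before migration and the source's copy of that page is zero, nothing overwrites it, so the two sides silently diverge.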
>>>>>>> If a page was not received and the destination knows that the page
>>>>>>> should exist according to the total size, we could fill it with zeros
>>>>>>> at the destination. Would that solve the problem?
>>>>>> It is _live_ migration: the source sends changes, and the same pages
>>>>>> can change and be sent several times. So we would need to turn on
>>>>>> tracking at the destination to know whether a page was received
>>>>>> from the source or changed by the destination itself (by writing
>>>>>> BIOS/firmware images there, etc.) and then clear the pages which
>>>>>> were touched by the destination but not sent by the source.
>>>>> OK, I understand the problem now. For example: the destination boots
>>>>> up with 0x0000-0xFFFF filled with the BIOS image, and the source forgot
>>>>> to send the zero pages in 0x0000-0xFFFF.
>>>>
>>>> The source did not forget, instead it zeroed these pages during its
>>>> life and thought that they must be zeroed at the destination already
>>>> (as the destination did not start and did not have a chance to write
>>>> something there).
>>>>
>>>>
>>>>> After migration, the destination has 0x0000-0xFFFF dirty (different
>>>>> from the source).
>>>> Yep. And those pages were empty on the source, which made debugging
>>>> very easy :)
>>>>
>>>>
>>>>> Thanks for the explanation.
>>>>>
>>>>> This seems to concern the migration protocol: how should the
>>>>> destination guest treat unsent pages? The patch causing the problem
>>>>> treats zero pages as "not to be sent" at the source, but the other
>>>>> half is missing: treating "not received" pages as zero pages at the
>>>>> destination. I guess that if the second half is added, the problem
>>>>> is gone: after the page transfer has completed and before the
>>>>> destination resumes, fill the "not received" pages with zeros.
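A sketch of the "second half" proposed above, under the assumption that the destination keeps one received flag per page (the tracking mentioned earlier in the thread). `struct dest_ram`, `dest_receive_page()` and `dest_finish()` are hypothetical names, not an existing QEMU interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096 /* assumption: illustrative page size */
#define NUM_PAGES 4    /* assumption: a tiny guest for illustration */

/* Hypothetical destination state: guest RAM plus one flag per page
 * recording whether the source sent that page at least once. */
struct dest_ram {
    unsigned char mem[NUM_PAGES * PAGE_SIZE];
    bool received[NUM_PAGES];
};

/* Hypothetical: called for every page arriving from the source. During
 * live migration a page may arrive several times; the last copy wins. */
static void dest_receive_page(struct dest_ram *d, size_t idx,
                              const unsigned char *data)
{
    assert(idx < NUM_PAGES);
    memcpy(d->mem + idx * PAGE_SIZE, data, PAGE_SIZE);
    d->received[idx] = true;
}

/* Hypothetical: after the last page has arrived and before the guest
 * resumes, zero every page the source never sent, discarding anything
 * the destination wrote there itself (e.g. a firmware image).
 * Returns the number of pages cleared. */
static size_t dest_finish(struct dest_ram *d)
{
    size_t cleared = 0;
    for (size_t i = 0; i < NUM_PAGES; i++) {
        if (!d->received[i]) {
            memset(d->mem + i * PAGE_SIZE, 0, PAGE_SIZE);
            cleared++;
        }
    }
    return cleared;
}
```

Clearing unconditionally also writes pages that are already zero; checking each page first, as Peter's later patch in this thread does, avoids touching them.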
>>>>
>>>>
>>>> Make a working patch, we'll discuss it :) I do not see much
>>>> acceleration coming from there.
>>> I would also not spend much time on this. I would either try to find
>>> an easy way to fix the initialization code so that it does not
>>> unnecessarily load data into RAM, or I will send a v2 of my patch
>>> addressing Eric's concerns.
>> There is no easy way to implement the flag and keep your original patch,
>> as we would have to implement this flag in all architectures that were
>> broken by your patch, and I personally can only fix PPC64-pseries, not
>> the others.
>>
>> Furthermore your revert + new patches perfectly solve the problem, why
>> would we want to bother now with this new flag which nobody really needs
>> right now?
>>
>> Please, please, revert the original patch or I'll try to do it :)
>>
>>
> I tried, but there were concerns from the community.
Was there anybody here who did not want to revert the patch (besides you)?
I did not notice.
> Alternatively, I found
> the following solution. Please drop the 2 patches and try the
> following:
How is it going to work if upstream QEMU doesn't send anything about empty
pages at all (this is why I want to revert that patch)?
>
> diff --git a/arch_init.c b/arch_init.c
> index 5d32ecf..458bf8c 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -799,6 +799,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> while (total_ram_bytes) {
> RAMBlock *block;
> uint8_t len;
> + void *base;
> + ram_addr_t offset;
>
> len = qemu_get_byte(f);
> qemu_get_buffer(f, (uint8_t *)id, len);
> @@ -822,6 +824,14 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> goto done;
> }
>
> + base = memory_region_get_ram_ptr(block->mr);
> + for (offset = 0; offset < block->length;
> + offset += TARGET_PAGE_SIZE) {
> + if (!is_zero_page(base + offset)) {
> + memset(base + offset, 0x00, TARGET_PAGE_SIZE);
> + }
> + }
> +
> total_ram_bytes -= length;
> }
> }
>
> This is done at setup time, so there is no additional cost for zero
> checking on each compressed page coming in.
>
> Peter
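The hunk above, pulled out into a standalone C sketch so its logic can be exercised outside QEMU: before a block is loaded, every page that is not already zero is cleared, and pages that are already zero are only read, never written. `page_is_zero()` stands in for `is_zero_page()`, `PAGE_SIZE` for `TARGET_PAGE_SIZE`, and `clear_block_before_load()` is a hypothetical name; this is an illustration, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096 /* assumption: stands in for TARGET_PAGE_SIZE */

/* Hypothetical stand-in for is_zero_page(). */
static bool page_is_zero(const unsigned char *page)
{
    for (size_t i = 0; i < PAGE_SIZE; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Clear a RAM block before any page is loaded into it, writing only
 * pages that are not zero already. Returns the number of pages cleared. */
static size_t clear_block_before_load(unsigned char *base, size_t length)
{
    size_t cleared = 0;
    assert(length % PAGE_SIZE == 0);
    for (size_t offset = 0; offset < length; offset += PAGE_SIZE) {
        if (!page_is_zero(base + offset)) {
            memset(base + offset, 0x00, PAGE_SIZE);
            cleared++;
        }
    }
    return cleared;
}
```

Because the scan runs once per block at setup time, the per-page receive path does not need any extra zero check.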
--
Alexey