qemu-devel.nongnu.org archive mirror
From: Peter Lieven <pl@kamp.de>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	David Gibson <david@gibson.dropbear.id.au>,
	"qemu-ppc@nongnu.org" <qemu-ppc@nongnu.org>,
	Wenchao Xia <xiawenc@linux.vnet.ibm.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] broken incoming migration
Date: Mon, 10 Jun 2013 08:50:15 +0200
Message-ID: <51B57727.9080903@kamp.de>
In-Reply-To: <51B57489.20802@ozlabs.ru>

On 10.06.2013 08:39, Alexey Kardashevskiy wrote:
> On 06/09/2013 05:27 PM, Peter Lieven wrote:
>> On 09.06.2013 at 05:09, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>
>>> On 06/09/2013 01:01 PM, Wenchao Xia wrote:
>>>> On 2013-6-9 10:34, Alexey Kardashevskiy wrote:
>>>>> On 06/09/2013 12:16 PM, Wenchao Xia wrote:
>>>>>> On 2013-6-8 16:30, Alexey Kardashevskiy wrote:
>>>>>>> On 06/08/2013 06:27 PM, Wenchao Xia wrote:
>>>>>>>>> On 04.06.2013 16:40, Paolo Bonzini wrote:
>>>>>>>>>> On 04/06/2013 16:38, Peter Lieven wrote:
>>>>>>>>>>> On 04.06.2013 16:14, Paolo Bonzini wrote:
>>>>>>>>>>>> On 04/06/2013 15:52, Peter Lieven wrote:
>>>>>>>>>>>>> On 30.05.2013 16:41, Paolo Bonzini wrote:
>>>>>>>>>>>>>> On 30/05/2013 16:38, Peter Lieven wrote:
>>>>>>>>>>>>>>>>> You could also scan the page for nonzero
>>>>>>>>>>>>>>>>> values before writing it.
>>>>>>>>>>>>>>> I had this in mind, but then chose the other
>>>>>>>>>>>>>>> approach... which turned out to be a bad idea.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Alexey: I will prepare a patch later today.
>>>>>>>>>>>>>>> Could you then please verify that it fixes your
>>>>>>>>>>>>>>> problem?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Paolo: would we still need the madvise, or is
>>>>>>>>>>>>>>> it enough not to write the zeroes?
>>>>>>>>>>>>>> It should be enough to not write them.
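A minimal C sketch of the "scan the page before writing it" idea follows. It assumes 4 KiB pages. `GUEST_PAGE_SIZE`, `page_is_zero` and `handle_fill_page` are hypothetical names and are not QEMU's API. The destination receives a record saying a page is filled with a single byte value. If that value is zero and the local page already reads as zero, the handler skips the `memset`, so untouched pages are never written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 4 KiB pages; QEMU itself uses a per-target constant. */
#define GUEST_PAGE_SIZE 4096

/* True if every byte of the page is zero. This only reads the page. */
static bool page_is_zero(const uint8_t *page)
{
    for (size_t i = 0; i < GUEST_PAGE_SIZE; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Hypothetical destination-side handler for a record meaning
 * "this page is filled with byte ch". Returns true if it wrote the page. */
static bool handle_fill_page(uint8_t *host, uint8_t ch)
{
    assert(host != NULL);
    if (ch != 0 || !page_is_zero(host)) {
        memset(host, ch, GUEST_PAGE_SIZE);
        return true;
    }
    /* Already zero: skip the write so the page is not dirtied. */
    return false;
}
```

This check reads the page on every incoming zero-page record. Paolo's test later in this thread suggests that such reads are not expected to allocate anonymous memory on Linux.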
>>>>>>>>>>>>> Problem: checking the pages for zero allocates
>>>>>>>>>>>>> them, even at the source.
>>>>>>>>>>>> It doesn't look like it. I tried this program, and top
>>>>>>>>>>>> doesn't show an increasing amount of reserved
>>>>>>>>>>>> memory:
>>>>>>>>>>>>
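>>>>>>>>>>>> #include <stdio.h>
>>>>>>>>>>>> #include <stdlib.h>
>>>>>>>>>>>>
>>>>>>>>>>>> int main(void)
>>>>>>>>>>>> {
>>>>>>>>>>>>     char *x = malloc(500 << 20);
>>>>>>>>>>>>     int i, j;
>>>>>>>>>>>>
>>>>>>>>>>>>     /* read, never write, one byte per 4 KiB page */
>>>>>>>>>>>>     for (i = 0; i < 500; i += 10) {
>>>>>>>>>>>>         for (j = 0; j < 10 << 20; j += 4096) {
>>>>>>>>>>>>             *(volatile char *) (x + (i << 20) + j);
>>>>>>>>>>>>         }
>>>>>>>>>>>>         getchar();
>>>>>>>>>>>>     }
>>>>>>>>>>>>     return 0;
>>>>>>>>>>>> }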
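Paolo's test judges allocation by watching top. The sketch below measures it directly instead. It is Linux-specific because it parses `/proc/self/statm`, whose second field is the resident set size in pages. The helper names `rss_pages` and `touch_delta` are hypothetical. The code maps fresh anonymous memory with `mmap`, which is what a large `malloc` typically does. It then touches one byte per page, either by reading it or by writing it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Resident set size in pages: second field of /proc/self/statm.
 * Returns -1 if the file cannot be read. */
static long rss_pages(void)
{
    long size = 0, resident = 0;
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL) {
        return -1;
    }
    int n = fscanf(f, "%ld %ld", &size, &resident);
    fclose(f);
    return n == 2 ? resident : -1;
}

/* Map `bytes` of fresh anonymous memory and touch one byte in every page.
 * The touch is a store if do_write is nonzero, otherwise a volatile load.
 * Reports the RSS change in pages through *delta. Returns 0 on success. */
static int touch_delta(size_t bytes, int do_write, long *delta)
{
    assert(delta != NULL);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) {
        return -1;
    }
    char *base = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        return -1;
    }
    volatile char *p = base;
    long before = rss_pages();
    for (size_t off = 0; off < bytes; off += (size_t)page) {
        if (do_write) {
            p[off] = 1;
        } else {
            (void)p[off];
        }
    }
    long after = rss_pages();
    munmap(base, bytes);
    if (before < 0 || after < 0) {
        return -1;
    }
    *delta = after - before;
    return 0;
}
```

On a typical Linux setup, the write pass should grow RSS by roughly one page for each page touched. The read pass is expected to stay roughly flat, because the kernel can serve read faults on untouched anonymous memory with the shared zero page. Exact numbers depend on the kernel and on the transparent huge page settings.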
>>>>>>>>>>> Strange. We are talking about RSS, right?
>>>>>>>>>> None of the three top values change, and only VIRT is
>>>>>>>>>> 500 MB.
>>>>>>>>>>> Is the malloc above using mmapped memory?
>>>>>>>>>> Yes.
>>>>>>>>>>
>>>>>>>>>>> Which kernel version do you use?
>>>>>>>>>> 3.9.
>>>>>>>>>>
>>>>>>>>>>> What avoids allocating the memory for me is the
>>>>>>>>>>> following (whatever side effects it may have ;-))
>>>>>>>>>> This would also fail to migrate any page that is swapped
>>>>>>>>>> out, breaking overcommit in a more subtle way. :)
>>>>>>>>>>
>>>>>>>>>> Paolo
>>>>>>>>> The following does not allocate memory either, but QEMU
>>>>>>>>> does...
>>>>>>>> Hi Peter, as the patch says:
>>>>>>>>
>>>>>>>> "not sending zero pages breaks migration if a page is zero
>>>>>>>> at the source but not at the destination."
>>>>>>>>
>>>>>>>> I don't understand why that would be a problem. Shouldn't every
>>>>>>>> page not received at the destination be treated as a zero page?
>>>>>>>
>>>>>>> How would the destination guest know if some page must be
>>>>>>> cleared? The previous patch (which Peter reverted) did not
>>>>>>> send anything for the pages which were zero on the source
>>>>>>> side.
>>>>>> If a page was not received, and the destination knows that the page
>>>>>> should exist according to the total size, fill it with zeros at the
>>>>>> destination. Would that solve the problem?
>>>>> It is _live_ migration: the source sends changes, and the same pages
>>>>> can change and be sent several times. So we would need to turn on
>>>>> tracking on the destination to know whether a page was received
>>>>> from the source or changed by the destination itself (by writing
>>>>> BIOS/firmware images there, etc.), and then clear the pages that were
>>>>> touched by the destination but not sent by the source.
>>>> OK, I understand the problem now. For example: the destination boots
>>>> up with 0x0000-0xFFFF filled with a BIOS image, and the source forgot
>>>> to send the zero pages in 0x0000-0xFFFF.
>>>
>>> The source did not forget, instead it zeroed these pages during its
>>> life and thought that they must be zeroed at the destination already
>>> (as the destination did not start and did not have a chance to write
>>> something there).
>>>
>>>
>>>> After migration, the destination has 0x0000-0xFFFF dirty (different
>>>> from the source).
>>> Yep. And those pages were empty on the source, which made debugging
>>> very easy :)
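A toy model reproduces the failure mode just described. The sketch below is a hypothetical single-pass simulation, not QEMU code. Sixteen 4 KiB pages stand in for 0x0000-0xFFFF. The destination starts with "firmware" bytes in that range, and the source has zeroed it. If zero pages are simply not sent, the stale destination bytes survive. Sending an explicit zero-page record fixes the mismatch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUEST_PAGE_SIZE 4096   /* assumption: 4 KiB pages */
#define NPAGES 16              /* 16 * 4 KiB covers 0x0000-0xFFFF */

static bool page_is_zero(const uint8_t *p)
{
    for (size_t i = 0; i < GUEST_PAGE_SIZE; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Toy single-pass migration of NPAGES pages from src to dst.
 * With skip_zero set, pages that are zero on the source are not sent
 * at all, which is how the thread describes the reverted patch.
 * Otherwise a zero-page record is sent and dst clears its copy. */
static void migrate(const uint8_t *src, uint8_t *dst, bool skip_zero)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < NPAGES; i++) {
        const uint8_t *s = src + i * GUEST_PAGE_SIZE;
        uint8_t *d = dst + i * GUEST_PAGE_SIZE;
        if (!page_is_zero(s)) {
            memcpy(d, s, GUEST_PAGE_SIZE);   /* normal page record */
        } else if (!skip_zero) {
            memset(d, 0, GUEST_PAGE_SIZE);   /* zero-page record */
        }
        /* else: nothing is sent, so the destination keeps its old bytes */
    }
}
```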
>>>
>>>
>>>> Thanks for the explanation.
>>>>
>>>> This seems to concern the migration protocol: how should the guest
>>>> treat unsent pages? The patch causing the problem actually treats
>>>> zero pages as "not to be sent" at the source, but the other half is
>>>> missing: treating "not received" pages as zero pages at the
>>>> destination. I guess if the second half is added, the problem is
>>>> gone: after the page transfer completes and before the destination
>>>> resumes, fill the "not received" pages with zeros.
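The sketch below combines Wenchao's proposal with Alexey's point that the destination must track what it received. It is a hypothetical illustration, not an existing QEMU interface. `struct recv_tracker` keeps one bit per guest page. Each incoming page sets its bit. Before the guest resumes, every page whose bit is still clear is zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUEST_PAGE_SIZE 4096   /* assumption: 4 KiB pages */

/* Hypothetical destination-side state: one bit per guest page. */
struct recv_tracker {
    uint8_t *ram;
    size_t npages;
    uint8_t *received;         /* bitmap of (npages + 7) / 8 bytes */
};

static int tracker_init(struct recv_tracker *t, uint8_t *ram, size_t npages)
{
    t->ram = ram;
    t->npages = npages;
    t->received = calloc((npages + 7) / 8, 1);
    return t->received != NULL ? 0 : -1;
}

/* Store an incoming page from the source and mark it as received. */
static void tracker_recv(struct recv_tracker *t, size_t idx,
                         const uint8_t *data)
{
    assert(idx < t->npages);
    memcpy(t->ram + idx * GUEST_PAGE_SIZE, data, GUEST_PAGE_SIZE);
    t->received[idx / 8] |= (uint8_t)(1u << (idx % 8));
}

/* Before resuming the guest, zero every page the source never sent and
 * free the bitmap. Returns the number of pages cleared. */
static size_t tracker_finish(struct recv_tracker *t)
{
    size_t cleared = 0;
    for (size_t i = 0; i < t->npages; i++) {
        if (!(t->received[i / 8] & (1u << (i % 8)))) {
            memset(t->ram + i * GUEST_PAGE_SIZE, 0, GUEST_PAGE_SIZE);
            cleared++;
        }
    }
    free(t->received);
    t->received = NULL;
    return cleared;
}
```

One consequence of this design is that the final pass writes to every page the source skipped. On Linux, that allocates those pages on the destination, which is the kind of allocation that skipping zero pages was meant to avoid. Checking whether an unreceived page already reads as zero before clearing it would avoid writing to untouched pages.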
>>>
>>>
>>> Make a working patch, we'll discuss it :) I do not see much
>>> acceleration coming from there.
>> I would also not spend much time on this. I would either try to find
>> an easy way to fix the initialization code so that it does not load
>> data into RAM unnecessarily, or I will send a v2 of my patch addressing
>> Eric's concerns.
> There is no easy way to implement the flag and keep your original patch,
> as we would have to implement this flag in all architectures broken by
> your patch, and I personally can fix only PPC64-pseries, not the others.
>
> Furthermore, your revert + new patches solve the problem perfectly, so why
> would we want to bother now with this new flag, which nobody really needs
> right now?
>
> Please, please, revert the original patch or I'll try to do it :)
>
>
I tried, but there were concerns from the community. Alternatively, I found
the following solution. Please drop the two patches and try the
following:

diff --git a/arch_init.c b/arch_init.c
index 5d32ecf..458bf8c 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -799,6 +799,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                  while (total_ram_bytes) {
                      RAMBlock *block;
                      uint8_t len;
+                    void *base;
+                    ram_addr_t offset;

                      len = qemu_get_byte(f);
                      qemu_get_buffer(f, (uint8_t *)id, len);
@@ -822,6 +824,14 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                          goto done;
                      }

+                    base = memory_region_get_ram_ptr(block->mr);
+                    for (offset = 0; offset < block->length;
+                         offset += TARGET_PAGE_SIZE) {
+                        if (!is_zero_page(base + offset)) {
+                            memset(base + offset, 0x00, TARGET_PAGE_SIZE);
+                        }
+                    }
+
                      total_ram_bytes -= length;
                  }
              }

This is done at setup time, so there is no additional cost of zero-checking
each compressed page as it comes in.
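The setup-time pass in the patch above can be modelled on a plain buffer for experiments outside QEMU. In this sketch, `TARGET_PAGE_SIZE` is fixed at 4096 as a stand-in for QEMU's per-target constant. `page_is_zero` is a hypothetical replacement for QEMU's `is_zero_page`. The sketch shows that only pages which are not already zero get written, so the untouched pages of a freshly allocated block are only read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for QEMU's per-target TARGET_PAGE_SIZE (assumption). */
#define TARGET_PAGE_SIZE 4096

/* Hypothetical replacement for QEMU's is_zero_page(). */
static bool page_is_zero(const uint8_t *p)
{
    for (size_t i = 0; i < TARGET_PAGE_SIZE; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Clear every page of a RAM block that is not already zero, before any
 * migration data is applied. Returns the number of pages written. */
static size_t clear_block_at_setup(uint8_t *base, size_t length)
{
    size_t written = 0;
    assert(length % TARGET_PAGE_SIZE == 0);
    for (size_t offset = 0; offset < length; offset += TARGET_PAGE_SIZE) {
        if (!page_is_zero(base + offset)) {
            memset(base + offset, 0x00, TARGET_PAGE_SIZE);
            written++;
        }
    }
    return written;
}
```

After this pass, any page for which no record arrives still reads as zero on the destination. That is what a source that skips zero pages assumes.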

Peter


Thread overview: 49+ messages
2013-05-30  7:44 [Qemu-devel] broken incoming migration Alexey Kardashevskiy
2013-05-30  7:49 ` Alexey Kardashevskiy
2013-05-30  7:49 ` Paolo Bonzini
2013-05-30  8:18   ` Alexey Kardashevskiy
2013-05-30  9:08     ` Peter Lieven
2013-05-30  9:31       ` Alexey Kardashevskiy
2013-05-30 13:00       ` Paolo Bonzini
2013-05-30 13:38         ` Alexey Kardashevskiy
2013-05-30 14:08           ` Paolo Bonzini
2013-05-30 14:38         ` Peter Lieven
2013-05-30 14:41           ` Paolo Bonzini
2013-06-04 13:52             ` Peter Lieven
2013-06-04 14:14               ` Paolo Bonzini
2013-06-04 14:38                 ` Peter Lieven
2013-06-04 14:40                   ` Paolo Bonzini
2013-06-04 14:48                     ` Peter Lieven
2013-06-04 15:17                       ` Paolo Bonzini
2013-06-04 19:15                         ` Peter Lieven
2013-06-05  3:37                           ` Alexey Kardashevskiy
2013-06-05  6:09                             ` Peter Lieven
2013-06-09  4:12                               ` liu ping fan
2013-06-09  7:22                                 ` Peter Lieven
2013-06-04 15:10                     ` Peter Lieven
2013-06-08  8:27                       ` Wenchao Xia
2013-06-08  8:30                         ` Alexey Kardashevskiy
2013-06-09  2:16                           ` Wenchao Xia
2013-06-09  2:34                             ` Alexey Kardashevskiy
2013-06-09  2:52                               ` [Qemu-devel] [Qemu-ppc] " Benjamin Herrenschmidt
2013-06-09  3:01                                 ` Alexey Kardashevskiy
2013-06-09  3:01                               ` [Qemu-devel] " Wenchao Xia
2013-06-09  3:09                                 ` Alexey Kardashevskiy
2013-06-09  3:31                                   ` Wenchao Xia
2013-06-09  7:27                                   ` Peter Lieven
2013-06-10  6:39                                     ` Alexey Kardashevskiy
2013-06-10  6:50                                       ` Peter Lieven [this message]
2013-06-10  6:55                                         ` Alexey Kardashevskiy
2013-06-10  8:44                                           ` Peter Lieven
2013-06-10  9:10                                             ` Alexey Kardashevskiy
2013-06-10  9:33                                               ` [Qemu-devel] [Qemu-ppc] " Benjamin Herrenschmidt
2013-06-10  9:42                                                 ` Peter Lieven
2013-06-09  2:53                             ` Benjamin Herrenschmidt
2013-06-12 14:00                               ` Paolo Bonzini
2013-06-12 14:11                                 ` Benjamin Herrenschmidt
2013-06-12 20:10                                   ` Paolo Bonzini
2013-06-13  2:41                                     ` Wenchao Xia
2013-06-03 10:04           ` [Qemu-devel] " Alexey Kardashevskiy
2013-06-04 10:56             ` Peter Lieven
2013-06-08  8:24         ` Wenchao Xia
2013-05-30 10:18 ` Peter Maydell
