From: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: "qemu-ppc@nongnu.org" <qemu-ppc@nongnu.org>,
	Paolo Bonzini <pbonzini@redhat.com>, Peter Lieven <pl@kamp.de>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [Qemu-devel] broken incoming migration
Date: Sun, 09 Jun 2013 11:31:18 +0800
Message-ID: <51B3F706.1030204@linux.vnet.ibm.com>
In-Reply-To: <51B3F1FD.1090401@ozlabs.ru>

On 2013-6-9 11:09, Alexey Kardashevskiy wrote:
> On 06/09/2013 01:01 PM, Wenchao Xia wrote:
>> On 2013-6-9 10:34, Alexey Kardashevskiy wrote:
>>> On 06/09/2013 12:16 PM, Wenchao Xia wrote:
>>>> On 2013-6-8 16:30, Alexey Kardashevskiy wrote:
>>>>> On 06/08/2013 06:27 PM, Wenchao Xia wrote:
>>>>>>> On 04.06.2013 16:40, Paolo Bonzini wrote:
>>>>>>>> On 04/06/2013 16:38, Peter Lieven wrote:
>>>>>>>>> On 04.06.2013 16:14, Paolo Bonzini wrote:
>>>>>>>>>> On 04/06/2013 15:52, Peter Lieven wrote:
>>>>>>>>>>> On 30.05.2013 16:41, Paolo Bonzini wrote:
>>>>>>>>>>>> On 30/05/2013 16:38, Peter Lieven wrote:
>>>>>>>>>>>>>>> You could also scan the page for nonzero values before
>>>>>>>>>>>>>>> writing it.
>>>>>>>>>>>>> I had this in mind, but then chose the other approach... which
>>>>>>>>>>>>> turned out to be a bad idea.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Alexey: I will prepare a patch later today; could you then please
>>>>>>>>>>>>> verify that it fixes your problem?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Paolo: would we still need the madvise, or is it enough not to
>>>>>>>>>>>>> write the zeroes?
>>>>>>>>>>>> It should be enough to not write them.
>>>>>>>>>>> Problem: checking the pages for zero allocates them, even at the
>>>>>>>>>>> source.
>>>>>>>>>> It doesn't look like it.  I tried this program and top doesn't show an
>>>>>>>>>> increasing amount of reserved memory:
>>>>>>>>>>
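>>>>>>>>>> #include <stdio.h>
>>>>>>>>>> #include <stdlib.h>
>>>>>>>>>> int main(void)
>>>>>>>>>> {
>>>>>>>>>>          /* Reserve 500 MB of address space; nothing is written to it. */
>>>>>>>>>>          char *x = malloc(500 << 20);
>>>>>>>>>>          int i, j;
>>>>>>>>>>          if (x == NULL) {
>>>>>>>>>>              return 1;
>>>>>>>>>>          }
>>>>>>>>>>          for (i = 0; i < 500; i += 10) {
>>>>>>>>>>              /* Read one byte from every 4 KB page of the next 10 MB. */
>>>>>>>>>>              for (j = 0; j < 10 << 20; j += 4096) {
>>>>>>>>>>                   *(volatile char*) (x + (i << 20) + j);
>>>>>>>>>>              }
>>>>>>>>>>              /* Pause so memory usage can be checked in top. */
>>>>>>>>>>              getchar();
>>>>>>>>>>          }
>>>>>>>>>>          free(x);
>>>>>>>>>>          return 0;
>>>>>>>>>> }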
>>>>>>>>> Strange. We are talking about RSS size, right?
>>>>>>>> None of the three top values change, and only VIRT is >500 MB.
>>>>>>>>
>>>>>>>>> Is the malloc above using mmapped memory?
>>>>>>>> Yes.
>>>>>>>>
>>>>>>>>> Which kernel version do you use?
>>>>>>>> 3.9.
>>>>>>>>
>>>>>>>>> What avoids allocating the memory for me is the following (with
>>>>>>>>> whatever side effects it has ;-))
>>>>>>>> This would also fail to migrate any page that is swapped out, breaking
>>>>>>>> overcommit in a more subtle way. :)
>>>>>>>>
>>>>>>>> Paolo
>>>>>>> The following also does not allocate memory, but QEMU does...
>>>>>>>
>>>>>> Hi, Peter
>>>>>>      As the patch description says:
>>>>>>
>>>>>> "not sending zero pages breaks migration if a page is zero
>>>>>> at the source but not at the destination."
>>>>>>
>>>>>>      I don't understand why this would be a problem. Shouldn't every
>>>>>> page not received at the destination be treated as a zero page?
>>>>>
>>>>>
>>>>> How would the destination guest know if some page must be cleared? The
>>>>> previous patch (which Peter reverted) did not send anything for the pages
>>>>> which were zero on the source side.
>>>>>
>>>>>
>>>>     If a page was not received, and the destination knows from the
>>>> total size that the page should exist, it could fill that page with
>>>> zeroes. Would that solve the problem?
>>>
>>> It is _live_ migration: the source sends changes, and the same pages can
>>> change and be sent several times. So we would need to enable tracking on
>>> the destination to know whether a page was received from the source or
>>> changed by the destination itself (by writing BIOS/firmware images there,
>>> etc.), and then clear the pages which were touched by the destination and
>>> were not sent by the source.
>>    OK, I understand the problem now. For example:
>> The destination boots up with 0x0000-0xFFFF filled with the BIOS image.
>> The source forgot to send the zero pages in 0x0000-0xFFFF.
>
>
> The source did not forget, instead it zeroed these pages during its life
> and thought that they must be zeroed at the destination already (as the
> destination did not start and did not have a chance to write something there).
>
>
>> After migration, the destination has 0x0000-0xFFFF dirty (different from
>> the source).
>
> Yep. And those pages were empty on the source, which made debugging very easy :)
>
>
>>    Thanks for the explanation.
>>
>>    This seems to be a question about the migration protocol: how should
>> the guest treat unsent pages? The patch causing the problem treats zero
>> pages as "not to be sent" at the source, but the other half is missing:
>> treating "not received" pages as zero pages at the destination. I guess
>> that if the second half is added, the problem is gone:
>> after the page transfer has completed and before the destination resumes,
>> fill the "not received" pages with zeroes.
>
>
>
> Make a working patch, we'll discuss it :) I do not see much acceleration
> coming from there.
>
>
   A 4k zero page is compressed into a header: an 8-byte flag + a 1-byte
tail + (1 + strlen(idstr) when the ramblock is a new one). Taking 10 bytes
as the average, the ram:network ratio is about 4000:10 = 400:1.
   So for a typical 4GB guest, even if every page is zero, sending the zero
pages takes only about 10MB of network traffic, which indeed leaves little
to gain. I think the current method is already good enough, unless there are
other benefits to not sending zero pages.
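
   As a rough check of the estimate above, here is a minimal sketch in C.
The helper names are made up for illustration and are not QEMU code; the
byte counts are simply the figures quoted above, taken as assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bytes on the wire for one zero page, using the
 * figures from the estimate above (8-byte flag word + 1-byte tail, plus
 * 1 + strlen(idstr) when the block name is sent). Pass NULL for idstr
 * when the page continues the previous ramblock. */
static size_t zero_page_wire_bytes(const char *idstr)
{
    size_t n = 8 + 1;
    if (idstr != NULL) {
        n += 1 + strlen(idstr);
    }
    return n;
}

/* Hypothetical helper: total stream size if every page of guest RAM is
 * sent as a zero page costing per_page_bytes each. */
static uint64_t all_zero_stream_bytes(uint64_t ram_bytes, uint64_t page_size,
                                      uint64_t per_page_bytes)
{
    assert(page_size != 0);
    return (ram_bytes / page_size) * per_page_bytes;
}
```

   With ram_bytes = 4ULL << 30, page_size = 4096 and per_page_bytes = 10,
this gives 1048576 pages * 10 bytes = 10485760 bytes, i.e. about 10MB.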

-- 
Best Regards

Wenchao Xia
