From: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: "qemu-ppc@nongnu.org" <qemu-ppc@nongnu.org>,
Paolo Bonzini <pbonzini@redhat.com>, Peter Lieven <pl@kamp.de>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [Qemu-devel] broken incoming migration
Date: Sun, 09 Jun 2013 10:16:44 +0800
Message-ID: <51B3E58C.50301@linux.vnet.ibm.com>
In-Reply-To: <51B2EBA2.5060401@ozlabs.ru>
On 2013-6-8 16:30, Alexey Kardashevskiy wrote:
> On 06/08/2013 06:27 PM, Wenchao Xia wrote:
>>> On 04.06.2013 16:40, Paolo Bonzini wrote:
>>>> Il 04/06/2013 16:38, Peter Lieven ha scritto:
>>>>> On 04.06.2013 16:14, Paolo Bonzini wrote:
>>>>>> Il 04/06/2013 15:52, Peter Lieven ha scritto:
>>>>>>> On 30.05.2013 16:41, Paolo Bonzini wrote:
>>>>>>>> Il 30/05/2013 16:38, Peter Lieven ha scritto:
>>>>>>>>>>> You could also scan the page for nonzero values before writing it.
>>>>>>>>> I had this in mind, but then chose the other approach... which
>>>>>>>>> turned out to be a bad idea.
>>>>>>>>>
>>>>>>>>> Alexey: I will prepare a patch later today; could you then please
>>>>>>>>> verify that it fixes your problem?
>>>>>>>>>
>>>>>>>>> Paolo: would we still need the madvise, or is it enough not to
>>>>>>>>> write the zeroes?
>>>>>>>> It should be enough to not write them.
>>>>>>> Problem: checking the pages for zero allocates them, even at the
>>>>>>> source.
>>>>>> It doesn't look like it. I tried this program, and top doesn't show an
>>>>>> increasing amount of reserved memory:
>>>>>>
>>>>>> #include <stdio.h>
>>>>>> #include <stdlib.h>
>>>>>>
>>>>>> int main(void)
>>>>>> {
>>>>>>     char *x = malloc(500 << 20);
>>>>>>     int i, j;
>>>>>>
>>>>>>     /* Read one byte per 4 KiB page, 10 MiB at a time, pausing after each chunk */
>>>>>>     for (i = 0; i < 500; i += 10) {
>>>>>>         for (j = 0; j < 10 << 20; j += 4096) {
>>>>>>             *(volatile char *) (x + (i << 20) + j);
>>>>>>         }
>>>>>>         getchar();
>>>>>>     }
>>>>>>     return 0;
>>>>>> }
>>>>> Strange. We are talking about RSS size, right?
>>>> None of the three top values change, and only VIRT is >500 MB.
>>>>
>>>>> Is the malloc above using mmapped memory?
>>>> Yes.
>>>>
>>>>> Which kernel version do you use?
>>>> 3.9.
>>>>
>>>>> What avoids allocating the memory for me is the following (with
>>>>> whatever side effects it has ;-))
>>>> This would also fail to migrate any page that is swapped out, breaking
>>>> overcommit in a more subtle way. :)
>>>>
>>>> Paolo
>>> The following also does not allocate memory, but QEMU does...
>>>
>> Hi, Peter
>> As the patch description says:
>>
>> "not sending zero pages breaks migration if a page is zero
>> at the source but not at the destination."
>>
>> I don't understand why that would be a problem. Shouldn't every page
>> not received at the destination be treated as a zero page?
>
>
> How would the destination guest know if some page must be cleared? The
> previous patch (which Peter reverted) did not send anything for the pages
> which were zero on the source side.
>
>
If a page was not received, and the destination knows from the total RAM
size that the page should exist, could the destination fill it with zeros?
Would that solve the problem?
>
>> Also, do you mean that the following code is taken from QEMU and does
>> not allocate memory when built with your gcc? Maybe this is related to
>> KVM; how about turning off KVM and retrying the following code in QEMU?
>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <assert.h>
>>> #include <unistd.h>
>>> #include <sys/resource.h>
>>> #include <inttypes.h>
>>> #include <string.h>
>>> #include <sys/mman.h>
>>> #include <errno.h>
>>>
>>> #if defined __SSE2__
>>> #include <emmintrin.h>
>>> #define VECTYPE __m128i
>>> #define SPLAT(p) _mm_set1_epi8(*(p))
>>> #define ALL_EQ(v1, v2) \
>>>     (_mm_movemask_epi8(_mm_cmpeq_epi8(v1, v2)) == 0xFFFF)
>>> #else
>>> #define VECTYPE unsigned long
>>> #define SPLAT(p) (*(p) * (~0UL / 255))
>>> #define ALL_EQ(v1, v2) ((v1) == (v2))
>>> #endif
>>>
>>> #define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR 8
>>>
>>> /* Round number down to multiple */
>>> #define QEMU_ALIGN_DOWN(n, m) ((n) / (m) * (m))
>>>
>>> /* Round number up to multiple */
>>> #define QEMU_ALIGN_UP(n, m) QEMU_ALIGN_DOWN((n) + (m) - 1, (m))
>>>
>>> #define QEMU_VMALLOC_ALIGN (256 * 4096)
>>>
>>> /* Allocate anonymous RAM aligned to QEMU_VMALLOC_ALIGN */
>>> static void *qemu_anon_ram_alloc(size_t size)
>>> {
>>>     size_t align = QEMU_VMALLOC_ALIGN;
>>>     size_t total = size + align - getpagesize();
>>>     char *ptr = mmap(0, total, PROT_READ | PROT_WRITE,
>>>                      MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>>>     size_t offset;
>>>
>>>     /* Check for failure before using the pointer */
>>>     if (ptr == MAP_FAILED) {
>>>         fprintf(stderr, "Failed to allocate %zu B: %s\n",
>>>                 size, strerror(errno));
>>>         abort();
>>>     }
>>>
>>>     /* Trim the over-sized mapping so the result is aligned */
>>>     offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
>>>     ptr += offset;
>>>     total -= offset;
>>>
>>>     if (offset > 0) {
>>>         munmap(ptr - offset, offset);
>>>     }
>>>     if (total > size) {
>>>         munmap(ptr + size, total - size);
>>>     }
>>>
>>>     return ptr;
>>> }
>>>
>>> static inline int
>>> can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
>>> {
>>>     return (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
>>>                    * sizeof(VECTYPE)) == 0
>>>             && ((uintptr_t) buf) % sizeof(VECTYPE) == 0);
>>> }
>>>
>>> /*
>>>  * Returns the offset at which nonzero data may start (rounded down to an
>>>  * unrolled block after the first 8 vectors), or len if the buffer is zero.
>>>  */
>>> size_t buffer_find_nonzero_offset(const void *buf, size_t len)
>>> {
>>>     const VECTYPE *p = buf;
>>>     const VECTYPE zero = (VECTYPE){0};
>>>     size_t i;
>>>
>>>     if (!len) {
>>>         return 0;
>>>     }
>>>
>>>     assert(can_use_buffer_find_nonzero_offset(buf, len));
>>>
>>>     for (i = 0; i < BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR; i++) {
>>>         if (!ALL_EQ(p[i], zero)) {
>>>             return i * sizeof(VECTYPE);
>>>         }
>>>     }
>>>
>>>     for (i = BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR;
>>>          i < len / sizeof(VECTYPE);
>>>          i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
>>>         VECTYPE tmp0 = p[i + 0] | p[i + 1];
>>>         VECTYPE tmp1 = p[i + 2] | p[i + 3];
>>>         VECTYPE tmp2 = p[i + 4] | p[i + 5];
>>>         VECTYPE tmp3 = p[i + 6] | p[i + 7];
>>>         VECTYPE tmp01 = tmp0 | tmp1;
>>>         VECTYPE tmp23 = tmp2 | tmp3;
>>>         if (!ALL_EQ(tmp01 | tmp23, zero)) {
>>>             break;
>>>         }
>>>     }
>>>
>>>     return i * sizeof(VECTYPE);
>>> }
>>>
>>> int main(void)
>>> {
>>>     /* char *x = malloc(1024 << 20); */
>>>     char *x = qemu_anon_ram_alloc(1024 << 20);
>>>     int i, j;
>>>     int zero_pages = 0;
>>>     struct rusage rusage;
>>>
>>>     for (i = 0; i < 500; i++) {
>>>         for (j = 0; j < 10 << 20; j += 4096) {
>>>             /* The function returns len (4096) for an all-zero page */
>>>             if (buffer_find_nonzero_offset(x + (i << 20) + j, 4096) == 4096) {
>>>                 zero_pages++;
>>>             }
>>>         }
>>>         getrusage(RUSAGE_SELF, &rusage);
>>>         printf("read offset: %d kB, max RSS: %ld kB\n", (i + 1) << 10,
>>>                rusage.ru_maxrss);
>>>         getchar();
>>>     }
>>>     printf("%d zero pages\n", zero_pages);
>>>     return 0;
>>> }
>>>
>>
>>
>
>
--
Best Regards
Wenchao Xia