From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Li, Liang Z" <liang.z.li@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
Andrea Arcangeli <aarcange@redhat.com>,
"kirill.shutemov@linux.intel.com"
<kirill.shutemov@linux.intel.com>,
Amit Shah <amit.shah@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"quintela@redhat.com" <quintela@redhat.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: post-copy is broken?
Date: Wed, 20 Apr 2016 18:27:55 +0100
Message-ID: <20160420172754.GJ2263@work-vm>
In-Reply-To: <F2CBF3009FA73547804AE4C663CAB28E041813A6@shsmsx102.ccr.corp.intel.com>
Hi,
Just a follow-up with a little more debug.
I modified the test so it doesn't quit after the first miscomparison (see
diff below), and looking at the failures on real hardware I've seen:
/x86_64/postcopy: Memory content inconsistency at 3800000 first_byte = 30 last_byte = 30 current = 10 hit_edge = 0
Memory content inconsistency at 38fe000 first_byte = 30 last_byte = 10 current = 30 hit_edge = 0
and then another time:
/x86_64/postcopy: Memory content inconsistency at 4c00000 first_byte = 9a last_byte = 99 current = 1 hit_edge = 1
Memory content inconsistency at 4cec000 first_byte = 9a last_byte = 1 current = 99 hit_edge = 1
So in both cases what we're seeing is that, starting on a 2M page boundary, a page
is read on the destination as zero instead of getting the migrated value,
but somewhere later in the 2M page it starts behaving. (In the first example the counter
had reached 0x30, except for those pages which hadn't been transferred, where
the counter is much lower at 0x10.)
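The alignment claim can be checked with a little arithmetic. The following is a minimal sketch, assuming a 2 MiB transparent huge page size (the usual x86_64 value); the helper names are hypothetical and are not part of QEMU or the test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 2 MiB huge pages, as on typical x86_64 hosts. */
#define HUGE_PAGE_SIZE (2ULL * 1024 * 1024)

/* Hypothetical helper: offset of addr within its huge page. */
static uint64_t hugepage_offset(uint64_t addr)
{
    /* The mask trick only works for power-of-two sizes. */
    assert((HUGE_PAGE_SIZE & (HUGE_PAGE_SIZE - 1)) == 0);
    return addr & (HUGE_PAGE_SIZE - 1);
}

/* Hypothetical helper: true if addr sits exactly on a huge-page boundary. */
static bool is_hugepage_aligned(uint64_t addr)
{
    return hugepage_offset(addr) == 0;
}
```

With these, 0x3800000 and 0x4c00000 both come out aligned, and the points where the values recover (0x38fe000 and 0x4cec000) are 0xfe000 and 0xec000 bytes into the same 2M page.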
Testing it in my VM, I added some debug at the point where I'd previously been doing
an madvise(MADV_DONTNEED):
ram_discard_range: pc.ram:0xf51000 for 42094592
ram_discard_range: pc.ram:0x5259000 for 18509824
Memory content inconsistency at f51000 first_byte = 6d last_byte = 6d current = 9e hit_edge = 0
Memory content inconsistency at 1000000 first_byte = 6d last_byte = 9e current = 6d hit_edge = 0
So that's saying that from f51000..1000000 it was wrong, so not just one page, but up to the THP edge.
(It then got back to the right value, 6d, on the page edge.) Note how the start corresponds
to the address I'd previously done a discard on, but the bad region doesn't cover the whole
discard range, just up to the THP page boundary. Nothing in my userspace code knows about THP
(other than turning it off).
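To make the relationship between the discard range and the bad region concrete, here is a small sketch, again assuming a 2 MiB THP size. The function names are hypothetical; only the addresses and lengths come from the debug output above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2 MiB transparent huge pages. */
#define THP_SIZE (2ULL * 1024 * 1024)

/* Hypothetical helper: round addr up to the next THP boundary. */
static uint64_t thp_round_up(uint64_t addr)
{
    assert(addr <= UINT64_MAX - (THP_SIZE - 1)); /* avoid wrap-around */
    return (addr + THP_SIZE - 1) & ~(THP_SIZE - 1);
}

/* Hypothetical helper: exclusive end of a range given start and length. */
static uint64_t range_end(uint64_t start, uint64_t len)
{
    assert(len <= UINT64_MAX - start);
    return start + len;
}
```

For the first discard, thp_round_up(0xf51000) is 0x1000000, which is where the values become correct again, while range_end(0xf51000, 42094592) is 0x3776000. So the bad region covers only the first partial huge page of the discarded range.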
Dave
@@ -251,6 +251,7 @@ static void check_guests_ram(void)
     uint8_t first_byte;
     uint8_t last_byte;
     bool hit_edge = false;
+    bool bad = false;
 
     qtest_memread(global_qtest, start_address, &first_byte, 1);
     last_byte = first_byte;
@@ -271,11 +272,12 @@ static void check_guests_ram(void)
                                 " first_byte = %x last_byte = %x current = %x"
                                 " hit_edge = %x\n",
                                 address, first_byte, last_byte, b, hit_edge);
-                assert(0);
+                bad = true;
             }
         }
         last_byte = b;
     }
+    assert(!bad);
     fprintf(stderr, "first_byte = %x last_byte = %x hit_edge = %x OK\n",
                     first_byte, last_byte, hit_edge);
 }
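
The effect of the change (record every mismatch, fail only at the end) can be illustrated on a plain array. This is a rough sketch and not the test itself: it assumes the invariant is that each page's byte equals the previous page's byte, apart from a single step down by one (mod 256) where the guest's incrementer was stopped. The function name and the array input are hypothetical; the real test reads guest memory with qtest_memread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the scan loop: page_bytes[i] is the counter
 * byte sampled from page i. Returns the number of inconsistent
 * transitions instead of aborting on the first one.
 */
static size_t count_inconsistent_pages(const uint8_t *page_bytes,
                                       size_t n_pages)
{
    assert(page_bytes != NULL && n_pages > 0);

    uint8_t last_byte = page_bytes[0];
    bool hit_edge = false;
    size_t bad = 0;

    for (size_t i = 1; i < n_pages; i++) {
        uint8_t b = page_bytes[i];

        if (b != last_byte) {
            if ((uint8_t)(b + 1) == last_byte && !hit_edge) {
                /* The one allowed transition: incrementer stopped here. */
                hit_edge = true;
            } else {
                /* Record the mismatch and keep scanning. */
                bad++;
            }
        }
        last_byte = b;
    }
    return bad;
}
```

A run shaped like the first failure (0x30, 0x30, 0x10, 0x10, 0x30, 0x30) gives two inconsistencies, one entering and one leaving the stale region, which matches the pairs of messages in the output above.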
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK