From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Li, Liang Z" <liang.z.li@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
Andrea Arcangeli <aarcange@redhat.com>,
"kirill.shutemov@linux.intel.com"
<kirill.shutemov@linux.intel.com>,
Amit Shah <amit.shah@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"quintela@redhat.com" <quintela@redhat.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: post-copy is broken?
Date: Wed, 20 Apr 2016 18:27:55 +0100
Message-ID: <20160420172754.GJ2263@work-vm>
In-Reply-To: <F2CBF3009FA73547804AE4C663CAB28E041813A6@shsmsx102.ccr.corp.intel.com>
Hi,
Just a follow-up with a little more debug.
I modified the test so it doesn't quit after the first miscomparison (see
diff below), and looking at the failures on real hardware I've seen:
/x86_64/postcopy: Memory content inconsistency at 3800000 first_byte = 30 last_byte = 30 current = 10 hit_edge = 0
Memory content inconsistency at 38fe000 first_byte = 30 last_byte = 10 current = 30 hit_edge = 0
and then another time:
/x86_64/postcopy: Memory content inconsistency at 4c00000 first_byte = 9a last_byte = 99 current = 1 hit_edge = 1
Memory content inconsistency at 4cec000 first_byte = 9a last_byte = 1 current = 99 hit_edge = 1
So in both cases the failure starts on a 2M page boundary: a page is read on
the destination as zero instead of getting the migrated value, but somewhere
later in the page it starts behaving. (In the first example the counter had
reached 0x30, except for those pages which hadn't been transferred, where
the counter is much lower at 0x10.)
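As a quick sanity check on the addresses above, here is a minimal sketch of the arithmetic. It assumes 2 MiB huge pages and 4 KiB base pages; the helper names (is_hpage_aligned, hpage_base, pages_between, check_hardware_failures) are made up for illustration and are not part of the test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes: 2 MiB huge pages and 4 KiB base pages (x86_64). */
#define HPAGE_SIZE   UINT64_C(0x200000)
#define PAGE_SIZE_4K UINT64_C(0x1000)

/* True if addr sits exactly on a 2 MiB boundary. */
static bool is_hpage_aligned(uint64_t addr)
{
    return (addr & (HPAGE_SIZE - 1)) == 0;
}

/* Start of the 2 MiB region that contains addr. */
static uint64_t hpage_base(uint64_t addr)
{
    return addr & ~(HPAGE_SIZE - 1);
}

/* Number of 4 KiB pages in [start, end). */
static uint64_t pages_between(uint64_t start, uint64_t end)
{
    return (end - start) / PAGE_SIZE_4K;
}

/* Hypothetical self-check using the addresses from the two hardware runs. */
static void check_hardware_failures(void)
{
    /* First run: wrong from 0x3800000, right again at 0x38fe000. */
    assert(is_hpage_aligned(0x3800000));
    assert(hpage_base(0x38fe000) == 0x3800000);
    assert(pages_between(0x3800000, 0x38fe000) == 254);

    /* Second run: wrong from 0x4c00000, right again at 0x4cec000. */
    assert(is_hpage_aligned(0x4c00000));
    assert(hpage_base(0x4cec000) == 0x4c00000);
    assert(pages_between(0x4c00000, 0x4cec000) == 236);
}
```

Calling check_hardware_failures() from a main() passes: both bad runs begin on a 2 MiB boundary and recover inside the same 2 MiB region, after 254 and 236 base pages respectively.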
Testing it in my VM, I added some debug for where I'd been doing an madvise DONTNEED
previously:
ram_discard_range: pc.ram:0xf51000 for 42094592
ram_discard_range: pc.ram:0x5259000 for 18509824
Memory content inconsistency at f51000 first_byte = 6d last_byte = 6d current = 9e hit_edge = 0
Memory content inconsistency at 1000000 first_byte = 6d last_byte = 9e current = 6d hit_edge = 0
So that's saying that from f51000..1000000 it was wrong - so not just one page, but up to the THP edge.
(It then got back to the right value - 6d - on the page edge.) Note how the start corresponds
to the address I'd previously done a discard on, but it doesn't cover the whole discard range - just
up to the THP page boundary. Nothing in my userspace code knows about THP
(other than turning it off).
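To make the comparison between the bad span and the discard range concrete, here is a rough sketch of the arithmetic, assuming a 2 MiB THP size and 4 KiB base pages. The names thp_round_up and check_discard_vs_bad_span are hypothetical; the numbers come from the first ram_discard_range line and the two inconsistency lines above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes: 2 MiB THP and 4 KiB base pages (x86_64). */
#define THP_SIZE  UINT64_C(0x200000)
#define BASE_PAGE UINT64_C(0x1000)

/* Round addr up to the next 2 MiB boundary (unchanged if already aligned). */
static uint64_t thp_round_up(uint64_t addr)
{
    return (addr + THP_SIZE - 1) & ~(THP_SIZE - 1);
}

/* Hypothetical self-check using the numbers from the VM run. */
static void check_discard_vs_bad_span(void)
{
    const uint64_t discard_start = 0xf51000;  /* pc.ram offset of the discard */
    const uint64_t discard_len   = 42094592;  /* length printed in the debug  */
    const uint64_t discard_end   = discard_start + discard_len;
    const uint64_t bad_end       = thp_round_up(discard_start);

    /* The bad data stops at the first THP edge after the discard start... */
    assert(bad_end == 0x1000000);
    assert((bad_end - discard_start) / BASE_PAGE == 175);

    /* ...which is well short of the end of the discarded range. */
    assert(discard_end == 0x3776000);
    assert(bad_end < discard_end);
}
```

So the wrong contents cover 175 base pages (f51000..1000000), while the discard itself ran on to 0x3776000.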
Dave
@@ -251,6 +251,7 @@ static void check_guests_ram(void)
     uint8_t first_byte;
     uint8_t last_byte;
     bool hit_edge = false;
+    bool bad = false;

     qtest_memread(global_qtest, start_address, &first_byte, 1);
     last_byte = first_byte;
@@ -271,11 +272,12 @@ static void check_guests_ram(void)
                         " first_byte = %x last_byte = %x current = %x"
                         " hit_edge = %x\n",
                         address, first_byte, last_byte, b, hit_edge);
-                assert(0);
+                bad = true;
             }
         }
         last_byte = b;
     }
+    assert(!bad);
     fprintf(stderr, "first_byte = %x last_byte = %x hit_edge = %x OK\n",
             first_byte, last_byte, hit_edge);
 }
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK