From: Juergen Gross <jgross@suse.com>
To: Jan Beulich <JBeulich@suse.com>, Wei Liu <wei.liu2@citrix.com>
Cc: Xen-devel <xen-devel@lists.xenproject.org>,
Ian Jackson <Ian.Jackson@eu.citrix.com>,
Ian Campbell <ian.campbell@citrix.com>
Subject: Re: [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns
Date: Mon, 7 Sep 2015 12:08:08 +0200 [thread overview]
Message-ID: <55ED6208.2070902@suse.com> (raw)
In-Reply-To: <55ED7AD602000078000A04EF@prv-mh.provo.novell.com>
On 09/07/2015 11:53 AM, Jan Beulich wrote:
>>>> On 07.09.15 at 11:36, <wei.liu2@citrix.com> wrote:
>> On Mon, Sep 07, 2015 at 01:18:44AM -0600, Jan Beulich wrote:
>>>>>> On 06.09.15 at 22:05, <wei.liu2@citrix.com> wrote:
>>>> The original implementation of populate_pfns didn't consider the same
>>>> pfn can be present multiple times in the array. The mechanism to prevent
>>>> populating the same pfn multiple times only worked if the recurring pfn
>>>> appeared in different batches.
>>>>
>>>> This bug is discovered by Linux 4.1 32 bit kernel save / restore test,
>>>> which has several ptes pointing to same pfn, which results in an array
>>>> containing recurring pfn.
>>>
>>> Since you must have debugged this, and since the bisector appears
>>> to have fingered a patch of mine on the Linux side which triggered
>>> this, would you mind explaining this a little more? In particular I'm
>>> worried that this may point out some other bug in Linux, as in the
>>> context of the change there - dealing with the 1:1 mapping - I can't
>>> see a legitimate reason for multiple PTEs to reference the same PFN.
>>>
>>
>> Sure. I can try to explain this as clear as possible. Note that I didn't
>> even look at Linux side changes because at that point I was sure there
>> was a bug in migration v2.
>>
>> So there is a step called normalise_page in migration v2. It's nop for
>> HVM guest. For PV guest, it only cares about page table frames. To
>> normalise a page table frame, the core idea is to replace all MFNs in
>> page tables to PFNs inside the guest.
>>
>> When restoring, there is a step called localise_page, which again is a
>> nop for HVM guest. For PV guest, it does the reverse of normalise_page.
>> It goes through all page table frames, extract all PFNs pointed to by
>> PTEs in such frames, populate them, then reconstruct page tables.
>>
>> What I discovered is that PTEs inside one page table frame contained the
>> same PFN (something like fd42). The original implementation of toolstack
>> populate_pfns didn't consider such scenario. As for what that PFN
>> referred to, I wasn't sure and I didn't really care about that.
>
> That's unfortunate, as that's precisely the information I was after,
> since - as said - taking the repetition of the same PFN together with
> what the triggering Linux change is about, it smells like there's
> something wrong on the Linux side too. Do you at least recall how
> many times that same PFN got repeated?
The linear p2m list support introduced this behaviour. Instead of having
multiple copies of identical p2m pages (e.g. for all entries of the page
being ~0UL) only one such page is existing which is mapped multiple
times in the linear p2m list. This will happen for large regions (2 MB
aligned) of either identity mapped or invalid pfns.
In domUs we see such a scenario rather rarely as it would require either
large memory holes or large identity regions. You might have introduced
the latter.
Juergen
next prev parent reply other threads:[~2015-09-07 10:08 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-06 20:05 [PATCH for 4.6 v3 0/5] Migration v2 fix Wei Liu
2015-09-06 20:05 ` [PATCH for 4.6 v3 1/5] libxc: clearer migration v2 debug message Wei Liu
2015-09-06 20:05 ` [PATCH for 4.6 v3 2/5] libxc: migration v2 prefix Memory -> Frames Wei Liu
2015-09-06 20:10 ` Wei Liu
2015-09-07 9:07 ` Andrew Cooper
2015-09-06 20:05 ` [PATCH for 4.6 v3 3/5] libxc: fix indentation Wei Liu
2015-09-06 20:05 ` [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns Wei Liu
2015-09-07 7:18 ` Jan Beulich
2015-09-07 9:36 ` Wei Liu
2015-09-07 9:53 ` Jan Beulich
2015-09-07 9:57 ` Wei Liu
2015-09-07 10:08 ` Juergen Gross [this message]
2015-09-07 9:59 ` David Vrabel
2015-09-07 10:07 ` Jan Beulich
2015-09-06 20:05 ` [PATCH for 4.6 v3 5/5] libxc: add assertion to avoid setting same bit more than once Wei Liu
2015-09-07 10:04 ` [PATCH for 4.6 v3 0/5] Migration v2 fix Andrew Cooper
2015-09-07 10:07 ` Wei Liu
2015-09-07 10:10 ` Andrew Cooper
2015-09-07 11:22 ` Wei Liu
2015-09-07 11:12 ` Ian Campbell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55ED6208.2070902@suse.com \
--to=jgross@suse.com \
--cc=Ian.Jackson@eu.citrix.com \
--cc=JBeulich@suse.com \
--cc=ian.campbell@citrix.com \
--cc=wei.liu2@citrix.com \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).