From: Mukesh Rathor <mukesh.rathor@oracle.com>
To: Keir Fraser <keir.fraser@eu.citrix.com>
Cc: Joshua West <jwest@brandeis.edu>,
Dan Magenheimer <dan.magenheimer@oracle.com>,
James Harper <james.harper@bendigoit.com.au>,
"Kurt C. Hackel" <kurt.hackel@oracle.com>,
"annie.li@oracle.com" <annie.li@oracle.com>,
xen-devel <xen-devel@lists.xensource.com>,
"wayne.gong@oracle.com" <wayne.gong@oracle.com>
Subject: Re: Error restoring DomU when using GPLPV
Date: Mon, 14 Sep 2009 19:25:04 -0700
Message-ID: <4AAEFB00.8000909@oracle.com>
In-Reply-To: <C6C7C941.13DA6%keir.fraser@eu.citrix.com>
Ok, I've been looking at this and figured out what's going on. Annie's problem
lies in not remapping the grant frames post migration. Hence the leak:
tot_pages goes up on every migration until migration fails. On Linux, what I
found is that remapping is where the frames created by restore (for the heap
pfns) get freed back to the dom heap. So that's a fix to be made on the win
PV driver side.
Now back to the orig problem. As you already know, because libxc is not
skipping heap pages, tot_pages in struct domain{} temporarily goes up
by (shared-info-frame + gnt-frames) until the guest remaps these pages.
Hence, migration fails if
(max_pages - tot_pages) < (shared-info-frame + gnt-frames).
Occasionally, I see tot_pages nearly the same as max_pages, and I don't
know all the ways that may happen or what causes it
(by default, I see tot_pages short by 21).
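To make the failure condition above concrete, here is a minimal sketch in C. It is only an illustration: struct dom_counters and restore_has_headroom() are hypothetical names, not the real struct domain or any libxc/hypervisor function, and the frame counts are passed in by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the two counters discussed above;
 * this is NOT the real struct domain. */
struct dom_counters {
    uint64_t tot_pages;  /* pages currently accounted to the domain */
    uint64_t max_pages;  /* allocation ceiling for the domain */
};

/* Returns true if restore can temporarily hold the extra Xen-heap
 * frames, i.e. if NOT
 *   (max_pages - tot_pages) < (shinfo_frames + gnt_frames). */
static bool restore_has_headroom(const struct dom_counters *d,
                                 uint64_t shinfo_frames,
                                 uint64_t gnt_frames)
{
    assert(d != NULL);

    /* Guard against underflow if tot_pages has already hit the ceiling. */
    uint64_t headroom = d->max_pages > d->tot_pages
                            ? d->max_pages - d->tot_pages
                            : 0;

    return headroom >= shinfo_frames + gnt_frames;
}
```

With a headroom of 21 pages, one shared-info frame plus 20 grant frames would still fit, while one plus 21 would not. Each leaked frame from an earlier migration eats into the same headroom.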
Anyways, of the two solutions:
1. Always balloon down shinfo+gnttab frames: This needs to be done just
once during load, right? I'm not sure how it would work tho if mem gets
ballooned up subsequently. I suppose the driver would have to intercept
every increase in reservation and balloon down every time?
Also, ballooning down during the suspend call would prob be too late, right?
2. libxc fix: I wonder how much work this would be. The good thing here is
that it'd take care of both Linux and PV HVM guests, avoiding driver
updates in many versions, and hence is appealing to us. Can we somehow
mark the frames as special so they get skipped? Looking at the big
xc_domain_save function, I'm not sure how pfn_type gets set in the HVM case.
Maybe before the outer loop it could ask the hyp for the list of all xen heap
pages, but then what if a new page gets added to the list in between...
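A rough sketch of the "skip list" idea in option 2, in C. Everything here is an assumption for illustration: is_xen_heap_pfn() and filter_save_batch() are hypothetical helpers, not part of xc_domain_save, and the list of Xen-heap pfns is simply handed in, since how to obtain it from the hypervisor is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if pfn appears in the caller-supplied
 * list of Xen-heap frames (shared info, grant table frames). */
static bool is_xen_heap_pfn(uint64_t pfn,
                            const uint64_t *heap_pfns, size_t n_heap)
{
    for (size_t i = 0; i < n_heap; i++)
        if (heap_pfns[i] == pfn)
            return true;
    return false;
}

/* Hypothetical helper: copies into out[] every pfn of batch[] that is
 * not on the skip list, and returns how many were kept.
 * out[] must have room for n_batch entries. */
static size_t filter_save_batch(const uint64_t *batch, size_t n_batch,
                                const uint64_t *heap_pfns, size_t n_heap,
                                uint64_t *out)
{
    assert(batch != NULL && out != NULL);

    size_t kept = 0;
    for (size_t i = 0; i < n_batch; i++)
        if (!is_xen_heap_pfn(batch[i], heap_pfns, n_heap))
            out[kept++] = batch[i];
    return kept;
}
```

This does nothing about the race mentioned above: a frame added to the Xen heap list after the list was fetched would still be saved, so the list would need to be refreshed or re-checked in the final round.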
Also, unfortunately, the failure case is sometimes not handled properly.
If migration fails after suspend, there is no way to get the guest
back. I even noticed the guest disappear totally from both source and
target on failure, a couple of times out of the several dozen migrations I did.
thanks,
Mukesh
Keir Fraser wrote:
> Not all those pages are special. Frames fc0xx will be ACPI tables, resident
> in ordinary guest memory pages, for example. Only the Xen-heap pages are
> special and need to be (1) skipped; or (2) unmapped by the HVMPV drivers on
> suspend; or (3) accounted for by HVMPV drivers by unmapping and freeing an
> equal number of domain-heap pages. (1) is 'nicest' but actually a bit of a
> pain to implement; (2) won't work well for live migration, where the pages
> wouldn't get unmapped by the drivers until the last round of page copying;
> and (3) was apparently tried by Annie but didn't work? I'm curious why (3)
> didn't work - I can't explain that.
>
> -- Keir
>
> On 05/09/2009 00:02, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>
>> On further debugging, it appears that the
>> p2m_size may be OK, but there's something about
>> those 24 "magic" gpfns that isn't quite right.
>>
>>> -----Original Message-----
>>> From: Dan Magenheimer
>>> Sent: Friday, September 04, 2009 3:29 PM
>>> To: Wayne Gong; Annie Li; Keir Fraser
>>> Cc: Joshua West; James Harper; xen-devel@lists.xensource.com
>>> Subject: RE: [Xen-devel] Error restoring DomU when using GPLPV
>>>
>>>
>>> I think I've tracked down the cause of this problem
>>> in the hypervisor, but am unsure how to best fix it.
>>>
>>> In tools/libxc/xc_domain_save.c, the static variable p2m_size
>>> is said to be "number of pfns this guest has (i.e. number of
>>> entries in the P2M)". But apparently p2m_size is getting
>>> set to a very large number (0x100000) regardless of the
>>> maximum pseudophysical memory for the hvm guest. As a result,
>>> some "magic" pages in the 0xf0000-0xfefff range are getting
>>> placed in the save file. But since they are not "real"
>>> pages, the restore process runs beyond the maximum number
>>> of physical pages allowed for the domain and fails.
>>> (The gpfn of the last 24 pages saved are f2020, fc000-fc012,
>>> feffb, feffc, feffd, feffe.)
>>>
>>> p2m_size is set in "save" with a call to a memory_op hypercall
>>> (XENMEM_maximum_gpfn) which for an hvm domain returns
>>> d->arch.p2m->max_mapped_pfn. I suspect that the meaning
>>> of max_mapped_pfn changed at some point to more match
>>> its name, but this changed the semantics of the hypercall
>>> as used by xc_domain_restore, resulting in this curious
>>> problem.
>>>
>>> Any thoughts on how to fix this?
>>>
>>>> -----Original Message-----
>>>> From: Annie Li
>>>> Sent: Tuesday, September 01, 2009 10:27 PM
>>>> To: Keir Fraser
>>>> Cc: Joshua West; James Harper; xen-devel@lists.xensource.com
>>>> Subject: Re: [Xen-devel] Error restoring DomU when using GPLPV
>>>>
>>>>
>>>>
>>>>> It seems this problem is connected with gnttab, not shareinfo.
>>>>> I changed some code about grant table in winpv driver (not using
>>>>> balloon down shinfo+gnttab method), save/restore/migration can work
>>>>> properly on Xen3.4 now.
>>>>>
>>>>> What i changed is winpv driver use hypercall XENMEM_add_to_physmap to
>>>>> map corresponding grant tables which devices require, instead of
>>>>> mapping all 32 pages grant table during initialization. It seems
>>>>> those extra grant table mapping cause this problem.
>>>> Wondering whether those extra grant table mapping is the root cause of
>>>> the migration problem? or by luck as linux PVHVM too?
>>>>
>>>> Thanks
>>>> Annie.
>>>>
>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@lists.xensource.com
>>>> http://lists.xensource.com/xen-devel
>>>>
>