From: Mukesh Rathor <mukesh.rathor@oracle.com>
To: Keir Fraser <keir.fraser@eu.citrix.com>
Cc: Joshua West <jwest@brandeis.edu>,
Dan Magenheimer <dan.magenheimer@oracle.com>,
James Harper <james.harper@bendigoit.com.au>,
"Kurt C. Hackel" <kurt.hackel@oracle.com>,
"annie.li@oracle.com" <annie.li@oracle.com>,
xen-devel <xen-devel@lists.xensource.com>,
"wayne.gong@oracle.com" <wayne.gong@oracle.com>
Subject: Re: Error restoring DomU when using GPLPV
Date: Tue, 15 Sep 2009 12:14:20 -0700
Message-ID: <4AAFE78C.4000608@oracle.com>
In-Reply-To: <C6D50338.14B3E%keir.fraser@eu.citrix.com>

Keir Fraser wrote:
> On 15/09/2009 03:25, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
>
>> Ok, I've been looking at this and figured what's going on. Annie's problem
>> lies in not remapping the grant frames post migration. Hence the leak,
>> tot_pages goes up every time until migration fails. On linux, remapping
>> is where the frames created by restore (for heap pfn's), get freed back to
>> the dom heap, is what I found. So that's a fix to be made on win
>> pv driver side.
>
> Although obviously that is a bug, I'm not sure why it would cause this
> particular issue? The domheap pages do not get freed and replaced with
> xenheap pages, but why does that affect the next save/restore cycle? After
> all, xc_domain_save does not distinguish between Xenheap and domheap pages?
The fact that xc_domain_save doesn't distinguish is actually the problem, as
xc_domain_restore then backs xenheap pfn's for shinfo/gnt frames with dom
heap pages. These dom heap pages do get freed and replaced by xenheap pages
on the target host (upon guest remap in gnttab_map()) in the following code:
    arch_memory_op():
        /* Remove previously mapped page if it was present. */
        prev_mfn = gmfn_to_mfn(d, xatp.gpfn);
        if ( mfn_valid(prev_mfn) )
        {
            .....
            guest_remove_page(d, xatp.gpfn);    <=======
        }
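As a rough illustration of the effect of that step, here is a toy model in C. It is not Xen code: the types and the function toy_add_to_physmap() are hypothetical names made up for this sketch, and it only tracks what kind of page backs each guest pfn and how many dom heap pages are charged to the guest.

```c
#include <stddef.h>

/* Toy model only: these types and names are hypothetical, not Xen's. */
enum backing { BACK_NONE, BACK_DOMHEAP, BACK_XENHEAP };

struct toy_domain {
    unsigned long tot_pages;   /* dom heap pages currently charged to the guest */
    enum backing p2m[8];       /* what backs each guest pfn in this tiny model */
};

/*
 * Mimics the quoted step: if a dom heap page backed gpfn (as it does right
 * after restore), give it back, then map a Xen heap page in its place.
 */
static void toy_add_to_physmap(struct toy_domain *d, size_t gpfn)
{
    if (d->p2m[gpfn] == BACK_DOMHEAP)
        d->tot_pages--;        /* stands in for guest_remove_page() */
    d->p2m[gpfn] = BACK_XENHEAP;
}
```

In this model, a frame the guest never remaps stays backed by a dom heap page and keeps counting against tot_pages.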
E.g., my guest with 128M gets created with tot_pages=0x83eb and
max_pages=0x8400. xc_domain_save saves everything, 0x83eb + shinfo + gnt
frames (2), so I see tot_pages on the target go up to 0x83ee. The guest then
remaps the shinfo and gnt frames, the dom heap pages are returned in
guest_remove_page(), and tot_pages goes back to 0x83eb. In Annie's case, the
driver forgets to remap the 2 gnt frames, so the dom heap pages stay wrongly
mapped and tot_pages remains at 0x83ed. After a few more migrations it
reaches 0x83ff, and migration fails because 0x83ff + shinfo + gnt frames
cannot be created even temporarily, max_pages being 0x8400.
Hope that makes sense.
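The arithmetic above can be checked with a small C sketch. The function names are illustrative only, the frame counts (1 shinfo, 2 gnt) are the ones from this guest, and the assumption that a restore succeeds as long as the temporary total does not exceed max_pages is inferred from the numbers above rather than taken from the Xen source.

```c
/* Illustrative constants: the frame counts quoted for this guest. */
#define SHINFO_FRAMES 1u
#define GNT_FRAMES    2u

/*
 * tot_pages after one save/restore cycle, given how many of the special
 * frames the guest remaps afterwards (each remap frees one dom heap page).
 */
static unsigned long tot_after_migration(unsigned long tot_pages,
                                         unsigned int remapped)
{
    unsigned long restored = tot_pages + SHINFO_FRAMES + GNT_FRAMES;
    return restored - remapped;
}

/*
 * Number of migrations that succeed before the temporary total would
 * exceed max_pages. Returns -1 if the guest remaps everything (no leak).
 */
static int migrations_until_failure(unsigned long tot_pages,
                                    unsigned long max_pages,
                                    unsigned int remapped)
{
    int count = 0;

    if (remapped >= SHINFO_FRAMES + GNT_FRAMES)
        return -1;

    while (tot_pages + SHINFO_FRAMES + GNT_FRAMES <= max_pages) {
        tot_pages = tot_after_migration(tot_pages, remapped);
        count++;
    }
    return count;
}
```

With tot_pages=0x83eb, max_pages=0x8400 and only shinfo remapped (remapped=1), each cycle leaks 2 pages, so under these assumptions the tenth migration leaves tot_pages at 0x83ff and the next one cannot fit.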
>> 1. Always balloon down, shinfo+gnttab frames: This needs to be done just
>> once during load, right? I'm not sure how it would work tho if mem gets
>> ballooned up subsequently. I suppose the driver will have to intercept
>> every increase in reservation and balloon down every time?
>
> Well, it is the same driver that is doing the ballooning, so it's kind of
> easy to intercept, right? Just need to track how many Xenheap pages are
> mapped and maintain that amount of 'balloon down'.
Yup, that's what I thought, but just wanted to make sure.
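A minimal sketch of that bookkeeping in C, assuming the driver keeps a count of mapped Xenheap frames and subtracts it from whatever target it is asked to reach. The struct and function names are hypothetical and do not come from the GPLPV drivers or from Xen.

```c
/* Hypothetical driver-side state; names are made up for this sketch. */
struct balloon_state {
    unsigned long xenheap_mapped;  /* shinfo + grant frames currently mapped */
    unsigned long requested;       /* pages the guest was last asked to hold */
};

/* Pages the driver should actually populate: the request minus the reserve. */
static unsigned long balloon_effective_target(const struct balloon_state *b)
{
    return b->requested > b->xenheap_mapped
               ? b->requested - b->xenheap_mapped
               : 0;
}

/* Called on every reservation change, so a balloon-up cannot erase the reserve. */
static unsigned long balloon_set_target(struct balloon_state *b,
                                        unsigned long requested)
{
    b->requested = requested;
    return balloon_effective_target(b);
}

/* Called when the driver maps one more Xenheap frame, e.g. a new grant frame. */
static unsigned long balloon_note_xenheap_map(struct balloon_state *b)
{
    b->xenheap_mapped++;
    return balloon_effective_target(b);
}
```

The point of routing every change through balloon_set_target() is that the reserve is reapplied on each increase, which is the "intercept every increase in reservation" case raised above.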
>> Also, balloon down during suspend call would prob be too late, right?
>
> Indeed it would. Need to do it during boot. It's only a few pages though, so
> no one will miss them.
>
>> 2. libxc fix: I wonder how much work this will be. Good thing here is,
>> it'll take care of both linux and PV HVM guests avoiding driver
>> updates in many versions, and hence appealing to us. Can we somehow
>> mark the frames special to be skipped? Looking at biiig xc_domain_save
>> function, not sure in case of HVM, how pfn_type gets set. Maybe before
>> the outer loop, it could ask hyp for all xen heap page list, but then
>> what if a new page gets added to the list in between.....
>
> It's a pain. pfn_type[] I think doesn't really get used. xc_domain_save()
> just tries to map PFNs and saves all the ones it successfully maps. So the
> problem is it is allowed to map Xenheap pages. But we can't always disallow
> that because sometimes the tools have good reason to map Xenheap pages. So
> we'd need a new hypercall, or a flag, or something, and that would need dom0
> kernel changes as well as Xen and toolstack changes. So it's rather a pain.
Ok got it, I think driver change is the way to go.
>> Also, unfortunately, the failure case is not handled properly sometimes.
>> If migration fails after suspend, then no way to get the guest
>> back. I even noticed, the guest disappeared totally from both source and
>> target when failed, couple times of several dozen migrations I did.
>
> That shouldn't happen since there is a mechanism to cancel the suspension of
> a suspended guest. Possibly xend doesn't get it right every time, as its
> error handling is pretty poor in general. I trust the underlying mechanisms
> below xend pretty well however.
> -- Keir
thanks a lot,
Mukesh