* Balloons, crash-dumps, populate-on-demand, and shared zero pages
@ 2009-08-20 10:18 George Dunlap
2009-08-20 10:39 ` Steven Smith
From: George Dunlap @ 2009-08-20 10:18 UTC (permalink / raw)
To: xen-devel, Paul Durrant, Keir Fraser, Steven Smith,
Gianluca Guida <gianluca.g>
Paul recently pointed out a side-effect of having the balloon
driver replace guest p2m memory with empty space: when Windows
(and perhaps Linux too) does a crash dump and reaches the pages in
the balloon, it takes a page fault, which can cause cascading
crashes and prevent any useful information from reaching the dump
file.
After thinking about it for a bit, I wondered if it might be better to
replace the "populate-on-demand" concept with a
"shared-zero-populate-on-demand". Reads to a PoD page would always
map to a read-only shared zero page (or superpage, as the case may
be). We can change the balloon driver behavior to fill the p2m
entries for the balloon with zPoD entries instead of empty p2m entries.
As a side-effect, the balloon driver would no longer need to
explicitly fill in the p2m entries with RAM when deflating the
balloon; the tools already tell Xen about memory target increases, so
it can increase the PoD "cache"; the balloon driver would simply need
to free memory back to the kernel, and the balloon would be populated
on-demand by the guest.
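
To make the proposal concrete, here is a rough toy model of the idea in
Python. It is only a sketch and not Xen code: the names ToyP2M, Entry,
PodExhausted, balloon_out and raise_target are all made up for
illustration, and the "cache" is just a counter of pages.

```python
from enum import Enum, auto


class Entry(Enum):
    EMPTY = auto()  # no p2m entry: any access faults (current balloon behaviour)
    ZPOD = auto()   # reads see zeroes; the first write takes a page from the cache
    RAM = auto()    # backed by a private page


class PodExhausted(Exception):
    """A write needed a page but the (hypothetical) PoD cache was empty."""


class ToyP2M:
    PAGE_SIZE = 4096

    def __init__(self, nr_pfns, cache_pages):
        self.entries = [Entry.ZPOD] * nr_pfns
        self.pages = {}           # pfn -> bytearray, only for RAM entries
        self.cache = cache_pages  # pages available for populating on demand

    def read(self, pfn, offset):
        kind = self.entries[pfn]
        if kind is Entry.EMPTY:
            raise LookupError(f"fault: pfn {pfn} has no p2m entry")
        if kind is Entry.ZPOD:
            return 0              # served from the shared zero page, no allocation
        return self.pages[pfn][offset]

    def write(self, pfn, offset, value):
        kind = self.entries[pfn]
        if kind is Entry.EMPTY:
            raise LookupError(f"fault: pfn {pfn} has no p2m entry")
        if kind is Entry.ZPOD:
            if self.cache == 0:
                raise PodExhausted(f"no page left to populate pfn {pfn}")
            self.cache -= 1
            self.pages[pfn] = bytearray(self.PAGE_SIZE)
            self.entries[pfn] = Entry.RAM
        self.pages[pfn][offset] = value

    def balloon_out(self, pfn, zpod=True):
        """Inflate: the page goes back to Xen, not into this guest's cache."""
        self.pages.pop(pfn, None)
        self.entries[pfn] = Entry.ZPOD if zpod else Entry.EMPTY

    def raise_target(self, nr_pages):
        """The tools raise the memory target, so the cache grows."""
        self.cache += nr_pages
```

With zpod=True a ballooned-out PFN still reads as zeroes, so a reader
such as a dump or hibernation path would not fault on it; with
zpod=False the same read faults, which is the situation described above.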
Any thoughts?
-George
* Re: Balloons, crash-dumps, populate-on-demand, and shared zero pages
From: Steven Smith @ 2009-08-20 10:39 UTC (permalink / raw)
To: George Dunlap
Cc: Gianluca Guida, xen-devel@lists.xensource.com, Durrant,
Keir Fraser, Steven Smith
> Paul recently pointed out a side-effect of having the balloon
> driver replace guest p2m memory with empty space: when Windows
> (and perhaps Linux too) does a crash dump and reaches the pages in
> the balloon, it takes a page fault, which can cause cascading
> crashes and prevent any useful information from reaching the dump
> file.
Well, not quite. During a crash dump, the only thing Windows does
with the page is write it out. If you're using PV drivers, that means
you create grant references for the ballooned-out PFNs and pass them
off to the backend, which tries to map them, fails, and passes an
error back to the frontend. If the frontend then passes those errors
back to Windows then it'll retry a couple of times, then give up and
crash. It wouldn't be particularly difficult to avoid this by just
masking the error from the frontend, claiming to have written the data
even though the backend gave us an error. That'd mean you'd have
garbage in the dump file for ballooned-out pages, but those pages
probably aren't very interesting, and the rest of the dump file would
be fine.
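
As a rough illustration of that masking, here is a toy Python sketch of
the decision the frontend would make when a write completes. It is not
the real PV driver code: WriteResult, status_for_os, in_crash_dump and
ballooned_pfns are invented names. It also assumes the frontend masks
only errors for PFNs it knows are ballooned out, which is narrower than
masking every error during a dump.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteResult:
    pfn: int          # guest frame whose contents were to be written
    backend_ok: bool  # status the backend returned for this request


def status_for_os(result, in_crash_dump, ballooned_pfns):
    """Return the status the frontend reports to the OS for one write."""
    if result.backend_ok:
        return True
    # Assumption: only hide failures that are explained by the balloon, and
    # only while dumping. The dump then holds garbage for those pages.
    if in_crash_dump and result.pfn in ballooned_pfns:
        return True
    # Any other failure is a genuine I/O error and is passed through.
    return False
```

Errors for pages that are not in the balloon still reach Windows, so a
real disk failure during the dump is not hidden.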
This might be relevant for hibernation files, though, because Windows
compresses those before writing them out, and hence has to touch them
through a virtual address. At the moment, the Citrix drivers deal
with this by just blocking hibernation whenever the balloon driver's
active. Making ballooned-out pages implicitly all-zeroes would let us
turn that back on, which'd be kind of nice. I'm not sure how valuable
that actually is in the real world, though: why would you hibernate a
VM when you could just vm-suspend it?
> After thinking about it for a bit, I wondered if it might be better to
> replace the "populate-on-demand" concept with a
> "shared-zero-populate-on-demand". Reads to a PoD page would always
> map to a read-only shared zero page (or superpage, as the case may
> be). We can change the balloon driver behavior to fill the p2m
> entries for the balloon with zPoD entries instead of empty p2m entries.
> As a side-effect, the balloon driver would no longer need to
> explicitly fill in the p2m entries with RAM when deflating the
> balloon; the tools already tell Xen about memory target increases, so
> it can increase the PoD "cache"; the balloon driver would simply need
> to free memory back to the kernel, and the balloon would be populated
> on-demand by the guest.
That would make things marginally easier on the drivers, but it's at
the expense of potentially more subtle errors when something goes
wrong. At the moment, if the balloon driver tries to deflate the
balloon too far, the populate hypercall fails and it's very obvious
what's gone wrong, whereas with an implicit re-populate it'll look
like everything's working fine for some time afterwards, until the
guest touches too many pages and PoD kills it.
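
A toy Python model of the difference in when the failure shows up. This
is a sketch under made-up names (ToyDomain, HypercallFailed, GuestKilled
and so on), with page accounting reduced to counters; it does not claim
to reflect Xen's actual interfaces.

```python
class HypercallFailed(Exception):
    """The explicit populate request was refused up front."""


class GuestKilled(Exception):
    """PoD ran out of pages when the guest touched an unbacked entry."""


class ToyDomain:
    def __init__(self, free_pages):
        self.free_pages = free_pages  # pages Xen can still give this guest
        self.populated = 0            # pages backed by real memory
        self.pod_entries = 0          # promised but not yet backed

    def populate_explicit(self, nr_pages):
        """Current scheme: back the pages now, or fail the call."""
        if nr_pages > self.free_pages:
            raise HypercallFailed(f"asked for {nr_pages}, {self.free_pages} left")
        self.free_pages -= nr_pages
        self.populated += nr_pages

    def deflate_implicit(self, nr_pages):
        """zPoD scheme: the driver only frees PFNs; nothing is checked here."""
        self.pod_entries += nr_pages

    def touch(self):
        """The guest writes to one not-yet-backed page for the first time."""
        if self.pod_entries == 0:
            raise ValueError("no unbacked entries to touch")
        if self.free_pages == 0:
            raise GuestKilled("PoD cache exhausted")
        self.pod_entries -= 1
        self.free_pages -= 1
        self.populated += 1
```

With two free pages, populate_explicit(3) fails at once, while
deflate_implicit(3) succeeds and the guest only dies on the third
touch(), which could be long after the driver made its mistake.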
Steven.
* Re: Balloons, crash-dumps, populate-on-demand, and shared zero pages
From: Paul Durrant @ 2009-08-20 10:56 UTC (permalink / raw)
To: Steven Smith
Cc: George Dunlap, Gianluca Guida, xen-devel@lists.xensource.com,
Keir Fraser
Steven Smith wrote:
> That would make things marginally easier on the drivers, but it's at
> the expense of potentially more subtle errors when something goes
> wrong. At the moment, if the balloon driver tries to deflate the
> balloon too far, the populate hypercall fails and it's very obvious
> what's gone wrong, whereas with an implicit re-populate it'll look
> like everything's working fine for some time afterwards, until the
> guest touches too many pages and PoD kills it.
>
If the balloon driver deflated too far, that would be a bug in the
balloon driver, and if Windows doesn't scrub the memory when it's freed
we could do that ourselves so at least PoD would kill the guest at the
right juncture.
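
A small Python sketch of why scrubbing moves the failure to the right
place, assuming that zeroing a page counts as a write and so forces PoD
to back it immediately. ToyPod, GuestKilled and deflate are hypothetical
names for illustration only.

```python
class GuestKilled(Exception):
    """PoD had no page left when an unbacked PFN was written."""


class ToyPod:
    def __init__(self, cache_pages):
        self.cache = cache_pages  # pages PoD can still hand out
        self.backed = set()       # PFNs that already have real memory

    def write(self, pfn):
        if pfn in self.backed:
            return
        if self.cache == 0:
            raise GuestKilled(f"no page left for pfn {pfn}")
        self.cache -= 1
        self.backed.add(pfn)


def deflate(pod, pfns, scrub):
    """Free PFNs back to the kernel, optionally scrubbing each one first."""
    freed = []
    for pfn in pfns:
        if scrub:
            pod.write(pfn)  # zeroing touches the page, so it is populated now
        freed.append(pfn)
    return freed
```

Without scrubbing, deflating three PFNs against a two-page cache
succeeds silently; with scrubbing, the third PFN kills the guest during
the deflate itself.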
Paul
--
===============================
Paul Durrant, Software Engineer
Citrix Systems (R&D) Ltd.
First Floor, Building 101
Cambridge Science Park
Milton Road
Cambridge CB4 0FY
United Kingdom
===============================
* Re: Re: Balloons, crash-dumps, populate-on-demand, and shared zero pages
From: George Dunlap @ 2009-08-20 12:49 UTC (permalink / raw)
To: Paul Durrant
Cc: Steven Smith, Gianluca Guida, xen-devel@lists.xensource.com,
Keir Fraser
On Thu, Aug 20, 2009 at 11:56 AM, Paul Durrant <paul.durrant@citrix.com> wrote:
> If the balloon driver deflated too far, that would be a bug in the balloon
> driver, and if Windows doesn't scrub the memory when it's freed we could do
> that ourselves so at least PoD would kill the guest at the right juncture.
But is it easier to scrub memory manually before freeing, or just make
a hypercall telling Xen to put zeroed pages there?
I think Steven's right... we may be introducing subtle latent bugs;
overall it doesn't sound like the benefit is worth the extra
complexity.
Steven, if we can make the PV drivers do something sensible wrt failed
writes during a crash, that might be the best solution.
-George