* Balloons, crash-dumps, populate-on-demand, and shared zero pages

From: George Dunlap @ 2009-08-20 10:18 UTC
To: xen-devel, Paul Durrant, Keir Fraser, Steven Smith, Gianluca Guida <gianluca.g>

Paul recently pointed out that a side-effect of having the balloon driver replace guest p2m memory with empty space is that when Windows does a crash dump (perhaps Linux too) and reaches the pages in the balloon, it will cause a page fault, which can cause cascading crashes and prevent any useful information from reaching the dump file.

After thinking about it for a bit, I wondered if it might be better to replace the "populate-on-demand" concept with a "shared-zero-populate-on-demand". Reads to a PoD page would always map to a read-only shared zero page (or superpage, as the case may be). We can change the balloon driver behavior to fill the p2m entries for the balloon with zPoD entries instead of empty p2m entries. As a side-effect, the balloon driver would no longer need to explicitly fill in the p2m entries with RAM when deflating the balloon; the tools already tell Xen about memory target increases, so Xen can increase the PoD "cache". The balloon driver would simply need to free memory back to the kernel, and the balloon will be populated on demand by the guest.

Any thoughts?

 -George
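A toy model may make the proposed semantics easier to compare with today's behaviour. It assumes a simplified p2m with three entry kinds (RAM; EMPTY for today's ballooned-out entries; ZPOD for the proposed shared-zero PoD entries) and a PoD cache counted in 4 KiB pages, ignoring superpages. Every class and name here is hypothetical; this is not Xen's p2m code.

```python
from enum import Enum, auto

PAGE_SIZE = 4096
ZERO_PAGE = bytes(PAGE_SIZE)  # stands in for the one read-only shared zero page


class Entry(Enum):
    RAM = auto()    # backed by a private guest frame
    EMPTY = auto()  # today's ballooned-out entry: nothing behind it
    ZPOD = auto()   # proposed: reads see zeroes, first write populates


class PageFault(Exception):
    """Access to a p2m entry with no backing (what a crash dump hits today)."""


class GuestKilled(Exception):
    """A write needed a fresh page but the PoD cache was empty."""


class P2M:
    """Toy per-domain p2m; names and structure are hypothetical."""

    def __init__(self, entries, pod_cache_pages):
        self.entries = dict(entries)      # pfn -> Entry
        self.frames = {}                  # pfn -> bytearray, RAM entries only
        self.pod_cache = pod_cache_pages  # pages Xen has set aside for PoD

    def read(self, pfn):
        kind = self.entries[pfn]
        if kind is Entry.EMPTY:
            raise PageFault(f"fault reading ballooned-out pfn {pfn:#x}")
        if kind is Entry.ZPOD:
            return ZERO_PAGE  # no allocation: map the shared zero page
        return bytes(self.frames.setdefault(pfn, bytearray(PAGE_SIZE)))

    def write(self, pfn, offset, data):
        kind = self.entries[pfn]
        if kind is Entry.EMPTY:
            raise PageFault(f"fault writing ballooned-out pfn {pfn:#x}")
        if kind is Entry.ZPOD:
            if self.pod_cache == 0:
                raise GuestKilled(f"PoD cache exhausted at pfn {pfn:#x}")
            self.pod_cache -= 1           # populate on demand from the cache
            self.entries[pfn] = Entry.RAM
        frame = self.frames.setdefault(pfn, bytearray(PAGE_SIZE))
        frame[offset:offset + len(data)] = data


if __name__ == "__main__":
    p2m = P2M({0x10: Entry.RAM, 0x11: Entry.ZPOD, 0x12: Entry.EMPTY},
              pod_cache_pages=1)
    print(p2m.read(0x11) == ZERO_PAGE)  # True: a dump can read zPoD pages
    p2m.write(0x11, 0, b"dump")         # first write consumes the cache page
    print(p2m.pod_cache)                # 0
    try:
        p2m.read(0x12)
    except PageFault as e:
        print(e)                        # fault reading ballooned-out pfn 0x12
```

In this model a crash dump that only reads ballooned pages never allocates anything; only a write to a zPoD entry draws on the PoD cache.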
* Re: Balloons, crash-dumps, populate-on-demand, and shared zero pages

From: Steven Smith @ 2009-08-20 10:39 UTC
To: George Dunlap
Cc: Gianluca Guida, xen-devel@lists.xensource.com, Durrant, Keir Fraser, Steven Smith

> Paul recently pointed out that a side-effect of having the balloon
> driver replace guest p2m memory with empty space is that when Windows
> does a crash dump (perhaps Linux too) and reaches the pages in the
> balloon, it will cause a page fault, which can cause cascading
> crashes and prevent any useful information from reaching the dump
> file.

Well, not quite. During a crash dump, the only thing Windows does with the page is write it out. If you're using PV drivers, that means you create grant references for the ballooned-out PFNs and pass them off to the backend, which tries to map them, fails, and passes an error back to the frontend. If the frontend then passes those errors on to Windows, Windows will retry a couple of times, then give up and crash.

It wouldn't be particularly difficult to avoid this by masking the error in the frontend: claim to have written the data even though the backend gave us an error. That would mean garbage in the dump file for ballooned-out pages, but those pages probably aren't very interesting, and the rest of the dump file would be fine.

The problem might be relevant for hibernation files, though, because Windows compresses those before writing them out, and hence has to touch the pages through a virtual address. At the moment, the Citrix drivers deal with this by just blocking hibernation whenever the balloon driver is active.
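Steven's crash-dump workaround, masking the backend's error in the frontend, might look roughly like the following. This is a minimal sketch, assuming the frontend can tell whether a request's PFN is ballooned out and whether a crash dump is in progress; `DumpWrite`, `status_for_os`, `RSP_OKAY` and `RSP_ERROR` are hypothetical stand-ins, not names from the Citrix drivers.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the backend's per-request status codes.
RSP_OKAY = 0
RSP_ERROR = -1


@dataclass
class DumpWrite:
    pfn: int
    ballooned_out: bool  # assumed: the frontend can ask the balloon driver


def status_for_os(req: DumpWrite, backend_status: int, crash_dumping: bool) -> int:
    """Decide what the frontend reports to Windows for one completed write.

    While a crash dump is in progress, a backend failure on a ballooned-out
    PFN (the backend could not map the grant) is masked and reported as
    success. The dump file then holds garbage for that page instead of
    Windows retrying, giving up, and crashing. Every other failure is
    passed through unchanged.
    """
    if backend_status == RSP_OKAY:
        return RSP_OKAY
    if crash_dumping and req.ballooned_out:
        return RSP_OKAY
    return backend_status


if __name__ == "__main__":
    print(status_for_os(DumpWrite(0x200, True), RSP_ERROR, crash_dumping=True))   # 0
    print(status_for_os(DumpWrite(0x201, False), RSP_ERROR, crash_dumping=True))  # -1
```

Limiting the masking to crash-dump time and to ballooned-out PFNs keeps genuine I/O errors on ordinary writes visible.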
Making ballooned-out pages implicitly all-zeroes would let us turn hibernation back on, which would be kind of nice. I'm not sure how valuable that actually is in the real world, though: why would you hibernate a VM when you could just vm-suspend it?

> After thinking about it for a bit, I wondered if it might be better to
> replace the "populate-on-demand" concept with a
> "shared-zero-populate-on-demand". Reads to a PoD page would always
> map to a read-only shared zero page (or superpage, as the case may
> be). We can change the balloon driver behavior to fill the p2m
> entries for the balloon with zPoD entries instead of empty p2m entries.
> As a side-effect, the balloon driver would no longer need to
> explicitly fill in the p2m entries with RAM when deflating the
> balloon; the tools already tell Xen about memory target increases, so
> Xen can increase the PoD "cache". The balloon driver would simply need
> to free memory back to the kernel, and the balloon will be populated
> on demand by the guest.

That would make things marginally easier on the drivers, but it's at the expense of potentially more subtle errors when something goes wrong. At the moment, if the balloon driver tries to deflate the balloon too far, the populate hypercall fails and it's very obvious what's gone wrong, whereas with an implicit re-populate it'll look like everything's working fine for some time afterwards, until the guest touches too many pages and PoD kills it.

Steven.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
* Re: Balloons, crash-dumps, populate-on-demand, and shared zero pages

From: Paul Durrant @ 2009-08-20 10:56 UTC
To: Steven Smith
Cc: George Dunlap, Gianluca Guida, xen-devel@lists.xensource.com, Keir Fraser

Steven Smith wrote:
> That would make things marginally easier on the drivers, but it's at
> the expense of potentially more subtle errors when something goes
> wrong. At the moment, if the balloon driver tries to deflate the
> balloon too far, the populate hypercall fails and it's very obvious
> what's gone wrong, whereas with an implicit re-populate it'll look
> like everything's working fine for some time afterwards, until the
> guest touches too many pages and PoD kills it.

If the balloon driver deflated too far, that would be a bug in the balloon driver. And if Windows doesn't scrub the memory when it's freed, we could do that ourselves, so that at least PoD would kill the guest at the right juncture.

  Paul

--
===============================
Paul Durrant, Software Engineer

Citrix Systems (R&D) Ltd.
First Floor, Building 101
Cambridge Science Park
Milton Road
Cambridge CB4 0FY
United Kingdom
===============================
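Paul's scrubbing suggestion answers Steven's objection by moving the point of failure. The sketch below assumes a hypothetical `PodDomain` whose ballooned-out pages are zPoD entries backed by a fixed-size PoD cache, and it ignores any reclaim Xen might attempt before giving up. It deflates one page more than the cache can back and reports where the guest dies, with and without scrubbing.

```python
class GuestKilled(Exception):
    """PoD found no page to populate with: Xen would crash the domain."""


class PodDomain:
    """Hypothetical model of a guest whose ballooned-out pages are zPoD."""

    def __init__(self, cache_pages):
        self.cache = cache_pages  # pages Xen holds for populate-on-demand
        self.populated = set()

    def touch(self, pfn, where):
        """A write to an unpopulated page takes one page from the cache."""
        if pfn in self.populated:
            return
        if self.cache == 0:
            raise GuestKilled(f"PoD cache empty at pfn {pfn:#x} during {where}")
        self.cache -= 1
        self.populated.add(pfn)


def deflate(dom, pfns, scrub):
    """Hand ballooned pages back to the guest kernel.

    scrub=True: zero each page before freeing it, so the write populates it
    now and an over-deflate kills the guest inside this call.
    scrub=False: nothing is touched here, so any shortfall surfaces later.
    """
    for pfn in pfns:
        if scrub:
            dom.touch(pfn, "deflate")
    return list(pfns)


def run(scrub):
    dom = PodDomain(cache_pages=2)
    try:
        freed = deflate(dom, [0x100, 0x101, 0x102], scrub)  # one page too many
        for pfn in freed:
            dom.touch(pfn, "later guest use")
    except GuestKilled as e:
        return str(e)
    return "ok"


if __name__ == "__main__":
    print(run(scrub=False))  # PoD cache empty at pfn 0x102 during later guest use
    print(run(scrub=True))   # PoD cache empty at pfn 0x102 during deflate
```

Either way the guest dies, but with scrubbing the failure is attributed to the deflate that caused it rather than to some unrelated later access.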
* Re: Balloons, crash-dumps, populate-on-demand, and shared zero pages

From: George Dunlap @ 2009-08-20 12:49 UTC
To: Paul Durrant
Cc: Steven Smith, Gianluca Guida, xen-devel@lists.xensource.com, Keir Fraser

On Thu, Aug 20, 2009 at 11:56 AM, Paul Durrant <paul.durrant@citrix.com> wrote:
> If the balloon driver deflated too far, that would be a bug in the balloon
> driver, and if Windows doesn't scrub the memory when it's freed we could do
> that ourselves so at least PoD would kill the guest at the right juncture.

But is it easier to scrub memory manually before freeing, or just make a hypercall telling Xen to put zeroed pages there?

I think Steven's right... we may be introducing subtle latent bugs; overall it doesn't sound like the benefit is worth the extra complexity.

Steven, if we can make the PV drivers do something sensible wrt failed writes during a crash, that might be the best solution.

 -George