* shadow OOS and fast path are incompatible
@ 2009-07-02 20:30 Frank van der Linden
  2009-07-02 21:42 ` Gianluca Guida
  0 siblings, 1 reply; 5+ messages in thread
From: Frank van der Linden @ 2009-07-02 20:30 UTC
  To: xen-devel@lists.xensource.com

We recently observed a problem with Solaris HVM domains. The bug was
seen with a higher number of VCPUs (3 or more), and always had the
same pattern: some memory was allocated in the guest, but the first
reference caused it to crash with a fatal page fault. However, on
inspection of the page tables, the guest's view of the page tables was
consistent: the page was present.

Disabling the out-of-sync optimization made this problem go away.

Eventually, I tracked it down to the fault fast path and the OOS code in 
sh_page_fault(). Here's what happens:

* CPU 0 has a page fault for a PTE in an OOS page that hasn't been
synced yet
* CPU 1 has the same page fault (or at least one involving the same L1 page)
* CPU 1 enters the fast path
* CPU 0 finds the L1 page OOS and starts a resync
* CPU 1 finds it's a "special" entry (MMIO or GNP, i.e. guest not present)
* CPU 0 finishes the resync and clears the OOS flag for the L1 page
* CPU 1 finds it's not an OOS L1 page
* CPU 1 finds that the shadow L1 entry is GNP
* CPU 1 bounces the fault to the guest (sh_page_fault returns 0)
* guest sees an unexpected page fault
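
To make the ordering concrete, here is a small single-threaded replay of
the interleaving listed above. It is only a sketch of the scenario as
described, not Xen code: the struct, the enum and every function name are
made up for illustration, and the real sh_page_fault() is far more
involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; none of these names are Xen's. */
enum sl1e { SL1E_GNP, SL1E_PRESENT };   /* state of one shadow L1 entry */

struct l1_page {
    bool guest_present;   /* the guest's own PTE says "present" */
    bool oos;             /* the L1 page is currently out of sync */
    enum sl1e shadow;     /* what the shadow L1 entry holds */
};

/* CPU 0, under the shadow lock: bring the shadow up to date, clear OOS. */
static void resync(struct l1_page *pg)
{
    pg->shadow = pg->guest_present ? SL1E_PRESENT : SL1E_GNP;
    pg->oos = false;
}

/* CPU 1, lock-free fast path, first half: snapshot the shadow entry. */
static enum sl1e fast_path_read(const struct l1_page *pg)
{
    return pg->shadow;
}

/* CPU 1, second half. Returns 0 when the fault is bounced to the guest,
 * 1 when it is handed to the slow path. */
static int fast_path_decide(const struct l1_page *pg, enum sl1e seen)
{
    if (pg->oos)
        return 1;   /* shadow may be stale: take the slow path */
    if (seen == SL1E_GNP)
        return 0;   /* trusts the earlier snapshot: bounce to guest */
    return 1;
}

/* Replays the steps in the order given in the list. */
static int replay_race(void)
{
    struct l1_page pg = {
        .guest_present = true, .oos = true, .shadow = SL1E_GNP
    };

    enum sl1e seen = fast_path_read(&pg);  /* CPU 1 sees a GNP entry */
    resync(&pg);                           /* CPU 0 resyncs, clears OOS */
    assert(pg.shadow == SL1E_PRESENT);     /* shadow is correct now... */
    return fast_path_decide(&pg, seen);    /* ...but CPU 1 uses old value */
}
```

In this model replay_race() returns 0, i.e. the fault goes to the guest
even though the guest PTE is present and the shadow entry has already
been corrected.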

There are certainly ways to rearrange the code to avoid this particular
scenario, but it points to a bigger issue: the fast fault path and OOS
pages are inherently incompatible. Since the fast path works outside of
the shadow lock, there is nothing that prevents another CPU from coming
in and changing the OOS status, re-syncing the page, etc., right under
your nose.

Optimized operations without OOS (i.e. on a single L1 PTE) are safe in 
the fast path outside of the lock, since the guest will have the 
appropriate locking around the PTE writes. But with OOS, you're dealing 
with an entire L1 page.

I haven't checked the fast emulation path, but similar problems might be 
lurking there in combination with OOS.

I can think of some ways to fix this, but they involve locking, which 
mostly defeats the purpose of the fast fault path.
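
For illustration only, one hypothetical shape of such a locking fix is
sketched below, again on a toy model rather than on Xen code. The
atomic_flag stands in for the shadow lock, and all names are invented.
The point is just that once the read of the shadow entry and the OOS
check happen under the same lock as the resync, the fault path sees
either the state before the resync or the state after it, never a mix.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model; names are illustrative, not Xen's. */
enum sl1e { SL1E_GNP, SL1E_PRESENT };

struct l1_page {
    atomic_flag lock;     /* stands in for the shadow lock */
    bool guest_present;
    bool oos;
    enum sl1e shadow;
};

static void page_lock(struct l1_page *pg)
{
    while (atomic_flag_test_and_set_explicit(&pg->lock,
                                             memory_order_acquire))
        ;   /* spin until the current holder releases the lock */
}

static void page_unlock(struct l1_page *pg)
{
    atomic_flag_clear_explicit(&pg->lock, memory_order_release);
}

/* Resync and OOS-flag update happen as one unit under the lock. */
static void resync(struct l1_page *pg)
{
    page_lock(pg);
    pg->shadow = pg->guest_present ? SL1E_PRESENT : SL1E_GNP;
    pg->oos = false;
    page_unlock(pg);
}

/* Returns 0 to bounce the fault to the guest, 1 to take the slow path.
 * The entry and the OOS flag are read together, under the lock. */
static int locked_fault_path(struct l1_page *pg)
{
    int rc = 1;

    page_lock(pg);
    if (!pg->oos && pg->shadow == SL1E_GNP)
        rc = 0;
    page_unlock(pg);
    return rc;
}

/* The fault path runs entirely before the resync. */
static int fault_before_resync(void)
{
    struct l1_page pg = { .lock = ATOMIC_FLAG_INIT, .guest_present = true,
                          .oos = true, .shadow = SL1E_GNP };
    int rc = locked_fault_path(&pg);

    resync(&pg);
    return rc;
}

/* The fault path runs entirely after the resync. */
static int fault_after_resync(void)
{
    struct l1_page pg = { .lock = ATOMIC_FLAG_INIT, .guest_present = true,
                          .oos = true, .shadow = SL1E_GNP };

    resync(&pg);
    assert(!pg.oos);
    return locked_fault_path(&pg);
}
```

Both orderings end up on the slow path in this model, so the spurious
guest fault goes away, but every fault now takes the lock, which is
exactly the cost the fast path was meant to avoid.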

Ideas/suggestions?

- Frank

Thread overview: 5+ messages
2009-07-02 20:30 shadow OOS and fast path are incompatible Frank van der Linden
2009-07-02 21:42 ` Gianluca Guida
2009-07-03  8:50   ` Tim Deegan
2009-07-03  9:11     ` Gianluca Guida
2009-07-03  9:20     ` Gianluca Guida
