xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: George Dunlap <George.Dunlap@eu.citrix.com>
To: Jan Beulich <JBeulich@novell.com>,
	Christoph Egger <Christoph.Egger@amd.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: Re: regression from c/s 22071:c5aed2e049bc (ept: Put locks around ept_get_entry) ?
Date: Tue, 14 Dec 2010 10:47:26 +0000	[thread overview]
Message-ID: <AANLkTikB5ym067GQOj4pCJCx3egjki20Guj-_Z67Tcis@mail.gmail.com> (raw)
In-Reply-To: <4D073B3A0200007800027BDF@vpn.id2.novell.com>

[-- Attachment #1: Type: text/plain, Size: 1638 bytes --]

Yes, I have to apologize: I have a queue of PoD, p2m, and ept-related
fixes that I haven't pushed to the list because:
* they require non-negligible reworking
* it's been really difficult for me to set up an OSS-based system to test them

It actually turns out that doing locking in ept_get_entry() is the
wrong thing to do anyway; it can cause the following deadlock:

p2m_change_type [grabs p2m lock] -> set_p2m_entry -> ept_set_entry ->
ept_set_middle_level -> p2m_alloc [grabs hap lock]

write cr4 -> hap_update_paging_modes [grabes hap lock] ->
hap_update_cr3 -> gfn_to_mfn -> ept_get_entry -> [grabs p2m lock]

Attached is a ported patch that removes locking in ept_get_entry(),
and implements access-once semantics for reading and writing.  This
solves the original problem (a race between reading and writing the
table) without causing deadlocks.  I haven't had a chance to test it
-- can you give it a spin?

Thanks,
 -George

On Tue, Dec 14, 2010 at 8:39 AM, Jan Beulich <JBeulich@novell.com> wrote:
> George,
>
> with ept_get_entry() calling ept_pod_check_and_populate(), which in
> turn unconditionally calls p2m_lock(), we got a report of the BUG() in
> p2m_lock() triggering (in a backport of the patch on top of 4.0.1 - the
> logic seems unchanged in -unstable, and hence the issue ought to
> similarly exist there). Wouldn't therefore ept_pod_check_and_populate(),
> only ever called from ept_get_entry(), not need to do any locking on
> its own at all?
>
> Thanks, Jan
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>

[-- Attachment #2: ept-access-once.diff --]
[-- Type: text/x-diff, Size: 5154 bytes --]

ept: Remove lock in ept_get_entry, replace with access-once semantics.

This mirrors the RVI/shadow situation, where p2m read access is lockless
because it's done in the hardware (linear map of the p2m table).

This fixes the original bug (call it bug A) without introducing bug B (a deadlock).

Bug A was caused by a race when updating p2m entries: between testing if it's valid,
and testing if it's populate-on-demand, it may have been changed from populate-on-demand to valid.

My original patch simply introduced a lock into ept_get_entry, but that
caused bug B, caused by circular locking order:
p2m_change_type [grabs p2m lock] -> set_p2m_entry -> ept_set_entry -> ept_set_middle_level -> p2m_alloc [grabs hap lock]
write cr4 -> hap_update_paging_modes [grabes hap lock] -> hap_update_cr3 -> gfn_to_mfn -> ept_get_entry -> [grabs p2m lock]

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>

diff -r 49d2aa5cee4e xen/arch/x86/mm/hap/p2m-ept.c
--- a/xen/arch/x86/mm/hap/p2m-ept.c	Thu Dec 09 16:17:33 2010 +0000
+++ b/xen/arch/x86/mm/hap/p2m-ept.c	Tue Dec 14 10:36:43 2010 +0000
@@ -211,7 +211,7 @@
                           int next_level)
 {
     unsigned long mfn;
-    ept_entry_t *ept_entry;
+    ept_entry_t *ept_entry, e;
     u32 shift, index;
 
     shift = next_level * EPT_TABLE_ORDER;
@@ -223,9 +223,14 @@
 
     ept_entry = (*table) + index;
 
-    if ( !is_epte_present(ept_entry) )
+    /* ept_next_level() is called (sometimes) without a lock.  Read
+     * the entry once, and act on the "cached" entry after that to
+     * avoid races. */
+    e=*ept_entry;
+
+    if ( !is_epte_present(&e) )
     {
-        if ( ept_entry->avail1 == p2m_populate_on_demand )
+        if ( e.avail1 == p2m_populate_on_demand )
             return GUEST_TABLE_POD_PAGE;
 
         if ( read_only )
@@ -233,13 +238,15 @@
 
         if ( !ept_set_middle_entry(p2m, ept_entry) )
             return GUEST_TABLE_MAP_FAILED;
+        else
+            e=*ept_entry; /* Refresh */
     }
 
     /* The only time sp would be set here is if we had hit a superpage */
-    if ( is_epte_superpage(ept_entry) )
+    if ( is_epte_superpage(&e) )
         return GUEST_TABLE_SUPER_PAGE;
 
-    mfn = ept_entry->mfn;
+    mfn = e.mfn;
     unmap_domain_page(*table);
     *table = map_domain_page(mfn);
     *gfn_remainder &= (1UL << shift) - 1;
@@ -318,19 +325,24 @@
         if ( mfn_valid(mfn_x(mfn)) || direct_mmio || p2m_is_paged(p2mt) ||
              (p2mt == p2m_ram_paging_in_start) )
         {
-            ept_entry->emt = epte_get_entry_emt(p2m->domain, gfn, mfn, &ipat,
+            ept_entry_t new_entry;
+
+            /* Construct the new entry, and then write it once */
+            new_entry.emt = epte_get_entry_emt(p2m->domain, gfn, mfn, &ipat,
                                                 direct_mmio);
-            ept_entry->ipat = ipat;
-            ept_entry->sp = order ? 1 : 0;
-            ept_entry->avail1 = p2mt;
-            ept_entry->avail2 = 0;
+            new_entry.ipat = ipat;
+            new_entry.sp = order ? 1 : 0;
+            new_entry.avail1 = p2mt;
+            new_entry.avail2 = 0;
 
-            if ( ept_entry->mfn == mfn_x(mfn) )
+            if ( new_entry.mfn == mfn_x(mfn) )
                 need_modify_vtd_table = 0;
             else
-                ept_entry->mfn = mfn_x(mfn);
+                new_entry.mfn = mfn_x(mfn);
 
-            ept_p2m_type_to_flags(ept_entry, p2mt);
+            ept_p2m_type_to_flags(&new_entry, p2mt);
+
+            ept_entry->epte = new_entry.epte;
         }
         else
             ept_entry->epte = 0;
@@ -339,6 +351,7 @@
     {
         /* We need to split the original page. */
         ept_entry_t split_ept_entry;
+        ept_entry_t new_entry;
 
         ASSERT(is_epte_superpage(ept_entry));
 
@@ -365,18 +378,20 @@
 
         ept_entry = table + index;
 
-        ept_entry->emt = epte_get_entry_emt(d, gfn, mfn, &ipat, direct_mmio);
-        ept_entry->ipat = ipat;
-        ept_entry->sp = i ? 1 : 0;
-        ept_entry->avail1 = p2mt;
-        ept_entry->avail2 = 0;
+        new_entry.emt = epte_get_entry_emt(d, gfn, mfn, &ipat, direct_mmio);
+        new_entry.ipat = ipat;
+        new_entry.sp = i ? 1 : 0;
+        new_entry.avail1 = p2mt;
+        new_entry.avail2 = 0;
 
-        if ( ept_entry->mfn == mfn_x(mfn) )
+        if ( new_entry.mfn == mfn_x(mfn) )
              need_modify_vtd_table = 0;
         else /* the caller should take care of the previous page */
-            ept_entry->mfn = mfn_x(mfn);
+            new_entry.mfn = mfn_x(mfn);
 
-        ept_p2m_type_to_flags(ept_entry, p2mt);
+        ept_p2m_type_to_flags(&new_entry, p2mt);
+
+        ept_entry->epte = new_entry.epte;
     }
 
     /* Track the highest gfn for which we have ever had a valid mapping */
@@ -437,10 +452,6 @@
     int i;
     int ret = 0;
     mfn_t mfn = _mfn(INVALID_MFN);
-    int do_locking = !p2m_locked_by_me(p2m);
-
-    if ( do_locking )
-        p2m_lock(p2m);
 
     *t = p2m_mmio_dm;
 
@@ -517,8 +528,6 @@
     }
 
 out:
-    if ( do_locking )
-        p2m_unlock(p2m);
     unmap_domain_page(table);
     return mfn;
 }

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

  reply	other threads:[~2010-12-14 10:47 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-14  8:39 regression from c/s 22071:c5aed2e049bc (ept: Put locks around ept_get_entry) ? Jan Beulich
2010-12-14 10:47 ` George Dunlap [this message]
2010-12-14 10:48   ` George Dunlap
2010-12-14 12:37   ` Jan Beulich
2010-12-14 14:32     ` George Dunlap
2010-12-14 14:34       ` George Dunlap
2010-12-14 14:50         ` Jan Beulich
2010-12-16 15:51   ` Jan Beulich
2010-12-16 16:12     ` Keir Fraser
2010-12-16 16:22       ` Jan Beulich
2010-12-16 16:42         ` Keir Fraser
2010-12-16 16:50           ` Jan Beulich
2010-12-16 17:03             ` Keir Fraser
2010-12-16 20:34               ` Keir Fraser
2010-12-17 11:15                 ` Tim Deegan
2010-12-20 16:24                   ` George Dunlap
2010-12-17 14:03               ` Olaf Hering
2010-12-17 14:18                 ` Keir Fraser
2010-12-16 16:59           ` Keir Fraser

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AANLkTikB5ym067GQOj4pCJCx3egjki20Guj-_Z67Tcis@mail.gmail.com \
    --to=george.dunlap@eu.citrix.com \
    --cc=Christoph.Egger@amd.com \
    --cc=JBeulich@novell.com \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).