* [PATCH 0 of 4] [RFC] x86 shadow: get rid of the need for contiguous memory
From: Tim Deegan @ 2010-08-20 15:57 UTC
To: xen-devel
This series of patches removes the need for shadow pagetable memory to
be allocated in 4-page contiguous blocks, by reusing the page_info
list header for yet one more thing.
It fixes a long-standing issue where, on a fairly full machine that has
seen a lot of ballooning, HVM domain creation can fail because the
remaining memory is too fragmented to use for shadows.
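In outline, the series replaces MFN arithmetic with an explicit walk along
the page_info list links. A minimal sketch of the helper that patch 3
introduces (simplified here; the real version also asserts the shadow type
and head flag):
    /* Follow the list link from one page of a multi-page shadow to the
     * next, instead of assuming it lives at the next MFN. */
    static inline mfn_t sh_next_page(mfn_t smfn)
    {
        struct page_info *pg = mfn_to_page(smfn);
        ASSERT(pg->list.next != PAGE_LIST_NULL);
        return _mfn(pdx_to_pfn(pg->list.next));
    }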
Posting as an RFC for now because I haven't had a chance to do any
heavy testing (compile tests under 32-bit WinXP seem fine though)
and I'm away for the next week. I hope to fold in any feedback
and commit this change the week after that.
Cheers,
Tim.
5 files changed, 332 insertions(+), 301 deletions(-)
xen/arch/x86/mm/shadow/common.c | 345 +++++++++++++-------------------------
xen/arch/x86/mm/shadow/multi.c | 143 +++++++++------
xen/arch/x86/mm/shadow/private.h | 117 +++++++++++-
xen/include/asm-x86/domain.h | 3
xen/include/asm-x86/mm.h | 25 +-
* [PATCH 1 of 4] x86 shadow: for multi-page shadows, explicitly track the first page
From: Tim Deegan @ 2010-08-20 15:58 UTC
To: xen-devel
[-- Attachment #1: Type: text/plain, Size: 579 bytes --]
x86 shadow: for multi-page shadows, explicitly track the first page
(where the refcounts are) and check that none of the routines
that do refcounting ever see the second, third or fourth page.
This is just stating and enforcing an existing implicit requirement.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
4 files changed, 47 insertions(+), 14 deletions(-)
xen/arch/x86/mm/shadow/common.c | 8 +++++-
xen/arch/x86/mm/shadow/multi.c | 48 ++++++++++++++++++++++++++++----------
xen/arch/x86/mm/shadow/private.h | 2 +
xen/include/asm-x86/mm.h | 3 +-
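The invariant, in miniature: a new head:1 bit in page_info marks the first
page of each multi-page shadow, and every refcounting path asserts it. A
simplified sketch of what the patch below enforces (the real sh_get_ref()
also checks for reference-count overflow):
    struct page_info *sp = mfn_to_page(smfn);
    ASSERT(mfn_valid(smfn));
    ASSERT(sp->u.sh.head);   /* refcounting only ever sees the head page */
    sp->u.sh.count++;        /* the count is kept in the head page alone */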
[-- Attachment #2: xen-unstable.hg-4.patch --]
[-- Type: text/x-patch, Size: 11886 bytes --]
# HG changeset patch
# User Tim Deegan <Tim.Deegan@citrix.com>
# Date 1282319532 -3600
# Node ID 66abfa6bc671b9b67c1fd729ddb9292c969d6ca2
# Parent f68726cdf357f8948700a56e96d4c10e3131bce2
x86 shadow: for multi-page shadows, explicitly track the first page.
x86 shadow: for multi-page shadows, explicitly track the first page
(where the refcounts are) and check that none of the routines
that do refcounting ever see the second, third or fourth page.
This is just stating and enforcing an existing implicit requirement.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
diff -r f68726cdf357 -r 66abfa6bc671 xen/arch/x86/mm/shadow/common.c
--- a/xen/arch/x86/mm/shadow/common.c Thu Aug 19 18:24:12 2010 +0100
+++ b/xen/arch/x86/mm/shadow/common.c Fri Aug 20 16:52:12 2010 +0100
@@ -1593,6 +1593,9 @@ mfn_t shadow_alloc(struct domain *d,
sp[i].u.sh.type = shadow_type;
sp[i].u.sh.pinned = 0;
sp[i].u.sh.count = 0;
+ sp[i].u.sh.head = ( shadow_type >= SH_type_min_shadow
+ && shadow_type <= SH_type_max_shadow
+ && i == 0 );
sp[i].v.sh.back = backpointer;
set_next_shadow(&sp[i], NULL);
perfc_incr(shadow_alloc_count);
@@ -1616,6 +1619,7 @@ void shadow_free(struct domain *d, mfn_t
shadow_type = sp->u.sh.type;
ASSERT(shadow_type != SH_type_none);
ASSERT(shadow_type != SH_type_p2m_table);
+ ASSERT(sp->u.sh.head || (shadow_type > SH_type_max_shadow));
order = shadow_order(shadow_type);
d->arch.paging.shadow.free_pages += 1 << order;
@@ -1637,7 +1641,7 @@ void shadow_free(struct domain *d, mfn_t
}
#endif
/* Strip out the type: this is now a free shadow page */
- sp[i].u.sh.type = 0;
+ sp[i].u.sh.type = sp[i].u.sh.head = 0;
/* Remember the TLB timestamp so we will know whether to flush
* TLBs when we reuse the page. Because the destructors leave the
* contents of the pages in place, we can delay TLB flushes until
@@ -1941,6 +1945,8 @@ static void sh_hash_audit_bucket(struct
/* Bogus type? */
BUG_ON( sp->u.sh.type == 0 );
BUG_ON( sp->u.sh.type > SH_type_max_shadow );
+ /* Wrong page of a multi-page shadow? */
+ BUG_ON( !sp->u.sh.head );
/* Wrong bucket? */
BUG_ON( sh_hash(__backpointer(sp), sp->u.sh.type) != bucket );
/* Duplicate entry? */
diff -r f68726cdf357 -r 66abfa6bc671 xen/arch/x86/mm/shadow/multi.c
--- a/xen/arch/x86/mm/shadow/multi.c Thu Aug 19 18:24:12 2010 +0100
+++ b/xen/arch/x86/mm/shadow/multi.c Fri Aug 20 16:52:12 2010 +0100
@@ -94,6 +94,7 @@ get_fl1_shadow_status(struct vcpu *v, gf
/* Look for FL1 shadows in the hash table */
{
mfn_t smfn = shadow_hash_lookup(v, gfn_x(gfn), SH_type_fl1_shadow);
+ ASSERT(!mfn_valid(smfn) || mfn_to_page(smfn)->u.sh.head);
return smfn;
}
@@ -102,6 +103,7 @@ get_shadow_status(struct vcpu *v, mfn_t
/* Look for shadows in the hash table */
{
mfn_t smfn = shadow_hash_lookup(v, mfn_x(gmfn), shadow_type);
+ ASSERT(!mfn_valid(smfn) || mfn_to_page(smfn)->u.sh.head);
perfc_incr(shadow_get_shadow_status);
return smfn;
}
@@ -113,6 +115,7 @@ set_fl1_shadow_status(struct vcpu *v, gf
SHADOW_PRINTK("gfn=%"SH_PRI_gfn", type=%08x, smfn=%05lx\n",
gfn_x(gfn), SH_type_fl1_shadow, mfn_x(smfn));
+ ASSERT(mfn_to_page(smfn)->u.sh.head);
shadow_hash_insert(v, gfn_x(gfn), SH_type_fl1_shadow, smfn);
}
@@ -127,6 +130,8 @@ set_shadow_status(struct vcpu *v, mfn_t
d->domain_id, v->vcpu_id, mfn_x(gmfn),
shadow_type, mfn_x(smfn));
+ ASSERT(mfn_to_page(smfn)->u.sh.head);
+
/* 32-on-64 PV guests don't own their l4 pages so can't get_page them */
if ( !is_pv_32on64_vcpu(v) || shadow_type != SH_type_l4_64_shadow )
{
@@ -143,6 +148,7 @@ delete_fl1_shadow_status(struct vcpu *v,
{
SHADOW_PRINTK("gfn=%"SH_PRI_gfn", type=%08x, smfn=%05lx\n",
gfn_x(gfn), SH_type_fl1_shadow, mfn_x(smfn));
+ ASSERT(mfn_to_page(smfn)->u.sh.head);
shadow_hash_delete(v, gfn_x(gfn), SH_type_fl1_shadow, smfn);
}
@@ -153,6 +159,7 @@ delete_shadow_status(struct vcpu *v, mfn
SHADOW_PRINTK("d=%d, v=%d, gmfn=%05lx, type=%08x, smfn=%05lx\n",
v->domain->domain_id, v->vcpu_id,
mfn_x(gmfn), shadow_type, mfn_x(smfn));
+ ASSERT(mfn_to_page(smfn)->u.sh.head);
shadow_hash_delete(v, mfn_x(gmfn), shadow_type, smfn);
/* 32-on-64 PV guests don't own their l4 pages; see set_shadow_status */
if ( !is_pv_32on64_vcpu(v) || shadow_type != SH_type_l4_64_shadow )
@@ -432,6 +439,7 @@ shadow_l1_index(mfn_t *smfn, u32 guest_i
shadow_l1_index(mfn_t *smfn, u32 guest_index)
{
#if (GUEST_PAGING_LEVELS == 2)
+ ASSERT(mfn_to_page(*smfn)->u.sh.head);
*smfn = _mfn(mfn_x(*smfn) +
(guest_index / SHADOW_L1_PAGETABLE_ENTRIES));
return (guest_index % SHADOW_L1_PAGETABLE_ENTRIES);
@@ -444,6 +452,7 @@ shadow_l2_index(mfn_t *smfn, u32 guest_i
shadow_l2_index(mfn_t *smfn, u32 guest_index)
{
#if (GUEST_PAGING_LEVELS == 2)
+ ASSERT(mfn_to_page(*smfn)->u.sh.head);
// Because we use 2 shadow l2 entries for each guest entry, the number of
// guest entries per shadow page is SHADOW_L2_PAGETABLE_ENTRIES/2
//
@@ -1023,6 +1032,7 @@ static int shadow_set_l2e(struct vcpu *v
if ( shadow_l2e_get_flags(new_sl2e) & _PAGE_PRESENT )
{
mfn_t sl1mfn = shadow_l2e_get_mfn(new_sl2e);
+ ASSERT(mfn_to_page(sl1mfn)->u.sh.head);
/* About to install a new reference */
if ( !sh_get_ref(v, sl1mfn, paddr) )
@@ -1033,13 +1043,15 @@ static int shadow_set_l2e(struct vcpu *v
#if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
{
struct page_info *sp = mfn_to_page(sl1mfn);
- mfn_t gl1mfn = backpointer(sp);
-
+ mfn_t gl1mfn;
+
+ ASSERT(sp->u.sh.head);
+ gl1mfn = backpointer(sp);
/* If the shadow is a fl1 then the backpointer contains
the GFN instead of the GMFN, and it's definitely not
OOS. */
if ( (sp->u.sh.type != SH_type_fl1_shadow) && mfn_valid(gl1mfn)
- && mfn_is_out_of_sync(gl1mfn) )
+ && mfn_is_out_of_sync(gl1mfn) )
sh_resync(v, gl1mfn);
}
#endif
@@ -1993,15 +2005,17 @@ void sh_destroy_l4_shadow(struct vcpu *v
void sh_destroy_l4_shadow(struct vcpu *v, mfn_t smfn)
{
shadow_l4e_t *sl4e;
- u32 t = mfn_to_page(smfn)->u.sh.type;
+ struct page_info *sp = mfn_to_page(smfn);
+ u32 t = sp->u.sh.type;
mfn_t gmfn, sl4mfn;
SHADOW_DEBUG(DESTROY_SHADOW,
"%s(%05lx)\n", __func__, mfn_x(smfn));
ASSERT(t == SH_type_l4_shadow);
+ ASSERT(sp->u.sh.head);
/* Record that the guest page isn't shadowed any more (in this type) */
- gmfn = backpointer(mfn_to_page(smfn));
+ gmfn = backpointer(sp);
delete_shadow_status(v, gmfn, t, smfn);
shadow_demote(v, gmfn, t);
/* Decrement refcounts of all the old entries */
@@ -2022,15 +2036,17 @@ void sh_destroy_l3_shadow(struct vcpu *v
void sh_destroy_l3_shadow(struct vcpu *v, mfn_t smfn)
{
shadow_l3e_t *sl3e;
- u32 t = mfn_to_page(smfn)->u.sh.type;
+ struct page_info *sp = mfn_to_page(smfn);
+ u32 t = sp->u.sh.type;
mfn_t gmfn, sl3mfn;
SHADOW_DEBUG(DESTROY_SHADOW,
"%s(%05lx)\n", __func__, mfn_x(smfn));
ASSERT(t == SH_type_l3_shadow);
+ ASSERT(sp->u.sh.head);
/* Record that the guest page isn't shadowed any more (in this type) */
- gmfn = backpointer(mfn_to_page(smfn));
+ gmfn = backpointer(sp);
delete_shadow_status(v, gmfn, t, smfn);
shadow_demote(v, gmfn, t);
@@ -2052,7 +2068,8 @@ void sh_destroy_l2_shadow(struct vcpu *v
void sh_destroy_l2_shadow(struct vcpu *v, mfn_t smfn)
{
shadow_l2e_t *sl2e;
- u32 t = mfn_to_page(smfn)->u.sh.type;
+ struct page_info *sp = mfn_to_page(smfn);
+ u32 t = sp->u.sh.type;
mfn_t gmfn, sl2mfn;
SHADOW_DEBUG(DESTROY_SHADOW,
@@ -2063,9 +2080,10 @@ void sh_destroy_l2_shadow(struct vcpu *v
#else
ASSERT(t == SH_type_l2_shadow);
#endif
+ ASSERT(sp->u.sh.head);
/* Record that the guest page isn't shadowed any more (in this type) */
- gmfn = backpointer(mfn_to_page(smfn));
+ gmfn = backpointer(sp);
delete_shadow_status(v, gmfn, t, smfn);
shadow_demote(v, gmfn, t);
@@ -2086,21 +2104,23 @@ void sh_destroy_l1_shadow(struct vcpu *v
{
struct domain *d = v->domain;
shadow_l1e_t *sl1e;
- u32 t = mfn_to_page(smfn)->u.sh.type;
+ struct page_info *sp = mfn_to_page(smfn);
+ u32 t = sp->u.sh.type;
SHADOW_DEBUG(DESTROY_SHADOW,
"%s(%05lx)\n", __func__, mfn_x(smfn));
ASSERT(t == SH_type_l1_shadow || t == SH_type_fl1_shadow);
+ ASSERT(sp->u.sh.head);
/* Record that the guest page isn't shadowed any more (in this type) */
if ( t == SH_type_fl1_shadow )
{
- gfn_t gfn = _gfn(mfn_to_page(smfn)->v.sh.back);
+ gfn_t gfn = _gfn(sp->v.sh.back);
delete_fl1_shadow_status(v, gfn, smfn);
}
else
{
- mfn_t gmfn = backpointer(mfn_to_page(smfn));
+ mfn_t gmfn = backpointer(sp);
delete_shadow_status(v, gmfn, t, smfn);
shadow_demote(v, gmfn, t);
}
@@ -5160,6 +5180,7 @@ int sh_audit_l1_table(struct vcpu *v, mf
int done = 0;
/* Follow the backpointer */
+ ASSERT(mfn_to_page(sl1mfn)->u.sh.head);
gl1mfn = backpointer(mfn_to_page(sl1mfn));
#if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
@@ -5254,6 +5275,7 @@ int sh_audit_l2_table(struct vcpu *v, mf
int done = 0;
/* Follow the backpointer */
+ ASSERT(mfn_to_page(sl2mfn)->u.sh.head);
gl2mfn = backpointer(mfn_to_page(sl2mfn));
#if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
@@ -5303,6 +5325,7 @@ int sh_audit_l3_table(struct vcpu *v, mf
int done = 0;
/* Follow the backpointer */
+ ASSERT(mfn_to_page(sl3mfn)->u.sh.head);
gl3mfn = backpointer(mfn_to_page(sl3mfn));
#if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
@@ -5350,6 +5373,7 @@ int sh_audit_l4_table(struct vcpu *v, mf
int done = 0;
/* Follow the backpointer */
+ ASSERT(mfn_to_page(sl4mfn)->u.sh.head);
gl4mfn = backpointer(mfn_to_page(sl4mfn));
#if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
diff -r f68726cdf357 -r 66abfa6bc671 xen/arch/x86/mm/shadow/private.h
--- a/xen/arch/x86/mm/shadow/private.h Thu Aug 19 18:24:12 2010 +0100
+++ b/xen/arch/x86/mm/shadow/private.h Fri Aug 20 16:52:12 2010 +0100
@@ -625,6 +625,7 @@ static inline int sh_get_ref(struct vcpu
struct page_info *sp = mfn_to_page(smfn);
ASSERT(mfn_valid(smfn));
+ ASSERT(sp->u.sh.head);
x = sp->u.sh.count;
nx = x + 1;
@@ -657,6 +658,7 @@ static inline void sh_put_ref(struct vcp
struct page_info *sp = mfn_to_page(smfn);
ASSERT(mfn_valid(smfn));
+ ASSERT(sp->u.sh.head);
ASSERT(!(sp->count_info & PGC_count_mask));
/* If this is the entry in the up-pointer, remove it */
diff -r f68726cdf357 -r 66abfa6bc671 xen/include/asm-x86/mm.h
--- a/xen/include/asm-x86/mm.h Thu Aug 19 18:24:12 2010 +0100
+++ b/xen/include/asm-x86/mm.h Fri Aug 20 16:52:12 2010 +0100
@@ -63,7 +63,8 @@ struct page_info
struct {
unsigned long type:5; /* What kind of shadow is this? */
unsigned long pinned:1; /* Is the shadow pinned? */
- unsigned long count:26; /* Reference count */
+ unsigned long head:1; /* Is this the first page of the shadow? */
+ unsigned long count:25; /* Reference count */
} sh;
/* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
* [PATCH 2 of 4] x86 shadow: explicitly link the pages of multipage shadows
From: Tim Deegan @ 2010-08-20 15:58 UTC
To: xen-devel
[-- Attachment #1: Type: text/plain, Size: 566 bytes --]
x86 shadow: explicitly link the pages of multipage shadows
together using their list headers. Update the users of the
pinned-shadows list to expect l2_32 shadows to have four entries
in the list, which must be kept together during updates.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
4 files changed, 137 insertions(+), 34 deletions(-)
xen/arch/x86/mm/shadow/common.c | 18 +++--
xen/arch/x86/mm/shadow/multi.c | 23 +++----
xen/arch/x86/mm/shadow/private.h | 115 +++++++++++++++++++++++++++++++++++---
xen/include/asm-x86/mm.h | 15 +++-
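In sketch form (simplified from the sh_pin()/sh_unpin() changes below;
assertions and the surrounding list surgery omitted), the pinned-list
operations now gather the up-to-four pages of an l2_32 shadow into a
temporary sub-list so they can be moved as one unit:
    struct page_list_head h;
    h.next = h.tail = sp;                      /* sp is the head page */
    if ( sp->u.sh.type == SH_type_l2_32_shadow )
    {
        /* the other three pages follow the head via the list links */
        h.tail = pdx_to_page(h.tail->list.next);
        h.tail = pdx_to_page(h.tail->list.next);
        h.tail = pdx_to_page(h.tail->list.next);
    }
    /* h now spans the whole shadow and is spliced in or out in one go */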
[-- Attachment #2: xen-unstable.hg-4.patch --]
[-- Type: text/x-patch, Size: 14190 bytes --]
# HG changeset patch
# User Tim Deegan <Tim.Deegan@citrix.com>
# Date 1282319533 -3600
# Node ID 1544aa105c624f8a49e16900b97e3f10aa30d0cd
# Parent 66abfa6bc671b9b67c1fd729ddb9292c969d6ca2
x86 shadow: explicitly link the pages of multipage shadows.
x86 shadow: explicitly link the pages of multipage shadows
together using their list headers. Update the users of the
pinned-shadows list to expect l2_32 shadows to have four entries
in the list, which must be kept together during updates.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
diff -r 66abfa6bc671 -r 1544aa105c62 xen/arch/x86/mm/shadow/common.c
--- a/xen/arch/x86/mm/shadow/common.c Fri Aug 20 16:52:12 2010 +0100
+++ b/xen/arch/x86/mm/shadow/common.c Fri Aug 20 16:52:13 2010 +0100
@@ -1377,7 +1377,7 @@ static void _shadow_prealloc(
/* Stage one: walk the list of pinned pages, unpinning them */
perfc_incr(shadow_prealloc_1);
- page_list_for_each_safe_reverse(sp, t, &d->arch.paging.shadow.pinned_shadows)
+ foreach_pinned_shadow(d, sp, t)
{
smfn = page_to_mfn(sp);
@@ -1445,7 +1445,7 @@ static void shadow_blow_tables(struct do
ASSERT(v != NULL);
/* Pass one: unpin all pinned pages */
- page_list_for_each_safe_reverse(sp, t, &d->arch.paging.shadow.pinned_shadows)
+ foreach_pinned_shadow(d, sp, t)
{
smfn = page_to_mfn(sp);
sh_unpin(v, smfn);
@@ -1527,6 +1527,7 @@ mfn_t shadow_alloc(struct domain *d,
{
struct page_info *sp = NULL;
unsigned int order = shadow_order(shadow_type);
+ struct page_list_head tmp_list;
cpumask_t mask;
void *p;
int i;
@@ -1572,6 +1573,11 @@ mfn_t shadow_alloc(struct domain *d,
break;
}
+ /* Page lists don't have pointers back to the head structure, so
+ * it's safe to use a head structure on the stack to link the pages
+ * together. */
+ INIT_PAGE_LIST_HEAD(&tmp_list);
+
/* Init page info fields and clear the pages */
for ( i = 0; i < 1<<order ; i++ )
{
@@ -1598,6 +1604,7 @@ mfn_t shadow_alloc(struct domain *d,
&& i == 0 );
sp[i].v.sh.back = backpointer;
set_next_shadow(&sp[i], NULL);
+ page_list_add_tail(&sp[i], &tmp_list);
perfc_incr(shadow_alloc_count);
}
return page_to_mfn(sp);
@@ -2668,10 +2675,7 @@ static int sh_remove_shadow_via_pointer(
ASSERT(sp->u.sh.type > 0);
ASSERT(sp->u.sh.type < SH_type_max_shadow);
- ASSERT(sp->u.sh.type != SH_type_l2_32_shadow);
- ASSERT(sp->u.sh.type != SH_type_l2_pae_shadow);
- ASSERT(sp->u.sh.type != SH_type_l2h_pae_shadow);
- ASSERT(sp->u.sh.type != SH_type_l4_64_shadow);
+ ASSERT(sh_type_has_up_pointer(v, sp->u.sh.type));
if (sp->up == 0) return 0;
pmfn = _mfn(sp->up >> PAGE_SHIFT);
@@ -2823,7 +2827,7 @@ void sh_remove_shadows(struct vcpu *v, m
} \
if ( sh_type_is_pinnable(v, t) ) \
sh_unpin(v, smfn); \
- else \
+ else if ( sh_type_has_up_pointer(v, t) ) \
sh_remove_shadow_via_pointer(v, smfn); \
if( !fast \
&& (pg->count_info & PGC_page_table) \
diff -r 66abfa6bc671 -r 1544aa105c62 xen/arch/x86/mm/shadow/multi.c
--- a/xen/arch/x86/mm/shadow/multi.c Fri Aug 20 16:52:12 2010 +0100
+++ b/xen/arch/x86/mm/shadow/multi.c Fri Aug 20 16:52:13 2010 +0100
@@ -1588,10 +1588,7 @@ sh_make_shadow(struct vcpu *v, mfn_t gmf
SHADOW_DEBUG(MAKE_SHADOW, "(%05lx, %u)=>%05lx\n",
mfn_x(gmfn), shadow_type, mfn_x(smfn));
- if ( shadow_type != SH_type_l2_32_shadow
- && shadow_type != SH_type_l2_pae_shadow
- && shadow_type != SH_type_l2h_pae_shadow
- && shadow_type != SH_type_l4_64_shadow )
+ if ( sh_type_has_up_pointer(v, shadow_type) )
/* Lower-level shadow, not yet linked form a higher level */
mfn_to_page(smfn)->up = 0;
@@ -1622,7 +1619,10 @@ sh_make_shadow(struct vcpu *v, mfn_t gmf
page_list_for_each_safe(sp, t, &v->domain->arch.paging.shadow.pinned_shadows)
{
if ( sp->u.sh.type == SH_type_l3_64_shadow )
+ {
sh_unpin(v, page_to_mfn(sp));
+ sp->up = 0;
+ }
}
v->domain->arch.paging.shadow.opt_flags &= ~SHOPT_LINUX_L3_TOPLEVEL;
}
@@ -2534,9 +2534,12 @@ int sh_safe_not_to_sync(struct vcpu *v,
struct page_info *sp;
mfn_t smfn;
+ if ( !sh_type_has_up_pointer(v, SH_type_l1_shadow) )
+ return 0;
+
smfn = get_shadow_status(v, gl1mfn, SH_type_l1_shadow);
ASSERT(mfn_valid(smfn)); /* Otherwise we would not have been called */
-
+
/* Up to l2 */
sp = mfn_to_page(smfn);
if ( sp->u.sh.count != 1 || !sp->up )
@@ -2547,6 +2550,7 @@ int sh_safe_not_to_sync(struct vcpu *v,
#if (SHADOW_PAGING_LEVELS == 4)
/* up to l3 */
sp = mfn_to_page(smfn);
+ ASSERT(sh_type_has_up_pointer(v, SH_type_l2_shadow));
if ( sp->u.sh.count != 1 || !sp->up )
return 0;
smfn = _mfn(sp->up >> PAGE_SHIFT);
@@ -2555,17 +2559,10 @@ int sh_safe_not_to_sync(struct vcpu *v,
/* up to l4 */
sp = mfn_to_page(smfn);
if ( sp->u.sh.count != 1
- || sh_type_is_pinnable(v, SH_type_l3_64_shadow) || !sp->up )
+ || !sh_type_has_up_pointer(v, SH_type_l3_64_shadow) || !sp->up )
return 0;
smfn = _mfn(sp->up >> PAGE_SHIFT);
ASSERT(mfn_valid(smfn));
-#endif
-
-#if (GUEST_PAGING_LEVELS == 2 && SHADOW_PAGING_LEVELS == 3)
- /* In 2-on-3 shadow mode the up pointer contains the link to the
- * shadow page, but the shadow_table contains only the first of the
- * four pages that makes the PAE top shadow tables. */
- smfn = _mfn(mfn_x(smfn) & ~0x3UL);
#endif
if ( pagetable_get_pfn(v->arch.shadow_table[0]) == mfn_x(smfn)
diff -r 66abfa6bc671 -r 1544aa105c62 xen/arch/x86/mm/shadow/private.h
--- a/xen/arch/x86/mm/shadow/private.h Fri Aug 20 16:52:12 2010 +0100
+++ b/xen/arch/x86/mm/shadow/private.h Fri Aug 20 16:52:13 2010 +0100
@@ -270,6 +270,17 @@ static inline int sh_type_is_pinnable(st
/* Everything else is not pinnable, and can use the "up" pointer */
return 0;
+}
+
+static inline int sh_type_has_up_pointer(struct vcpu *v, unsigned int t)
+{
+ /* Multi-page shadows don't have up-pointers */
+ if ( t == SH_type_l1_32_shadow
+ || t == SH_type_fl1_32_shadow
+ || t == SH_type_l2_32_shadow )
+ return 0;
+ /* Pinnable shadows don't have up-pointers either */
+ return !sh_type_is_pinnable(v, t);
}
/*
@@ -642,7 +653,7 @@ static inline int sh_get_ref(struct vcpu
/* We remember the first shadow entry that points to each shadow. */
if ( entry_pa != 0
- && !sh_type_is_pinnable(v, sp->u.sh.type)
+ && sh_type_has_up_pointer(v, sp->u.sh.type)
&& sp->up == 0 )
sp->up = entry_pa;
@@ -663,7 +674,7 @@ static inline void sh_put_ref(struct vcp
/* If this is the entry in the up-pointer, remove it */
if ( entry_pa != 0
- && !sh_type_is_pinnable(v, sp->u.sh.type)
+ && sh_type_has_up_pointer(v, sp->u.sh.type)
&& sp->up == entry_pa )
sp->up = 0;
@@ -685,21 +696,76 @@ static inline void sh_put_ref(struct vcp
}
+/* Walk the list of pinned shadows, from the tail forwards,
+ * skipping the non-head-page entries */
+static inline struct page_info *
+prev_pinned_shadow(const struct page_info *page,
+ const struct domain *d)
+{
+ struct page_info *p;
+
+ if ( page == d->arch.paging.shadow.pinned_shadows.next )
+ return NULL;
+
+ if ( page == NULL ) /* If no current place, start at the tail */
+ p = d->arch.paging.shadow.pinned_shadows.tail;
+ else
+ p = pdx_to_page(page->list.prev);
+ /* Skip over the non-tail parts of multi-page shadows */
+ if ( p && p->u.sh.type == SH_type_l2_32_shadow )
+ {
+ p = pdx_to_page(p->list.prev);
+ ASSERT(p && p->u.sh.type == SH_type_l2_32_shadow);
+ p = pdx_to_page(p->list.prev);
+ ASSERT(p && p->u.sh.type == SH_type_l2_32_shadow);
+ p = pdx_to_page(p->list.prev);
+ ASSERT(p && p->u.sh.type == SH_type_l2_32_shadow);
+ }
+ ASSERT(!p || p->u.sh.head);
+ return p;
+}
+
+#define foreach_pinned_shadow(dom, pos, tmp) \
+ for ( pos = prev_pinned_shadow(NULL, (dom)); \
+ pos ? (tmp = prev_pinned_shadow(pos, (dom)), 1) : 0; \
+ pos = tmp )
+
/* Pin a shadow page: take an extra refcount, set the pin bit,
* and put the shadow at the head of the list of pinned shadows.
* Returns 0 for failure, 1 for success. */
static inline int sh_pin(struct vcpu *v, mfn_t smfn)
{
struct page_info *sp;
+ struct page_list_head h, *pin_list;
ASSERT(mfn_valid(smfn));
sp = mfn_to_page(smfn);
ASSERT(sh_type_is_pinnable(v, sp->u.sh.type));
+ ASSERT(sp->u.sh.head);
+
+ /* Treat the up-to-four pages of the shadow as a unit in the list ops */
+ h.next = h.tail = sp;
+ if ( sp->u.sh.type == SH_type_l2_32_shadow )
+ {
+ h.tail = pdx_to_page(h.tail->list.next);
+ h.tail = pdx_to_page(h.tail->list.next);
+ h.tail = pdx_to_page(h.tail->list.next);
+ ASSERT(h.tail->u.sh.type == SH_type_l2_32_shadow);
+ }
+ pin_list = &v->domain->arch.paging.shadow.pinned_shadows;
+
if ( sp->u.sh.pinned )
{
/* Already pinned: take it out of the pinned-list so it can go
* at the front */
- page_list_del(sp, &v->domain->arch.paging.shadow.pinned_shadows);
+ if ( pin_list->next == h.next )
+ return 1;
+ page_list_prev(h.next, pin_list)->list.next = h.tail->list.next;
+ if ( pin_list->tail == h.tail )
+ pin_list->tail = page_list_prev(h.next, pin_list);
+ else
+ page_list_next(h.tail, pin_list)->list.prev = h.next->list.prev;
+ h.tail->list.next = h.next->list.prev = PAGE_LIST_NULL;
}
else
{
@@ -707,9 +773,11 @@ static inline int sh_pin(struct vcpu *v,
if ( !sh_get_ref(v, smfn, 0) )
return 0;
sp->u.sh.pinned = 1;
+ ASSERT(h.next->list.prev == PAGE_LIST_NULL);
+ ASSERT(h.tail->list.next == PAGE_LIST_NULL);
}
/* Put it at the head of the list of pinned shadows */
- page_list_add(sp, &v->domain->arch.paging.shadow.pinned_shadows);
+ page_list_splice(&h, pin_list);
return 1;
}
@@ -717,18 +785,47 @@ static inline int sh_pin(struct vcpu *v,
* of pinned shadows, and release the extra ref. */
static inline void sh_unpin(struct vcpu *v, mfn_t smfn)
{
+ struct page_list_head h, *pin_list;
struct page_info *sp;
ASSERT(mfn_valid(smfn));
sp = mfn_to_page(smfn);
ASSERT(sh_type_is_pinnable(v, sp->u.sh.type));
- if ( sp->u.sh.pinned )
+ ASSERT(sp->u.sh.head);
+
+ /* Treat the up-to-four pages of the shadow as a unit in the list ops */
+ h.next = h.tail = sp;
+ if ( sp->u.sh.type == SH_type_l2_32_shadow )
{
- sp->u.sh.pinned = 0;
- page_list_del(sp, &v->domain->arch.paging.shadow.pinned_shadows);
- sp->up = 0; /* in case this stops being a pinnable type in future */
- sh_put_ref(v, smfn, 0);
+ h.tail = pdx_to_page(h.tail->list.next);
+ h.tail = pdx_to_page(h.tail->list.next);
+ h.tail = pdx_to_page(h.tail->list.next);
+ ASSERT(h.tail->u.sh.type == SH_type_l2_32_shadow);
}
+ pin_list = &v->domain->arch.paging.shadow.pinned_shadows;
+
+ if ( !sp->u.sh.pinned )
+ return;
+
+ sp->u.sh.pinned = 0;
+
+ /* Cut the sub-list out of the list of pinned shadows */
+ if ( pin_list->next == h.next && pin_list->tail == h.tail )
+ pin_list->next = pin_list->tail = NULL;
+ else
+ {
+ if ( pin_list->next == h.next )
+ pin_list->next = page_list_next(h.tail, pin_list);
+ else
+ page_list_prev(h.next, pin_list)->list.next = h.tail->list.next;
+ if ( pin_list->tail == h.tail )
+ pin_list->tail = page_list_prev(h.next, pin_list);
+ else
+ page_list_next(h.tail, pin_list)->list.prev = h.next->list.prev;
+ }
+ h.tail->list.next = h.next->list.prev = PAGE_LIST_NULL;
+
+ sh_put_ref(v, smfn, 0);
}
diff -r 66abfa6bc671 -r 1544aa105c62 xen/include/asm-x86/mm.h
--- a/xen/include/asm-x86/mm.h Fri Aug 20 16:52:12 2010 +0100
+++ b/xen/include/asm-x86/mm.h Fri Aug 20 16:52:13 2010 +0100
@@ -35,13 +35,18 @@ struct page_info
union {
/* Each frame can be threaded onto a doubly-linked list.
*
- * For unused shadow pages, a list of pages of this order; for
- * pinnable shadows, if pinned, a list of other pinned shadows
- * (see sh_type_is_pinnable() below for the definition of
- * "pinnable" shadow types).
+ * For unused shadow pages, a list of pages of this order;
+ * for multi-page shadows, links to the other pages in this shadow;
+ * for pinnable shadows, if pinned, a list of all pinned shadows
+ * (see sh_type_is_pinnable() for the definition of "pinnable"
+ * shadow types). N.B. a shadow may be both pinnable and multi-page.
+ * In that case the pages are inserted in order in the list of
+ * pinned shadows and walkers of that list must be prepared
+ * to keep them all together during updates.
*/
struct page_list_entry list;
- /* For non-pinnable shadows, a higher entry that points at us. */
+ /* For non-pinnable single-page shadows, a higher entry that points
+ * at us. */
paddr_t up;
/* For shared/sharable pages the sharing handle */
uint64_t shr_handle;
* [PATCH 3 of 4] x86 shadow: remove the assumption that multipage shadows are contiguous
From: Tim Deegan @ 2010-08-20 15:58 UTC
To: xen-devel
[-- Attachment #1: Type: text/plain, Size: 357 bytes --]
x86 shadow: remove the assumption that multipage shadows are contiguous
and move from page to page using the linked list instead.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
2 files changed, 43 insertions(+), 32 deletions(-)
xen/arch/x86/mm/shadow/common.c | 4 +-
xen/arch/x86/mm/shadow/multi.c | 71 ++++++++++++++++++++++-----------------
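The recurring transformation, shown here for shadow_l1_index() and
simplified from the hunk below, is to replace MFN arithmetic with a walk
along the explicit links:
    /* before: page i of the shadow was assumed to live at mfn + i */
    *smfn = _mfn(mfn_x(*smfn) + (guest_index / SHADOW_L1_PAGETABLE_ENTRIES));
    /* after: follow the list link to the second page only when needed */
    if ( guest_index >= SHADOW_L1_PAGETABLE_ENTRIES )
        *smfn = sh_next_page(*smfn);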
[-- Attachment #2: xen-unstable.hg-4.patch --]
[-- Type: text/x-patch, Size: 7671 bytes --]
# HG changeset patch
# User Tim Deegan <Tim.Deegan@citrix.com>
# Date 1282319534 -3600
# Node ID 10627fb7c8cfa1f9a055a2a30011de5a087931a8
# Parent 1544aa105c624f8a49e16900b97e3f10aa30d0cd
x86 shadow: remove the assumption that multipage shadows are contiguous.
x86 shadow: remove the assumption that multipage shadows are contiguous
and move from page to page using the linked list instead.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
diff -r 1544aa105c62 -r 10627fb7c8cf xen/arch/x86/mm/shadow/common.c
--- a/xen/arch/x86/mm/shadow/common.c Fri Aug 20 16:52:13 2010 +0100
+++ b/xen/arch/x86/mm/shadow/common.c Fri Aug 20 16:52:14 2010 +0100
@@ -1214,8 +1214,8 @@ int shadow_cmpxchg_guest_entry(struct vc
* l1 tables (covering 2MB of virtual address space each). Similarly, a
* 32-bit guest l2 table (4GB va) needs to be shadowed by four
* PAE/64-bit l2 tables (1GB va each). These multi-page shadows are
- * contiguous and aligned; functions for handling offsets into them are
- * defined in shadow.c (shadow_l1_index() etc.)
+ * not contiguous in memory; functions for handling offsets into them are
+ * defined in shadow/multi.c (shadow_l1_index() etc.)
*
* This table shows the allocation behaviour of the different modes:
*
diff -r 1544aa105c62 -r 10627fb7c8cf xen/arch/x86/mm/shadow/multi.c
--- a/xen/arch/x86/mm/shadow/multi.c Fri Aug 20 16:52:13 2010 +0100
+++ b/xen/arch/x86/mm/shadow/multi.c Fri Aug 20 16:52:14 2010 +0100
@@ -421,13 +421,27 @@ sh_guest_get_eff_l1e(struct vcpu *v, uns
* way to see this is: a 32-bit guest L2 page maps 4GB of virtual address
* space, while a PAE- or 64-bit shadow L2 page maps 1GB of virtual address
* space.)
- *
- * For PAE guests, for every 32-bytes of guest L3 page table, we use 64-bytes
- * of shadow (to store both the shadow, and the info that would normally be
- * stored in page_info fields). This arrangement allows the shadow and the
- * "page_info" fields to always be stored in the same page (in fact, in
- * the same cache line), avoiding an extra call to map_domain_page().
- */
+ */
+
+/* From one page of a multi-page shadow, find the next one */
+static inline mfn_t sh_next_page(mfn_t smfn)
+{
+ mfn_t next;
+ struct page_info *pg = mfn_to_page(smfn);
+
+ ASSERT(pg->u.sh.type == SH_type_l1_32_shadow
+ || pg->u.sh.type == SH_type_fl1_32_shadow
+ || pg->u.sh.type == SH_type_l2_32_shadow);
+ ASSERT(pg->u.sh.type == SH_type_l2_32_shadow || pg->u.sh.head);
+ ASSERT(pg->list.next != PAGE_LIST_NULL);
+
+ next = _mfn(pdx_to_pfn(pg->list.next));
+
+ /* XXX not for long */ ASSERT(mfn_x(next) == mfn_x(smfn) + 1);
+ ASSERT(mfn_to_page(next)->u.sh.type == pg->u.sh.type);
+ ASSERT(!mfn_to_page(next)->u.sh.head);
+ return next;
+}
static inline u32
guest_index(void *ptr)
@@ -440,8 +454,8 @@ shadow_l1_index(mfn_t *smfn, u32 guest_i
{
#if (GUEST_PAGING_LEVELS == 2)
ASSERT(mfn_to_page(*smfn)->u.sh.head);
- *smfn = _mfn(mfn_x(*smfn) +
- (guest_index / SHADOW_L1_PAGETABLE_ENTRIES));
+ if ( guest_index >= SHADOW_L1_PAGETABLE_ENTRIES )
+ *smfn = sh_next_page(*smfn);
return (guest_index % SHADOW_L1_PAGETABLE_ENTRIES);
#else
return guest_index;
@@ -452,13 +466,12 @@ shadow_l2_index(mfn_t *smfn, u32 guest_i
shadow_l2_index(mfn_t *smfn, u32 guest_index)
{
#if (GUEST_PAGING_LEVELS == 2)
+ int i;
ASSERT(mfn_to_page(*smfn)->u.sh.head);
// Because we use 2 shadow l2 entries for each guest entry, the number of
// guest entries per shadow page is SHADOW_L2_PAGETABLE_ENTRIES/2
- //
- *smfn = _mfn(mfn_x(*smfn) +
- (guest_index / (SHADOW_L2_PAGETABLE_ENTRIES / 2)));
-
+ for ( i = 0; i < guest_index / (SHADOW_L2_PAGETABLE_ENTRIES / 2); i++ )
+ *smfn = sh_next_page(*smfn);
// We multiply by two to get the index of the first of the two entries
// used to shadow the specified guest entry.
return (guest_index % (SHADOW_L2_PAGETABLE_ENTRIES / 2)) * 2;
@@ -1014,11 +1027,11 @@ static int shadow_set_l2e(struct vcpu *v
/* In 2-on-3 we work with pairs of l2es pointing at two-page
* shadows. Reference counting and up-pointers track from the first
* page of the shadow to the first l2e, so make sure that we're
- * working with those:
- * Align the pointer down so it's pointing at the first of the pair */
+ * working with those:
+ * Start with a pair of identical entries */
+ shadow_l2e_t pair[2] = { new_sl2e, new_sl2e };
+ /* Align the pointer down so it's pointing at the first of the pair */
sl2e = (shadow_l2e_t *)((unsigned long)sl2e & ~(sizeof(shadow_l2e_t)));
- /* Align the mfn of the shadow entry too */
- new_sl2e.l2 &= ~(1<<PAGE_SHIFT);
#endif
ASSERT(sl2e != NULL);
@@ -1055,19 +1068,16 @@ static int shadow_set_l2e(struct vcpu *v
sh_resync(v, gl1mfn);
}
#endif
+#if GUEST_PAGING_LEVELS == 2
+ /* Update the second entry to point to the second half of the l1 */
+ sl1mfn = sh_next_page(sl1mfn);
+ pair[1] = shadow_l2e_from_mfn(sl1mfn, shadow_l2e_get_flags(new_sl2e));
+#endif
}
/* Write the new entry */
#if GUEST_PAGING_LEVELS == 2
- {
- shadow_l2e_t pair[2] = { new_sl2e, new_sl2e };
- /* The l1 shadow is two pages long and need to be pointed to by
- * two adjacent l1es. The pair have the same flags, but point
- * at odd and even MFNs */
- ASSERT(!(pair[0].l2 & (1<<PAGE_SHIFT)));
- pair[1].l2 |= (1<<PAGE_SHIFT);
- shadow_write_entries(sl2e, &pair, 2, sl2mfn);
- }
+ shadow_write_entries(sl2e, &pair, 2, sl2mfn);
#else /* normal case */
shadow_write_entries(sl2e, &new_sl2e, 1, sl2mfn);
#endif
@@ -1301,7 +1311,7 @@ do {
int __done = 0; \
_SHADOW_FOREACH_L1E(_sl1mfn, _sl1e, _gl1p, \
({ (__done = _done); }), _code); \
- _sl1mfn = _mfn(mfn_x(_sl1mfn) + 1); \
+ _sl1mfn = sh_next_page(_sl1mfn); \
if ( !__done ) \
_SHADOW_FOREACH_L1E(_sl1mfn, _sl1e, _gl1p, \
({ (__done = _done); }), _code); \
@@ -1335,7 +1345,7 @@ do {
increment_ptr_to_guest_entry(_gl2p); \
} \
sh_unmap_domain_page(_sp); \
- _sl2mfn = _mfn(mfn_x(_sl2mfn) + 1); \
+ if ( _j < 3 ) _sl2mfn = sh_next_page(_sl2mfn); \
} \
} while (0)
@@ -4332,13 +4342,14 @@ sh_update_cr3(struct vcpu *v, int do_loc
///
#if SHADOW_PAGING_LEVELS == 3
{
- mfn_t smfn;
+ mfn_t smfn = pagetable_get_mfn(v->arch.shadow_table[0]);
int i;
for ( i = 0; i < 4; i++ )
{
#if GUEST_PAGING_LEVELS == 2
/* 2-on-3: make a PAE l3 that points at the four-page l2 */
- smfn = _mfn(pagetable_get_pfn(v->arch.shadow_table[0]) + i);
+ if ( i != 0 )
+ smfn = sh_next_page(smfn);
#else
/* 3-on-3: make a PAE l3 that points at the four l2 pages */
smfn = pagetable_get_mfn(v->arch.shadow_table[i]);
* [PATCH 4 of 4] x86 shadow: allocate all shadow memory in single pages
From: Tim Deegan @ 2010-08-20 15:58 UTC
To: xen-devel
[-- Attachment #1: Type: text/plain, Size: 414 bytes --]
x86 shadow: allocate all shadow memory in single pages
now that multi-page shadows need not be contiguous.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
4 files changed, 105 insertions(+), 221 deletions(-)
xen/arch/x86/mm/shadow/common.c | 315 ++++++++++++---------------------------
xen/arch/x86/mm/shadow/multi.c | 1
xen/include/asm-x86/domain.h | 3
xen/include/asm-x86/mm.h | 7
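The allocator change in a nutshell (a simplified sketch of the
shadow_alloc() loop below; the TLB-flush filtering, page clearing and
page_info initialisation are omitted): shadow_order() becomes
shadow_size(), there is a single free list of single pages, and a shadow's
pages are pulled off it one at a time and threaded together:
    pages = shadow_size(shadow_type);              /* 1, 2 or 4 pages */
    for ( i = 0; i < pages; i++ )
    {
        sp = page_list_remove_head(&d->arch.paging.shadow.freelist);
        page_list_add(sp, &tmp_list);  /* link this shadow's pages together */
    }
    d->arch.paging.shadow.free_pages -= pages;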
[-- Attachment #2: xen-unstable.hg-4.patch --]
[-- Type: text/x-patch, Size: 23940 bytes --]
# HG changeset patch
# User Tim Deegan <Tim.Deegan@citrix.com>
# Date 1282319534 -3600
# Node ID 22241338e51a781350fa1c0234b6c70b06f39478
# Parent 10627fb7c8cfa1f9a055a2a30011de5a087931a8
x86 shadow: allocate all shadow memory in single pages.
x86 shadow: allocate all shadow memory in single pages
now that multi-page shadows need not be contiguous.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
diff -r 10627fb7c8cf -r 22241338e51a xen/arch/x86/mm/shadow/common.c
--- a/xen/arch/x86/mm/shadow/common.c Fri Aug 20 16:52:14 2010 +0100
+++ b/xen/arch/x86/mm/shadow/common.c Fri Aug 20 16:52:14 2010 +0100
@@ -45,11 +45,8 @@ DEFINE_PER_CPU(uint32_t,trace_shadow_pat
* Called for every domain from arch_domain_create() */
void shadow_domain_init(struct domain *d, unsigned int domcr_flags)
{
- int i;
shadow_lock_init(d);
- for ( i = 0; i <= SHADOW_MAX_ORDER; i++ )
- INIT_PAGE_LIST_HEAD(&d->arch.paging.shadow.freelists[i]);
- INIT_PAGE_LIST_HEAD(&d->arch.paging.shadow.p2m_freelist);
+ INIT_PAGE_LIST_HEAD(&d->arch.paging.shadow.freelist);
INIT_PAGE_LIST_HEAD(&d->arch.paging.shadow.pinned_shadows);
/* Use shadow pagetables for log-dirty support */
@@ -1229,11 +1226,6 @@ int shadow_cmpxchg_guest_entry(struct vc
* sl3 size - - - - 4k
* sl4 size - - - - 4k
*
- * We allocate memory from xen in four-page units and break them down
- * with a simple buddy allocator. Can't use the xen allocator to handle
- * this as it only works for contiguous zones, and a domain's shadow
- * pool is made of fragments.
- *
* In HVM guests, the p2m table is built out of shadow pages, and we provide
* a function for the p2m management to steal pages, in max-order chunks, from
* the free pool. We don't provide for giving them back, yet.
@@ -1261,57 +1253,31 @@ static unsigned int shadow_min_acceptabl
return (vcpu_count * 128);
}
-/* Figure out the order of allocation needed for a given shadow type */
+/* Figure out the size (in pages) of a given shadow type */
static inline u32
-shadow_order(unsigned int shadow_type)
+shadow_size(unsigned int shadow_type)
{
- static const u32 type_to_order[SH_type_unused] = {
- 0, /* SH_type_none */
- 1, /* SH_type_l1_32_shadow */
- 1, /* SH_type_fl1_32_shadow */
- 2, /* SH_type_l2_32_shadow */
- 0, /* SH_type_l1_pae_shadow */
- 0, /* SH_type_fl1_pae_shadow */
- 0, /* SH_type_l2_pae_shadow */
- 0, /* SH_type_l2h_pae_shadow */
- 0, /* SH_type_l1_64_shadow */
- 0, /* SH_type_fl1_64_shadow */
- 0, /* SH_type_l2_64_shadow */
- 0, /* SH_type_l2h_64_shadow */
- 0, /* SH_type_l3_64_shadow */
- 0, /* SH_type_l4_64_shadow */
- 2, /* SH_type_p2m_table */
- 0, /* SH_type_monitor_table */
- 0 /* SH_type_oos_snapshot */
+ static const u32 type_to_size[SH_type_unused] = {
+ 1, /* SH_type_none */
+ 2, /* SH_type_l1_32_shadow */
+ 2, /* SH_type_fl1_32_shadow */
+ 4, /* SH_type_l2_32_shadow */
+ 1, /* SH_type_l1_pae_shadow */
+ 1, /* SH_type_fl1_pae_shadow */
+ 1, /* SH_type_l2_pae_shadow */
+ 1, /* SH_type_l2h_pae_shadow */
+ 1, /* SH_type_l1_64_shadow */
+ 1, /* SH_type_fl1_64_shadow */
+ 1, /* SH_type_l2_64_shadow */
+ 1, /* SH_type_l2h_64_shadow */
+ 1, /* SH_type_l3_64_shadow */
+ 1, /* SH_type_l4_64_shadow */
+ 1, /* SH_type_p2m_table */
+ 1, /* SH_type_monitor_table */
+ 1 /* SH_type_oos_snapshot */
};
ASSERT(shadow_type < SH_type_unused);
- return type_to_order[shadow_type];
-}
-
-static inline unsigned int
-shadow_max_order(struct domain *d)
-{
- return is_hvm_domain(d) ? SHADOW_MAX_ORDER : 0;
-}
-
-/* Do we have at total of count pages of the requested order free? */
-static inline int space_is_available(
- struct domain *d,
- unsigned int order,
- unsigned int count)
-{
- for ( ; order <= shadow_max_order(d); ++order )
- {
- unsigned int n = count;
- const struct page_info *sp;
-
- page_list_for_each ( sp, &d->arch.paging.shadow.freelists[order] )
- if ( --n == 0 )
- return 1;
- count = (count + 1) >> 1;
- }
-
- return 0;
+ return type_to_size[shadow_type];
}
/* Dispatcher function: call the per-mode function that will unhook the
@@ -1357,7 +1323,7 @@ static inline void trace_shadow_prealloc
* available in the shadow page pool. */
static void _shadow_prealloc(
struct domain *d,
- unsigned int order,
+ unsigned int pages,
unsigned int count)
{
/* Need a vpcu for calling unpins; for now, since we don't have
@@ -1367,8 +1333,7 @@ static void _shadow_prealloc(
mfn_t smfn;
int i;
- ASSERT(order <= shadow_max_order(d));
- if ( space_is_available(d, order, count) ) return;
+ if ( d->arch.paging.shadow.free_pages >= pages ) return;
v = current;
if ( v->domain != d )
@@ -1386,7 +1351,7 @@ static void _shadow_prealloc(
sh_unpin(v, smfn);
/* See if that freed up enough space */
- if ( space_is_available(d, order, count) ) return;
+ if ( d->arch.paging.shadow.free_pages >= pages ) return;
}
/* Stage two: all shadow pages are in use in hierarchies that are
@@ -1404,7 +1369,7 @@ static void _shadow_prealloc(
pagetable_get_mfn(v2->arch.shadow_table[i]), 0);
/* See if that freed up enough space */
- if ( space_is_available(d, order, count) )
+ if ( d->arch.paging.shadow.free_pages >= pages )
{
flush_tlb_mask(&d->domain_dirty_cpumask);
return;
@@ -1414,9 +1379,9 @@ static void _shadow_prealloc(
/* Nothing more we can do: all remaining shadows are of pages that
* hold Xen mappings for some vcpu. This can never happen. */
- SHADOW_ERROR("Can't pre-allocate %u order-%u shadow pages!\n"
+ SHADOW_ERROR("Can't pre-allocate %u x %u shadow pages!\n"
" shadow pages total = %u, free = %u, p2m=%u\n",
- count, order,
+ count, pages,
d->arch.paging.shadow.total_pages,
d->arch.paging.shadow.free_pages,
d->arch.paging.shadow.p2m_pages);
@@ -1430,7 +1395,7 @@ static void _shadow_prealloc(
* to avoid freeing shadows that the caller is currently working on. */
void shadow_prealloc(struct domain *d, u32 type, unsigned int count)
{
- return _shadow_prealloc(d, shadow_order(type), count);
+ return _shadow_prealloc(d, shadow_size(type), count);
}
/* Deliberately free all the memory we can: this will tear down all of
@@ -1506,6 +1471,7 @@ __initcall(shadow_blow_tables_keyhandler
__initcall(shadow_blow_tables_keyhandler_init);
#endif /* !NDEBUG */
+/* Accessors for the singly-linked list that's used for hash chains */
static inline struct page_info *
next_shadow(const struct page_info *sp)
{
@@ -1526,42 +1492,29 @@ mfn_t shadow_alloc(struct domain *d,
unsigned long backpointer)
{
struct page_info *sp = NULL;
- unsigned int order = shadow_order(shadow_type);
+ unsigned int pages = shadow_size(shadow_type);
struct page_list_head tmp_list;
cpumask_t mask;
void *p;
int i;
ASSERT(shadow_locked_by_me(d));
- if (shadow_type == SH_type_p2m_table && order > shadow_max_order(d))
- order = shadow_max_order(d);
- ASSERT(order <= shadow_max_order(d));
ASSERT(shadow_type != SH_type_none);
perfc_incr(shadow_alloc);
- /* Find smallest order which can satisfy the request. */
- for ( i = order; i <= SHADOW_MAX_ORDER; i++ )
- if ( (sp = page_list_remove_head(&d->arch.paging.shadow.freelists[i])) )
- goto found;
-
- /* If we get here, we failed to allocate. This should never happen.
- * It means that we didn't call shadow_prealloc() correctly before
- * we allocated. We can't recover by calling prealloc here, because
- * we might free up higher-level pages that the caller is working on. */
- SHADOW_ERROR("Can't allocate %i shadow pages!\n", 1 << order);
- BUG();
+ if ( d->arch.paging.shadow.free_pages < pages )
+ {
+ /* If we get here, we failed to allocate. This should never
+ * happen. It means that we didn't call shadow_prealloc()
+ * correctly before we allocated. We can't recover by calling
+ * prealloc here, because we might free up higher-level pages
+ * that the caller is working on. */
+ SHADOW_ERROR("Can't allocate %i shadow pages!\n", pages);
+ BUG();
+ }
+ d->arch.paging.shadow.free_pages -= pages;
- found:
- /* We may have to halve the chunk a number of times. */
- while ( i != order )
- {
- i--;
- sp->v.free.order = i;
- page_list_add_tail(sp, &d->arch.paging.shadow.freelists[i]);
- sp += 1 << i;
- }
- d->arch.paging.shadow.free_pages -= 1 << order;
-
+ /* Backpointers that are MFNs need to be packed into PDXs (PFNs don't) */
switch (shadow_type)
{
case SH_type_fl1_32_shadow:
@@ -1579,34 +1532,36 @@ mfn_t shadow_alloc(struct domain *d,
INIT_PAGE_LIST_HEAD(&tmp_list);
/* Init page info fields and clear the pages */
- for ( i = 0; i < 1<<order ; i++ )
+ for ( i = 0; i < pages ; i++ )
{
+ sp = page_list_remove_head(&d->arch.paging.shadow.freelist);
/* Before we overwrite the old contents of this page,
* we need to be sure that no TLB holds a pointer to it. */
mask = d->domain_dirty_cpumask;
- tlbflush_filter(mask, sp[i].tlbflush_timestamp);
+ tlbflush_filter(mask, sp->tlbflush_timestamp);
if ( unlikely(!cpus_empty(mask)) )
{
perfc_incr(shadow_alloc_tlbflush);
flush_tlb_mask(&mask);
}
/* Now safe to clear the page for reuse */
- p = __map_domain_page(sp+i);
+ p = __map_domain_page(sp);
ASSERT(p != NULL);
clear_page(p);
sh_unmap_domain_page(p);
- INIT_PAGE_LIST_ENTRY(&sp[i].list);
- sp[i].u.sh.type = shadow_type;
- sp[i].u.sh.pinned = 0;
- sp[i].u.sh.count = 0;
- sp[i].u.sh.head = ( shadow_type >= SH_type_min_shadow
- && shadow_type <= SH_type_max_shadow
- && i == 0 );
- sp[i].v.sh.back = backpointer;
- set_next_shadow(&sp[i], NULL);
- page_list_add_tail(&sp[i], &tmp_list);
+ INIT_PAGE_LIST_ENTRY(&sp->list);
+ page_list_add(sp, &tmp_list);
+ sp->u.sh.type = shadow_type;
+ sp->u.sh.pinned = 0;
+ sp->u.sh.count = 0;
+ sp->u.sh.head = 0;
+ sp->v.sh.back = backpointer;
+ set_next_shadow(sp, NULL);
perfc_incr(shadow_alloc_count);
}
+ if ( shadow_type >= SH_type_min_shadow
+ && shadow_type <= SH_type_max_shadow )
+ sp->u.sh.head = 1;
return page_to_mfn(sp);
}
@@ -1614,10 +1569,9 @@ mfn_t shadow_alloc(struct domain *d,
/* Return some shadow pages to the pool. */
void shadow_free(struct domain *d, mfn_t smfn)
{
- struct page_info *sp = mfn_to_page(smfn);
+ struct page_info *next = NULL, *sp = mfn_to_page(smfn);
+ unsigned int pages;
u32 shadow_type;
- unsigned long order;
- unsigned long mask;
int i;
ASSERT(shadow_locked_by_me(d));
@@ -1627,11 +1581,9 @@ void shadow_free(struct domain *d, mfn_t
ASSERT(shadow_type != SH_type_none);
ASSERT(shadow_type != SH_type_p2m_table);
ASSERT(sp->u.sh.head || (shadow_type > SH_type_max_shadow));
- order = shadow_order(shadow_type);
+ pages = shadow_size(shadow_type);
- d->arch.paging.shadow.free_pages += 1 << order;
-
- for ( i = 0; i < 1<<order; i++ )
+ for ( i = 0; i < pages; i++ )
{
#if SHADOW_OPTIMIZATIONS & (SHOPT_WRITABLE_HEURISTIC | SHOPT_FAST_EMULATION)
struct vcpu *v;
@@ -1639,7 +1591,8 @@ void shadow_free(struct domain *d, mfn_t
{
#if SHADOW_OPTIMIZATIONS & SHOPT_WRITABLE_HEURISTIC
/* No longer safe to look for a writeable mapping in this shadow */
- if ( v->arch.paging.shadow.last_writeable_pte_smfn == mfn_x(smfn) + i )
+ if ( v->arch.paging.shadow.last_writeable_pte_smfn
+ == mfn_x(page_to_mfn(sp)) )
v->arch.paging.shadow.last_writeable_pte_smfn = 0;
#endif
#if SHADOW_OPTIMIZATIONS & SHOPT_FAST_EMULATION
@@ -1647,108 +1600,57 @@ void shadow_free(struct domain *d, mfn_t
#endif
}
#endif
+ /* Get the next page before we overwrite the list header */
+ if ( i < pages - 1 )
+ next = pdx_to_page(sp->list.next);
/* Strip out the type: this is now a free shadow page */
- sp[i].u.sh.type = sp[i].u.sh.head = 0;
+ sp->u.sh.type = sp->u.sh.head = 0;
/* Remember the TLB timestamp so we will know whether to flush
* TLBs when we reuse the page. Because the destructors leave the
* contents of the pages in place, we can delay TLB flushes until
* just before the allocator hands the page out again. */
- sp[i].tlbflush_timestamp = tlbflush_current_time();
+ sp->tlbflush_timestamp = tlbflush_current_time();
perfc_decr(shadow_alloc_count);
+ page_list_add_tail(sp, &d->arch.paging.shadow.freelist);
+ sp = next;
}
- /* Merge chunks as far as possible. */
- for ( ; order < shadow_max_order(d); ++order )
- {
- mask = 1 << order;
- if ( (mfn_x(page_to_mfn(sp)) & mask) ) {
- /* Merge with predecessor block? */
- if ( ((sp-mask)->u.sh.type != PGT_none) ||
- ((sp-mask)->v.free.order != order) )
- break;
- sp -= mask;
- page_list_del(sp, &d->arch.paging.shadow.freelists[order]);
- } else {
- /* Merge with successor block? */
- if ( ((sp+mask)->u.sh.type != PGT_none) ||
- ((sp+mask)->v.free.order != order) )
- break;
- page_list_del(sp + mask, &d->arch.paging.shadow.freelists[order]);
- }
- }
-
- sp->v.free.order = order;
- page_list_add_tail(sp, &d->arch.paging.shadow.freelists[order]);
+ d->arch.paging.shadow.free_pages += pages;
}
-/* Divert some memory from the pool to be used by the p2m mapping.
+/* Divert a page from the pool to be used by the p2m mapping.
* This action is irreversible: the p2m mapping only ever grows.
* That's OK because the p2m table only exists for translated domains,
- * and those domains can't ever turn off shadow mode.
- * Also, we only ever allocate a max-order chunk, so as to preserve
- * the invariant that shadow_prealloc() always works.
- * Returns 0 iff it can't get a chunk (the caller should then
- * free up some pages in domheap and call sh_set_allocation);
- * returns non-zero on success.
- */
-static int
-sh_alloc_p2m_pages(struct domain *d)
-{
- struct page_info *pg;
- u32 i;
- unsigned int order = shadow_max_order(d);
-
- ASSERT(shadow_locked_by_me(d));
-
- if ( d->arch.paging.shadow.total_pages
- < (shadow_min_acceptable_pages(d) + (1 << order)) )
- return 0; /* Not enough shadow memory: need to increase it first */
-
- shadow_prealloc(d, SH_type_p2m_table, 1);
- pg = mfn_to_page(shadow_alloc(d, SH_type_p2m_table, 0));
- d->arch.paging.shadow.p2m_pages += (1 << order);
- d->arch.paging.shadow.total_pages -= (1 << order);
- for (i = 0; i < (1U << order); i++)
- {
- /* Unlike shadow pages, mark p2m pages as owned by the domain.
- * Marking the domain as the owner would normally allow the guest to
- * create mappings of these pages, but these p2m pages will never be
- * in the domain's guest-physical address space, and so that is not
- * believed to be a concern.
- */
- page_set_owner(&pg[i], d);
- pg[i].count_info |= 1;
- page_list_add_tail(&pg[i], &d->arch.paging.shadow.p2m_freelist);
- }
- return 1;
-}
-
-// Returns 0 if no memory is available...
+ * and those domains can't ever turn off shadow mode. */
static struct page_info *
shadow_alloc_p2m_page(struct p2m_domain *p2m)
{
struct domain *d = p2m->domain;
struct page_info *pg;
- mfn_t mfn;
- void *p;
shadow_lock(d);
- if ( page_list_empty(&d->arch.paging.shadow.p2m_freelist) &&
- !sh_alloc_p2m_pages(d) )
+ if ( d->arch.paging.shadow.total_pages
+ < shadow_min_acceptable_pages(d) + 1 )
{
shadow_unlock(d);
return NULL;
}
- pg = page_list_remove_head(&d->arch.paging.shadow.p2m_freelist);
+
+ shadow_prealloc(d, SH_type_p2m_table, 1);
+ pg = mfn_to_page(shadow_alloc(d, SH_type_p2m_table, 0));
shadow_unlock(d);
- mfn = page_to_mfn(pg);
- p = sh_map_domain_page(mfn);
- clear_page(p);
- sh_unmap_domain_page(p);
-
+ /* Unlike shadow pages, mark p2m pages as owned by the domain.
+ * Marking the domain as the owner would normally allow the guest to
+ * create mappings of these pages, but these p2m pages will never be
+ * in the domain's guest-physical address space, and so that is not
+ * believed to be a concern. */
+ page_set_owner(pg, d);
+ pg->count_info |= 1;
+ d->arch.paging.shadow.p2m_pages++;
+ d->arch.paging.shadow.total_pages--;
return pg;
}
@@ -1827,7 +1729,6 @@ static unsigned int sh_set_allocation(st
{
struct page_info *sp;
unsigned int lower_bound;
- unsigned int j, order = shadow_max_order(d);
ASSERT(shadow_locked_by_me(d));
@@ -1844,9 +1745,6 @@ static unsigned int sh_set_allocation(st
lower_bound = shadow_min_acceptable_pages(d) + (d->tot_pages / 256);
if ( pages < lower_bound )
pages = lower_bound;
-
- /* Round up to largest block size */
- pages = (pages + ((1<<SHADOW_MAX_ORDER)-1)) & ~((1<<SHADOW_MAX_ORDER)-1);
}
SHADOW_PRINTK("current %i target %i\n",
@@ -1858,39 +1756,34 @@ static unsigned int sh_set_allocation(st
{
/* Need to allocate more memory from domheap */
sp = (struct page_info *)
- alloc_domheap_pages(NULL, order, MEMF_node(domain_to_node(d)));
+ alloc_domheap_page(NULL, MEMF_node(domain_to_node(d)));
if ( sp == NULL )
{
SHADOW_PRINTK("failed to allocate shadow pages.\n");
return -ENOMEM;
}
- d->arch.paging.shadow.free_pages += 1 << order;
- d->arch.paging.shadow.total_pages += 1 << order;
- for ( j = 0; j < 1U << order; j++ )
- {
- sp[j].u.sh.type = 0;
- sp[j].u.sh.pinned = 0;
- sp[j].u.sh.count = 0;
- sp[j].tlbflush_timestamp = 0; /* Not in any TLB */
- }
- sp->v.free.order = order;
- page_list_add_tail(sp, &d->arch.paging.shadow.freelists[order]);
+ d->arch.paging.shadow.free_pages++;
+ d->arch.paging.shadow.total_pages++;
+ sp->u.sh.type = 0;
+ sp->u.sh.pinned = 0;
+ sp->u.sh.count = 0;
+ sp->tlbflush_timestamp = 0; /* Not in any TLB */
+ page_list_add_tail(sp, &d->arch.paging.shadow.freelist);
}
else if ( d->arch.paging.shadow.total_pages > pages )
{
/* Need to return memory to domheap */
- _shadow_prealloc(d, order, 1);
- sp = page_list_remove_head(&d->arch.paging.shadow.freelists[order]);
+ _shadow_prealloc(d, 1, 1);
+ sp = page_list_remove_head(&d->arch.paging.shadow.freelist);
ASSERT(sp);
/*
* The pages were allocated anonymously, but the owner field
* gets overwritten normally, so need to clear it here.
*/
- for ( j = 0; j < 1U << order; j++ )
- page_set_owner(&((struct page_info *)sp)[j], NULL);
- d->arch.paging.shadow.free_pages -= 1 << order;
- d->arch.paging.shadow.total_pages -= 1 << order;
- free_domheap_pages((struct page_info *)sp, order);
+ page_set_owner(sp, NULL);
+ d->arch.paging.shadow.free_pages--;
+ d->arch.paging.shadow.total_pages--;
+ free_domheap_page(sp);
}
/* Check to see if we need to yield and try again */
@@ -3223,7 +3116,6 @@ void shadow_teardown(struct domain *d)
{
struct vcpu *v;
mfn_t mfn;
- struct page_info *pg;
struct p2m_domain *p2m = p2m_get_hostp2m(d);
ASSERT(d->is_dying);
@@ -3277,9 +3169,6 @@ void shadow_teardown(struct domain *d)
#endif /* OOS */
}
#endif /* (SHADOW_OPTIMIZATIONS & (SHOPT_VIRTUAL_TLB|SHOPT_OUT_OF_SYNC)) */
-
- while ( (pg = page_list_remove_head(&d->arch.paging.shadow.p2m_freelist)) )
- shadow_free_p2m_page(p2m, pg);
if ( d->arch.paging.shadow.total_pages != 0 )
{
diff -r 10627fb7c8cf -r 22241338e51a xen/arch/x86/mm/shadow/multi.c
--- a/xen/arch/x86/mm/shadow/multi.c Fri Aug 20 16:52:14 2010 +0100
+++ b/xen/arch/x86/mm/shadow/multi.c Fri Aug 20 16:52:14 2010 +0100
@@ -437,7 +437,6 @@ static inline mfn_t sh_next_page(mfn_t s
next = _mfn(pdx_to_pfn(pg->list.next));
- /* XXX not for long */ ASSERT(mfn_x(next) == mfn_x(smfn) + 1);
ASSERT(mfn_to_page(next)->u.sh.type == pg->u.sh.type);
ASSERT(!mfn_to_page(next)->u.sh.head);
return next;
diff -r 10627fb7c8cf -r 22241338e51a xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h Fri Aug 20 16:52:14 2010 +0100
+++ b/xen/include/asm-x86/domain.h Fri Aug 20 16:52:14 2010 +0100
@@ -97,8 +97,7 @@ struct shadow_domain {
struct page_list_head pinned_shadows;
/* Memory allocation */
- struct page_list_head freelists[SHADOW_MAX_ORDER + 1];
- struct page_list_head p2m_freelist;
+ struct page_list_head freelist;
unsigned int total_pages; /* number of pages allocated */
unsigned int free_pages; /* number of pages on freelists */
unsigned int p2m_pages; /* number of pages allocates to p2m */
diff -r 10627fb7c8cf -r 22241338e51a xen/include/asm-x86/mm.h
--- a/xen/include/asm-x86/mm.h Fri Aug 20 16:52:14 2010 +0100
+++ b/xen/include/asm-x86/mm.h Fri Aug 20 16:52:14 2010 +0100
@@ -35,7 +35,7 @@ struct page_info
union {
/* Each frame can be threaded onto a doubly-linked list.
*
- * For unused shadow pages, a list of pages of this order;
+ * For unused shadow pages, a list of free shadow pages;
* for multi-page shadows, links to the other pages in this shadow;
* for pinnable shadows, if pinned, a list of all pinned shadows
* (see sh_type_is_pinnable() for the definition of "pinnable"
@@ -94,7 +94,7 @@ struct page_info
__pdx_t back;
} sh;
- /* Page is on a free list (including shadow code free lists). */
+ /* Page is on a free list. */
struct {
/* Order-size of the free chunk this page is the head of. */
unsigned int order;
@@ -258,9 +258,6 @@ struct spage_info
#elif defined(__x86_64__)
#define PRtype_info "016lx"/* should only be used for printk's */
#endif
-
-/* The order of the largest allocation unit we use for shadow pages */
-#define SHADOW_MAX_ORDER 2 /* Need up to 16k allocs for 32-bit on PAE/64 */
/* The number of out-of-sync shadows we allow per vcpu (prime, please) */
#define SHADOW_OOS_PAGES 3
* RE: [PATCH 0 of 4] [RFC] x86 shadow: get rid of the need for contiguous memory
From: Dan Magenheimer @ 2010-08-20 16:29 UTC
To: Tim Deegan, xen-devel
ooohh!! thank you thank you
/me wonders how many other order>0 allocations are left now
that will break a running or newly-launching domain when the
allocation fails due to fragmentation, and if it will be easy
to track them down and shoot them
* Re: [PATCH 0 of 4] [RFC] x86 shadow: get rid of the need for contiguous memory
From: Keir Fraser @ 2010-08-20 16:49 UTC
To: Dan Magenheimer, Tim Deegan, xen-devel@lists.xensource.com
Hack up the vcpu and domain structures, and handle only being able to
allocate a single-page hypercall compat translation page, and that might be
pretty much it.
-- Keir