[PATCH v2 for-next v2 0/8] Refactor x86 mm.c

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 for-next v2 0/8] Refactor x86 mm.c
@ 2017-04-03 11:22 Wei Liu
  2017-04-03 11:22 ` [PATCH v2 for-next v2 1/8] mm: provide more grep fodders Wei Liu
                   ` (7 more replies)
  0 siblings, 8 replies; 25+ messages in thread
From: Wei Liu @ 2017-04-03 11:22 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

This serie split PV and HVM specific code from x86/mm.c and x86_64/mm.c.

A lot more code is moved in this version, given that I am now able to move
do_mmuext_op.

Wei.

Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Tim Deegan <tim@xen.org>
Cc: George Dunlap <george.dunlap@eu.citrix.com>

Wei Liu (8):
  mm: provide more grep fodders
  x86/mm: carve out create_grant_pv_mapping
  x86/mm: carve out replace_grant_pv_mapping
  x86/mm: extract PAGE_CACHE_ATTRS to mm.h
  x86/mm: split PV guest supporting code to pv/mm.c
  x86/mm: move two PV hypercalls from x86_64/mm.c to pv/mm.c
  x86/mm: move check_descriptor to pv/mm.c
  x86/mm: split HVM grant table code to hvm/mm.c

 xen/arch/x86/hvm/Makefile         |    1 +
 xen/arch/x86/hvm/mm.c             |   85 +
 xen/arch/x86/mm.c                 | 4983 ++++---------------------------------
 xen/arch/x86/pv/Makefile          |    1 +
 xen/arch/x86/pv/mm.c              | 4272 +++++++++++++++++++++++++++++++
 xen/arch/x86/x86_64/mm.c          |  157 --
 xen/include/asm-x86/grant_table.h |    9 +
 xen/include/asm-x86/mm.h          |   13 +-
 xen/include/xen/mm.h              |   13 +
 9 files changed, 4830 insertions(+), 4704 deletions(-)
 create mode 100644 xen/arch/x86/hvm/mm.c
 create mode 100644 xen/arch/x86/pv/mm.c

-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 for-next v2 1/8] mm: provide more grep fodders
  2017-04-03 11:22 [PATCH v2 for-next v2 0/8] Refactor x86 mm.c Wei Liu
@ 2017-04-03 11:22 ` Wei Liu
  2017-04-03 11:26   ` George Dunlap
  2017-04-19 15:08   ` Jan Beulich
  2017-04-03 11:22 ` [PATCH v2 for-next v2 2/8] x86/mm: carve out create_grant_pv_mapping Wei Liu
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 25+ messages in thread
From: Wei Liu @ 2017-04-03 11:22 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

Define several _* and *_x macros for better grep-ability. This also
helps indexing tool like GNU Global.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/include/xen/mm.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 88de3c1fa6..c92cba41a0 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -60,7 +60,11 @@ TYPE_SAFE(unsigned long, mfn);
 
 #ifndef mfn_t
 #define mfn_t /* Grep fodder: mfn_t, _mfn() and mfn_x() are defined above */
+#define _mfn
+#define mfn_x
 #undef mfn_t
+#undef _mfn
+#undef mfn_x
 #endif
 
 static inline mfn_t mfn_add(mfn_t mfn, unsigned long i)
@@ -89,7 +93,11 @@ TYPE_SAFE(unsigned long, gfn);
 
 #ifndef gfn_t
 #define gfn_t /* Grep fodder: gfn_t, _gfn() and gfn_x() are defined above */
+#define _gfn
+#define gfn_x
 #undef gfn_t
+#undef _gfn
+#undef gfn_x
 #endif
 
 static inline gfn_t gfn_add(gfn_t gfn, unsigned long i)
@@ -118,7 +126,11 @@ TYPE_SAFE(unsigned long, pfn);
 
 #ifndef pfn_t
 #define pfn_t /* Grep fodder: pfn_t, _pfn() and pfn_x() are defined above */
+#define _pfn
+#define pfn_x
 #undef pfn_t
+#undef _pfn
+#undef pfn_x
 #endif
 
 struct page_info;
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 for-next v2 1/8] mm: provide more grep fodders
  2017-04-03 11:22 ` [PATCH v2 for-next v2 1/8] mm: provide more grep fodders Wei Liu
@ 2017-04-03 11:26   ` George Dunlap
  2017-04-19 15:08   ` Jan Beulich
  1 sibling, 0 replies; 25+ messages in thread
From: George Dunlap @ 2017-04-03 11:26 UTC (permalink / raw)
  To: Wei Liu, Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Jan Beulich

On 03/04/17 12:22, Wei Liu wrote:
> Define several _* and *_x macros for better grep-ability. This also
> helps indexing tool like GNU Global.
> 
> No functional change.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

"fodder" is uncountable, so the patch title should be "provide more grep
fodder".

With that fixed:

Acked-by: George Dunlap <george.dunlap@citrix.com>

> ---
>  xen/include/xen/mm.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
> index 88de3c1fa6..c92cba41a0 100644
> --- a/xen/include/xen/mm.h
> +++ b/xen/include/xen/mm.h
> @@ -60,7 +60,11 @@ TYPE_SAFE(unsigned long, mfn);
>  
>  #ifndef mfn_t
>  #define mfn_t /* Grep fodder: mfn_t, _mfn() and mfn_x() are defined above */
> +#define _mfn
> +#define mfn_x
>  #undef mfn_t
> +#undef _mfn
> +#undef mfn_x
>  #endif
>  
>  static inline mfn_t mfn_add(mfn_t mfn, unsigned long i)
> @@ -89,7 +93,11 @@ TYPE_SAFE(unsigned long, gfn);
>  
>  #ifndef gfn_t
>  #define gfn_t /* Grep fodder: gfn_t, _gfn() and gfn_x() are defined above */
> +#define _gfn
> +#define gfn_x
>  #undef gfn_t
> +#undef _gfn
> +#undef gfn_x
>  #endif
>  
>  static inline gfn_t gfn_add(gfn_t gfn, unsigned long i)
> @@ -118,7 +126,11 @@ TYPE_SAFE(unsigned long, pfn);
>  
>  #ifndef pfn_t
>  #define pfn_t /* Grep fodder: pfn_t, _pfn() and pfn_x() are defined above */
> +#define _pfn
> +#define pfn_x
>  #undef pfn_t
> +#undef _pfn
> +#undef pfn_x
>  #endif
>  
>  struct page_info;
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 for-next v2 1/8] mm: provide more grep fodders
  2017-04-03 11:22 ` [PATCH v2 for-next v2 1/8] mm: provide more grep fodders Wei Liu
  2017-04-03 11:26   ` George Dunlap
@ 2017-04-19 15:08   ` Jan Beulich
  1 sibling, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2017-04-19 15:08 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Xen-devel

>>> On 03.04.17 at 13:22, <wei.liu2@citrix.com> wrote:
> Define several _* and *_x macros for better grep-ability. This also
> helps indexing tool like GNU Global.
> 
> No functional change.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 for-next v2 2/8] x86/mm: carve out create_grant_pv_mapping
  2017-04-03 11:22 [PATCH v2 for-next v2 0/8] Refactor x86 mm.c Wei Liu
  2017-04-03 11:22 ` [PATCH v2 for-next v2 1/8] mm: provide more grep fodders Wei Liu
@ 2017-04-03 11:22 ` Wei Liu
  2017-04-19 15:12   ` Jan Beulich
  2017-04-03 11:22 ` [PATCH v2 for-next v2 3/8] x86/mm: carve out replace_grant_pv_mapping Wei Liu
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Wei Liu @ 2017-04-03 11:22 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

We will later split out PV specific code to pv/mm.c.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index be4e308608..2137f2b6f8 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4269,15 +4269,12 @@ static int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
         return GNTST_okay;
 }
 
-int create_grant_host_mapping(uint64_t addr, unsigned long frame, 
-                              unsigned int flags, unsigned int cache_flags)
+static int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                                   unsigned int flags, unsigned int cache_flags)
 {
     l1_pgentry_t pte;
     uint32_t grant_pte_flags;
 
-    if ( paging_mode_external(current->domain) )
-        return create_grant_p2m_mapping(addr, frame, flags, cache_flags);
-
     grant_pte_flags =
         _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_GNTTAB;
     if ( cpu_has_nx )
@@ -4300,6 +4297,15 @@ int create_grant_host_mapping(uint64_t addr, unsigned long frame,
     return create_grant_va_mapping(addr, pte, current);
 }
 
+int create_grant_host_mapping(uint64_t addr, unsigned long frame,
+                              unsigned int flags, unsigned int cache_flags)
+{
+    if ( paging_mode_external(current->domain) )
+        return create_grant_p2m_mapping(addr, frame, flags, cache_flags);
+
+    return create_grant_pv_mapping(addr, frame, flags, cache_flags);
+}
+
 static int replace_grant_p2m_mapping(
     uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags)
 {
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 for-next v2 2/8] x86/mm: carve out create_grant_pv_mapping
  2017-04-03 11:22 ` [PATCH v2 for-next v2 2/8] x86/mm: carve out create_grant_pv_mapping Wei Liu
@ 2017-04-19 15:12   ` Jan Beulich
  2017-04-21 10:13     ` Wei Liu
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2017-04-19 15:12 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Xen-devel

>>> On 03.04.17 at 13:22, <wei.liu2@citrix.com> wrote:
> @@ -4300,6 +4297,15 @@ int create_grant_host_mapping(uint64_t addr, unsigned long frame,
>      return create_grant_va_mapping(addr, pte, current);
>  }
>  
> +int create_grant_host_mapping(uint64_t addr, unsigned long frame,
> +                              unsigned int flags, unsigned int cache_flags)
> +{
> +    if ( paging_mode_external(current->domain) )
> +        return create_grant_p2m_mapping(addr, frame, flags, cache_flags);
> +
> +    return create_grant_pv_mapping(addr, frame, flags, cache_flags);
> +}

I wonder whether this and its siblings wouldn't better become inline
functions.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 for-next v2 2/8] x86/mm: carve out create_grant_pv_mapping
  2017-04-19 15:12   ` Jan Beulich
@ 2017-04-21 10:13     ` Wei Liu
  2017-04-21 10:36       ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Wei Liu @ 2017-04-21 10:13 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, WeiLiu, Xen-devel

On Wed, Apr 19, 2017 at 09:12:54AM -0600, Jan Beulich wrote:
> >>> On 03.04.17 at 13:22, <wei.liu2@citrix.com> wrote:
> > @@ -4300,6 +4297,15 @@ int create_grant_host_mapping(uint64_t addr, unsigned long frame,
> >      return create_grant_va_mapping(addr, pte, current);
> >  }
> >  
> > +int create_grant_host_mapping(uint64_t addr, unsigned long frame,
> > +                              unsigned int flags, unsigned int cache_flags)
> > +{
> > +    if ( paging_mode_external(current->domain) )
> > +        return create_grant_p2m_mapping(addr, frame, flags, cache_flags);
> > +
> > +    return create_grant_pv_mapping(addr, frame, flags, cache_flags);
> > +}
> 
> I wonder whether this and its siblings wouldn't better become inline
> functions.
> 

Do you want me to just add "inline"? That's fine by me.

Wei.

> Jan
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 for-next v2 2/8] x86/mm: carve out create_grant_pv_mapping
  2017-04-21 10:13     ` Wei Liu
@ 2017-04-21 10:36       ` Jan Beulich
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2017-04-21 10:36 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Xen-devel

>>> On 21.04.17 at 12:13, <wei.liu2@citrix.com> wrote:
> On Wed, Apr 19, 2017 at 09:12:54AM -0600, Jan Beulich wrote:
>> >>> On 03.04.17 at 13:22, <wei.liu2@citrix.com> wrote:
>> > @@ -4300,6 +4297,15 @@ int create_grant_host_mapping(uint64_t addr, unsigned long frame,
>> >      return create_grant_va_mapping(addr, pte, current);
>> >  }
>> >  
>> > +int create_grant_host_mapping(uint64_t addr, unsigned long frame,
>> > +                              unsigned int flags, unsigned int cache_flags)
>> > +{
>> > +    if ( paging_mode_external(current->domain) )
>> > +        return create_grant_p2m_mapping(addr, frame, flags, cache_flags);
>> > +
>> > +    return create_grant_pv_mapping(addr, frame, flags, cache_flags);
>> > +}
>> 
>> I wonder whether this and its siblings wouldn't better become inline
>> functions.
> 
> Do you want me to just add "inline"? That's fine by me.

Well, plus "static", i.e. the implication was for them to also move
to asm-x86/grant_table.h then.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 for-next v2 3/8] x86/mm: carve out replace_grant_pv_mapping
  2017-04-03 11:22 [PATCH v2 for-next v2 0/8] Refactor x86 mm.c Wei Liu
  2017-04-03 11:22 ` [PATCH v2 for-next v2 1/8] mm: provide more grep fodders Wei Liu
  2017-04-03 11:22 ` [PATCH v2 for-next v2 2/8] x86/mm: carve out create_grant_pv_mapping Wei Liu
@ 2017-04-03 11:22 ` Wei Liu
  2017-04-03 11:22 ` [PATCH v2 for-next v2 4/8] x86/mm: extract PAGE_CACHE_ATTRS to mm.h Wei Liu
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Wei Liu @ 2017-04-03 11:22 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

We will later split out PV specific code to pv/mm.c.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 2137f2b6f8..823aba01a8 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4332,23 +4332,20 @@ static int replace_grant_p2m_mapping(
     return GNTST_okay;
 }
 
-int replace_grant_host_mapping(
-    uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags)
+static int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                                    uint64_t new_addr, unsigned int flags)
 {
     struct vcpu *curr = current;
     l1_pgentry_t *pl1e, ol1e;
     unsigned long gl1mfn;
     struct page_info *l1pg;
     int rc;
-    
-    if ( paging_mode_external(current->domain) )
-        return replace_grant_p2m_mapping(addr, frame, new_addr, flags);
 
     if ( flags & GNTMAP_contains_pte )
     {
         if ( !new_addr )
             return destroy_grant_pte_mapping(addr, frame, curr->domain);
-        
+
         return GNTST_general_error;
     }
 
@@ -4408,6 +4405,15 @@ int replace_grant_host_mapping(
     return rc;
 }
 
+int replace_grant_host_mapping(uint64_t addr, unsigned long frame,
+                               uint64_t new_addr, unsigned int flags)
+{
+    if ( paging_mode_external(current->domain) )
+        return replace_grant_p2m_mapping(addr, frame, new_addr, flags);
+
+    return replace_grant_pv_mapping(addr, frame, new_addr, flags);
+}
+
 int donate_page(
     struct domain *d, struct page_info *page, unsigned int memflags)
 {
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 for-next v2 4/8] x86/mm: extract PAGE_CACHE_ATTRS to mm.h
  2017-04-03 11:22 [PATCH v2 for-next v2 0/8] Refactor x86 mm.c Wei Liu
                   ` (2 preceding siblings ...)
  2017-04-03 11:22 ` [PATCH v2 for-next v2 3/8] x86/mm: carve out replace_grant_pv_mapping Wei Liu
@ 2017-04-03 11:22 ` Wei Liu
  2017-04-19 15:15   ` Jan Beulich
  2017-04-03 11:22 ` [PATCH v2 for-next v2 5/8] x86/mm: split PV guest supporting code to pv/mm.c Wei Liu
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Wei Liu @ 2017-04-03 11:22 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c        | 2 --
 xen/include/asm-x86/mm.h | 2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 823aba01a8..e1ce77b9ac 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -148,8 +148,6 @@ bool_t __read_mostly machine_to_phys_mapping_valid = 0;
 
 struct rangeset *__read_mostly mmio_ro_ranges;
 
-#define PAGE_CACHE_ATTRS (_PAGE_PAT|_PAGE_PCD|_PAGE_PWT)
-
 bool_t __read_mostly opt_allow_superpage;
 boolean_param("allowsuperpage", opt_allow_superpage);
 
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index e22603ce98..8e55593154 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -602,4 +602,6 @@ extern const char zero_page[];
 /* Build a 32bit PSE page table using 4MB pages. */
 void write_32bit_pse_identmap(uint32_t *l2);
 
+#define PAGE_CACHE_ATTRS (_PAGE_PAT|_PAGE_PCD|_PAGE_PWT)
+
 #endif /* __ASM_X86_MM_H__ */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 for-next v2 4/8] x86/mm: extract PAGE_CACHE_ATTRS to mm.h
  2017-04-03 11:22 ` [PATCH v2 for-next v2 4/8] x86/mm: extract PAGE_CACHE_ATTRS to mm.h Wei Liu
@ 2017-04-19 15:15   ` Jan Beulich
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2017-04-19 15:15 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Xen-devel

>>> On 03.04.17 at 13:22, <wei.liu2@citrix.com> wrote:
> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -602,4 +602,6 @@ extern const char zero_page[];
>  /* Build a 32bit PSE page table using 4MB pages. */
>  void write_32bit_pse_identmap(uint32_t *l2);
>  
> +#define PAGE_CACHE_ATTRS (_PAGE_PAT|_PAGE_PCD|_PAGE_PWT)

These better go next to the _PAGE_* definitions (in page.h).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 for-next v2 5/8] x86/mm: split PV guest supporting code to pv/mm.c
  2017-04-03 11:22 [PATCH v2 for-next v2 0/8] Refactor x86 mm.c Wei Liu
                   ` (3 preceding siblings ...)
  2017-04-03 11:22 ` [PATCH v2 for-next v2 4/8] x86/mm: extract PAGE_CACHE_ATTRS to mm.h Wei Liu
@ 2017-04-03 11:22 ` Wei Liu
  2017-04-19 15:31   ` Jan Beulich
  2017-04-03 11:22 ` [PATCH v2 for-next v2 6/8] x86/mm: move two PV hypercalls from x86_64/mm.c " Wei Liu
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Wei Liu @ 2017-04-03 11:22 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

Move the following PV specific code to the new file:

1. Several hypercalls that are tied to PV:
   1. do_mmuext_op
   2. do_mmu_update
   3. do_update_va_mapping
   4. do_update_va_mapping_otherdomain
   5. do_set_gdt
   6. do_update_descriptor
2. PV MMIO emulation code
3. PV writable page table emulation code
4. PV grant table mapping creation / destruction code
5. Other supporting code for the above items

Move everything in one patch because they share a lot of code. Also move
the PV page table API comment to the new file. Remove all trailing
white spaces.

Due to the code movement, a few functions are exported via relevant
header files. Some configuration variables are made non-static.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c                 | 4964 ++++---------------------------------
 xen/arch/x86/pv/Makefile          |    1 +
 xen/arch/x86/pv/mm.c              | 4118 ++++++++++++++++++++++++++++++
 xen/include/asm-x86/grant_table.h |    4 +
 xen/include/asm-x86/mm.h          |    9 +
 xen/include/xen/mm.h              |    1 +
 6 files changed, 4581 insertions(+), 4516 deletions(-)
 create mode 100644 xen/arch/x86/pv/mm.c

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e1ce77b9ac..169ae7e4a1 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -18,71 +18,6 @@
  * along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
-/*
- * A description of the x86 page table API:
- * 
- * Domains trap to do_mmu_update with a list of update requests.
- * This is a list of (ptr, val) pairs, where the requested operation
- * is *ptr = val.
- * 
- * Reference counting of pages:
- * ----------------------------
- * Each page has two refcounts: tot_count and type_count.
- * 
- * TOT_COUNT is the obvious reference count. It counts all uses of a
- * physical page frame by a domain, including uses as a page directory,
- * a page table, or simple mappings via a PTE. This count prevents a
- * domain from releasing a frame back to the free pool when it still holds
- * a reference to it.
- * 
- * TYPE_COUNT is more subtle. A frame can be put to one of three
- * mutually-exclusive uses: it might be used as a page directory, or a
- * page table, or it may be mapped writable by the domain [of course, a
- * frame may not be used in any of these three ways!].
- * So, type_count is a count of the number of times a frame is being 
- * referred to in its current incarnation. Therefore, a page can only
- * change its type when its type count is zero.
- * 
- * Pinning the page type:
- * ----------------------
- * The type of a page can be pinned/unpinned with the commands
- * MMUEXT_[UN]PIN_L?_TABLE. Each page can be pinned exactly once (that is,
- * pinning is not reference counted, so it can't be nested).
- * This is useful to prevent a page's type count falling to zero, at which
- * point safety checks would need to be carried out next time the count
- * is increased again.
- * 
- * A further note on writable page mappings:
- * -----------------------------------------
- * For simplicity, the count of writable mappings for a page may not
- * correspond to reality. The 'writable count' is incremented for every
- * PTE which maps the page with the _PAGE_RW flag set. However, for
- * write access to be possible the page directory entry must also have
- * its _PAGE_RW bit set. We do not check this as it complicates the 
- * reference counting considerably [consider the case of multiple
- * directory entries referencing a single page table, some with the RW
- * bit set, others not -- it starts getting a bit messy].
- * In normal use, this simplification shouldn't be a problem.
- * However, the logic can be added if required.
- * 
- * One more note on read-only page mappings:
- * -----------------------------------------
- * We want domains to be able to map pages for read-only access. The
- * main reason is that page tables and directories should be readable
- * by a domain, but it would not be safe for them to be writable.
- * However, domains have free access to rings 1 & 2 of the Intel
- * privilege model. In terms of page protection, these are considered
- * to be part of 'supervisor mode'. The WP bit in CR0 controls whether
- * read-only restrictions are respected in supervisor mode -- if the 
- * bit is clear then any mapped page is writable.
- * 
- * We get round this by always setting the WP bit and disallowing 
- * updates to it. This is very unlikely to cause a problem for guest
- * OS's, which will generally use the WP bit to simplify copy-on-write
- * implementation (in that case, OS wants a fault when it writes to
- * an application-supplied buffer).
- */
-
 #include <xen/init.h>
 #include <xen/kernel.h>
 #include <xen/lib.h>
@@ -151,30 +86,9 @@ struct rangeset *__read_mostly mmio_ro_ranges;
 bool_t __read_mostly opt_allow_superpage;
 boolean_param("allowsuperpage", opt_allow_superpage);
 
-static void put_superpage(unsigned long mfn);
-
-static uint32_t base_disallow_mask;
-/* Global bit is allowed to be set on L1 PTEs. Intended for user mappings. */
-#define L1_DISALLOW_MASK ((base_disallow_mask | _PAGE_GNTTAB) & ~_PAGE_GLOBAL)
-
-#define L2_DISALLOW_MASK (unlikely(opt_allow_superpage) \
-                          ? base_disallow_mask & ~_PAGE_PSE \
-                          : base_disallow_mask)
-
-#define l3_disallow_mask(d) (!is_pv_32bit_domain(d) ? \
-                             base_disallow_mask : 0xFFFFF198U)
-
-#define L4_DISALLOW_MASK (base_disallow_mask)
-
-#define l1_disallow_mask(d)                                     \
-    ((d != dom_io) &&                                           \
-     (rangeset_is_empty((d)->iomem_caps) &&                     \
-      rangeset_is_empty((d)->arch.ioport_caps) &&               \
-      !has_arch_pdevs(d) &&                                     \
-      is_pv_domain(d)) ?                                        \
-     L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
+uint32_t base_disallow_mask;
 
-static s8 __read_mostly opt_mmio_relax;
+s8 __read_mostly opt_mmio_relax;
 static void __init parse_mmio_relax(const char *s)
 {
     if ( !*s )
@@ -539,165 +453,7 @@ void update_cr3(struct vcpu *v)
     make_cr3(v, cr3_mfn);
 }
 
-/* Get a mapping of a PV guest's l1e for this virtual address. */
-static l1_pgentry_t *guest_map_l1e(unsigned long addr, unsigned long *gl1mfn)
-{
-    l2_pgentry_t l2e;
-
-    ASSERT(!paging_mode_translate(current->domain));
-    ASSERT(!paging_mode_external(current->domain));
-
-    if ( unlikely(!__addr_ok(addr)) )
-        return NULL;
-
-    /* Find this l1e and its enclosing l1mfn in the linear map. */
-    if ( __copy_from_user(&l2e,
-                          &__linear_l2_table[l2_linear_offset(addr)],
-                          sizeof(l2_pgentry_t)) )
-        return NULL;
-
-    /* Check flags that it will be safe to read the l1e. */
-    if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT )
-        return NULL;
-
-    *gl1mfn = l2e_get_pfn(l2e);
-
-    return (l1_pgentry_t *)map_domain_page(_mfn(*gl1mfn)) +
-           l1_table_offset(addr);
-}
-
-/* Pull down the mapping we got from guest_map_l1e(). */
-static inline void guest_unmap_l1e(void *p)
-{
-    unmap_domain_page(p);
-}
-
-/* Read a PV guest's l1e that maps this virtual address. */
-static inline void guest_get_eff_l1e(unsigned long addr, l1_pgentry_t *eff_l1e)
-{
-    ASSERT(!paging_mode_translate(current->domain));
-    ASSERT(!paging_mode_external(current->domain));
-
-    if ( unlikely(!__addr_ok(addr)) ||
-         __copy_from_user(eff_l1e,
-                          &__linear_l1_table[l1_linear_offset(addr)],
-                          sizeof(l1_pgentry_t)) )
-        *eff_l1e = l1e_empty();
-}
-
-/*
- * Read the guest's l1e that maps this address, from the kernel-mode
- * page tables.
- */
-static inline void guest_get_eff_kern_l1e(struct vcpu *v, unsigned long addr,
-                                          void *eff_l1e)
-{
-    bool_t user_mode = !(v->arch.flags & TF_kernel_mode);
-#define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
-
-    TOGGLE_MODE();
-    guest_get_eff_l1e(addr, eff_l1e);
-    TOGGLE_MODE();
-}
-
-const char __section(".bss.page_aligned.const") __aligned(PAGE_SIZE)
-    zero_page[PAGE_SIZE];
-
-static void invalidate_shadow_ldt(struct vcpu *v, int flush)
-{
-    l1_pgentry_t *pl1e;
-    unsigned int i;
-    struct page_info *page;
-
-    BUG_ON(unlikely(in_irq()));
-
-    spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock);
-
-    if ( v->arch.pv_vcpu.shadow_ldt_mapcnt == 0 )
-        goto out;
-
-    v->arch.pv_vcpu.shadow_ldt_mapcnt = 0;
-    pl1e = gdt_ldt_ptes(v->domain, v);
-
-    for ( i = 16; i < 32; i++ )
-    {
-        if ( !(l1e_get_flags(pl1e[i]) & _PAGE_PRESENT) )
-            continue;
-        page = l1e_get_page(pl1e[i]);
-        l1e_write(&pl1e[i], l1e_empty());
-        ASSERT_PAGE_IS_TYPE(page, PGT_seg_desc_page);
-        ASSERT_PAGE_IS_DOMAIN(page, v->domain);
-        put_page_and_type(page);
-    }
-
-    /* Rid TLBs of stale mappings (guest mappings and shadow mappings). */
-    if ( flush )
-        flush_tlb_mask(v->vcpu_dirty_cpumask);
-
- out:
-    spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock);
-}
-
-
-static int alloc_segdesc_page(struct page_info *page)
-{
-    const struct domain *owner = page_get_owner(page);
-    struct desc_struct *descs = __map_domain_page(page);
-    unsigned i;
-
-    for ( i = 0; i < 512; i++ )
-        if ( unlikely(!check_descriptor(owner, &descs[i])) )
-            break;
-
-    unmap_domain_page(descs);
-
-    return i == 512 ? 0 : -EINVAL;
-}
-
-
-/* Map shadow page at offset @off. */
-int map_ldt_shadow_page(unsigned int off)
-{
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    unsigned long gmfn;
-    struct page_info *page;
-    l1_pgentry_t l1e, nl1e;
-    unsigned long gva = v->arch.pv_vcpu.ldt_base + (off << PAGE_SHIFT);
-    int okay;
-
-    BUG_ON(unlikely(in_irq()));
-
-    if ( is_pv_32bit_domain(d) )
-        gva = (u32)gva;
-    guest_get_eff_kern_l1e(v, gva, &l1e);
-    if ( unlikely(!(l1e_get_flags(l1e) & _PAGE_PRESENT)) )
-        return 0;
-
-    gmfn = l1e_get_pfn(l1e);
-    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
-    if ( unlikely(!page) )
-        return 0;
-
-    okay = get_page_type(page, PGT_seg_desc_page);
-    if ( unlikely(!okay) )
-    {
-        put_page(page);
-        return 0;
-    }
-
-    nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(l1e) | _PAGE_RW);
-
-    spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock);
-    l1e_write(&gdt_ldt_ptes(d, v)[off + 16], nl1e);
-    v->arch.pv_vcpu.shadow_ldt_mapcnt++;
-    spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock);
-
-    return 1;
-}
-
-
-static int get_page_from_pagenr(unsigned long page_nr, struct domain *d)
+int get_page_from_pagenr(unsigned long page_nr, struct domain *d)
 {
     struct page_info *page = mfn_to_page(page_nr);
 
@@ -712,11 +468,11 @@ static int get_page_from_pagenr(unsigned long page_nr, struct domain *d)
 }
 
 
-static int get_page_and_type_from_pagenr(unsigned long page_nr, 
-                                         unsigned long type,
-                                         struct domain *d,
-                                         int partial,
-                                         int preemptible)
+int get_page_and_type_from_pagenr(unsigned long page_nr,
+                                  unsigned long type,
+                                  struct domain *d,
+                                  int partial,
+                                  int preemptible)
 {
     struct page_info *page = mfn_to_page(page_nr);
     int rc;
@@ -736,72 +492,6 @@ static int get_page_and_type_from_pagenr(unsigned long page_nr,
     return rc;
 }
 
-static void put_data_page(
-    struct page_info *page, int writeable)
-{
-    if ( writeable )
-        put_page_and_type(page);
-    else
-        put_page(page);
-}
-
-/*
- * We allow root tables to map each other (a.k.a. linear page tables). It
- * needs some special care with reference counts and access permissions:
- *  1. The mapping entry must be read-only, or the guest may get write access
- *     to its own PTEs.
- *  2. We must only bump the reference counts for an *already validated*
- *     L2 table, or we can end up in a deadlock in get_page_type() by waiting
- *     on a validation that is required to complete that validation.
- *  3. We only need to increment the reference counts for the mapped page
- *     frame if it is mapped by a different root table. This is sufficient and
- *     also necessary to allow validation of a root table mapping itself.
- */
-#define define_get_linear_pagetable(level)                                  \
-static int                                                                  \
-get_##level##_linear_pagetable(                                             \
-    level##_pgentry_t pde, unsigned long pde_pfn, struct domain *d)         \
-{                                                                           \
-    unsigned long x, y;                                                     \
-    struct page_info *page;                                                 \
-    unsigned long pfn;                                                      \
-                                                                            \
-    if ( (level##e_get_flags(pde) & _PAGE_RW) )                             \
-    {                                                                       \
-        gdprintk(XENLOG_WARNING,                                            \
-                 "Attempt to create linear p.t. with write perms\n");       \
-        return 0;                                                           \
-    }                                                                       \
-                                                                            \
-    if ( (pfn = level##e_get_pfn(pde)) != pde_pfn )                         \
-    {                                                                       \
-        /* Make sure the mapped frame belongs to the correct domain. */     \
-        if ( unlikely(!get_page_from_pagenr(pfn, d)) )                      \
-            return 0;                                                       \
-                                                                            \
-        /*                                                                  \
-         * Ensure that the mapped frame is an already-validated page table. \
-         * If so, atomically increment the count (checking for overflow).   \
-         */                                                                 \
-        page = mfn_to_page(pfn);                                            \
-        y = page->u.inuse.type_info;                                        \
-        do {                                                                \
-            x = y;                                                          \
-            if ( unlikely((x & PGT_count_mask) == PGT_count_mask) ||        \
-                 unlikely((x & (PGT_type_mask|PGT_validated)) !=            \
-                          (PGT_##level##_page_table|PGT_validated)) )       \
-            {                                                               \
-                put_page(page);                                             \
-                return 0;                                                   \
-            }                                                               \
-        }                                                                   \
-        while ( (y = cmpxchg(&page->u.inuse.type_info, x, x + 1)) != x );   \
-    }                                                                       \
-                                                                            \
-    return 1;                                                               \
-}
-
-
 bool is_iomem_page(mfn_t mfn)
 {
     struct page_info *page;
@@ -816,7 +506,7 @@ bool is_iomem_page(mfn_t mfn)
     return (page_get_owner(page) == dom_io);
 }
 
-static int update_xen_mappings(unsigned long mfn, unsigned int cacheattr)
+int update_xen_mappings(unsigned long mfn, unsigned int cacheattr)
 {
     int err = 0;
     bool_t alias = mfn >= PFN_DOWN(xen_phys_start) &&
@@ -834,3414 +524,489 @@ static int update_xen_mappings(unsigned long mfn, unsigned int cacheattr)
     return err;
 }
 
-#ifndef NDEBUG
-struct mmio_emul_range_ctxt {
-    const struct domain *d;
-    unsigned long mfn;
-};
-
-static int print_mmio_emul_range(unsigned long s, unsigned long e, void *arg)
+bool_t fill_ro_mpt(unsigned long mfn)
 {
-    const struct mmio_emul_range_ctxt *ctxt = arg;
-
-    if ( ctxt->mfn > e )
-        return 0;
+    l4_pgentry_t *l4tab = map_domain_page(_mfn(mfn));
+    bool_t ret = 0;
 
-    if ( ctxt->mfn >= s )
+    if ( !l4e_get_intpte(l4tab[l4_table_offset(RO_MPT_VIRT_START)]) )
     {
-        static DEFINE_SPINLOCK(last_lock);
-        static const struct domain *last_d;
-        static unsigned long last_s = ~0UL, last_e;
-        bool_t print = 0;
+        l4tab[l4_table_offset(RO_MPT_VIRT_START)] =
+            idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)];
+        ret = 1;
+    }
+    unmap_domain_page(l4tab);
 
-        spin_lock(&last_lock);
-        if ( last_d != ctxt->d || last_s != s || last_e != e )
-        {
-            last_d = ctxt->d;
-            last_s = s;
-            last_e = e;
-            print = 1;
-        }
-        spin_unlock(&last_lock);
+    return ret;
+}
 
-        if ( print )
-            printk(XENLOG_G_INFO
-                   "d%d: Forcing write emulation on MFNs %lx-%lx\n",
-                   ctxt->d->domain_id, s, e);
-    }
+void zap_ro_mpt(unsigned long mfn)
+{
+    l4_pgentry_t *l4tab = map_domain_page(_mfn(mfn));
 
-    return 1;
+    l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
+    unmap_domain_page(l4tab);
 }
-#endif
 
-int
-get_page_from_l1e(
-    l1_pgentry_t l1e, struct domain *l1e_owner, struct domain *pg_owner)
+int page_lock(struct page_info *page)
 {
-    unsigned long mfn = l1e_get_pfn(l1e);
-    struct page_info *page = mfn_to_page(mfn);
-    uint32_t l1f = l1e_get_flags(l1e);
-    struct vcpu *curr = current;
-    struct domain *real_pg_owner;
-    bool_t write;
-
-    if ( !(l1f & _PAGE_PRESENT) )
-        return 0;
+    unsigned long x, nx;
 
-    if ( unlikely(l1f & l1_disallow_mask(l1e_owner)) )
-    {
-        gdprintk(XENLOG_WARNING, "Bad L1 flags %x\n",
-                 l1f & l1_disallow_mask(l1e_owner));
-        return -EINVAL;
-    }
+    do {
+        while ( (x = page->u.inuse.type_info) & PGT_locked )
+            cpu_relax();
+        nx = x + (1 | PGT_locked);
+        if ( !(x & PGT_validated) ||
+             !(x & PGT_count_mask) ||
+             !(nx & PGT_count_mask) )
+            return 0;
+    } while ( cmpxchg(&page->u.inuse.type_info, x, nx) != x );
 
-    if ( !mfn_valid(_mfn(mfn)) ||
-         (real_pg_owner = page_get_owner_and_reference(page)) == dom_io )
-    {
-        int flip = 0;
+    return 1;
+}
 
-        /* Only needed the reference to confirm dom_io ownership. */
-        if ( mfn_valid(_mfn(mfn)) )
-            put_page(page);
+void page_unlock(struct page_info *page)
+{
+    unsigned long x, nx, y = page->u.inuse.type_info;
 
-        /* DOMID_IO reverts to caller for privilege checks. */
-        if ( pg_owner == dom_io )
-            pg_owner = curr->domain;
+    do {
+        x = y;
+        nx = x - (1 | PGT_locked);
+    } while ( (y = cmpxchg(&page->u.inuse.type_info, x, nx)) != x );
+}
 
-        if ( !iomem_access_permitted(pg_owner, mfn, mfn) )
-        {
-            if ( mfn != (PADDR_MASK >> PAGE_SHIFT) ) /* INVALID_MFN? */
-            {
-                gdprintk(XENLOG_WARNING,
-                         "d%d non-privileged attempt to map MMIO space %"PRI_mfn"\n",
-                         pg_owner->domain_id, mfn);
-                return -EPERM;
-            }
-            return -EINVAL;
-        }
+static int cleanup_page_cacheattr(struct page_info *page)
+{
+    unsigned int cacheattr =
+        (page->count_info & PGC_cacheattr_mask) >> PGC_cacheattr_base;
 
-        if ( pg_owner != l1e_owner &&
-             !iomem_access_permitted(l1e_owner, mfn, mfn) )
-        {
-            if ( mfn != (PADDR_MASK >> PAGE_SHIFT) ) /* INVALID_MFN? */
-            {
-                gdprintk(XENLOG_WARNING,
-                         "d%d attempted to map MMIO space %"PRI_mfn" in d%d to d%d\n",
-                         curr->domain->domain_id, mfn, pg_owner->domain_id,
-                         l1e_owner->domain_id);
-                return -EPERM;
-            }
-            return -EINVAL;
-        }
+    if ( likely(cacheattr == 0) )
+        return 0;
 
-        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
-        {
-            /* MMIO pages must not be mapped cachable unless requested so. */
-            switch ( opt_mmio_relax )
-            {
-            case 0:
-                break;
-            case 1:
-                if ( !is_hardware_domain(l1e_owner) )
-                    break;
-                /* fallthrough */
-            case -1:
-                return 0;
-            default:
-                ASSERT_UNREACHABLE();
-            }
-        }
-        else if ( l1f & _PAGE_RW )
-        {
-#ifndef NDEBUG
-            const unsigned long *ro_map;
-            unsigned int seg, bdf;
-
-            if ( !pci_mmcfg_decode(mfn, &seg, &bdf) ||
-                 ((ro_map = pci_get_ro_map(seg)) != NULL &&
-                  test_bit(bdf, ro_map)) )
-                printk(XENLOG_G_WARNING
-                       "d%d: Forcing read-only access to MFN %lx\n",
-                       l1e_owner->domain_id, mfn);
-            else
-                rangeset_report_ranges(mmio_ro_ranges, 0, ~0UL,
-                                       print_mmio_emul_range,
-                                       &(struct mmio_emul_range_ctxt){
-                                           .d = l1e_owner,
-                                           .mfn = mfn });
-#endif
-            flip = _PAGE_RW;
-        }
+    page->count_info &= ~PGC_cacheattr_mask;
 
-        switch ( l1f & PAGE_CACHE_ATTRS )
-        {
-        case 0: /* WB */
-            flip |= _PAGE_PWT | _PAGE_PCD;
-            break;
-        case _PAGE_PWT: /* WT */
-        case _PAGE_PWT | _PAGE_PAT: /* WP */
-            flip |= _PAGE_PCD | (l1f & _PAGE_PAT);
-            break;
-        }
+    BUG_ON(is_xen_heap_page(page));
 
-        return flip;
-    }
+    return update_xen_mappings(page_to_mfn(page), 0);
+}
 
-    if ( unlikely( (real_pg_owner != pg_owner) &&
-                   (real_pg_owner != dom_cow) ) )
-    {
-        /*
-         * Let privileged domains transfer the right to map their target
-         * domain's pages. This is used to allow stub-domain pvfb export to
-         * dom0, until pvfb supports granted mappings. At that time this
-         * minor hack can go away.
-         */
-        if ( (real_pg_owner == NULL) || (pg_owner == l1e_owner) ||
-             xsm_priv_mapping(XSM_TARGET, pg_owner, real_pg_owner) )
-        {
-            gdprintk(XENLOG_WARNING,
-                     "pg_owner d%d l1e_owner d%d, but real_pg_owner d%d\n",
-                     pg_owner->domain_id, l1e_owner->domain_id,
-                     real_pg_owner ? real_pg_owner->domain_id : -1);
-            goto could_not_pin;
-        }
-        pg_owner = real_pg_owner;
-    }
+void put_page(struct page_info *page)
+{
+    unsigned long nx, x, y = page->count_info;
 
-    /* Extra paranoid check for shared memory. Writable mappings 
-     * disallowed (unshare first!) */
-    if ( (l1f & _PAGE_RW) && (real_pg_owner == dom_cow) )
-        goto could_not_pin;
-
-    /* Foreign mappings into guests in shadow external mode don't
-     * contribute to writeable mapping refcounts.  (This allows the
-     * qemu-dm helper process in dom0 to map the domain's memory without
-     * messing up the count of "real" writable mappings.) */
-    write = (l1f & _PAGE_RW) &&
-            ((l1e_owner == pg_owner) || !paging_mode_external(pg_owner));
-    if ( write && !get_page_type(page, PGT_writable_page) )
-    {
-        gdprintk(XENLOG_WARNING, "Could not get page type PGT_writable_page\n");
-        goto could_not_pin;
+    do {
+        ASSERT((y & PGC_count_mask) != 0);
+        x  = y;
+        nx = x - 1;
     }
+    while ( unlikely((y = cmpxchg(&page->count_info, x, nx)) != x) );
 
-    if ( pte_flags_to_cacheattr(l1f) !=
-         ((page->count_info & PGC_cacheattr_mask) >> PGC_cacheattr_base) )
+    if ( unlikely((nx & PGC_count_mask) == 0) )
     {
-        unsigned long x, nx, y = page->count_info;
-        unsigned long cacheattr = pte_flags_to_cacheattr(l1f);
-        int err;
-
-        if ( is_xen_heap_page(page) )
-        {
-            if ( write )
-                put_page_type(page);
-            put_page(page);
+        if ( cleanup_page_cacheattr(page) == 0 )
+            free_domheap_page(page);
+        else
             gdprintk(XENLOG_WARNING,
-                     "Attempt to change cache attributes of Xen heap page\n");
-            return -EACCES;
-        }
+                     "Leaking mfn %" PRI_pfn "\n", page_to_mfn(page));
+    }
+}
 
-        do {
-            x  = y;
-            nx = (x & ~PGC_cacheattr_mask) | (cacheattr << PGC_cacheattr_base);
-        } while ( (y = cmpxchg(&page->count_info, x, nx)) != x );
 
-        err = update_xen_mappings(mfn, cacheattr);
-        if ( unlikely(err) )
-        {
-            cacheattr = y & PGC_cacheattr_mask;
-            do {
-                x  = y;
-                nx = (x & ~PGC_cacheattr_mask) | cacheattr;
-            } while ( (y = cmpxchg(&page->count_info, x, nx)) != x );
-
-            if ( write )
-                put_page_type(page);
-            put_page(page);
+struct domain *page_get_owner_and_reference(struct page_info *page)
+{
+    unsigned long x, y = page->count_info;
+    struct domain *owner;
 
-            gdprintk(XENLOG_WARNING, "Error updating mappings for mfn %" PRI_mfn
-                     " (pfn %" PRI_pfn ", from L1 entry %" PRIpte ") for d%d\n",
-                     mfn, get_gpfn_from_mfn(mfn),
-                     l1e_get_intpte(l1e), l1e_owner->domain_id);
-            return err;
-        }
+    do {
+        x = y;
+        /*
+         * Count ==  0: Page is not allocated, so we cannot take a reference.
+         * Count == -1: Reference count would wrap, which is invalid. 
+         * Count == -2: Remaining unused ref is reserved for get_page_light().
+         */
+        if ( unlikely(((x + 2) & PGC_count_mask) <= 2) )
+            return NULL;
     }
+    while ( (y = cmpxchg(&page->count_info, x, x + 1)) != x );
 
-    return 0;
+    owner = page_get_owner(page);
+    ASSERT(owner);
 
- could_not_pin:
-    gdprintk(XENLOG_WARNING, "Error getting mfn %" PRI_mfn " (pfn %" PRI_pfn
-             ") from L1 entry %" PRIpte " for l1e_owner d%d, pg_owner d%d",
-             mfn, get_gpfn_from_mfn(mfn),
-             l1e_get_intpte(l1e), l1e_owner->domain_id, pg_owner->domain_id);
-    if ( real_pg_owner != NULL )
-        put_page(page);
-    return -EBUSY;
+    return owner;
 }
 
 
-/* NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'. */
-define_get_linear_pagetable(l2);
-static int
-get_page_from_l2e(
-    l2_pgentry_t l2e, unsigned long pfn, struct domain *d)
+int get_page(struct page_info *page, struct domain *domain)
 {
-    unsigned long mfn = l2e_get_pfn(l2e);
-    int rc;
+    struct domain *owner = page_get_owner_and_reference(page);
 
-    if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) )
+    if ( likely(owner == domain) )
         return 1;
 
-    if ( unlikely((l2e_get_flags(l2e) & L2_DISALLOW_MASK)) )
-    {
-        gdprintk(XENLOG_WARNING, "Bad L2 flags %x\n",
-                 l2e_get_flags(l2e) & L2_DISALLOW_MASK);
-        return -EINVAL;
-    }
-
-    if ( !(l2e_get_flags(l2e) & _PAGE_PSE) )
-    {
-        rc = get_page_and_type_from_pagenr(mfn, PGT_l1_page_table, d, 0, 0);
-        if ( unlikely(rc == -EINVAL) && get_l2_linear_pagetable(l2e, pfn, d) )
-            rc = 0;
-        return rc;
-    }
+    if ( !paging_mode_refcounts(domain) && !domain->is_dying )
+        gprintk(XENLOG_INFO,
+                "Error pfn %lx: rd=%d od=%d caf=%08lx taf=%" PRtype_info "\n",
+                page_to_mfn(page), domain->domain_id,
+                owner ? owner->domain_id : DOMID_INVALID,
+                page->count_info - !!owner, page->u.inuse.type_info);
 
-    if ( !opt_allow_superpage )
-    {
-        gdprintk(XENLOG_WARNING, "PV superpages disabled in hypervisor\n");
-        return -EINVAL;
-    }
+    if ( owner )
+        put_page(page);
 
-    if ( mfn & (L1_PAGETABLE_ENTRIES-1) )
-    {
-        gdprintk(XENLOG_WARNING,
-                 "Unaligned superpage map attempt mfn %" PRI_mfn "\n", mfn);
-        return -EINVAL;
-    }
-
-    return get_superpage(mfn, d);
-}
-
-
-define_get_linear_pagetable(l3);
-static int
-get_page_from_l3e(
-    l3_pgentry_t l3e, unsigned long pfn, struct domain *d, int partial)
-{
-    int rc;
-
-    if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) )
-        return 1;
-
-    if ( unlikely((l3e_get_flags(l3e) & l3_disallow_mask(d))) )
-    {
-        gdprintk(XENLOG_WARNING, "Bad L3 flags %x\n",
-                 l3e_get_flags(l3e) & l3_disallow_mask(d));
-        return -EINVAL;
-    }
-
-    rc = get_page_and_type_from_pagenr(
-        l3e_get_pfn(l3e), PGT_l2_page_table, d, partial, 1);
-    if ( unlikely(rc == -EINVAL) &&
-         !is_pv_32bit_domain(d) &&
-         get_l3_linear_pagetable(l3e, pfn, d) )
-        rc = 0;
-
-    return rc;
-}
-
-define_get_linear_pagetable(l4);
-static int
-get_page_from_l4e(
-    l4_pgentry_t l4e, unsigned long pfn, struct domain *d, int partial)
-{
-    int rc;
-
-    if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
-        return 1;
-
-    if ( unlikely((l4e_get_flags(l4e) & L4_DISALLOW_MASK)) )
-    {
-        gdprintk(XENLOG_WARNING, "Bad L4 flags %x\n",
-                 l4e_get_flags(l4e) & L4_DISALLOW_MASK);
-        return -EINVAL;
-    }
-
-    rc = get_page_and_type_from_pagenr(
-        l4e_get_pfn(l4e), PGT_l3_page_table, d, partial, 1);
-    if ( unlikely(rc == -EINVAL) && get_l4_linear_pagetable(l4e, pfn, d) )
-        rc = 0;
-
-    return rc;
-}
-
-#define adjust_guest_l1e(pl1e, d)                                            \
-    do {                                                                     \
-        if ( likely(l1e_get_flags((pl1e)) & _PAGE_PRESENT) &&                \
-             likely(!is_pv_32bit_domain(d)) )                                \
-        {                                                                    \
-            /* _PAGE_GUEST_KERNEL page cannot have the Global bit set. */    \
-            if ( (l1e_get_flags((pl1e)) & (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL)) \
-                 == (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL) )                      \
-                gdprintk(XENLOG_WARNING,                                     \
-                         "Global bit is set to kernel page %lx\n",           \
-                         l1e_get_pfn((pl1e)));                               \
-            if ( !(l1e_get_flags((pl1e)) & _PAGE_USER) )                     \
-                l1e_add_flags((pl1e), (_PAGE_GUEST_KERNEL|_PAGE_USER));      \
-            if ( !(l1e_get_flags((pl1e)) & _PAGE_GUEST_KERNEL) )             \
-                l1e_add_flags((pl1e), (_PAGE_GLOBAL|_PAGE_USER));            \
-        }                                                                    \
-    } while ( 0 )
-
-#define adjust_guest_l2e(pl2e, d)                               \
-    do {                                                        \
-        if ( likely(l2e_get_flags((pl2e)) & _PAGE_PRESENT) &&   \
-             likely(!is_pv_32bit_domain(d)) )                   \
-            l2e_add_flags((pl2e), _PAGE_USER);                  \
-    } while ( 0 )
-
-#define adjust_guest_l3e(pl3e, d)                                   \
-    do {                                                            \
-        if ( likely(l3e_get_flags((pl3e)) & _PAGE_PRESENT) )        \
-            l3e_add_flags((pl3e), likely(!is_pv_32bit_domain(d)) ?  \
-                                         _PAGE_USER :               \
-                                         _PAGE_USER|_PAGE_RW);      \
-    } while ( 0 )
-
-#define adjust_guest_l4e(pl4e, d)                               \
-    do {                                                        \
-        if ( likely(l4e_get_flags((pl4e)) & _PAGE_PRESENT) &&   \
-             likely(!is_pv_32bit_domain(d)) )                   \
-            l4e_add_flags((pl4e), _PAGE_USER);                  \
-    } while ( 0 )
-
-#define unadjust_guest_l3e(pl3e, d)                                         \
-    do {                                                                    \
-        if ( unlikely(is_pv_32bit_domain(d)) &&                             \
-             likely(l3e_get_flags((pl3e)) & _PAGE_PRESENT) )                \
-            l3e_remove_flags((pl3e), _PAGE_USER|_PAGE_RW|_PAGE_ACCESSED);   \
-    } while ( 0 )
-
-void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
-{
-    unsigned long     pfn = l1e_get_pfn(l1e);
-    struct page_info *page;
-    struct domain    *pg_owner;
-    struct vcpu      *v;
-
-    if ( !(l1e_get_flags(l1e) & _PAGE_PRESENT) || is_iomem_page(_mfn(pfn)) )
-        return;
-
-    page = mfn_to_page(pfn);
-    pg_owner = page_get_owner(page);
-
-    /*
-     * Check if this is a mapping that was established via a grant reference.
-     * If it was then we should not be here: we require that such mappings are
-     * explicitly destroyed via the grant-table interface.
-     * 
-     * The upshot of this is that the guest can end up with active grants that
-     * it cannot destroy (because it no longer has a PTE to present to the
-     * grant-table interface). This can lead to subtle hard-to-catch bugs,
-     * hence a special grant PTE flag can be enabled to catch the bug early.
-     * 
-     * (Note that the undestroyable active grants are not a security hole in
-     * Xen. All active grants can safely be cleaned up when the domain dies.)
-     */
-    if ( (l1e_get_flags(l1e) & _PAGE_GNTTAB) &&
-         !l1e_owner->is_shutting_down && !l1e_owner->is_dying )
-    {
-        gdprintk(XENLOG_WARNING,
-                 "Attempt to implicitly unmap a granted PTE %" PRIpte "\n",
-                 l1e_get_intpte(l1e));
-        domain_crash(l1e_owner);
-    }
-
-    /* Remember we didn't take a type-count of foreign writable mappings
-     * to paging-external domains */
-    if ( (l1e_get_flags(l1e) & _PAGE_RW) && 
-         ((l1e_owner == pg_owner) || !paging_mode_external(pg_owner)) )
-    {
-        put_page_and_type(page);
-    }
-    else
-    {
-        /* We expect this is rare so we blow the entire shadow LDT. */
-        if ( unlikely(((page->u.inuse.type_info & PGT_type_mask) == 
-                       PGT_seg_desc_page)) &&
-             unlikely(((page->u.inuse.type_info & PGT_count_mask) != 0)) &&
-             (l1e_owner == pg_owner) )
-        {
-            for_each_vcpu ( pg_owner, v )
-                invalidate_shadow_ldt(v, 1);
-        }
-        put_page(page);
-    }
+    return 0;
 }
 
-
 /*
- * NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'.
- * Note also that this automatically deals correctly with linear p.t.'s.
+ * Special version of get_page() to be used exclusively when
+ * - a page is known to already have a non-zero reference count
+ * - the page does not need its owner to be checked
+ * - it will not be called more than once without dropping the thus
+ *   acquired reference again.
+ * Due to get_page() reserving one reference, this call cannot fail.
  */
-static int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn)
-{
-    if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) || (l2e_get_pfn(l2e) == pfn) )
-        return 1;
-
-    if ( l2e_get_flags(l2e) & _PAGE_PSE )
-        put_superpage(l2e_get_pfn(l2e));
-    else
-        put_page_and_type(l2e_get_page(l2e));
-
-    return 0;
-}
-
-static int __put_page_type(struct page_info *, int preemptible);
-
-static int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn,
-                             int partial, bool_t defer)
-{
-    struct page_info *pg;
-
-    if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) || (l3e_get_pfn(l3e) == pfn) )
-        return 1;
-
-    if ( unlikely(l3e_get_flags(l3e) & _PAGE_PSE) )
-    {
-        unsigned long mfn = l3e_get_pfn(l3e);
-        int writeable = l3e_get_flags(l3e) & _PAGE_RW;
-
-        ASSERT(!(mfn & ((1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1)));
-        do {
-            put_data_page(mfn_to_page(mfn), writeable);
-        } while ( ++mfn & ((1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1) );
-
-        return 0;
-    }
-
-    pg = l3e_get_page(l3e);
-
-    if ( unlikely(partial > 0) )
-    {
-        ASSERT(!defer);
-        return __put_page_type(pg, 1);
-    }
-
-    if ( defer )
-    {
-        current->arch.old_guest_table = pg;
-        return 0;
-    }
-
-    return put_page_and_type_preemptible(pg);
-}
-
-static int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn,
-                             int partial, bool_t defer)
-{
-    if ( (l4e_get_flags(l4e) & _PAGE_PRESENT) && 
-         (l4e_get_pfn(l4e) != pfn) )
-    {
-        struct page_info *pg = l4e_get_page(l4e);
-
-        if ( unlikely(partial > 0) )
-        {
-            ASSERT(!defer);
-            return __put_page_type(pg, 1);
-        }
-
-        if ( defer )
-        {
-            current->arch.old_guest_table = pg;
-            return 0;
-        }
-
-        return put_page_and_type_preemptible(pg);
-    }
-    return 1;
-}
-
-static int alloc_l1_table(struct page_info *page)
+void get_page_light(struct page_info *page)
 {
-    struct domain *d = page_get_owner(page);
-    unsigned long  pfn = page_to_mfn(page);
-    l1_pgentry_t  *pl1e;
-    unsigned int   i;
-    int            ret = 0;
-
-    pl1e = map_domain_page(_mfn(pfn));
-
-    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
-    {
-        if ( is_guest_l1_slot(i) )
-            switch ( ret = get_page_from_l1e(pl1e[i], d, d) )
-            {
-            default:
-                goto fail;
-            case 0:
-                break;
-            case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
-                ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
-                l1e_flip_flags(pl1e[i], ret);
-                break;
-            }
+    unsigned long x, nx, y = page->count_info;
 
-        adjust_guest_l1e(pl1e[i], d);
+    do {
+        x  = y;
+        nx = x + 1;
+        BUG_ON(!(x & PGC_count_mask)); /* Not allocated? */
+        BUG_ON(!(nx & PGC_count_mask)); /* Overflow? */
+        y = cmpxchg(&page->count_info, x, nx);
     }
-
-    unmap_domain_page(pl1e);
-    return 0;
-
- fail:
-    gdprintk(XENLOG_WARNING, "Failure in alloc_l1_table: slot %#x\n", i);
-    while ( i-- > 0 )
-        if ( is_guest_l1_slot(i) )
-            put_page_from_l1e(pl1e[i], d);
-
-    unmap_domain_page(pl1e);
-    return ret;
+    while ( unlikely(y != x) );
 }
 
-static int create_pae_xen_mappings(struct domain *d, l3_pgentry_t *pl3e)
+static int __put_final_page_type(
+    struct page_info *page, unsigned long type, int preemptible)
 {
-    struct page_info *page;
-    l3_pgentry_t     l3e3;
-
-    if ( !is_pv_32bit_domain(d) )
-        return 1;
-
-    pl3e = (l3_pgentry_t *)((unsigned long)pl3e & PAGE_MASK);
-
-    /* 3rd L3 slot contains L2 with Xen-private mappings. It *must* exist. */
-    l3e3 = pl3e[3];
-    if ( !(l3e_get_flags(l3e3) & _PAGE_PRESENT) )
-    {
-        gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is empty\n");
-        return 0;
-    }
+    int rc = free_page_type(page, type, preemptible);
 
-    /*
-     * The Xen-private mappings include linear mappings. The L2 thus cannot
-     * be shared by multiple L3 tables. The test here is adequate because:
-     *  1. Cannot appear in slots != 3 because get_page_type() checks the
-     *     PGT_pae_xen_l2 flag, which is asserted iff the L2 appears in slot 3
-     *  2. Cannot appear in another page table's L3:
-     *     a. alloc_l3_table() calls this function and this check will fail
-     *     b. mod_l3_entry() disallows updates to slot 3 in an existing table
-     */
-    page = l3e_get_page(l3e3);
-    BUG_ON(page->u.inuse.type_info & PGT_pinned);
-    BUG_ON((page->u.inuse.type_info & PGT_count_mask) == 0);
-    BUG_ON(!(page->u.inuse.type_info & PGT_pae_xen_l2));
-    if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
+    /* No need for atomic update of type_info here: noone else updates it. */
+    if ( rc == 0 )
     {
-        gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is shared\n");
-        return 0;
+        /*
+         * Record TLB information for flush later. We do not stamp page tables
+         * when running in shadow mode:
+         *  1. Pointless, since it's the shadow pt's which must be tracked.
+         *  2. Shadow mode reuses this field for shadowed page tables to
+         *     store flags info -- we don't want to conflict with that.
+         */
+        if ( !(shadow_mode_enabled(page_get_owner(page)) &&
+               (page->count_info & PGC_page_table)) )
+            page->tlbflush_timestamp = tlbflush_current_time();
+        wmb();
+        page->u.inuse.type_info--;
     }
-
-    return 1;
-}
-
-static int alloc_l2_table(struct page_info *page, unsigned long type,
-                          int preemptible)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long  pfn = page_to_mfn(page);
-    l2_pgentry_t  *pl2e;
-    unsigned int   i;
-    int            rc = 0;
-
-    pl2e = map_domain_page(_mfn(pfn));
-
-    for ( i = page->nr_validated_ptes; i < L2_PAGETABLE_ENTRIES; i++ )
+    else if ( rc == -EINTR )
     {
-        if ( preemptible && i > page->nr_validated_ptes
-             && hypercall_preempt_check() )
-        {
-            page->nr_validated_ptes = i;
-            rc = -ERESTART;
-            break;
-        }
-
-        if ( !is_guest_l2_slot(d, type, i) ||
-             (rc = get_page_from_l2e(pl2e[i], pfn, d)) > 0 )
-            continue;
-
-        if ( rc < 0 )
-        {
-            gdprintk(XENLOG_WARNING, "Failure in alloc_l2_table: slot %#x\n", i);
-            while ( i-- > 0 )
-                if ( is_guest_l2_slot(d, type, i) )
-                    put_page_from_l2e(pl2e[i], pfn);
-            break;
-        }
-
-        adjust_guest_l2e(pl2e[i], d);
+        ASSERT((page->u.inuse.type_info &
+                (PGT_count_mask|PGT_validated|PGT_partial)) == 1);
+        if ( !(shadow_mode_enabled(page_get_owner(page)) &&
+               (page->count_info & PGC_page_table)) )
+            page->tlbflush_timestamp = tlbflush_current_time();
+        wmb();
+        page->u.inuse.type_info |= PGT_validated;
     }
-
-    if ( rc >= 0 && (type & PGT_pae_xen_l2) )
+    else
     {
-        /* Xen private mappings. */
-        memcpy(&pl2e[COMPAT_L2_PAGETABLE_FIRST_XEN_SLOT(d)],
-               &compat_idle_pg_table_l2[
-                   l2_table_offset(HIRO_COMPAT_MPT_VIRT_START)],
-               COMPAT_L2_PAGETABLE_XEN_SLOTS(d) * sizeof(*pl2e));
+        BUG_ON(rc != -ERESTART);
+        wmb();
+        get_page_light(page);
+        page->u.inuse.type_info |= PGT_partial;
     }
 
-    unmap_domain_page(pl2e);
-    return rc > 0 ? 0 : rc;
+    return rc;
 }
 
-static int alloc_l3_table(struct page_info *page)
+int __put_page_type(struct page_info *page,
+                    int preemptible)
 {
-    struct domain *d = page_get_owner(page);
-    unsigned long  pfn = page_to_mfn(page);
-    l3_pgentry_t  *pl3e;
-    unsigned int   i;
-    int            rc = 0, partial = page->partial_pte;
-
-    pl3e = map_domain_page(_mfn(pfn));
-
-    /*
-     * PAE guests allocate full pages, but aren't required to initialize
-     * more than the first four entries; when running in compatibility
-     * mode, however, the full page is visible to the MMU, and hence all
-     * 512 entries must be valid/verified, which is most easily achieved
-     * by clearing them out.
-     */
-    if ( is_pv_32bit_domain(d) )
-        memset(pl3e + 4, 0, (L3_PAGETABLE_ENTRIES - 4) * sizeof(*pl3e));
+    unsigned long nx, x, y = page->u.inuse.type_info;
+    int rc = 0;
 
-    for ( i = page->nr_validated_ptes; i < L3_PAGETABLE_ENTRIES;
-          i++, partial = 0 )
+    for ( ; ; )
     {
-        if ( is_pv_32bit_domain(d) && (i == 3) )
-        {
-            if ( !(l3e_get_flags(pl3e[i]) & _PAGE_PRESENT) ||
-                 (l3e_get_flags(pl3e[i]) & l3_disallow_mask(d)) )
-                rc = -EINVAL;
-            else
-                rc = get_page_and_type_from_pagenr(l3e_get_pfn(pl3e[i]),
-                                                   PGT_l2_page_table |
-                                                   PGT_pae_xen_l2,
-                                                   d, partial, 1);
-        }
-        else if ( !is_guest_l3_slot(i) ||
-                  (rc = get_page_from_l3e(pl3e[i], pfn, d, partial)) > 0 )
-            continue;
-
-        if ( rc == -ERESTART )
-        {
-            page->nr_validated_ptes = i;
-            page->partial_pte = partial ?: 1;
-        }
-        else if ( rc == -EINTR && i )
-        {
-            page->nr_validated_ptes = i;
-            page->partial_pte = 0;
-            rc = -ERESTART;
-        }
-        if ( rc < 0 )
-            break;
+        x  = y;
+        nx = x - 1;
 
-        adjust_guest_l3e(pl3e[i], d);
-    }
+        ASSERT((x & PGT_count_mask) != 0);
 
-    if ( rc >= 0 && !create_pae_xen_mappings(d, pl3e) )
-        rc = -EINVAL;
-    if ( rc < 0 && rc != -ERESTART && rc != -EINTR )
-    {
-        gdprintk(XENLOG_WARNING, "Failure in alloc_l3_table: slot %#x\n", i);
-        if ( i )
-        {
-            page->nr_validated_ptes = i;
-            page->partial_pte = 0;
-            current->arch.old_guest_table = page;
-        }
-        while ( i-- > 0 )
+        if ( unlikely((nx & PGT_count_mask) == 0) )
         {
-            if ( !is_guest_l3_slot(i) )
-                continue;
-            unadjust_guest_l3e(pl3e[i], d);
-        }
-    }
-
-    unmap_domain_page(pl3e);
-    return rc > 0 ? 0 : rc;
-}
-
-void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d,
-                         bool_t zap_ro_mpt)
-{
-    /* Xen private mappings. */
-    memcpy(&l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT],
-           &idle_pg_table[ROOT_PAGETABLE_FIRST_XEN_SLOT],
-           root_pgt_pv_xen_slots * sizeof(l4_pgentry_t));
-#ifndef NDEBUG
-    if ( l4e_get_intpte(split_l4e) )
-        l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT + root_pgt_pv_xen_slots] =
-            split_l4e;
-#endif
-    l4tab[l4_table_offset(LINEAR_PT_VIRT_START)] =
-        l4e_from_pfn(domain_page_map_to_mfn(l4tab), __PAGE_HYPERVISOR);
-    l4tab[l4_table_offset(PERDOMAIN_VIRT_START)] =
-        l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR);
-    if ( zap_ro_mpt || is_pv_32bit_domain(d) || paging_mode_refcounts(d) )
-        l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
-}
-
-bool_t fill_ro_mpt(unsigned long mfn)
-{
-    l4_pgentry_t *l4tab = map_domain_page(_mfn(mfn));
-    bool_t ret = 0;
-
-    if ( !l4e_get_intpte(l4tab[l4_table_offset(RO_MPT_VIRT_START)]) )
-    {
-        l4tab[l4_table_offset(RO_MPT_VIRT_START)] =
-            idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)];
-        ret = 1;
-    }
-    unmap_domain_page(l4tab);
-
-    return ret;
-}
-
-void zap_ro_mpt(unsigned long mfn)
-{
-    l4_pgentry_t *l4tab = map_domain_page(_mfn(mfn));
-
-    l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
-    unmap_domain_page(l4tab);
-}
-
-static int alloc_l4_table(struct page_info *page)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long  pfn = page_to_mfn(page);
-    l4_pgentry_t  *pl4e = map_domain_page(_mfn(pfn));
-    unsigned int   i;
-    int            rc = 0, partial = page->partial_pte;
-
-    for ( i = page->nr_validated_ptes; i < L4_PAGETABLE_ENTRIES;
-          i++, partial = 0 )
-    {
-        if ( !is_guest_l4_slot(d, i) ||
-             (rc = get_page_from_l4e(pl4e[i], pfn, d, partial)) > 0 )
-            continue;
-
-        if ( rc == -ERESTART )
-        {
-            page->nr_validated_ptes = i;
-            page->partial_pte = partial ?: 1;
-        }
-        else if ( rc < 0 )
-        {
-            if ( rc != -EINTR )
-                gdprintk(XENLOG_WARNING,
-                         "Failure in alloc_l4_table: slot %#x\n", i);
-            if ( i )
-            {
-                page->nr_validated_ptes = i;
-                page->partial_pte = 0;
-                if ( rc == -EINTR )
-                    rc = -ERESTART;
-                else
-                {
-                    if ( current->arch.old_guest_table )
-                        page->nr_validated_ptes++;
-                    current->arch.old_guest_table = page;
-                }
-            }
-        }
-        if ( rc < 0 )
-        {
-            unmap_domain_page(pl4e);
-            return rc;
-        }
-
-        adjust_guest_l4e(pl4e[i], d);
-    }
-
-    if ( rc >= 0 )
-    {
-        init_guest_l4_table(pl4e, d, !VM_ASSIST(d, m2p_strict));
-        atomic_inc(&d->arch.pv_domain.nr_l4_pages);
-        rc = 0;
-    }
-    unmap_domain_page(pl4e);
-
-    return rc;
-}
-
-static void free_l1_table(struct page_info *page)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long pfn = page_to_mfn(page);
-    l1_pgentry_t *pl1e;
-    unsigned int  i;
-
-    pl1e = map_domain_page(_mfn(pfn));
-
-    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
-        if ( is_guest_l1_slot(i) )
-            put_page_from_l1e(pl1e[i], d);
-
-    unmap_domain_page(pl1e);
-}
-
-
-static int free_l2_table(struct page_info *page, int preemptible)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long pfn = page_to_mfn(page);
-    l2_pgentry_t *pl2e;
-    unsigned int  i = page->nr_validated_ptes - 1;
-    int err = 0;
-
-    pl2e = map_domain_page(_mfn(pfn));
-
-    ASSERT(page->nr_validated_ptes);
-    do {
-        if ( is_guest_l2_slot(d, page->u.inuse.type_info, i) &&
-             put_page_from_l2e(pl2e[i], pfn) == 0 &&
-             preemptible && i && hypercall_preempt_check() )
-        {
-           page->nr_validated_ptes = i;
-           err = -ERESTART;
-        }
-    } while ( !err && i-- );
-
-    unmap_domain_page(pl2e);
-
-    if ( !err )
-        page->u.inuse.type_info &= ~PGT_pae_xen_l2;
-
-    return err;
-}
-
-static int free_l3_table(struct page_info *page)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long pfn = page_to_mfn(page);
-    l3_pgentry_t *pl3e;
-    int rc = 0, partial = page->partial_pte;
-    unsigned int  i = page->nr_validated_ptes - !partial;
-
-    pl3e = map_domain_page(_mfn(pfn));
-
-    do {
-        if ( is_guest_l3_slot(i) )
-        {
-            rc = put_page_from_l3e(pl3e[i], pfn, partial, 0);
-            if ( rc < 0 )
-                break;
-            partial = 0;
-            if ( rc > 0 )
-                continue;
-            unadjust_guest_l3e(pl3e[i], d);
-        }
-    } while ( i-- );
-
-    unmap_domain_page(pl3e);
-
-    if ( rc == -ERESTART )
-    {
-        page->nr_validated_ptes = i;
-        page->partial_pte = partial ?: -1;
-    }
-    else if ( rc == -EINTR && i < L3_PAGETABLE_ENTRIES - 1 )
-    {
-        page->nr_validated_ptes = i + 1;
-        page->partial_pte = 0;
-        rc = -ERESTART;
-    }
-    return rc > 0 ? 0 : rc;
-}
-
-static int free_l4_table(struct page_info *page)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long pfn = page_to_mfn(page);
-    l4_pgentry_t *pl4e = map_domain_page(_mfn(pfn));
-    int rc = 0, partial = page->partial_pte;
-    unsigned int  i = page->nr_validated_ptes - !partial;
-
-    do {
-        if ( is_guest_l4_slot(d, i) )
-            rc = put_page_from_l4e(pl4e[i], pfn, partial, 0);
-        if ( rc < 0 )
-            break;
-        partial = 0;
-    } while ( i-- );
-
-    if ( rc == -ERESTART )
-    {
-        page->nr_validated_ptes = i;
-        page->partial_pte = partial ?: -1;
-    }
-    else if ( rc == -EINTR && i < L4_PAGETABLE_ENTRIES - 1 )
-    {
-        page->nr_validated_ptes = i + 1;
-        page->partial_pte = 0;
-        rc = -ERESTART;
-    }
-
-    unmap_domain_page(pl4e);
-
-    if ( rc >= 0 )
-    {
-        atomic_dec(&d->arch.pv_domain.nr_l4_pages);
-        rc = 0;
-    }
-
-    return rc;
-}
-
-int page_lock(struct page_info *page)
-{
-    unsigned long x, nx;
-
-    do {
-        while ( (x = page->u.inuse.type_info) & PGT_locked )
-            cpu_relax();
-        nx = x + (1 | PGT_locked);
-        if ( !(x & PGT_validated) ||
-             !(x & PGT_count_mask) ||
-             !(nx & PGT_count_mask) )
-            return 0;
-    } while ( cmpxchg(&page->u.inuse.type_info, x, nx) != x );
-
-    return 1;
-}
-
-void page_unlock(struct page_info *page)
-{
-    unsigned long x, nx, y = page->u.inuse.type_info;
-
-    do {
-        x = y;
-        nx = x - (1 | PGT_locked);
-    } while ( (y = cmpxchg(&page->u.inuse.type_info, x, nx)) != x );
-}
-
-/* How to write an entry to the guest pagetables.
- * Returns 0 for failure (pointer not valid), 1 for success. */
-static inline int update_intpte(intpte_t *p, 
-                                intpte_t old, 
-                                intpte_t new,
-                                unsigned long mfn,
-                                struct vcpu *v,
-                                int preserve_ad)
-{
-    int rv = 1;
-#ifndef PTE_UPDATE_WITH_CMPXCHG
-    if ( !preserve_ad )
-    {
-        rv = paging_write_guest_entry(v, p, new, _mfn(mfn));
-    }
-    else
-#endif
-    {
-        intpte_t t = old;
-        for ( ; ; )
-        {
-            intpte_t _new = new;
-            if ( preserve_ad )
-                _new |= old & (_PAGE_ACCESSED | _PAGE_DIRTY);
-
-            rv = paging_cmpxchg_guest_entry(v, p, &t, _new, _mfn(mfn));
-            if ( unlikely(rv == 0) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Failed to update %" PRIpte " -> %" PRIpte
-                         ": saw %" PRIpte "\n", old, _new, t);
-                break;
-            }
-
-            if ( t == old )
-                break;
-
-            /* Allowed to change in Accessed/Dirty flags only. */
-            BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY));
-
-            old = t;
-        }
-    }
-    return rv;
-}
-
-/* Macro that wraps the appropriate type-changes around update_intpte().
- * Arguments are: type, ptr, old, new, mfn, vcpu */
-#define UPDATE_ENTRY(_t,_p,_o,_n,_m,_v,_ad)                         \
-    update_intpte(&_t ## e_get_intpte(*(_p)),                       \
-                  _t ## e_get_intpte(_o), _t ## e_get_intpte(_n),   \
-                  (_m), (_v), (_ad))
-
-/*
- * PTE flags that a guest may change without re-validating the PTE.
- * All other bits affect translation, caching, or Xen's safety.
- */
-#define FASTPATH_FLAG_WHITELIST                                     \
-    (_PAGE_NX_BIT | _PAGE_AVAIL_HIGH | _PAGE_AVAIL | _PAGE_GLOBAL | \
-     _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_USER)
-
-/* Update the L1 entry at pl1e to new value nl1e. */
-static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t nl1e,
-                        unsigned long gl1mfn, int preserve_ad,
-                        struct vcpu *pt_vcpu, struct domain *pg_dom)
-{
-    l1_pgentry_t ol1e;
-    struct domain *pt_dom = pt_vcpu->domain;
-    int rc = 0;
-
-    if ( unlikely(__copy_from_user(&ol1e, pl1e, sizeof(ol1e)) != 0) )
-        return -EFAULT;
-
-    if ( unlikely(paging_mode_refcounts(pt_dom)) )
-    {
-        if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, preserve_ad) )
-            return 0;
-        return -EBUSY;
-    }
-
-    if ( l1e_get_flags(nl1e) & _PAGE_PRESENT )
-    {
-        /* Translate foreign guest addresses. */
-        struct page_info *page = NULL;
-
-        if ( unlikely(l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)) )
-        {
-            gdprintk(XENLOG_WARNING, "Bad L1 flags %x\n",
-                    l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom));
-            return -EINVAL;
-        }
-
-        if ( paging_mode_translate(pg_dom) )
-        {
-            page = get_page_from_gfn(pg_dom, l1e_get_pfn(nl1e), NULL, P2M_ALLOC);
-            if ( !page )
-                return -EINVAL;
-            nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(nl1e));
-        }
-
-        /* Fast path for sufficiently-similar mappings. */
-        if ( !l1e_has_changed(ol1e, nl1e, ~FASTPATH_FLAG_WHITELIST) )
-        {
-            adjust_guest_l1e(nl1e, pt_dom);
-            rc = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
-                              preserve_ad);
-            if ( page )
-                put_page(page);
-            return rc ? 0 : -EBUSY;
-        }
-
-        switch ( rc = get_page_from_l1e(nl1e, pt_dom, pg_dom) )
-        {
-        default:
-            if ( page )
-                put_page(page);
-            return rc;
-        case 0:
-            break;
-        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
-            ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
-            l1e_flip_flags(nl1e, rc);
-            rc = 0;
-            break;
-        }
-        if ( page )
-            put_page(page);
-
-        adjust_guest_l1e(nl1e, pt_dom);
-        if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
-                                    preserve_ad)) )
-        {
-            ol1e = nl1e;
-            rc = -EBUSY;
-        }
-    }
-    else if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
-                                     preserve_ad)) )
-    {
-        return -EBUSY;
-    }
-
-    put_page_from_l1e(ol1e, pt_dom);
-    return rc;
-}
-
-
-/* Update the L2 entry at pl2e to new value nl2e. pl2e is within frame pfn. */
-static int mod_l2_entry(l2_pgentry_t *pl2e, 
-                        l2_pgentry_t nl2e, 
-                        unsigned long pfn,
-                        int preserve_ad,
-                        struct vcpu *vcpu)
-{
-    l2_pgentry_t ol2e;
-    struct domain *d = vcpu->domain;
-    struct page_info *l2pg = mfn_to_page(pfn);
-    unsigned long type = l2pg->u.inuse.type_info;
-    int rc = 0;
-
-    if ( unlikely(!is_guest_l2_slot(d, type, pgentry_ptr_to_slot(pl2e))) )
-    {
-        gdprintk(XENLOG_WARNING, "L2 update in Xen-private area, slot %#lx\n",
-                 pgentry_ptr_to_slot(pl2e));
-        return -EPERM;
-    }
-
-    if ( unlikely(__copy_from_user(&ol2e, pl2e, sizeof(ol2e)) != 0) )
-        return -EFAULT;
-
-    if ( l2e_get_flags(nl2e) & _PAGE_PRESENT )
-    {
-        if ( unlikely(l2e_get_flags(nl2e) & L2_DISALLOW_MASK) )
-        {
-            gdprintk(XENLOG_WARNING, "Bad L2 flags %x\n",
-                    l2e_get_flags(nl2e) & L2_DISALLOW_MASK);
-            return -EINVAL;
-        }
-
-        /* Fast path for sufficiently-similar mappings. */
-        if ( !l2e_has_changed(ol2e, nl2e, ~FASTPATH_FLAG_WHITELIST) )
-        {
-            adjust_guest_l2e(nl2e, d);
-            if ( UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, preserve_ad) )
-                return 0;
-            return -EBUSY;
-        }
-
-        if ( unlikely((rc = get_page_from_l2e(nl2e, pfn, d)) < 0) )
-            return rc;
-
-        adjust_guest_l2e(nl2e, d);
-        if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
-                                    preserve_ad)) )
-        {
-            ol2e = nl2e;
-            rc = -EBUSY;
-        }
-    }
-    else if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
-                                     preserve_ad)) )
-    {
-        return -EBUSY;
-    }
-
-    put_page_from_l2e(ol2e, pfn);
-    return rc;
-}
-
-/* Update the L3 entry at pl3e to new value nl3e. pl3e is within frame pfn. */
-static int mod_l3_entry(l3_pgentry_t *pl3e, 
-                        l3_pgentry_t nl3e, 
-                        unsigned long pfn,
-                        int preserve_ad,
-                        struct vcpu *vcpu)
-{
-    l3_pgentry_t ol3e;
-    struct domain *d = vcpu->domain;
-    int rc = 0;
-
-    if ( unlikely(!is_guest_l3_slot(pgentry_ptr_to_slot(pl3e))) )
-    {
-        gdprintk(XENLOG_WARNING, "L3 update in Xen-private area, slot %#lx\n",
-                 pgentry_ptr_to_slot(pl3e));
-        return -EINVAL;
-    }
-
-    /*
-     * Disallow updates to final L3 slot. It contains Xen mappings, and it
-     * would be a pain to ensure they remain continuously valid throughout.
-     */
-    if ( is_pv_32bit_domain(d) && (pgentry_ptr_to_slot(pl3e) >= 3) )
-        return -EINVAL;
-
-    if ( unlikely(__copy_from_user(&ol3e, pl3e, sizeof(ol3e)) != 0) )
-        return -EFAULT;
-
-    if ( l3e_get_flags(nl3e) & _PAGE_PRESENT )
-    {
-        if ( unlikely(l3e_get_flags(nl3e) & l3_disallow_mask(d)) )
-        {
-            gdprintk(XENLOG_WARNING, "Bad L3 flags %x\n",
-                    l3e_get_flags(nl3e) & l3_disallow_mask(d));
-            return -EINVAL;
-        }
-
-        /* Fast path for sufficiently-similar mappings. */
-        if ( !l3e_has_changed(ol3e, nl3e, ~FASTPATH_FLAG_WHITELIST) )
-        {
-            adjust_guest_l3e(nl3e, d);
-            rc = UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, preserve_ad);
-            return rc ? 0 : -EFAULT;
-        }
-
-        rc = get_page_from_l3e(nl3e, pfn, d, 0);
-        if ( unlikely(rc < 0) )
-            return rc;
-        rc = 0;
-
-        adjust_guest_l3e(nl3e, d);
-        if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
-                                    preserve_ad)) )
-        {
-            ol3e = nl3e;
-            rc = -EFAULT;
-        }
-    }
-    else if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
-                                     preserve_ad)) )
-    {
-        return -EFAULT;
-    }
-
-    if ( likely(rc == 0) )
-        if ( !create_pae_xen_mappings(d, pl3e) )
-            BUG();
-
-    put_page_from_l3e(ol3e, pfn, 0, 1);
-    return rc;
-}
-
-/* Update the L4 entry at pl4e to new value nl4e. pl4e is within frame pfn. */
-static int mod_l4_entry(l4_pgentry_t *pl4e, 
-                        l4_pgentry_t nl4e, 
-                        unsigned long pfn,
-                        int preserve_ad,
-                        struct vcpu *vcpu)
-{
-    struct domain *d = vcpu->domain;
-    l4_pgentry_t ol4e;
-    int rc = 0;
-
-    if ( unlikely(!is_guest_l4_slot(d, pgentry_ptr_to_slot(pl4e))) )
-    {
-        gdprintk(XENLOG_WARNING, "L4 update in Xen-private area, slot %#lx\n",
-                 pgentry_ptr_to_slot(pl4e));
-        return -EINVAL;
-    }
-
-    if ( unlikely(__copy_from_user(&ol4e, pl4e, sizeof(ol4e)) != 0) )
-        return -EFAULT;
-
-    if ( l4e_get_flags(nl4e) & _PAGE_PRESENT )
-    {
-        if ( unlikely(l4e_get_flags(nl4e) & L4_DISALLOW_MASK) )
-        {
-            gdprintk(XENLOG_WARNING, "Bad L4 flags %x\n",
-                    l4e_get_flags(nl4e) & L4_DISALLOW_MASK);
-            return -EINVAL;
-        }
-
-        /* Fast path for sufficiently-similar mappings. */
-        if ( !l4e_has_changed(ol4e, nl4e, ~FASTPATH_FLAG_WHITELIST) )
-        {
-            adjust_guest_l4e(nl4e, d);
-            rc = UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, preserve_ad);
-            return rc ? 0 : -EFAULT;
-        }
-
-        rc = get_page_from_l4e(nl4e, pfn, d, 0);
-        if ( unlikely(rc < 0) )
-            return rc;
-        rc = 0;
-
-        adjust_guest_l4e(nl4e, d);
-        if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
-                                    preserve_ad)) )
-        {
-            ol4e = nl4e;
-            rc = -EFAULT;
-        }
-    }
-    else if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
-                                     preserve_ad)) )
-    {
-        return -EFAULT;
-    }
-
-    put_page_from_l4e(ol4e, pfn, 0, 1);
-    return rc;
-}
-
-static int cleanup_page_cacheattr(struct page_info *page)
-{
-    unsigned int cacheattr =
-        (page->count_info & PGC_cacheattr_mask) >> PGC_cacheattr_base;
-
-    if ( likely(cacheattr == 0) )
-        return 0;
-
-    page->count_info &= ~PGC_cacheattr_mask;
-
-    BUG_ON(is_xen_heap_page(page));
-
-    return update_xen_mappings(page_to_mfn(page), 0);
-}
-
-void put_page(struct page_info *page)
-{
-    unsigned long nx, x, y = page->count_info;
-
-    do {
-        ASSERT((y & PGC_count_mask) != 0);
-        x  = y;
-        nx = x - 1;
-    }
-    while ( unlikely((y = cmpxchg(&page->count_info, x, nx)) != x) );
-
-    if ( unlikely((nx & PGC_count_mask) == 0) )
-    {
-        if ( cleanup_page_cacheattr(page) == 0 )
-            free_domheap_page(page);
-        else
-            gdprintk(XENLOG_WARNING,
-                     "Leaking mfn %" PRI_pfn "\n", page_to_mfn(page));
-    }
-}
-
-
-struct domain *page_get_owner_and_reference(struct page_info *page)
-{
-    unsigned long x, y = page->count_info;
-    struct domain *owner;
-
-    do {
-        x = y;
-        /*
-         * Count ==  0: Page is not allocated, so we cannot take a reference.
-         * Count == -1: Reference count would wrap, which is invalid. 
-         * Count == -2: Remaining unused ref is reserved for get_page_light().
-         */
-        if ( unlikely(((x + 2) & PGC_count_mask) <= 2) )
-            return NULL;
-    }
-    while ( (y = cmpxchg(&page->count_info, x, x + 1)) != x );
-
-    owner = page_get_owner(page);
-    ASSERT(owner);
-
-    return owner;
-}
-
-
-int get_page(struct page_info *page, struct domain *domain)
-{
-    struct domain *owner = page_get_owner_and_reference(page);
-
-    if ( likely(owner == domain) )
-        return 1;
-
-    if ( !paging_mode_refcounts(domain) && !domain->is_dying )
-        gprintk(XENLOG_INFO,
-                "Error pfn %lx: rd=%d od=%d caf=%08lx taf=%" PRtype_info "\n",
-                page_to_mfn(page), domain->domain_id,
-                owner ? owner->domain_id : DOMID_INVALID,
-                page->count_info - !!owner, page->u.inuse.type_info);
-
-    if ( owner )
-        put_page(page);
-
-    return 0;
-}
-
-/*
- * Special version of get_page() to be used exclusively when
- * - a page is known to already have a non-zero reference count
- * - the page does not need its owner to be checked
- * - it will not be called more than once without dropping the thus
- *   acquired reference again.
- * Due to get_page() reserving one reference, this call cannot fail.
- */
-static void get_page_light(struct page_info *page)
-{
-    unsigned long x, nx, y = page->count_info;
-
-    do {
-        x  = y;
-        nx = x + 1;
-        BUG_ON(!(x & PGC_count_mask)); /* Not allocated? */
-        BUG_ON(!(nx & PGC_count_mask)); /* Overflow? */
-        y = cmpxchg(&page->count_info, x, nx);
-    }
-    while ( unlikely(y != x) );
-}
-
-static int alloc_page_type(struct page_info *page, unsigned long type,
-                           int preemptible)
-{
-    struct domain *owner = page_get_owner(page);
-    int rc;
-
-    /* A page table is dirtied when its type count becomes non-zero. */
-    if ( likely(owner != NULL) )
-        paging_mark_dirty(owner, _mfn(page_to_mfn(page)));
-
-    switch ( type & PGT_type_mask )
-    {
-    case PGT_l1_page_table:
-        rc = alloc_l1_table(page);
-        break;
-    case PGT_l2_page_table:
-        rc = alloc_l2_table(page, type, preemptible);
-        break;
-    case PGT_l3_page_table:
-        ASSERT(preemptible);
-        rc = alloc_l3_table(page);
-        break;
-    case PGT_l4_page_table:
-        ASSERT(preemptible);
-        rc = alloc_l4_table(page);
-        break;
-    case PGT_seg_desc_page:
-        rc = alloc_segdesc_page(page);
-        break;
-    default:
-        printk("Bad type in alloc_page_type %lx t=%" PRtype_info " c=%lx\n", 
-               type, page->u.inuse.type_info,
-               page->count_info);
-        rc = -EINVAL;
-        BUG();
-    }
-
-    /* No need for atomic update of type_info here: noone else updates it. */
-    wmb();
-    switch ( rc )
-    {
-    case 0:
-        page->u.inuse.type_info |= PGT_validated;
-        break;
-    case -EINTR:
-        ASSERT((page->u.inuse.type_info &
-                (PGT_count_mask|PGT_validated|PGT_partial)) == 1);
-        page->u.inuse.type_info &= ~PGT_count_mask;
-        break;
-    default:
-        ASSERT(rc < 0);
-        gdprintk(XENLOG_WARNING, "Error while validating mfn %" PRI_mfn
-                 " (pfn %" PRI_pfn ") for type %" PRtype_info
-                 ": caf=%08lx taf=%" PRtype_info "\n",
-                 page_to_mfn(page), get_gpfn_from_mfn(page_to_mfn(page)),
-                 type, page->count_info, page->u.inuse.type_info);
-        if ( page != current->arch.old_guest_table )
-            page->u.inuse.type_info = 0;
-        else
-        {
-            ASSERT((page->u.inuse.type_info &
-                    (PGT_count_mask | PGT_validated)) == 1);
-    case -ERESTART:
-            get_page_light(page);
-            page->u.inuse.type_info |= PGT_partial;
-        }
-        break;
-    }
-
-    return rc;
-}
-
-
-int free_page_type(struct page_info *page, unsigned long type,
-                   int preemptible)
-{
-    struct domain *owner = page_get_owner(page);
-    unsigned long gmfn;
-    int rc;
-
-    if ( likely(owner != NULL) && unlikely(paging_mode_enabled(owner)) )
-    {
-        /* A page table is dirtied when its type count becomes zero. */
-        paging_mark_dirty(owner, _mfn(page_to_mfn(page)));
-
-        if ( shadow_mode_refcounts(owner) )
-            return 0;
-
-        gmfn = mfn_to_gmfn(owner, page_to_mfn(page));
-        ASSERT(VALID_M2P(gmfn));
-        /* Page sharing not supported for shadowed domains */
-        if(!SHARED_M2P(gmfn))
-            shadow_remove_all_shadows(owner, _mfn(gmfn));
-    }
-
-    if ( !(type & PGT_partial) )
-    {
-        page->nr_validated_ptes = 1U << PAGETABLE_ORDER;
-        page->partial_pte = 0;
-    }
-
-    switch ( type & PGT_type_mask )
-    {
-    case PGT_l1_page_table:
-        free_l1_table(page);
-        rc = 0;
-        break;
-    case PGT_l2_page_table:
-        rc = free_l2_table(page, preemptible);
-        break;
-    case PGT_l3_page_table:
-        ASSERT(preemptible);
-        rc = free_l3_table(page);
-        break;
-    case PGT_l4_page_table:
-        ASSERT(preemptible);
-        rc = free_l4_table(page);
-        break;
-    default:
-        gdprintk(XENLOG_WARNING, "type %" PRtype_info " mfn %" PRI_mfn "\n",
-                 type, page_to_mfn(page));
-        rc = -EINVAL;
-        BUG();
-    }
-
-    return rc;
-}
-
-
-static int __put_final_page_type(
-    struct page_info *page, unsigned long type, int preemptible)
-{
-    int rc = free_page_type(page, type, preemptible);
-
-    /* No need for atomic update of type_info here: noone else updates it. */
-    if ( rc == 0 )
-    {
-        /*
-         * Record TLB information for flush later. We do not stamp page tables
-         * when running in shadow mode:
-         *  1. Pointless, since it's the shadow pt's which must be tracked.
-         *  2. Shadow mode reuses this field for shadowed page tables to
-         *     store flags info -- we don't want to conflict with that.
-         */
-        if ( !(shadow_mode_enabled(page_get_owner(page)) &&
-               (page->count_info & PGC_page_table)) )
-            page->tlbflush_timestamp = tlbflush_current_time();
-        wmb();
-        page->u.inuse.type_info--;
-    }
-    else if ( rc == -EINTR )
-    {
-        ASSERT((page->u.inuse.type_info &
-                (PGT_count_mask|PGT_validated|PGT_partial)) == 1);
-        if ( !(shadow_mode_enabled(page_get_owner(page)) &&
-               (page->count_info & PGC_page_table)) )
-            page->tlbflush_timestamp = tlbflush_current_time();
-        wmb();
-        page->u.inuse.type_info |= PGT_validated;
-    }
-    else
-    {
-        BUG_ON(rc != -ERESTART);
-        wmb();
-        get_page_light(page);
-        page->u.inuse.type_info |= PGT_partial;
-    }
-
-    return rc;
-}
-
-
-static int __put_page_type(struct page_info *page,
-                           int preemptible)
-{
-    unsigned long nx, x, y = page->u.inuse.type_info;
-    int rc = 0;
-
-    for ( ; ; )
-    {
-        x  = y;
-        nx = x - 1;
-
-        ASSERT((x & PGT_count_mask) != 0);
-
-        if ( unlikely((nx & PGT_count_mask) == 0) )
-        {
-            if ( unlikely((nx & PGT_type_mask) <= PGT_l4_page_table) &&
-                 likely(nx & (PGT_validated|PGT_partial)) )
-            {
-                /*
-                 * Page-table pages must be unvalidated when count is zero. The
-                 * 'free' is safe because the refcnt is non-zero and validated
-                 * bit is clear => other ops will spin or fail.
-                 */
-                nx = x & ~(PGT_validated|PGT_partial);
-                if ( unlikely((y = cmpxchg(&page->u.inuse.type_info,
-                                           x, nx)) != x) )
-                    continue;
-                /* We cleared the 'valid bit' so we do the clean up. */
-                rc = __put_final_page_type(page, x, preemptible);
-                if ( x & PGT_partial )
-                    put_page(page);
-                break;
-            }
-
-            /*
-             * Record TLB information for flush later. We do not stamp page
-             * tables when running in shadow mode:
-             *  1. Pointless, since it's the shadow pt's which must be tracked.
-             *  2. Shadow mode reuses this field for shadowed page tables to
-             *     store flags info -- we don't want to conflict with that.
-             */
-            if ( !(shadow_mode_enabled(page_get_owner(page)) &&
-                   (page->count_info & PGC_page_table)) )
-                page->tlbflush_timestamp = tlbflush_current_time();
-        }
-
-        if ( likely((y = cmpxchg(&page->u.inuse.type_info, x, nx)) == x) )
-            break;
-
-        if ( preemptible && hypercall_preempt_check() )
-            return -EINTR;
-    }
-
-    return rc;
-}
-
-
-static int __get_page_type(struct page_info *page, unsigned long type,
-                           int preemptible)
-{
-    unsigned long nx, x, y = page->u.inuse.type_info;
-    int rc = 0, iommu_ret = 0;
-
-    ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
-    ASSERT(!in_irq());
-
-    for ( ; ; )
-    {
-        x  = y;
-        nx = x + 1;
-        if ( unlikely((nx & PGT_count_mask) == 0) )
-        {
-            gdprintk(XENLOG_WARNING,
-                     "Type count overflow on mfn %"PRI_mfn"\n",
-                     page_to_mfn(page));
-            return -EINVAL;
-        }
-        else if ( unlikely((x & PGT_count_mask) == 0) )
-        {
-            struct domain *d = page_get_owner(page);
-
-            /* Normally we should never let a page go from type count 0
-             * to type count 1 when it is shadowed. One exception:
-             * out-of-sync shadowed pages are allowed to become
-             * writeable. */
-            if ( d && shadow_mode_enabled(d)
-                 && (page->count_info & PGC_page_table)
-                 && !((page->shadow_flags & (1u<<29))
-                      && type == PGT_writable_page) )
-               shadow_remove_all_shadows(d, _mfn(page_to_mfn(page)));
-
-            ASSERT(!(x & PGT_pae_xen_l2));
-            if ( (x & PGT_type_mask) != type )
-            {
-                /*
-                 * On type change we check to flush stale TLB entries. This 
-                 * may be unnecessary (e.g., page was GDT/LDT) but those 
-                 * circumstances should be very rare.
-                 */
-                cpumask_t *mask = this_cpu(scratch_cpumask);
-
-                BUG_ON(in_irq());
-                cpumask_copy(mask, d->domain_dirty_cpumask);
-
-                /* Don't flush if the timestamp is old enough */
-                tlbflush_filter(mask, page->tlbflush_timestamp);
-
-                if ( unlikely(!cpumask_empty(mask)) &&
-                     /* Shadow mode: track only writable pages. */
-                     (!shadow_mode_enabled(page_get_owner(page)) ||
-                      ((nx & PGT_type_mask) == PGT_writable_page)) )
-                {
-                    perfc_incr(need_flush_tlb_flush);
-                    flush_tlb_mask(mask);
-                }
-
-                /* We lose existing type and validity. */
-                nx &= ~(PGT_type_mask | PGT_validated);
-                nx |= type;
-
-                /* No special validation needed for writable pages. */
-                /* Page tables and GDT/LDT need to be scanned for validity. */
-                if ( type == PGT_writable_page || type == PGT_shared_page )
-                    nx |= PGT_validated;
-            }
-        }
-        else if ( unlikely((x & (PGT_type_mask|PGT_pae_xen_l2)) != type) )
-        {
-            /* Don't log failure if it could be a recursive-mapping attempt. */
-            if ( ((x & PGT_type_mask) == PGT_l2_page_table) &&
-                 (type == PGT_l1_page_table) )
-                return -EINVAL;
-            if ( ((x & PGT_type_mask) == PGT_l3_page_table) &&
-                 (type == PGT_l2_page_table) )
-                return -EINVAL;
-            if ( ((x & PGT_type_mask) == PGT_l4_page_table) &&
-                 (type == PGT_l3_page_table) )
-                return -EINVAL;
-            gdprintk(XENLOG_WARNING,
-                     "Bad type (saw %" PRtype_info " != exp %" PRtype_info ") "
-                     "for mfn %" PRI_mfn " (pfn %" PRI_pfn ")\n",
-                     x, type, page_to_mfn(page),
-                     get_gpfn_from_mfn(page_to_mfn(page)));
-            return -EINVAL;
-        }
-        else if ( unlikely(!(x & PGT_validated)) )
-        {
-            if ( !(x & PGT_partial) )
-            {
-                /* Someone else is updating validation of this page. Wait... */
-                while ( (y = page->u.inuse.type_info) == x )
-                {
-                    if ( preemptible && hypercall_preempt_check() )
-                        return -EINTR;
-                    cpu_relax();
-                }
-                continue;
-            }
-            /* Type ref count was left at 1 when PGT_partial got set. */
-            ASSERT((x & PGT_count_mask) == 1);
-            nx = x & ~PGT_partial;
-        }
-
-        if ( likely((y = cmpxchg(&page->u.inuse.type_info, x, nx)) == x) )
-            break;
-
-        if ( preemptible && hypercall_preempt_check() )
-            return -EINTR;
-    }
-
-    if ( unlikely((x & PGT_type_mask) != type) )
-    {
-        /* Special pages should not be accessible from devices. */
-        struct domain *d = page_get_owner(page);
-        if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
-        {
-            if ( (x & PGT_type_mask) == PGT_writable_page )
-                iommu_ret = iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
-            else if ( type == PGT_writable_page )
-                iommu_ret = iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
-                                           page_to_mfn(page),
-                                           IOMMUF_readable|IOMMUF_writable);
-        }
-    }
-
-    if ( unlikely(!(nx & PGT_validated)) )
-    {
-        if ( !(x & PGT_partial) )
-        {
-            page->nr_validated_ptes = 0;
-            page->partial_pte = 0;
-        }
-        rc = alloc_page_type(page, type, preemptible);
-    }
-
-    if ( (x & PGT_partial) && !(nx & PGT_partial) )
-        put_page(page);
-
-    if ( !rc )
-        rc = iommu_ret;
-
-    return rc;
-}
-
-void put_page_type(struct page_info *page)
-{
-    int rc = __put_page_type(page, 0);
-    ASSERT(rc == 0);
-    (void)rc;
-}
-
-int get_page_type(struct page_info *page, unsigned long type)
-{
-    int rc = __get_page_type(page, type, 0);
-    if ( likely(rc == 0) )
-        return 1;
-    ASSERT(rc != -EINTR && rc != -ERESTART);
-    return 0;
-}
-
-int put_page_type_preemptible(struct page_info *page)
-{
-    return __put_page_type(page, 1);
-}
-
-int get_page_type_preemptible(struct page_info *page, unsigned long type)
-{
-    ASSERT(!current->arch.old_guest_table);
-    return __get_page_type(page, type, 1);
-}
-
-static int get_spage_pages(struct page_info *page, struct domain *d)
-{
-    int i;
-
-    for (i = 0; i < (1<<PAGETABLE_ORDER); i++, page++)
-    {
-        if (!get_page_and_type(page, d, PGT_writable_page))
-        {
-            while (--i >= 0)
-                put_page_and_type(--page);
-            return 0;
-        }
-    }
-    return 1;
-}
-
-static void put_spage_pages(struct page_info *page)
-{
-    int i;
-
-    for (i = 0; i < (1<<PAGETABLE_ORDER); i++, page++)
-    {
-        put_page_and_type(page);
-    }
-    return;
-}
-
-static int mark_superpage(struct spage_info *spage, struct domain *d)
-{
-    unsigned long x, nx, y = spage->type_info;
-    int pages_done = 0;
-
-    ASSERT(opt_allow_superpage);
-
-    do {
-        x = y;
-        nx = x + 1;
-        if ( (x & SGT_type_mask) == SGT_mark )
-        {
-            gdprintk(XENLOG_WARNING,
-                     "Duplicate superpage mark attempt mfn %" PRI_mfn "\n",
-                     spage_to_mfn(spage));
-            if ( pages_done )
-                put_spage_pages(spage_to_page(spage));
-            return -EINVAL;
-        }
-        if ( (x & SGT_type_mask) == SGT_dynamic )
-        {
-            if ( pages_done )
-            {
-                put_spage_pages(spage_to_page(spage));
-                pages_done = 0;
-            }
-        }
-        else if ( !pages_done )
-        {
-            if ( !get_spage_pages(spage_to_page(spage), d) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Superpage type conflict in mark attempt mfn %" PRI_mfn "\n",
-                         spage_to_mfn(spage));
-                return -EINVAL;
-            }
-            pages_done = 1;
-        }
-        nx = (nx & ~SGT_type_mask) | SGT_mark;
-
-    } while ( (y = cmpxchg(&spage->type_info, x, nx)) != x );
-
-    return 0;
-}
-
-static int unmark_superpage(struct spage_info *spage)
-{
-    unsigned long x, nx, y = spage->type_info;
-    unsigned long do_pages = 0;
-
-    ASSERT(opt_allow_superpage);
-
-    do {
-        x = y;
-        nx = x - 1;
-        if ( (x & SGT_type_mask) != SGT_mark )
-        {
-            gdprintk(XENLOG_WARNING,
-                     "Attempt to unmark unmarked superpage mfn %" PRI_mfn "\n",
-                     spage_to_mfn(spage));
-            return -EINVAL;
-        }
-        if ( (nx & SGT_count_mask) == 0 )
-        {
-            nx = (nx & ~SGT_type_mask) | SGT_none;
-            do_pages = 1;
-        }
-        else
-        {
-            nx = (nx & ~SGT_type_mask) | SGT_dynamic;
-        }
-    } while ( (y = cmpxchg(&spage->type_info, x, nx)) != x );
-
-    if ( do_pages )
-        put_spage_pages(spage_to_page(spage));
-
-    return 0;
-}
-
-void clear_superpage_mark(struct page_info *page)
-{
-    struct spage_info *spage;
-
-    if ( !opt_allow_superpage )
-        return;
-
-    spage = page_to_spage(page);
-    if ((spage->type_info & SGT_type_mask) == SGT_mark)
-        unmark_superpage(spage);
-
-}
-
-int get_superpage(unsigned long mfn, struct domain *d)
-{
-    struct spage_info *spage;
-    unsigned long x, nx, y;
-    int pages_done = 0;
-
-    ASSERT(opt_allow_superpage);
-
-    if ( !mfn_valid(_mfn(mfn | (L1_PAGETABLE_ENTRIES - 1))) )
-        return -EINVAL;
-
-    spage = mfn_to_spage(mfn);
-    y = spage->type_info;
-    do {
-        x = y;
-        nx = x + 1;
-        if ( (x & SGT_type_mask) != SGT_none )
-        {
-            if ( pages_done )
-            {
-                put_spage_pages(spage_to_page(spage));
-                pages_done = 0;
-            }
-        }
-        else
-        {
-            if ( !get_spage_pages(spage_to_page(spage), d) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Type conflict on superpage mapping mfn %" PRI_mfn "\n",
-                         spage_to_mfn(spage));
-                return -EINVAL;
-            }
-            pages_done = 1;
-            nx = (nx & ~SGT_type_mask) | SGT_dynamic;
-        }
-    } while ( (y = cmpxchg(&spage->type_info, x, nx)) != x );
-
-    return 0;
-}
-
-static void put_superpage(unsigned long mfn)
-{
-    struct spage_info *spage;
-    unsigned long x, nx, y;
-    unsigned long do_pages = 0;
-
-    if ( !opt_allow_superpage )
-    {
-        put_spage_pages(mfn_to_page(mfn));
-        return;
-    }
-
-    spage = mfn_to_spage(mfn);
-    y = spage->type_info;
-    do {
-        x = y;
-        nx = x - 1;
-        if ((x & SGT_type_mask) == SGT_dynamic)
-        {
-            if ((nx & SGT_count_mask) == 0)
-            {
-                nx = (nx & ~SGT_type_mask) | SGT_none;
-                do_pages = 1;
-            }
-        }
-
-    } while ((y = cmpxchg(&spage->type_info, x, nx)) != x);
-
-    if (do_pages)
-        put_spage_pages(spage_to_page(spage));
-
-    return;
-}
-
-int put_old_guest_table(struct vcpu *v)
-{
-    int rc;
-
-    if ( !v->arch.old_guest_table )
-        return 0;
-
-    switch ( rc = put_page_and_type_preemptible(v->arch.old_guest_table) )
-    {
-    case -EINTR:
-    case -ERESTART:
-        return -ERESTART;
-    }
-
-    v->arch.old_guest_table = NULL;
-
-    return rc;
-}
-
-int vcpu_destroy_pagetables(struct vcpu *v)
-{
-    unsigned long mfn = pagetable_get_pfn(v->arch.guest_table);
-    struct page_info *page;
-    l4_pgentry_t *l4tab = NULL;
-    int rc = put_old_guest_table(v);
-
-    if ( rc )
-        return rc;
-
-    if ( is_pv_32bit_vcpu(v) )
-    {
-        l4tab = map_domain_page(_mfn(mfn));
-        mfn = l4e_get_pfn(*l4tab);
-    }
-
-    if ( mfn )
-    {
-        page = mfn_to_page(mfn);
-        if ( paging_mode_refcounts(v->domain) )
-            put_page(page);
-        else
-            rc = put_page_and_type_preemptible(page);
-    }
-
-    if ( l4tab )
-    {
-        if ( !rc )
-            l4e_write(l4tab, l4e_empty());
-        unmap_domain_page(l4tab);
-    }
-    else if ( !rc )
-    {
-        v->arch.guest_table = pagetable_null();
-
-        /* Drop ref to guest_table_user (from MMUEXT_NEW_USER_BASEPTR) */
-        mfn = pagetable_get_pfn(v->arch.guest_table_user);
-        if ( mfn )
-        {
-            page = mfn_to_page(mfn);
-            if ( paging_mode_refcounts(v->domain) )
-                put_page(page);
-            else
-                rc = put_page_and_type_preemptible(page);
-        }
-        if ( !rc )
-            v->arch.guest_table_user = pagetable_null();
-    }
-
-    v->arch.cr3 = 0;
-
-    /*
-     * put_page_and_type_preemptible() is liable to return -EINTR. The
-     * callers of us expect -ERESTART so convert it over.
-     */
-    return rc != -EINTR ? rc : -ERESTART;
-}
-
-int new_guest_cr3(unsigned long mfn)
-{
-    struct vcpu *curr = current;
-    struct domain *d = curr->domain;
-    int rc;
-    unsigned long old_base_mfn;
-
-    if ( is_pv_32bit_domain(d) )
-    {
-        unsigned long gt_mfn = pagetable_get_pfn(curr->arch.guest_table);
-        l4_pgentry_t *pl4e = map_domain_page(_mfn(gt_mfn));
-
-        rc = paging_mode_refcounts(d)
-             ? -EINVAL /* Old code was broken, but what should it be? */
-             : mod_l4_entry(
-                    pl4e,
-                    l4e_from_pfn(
-                        mfn,
-                        (_PAGE_PRESENT|_PAGE_RW|_PAGE_USER|_PAGE_ACCESSED)),
-                    gt_mfn, 0, curr);
-        unmap_domain_page(pl4e);
-        switch ( rc )
-        {
-        case 0:
-            break;
-        case -EINTR:
-        case -ERESTART:
-            return -ERESTART;
-        default:
-            gdprintk(XENLOG_WARNING,
-                     "Error while installing new compat baseptr %" PRI_mfn "\n",
-                     mfn);
-            return rc;
-        }
-
-        invalidate_shadow_ldt(curr, 0);
-        write_ptbase(curr);
-
-        return 0;
-    }
-
-    rc = put_old_guest_table(curr);
-    if ( unlikely(rc) )
-        return rc;
-
-    old_base_mfn = pagetable_get_pfn(curr->arch.guest_table);
-    /*
-     * This is particularly important when getting restarted after the
-     * previous attempt got preempted in the put-old-MFN phase.
-     */
-    if ( old_base_mfn == mfn )
-    {
-        write_ptbase(curr);
-        return 0;
-    }
-
-    rc = paging_mode_refcounts(d)
-         ? (get_page_from_pagenr(mfn, d) ? 0 : -EINVAL)
-         : get_page_and_type_from_pagenr(mfn, PGT_root_page_table, d, 0, 1);
-    switch ( rc )
-    {
-    case 0:
-        break;
-    case -EINTR:
-    case -ERESTART:
-        return -ERESTART;
-    default:
-        gdprintk(XENLOG_WARNING,
-                 "Error while installing new baseptr %" PRI_mfn "\n", mfn);
-        return rc;
-    }
-
-    invalidate_shadow_ldt(curr, 0);
-
-    if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
-        fill_ro_mpt(mfn);
-    curr->arch.guest_table = pagetable_from_pfn(mfn);
-    update_cr3(curr);
-
-    write_ptbase(curr);
-
-    if ( likely(old_base_mfn != 0) )
-    {
-        struct page_info *page = mfn_to_page(old_base_mfn);
-
-        if ( paging_mode_refcounts(d) )
-            put_page(page);
-        else
-            switch ( rc = put_page_and_type_preemptible(page) )
-            {
-            case -EINTR:
-                rc = -ERESTART;
-                /* fallthrough */
-            case -ERESTART:
-                curr->arch.old_guest_table = page;
-                break;
-            default:
-                BUG_ON(rc);
-                break;
-            }
-    }
-
-    return rc;
-}
-
-static struct domain *get_pg_owner(domid_t domid)
-{
-    struct domain *pg_owner = NULL, *curr = current->domain;
-
-    if ( likely(domid == DOMID_SELF) )
-    {
-        pg_owner = rcu_lock_current_domain();
-        goto out;
-    }
-
-    if ( unlikely(domid == curr->domain_id) )
-    {
-        gdprintk(XENLOG_WARNING, "Cannot specify itself as foreign domain\n");
-        goto out;
-    }
-
-    if ( !is_hvm_domain(curr) && unlikely(paging_mode_translate(curr)) )
-    {
-        gdprintk(XENLOG_WARNING,
-                 "Cannot mix foreign mappings with translated domains\n");
-        goto out;
-    }
-
-    switch ( domid )
-    {
-    case DOMID_IO:
-        pg_owner = rcu_lock_domain(dom_io);
-        break;
-    case DOMID_XEN:
-        pg_owner = rcu_lock_domain(dom_xen);
-        break;
-    default:
-        if ( (pg_owner = rcu_lock_domain_by_id(domid)) == NULL )
-        {
-            gdprintk(XENLOG_WARNING, "Unknown domain d%d\n", domid);
-            break;
-        }
-        break;
-    }
-
- out:
-    return pg_owner;
-}
-
-static void put_pg_owner(struct domain *pg_owner)
-{
-    rcu_unlock_domain(pg_owner);
-}
-
-static inline int vcpumask_to_pcpumask(
-    struct domain *d, XEN_GUEST_HANDLE_PARAM(const_void) bmap, cpumask_t *pmask)
-{
-    unsigned int vcpu_id, vcpu_bias, offs;
-    unsigned long vmask;
-    struct vcpu *v;
-    bool_t is_native = !is_pv_32bit_domain(d);
-
-    cpumask_clear(pmask);
-    for ( vmask = 0, offs = 0; ; ++offs)
-    {
-        vcpu_bias = offs * (is_native ? BITS_PER_LONG : 32);
-        if ( vcpu_bias >= d->max_vcpus )
-            return 0;
-
-        if ( unlikely(is_native ?
-                      copy_from_guest_offset(&vmask, bmap, offs, 1) :
-                      copy_from_guest_offset((unsigned int *)&vmask, bmap,
-                                             offs, 1)) )
-        {
-            cpumask_clear(pmask);
-            return -EFAULT;
-        }
-
-        while ( vmask )
-        {
-            vcpu_id = find_first_set_bit(vmask);
-            vmask &= ~(1UL << vcpu_id);
-            vcpu_id += vcpu_bias;
-            if ( (vcpu_id >= d->max_vcpus) )
-                return 0;
-            if ( ((v = d->vcpu[vcpu_id]) != NULL) )
-                cpumask_or(pmask, pmask, v->vcpu_dirty_cpumask);
-        }
-    }
-}
-
-long do_mmuext_op(
-    XEN_GUEST_HANDLE_PARAM(mmuext_op_t) uops,
-    unsigned int count,
-    XEN_GUEST_HANDLE_PARAM(uint) pdone,
-    unsigned int foreigndom)
-{
-    struct mmuext_op op;
-    unsigned long type;
-    unsigned int i, done = 0;
-    struct vcpu *curr = current;
-    struct domain *d = curr->domain;
-    struct domain *pg_owner;
-    int rc = put_old_guest_table(curr);
-
-    if ( unlikely(rc) )
-    {
-        if ( likely(rc == -ERESTART) )
-            rc = hypercall_create_continuation(
-                     __HYPERVISOR_mmuext_op, "hihi", uops, count, pdone,
-                     foreigndom);
-        return rc;
-    }
-
-    if ( unlikely(count == MMU_UPDATE_PREEMPTED) &&
-         likely(guest_handle_is_null(uops)) )
-    {
-        /* See the curr->arch.old_guest_table related
-         * hypercall_create_continuation() below. */
-        return (int)foreigndom;
-    }
-
-    if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
-    {
-        count &= ~MMU_UPDATE_PREEMPTED;
-        if ( unlikely(!guest_handle_is_null(pdone)) )
-            (void)copy_from_guest(&done, pdone, 1);
-    }
-    else
-        perfc_incr(calls_to_mmuext_op);
-
-    if ( unlikely(!guest_handle_okay(uops, count)) )
-        return -EFAULT;
-
-    if ( (pg_owner = get_pg_owner(foreigndom)) == NULL )
-        return -ESRCH;
-
-    if ( !is_pv_domain(pg_owner) )
-    {
-        put_pg_owner(pg_owner);
-        return -EINVAL;
-    }
-
-    rc = xsm_mmuext_op(XSM_TARGET, d, pg_owner);
-    if ( rc )
-    {
-        put_pg_owner(pg_owner);
-        return rc;
-    }
-
-    for ( i = 0; i < count; i++ )
-    {
-        if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) )
-        {
-            rc = -ERESTART;
-            break;
-        }
-
-        if ( unlikely(__copy_from_guest(&op, uops, 1) != 0) )
-        {
-            rc = -EFAULT;
-            break;
-        }
-
-        if ( is_hvm_domain(d) )
-        {
-            switch ( op.cmd )
-            {
-            case MMUEXT_PIN_L1_TABLE:
-            case MMUEXT_PIN_L2_TABLE:
-            case MMUEXT_PIN_L3_TABLE:
-            case MMUEXT_PIN_L4_TABLE:
-            case MMUEXT_UNPIN_TABLE:
-                break;
-            default:
-                rc = -EOPNOTSUPP;
-                goto done;
-            }
-        }
-
-        rc = 0;
-
-        switch ( op.cmd )
-        {
-        case MMUEXT_PIN_L1_TABLE:
-            type = PGT_l1_page_table;
-            goto pin_page;
-
-        case MMUEXT_PIN_L2_TABLE:
-            type = PGT_l2_page_table;
-            goto pin_page;
-
-        case MMUEXT_PIN_L3_TABLE:
-            type = PGT_l3_page_table;
-            goto pin_page;
-
-        case MMUEXT_PIN_L4_TABLE:
-            if ( is_pv_32bit_domain(pg_owner) )
-                break;
-            type = PGT_l4_page_table;
-
-        pin_page: {
-            struct page_info *page;
-
-            /* Ignore pinning of invalid paging levels. */
-            if ( (op.cmd - MMUEXT_PIN_L1_TABLE) > (CONFIG_PAGING_LEVELS - 1) )
-                break;
-
-            if ( paging_mode_refcounts(pg_owner) )
-                break;
-
-            page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC);
-            if ( unlikely(!page) )
-            {
-                rc = -EINVAL;
-                break;
-            }
-
-            rc = get_page_type_preemptible(page, type);
-            if ( unlikely(rc) )
-            {
-                if ( rc == -EINTR )
-                    rc = -ERESTART;
-                else if ( rc != -ERESTART )
-                    gdprintk(XENLOG_WARNING,
-                             "Error %d while pinning mfn %" PRI_mfn "\n",
-                            rc, page_to_mfn(page));
-                if ( page != curr->arch.old_guest_table )
-                    put_page(page);
-                break;
-            }
-
-            rc = xsm_memory_pin_page(XSM_HOOK, d, pg_owner, page);
-            if ( !rc && unlikely(test_and_set_bit(_PGT_pinned,
-                                                  &page->u.inuse.type_info)) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "mfn %" PRI_mfn " already pinned\n", page_to_mfn(page));
-                rc = -EINVAL;
-            }
-
-            if ( unlikely(rc) )
-                goto pin_drop;
-
-            /* A page is dirtied when its pin status is set. */
-            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page)));
-
-            /* We can race domain destruction (domain_relinquish_resources). */
-            if ( unlikely(pg_owner != d) )
-            {
-                int drop_ref;
-                spin_lock(&pg_owner->page_alloc_lock);
-                drop_ref = (pg_owner->is_dying &&
-                            test_and_clear_bit(_PGT_pinned,
-                                               &page->u.inuse.type_info));
-                spin_unlock(&pg_owner->page_alloc_lock);
-                if ( drop_ref )
-                {
-        pin_drop:
-                    if ( type == PGT_l1_page_table )
-                        put_page_and_type(page);
-                    else
-                        curr->arch.old_guest_table = page;
-                }
-            }
-
-            break;
-        }
-
-        case MMUEXT_UNPIN_TABLE: {
-            struct page_info *page;
-
-            if ( paging_mode_refcounts(pg_owner) )
-                break;
-
-            page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC);
-            if ( unlikely(!page) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "mfn %" PRI_mfn " bad, or bad owner d%d\n",
-                         op.arg1.mfn, pg_owner->domain_id);
-                rc = -EINVAL;
-                break;
-            }
-
-            if ( !test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) )
-            {
-                put_page(page);
-                gdprintk(XENLOG_WARNING,
-                         "mfn %" PRI_mfn " not pinned\n", op.arg1.mfn);
-                rc = -EINVAL;
-                break;
-            }
-
-            switch ( rc = put_page_and_type_preemptible(page) )
-            {
-            case -EINTR:
-            case -ERESTART:
-                curr->arch.old_guest_table = page;
-                rc = 0;
-                break;
-            default:
-                BUG_ON(rc);
-                break;
-            }
-            put_page(page);
-
-            /* A page is dirtied when its pin status is cleared. */
-            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page)));
-
-            break;
-        }
-
-        case MMUEXT_NEW_BASEPTR:
-            if ( unlikely(d != pg_owner) )
-                rc = -EPERM;
-            else if ( unlikely(paging_mode_translate(d)) )
-                rc = -EINVAL;
-            else
-                rc = new_guest_cr3(op.arg1.mfn);
-            break;
-
-        case MMUEXT_NEW_USER_BASEPTR: {
-            unsigned long old_mfn;
-
-            if ( unlikely(d != pg_owner) )
-                rc = -EPERM;
-            else if ( unlikely(paging_mode_translate(d)) )
-                rc = -EINVAL;
-            if ( unlikely(rc) )
-                break;
-
-            old_mfn = pagetable_get_pfn(curr->arch.guest_table_user);
-            /*
-             * This is particularly important when getting restarted after the
-             * previous attempt got preempted in the put-old-MFN phase.
-             */
-            if ( old_mfn == op.arg1.mfn )
-                break;
-
-            if ( op.arg1.mfn != 0 )
-            {
-                if ( paging_mode_refcounts(d) )
-                    rc = get_page_from_pagenr(op.arg1.mfn, d) ? 0 : -EINVAL;
-                else
-                    rc = get_page_and_type_from_pagenr(
-                        op.arg1.mfn, PGT_root_page_table, d, 0, 1);
-
-                if ( unlikely(rc) )
-                {
-                    if ( rc == -EINTR )
-                        rc = -ERESTART;
-                    else if ( rc != -ERESTART )
-                        gdprintk(XENLOG_WARNING,
-                                 "Error %d installing new mfn %" PRI_mfn "\n",
-                                 rc, op.arg1.mfn);
-                    break;
-                }
-                if ( VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
-                    zap_ro_mpt(op.arg1.mfn);
-            }
-
-            curr->arch.guest_table_user = pagetable_from_pfn(op.arg1.mfn);
-
-            if ( old_mfn != 0 )
-            {
-                struct page_info *page = mfn_to_page(old_mfn);
-
-                if ( paging_mode_refcounts(d) )
-                    put_page(page);
-                else
-                    switch ( rc = put_page_and_type_preemptible(page) )
-                    {
-                    case -EINTR:
-                        rc = -ERESTART;
-                        /* fallthrough */
-                    case -ERESTART:
-                        curr->arch.old_guest_table = page;
-                        break;
-                    default:
-                        BUG_ON(rc);
-                        break;
-                    }
-            }
-
-            break;
-        }
-
-        case MMUEXT_TLB_FLUSH_LOCAL:
-            if ( likely(d == pg_owner) )
-                flush_tlb_local();
-            else
-                rc = -EPERM;
-            break;
-
-        case MMUEXT_INVLPG_LOCAL:
-            if ( unlikely(d != pg_owner) )
-                rc = -EPERM;
-            else
-                paging_invlpg(curr, op.arg1.linear_addr);
-            break;
-
-        case MMUEXT_TLB_FLUSH_MULTI:
-        case MMUEXT_INVLPG_MULTI:
-        {
-            cpumask_t *mask = this_cpu(scratch_cpumask);
-
-            if ( unlikely(d != pg_owner) )
-                rc = -EPERM;
-            else if ( unlikely(vcpumask_to_pcpumask(d,
-                                   guest_handle_to_param(op.arg2.vcpumask,
-                                                         const_void),
-                                   mask)) )
-                rc = -EINVAL;
-            if ( unlikely(rc) )
-                break;
-
-            if ( op.cmd == MMUEXT_TLB_FLUSH_MULTI )
-                flush_tlb_mask(mask);
-            else if ( __addr_ok(op.arg1.linear_addr) )
-                flush_tlb_one_mask(mask, op.arg1.linear_addr);
-            break;
-        }
-
-        case MMUEXT_TLB_FLUSH_ALL:
-            if ( likely(d == pg_owner) )
-                flush_tlb_mask(d->domain_dirty_cpumask);
-            else
-                rc = -EPERM;
-            break;
-    
-        case MMUEXT_INVLPG_ALL:
-            if ( unlikely(d != pg_owner) )
-                rc = -EPERM;
-            else if ( __addr_ok(op.arg1.linear_addr) )
-                flush_tlb_one_mask(d->domain_dirty_cpumask, op.arg1.linear_addr);
-            break;
-
-        case MMUEXT_FLUSH_CACHE:
-            if ( unlikely(d != pg_owner) )
-                rc = -EPERM;
-            else if ( unlikely(!cache_flush_permitted(d)) )
-                rc = -EACCES;
-            else
-                wbinvd();
-            break;
-
-        case MMUEXT_FLUSH_CACHE_GLOBAL:
-            if ( unlikely(d != pg_owner) )
-                rc = -EPERM;
-            else if ( likely(cache_flush_permitted(d)) )
-            {
-                unsigned int cpu;
-                cpumask_t *mask = this_cpu(scratch_cpumask);
-
-                cpumask_clear(mask);
-                for_each_online_cpu(cpu)
-                    if ( !cpumask_intersects(mask,
-                                             per_cpu(cpu_sibling_mask, cpu)) )
-                        __cpumask_set_cpu(cpu, mask);
-                flush_mask(mask, FLUSH_CACHE);
-            }
-            else
-                rc = -EINVAL;
-            break;
-
-        case MMUEXT_SET_LDT:
-        {
-            unsigned int ents = op.arg2.nr_ents;
-            unsigned long ptr = ents ? op.arg1.linear_addr : 0;
-
-            if ( unlikely(d != pg_owner) )
-                rc = -EPERM;
-            else if ( paging_mode_external(d) )
-                rc = -EINVAL;
-            else if ( ((ptr & (PAGE_SIZE - 1)) != 0) || !__addr_ok(ptr) ||
-                      (ents > 8192) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Bad args to SET_LDT: ptr=%lx, ents=%x\n", ptr, ents);
-                rc = -EINVAL;
-            }
-            else if ( (curr->arch.pv_vcpu.ldt_ents != ents) ||
-                      (curr->arch.pv_vcpu.ldt_base != ptr) )
-            {
-                invalidate_shadow_ldt(curr, 0);
-                flush_tlb_local();
-                curr->arch.pv_vcpu.ldt_base = ptr;
-                curr->arch.pv_vcpu.ldt_ents = ents;
-                load_LDT(curr);
-            }
-            break;
-        }
-
-        case MMUEXT_CLEAR_PAGE: {
-            struct page_info *page;
-
-            page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC);
-            if ( !page || !get_page_type(page, PGT_writable_page) )
-            {
-                if ( page )
-                    put_page(page);
-                gdprintk(XENLOG_WARNING,
-                         "Error clearing mfn %" PRI_mfn "\n", op.arg1.mfn);
-                rc = -EINVAL;
-                break;
-            }
-
-            /* A page is dirtied when it's being cleared. */
-            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page)));
-
-            clear_domain_page(_mfn(page_to_mfn(page)));
-
-            put_page_and_type(page);
-            break;
-        }
-
-        case MMUEXT_COPY_PAGE:
-        {
-            struct page_info *src_page, *dst_page;
-
-            src_page = get_page_from_gfn(pg_owner, op.arg2.src_mfn, NULL,
-                                         P2M_ALLOC);
-            if ( unlikely(!src_page) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Error copying from mfn %" PRI_mfn "\n",
-                         op.arg2.src_mfn);
-                rc = -EINVAL;
-                break;
-            }
-
-            dst_page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL,
-                                         P2M_ALLOC);
-            rc = (dst_page &&
-                  get_page_type(dst_page, PGT_writable_page)) ? 0 : -EINVAL;
-            if ( unlikely(rc) )
-            {
-                put_page(src_page);
-                if ( dst_page )
-                    put_page(dst_page);
-                gdprintk(XENLOG_WARNING,
-                         "Error copying to mfn %" PRI_mfn "\n", op.arg1.mfn);
-                break;
-            }
-
-            /* A page is dirtied when it's being copied to. */
-            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(dst_page)));
-
-            copy_domain_page(_mfn(page_to_mfn(dst_page)),
-                             _mfn(page_to_mfn(src_page)));
-
-            put_page_and_type(dst_page);
-            put_page(src_page);
-            break;
-        }
-
-        case MMUEXT_MARK_SUPER:
-        case MMUEXT_UNMARK_SUPER:
-        {
-            unsigned long mfn = op.arg1.mfn;
-
-            if ( !opt_allow_superpage )
-                rc = -EOPNOTSUPP;
-            else if ( unlikely(d != pg_owner) )
-                rc = -EPERM;
-            else if ( mfn & (L1_PAGETABLE_ENTRIES - 1) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Unaligned superpage mfn %" PRI_mfn "\n", mfn);
-                rc = -EINVAL;
-            }
-            else if ( !mfn_valid(_mfn(mfn | (L1_PAGETABLE_ENTRIES - 1))) )
-                rc = -EINVAL;
-            else if ( op.cmd == MMUEXT_MARK_SUPER )
-                rc = mark_superpage(mfn_to_spage(mfn), d);
-            else
-                rc = unmark_superpage(mfn_to_spage(mfn));
-            break;
-        }
-
-        default:
-            rc = -ENOSYS;
-            break;
-        }
-
- done:
-        if ( unlikely(rc) )
-            break;
-
-        guest_handle_add_offset(uops, 1);
-    }
-
-    if ( rc == -ERESTART )
-    {
-        ASSERT(i < count);
-        rc = hypercall_create_continuation(
-            __HYPERVISOR_mmuext_op, "hihi",
-            uops, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
-    }
-    else if ( curr->arch.old_guest_table )
-    {
-        XEN_GUEST_HANDLE_PARAM(void) null;
-
-        ASSERT(rc || i == count);
-        set_xen_guest_handle(null, NULL);
-        /*
-         * In order to have a way to communicate the final return value to
-         * our continuation, we pass this in place of "foreigndom", building
-         * on the fact that this argument isn't needed anymore.
-         */
-        rc = hypercall_create_continuation(
-                __HYPERVISOR_mmuext_op, "hihi", null,
-                MMU_UPDATE_PREEMPTED, null, rc);
-    }
-
-    put_pg_owner(pg_owner);
-
-    perfc_add(num_mmuext_ops, i);
-
-    /* Add incremental work we have done to the @done output parameter. */
-    if ( unlikely(!guest_handle_is_null(pdone)) )
-    {
-        done += i;
-        copy_to_guest(pdone, &done, 1);
-    }
-
-    return rc;
-}
-
-long do_mmu_update(
-    XEN_GUEST_HANDLE_PARAM(mmu_update_t) ureqs,
-    unsigned int count,
-    XEN_GUEST_HANDLE_PARAM(uint) pdone,
-    unsigned int foreigndom)
-{
-    struct mmu_update req;
-    void *va;
-    unsigned long gpfn, gmfn, mfn;
-    struct page_info *page;
-    unsigned int cmd, i = 0, done = 0, pt_dom;
-    struct vcpu *curr = current, *v = curr;
-    struct domain *d = v->domain, *pt_owner = d, *pg_owner;
-    struct domain_mmap_cache mapcache;
-    uint32_t xsm_needed = 0;
-    uint32_t xsm_checked = 0;
-    int rc = put_old_guest_table(curr);
-
-    if ( unlikely(rc) )
-    {
-        if ( likely(rc == -ERESTART) )
-            rc = hypercall_create_continuation(
-                     __HYPERVISOR_mmu_update, "hihi", ureqs, count, pdone,
-                     foreigndom);
-        return rc;
-    }
-
-    if ( unlikely(count == MMU_UPDATE_PREEMPTED) &&
-         likely(guest_handle_is_null(ureqs)) )
-    {
-        /* See the curr->arch.old_guest_table related
-         * hypercall_create_continuation() below. */
-        return (int)foreigndom;
-    }
-
-    if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
-    {
-        count &= ~MMU_UPDATE_PREEMPTED;
-        if ( unlikely(!guest_handle_is_null(pdone)) )
-            (void)copy_from_guest(&done, pdone, 1);
-    }
-    else
-        perfc_incr(calls_to_mmu_update);
-
-    if ( unlikely(!guest_handle_okay(ureqs, count)) )
-        return -EFAULT;
-
-    if ( (pt_dom = foreigndom >> 16) != 0 )
-    {
-        /* Pagetables belong to a foreign domain (PFD). */
-        if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL )
-            return -ESRCH;
-
-        if ( pt_owner == d )
-            rcu_unlock_domain(pt_owner);
-        else if ( !pt_owner->vcpu || (v = pt_owner->vcpu[0]) == NULL )
-        {
-            rc = -EINVAL;
-            goto out;
-        }
-    }
-
-    if ( (pg_owner = get_pg_owner((uint16_t)foreigndom)) == NULL )
-    {
-        rc = -ESRCH;
-        goto out;
-    }
-
-    domain_mmap_cache_init(&mapcache);
-
-    for ( i = 0; i < count; i++ )
-    {
-        if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) )
-        {
-            rc = -ERESTART;
-            break;
-        }
-
-        if ( unlikely(__copy_from_guest(&req, ureqs, 1) != 0) )
-        {
-            rc = -EFAULT;
-            break;
-        }
-
-        cmd = req.ptr & (sizeof(l1_pgentry_t)-1);
-
-        switch ( cmd )
-        {
-            /*
-             * MMU_NORMAL_PT_UPDATE: Normal update to any level of page table.
-             * MMU_UPDATE_PT_PRESERVE_AD: As above but also preserve (OR)
-             * current A/D bits.
-             */
-        case MMU_NORMAL_PT_UPDATE:
-        case MMU_PT_UPDATE_PRESERVE_AD:
-        {
-            p2m_type_t p2mt;
-
-            rc = -EOPNOTSUPP;
-            if ( unlikely(paging_mode_refcounts(pt_owner)) )
-                break;
-
-            xsm_needed |= XSM_MMU_NORMAL_UPDATE;
-            if ( get_pte_flags(req.val) & _PAGE_PRESENT )
-            {
-                xsm_needed |= XSM_MMU_UPDATE_READ;
-                if ( get_pte_flags(req.val) & _PAGE_RW )
-                    xsm_needed |= XSM_MMU_UPDATE_WRITE;
-            }
-            if ( xsm_needed != xsm_checked )
-            {
-                rc = xsm_mmu_update(XSM_TARGET, d, pt_owner, pg_owner, xsm_needed);
-                if ( rc )
-                    break;
-                xsm_checked = xsm_needed;
-            }
-            rc = -EINVAL;
-
-            req.ptr -= cmd;
-            gmfn = req.ptr >> PAGE_SHIFT;
-            page = get_page_from_gfn(pt_owner, gmfn, &p2mt, P2M_ALLOC);
-
-            if ( p2m_is_paged(p2mt) )
-            {
-                ASSERT(!page);
-                p2m_mem_paging_populate(pg_owner, gmfn);
-                rc = -ENOENT;
-                break;
-            }
-
-            if ( unlikely(!page) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Could not get page for normal update\n");
-                break;
-            }
-
-            mfn = page_to_mfn(page);
-            va = map_domain_page_with_cache(mfn, &mapcache);
-            va = (void *)((unsigned long)va +
-                          (unsigned long)(req.ptr & ~PAGE_MASK));
-
-            if ( page_lock(page) )
-            {
-                switch ( page->u.inuse.type_info & PGT_type_mask )
-                {
-                case PGT_l1_page_table:
-                {
-                    l1_pgentry_t l1e = l1e_from_intpte(req.val);
-                    p2m_type_t l1e_p2mt = p2m_ram_rw;
-                    struct page_info *target = NULL;
-                    p2m_query_t q = (l1e_get_flags(l1e) & _PAGE_RW) ?
-                                        P2M_UNSHARE : P2M_ALLOC;
-
-                    if ( paging_mode_translate(pg_owner) )
-                        target = get_page_from_gfn(pg_owner, l1e_get_pfn(l1e),
-                                                   &l1e_p2mt, q);
-
-                    if ( p2m_is_paged(l1e_p2mt) )
-                    {
-                        if ( target )
-                            put_page(target);
-                        p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e));
-                        rc = -ENOENT;
-                        break;
-                    }
-                    else if ( p2m_ram_paging_in == l1e_p2mt && !target )
-                    {
-                        rc = -ENOENT;
-                        break;
-                    }
-                    /* If we tried to unshare and failed */
-                    else if ( (q & P2M_UNSHARE) && p2m_is_shared(l1e_p2mt) )
-                    {
-                        /* We could not have obtained a page ref. */
-                        ASSERT(target == NULL);
-                        /* And mem_sharing_notify has already been called. */
-                        rc = -ENOMEM;
-                        break;
-                    }
-
-                    rc = mod_l1_entry(va, l1e, mfn,
-                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v,
-                                      pg_owner);
-                    if ( target )
-                        put_page(target);
-                }
-                break;
-                case PGT_l2_page_table:
-                    rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn,
-                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-                    break;
-                case PGT_l3_page_table:
-                    rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn,
-                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-                    break;
-                case PGT_l4_page_table:
-                    rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
-                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-                break;
-                case PGT_writable_page:
-                    perfc_incr(writable_mmu_updates);
-                    if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
-                        rc = 0;
-                    break;
-                }
-                page_unlock(page);
-                if ( rc == -EINTR )
-                    rc = -ERESTART;
-            }
-            else if ( get_page_type(page, PGT_writable_page) )
-            {
-                perfc_incr(writable_mmu_updates);
-                if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
-                    rc = 0;
-                put_page_type(page);
-            }
-
-            unmap_domain_page_with_cache(va, &mapcache);
-            put_page(page);
-        }
-        break;
-
-        case MMU_MACHPHYS_UPDATE:
-            if ( unlikely(d != pt_owner) )
-            {
-                rc = -EPERM;
-                break;
-            }
-
-            if ( unlikely(paging_mode_translate(pg_owner)) )
-            {
-                rc = -EINVAL;
-                break;
-            }
-
-            mfn = req.ptr >> PAGE_SHIFT;
-            gpfn = req.val;
-
-            xsm_needed |= XSM_MMU_MACHPHYS_UPDATE;
-            if ( xsm_needed != xsm_checked )
-            {
-                rc = xsm_mmu_update(XSM_TARGET, d, NULL, pg_owner, xsm_needed);
-                if ( rc )
-                    break;
-                xsm_checked = xsm_needed;
-            }
-
-            if ( unlikely(!get_page_from_pagenr(mfn, pg_owner)) )
+            if ( unlikely((nx & PGT_type_mask) <= PGT_l4_page_table) &&
+                 likely(nx & (PGT_validated|PGT_partial)) )
             {
-                gdprintk(XENLOG_WARNING,
-                         "Could not get page for mach->phys update\n");
-                rc = -EINVAL;
+                /*
+                 * Page-table pages must be unvalidated when count is zero. The
+                 * 'free' is safe because the refcnt is non-zero and validated
+                 * bit is clear => other ops will spin or fail.
+                 */
+                nx = x & ~(PGT_validated|PGT_partial);
+                if ( unlikely((y = cmpxchg(&page->u.inuse.type_info,
+                                           x, nx)) != x) )
+                    continue;
+                /* We cleared the 'valid bit' so we do the clean up. */
+                rc = __put_final_page_type(page, x, preemptible);
+                if ( x & PGT_partial )
+                    put_page(page);
                 break;
             }
 
-            set_gpfn_from_mfn(mfn, gpfn);
-
-            paging_mark_dirty(pg_owner, _mfn(mfn));
-
-            put_page(mfn_to_page(mfn));
-            break;
-
-        default:
-            rc = -ENOSYS;
-            break;
+            /*
+             * Record TLB information for flush later. We do not stamp page
+             * tables when running in shadow mode:
+             *  1. Pointless, since it's the shadow pt's which must be tracked.
+             *  2. Shadow mode reuses this field for shadowed page tables to
+             *     store flags info -- we don't want to conflict with that.
+             */
+            if ( !(shadow_mode_enabled(page_get_owner(page)) &&
+                   (page->count_info & PGC_page_table)) )
+                page->tlbflush_timestamp = tlbflush_current_time();
         }
 
-        if ( unlikely(rc) )
+        if ( likely((y = cmpxchg(&page->u.inuse.type_info, x, nx)) == x) )
             break;
 
-        guest_handle_add_offset(ureqs, 1);
-    }
-
-    if ( rc == -ERESTART )
-    {
-        ASSERT(i < count);
-        rc = hypercall_create_continuation(
-            __HYPERVISOR_mmu_update, "hihi",
-            ureqs, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
-    }
-    else if ( curr->arch.old_guest_table )
-    {
-        XEN_GUEST_HANDLE_PARAM(void) null;
-
-        ASSERT(rc || i == count);
-        set_xen_guest_handle(null, NULL);
-        /*
-         * In order to have a way to communicate the final return value to
-         * our continuation, we pass this in place of "foreigndom", building
-         * on the fact that this argument isn't needed anymore.
-         */
-        rc = hypercall_create_continuation(
-                __HYPERVISOR_mmu_update, "hihi", null,
-                MMU_UPDATE_PREEMPTED, null, rc);
-    }
-
-    put_pg_owner(pg_owner);
-
-    domain_mmap_cache_destroy(&mapcache);
-
-    perfc_add(num_page_updates, i);
-
- out:
-    if ( pt_owner != d )
-        rcu_unlock_domain(pt_owner);
-
-    /* Add incremental work we have done to the @done output parameter. */
-    if ( unlikely(!guest_handle_is_null(pdone)) )
-    {
-        done += i;
-        copy_to_guest(pdone, &done, 1);
+        if ( preemptible && hypercall_preempt_check() )
+            return -EINTR;
     }
 
     return rc;
 }
 
 
-static int create_grant_pte_mapping(
-    uint64_t pte_addr, l1_pgentry_t nl1e, struct vcpu *v)
+static int __get_page_type(struct page_info *page, unsigned long type,
+                           int preemptible)
 {
-    int rc = GNTST_okay;
-    void *va;
-    unsigned long gmfn, mfn;
-    struct page_info *page;
-    l1_pgentry_t ol1e;
-    struct domain *d = v->domain;
-
-    adjust_guest_l1e(nl1e, d);
-
-    gmfn = pte_addr >> PAGE_SHIFT;
-    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
-
-    if ( unlikely(!page) )
-    {
-        gdprintk(XENLOG_WARNING, "Could not get page for normal update\n");
-        return GNTST_general_error;
-    }
-    
-    mfn = page_to_mfn(page);
-    va = map_domain_page(_mfn(mfn));
-    va = (void *)((unsigned long)va + ((unsigned long)pte_addr & ~PAGE_MASK));
+    unsigned long nx, x, y = page->u.inuse.type_info;
+    int rc = 0, iommu_ret = 0;
 
-    if ( !page_lock(page) )
-    {
-        rc = GNTST_general_error;
-        goto failed;
-    }
+    ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
+    ASSERT(!in_irq());
 
-    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    for ( ; ; )
     {
-        page_unlock(page);
-        rc = GNTST_general_error;
-        goto failed;
-    }
+        x  = y;
+        nx = x + 1;
+        if ( unlikely((nx & PGT_count_mask) == 0) )
+        {
+            gdprintk(XENLOG_WARNING,
+                     "Type count overflow on mfn %"PRI_mfn"\n",
+                     page_to_mfn(page));
+            return -EINVAL;
+        }
+        else if ( unlikely((x & PGT_count_mask) == 0) )
+        {
+            struct domain *d = page_get_owner(page);
 
-    ol1e = *(l1_pgentry_t *)va;
-    if ( !UPDATE_ENTRY(l1, (l1_pgentry_t *)va, ol1e, nl1e, mfn, v, 0) )
-    {
-        page_unlock(page);
-        rc = GNTST_general_error;
-        goto failed;
-    } 
+            /* Normally we should never let a page go from type count 0
+             * to type count 1 when it is shadowed. One exception:
+             * out-of-sync shadowed pages are allowed to become
+             * writeable. */
+            if ( d && shadow_mode_enabled(d)
+                 && (page->count_info & PGC_page_table)
+                 && !((page->shadow_flags & (1u<<29))
+                      && type == PGT_writable_page) )
+               shadow_remove_all_shadows(d, _mfn(page_to_mfn(page)));
 
-    page_unlock(page);
+            ASSERT(!(x & PGT_pae_xen_l2));
+            if ( (x & PGT_type_mask) != type )
+            {
+                /*
+                 * On type change we check to flush stale TLB entries. This 
+                 * may be unnecessary (e.g., page was GDT/LDT) but those 
+                 * circumstances should be very rare.
+                 */
+                cpumask_t *mask = this_cpu(scratch_cpumask);
 
-    if ( !paging_mode_refcounts(d) )
-        put_page_from_l1e(ol1e, d);
+                BUG_ON(in_irq());
+                cpumask_copy(mask, d->domain_dirty_cpumask);
 
- failed:
-    unmap_domain_page(va);
-    put_page(page);
+                /* Don't flush if the timestamp is old enough */
+                tlbflush_filter(mask, page->tlbflush_timestamp);
 
-    return rc;
-}
+                if ( unlikely(!cpumask_empty(mask)) &&
+                     /* Shadow mode: track only writable pages. */
+                     (!shadow_mode_enabled(page_get_owner(page)) ||
+                      ((nx & PGT_type_mask) == PGT_writable_page)) )
+                {
+                    perfc_incr(need_flush_tlb_flush);
+                    flush_tlb_mask(mask);
+                }
 
-static int destroy_grant_pte_mapping(
-    uint64_t addr, unsigned long frame, struct domain *d)
-{
-    int rc = GNTST_okay;
-    void *va;
-    unsigned long gmfn, mfn;
-    struct page_info *page;
-    l1_pgentry_t ol1e;
+                /* We lose existing type and validity. */
+                nx &= ~(PGT_type_mask | PGT_validated);
+                nx |= type;
 
-    gmfn = addr >> PAGE_SHIFT;
-    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
+                /* No special validation needed for writable pages. */
+                /* Page tables and GDT/LDT need to be scanned for validity. */
+                if ( type == PGT_writable_page || type == PGT_shared_page )
+                    nx |= PGT_validated;
+            }
+        }
+        else if ( unlikely((x & (PGT_type_mask|PGT_pae_xen_l2)) != type) )
+        {
+            /* Don't log failure if it could be a recursive-mapping attempt. */
+            if ( ((x & PGT_type_mask) == PGT_l2_page_table) &&
+                 (type == PGT_l1_page_table) )
+                return -EINVAL;
+            if ( ((x & PGT_type_mask) == PGT_l3_page_table) &&
+                 (type == PGT_l2_page_table) )
+                return -EINVAL;
+            if ( ((x & PGT_type_mask) == PGT_l4_page_table) &&
+                 (type == PGT_l3_page_table) )
+                return -EINVAL;
+            gdprintk(XENLOG_WARNING,
+                     "Bad type (saw %" PRtype_info " != exp %" PRtype_info ") "
+                     "for mfn %" PRI_mfn " (pfn %" PRI_pfn ")\n",
+                     x, type, page_to_mfn(page),
+                     get_gpfn_from_mfn(page_to_mfn(page)));
+            return -EINVAL;
+        }
+        else if ( unlikely(!(x & PGT_validated)) )
+        {
+            if ( !(x & PGT_partial) )
+            {
+                /* Someone else is updating validation of this page. Wait... */
+                while ( (y = page->u.inuse.type_info) == x )
+                {
+                    if ( preemptible && hypercall_preempt_check() )
+                        return -EINTR;
+                    cpu_relax();
+                }
+                continue;
+            }
+            /* Type ref count was left at 1 when PGT_partial got set. */
+            ASSERT((x & PGT_count_mask) == 1);
+            nx = x & ~PGT_partial;
+        }
 
-    if ( unlikely(!page) )
-    {
-        gdprintk(XENLOG_WARNING, "Could not get page for normal update\n");
-        return GNTST_general_error;
-    }
-    
-    mfn = page_to_mfn(page);
-    va = map_domain_page(_mfn(mfn));
-    va = (void *)((unsigned long)va + ((unsigned long)addr & ~PAGE_MASK));
+        if ( likely((y = cmpxchg(&page->u.inuse.type_info, x, nx)) == x) )
+            break;
 
-    if ( !page_lock(page) )
-    {
-        rc = GNTST_general_error;
-        goto failed;
+        if ( preemptible && hypercall_preempt_check() )
+            return -EINTR;
     }
 
-    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    if ( unlikely((x & PGT_type_mask) != type) )
     {
-        page_unlock(page);
-        rc = GNTST_general_error;
-        goto failed;
+        /* Special pages should not be accessible from devices. */
+        struct domain *d = page_get_owner(page);
+        if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
+        {
+            if ( (x & PGT_type_mask) == PGT_writable_page )
+                iommu_ret = iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
+            else if ( type == PGT_writable_page )
+                iommu_ret = iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
+                                           page_to_mfn(page),
+                                           IOMMUF_readable|IOMMUF_writable);
+        }
     }
 
-    ol1e = *(l1_pgentry_t *)va;
-    
-    /* Check that the virtual address supplied is actually mapped to frame. */
-    if ( unlikely(l1e_get_pfn(ol1e) != frame) )
+    if ( unlikely(!(nx & PGT_validated)) )
     {
-        page_unlock(page);
-        gdprintk(XENLOG_WARNING,
-                 "PTE entry %"PRIpte" for address %"PRIx64" doesn't match frame %lx\n",
-                 l1e_get_intpte(ol1e), addr, frame);
-        rc = GNTST_general_error;
-        goto failed;
+        if ( !(x & PGT_partial) )
+        {
+            page->nr_validated_ptes = 0;
+            page->partial_pte = 0;
+        }
+        rc = alloc_page_type(page, type, preemptible);
     }
 
-    /* Delete pagetable entry. */
-    if ( unlikely(!UPDATE_ENTRY
-                  (l1, 
-                   (l1_pgentry_t *)va, ol1e, l1e_empty(), mfn, 
-                   d->vcpu[0] /* Change if we go to per-vcpu shadows. */,
-                   0)) )
-    {
-        page_unlock(page);
-        gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", va);
-        rc = GNTST_general_error;
-        goto failed;
-    }
+    if ( (x & PGT_partial) && !(nx & PGT_partial) )
+        put_page(page);
 
-    page_unlock(page);
+    if ( !rc )
+        rc = iommu_ret;
 
- failed:
-    unmap_domain_page(va);
-    put_page(page);
     return rc;
 }
 
-
-static int create_grant_va_mapping(
-    unsigned long va, l1_pgentry_t nl1e, struct vcpu *v)
+void put_page_type(struct page_info *page)
 {
-    l1_pgentry_t *pl1e, ol1e;
-    struct domain *d = v->domain;
-    unsigned long gl1mfn;
-    struct page_info *l1pg;
-    int okay;
-    
-    adjust_guest_l1e(nl1e, d);
-
-    pl1e = guest_map_l1e(va, &gl1mfn);
-    if ( !pl1e )
-    {
-        gdprintk(XENLOG_WARNING, "Could not find L1 PTE for address %lx\n", va);
-        return GNTST_general_error;
-    }
-
-    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
-    {
-        guest_unmap_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    l1pg = mfn_to_page(gl1mfn);
-    if ( !page_lock(l1pg) )
-    {
-        put_page(l1pg);
-        guest_unmap_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(l1pg);
-        put_page(l1pg);
-        guest_unmap_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    ol1e = *pl1e;
-    okay = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0);
+    int rc = __put_page_type(page, 0);
+    ASSERT(rc == 0);
+    (void)rc;
+}
 
-    page_unlock(l1pg);
-    put_page(l1pg);
-    guest_unmap_l1e(pl1e);
+int get_page_type(struct page_info *page, unsigned long type)
+{
+    int rc = __get_page_type(page, type, 0);
+    if ( likely(rc == 0) )
+        return 1;
+    ASSERT(rc != -EINTR && rc != -ERESTART);
+    return 0;
+}
 
-    if ( okay && !paging_mode_refcounts(d) )
-        put_page_from_l1e(ol1e, d);
+int put_page_type_preemptible(struct page_info *page)
+{
+    return __put_page_type(page, 1);
+}
 
-    return okay ? GNTST_okay : GNTST_general_error;
+int get_page_type_preemptible(struct page_info *page, unsigned long type)
+{
+    ASSERT(!current->arch.old_guest_table);
+    return __get_page_type(page, type, 1);
 }
 
-static int replace_grant_va_mapping(
-    unsigned long addr, unsigned long frame, l1_pgentry_t nl1e, struct vcpu *v)
+int vcpu_destroy_pagetables(struct vcpu *v)
 {
-    l1_pgentry_t *pl1e, ol1e;
-    unsigned long gl1mfn;
-    struct page_info *l1pg;
-    int rc = 0;
-    
-    pl1e = guest_map_l1e(addr, &gl1mfn);
-    if ( !pl1e )
-    {
-        gdprintk(XENLOG_WARNING, "Could not find L1 PTE for address %lx\n", addr);
-        return GNTST_general_error;
-    }
+    unsigned long mfn = pagetable_get_pfn(v->arch.guest_table);
+    struct page_info *page;
+    l4_pgentry_t *l4tab = NULL;
+    int rc = put_old_guest_table(v);
 
-    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
-    {
-        rc = GNTST_general_error;
-        goto out;
-    }
+    if ( rc )
+        return rc;
 
-    l1pg = mfn_to_page(gl1mfn);
-    if ( !page_lock(l1pg) )
+    if ( is_pv_32bit_vcpu(v) )
     {
-        rc = GNTST_general_error;
-        put_page(l1pg);
-        goto out;
+        l4tab = map_domain_page(_mfn(mfn));
+        mfn = l4e_get_pfn(*l4tab);
     }
 
-    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    if ( mfn )
     {
-        rc = GNTST_general_error;
-        goto unlock_and_out;
+        page = mfn_to_page(mfn);
+        if ( paging_mode_refcounts(v->domain) )
+            put_page(page);
+        else
+            rc = put_page_and_type_preemptible(page);
     }
 
-    ol1e = *pl1e;
-
-    /* Check that the virtual address supplied is actually mapped to frame. */
-    if ( unlikely(l1e_get_pfn(ol1e) != frame) )
+    if ( l4tab )
     {
-        gdprintk(XENLOG_WARNING,
-                 "PTE entry %lx for address %lx doesn't match frame %lx\n",
-                 l1e_get_pfn(ol1e), addr, frame);
-        rc = GNTST_general_error;
-        goto unlock_and_out;
+        if ( !rc )
+            l4e_write(l4tab, l4e_empty());
+        unmap_domain_page(l4tab);
     }
-
-    /* Delete pagetable entry. */
-    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0)) )
+    else if ( !rc )
     {
-        gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", pl1e);
-        rc = GNTST_general_error;
-        goto unlock_and_out;
+        v->arch.guest_table = pagetable_null();
+
+        /* Drop ref to guest_table_user (from MMUEXT_NEW_USER_BASEPTR) */
+        mfn = pagetable_get_pfn(v->arch.guest_table_user);
+        if ( mfn )
+        {
+            page = mfn_to_page(mfn);
+            if ( paging_mode_refcounts(v->domain) )
+                put_page(page);
+            else
+                rc = put_page_and_type_preemptible(page);
+        }
+        if ( !rc )
+            v->arch.guest_table_user = pagetable_null();
     }
 
- unlock_and_out:
-    page_unlock(l1pg);
-    put_page(l1pg);
- out:
-    guest_unmap_l1e(pl1e);
-    return rc;
-}
+    v->arch.cr3 = 0;
 
-static int destroy_grant_va_mapping(
-    unsigned long addr, unsigned long frame, struct vcpu *v)
-{
-    return replace_grant_va_mapping(addr, frame, l1e_empty(), v);
+    /*
+     * put_page_and_type_preemptible() is liable to return -EINTR. The
+     * callers of us expect -ERESTART so convert it over.
+     */
+    return rc != -EINTR ? rc : -ERESTART;
 }
 
 static int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
@@ -4267,34 +1032,6 @@ static int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
         return GNTST_okay;
 }
 
-static int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
-                                   unsigned int flags, unsigned int cache_flags)
-{
-    l1_pgentry_t pte;
-    uint32_t grant_pte_flags;
-
-    grant_pte_flags =
-        _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_GNTTAB;
-    if ( cpu_has_nx )
-        grant_pte_flags |= _PAGE_NX_BIT;
-
-    pte = l1e_from_pfn(frame, grant_pte_flags);
-    if ( (flags & GNTMAP_application_map) )
-        l1e_add_flags(pte,_PAGE_USER);
-    if ( !(flags & GNTMAP_readonly) )
-        l1e_add_flags(pte,_PAGE_RW);
-
-    l1e_add_flags(pte,
-                  ((flags >> _GNTMAP_guest_avail0) * _PAGE_AVAIL0)
-                   & _PAGE_AVAIL);
-
-    l1e_add_flags(pte, cacheattr_to_pte_flags(cache_flags >> 5));
-
-    if ( flags & GNTMAP_contains_pte )
-        return create_grant_pte_mapping(addr, pte, current);
-    return create_grant_va_mapping(addr, pte, current);
-}
-
 int create_grant_host_mapping(uint64_t addr, unsigned long frame,
                               unsigned int flags, unsigned int cache_flags)
 {
@@ -4327,453 +1064,108 @@ static int replace_grant_p2m_mapping(
     guest_physmap_remove_page(d, _gfn(gfn), _mfn(frame), PAGE_ORDER_4K);
 
     put_gfn(d, gfn);
-    return GNTST_okay;
-}
-
-static int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
-                                    uint64_t new_addr, unsigned int flags)
-{
-    struct vcpu *curr = current;
-    l1_pgentry_t *pl1e, ol1e;
-    unsigned long gl1mfn;
-    struct page_info *l1pg;
-    int rc;
-
-    if ( flags & GNTMAP_contains_pte )
-    {
-        if ( !new_addr )
-            return destroy_grant_pte_mapping(addr, frame, curr->domain);
-
-        return GNTST_general_error;
-    }
-
-    if ( !new_addr )
-        return destroy_grant_va_mapping(addr, frame, curr);
-
-    pl1e = guest_map_l1e(new_addr, &gl1mfn);
-    if ( !pl1e )
-    {
-        gdprintk(XENLOG_WARNING,
-                 "Could not find L1 PTE for address %"PRIx64"\n", new_addr);
-        return GNTST_general_error;
-    }
-
-    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
-    {
-        guest_unmap_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    l1pg = mfn_to_page(gl1mfn);
-    if ( !page_lock(l1pg) )
-    {
-        put_page(l1pg);
-        guest_unmap_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(l1pg);
-        put_page(l1pg);
-        guest_unmap_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    ol1e = *pl1e;
-
-    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(),
-                                gl1mfn, curr, 0)) )
-    {
-        page_unlock(l1pg);
-        put_page(l1pg);
-        gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", pl1e);
-        guest_unmap_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    page_unlock(l1pg);
-    put_page(l1pg);
-    guest_unmap_l1e(pl1e);
-
-    rc = replace_grant_va_mapping(addr, frame, ol1e, curr);
-    if ( rc && !paging_mode_refcounts(curr->domain) )
-        put_page_from_l1e(ol1e, curr->domain);
-
-    return rc;
-}
-
-int replace_grant_host_mapping(uint64_t addr, unsigned long frame,
-                               uint64_t new_addr, unsigned int flags)
-{
-    if ( paging_mode_external(current->domain) )
-        return replace_grant_p2m_mapping(addr, frame, new_addr, flags);
-
-    return replace_grant_pv_mapping(addr, frame, new_addr, flags);
-}
-
-int donate_page(
-    struct domain *d, struct page_info *page, unsigned int memflags)
-{
-    const struct domain *owner = dom_xen;
-
-    spin_lock(&d->page_alloc_lock);
-
-    if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != NULL) )
-        goto fail;
-
-    if ( d->is_dying )
-        goto fail;
-
-    if ( page->count_info & ~(PGC_allocated | 1) )
-        goto fail;
-
-    if ( !(memflags & MEMF_no_refcount) )
-    {
-        if ( d->tot_pages >= d->max_pages )
-            goto fail;
-        domain_adjust_tot_pages(d, 1);
-    }
-
-    page->count_info = PGC_allocated | 1;
-    page_set_owner(page, d);
-    page_list_add_tail(page,&d->page_list);
-
-    spin_unlock(&d->page_alloc_lock);
-    return 0;
-
- fail:
-    spin_unlock(&d->page_alloc_lock);
-    gdprintk(XENLOG_WARNING, "Bad donate mfn %" PRI_mfn
-             " to d%d (owner d%d) caf=%08lx taf=%" PRtype_info "\n",
-             page_to_mfn(page), d->domain_id,
-             owner ? owner->domain_id : DOMID_INVALID,
-             page->count_info, page->u.inuse.type_info);
-    return -1;
-}
-
-int steal_page(
-    struct domain *d, struct page_info *page, unsigned int memflags)
-{
-    unsigned long x, y;
-    bool_t drop_dom_ref = 0;
-    const struct domain *owner = dom_xen;
-
-    spin_lock(&d->page_alloc_lock);
-
-    if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != d) )
-        goto fail;
-
-    /*
-     * We require there is just one reference (PGC_allocated). We temporarily
-     * drop this reference now so that we can safely swizzle the owner.
-     */
-    y = page->count_info;
-    do {
-        x = y;
-        if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) )
-            goto fail;
-        y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask);
-    } while ( y != x );
-
-    /* Swizzle the owner then reinstate the PGC_allocated reference. */
-    page_set_owner(page, NULL);
-    y = page->count_info;
-    do {
-        x = y;
-        BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated);
-    } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x );
-
-    /* Unlink from original owner. */
-    if ( !(memflags & MEMF_no_refcount) && !domain_adjust_tot_pages(d, -1) )
-        drop_dom_ref = 1;
-    page_list_del(page, &d->page_list);
-
-    spin_unlock(&d->page_alloc_lock);
-    if ( unlikely(drop_dom_ref) )
-        put_domain(d);
-    return 0;
-
- fail:
-    spin_unlock(&d->page_alloc_lock);
-    gdprintk(XENLOG_WARNING, "Bad steal mfn %" PRI_mfn
-             " from d%d (owner d%d) caf=%08lx taf=%" PRtype_info "\n",
-             page_to_mfn(page), d->domain_id,
-             owner ? owner->domain_id : DOMID_INVALID,
-             page->count_info, page->u.inuse.type_info);
-    return -1;
-}
-
-static int __do_update_va_mapping(
-    unsigned long va, u64 val64, unsigned long flags, struct domain *pg_owner)
-{
-    l1_pgentry_t   val = l1e_from_intpte(val64);
-    struct vcpu   *v   = current;
-    struct domain *d   = v->domain;
-    struct page_info *gl1pg;
-    l1_pgentry_t  *pl1e;
-    unsigned long  bmap_ptr, gl1mfn;
-    cpumask_t     *mask = NULL;
-    int            rc;
-
-    perfc_incr(calls_to_update_va);
-
-    rc = xsm_update_va_mapping(XSM_TARGET, d, pg_owner, val);
-    if ( rc )
-        return rc;
-
-    rc = -EINVAL;
-    pl1e = guest_map_l1e(va, &gl1mfn);
-    if ( unlikely(!pl1e || !get_page_from_pagenr(gl1mfn, d)) )
-        goto out;
-
-    gl1pg = mfn_to_page(gl1mfn);
-    if ( !page_lock(gl1pg) )
-    {
-        put_page(gl1pg);
-        goto out;
-    }
-
-    if ( (gl1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(gl1pg);
-        put_page(gl1pg);
-        goto out;
-    }
-
-    rc = mod_l1_entry(pl1e, val, gl1mfn, 0, v, pg_owner);
-
-    page_unlock(gl1pg);
-    put_page(gl1pg);
-
- out:
-    if ( pl1e )
-        guest_unmap_l1e(pl1e);
-
-    switch ( flags & UVMF_FLUSHTYPE_MASK )
-    {
-    case UVMF_TLB_FLUSH:
-        switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
-        {
-        case UVMF_LOCAL:
-            flush_tlb_local();
-            break;
-        case UVMF_ALL:
-            mask = d->domain_dirty_cpumask;
-            break;
-        default:
-            mask = this_cpu(scratch_cpumask);
-            rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
-                                                                     void),
-                                      mask);
-            break;
-        }
-        if ( mask )
-            flush_tlb_mask(mask);
-        break;
-
-    case UVMF_INVLPG:
-        switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
-        {
-        case UVMF_LOCAL:
-            paging_invlpg(v, va);
-            break;
-        case UVMF_ALL:
-            mask = d->domain_dirty_cpumask;
-            break;
-        default:
-            mask = this_cpu(scratch_cpumask);
-            rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
-                                                                     void),
-                                      mask);
-            break;
-        }
-        if ( mask )
-            flush_tlb_one_mask(mask, va);
-        break;
-    }
-
-    return rc;
-}
-
-long do_update_va_mapping(unsigned long va, u64 val64,
-                          unsigned long flags)
-{
-    return __do_update_va_mapping(va, val64, flags, current->domain);
-}
-
-long do_update_va_mapping_otherdomain(unsigned long va, u64 val64,
-                                      unsigned long flags,
-                                      domid_t domid)
-{
-    struct domain *pg_owner;
-    int rc;
-
-    if ( (pg_owner = get_pg_owner(domid)) == NULL )
-        return -ESRCH;
-
-    rc = __do_update_va_mapping(va, val64, flags, pg_owner);
-
-    put_pg_owner(pg_owner);
-
-    return rc;
-}
-
-
-
-/*************************
- * Descriptor Tables
- */
-
-void destroy_gdt(struct vcpu *v)
-{
-    l1_pgentry_t *pl1e;
-    unsigned int i;
-    unsigned long pfn, zero_pfn = PFN_DOWN(__pa(zero_page));
-
-    v->arch.pv_vcpu.gdt_ents = 0;
-    pl1e = gdt_ldt_ptes(v->domain, v);
-    for ( i = 0; i < FIRST_RESERVED_GDT_PAGE; i++ )
-    {
-        pfn = l1e_get_pfn(pl1e[i]);
-        if ( (l1e_get_flags(pl1e[i]) & _PAGE_PRESENT) && pfn != zero_pfn )
-            put_page_and_type(mfn_to_page(pfn));
-        l1e_write(&pl1e[i], l1e_from_pfn(zero_pfn, __PAGE_HYPERVISOR_RO));
-        v->arch.pv_vcpu.gdt_frames[i] = 0;
-    }
-}
-
-
-long set_gdt(struct vcpu *v, 
-             unsigned long *frames,
-             unsigned int entries)
-{
-    struct domain *d = v->domain;
-    l1_pgentry_t *pl1e;
-    /* NB. There are 512 8-byte entries per GDT page. */
-    unsigned int i, nr_pages = (entries + 511) / 512;
-
-    if ( entries > FIRST_RESERVED_GDT_ENTRY )
-        return -EINVAL;
-
-    /* Check the pages in the new GDT. */
-    for ( i = 0; i < nr_pages; i++ )
-    {
-        struct page_info *page;
-
-        page = get_page_from_gfn(d, frames[i], NULL, P2M_ALLOC);
-        if ( !page )
-            goto fail;
-        if ( !get_page_type(page, PGT_seg_desc_page) )
-        {
-            put_page(page);
-            goto fail;
-        }
-        frames[i] = page_to_mfn(page);
-    }
-
-    /* Tear down the old GDT. */
-    destroy_gdt(v);
-
-    /* Install the new GDT. */
-    v->arch.pv_vcpu.gdt_ents = entries;
-    pl1e = gdt_ldt_ptes(d, v);
-    for ( i = 0; i < nr_pages; i++ )
-    {
-        v->arch.pv_vcpu.gdt_frames[i] = frames[i];
-        l1e_write(&pl1e[i], l1e_from_pfn(frames[i], __PAGE_HYPERVISOR_RW));
-    }
-
-    return 0;
-
- fail:
-    while ( i-- > 0 )
-    {
-        put_page_and_type(mfn_to_page(frames[i]));
-    }
-    return -EINVAL;
+    return GNTST_okay;
 }
 
-
-long do_set_gdt(XEN_GUEST_HANDLE_PARAM(xen_ulong_t) frame_list,
-                unsigned int entries)
+int replace_grant_host_mapping(uint64_t addr, unsigned long frame,
+                               uint64_t new_addr, unsigned int flags)
 {
-    int nr_pages = (entries + 511) / 512;
-    unsigned long frames[16];
-    struct vcpu *curr = current;
-    long ret;
+    if ( paging_mode_external(current->domain) )
+        return replace_grant_p2m_mapping(addr, frame, new_addr, flags);
 
-    /* Rechecked in set_gdt, but ensures a sane limit for copy_from_user(). */
-    if ( entries > FIRST_RESERVED_GDT_ENTRY )
-        return -EINVAL;
-    
-    if ( copy_from_guest(frames, frame_list, nr_pages) )
-        return -EFAULT;
+    return replace_grant_pv_mapping(addr, frame, new_addr, flags);
+}
+
+int donate_page(
+    struct domain *d, struct page_info *page, unsigned int memflags)
+{
+    const struct domain *owner = dom_xen;
 
-    domain_lock(curr->domain);
+    spin_lock(&d->page_alloc_lock);
 
-    if ( (ret = set_gdt(curr, frames, entries)) == 0 )
-        flush_tlb_local();
+    if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != NULL) )
+        goto fail;
 
-    domain_unlock(curr->domain);
+    if ( d->is_dying )
+        goto fail;
 
-    return ret;
-}
+    if ( page->count_info & ~(PGC_allocated | 1) )
+        goto fail;
 
+    if ( !(memflags & MEMF_no_refcount) )
+    {
+        if ( d->tot_pages >= d->max_pages )
+            goto fail;
+        domain_adjust_tot_pages(d, 1);
+    }
 
-long do_update_descriptor(u64 pa, u64 desc)
-{
-    struct domain *dom = current->domain;
-    unsigned long gmfn = pa >> PAGE_SHIFT;
-    unsigned long mfn;
-    unsigned int  offset;
-    struct desc_struct *gdt_pent, d;
-    struct page_info *page;
-    long ret = -EINVAL;
+    page->count_info = PGC_allocated | 1;
+    page_set_owner(page, d);
+    page_list_add_tail(page,&d->page_list);
 
-    offset = ((unsigned int)pa & ~PAGE_MASK) / sizeof(struct desc_struct);
+    spin_unlock(&d->page_alloc_lock);
+    return 0;
 
-    *(u64 *)&d = desc;
+ fail:
+    spin_unlock(&d->page_alloc_lock);
+    gdprintk(XENLOG_WARNING, "Bad donate mfn %" PRI_mfn
+             " to d%d (owner d%d) caf=%08lx taf=%" PRtype_info "\n",
+             page_to_mfn(page), d->domain_id,
+             owner ? owner->domain_id : DOMID_INVALID,
+             page->count_info, page->u.inuse.type_info);
+    return -1;
+}
 
-    page = get_page_from_gfn(dom, gmfn, NULL, P2M_ALLOC);
-    if ( (((unsigned int)pa % sizeof(struct desc_struct)) != 0) ||
-         !page ||
-         !check_descriptor(dom, &d) )
-    {
-        if ( page )
-            put_page(page);
-        return -EINVAL;
-    }
-    mfn = page_to_mfn(page);
+int steal_page(
+    struct domain *d, struct page_info *page, unsigned int memflags)
+{
+    unsigned long x, y;
+    bool_t drop_dom_ref = 0;
+    const struct domain *owner = dom_xen;
 
-    /* Check if the given frame is in use in an unsafe context. */
-    switch ( page->u.inuse.type_info & PGT_type_mask )
-    {
-    case PGT_seg_desc_page:
-        if ( unlikely(!get_page_type(page, PGT_seg_desc_page)) )
-            goto out;
-        break;
-    default:
-        if ( unlikely(!get_page_type(page, PGT_writable_page)) )
-            goto out;
-        break;
-    }
+    spin_lock(&d->page_alloc_lock);
 
-    paging_mark_dirty(dom, _mfn(mfn));
+    if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != d) )
+        goto fail;
 
-    /* All is good so make the update. */
-    gdt_pent = map_domain_page(_mfn(mfn));
-    write_atomic((uint64_t *)&gdt_pent[offset], *(uint64_t *)&d);
-    unmap_domain_page(gdt_pent);
+    /*
+     * We require there is just one reference (PGC_allocated). We temporarily
+     * drop this reference now so that we can safely swizzle the owner.
+     */
+    y = page->count_info;
+    do {
+        x = y;
+        if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) )
+            goto fail;
+        y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask);
+    } while ( y != x );
 
-    put_page_type(page);
+    /* Swizzle the owner then reinstate the PGC_allocated reference. */
+    page_set_owner(page, NULL);
+    y = page->count_info;
+    do {
+        x = y;
+        BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated);
+    } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x );
 
-    ret = 0; /* success */
+    /* Unlink from original owner. */
+    if ( !(memflags & MEMF_no_refcount) && !domain_adjust_tot_pages(d, -1) )
+        drop_dom_ref = 1;
+    page_list_del(page, &d->page_list);
 
- out:
-    put_page(page);
+    spin_unlock(&d->page_alloc_lock);
+    if ( unlikely(drop_dom_ref) )
+        put_domain(d);
+    return 0;
 
-    return ret;
+ fail:
+    spin_unlock(&d->page_alloc_lock);
+    gdprintk(XENLOG_WARNING, "Bad steal mfn %" PRI_mfn
+             " from d%d (owner d%d) caf=%08lx taf=%" PRtype_info "\n",
+             page_to_mfn(page), d->domain_id,
+             owner ? owner->domain_id : DOMID_INVALID,
+             page->count_info, page->u.inuse.type_info);
+    return -1;
 }
 
 typedef struct e820entry e820entry_t;
@@ -5181,466 +1573,6 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     return 0;
 }
 
-
-/*************************
- * Writable Pagetables
- */
-
-struct ptwr_emulate_ctxt {
-    struct x86_emulate_ctxt ctxt;
-    unsigned long cr2;
-    l1_pgentry_t  pte;
-};
-
-static int ptwr_emulated_read(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    unsigned int rc = bytes;
-    unsigned long addr = offset;
-
-    if ( !__addr_ok(addr) ||
-         (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
-    {
-        x86_emul_pagefault(0, addr + bytes - rc, ctxt);  /* Read fault. */
-        return X86EMUL_EXCEPTION;
-    }
-
-    return X86EMUL_OKAY;
-}
-
-static int ptwr_emulated_update(
-    unsigned long addr,
-    paddr_t old,
-    paddr_t val,
-    unsigned int bytes,
-    unsigned int do_cmpxchg,
-    struct ptwr_emulate_ctxt *ptwr_ctxt)
-{
-    unsigned long mfn;
-    unsigned long unaligned_addr = addr;
-    struct page_info *page;
-    l1_pgentry_t pte, ol1e, nl1e, *pl1e;
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    int ret;
-
-    /* Only allow naturally-aligned stores within the original %cr2 page. */
-    if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) )
-    {
-        gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n",
-                 ptwr_ctxt->cr2, addr, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    /* Turn a sub-word access into a full-word access. */
-    if ( bytes != sizeof(paddr_t) )
-    {
-        paddr_t      full;
-        unsigned int rc, offset = addr & (sizeof(paddr_t)-1);
-
-        /* Align address; read full word. */
-        addr &= ~(sizeof(paddr_t)-1);
-        if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 )
-        {
-            x86_emul_pagefault(0, /* Read fault. */
-                               addr + sizeof(paddr_t) - rc,
-                               &ptwr_ctxt->ctxt);
-            return X86EMUL_EXCEPTION;
-        }
-        /* Mask out bits provided by caller. */
-        full &= ~((((paddr_t)1 << (bytes*8)) - 1) << (offset*8));
-        /* Shift the caller value and OR in the missing bits. */
-        val  &= (((paddr_t)1 << (bytes*8)) - 1);
-        val <<= (offset)*8;
-        val  |= full;
-        /* Also fill in missing parts of the cmpxchg old value. */
-        old  &= (((paddr_t)1 << (bytes*8)) - 1);
-        old <<= (offset)*8;
-        old  |= full;
-    }
-
-    pte  = ptwr_ctxt->pte;
-    mfn  = l1e_get_pfn(pte);
-    page = mfn_to_page(mfn);
-
-    /* We are looking only for read-only mappings of p.t. pages. */
-    ASSERT((l1e_get_flags(pte) & (_PAGE_RW|_PAGE_PRESENT)) == _PAGE_PRESENT);
-    ASSERT(mfn_valid(_mfn(mfn)));
-    ASSERT((page->u.inuse.type_info & PGT_type_mask) == PGT_l1_page_table);
-    ASSERT((page->u.inuse.type_info & PGT_count_mask) != 0);
-    ASSERT(page_get_owner(page) == d);
-
-    /* Check the new PTE. */
-    nl1e = l1e_from_intpte(val);
-    switch ( ret = get_page_from_l1e(nl1e, d, d) )
-    {
-    default:
-        if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
-             !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) )
-        {
-            /*
-             * If this is an upper-half write to a PAE PTE then we assume that
-             * the guest has simply got the two writes the wrong way round. We
-             * zap the PRESENT bit on the assumption that the bottom half will
-             * be written immediately after we return to the guest.
-             */
-            gdprintk(XENLOG_DEBUG, "ptwr_emulate: fixing up invalid PAE PTE %"
-                     PRIpte"\n", l1e_get_intpte(nl1e));
-            l1e_remove_flags(nl1e, _PAGE_PRESENT);
-        }
-        else
-        {
-            gdprintk(XENLOG_WARNING, "could not get_page_from_l1e()\n");
-            return X86EMUL_UNHANDLEABLE;
-        }
-        break;
-    case 0:
-        break;
-    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
-        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
-        l1e_flip_flags(nl1e, ret);
-        break;
-    }
-
-    adjust_guest_l1e(nl1e, d);
-
-    /* Checked successfully: do the update (write or cmpxchg). */
-    pl1e = map_domain_page(_mfn(mfn));
-    pl1e = (l1_pgentry_t *)((unsigned long)pl1e + (addr & ~PAGE_MASK));
-    if ( do_cmpxchg )
-    {
-        int okay;
-        intpte_t t = old;
-        ol1e = l1e_from_intpte(old);
-
-        okay = paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e),
-                                          &t, l1e_get_intpte(nl1e), _mfn(mfn));
-        okay = (okay && t == old);
-
-        if ( !okay )
-        {
-            unmap_domain_page(pl1e);
-            put_page_from_l1e(nl1e, d);
-            return X86EMUL_RETRY;
-        }
-    }
-    else
-    {
-        ol1e = *pl1e;
-        if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, 0) )
-            BUG();
-    }
-
-    trace_ptwr_emulation(addr, nl1e);
-
-    unmap_domain_page(pl1e);
-
-    /* Finally, drop the old PTE. */
-    put_page_from_l1e(ol1e, d);
-
-    return X86EMUL_OKAY;
-}
-
-static int ptwr_emulated_write(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    paddr_t val = 0;
-
-    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) || !bytes )
-    {
-        gdprintk(XENLOG_WARNING, "bad write size (addr=%lx, bytes=%u)\n",
-                 offset, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    memcpy(&val, p_data, bytes);
-
-    return ptwr_emulated_update(
-        offset, 0, val, bytes, 0,
-        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
-}
-
-static int ptwr_emulated_cmpxchg(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_old,
-    void *p_new,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    paddr_t old = 0, new = 0;
-
-    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes -1)) )
-    {
-        gdprintk(XENLOG_WARNING, "bad cmpxchg size (addr=%lx, bytes=%u)\n",
-                 offset, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    memcpy(&old, p_old, bytes);
-    memcpy(&new, p_new, bytes);
-
-    return ptwr_emulated_update(
-        offset, old, new, bytes, 1,
-        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
-}
-
-static int pv_emul_is_mem_write(const struct x86_emulate_state *state,
-                                struct x86_emulate_ctxt *ctxt)
-{
-    return x86_insn_is_mem_write(state, ctxt) ? X86EMUL_OKAY
-                                              : X86EMUL_UNHANDLEABLE;
-}
-
-static const struct x86_emulate_ops ptwr_emulate_ops = {
-    .read       = ptwr_emulated_read,
-    .insn_fetch = ptwr_emulated_read,
-    .write      = ptwr_emulated_write,
-    .cmpxchg    = ptwr_emulated_cmpxchg,
-    .validate   = pv_emul_is_mem_write,
-    .cpuid      = pv_emul_cpuid,
-};
-
-/* Write page fault handler: check if guest is trying to modify a PTE. */
-int ptwr_do_page_fault(struct vcpu *v, unsigned long addr, 
-                       struct cpu_user_regs *regs)
-{
-    struct domain *d = v->domain;
-    struct page_info *page;
-    l1_pgentry_t      pte;
-    struct ptwr_emulate_ctxt ptwr_ctxt = {
-        .ctxt = {
-            .regs = regs,
-            .vendor = d->arch.cpuid->x86_vendor,
-            .addr_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
-            .sp_size   = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
-            .swint_emulate = x86_swint_emulate_none,
-        },
-    };
-    int rc;
-
-    /* Attempt to read the PTE that maps the VA being accessed. */
-    guest_get_eff_l1e(addr, &pte);
-
-    /* We are looking only for read-only mappings of p.t. pages. */
-    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) ||
-         rangeset_contains_singleton(mmio_ro_ranges, l1e_get_pfn(pte)) ||
-         !get_page_from_pagenr(l1e_get_pfn(pte), d) )
-        goto bail;
-
-    page = l1e_get_page(pte);
-    if ( !page_lock(page) )
-    {
-        put_page(page);
-        goto bail;
-    }
-
-    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(page);
-        put_page(page);
-        goto bail;
-    }
-
-    ptwr_ctxt.cr2 = addr;
-    ptwr_ctxt.pte = pte;
-
-    rc = x86_emulate(&ptwr_ctxt.ctxt, &ptwr_emulate_ops);
-
-    page_unlock(page);
-    put_page(page);
-
-    switch ( rc )
-    {
-    case X86EMUL_EXCEPTION:
-        /*
-         * This emulation only covers writes to pagetables which are marked
-         * read-only by Xen.  We tolerate #PF (in case a concurrent pagetable
-         * update has succeeded on a different vcpu).  Anything else is an
-         * emulation bug, or a guest playing with the instruction stream under
-         * Xen's feet.
-         */
-        if ( ptwr_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
-             ptwr_ctxt.ctxt.event.vector == TRAP_page_fault )
-            pv_inject_event(&ptwr_ctxt.ctxt.event);
-        else
-            gdprintk(XENLOG_WARNING,
-                     "Unexpected event (type %u, vector %#x) from emulation\n",
-                     ptwr_ctxt.ctxt.event.type, ptwr_ctxt.ctxt.event.vector);
-
-        /* Fallthrough */
-    case X86EMUL_OKAY:
-
-        if ( ptwr_ctxt.ctxt.retire.singlestep )
-            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
-
-        /* Fallthrough */
-    case X86EMUL_RETRY:
-        perfc_incr(ptwr_emulations);
-        return EXCRET_fault_fixed;
-    }
-
- bail:
-    return 0;
-}
-
-/*************************
- * fault handling for read-only MMIO pages
- */
-
-int mmio_ro_emulated_write(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    struct mmio_ro_emulate_ctxt *mmio_ro_ctxt = ctxt->data;
-
-    /* Only allow naturally-aligned stores at the original %cr2 address. */
-    if ( ((bytes | offset) & (bytes - 1)) || !bytes ||
-         offset != mmio_ro_ctxt->cr2 )
-    {
-        gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n",
-                mmio_ro_ctxt->cr2, offset, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    return X86EMUL_OKAY;
-}
-
-static const struct x86_emulate_ops mmio_ro_emulate_ops = {
-    .read       = x86emul_unhandleable_rw,
-    .insn_fetch = ptwr_emulated_read,
-    .write      = mmio_ro_emulated_write,
-    .validate   = pv_emul_is_mem_write,
-    .cpuid      = pv_emul_cpuid,
-};
-
-int mmcfg_intercept_write(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    struct mmio_ro_emulate_ctxt *mmio_ctxt = ctxt->data;
-
-    /*
-     * Only allow naturally-aligned stores no wider than 4 bytes to the
-     * original %cr2 address.
-     */
-    if ( ((bytes | offset) & (bytes - 1)) || bytes > 4 || !bytes ||
-         offset != mmio_ctxt->cr2 )
-    {
-        gdprintk(XENLOG_WARNING, "bad write (cr2=%lx, addr=%lx, bytes=%u)\n",
-                mmio_ctxt->cr2, offset, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    offset &= 0xfff;
-    if ( pci_conf_write_intercept(mmio_ctxt->seg, mmio_ctxt->bdf,
-                                  offset, bytes, p_data) >= 0 )
-        pci_mmcfg_write(mmio_ctxt->seg, PCI_BUS(mmio_ctxt->bdf),
-                        PCI_DEVFN2(mmio_ctxt->bdf), offset, bytes,
-                        *(uint32_t *)p_data);
-
-    return X86EMUL_OKAY;
-}
-
-static const struct x86_emulate_ops mmcfg_intercept_ops = {
-    .read       = x86emul_unhandleable_rw,
-    .insn_fetch = ptwr_emulated_read,
-    .write      = mmcfg_intercept_write,
-    .validate   = pv_emul_is_mem_write,
-    .cpuid      = pv_emul_cpuid,
-};
-
-/* Check if guest is trying to modify a r/o MMIO page. */
-int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr,
-                          struct cpu_user_regs *regs)
-{
-    l1_pgentry_t pte;
-    unsigned long mfn;
-    unsigned int addr_size = is_pv_32bit_vcpu(v) ? 32 : BITS_PER_LONG;
-    struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { .cr2 = addr };
-    struct x86_emulate_ctxt ctxt = {
-        .regs = regs,
-        .vendor = v->domain->arch.cpuid->x86_vendor,
-        .addr_size = addr_size,
-        .sp_size = addr_size,
-        .swint_emulate = x86_swint_emulate_none,
-        .data = &mmio_ro_ctxt
-    };
-    int rc;
-
-    /* Attempt to read the PTE that maps the VA being accessed. */
-    guest_get_eff_l1e(addr, &pte);
-
-    /* We are looking only for read-only mappings of MMIO pages. */
-    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) )
-        return 0;
-
-    mfn = l1e_get_pfn(pte);
-    if ( mfn_valid(_mfn(mfn)) )
-    {
-        struct page_info *page = mfn_to_page(mfn);
-        struct domain *owner = page_get_owner_and_reference(page);
-
-        if ( owner )
-            put_page(page);
-        if ( owner != dom_io )
-            return 0;
-    }
-
-    if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
-        return 0;
-
-    if ( pci_ro_mmcfg_decode(mfn, &mmio_ro_ctxt.seg, &mmio_ro_ctxt.bdf) )
-        rc = x86_emulate(&ctxt, &mmcfg_intercept_ops);
-    else
-        rc = x86_emulate(&ctxt, &mmio_ro_emulate_ops);
-
-    switch ( rc )
-    {
-    case X86EMUL_EXCEPTION:
-        /*
-         * This emulation only covers writes to MMCFG space or read-only MFNs.
-         * We tolerate #PF (from hitting an adjacent page or a successful
-         * concurrent pagetable update).  Anything else is an emulation bug,
-         * or a guest playing with the instruction stream under Xen's feet.
-         */
-        if ( ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
-             ctxt.event.vector == TRAP_page_fault )
-            pv_inject_event(&ctxt.event);
-        else
-            gdprintk(XENLOG_WARNING,
-                     "Unexpected event (type %u, vector %#x) from emulation\n",
-                     ctxt.event.type, ctxt.event.vector);
-
-        /* Fallthrough */
-    case X86EMUL_OKAY:
-
-        if ( ctxt.retire.singlestep )
-            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
-
-        /* Fallthrough */
-    case X86EMUL_RETRY:
-        perfc_incr(ptwr_emulations);
-        return EXCRET_fault_fixed;
-    }
-
-    return 0;
-}
-
 void *alloc_xen_pagetable(void)
 {
     if ( system_state != SYS_STATE_early_boot )
diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index ea94599438..665be5536c 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -1,2 +1,3 @@
 obj-y += hypercall.o
 obj-bin-y += dom0_build.init.o
+obj-y += mm.o
diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
new file mode 100644
index 0000000000..b5277b5d28
--- /dev/null
+++ b/xen/arch/x86/pv/mm.c
@@ -0,0 +1,4118 @@
+/******************************************************************************
+ * arch/x86/pv/mm.c
+ *
+ * Copyright (c) 2002-2005 K A Fraser
+ * Copyright (c) 2004 Christian Limpach
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * A description of the x86 page table API:
+ *
+ * Domains trap to do_mmu_update with a list of update requests.
+ * This is a list of (ptr, val) pairs, where the requested operation
+ * is *ptr = val.
+ *
+ * Reference counting of pages:
+ * ----------------------------
+ * Each page has two refcounts: tot_count and type_count.
+ *
+ * TOT_COUNT is the obvious reference count. It counts all uses of a
+ * physical page frame by a domain, including uses as a page directory,
+ * a page table, or simple mappings via a PTE. This count prevents a
+ * domain from releasing a frame back to the free pool when it still holds
+ * a reference to it.
+ *
+ * TYPE_COUNT is more subtle. A frame can be put to one of three
+ * mutually-exclusive uses: it might be used as a page directory, or a
+ * page table, or it may be mapped writable by the domain [of course, a
+ * frame may not be used in any of these three ways!].
+ * So, type_count is a count of the number of times a frame is being
+ * referred to in its current incarnation. Therefore, a page can only
+ * change its type when its type count is zero.
+ *
+ * Pinning the page type:
+ * ----------------------
+ * The type of a page can be pinned/unpinned with the commands
+ * MMUEXT_[UN]PIN_L?_TABLE. Each page can be pinned exactly once (that is,
+ * pinning is not reference counted, so it can't be nested).
+ * This is useful to prevent a page's type count falling to zero, at which
+ * point safety checks would need to be carried out next time the count
+ * is increased again.
+ *
+ * A further note on writable page mappings:
+ * -----------------------------------------
+ * For simplicity, the count of writable mappings for a page may not
+ * correspond to reality. The 'writable count' is incremented for every
+ * PTE which maps the page with the _PAGE_RW flag set. However, for
+ * write access to be possible the page directory entry must also have
+ * its _PAGE_RW bit set. We do not check this as it complicates the
+ * reference counting considerably [consider the case of multiple
+ * directory entries referencing a single page table, some with the RW
+ * bit set, others not -- it starts getting a bit messy].
+ * In normal use, this simplification shouldn't be a problem.
+ * However, the logic can be added if required.
+ *
+ * One more note on read-only page mappings:
+ * -----------------------------------------
+ * We want domains to be able to map pages for read-only access. The
+ * main reason is that page tables and directories should be readable
+ * by a domain, but it would not be safe for them to be writable.
+ * However, domains have free access to rings 1 & 2 of the Intel
+ * privilege model. In terms of page protection, these are considered
+ * to be part of 'supervisor mode'. The WP bit in CR0 controls whether
+ * read-only restrictions are respected in supervisor mode -- if the
+ * bit is clear then any mapped page is writable.
+ *
+ * We get round this by always setting the WP bit and disallowing
+ * updates to it. This is very unlikely to cause a problem for guest
+ * OS's, which will generally use the WP bit to simplify copy-on-write
+ * implementation (in that case, OS wants a fault when it writes to
+ * an application-supplied buffer).
+ */
+
+#include <xen/event.h>
+#include <xen/guest_access.h>
+#include <xen/hypercall.h>
+#include <xen/iocap.h>
+#include <xen/mm.h>
+#include <xen/sched.h>
+#include <xen/trace.h>
+#include <xsm/xsm.h>
+
+#include <asm/ldt.h>
+#include <asm/p2m.h>
+#include <asm/paging.h>
+#include <asm/shadow.h>
+#include <asm/x86_emulate.h>
+
+extern s8 __read_mostly opt_mmio_relax;
+
+extern uint32_t base_disallow_mask;
+/* Global bit is allowed to be set on L1 PTEs. Intended for user mappings. */
+#define L1_DISALLOW_MASK ((base_disallow_mask | _PAGE_GNTTAB) & ~_PAGE_GLOBAL)
+
+#define L2_DISALLOW_MASK (unlikely(opt_allow_superpage) \
+                          ? base_disallow_mask & ~_PAGE_PSE \
+                          : base_disallow_mask)
+
+#define l3_disallow_mask(d) (!is_pv_32bit_domain(d) ? \
+                             base_disallow_mask : 0xFFFFF198U)
+
+#define L4_DISALLOW_MASK (base_disallow_mask)
+
+#define l1_disallow_mask(d)                                     \
+    ((d != dom_io) &&                                           \
+     (rangeset_is_empty((d)->iomem_caps) &&                     \
+      rangeset_is_empty((d)->arch.ioport_caps) &&               \
+      !has_arch_pdevs(d) &&                                     \
+      is_pv_domain(d)) ?                                        \
+     L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
+
+/* Get a mapping of a PV guest's l1e for this virtual address. */
+static l1_pgentry_t *guest_map_l1e(unsigned long addr, unsigned long *gl1mfn)
+{
+    l2_pgentry_t l2e;
+
+    ASSERT(!paging_mode_translate(current->domain));
+    ASSERT(!paging_mode_external(current->domain));
+
+    if ( unlikely(!__addr_ok(addr)) )
+        return NULL;
+
+    /* Find this l1e and its enclosing l1mfn in the linear map. */
+    if ( __copy_from_user(&l2e,
+                          &__linear_l2_table[l2_linear_offset(addr)],
+                          sizeof(l2_pgentry_t)) )
+        return NULL;
+
+    /* Check flags that it will be safe to read the l1e. */
+    if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT )
+        return NULL;
+
+    *gl1mfn = l2e_get_pfn(l2e);
+
+    return (l1_pgentry_t *)map_domain_page(_mfn(*gl1mfn)) +
+           l1_table_offset(addr);
+}
+
+/* Pull down the mapping we got from guest_map_l1e(). */
+static inline void guest_unmap_l1e(void *p)
+{
+    unmap_domain_page(p);
+}
+
+/* Read a PV guest's l1e that maps this virtual address. */
+static inline void guest_get_eff_l1e(unsigned long addr, l1_pgentry_t *eff_l1e)
+{
+    ASSERT(!paging_mode_translate(current->domain));
+    ASSERT(!paging_mode_external(current->domain));
+
+    if ( unlikely(!__addr_ok(addr)) ||
+         __copy_from_user(eff_l1e,
+                          &__linear_l1_table[l1_linear_offset(addr)],
+                          sizeof(l1_pgentry_t)) )
+        *eff_l1e = l1e_empty();
+}
+
+/*
+ * Read the guest's l1e that maps this address, from the kernel-mode
+ * page tables.
+ */
+static inline void guest_get_eff_kern_l1e(struct vcpu *v, unsigned long addr,
+                                          void *eff_l1e)
+{
+    bool_t user_mode = !(v->arch.flags & TF_kernel_mode);
+#define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
+
+    TOGGLE_MODE();
+    guest_get_eff_l1e(addr, eff_l1e);
+    TOGGLE_MODE();
+}
+
+const char __section(".bss.page_aligned.const") __aligned(PAGE_SIZE)
+    zero_page[PAGE_SIZE];
+
+static void invalidate_shadow_ldt(struct vcpu *v, int flush)
+{
+    l1_pgentry_t *pl1e;
+    unsigned int i;
+    struct page_info *page;
+
+    BUG_ON(unlikely(in_irq()));
+
+    spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock);
+
+    if ( v->arch.pv_vcpu.shadow_ldt_mapcnt == 0 )
+        goto out;
+
+    v->arch.pv_vcpu.shadow_ldt_mapcnt = 0;
+    pl1e = gdt_ldt_ptes(v->domain, v);
+
+    for ( i = 16; i < 32; i++ )
+    {
+        if ( !(l1e_get_flags(pl1e[i]) & _PAGE_PRESENT) )
+            continue;
+        page = l1e_get_page(pl1e[i]);
+        l1e_write(&pl1e[i], l1e_empty());
+        ASSERT_PAGE_IS_TYPE(page, PGT_seg_desc_page);
+        ASSERT_PAGE_IS_DOMAIN(page, v->domain);
+        put_page_and_type(page);
+    }
+
+    /* Rid TLBs of stale mappings (guest mappings and shadow mappings). */
+    if ( flush )
+        flush_tlb_mask(v->vcpu_dirty_cpumask);
+
+ out:
+    spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock);
+}
+
+
+static int alloc_segdesc_page(struct page_info *page)
+{
+    const struct domain *owner = page_get_owner(page);
+    struct desc_struct *descs = __map_domain_page(page);
+    unsigned i;
+
+    for ( i = 0; i < 512; i++ )
+        if ( unlikely(!check_descriptor(owner, &descs[i])) )
+            break;
+
+    unmap_domain_page(descs);
+
+    return i == 512 ? 0 : -EINVAL;
+}
+
+
+/* Map shadow page at offset @off. */
+int map_ldt_shadow_page(unsigned int off)
+{
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    unsigned long gmfn;
+    struct page_info *page;
+    l1_pgentry_t l1e, nl1e;
+    unsigned long gva = v->arch.pv_vcpu.ldt_base + (off << PAGE_SHIFT);
+    int okay;
+
+    BUG_ON(unlikely(in_irq()));
+
+    if ( is_pv_32bit_domain(d) )
+        gva = (u32)gva;
+    guest_get_eff_kern_l1e(v, gva, &l1e);
+    if ( unlikely(!(l1e_get_flags(l1e) & _PAGE_PRESENT)) )
+        return 0;
+
+    gmfn = l1e_get_pfn(l1e);
+    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
+    if ( unlikely(!page) )
+        return 0;
+
+    okay = get_page_type(page, PGT_seg_desc_page);
+    if ( unlikely(!okay) )
+    {
+        put_page(page);
+        return 0;
+    }
+
+    nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(l1e) | _PAGE_RW);
+
+    spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock);
+    l1e_write(&gdt_ldt_ptes(d, v)[off + 16], nl1e);
+    v->arch.pv_vcpu.shadow_ldt_mapcnt++;
+    spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock);
+
+    return 1;
+}
+
+
+/*
+ * We allow root tables to map each other (a.k.a. linear page tables). It
+ * needs some special care with reference counts and access permissions:
+ *  1. The mapping entry must be read-only, or the guest may get write access
+ *     to its own PTEs.
+ *  2. We must only bump the reference counts for an *already validated*
+ *     L2 table, or we can end up in a deadlock in get_page_type() by waiting
+ *     on a validation that is required to complete that validation.
+ *  3. We only need to increment the reference counts for the mapped page
+ *     frame if it is mapped by a different root table. This is sufficient and
+ *     also necessary to allow validation of a root table mapping itself.
+ */
+#define define_get_linear_pagetable(level)                                  \
+static int                                                                  \
+get_##level##_linear_pagetable(                                             \
+    level##_pgentry_t pde, unsigned long pde_pfn, struct domain *d)         \
+{                                                                           \
+    unsigned long x, y;                                                     \
+    struct page_info *page;                                                 \
+    unsigned long pfn;                                                      \
+                                                                            \
+    if ( (level##e_get_flags(pde) & _PAGE_RW) )                             \
+    {                                                                       \
+        gdprintk(XENLOG_WARNING,                                            \
+                 "Attempt to create linear p.t. with write perms\n");       \
+        return 0;                                                           \
+    }                                                                       \
+                                                                            \
+    if ( (pfn = level##e_get_pfn(pde)) != pde_pfn )                         \
+    {                                                                       \
+        /* Make sure the mapped frame belongs to the correct domain. */     \
+        if ( unlikely(!get_page_from_pagenr(pfn, d)) )                      \
+            return 0;                                                       \
+                                                                            \
+        /*                                                                  \
+         * Ensure that the mapped frame is an already-validated page table. \
+         * If so, atomically increment the count (checking for overflow).   \
+         */                                                                 \
+        page = mfn_to_page(pfn);                                            \
+        y = page->u.inuse.type_info;                                        \
+        do {                                                                \
+            x = y;                                                          \
+            if ( unlikely((x & PGT_count_mask) == PGT_count_mask) ||        \
+                 unlikely((x & (PGT_type_mask|PGT_validated)) !=            \
+                          (PGT_##level##_page_table|PGT_validated)) )       \
+            {                                                               \
+                put_page(page);                                             \
+                return 0;                                                   \
+            }                                                               \
+        }                                                                   \
+        while ( (y = cmpxchg(&page->u.inuse.type_info, x, x + 1)) != x );   \
+    }                                                                       \
+                                                                            \
+    return 1;                                                               \
+}
+
+#ifndef NDEBUG
+struct mmio_emul_range_ctxt {
+    const struct domain *d;
+    unsigned long mfn;
+};
+
+static int print_mmio_emul_range(unsigned long s, unsigned long e, void *arg)
+{
+    const struct mmio_emul_range_ctxt *ctxt = arg;
+
+    if ( ctxt->mfn > e )
+        return 0;
+
+    if ( ctxt->mfn >= s )
+    {
+        static DEFINE_SPINLOCK(last_lock);
+        static const struct domain *last_d;
+        static unsigned long last_s = ~0UL, last_e;
+        bool_t print = 0;
+
+        spin_lock(&last_lock);
+        if ( last_d != ctxt->d || last_s != s || last_e != e )
+        {
+            last_d = ctxt->d;
+            last_s = s;
+            last_e = e;
+            print = 1;
+        }
+        spin_unlock(&last_lock);
+
+        if ( print )
+            printk(XENLOG_G_INFO
+                   "d%d: Forcing write emulation on MFNs %lx-%lx\n",
+                   ctxt->d->domain_id, s, e);
+    }
+
+    return 1;
+}
+#endif
+
+int
+get_page_from_l1e(
+    l1_pgentry_t l1e, struct domain *l1e_owner, struct domain *pg_owner)
+{
+    unsigned long mfn = l1e_get_pfn(l1e);
+    struct page_info *page = mfn_to_page(mfn);
+    uint32_t l1f = l1e_get_flags(l1e);
+    struct vcpu *curr = current;
+    struct domain *real_pg_owner;
+    bool_t write;
+
+    if ( !(l1f & _PAGE_PRESENT) )
+        return 0;
+
+    if ( unlikely(l1f & l1_disallow_mask(l1e_owner)) )
+    {
+        gdprintk(XENLOG_WARNING, "Bad L1 flags %x\n",
+                 l1f & l1_disallow_mask(l1e_owner));
+        return -EINVAL;
+    }
+
+    if ( !mfn_valid(_mfn(mfn)) ||
+         (real_pg_owner = page_get_owner_and_reference(page)) == dom_io )
+    {
+        int flip = 0;
+
+        /* Only needed the reference to confirm dom_io ownership. */
+        if ( mfn_valid(_mfn(mfn)) )
+            put_page(page);
+
+        /* DOMID_IO reverts to caller for privilege checks. */
+        if ( pg_owner == dom_io )
+            pg_owner = curr->domain;
+
+        if ( !iomem_access_permitted(pg_owner, mfn, mfn) )
+        {
+            if ( mfn != (PADDR_MASK >> PAGE_SHIFT) ) /* INVALID_MFN? */
+            {
+                gdprintk(XENLOG_WARNING,
+                         "d%d non-privileged attempt to map MMIO space %"PRI_mfn"\n",
+                         pg_owner->domain_id, mfn);
+                return -EPERM;
+            }
+            return -EINVAL;
+        }
+
+        if ( pg_owner != l1e_owner &&
+             !iomem_access_permitted(l1e_owner, mfn, mfn) )
+        {
+            if ( mfn != (PADDR_MASK >> PAGE_SHIFT) ) /* INVALID_MFN? */
+            {
+                gdprintk(XENLOG_WARNING,
+                         "d%d attempted to map MMIO space %"PRI_mfn" in d%d to d%d\n",
+                         curr->domain->domain_id, mfn, pg_owner->domain_id,
+                         l1e_owner->domain_id);
+                return -EPERM;
+            }
+            return -EINVAL;
+        }
+
+        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+        {
+            /* MMIO pages must not be mapped cachable unless requested so. */
+            switch ( opt_mmio_relax )
+            {
+            case 0:
+                break;
+            case 1:
+                if ( !is_hardware_domain(l1e_owner) )
+                    break;
+                /* fallthrough */
+            case -1:
+                return 0;
+            default:
+                ASSERT_UNREACHABLE();
+            }
+        }
+        else if ( l1f & _PAGE_RW )
+        {
+#ifndef NDEBUG
+            const unsigned long *ro_map;
+            unsigned int seg, bdf;
+
+            if ( !pci_mmcfg_decode(mfn, &seg, &bdf) ||
+                 ((ro_map = pci_get_ro_map(seg)) != NULL &&
+                  test_bit(bdf, ro_map)) )
+                printk(XENLOG_G_WARNING
+                       "d%d: Forcing read-only access to MFN %lx\n",
+                       l1e_owner->domain_id, mfn);
+            else
+                rangeset_report_ranges(mmio_ro_ranges, 0, ~0UL,
+                                       print_mmio_emul_range,
+                                       &(struct mmio_emul_range_ctxt){
+                                           .d = l1e_owner,
+                                           .mfn = mfn });
+#endif
+            flip = _PAGE_RW;
+        }
+
+        switch ( l1f & PAGE_CACHE_ATTRS )
+        {
+        case 0: /* WB */
+            flip |= _PAGE_PWT | _PAGE_PCD;
+            break;
+        case _PAGE_PWT: /* WT */
+        case _PAGE_PWT | _PAGE_PAT: /* WP */
+            flip |= _PAGE_PCD | (l1f & _PAGE_PAT);
+            break;
+        }
+
+        return flip;
+    }
+
+    if ( unlikely( (real_pg_owner != pg_owner) &&
+                   (real_pg_owner != dom_cow) ) )
+    {
+        /*
+         * Let privileged domains transfer the right to map their target
+         * domain's pages. This is used to allow stub-domain pvfb export to
+         * dom0, until pvfb supports granted mappings. At that time this
+         * minor hack can go away.
+         */
+        if ( (real_pg_owner == NULL) || (pg_owner == l1e_owner) ||
+             xsm_priv_mapping(XSM_TARGET, pg_owner, real_pg_owner) )
+        {
+            gdprintk(XENLOG_WARNING,
+                     "pg_owner d%d l1e_owner d%d, but real_pg_owner d%d\n",
+                     pg_owner->domain_id, l1e_owner->domain_id,
+                     real_pg_owner ? real_pg_owner->domain_id : -1);
+            goto could_not_pin;
+        }
+        pg_owner = real_pg_owner;
+    }
+
+    /* Extra paranoid check for shared memory. Writable mappings
+     * disallowed (unshare first!) */
+    if ( (l1f & _PAGE_RW) && (real_pg_owner == dom_cow) )
+        goto could_not_pin;
+
+    /* Foreign mappings into guests in shadow external mode don't
+     * contribute to writeable mapping refcounts.  (This allows the
+     * qemu-dm helper process in dom0 to map the domain's memory without
+     * messing up the count of "real" writable mappings.) */
+    write = (l1f & _PAGE_RW) &&
+            ((l1e_owner == pg_owner) || !paging_mode_external(pg_owner));
+    if ( write && !get_page_type(page, PGT_writable_page) )
+    {
+        gdprintk(XENLOG_WARNING, "Could not get page type PGT_writable_page\n");
+        goto could_not_pin;
+    }
+
+    if ( pte_flags_to_cacheattr(l1f) !=
+         ((page->count_info & PGC_cacheattr_mask) >> PGC_cacheattr_base) )
+    {
+        unsigned long x, nx, y = page->count_info;
+        unsigned long cacheattr = pte_flags_to_cacheattr(l1f);
+        int err;
+
+        if ( is_xen_heap_page(page) )
+        {
+            if ( write )
+                put_page_type(page);
+            put_page(page);
+            gdprintk(XENLOG_WARNING,
+                     "Attempt to change cache attributes of Xen heap page\n");
+            return -EACCES;
+        }
+
+        do {
+            x  = y;
+            nx = (x & ~PGC_cacheattr_mask) | (cacheattr << PGC_cacheattr_base);
+        } while ( (y = cmpxchg(&page->count_info, x, nx)) != x );
+
+        err = update_xen_mappings(mfn, cacheattr);
+        if ( unlikely(err) )
+        {
+            cacheattr = y & PGC_cacheattr_mask;
+            do {
+                x  = y;
+                nx = (x & ~PGC_cacheattr_mask) | cacheattr;
+            } while ( (y = cmpxchg(&page->count_info, x, nx)) != x );
+
+            if ( write )
+                put_page_type(page);
+            put_page(page);
+
+            gdprintk(XENLOG_WARNING, "Error updating mappings for mfn %" PRI_mfn
+                     " (pfn %" PRI_pfn ", from L1 entry %" PRIpte ") for d%d\n",
+                     mfn, get_gpfn_from_mfn(mfn),
+                     l1e_get_intpte(l1e), l1e_owner->domain_id);
+            return err;
+        }
+    }
+
+    return 0;
+
+ could_not_pin:
+    gdprintk(XENLOG_WARNING, "Error getting mfn %" PRI_mfn " (pfn %" PRI_pfn
+             ") from L1 entry %" PRIpte " for l1e_owner d%d, pg_owner d%d",
+             mfn, get_gpfn_from_mfn(mfn),
+             l1e_get_intpte(l1e), l1e_owner->domain_id, pg_owner->domain_id);
+    if ( real_pg_owner != NULL )
+        put_page(page);
+    return -EBUSY;
+}
+
+
+/* NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'. */
+define_get_linear_pagetable(l2);
+static int
+get_page_from_l2e(
+    l2_pgentry_t l2e, unsigned long pfn, struct domain *d)
+{
+    unsigned long mfn = l2e_get_pfn(l2e);
+    int rc;
+
+    if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) )
+        return 1;
+
+    if ( unlikely((l2e_get_flags(l2e) & L2_DISALLOW_MASK)) )
+    {
+        gdprintk(XENLOG_WARNING, "Bad L2 flags %x\n",
+                 l2e_get_flags(l2e) & L2_DISALLOW_MASK);
+        return -EINVAL;
+    }
+
+    if ( !(l2e_get_flags(l2e) & _PAGE_PSE) )
+    {
+        rc = get_page_and_type_from_pagenr(mfn, PGT_l1_page_table, d, 0, 0);
+        if ( unlikely(rc == -EINVAL) && get_l2_linear_pagetable(l2e, pfn, d) )
+            rc = 0;
+        return rc;
+    }
+
+    if ( !opt_allow_superpage )
+    {
+        gdprintk(XENLOG_WARNING, "PV superpages disabled in hypervisor\n");
+        return -EINVAL;
+    }
+
+    if ( mfn & (L1_PAGETABLE_ENTRIES-1) )
+    {
+        gdprintk(XENLOG_WARNING,
+                 "Unaligned superpage map attempt mfn %" PRI_mfn "\n", mfn);
+        return -EINVAL;
+    }
+
+    return get_superpage(mfn, d);
+}
+
+
+define_get_linear_pagetable(l3);
+static int
+get_page_from_l3e(
+    l3_pgentry_t l3e, unsigned long pfn, struct domain *d, int partial)
+{
+    int rc;
+
+    if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) )
+        return 1;
+
+    if ( unlikely((l3e_get_flags(l3e) & l3_disallow_mask(d))) )
+    {
+        gdprintk(XENLOG_WARNING, "Bad L3 flags %x\n",
+                 l3e_get_flags(l3e) & l3_disallow_mask(d));
+        return -EINVAL;
+    }
+
+    rc = get_page_and_type_from_pagenr(
+        l3e_get_pfn(l3e), PGT_l2_page_table, d, partial, 1);
+    if ( unlikely(rc == -EINVAL) &&
+         !is_pv_32bit_domain(d) &&
+         get_l3_linear_pagetable(l3e, pfn, d) )
+        rc = 0;
+
+    return rc;
+}
+
+define_get_linear_pagetable(l4);
+static int
+get_page_from_l4e(
+    l4_pgentry_t l4e, unsigned long pfn, struct domain *d, int partial)
+{
+    int rc;
+
+    if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
+        return 1;
+
+    if ( unlikely((l4e_get_flags(l4e) & L4_DISALLOW_MASK)) )
+    {
+        gdprintk(XENLOG_WARNING, "Bad L4 flags %x\n",
+                 l4e_get_flags(l4e) & L4_DISALLOW_MASK);
+        return -EINVAL;
+    }
+
+    rc = get_page_and_type_from_pagenr(
+        l4e_get_pfn(l4e), PGT_l3_page_table, d, partial, 1);
+    if ( unlikely(rc == -EINVAL) && get_l4_linear_pagetable(l4e, pfn, d) )
+        rc = 0;
+
+    return rc;
+}
+
+#define adjust_guest_l1e(pl1e, d)                                            \
+    do {                                                                     \
+        if ( likely(l1e_get_flags((pl1e)) & _PAGE_PRESENT) &&                \
+             likely(!is_pv_32bit_domain(d)) )                                \
+        {                                                                    \
+            /* _PAGE_GUEST_KERNEL page cannot have the Global bit set. */    \
+            if ( (l1e_get_flags((pl1e)) & (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL)) \
+                 == (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL) )                      \
+                gdprintk(XENLOG_WARNING,                                     \
+                         "Global bit is set to kernel page %lx\n",           \
+                         l1e_get_pfn((pl1e)));                               \
+            if ( !(l1e_get_flags((pl1e)) & _PAGE_USER) )                     \
+                l1e_add_flags((pl1e), (_PAGE_GUEST_KERNEL|_PAGE_USER));      \
+            if ( !(l1e_get_flags((pl1e)) & _PAGE_GUEST_KERNEL) )             \
+                l1e_add_flags((pl1e), (_PAGE_GLOBAL|_PAGE_USER));            \
+        }                                                                    \
+    } while ( 0 )
+
+#define adjust_guest_l2e(pl2e, d)                               \
+    do {                                                        \
+        if ( likely(l2e_get_flags((pl2e)) & _PAGE_PRESENT) &&   \
+             likely(!is_pv_32bit_domain(d)) )                   \
+            l2e_add_flags((pl2e), _PAGE_USER);                  \
+    } while ( 0 )
+
+#define adjust_guest_l3e(pl3e, d)                                   \
+    do {                                                            \
+        if ( likely(l3e_get_flags((pl3e)) & _PAGE_PRESENT) )        \
+            l3e_add_flags((pl3e), likely(!is_pv_32bit_domain(d)) ?  \
+                                         _PAGE_USER :               \
+                                         _PAGE_USER|_PAGE_RW);      \
+    } while ( 0 )
+
+#define adjust_guest_l4e(pl4e, d)                               \
+    do {                                                        \
+        if ( likely(l4e_get_flags((pl4e)) & _PAGE_PRESENT) &&   \
+             likely(!is_pv_32bit_domain(d)) )                   \
+            l4e_add_flags((pl4e), _PAGE_USER);                  \
+    } while ( 0 )
+
+#define unadjust_guest_l3e(pl3e, d)                                         \
+    do {                                                                    \
+        if ( unlikely(is_pv_32bit_domain(d)) &&                             \
+             likely(l3e_get_flags((pl3e)) & _PAGE_PRESENT) )                \
+            l3e_remove_flags((pl3e), _PAGE_USER|_PAGE_RW|_PAGE_ACCESSED);   \
+    } while ( 0 )
+
+void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
+{
+    unsigned long     pfn = l1e_get_pfn(l1e);
+    struct page_info *page;
+    struct domain    *pg_owner;
+    struct vcpu      *v;
+
+    if ( !(l1e_get_flags(l1e) & _PAGE_PRESENT) || is_iomem_page(_mfn(pfn)) )
+        return;
+
+    page = mfn_to_page(pfn);
+    pg_owner = page_get_owner(page);
+
+    /*
+     * Check if this is a mapping that was established via a grant reference.
+     * If it was then we should not be here: we require that such mappings are
+     * explicitly destroyed via the grant-table interface.
+     *
+     * The upshot of this is that the guest can end up with active grants that
+     * it cannot destroy (because it no longer has a PTE to present to the
+     * grant-table interface). This can lead to subtle hard-to-catch bugs,
+     * hence a special grant PTE flag can be enabled to catch the bug early.
+     *
+     * (Note that the undestroyable active grants are not a security hole in
+     * Xen. All active grants can safely be cleaned up when the domain dies.)
+     */
+    if ( (l1e_get_flags(l1e) & _PAGE_GNTTAB) &&
+         !l1e_owner->is_shutting_down && !l1e_owner->is_dying )
+    {
+        gdprintk(XENLOG_WARNING,
+                 "Attempt to implicitly unmap a granted PTE %" PRIpte "\n",
+                 l1e_get_intpte(l1e));
+        domain_crash(l1e_owner);
+    }
+
+    /* Remember we didn't take a type-count of foreign writable mappings
+     * to paging-external domains */
+    if ( (l1e_get_flags(l1e) & _PAGE_RW) &&
+         ((l1e_owner == pg_owner) || !paging_mode_external(pg_owner)) )
+    {
+        put_page_and_type(page);
+    }
+    else
+    {
+        /* We expect this is rare so we blow the entire shadow LDT. */
+        if ( unlikely(((page->u.inuse.type_info & PGT_type_mask) ==
+                       PGT_seg_desc_page)) &&
+             unlikely(((page->u.inuse.type_info & PGT_count_mask) != 0)) &&
+             (l1e_owner == pg_owner) )
+        {
+            for_each_vcpu ( pg_owner, v )
+                invalidate_shadow_ldt(v, 1);
+        }
+        put_page(page);
+    }
+}
+
+static void put_superpage(unsigned long mfn);
+/*
+ * NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'.
+ * Note also that this automatically deals correctly with linear p.t.'s.
+ */
+static int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn)
+{
+    if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) || (l2e_get_pfn(l2e) == pfn) )
+        return 1;
+
+    if ( l2e_get_flags(l2e) & _PAGE_PSE )
+        put_superpage(l2e_get_pfn(l2e));
+    else
+        put_page_and_type(l2e_get_page(l2e));
+
+    return 0;
+}
+
+static void put_data_page(
+    struct page_info *page, int writeable)
+{
+    if ( writeable )
+        put_page_and_type(page);
+    else
+        put_page(page);
+}
+
+extern int __put_page_type(struct page_info *, int preemptible);
+
+static int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn,
+                             int partial, bool_t defer)
+{
+    struct page_info *pg;
+
+    if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) || (l3e_get_pfn(l3e) == pfn) )
+        return 1;
+
+    if ( unlikely(l3e_get_flags(l3e) & _PAGE_PSE) )
+    {
+        unsigned long mfn = l3e_get_pfn(l3e);
+        int writeable = l3e_get_flags(l3e) & _PAGE_RW;
+
+        ASSERT(!(mfn & ((1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1)));
+        do {
+            put_data_page(mfn_to_page(mfn), writeable);
+        } while ( ++mfn & ((1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1) );
+
+        return 0;
+    }
+
+    pg = l3e_get_page(l3e);
+
+    if ( unlikely(partial > 0) )
+    {
+        ASSERT(!defer);
+        return __put_page_type(pg, 1);
+    }
+
+    if ( defer )
+    {
+        current->arch.old_guest_table = pg;
+        return 0;
+    }
+
+    return put_page_and_type_preemptible(pg);
+}
+
+static int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn,
+                             int partial, bool_t defer)
+{
+    if ( (l4e_get_flags(l4e) & _PAGE_PRESENT) &&
+         (l4e_get_pfn(l4e) != pfn) )
+    {
+        struct page_info *pg = l4e_get_page(l4e);
+
+        if ( unlikely(partial > 0) )
+        {
+            ASSERT(!defer);
+            return __put_page_type(pg, 1);
+        }
+
+        if ( defer )
+        {
+            current->arch.old_guest_table = pg;
+            return 0;
+        }
+
+        return put_page_and_type_preemptible(pg);
+    }
+    return 1;
+}
+
+static int alloc_l1_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long  pfn = page_to_mfn(page);
+    l1_pgentry_t  *pl1e;
+    unsigned int   i;
+    int            ret = 0;
+
+    pl1e = map_domain_page(_mfn(pfn));
+
+    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
+    {
+        if ( is_guest_l1_slot(i) )
+            switch ( ret = get_page_from_l1e(pl1e[i], d, d) )
+            {
+            default:
+                goto fail;
+            case 0:
+                break;
+            case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+                ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+                l1e_flip_flags(pl1e[i], ret);
+                break;
+            }
+
+        adjust_guest_l1e(pl1e[i], d);
+    }
+
+    unmap_domain_page(pl1e);
+    return 0;
+
+ fail:
+    gdprintk(XENLOG_WARNING, "Failure in alloc_l1_table: slot %#x\n", i);
+    while ( i-- > 0 )
+        if ( is_guest_l1_slot(i) )
+            put_page_from_l1e(pl1e[i], d);
+
+    unmap_domain_page(pl1e);
+    return ret;
+}
+
+static int create_pae_xen_mappings(struct domain *d, l3_pgentry_t *pl3e)
+{
+    struct page_info *page;
+    l3_pgentry_t     l3e3;
+
+    if ( !is_pv_32bit_domain(d) )
+        return 1;
+
+    pl3e = (l3_pgentry_t *)((unsigned long)pl3e & PAGE_MASK);
+
+    /* 3rd L3 slot contains L2 with Xen-private mappings. It *must* exist. */
+    l3e3 = pl3e[3];
+    if ( !(l3e_get_flags(l3e3) & _PAGE_PRESENT) )
+    {
+        gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is empty\n");
+        return 0;
+    }
+
+    /*
+     * The Xen-private mappings include linear mappings. The L2 thus cannot
+     * be shared by multiple L3 tables. The test here is adequate because:
+     *  1. Cannot appear in slots != 3 because get_page_type() checks the
+     *     PGT_pae_xen_l2 flag, which is asserted iff the L2 appears in slot 3
+     *  2. Cannot appear in another page table's L3:
+     *     a. alloc_l3_table() calls this function and this check will fail
+     *     b. mod_l3_entry() disallows updates to slot 3 in an existing table
+     */
+    page = l3e_get_page(l3e3);
+    BUG_ON(page->u.inuse.type_info & PGT_pinned);
+    BUG_ON((page->u.inuse.type_info & PGT_count_mask) == 0);
+    BUG_ON(!(page->u.inuse.type_info & PGT_pae_xen_l2));
+    if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
+    {
+        gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is shared\n");
+        return 0;
+    }
+
+    return 1;
+}
+
+static int alloc_l2_table(struct page_info *page, unsigned long type,
+                          int preemptible)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long  pfn = page_to_mfn(page);
+    l2_pgentry_t  *pl2e;
+    unsigned int   i;
+    int            rc = 0;
+
+    pl2e = map_domain_page(_mfn(pfn));
+
+    for ( i = page->nr_validated_ptes; i < L2_PAGETABLE_ENTRIES; i++ )
+    {
+        if ( preemptible && i > page->nr_validated_ptes
+             && hypercall_preempt_check() )
+        {
+            page->nr_validated_ptes = i;
+            rc = -ERESTART;
+            break;
+        }
+
+        if ( !is_guest_l2_slot(d, type, i) ||
+             (rc = get_page_from_l2e(pl2e[i], pfn, d)) > 0 )
+            continue;
+
+        if ( rc < 0 )
+        {
+            gdprintk(XENLOG_WARNING, "Failure in alloc_l2_table: slot %#x\n", i);
+            while ( i-- > 0 )
+                if ( is_guest_l2_slot(d, type, i) )
+                    put_page_from_l2e(pl2e[i], pfn);
+            break;
+        }
+
+        adjust_guest_l2e(pl2e[i], d);
+    }
+
+    if ( rc >= 0 && (type & PGT_pae_xen_l2) )
+    {
+        /* Xen private mappings. */
+        memcpy(&pl2e[COMPAT_L2_PAGETABLE_FIRST_XEN_SLOT(d)],
+               &compat_idle_pg_table_l2[
+                   l2_table_offset(HIRO_COMPAT_MPT_VIRT_START)],
+               COMPAT_L2_PAGETABLE_XEN_SLOTS(d) * sizeof(*pl2e));
+    }
+
+    unmap_domain_page(pl2e);
+    return rc > 0 ? 0 : rc;
+}
+
+static int alloc_l3_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long  pfn = page_to_mfn(page);
+    l3_pgentry_t  *pl3e;
+    unsigned int   i;
+    int            rc = 0, partial = page->partial_pte;
+
+    pl3e = map_domain_page(_mfn(pfn));
+
+    /*
+     * PAE guests allocate full pages, but aren't required to initialize
+     * more than the first four entries; when running in compatibility
+     * mode, however, the full page is visible to the MMU, and hence all
+     * 512 entries must be valid/verified, which is most easily achieved
+     * by clearing them out.
+     */
+    if ( is_pv_32bit_domain(d) )
+        memset(pl3e + 4, 0, (L3_PAGETABLE_ENTRIES - 4) * sizeof(*pl3e));
+
+    for ( i = page->nr_validated_ptes; i < L3_PAGETABLE_ENTRIES;
+          i++, partial = 0 )
+    {
+        if ( is_pv_32bit_domain(d) && (i == 3) )
+        {
+            if ( !(l3e_get_flags(pl3e[i]) & _PAGE_PRESENT) ||
+                 (l3e_get_flags(pl3e[i]) & l3_disallow_mask(d)) )
+                rc = -EINVAL;
+            else
+                rc = get_page_and_type_from_pagenr(l3e_get_pfn(pl3e[i]),
+                                                   PGT_l2_page_table |
+                                                   PGT_pae_xen_l2,
+                                                   d, partial, 1);
+        }
+        else if ( !is_guest_l3_slot(i) ||
+                  (rc = get_page_from_l3e(pl3e[i], pfn, d, partial)) > 0 )
+            continue;
+
+        if ( rc == -ERESTART )
+        {
+            page->nr_validated_ptes = i;
+            page->partial_pte = partial ?: 1;
+        }
+        else if ( rc == -EINTR && i )
+        {
+            page->nr_validated_ptes = i;
+            page->partial_pte = 0;
+            rc = -ERESTART;
+        }
+        if ( rc < 0 )
+            break;
+
+        adjust_guest_l3e(pl3e[i], d);
+    }
+
+    if ( rc >= 0 && !create_pae_xen_mappings(d, pl3e) )
+        rc = -EINVAL;
+    if ( rc < 0 && rc != -ERESTART && rc != -EINTR )
+    {
+        gdprintk(XENLOG_WARNING, "Failure in alloc_l3_table: slot %#x\n", i);
+        if ( i )
+        {
+            page->nr_validated_ptes = i;
+            page->partial_pte = 0;
+            current->arch.old_guest_table = page;
+        }
+        while ( i-- > 0 )
+        {
+            if ( !is_guest_l3_slot(i) )
+                continue;
+            unadjust_guest_l3e(pl3e[i], d);
+        }
+    }
+
+    unmap_domain_page(pl3e);
+    return rc > 0 ? 0 : rc;
+}
+
+#ifndef NDEBUG
+static unsigned int __read_mostly root_pgt_pv_xen_slots
+    = ROOT_PAGETABLE_PV_XEN_SLOTS;
+static l4_pgentry_t __read_mostly split_l4e;
+#else
+#define root_pgt_pv_xen_slots ROOT_PAGETABLE_PV_XEN_SLOTS
+#endif
+
+void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d,
+                         bool_t zap_ro_mpt)
+{
+    /* Xen private mappings. */
+    memcpy(&l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT],
+           &idle_pg_table[ROOT_PAGETABLE_FIRST_XEN_SLOT],
+           root_pgt_pv_xen_slots * sizeof(l4_pgentry_t));
+#ifndef NDEBUG
+    if ( l4e_get_intpte(split_l4e) )
+        l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT + root_pgt_pv_xen_slots] =
+            split_l4e;
+#endif
+    l4tab[l4_table_offset(LINEAR_PT_VIRT_START)] =
+        l4e_from_pfn(domain_page_map_to_mfn(l4tab), __PAGE_HYPERVISOR);
+    l4tab[l4_table_offset(PERDOMAIN_VIRT_START)] =
+        l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR);
+    if ( zap_ro_mpt || is_pv_32bit_domain(d) || paging_mode_refcounts(d) )
+        l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
+}
+
+static int alloc_l4_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long  pfn = page_to_mfn(page);
+    l4_pgentry_t  *pl4e = map_domain_page(_mfn(pfn));
+    unsigned int   i;
+    int            rc = 0, partial = page->partial_pte;
+
+    for ( i = page->nr_validated_ptes; i < L4_PAGETABLE_ENTRIES;
+          i++, partial = 0 )
+    {
+        if ( !is_guest_l4_slot(d, i) ||
+             (rc = get_page_from_l4e(pl4e[i], pfn, d, partial)) > 0 )
+            continue;
+
+        if ( rc == -ERESTART )
+        {
+            page->nr_validated_ptes = i;
+            page->partial_pte = partial ?: 1;
+        }
+        else if ( rc < 0 )
+        {
+            if ( rc != -EINTR )
+                gdprintk(XENLOG_WARNING,
+                         "Failure in alloc_l4_table: slot %#x\n", i);
+            if ( i )
+            {
+                page->nr_validated_ptes = i;
+                page->partial_pte = 0;
+                if ( rc == -EINTR )
+                    rc = -ERESTART;
+                else
+                {
+                    if ( current->arch.old_guest_table )
+                        page->nr_validated_ptes++;
+                    current->arch.old_guest_table = page;
+                }
+            }
+        }
+        if ( rc < 0 )
+        {
+            unmap_domain_page(pl4e);
+            return rc;
+        }
+
+        adjust_guest_l4e(pl4e[i], d);
+    }
+
+    if ( rc >= 0 )
+    {
+        init_guest_l4_table(pl4e, d, !VM_ASSIST(d, m2p_strict));
+        atomic_inc(&d->arch.pv_domain.nr_l4_pages);
+        rc = 0;
+    }
+    unmap_domain_page(pl4e);
+
+    return rc;
+}
+
+static void free_l1_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long pfn = page_to_mfn(page);
+    l1_pgentry_t *pl1e;
+    unsigned int  i;
+
+    pl1e = map_domain_page(_mfn(pfn));
+
+    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
+        if ( is_guest_l1_slot(i) )
+            put_page_from_l1e(pl1e[i], d);
+
+    unmap_domain_page(pl1e);
+}
+
+static int free_l2_table(struct page_info *page, int preemptible)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long pfn = page_to_mfn(page);
+    l2_pgentry_t *pl2e;
+    unsigned int  i = page->nr_validated_ptes - 1;
+    int err = 0;
+
+    pl2e = map_domain_page(_mfn(pfn));
+
+    ASSERT(page->nr_validated_ptes);
+    do {
+        if ( is_guest_l2_slot(d, page->u.inuse.type_info, i) &&
+             put_page_from_l2e(pl2e[i], pfn) == 0 &&
+             preemptible && i && hypercall_preempt_check() )
+        {
+           page->nr_validated_ptes = i;
+           err = -ERESTART;
+        }
+    } while ( !err && i-- );
+
+    unmap_domain_page(pl2e);
+
+    if ( !err )
+        page->u.inuse.type_info &= ~PGT_pae_xen_l2;
+
+    return err;
+}
+
+static int free_l3_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long pfn = page_to_mfn(page);
+    l3_pgentry_t *pl3e;
+    int rc = 0, partial = page->partial_pte;
+    unsigned int  i = page->nr_validated_ptes - !partial;
+
+    pl3e = map_domain_page(_mfn(pfn));
+
+    do {
+        if ( is_guest_l3_slot(i) )
+        {
+            rc = put_page_from_l3e(pl3e[i], pfn, partial, 0);
+            if ( rc < 0 )
+                break;
+            partial = 0;
+            if ( rc > 0 )
+                continue;
+            unadjust_guest_l3e(pl3e[i], d);
+        }
+    } while ( i-- );
+
+    unmap_domain_page(pl3e);
+
+    if ( rc == -ERESTART )
+    {
+        page->nr_validated_ptes = i;
+        page->partial_pte = partial ?: -1;
+    }
+    else if ( rc == -EINTR && i < L3_PAGETABLE_ENTRIES - 1 )
+    {
+        page->nr_validated_ptes = i + 1;
+        page->partial_pte = 0;
+        rc = -ERESTART;
+    }
+    return rc > 0 ? 0 : rc;
+}
+
+static int free_l4_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long pfn = page_to_mfn(page);
+    l4_pgentry_t *pl4e = map_domain_page(_mfn(pfn));
+    int rc = 0, partial = page->partial_pte;
+    unsigned int  i = page->nr_validated_ptes - !partial;
+
+    do {
+        if ( is_guest_l4_slot(d, i) )
+            rc = put_page_from_l4e(pl4e[i], pfn, partial, 0);
+        if ( rc < 0 )
+            break;
+        partial = 0;
+    } while ( i-- );
+
+    if ( rc == -ERESTART )
+    {
+        page->nr_validated_ptes = i;
+        page->partial_pte = partial ?: -1;
+    }
+    else if ( rc == -EINTR && i < L4_PAGETABLE_ENTRIES - 1 )
+    {
+        page->nr_validated_ptes = i + 1;
+        page->partial_pte = 0;
+        rc = -ERESTART;
+    }
+
+    unmap_domain_page(pl4e);
+
+    if ( rc >= 0 )
+    {
+        atomic_dec(&d->arch.pv_domain.nr_l4_pages);
+        rc = 0;
+    }
+
+    return rc;
+}
+
+
+/* How to write an entry to the guest pagetables.
+ * Returns 0 for failure (pointer not valid), 1 for success. */
+static inline int update_intpte(intpte_t *p,
+                                intpte_t old,
+                                intpte_t new,
+                                unsigned long mfn,
+                                struct vcpu *v,
+                                int preserve_ad)
+{
+    int rv = 1;
+#ifndef PTE_UPDATE_WITH_CMPXCHG
+    if ( !preserve_ad )
+    {
+        rv = paging_write_guest_entry(v, p, new, _mfn(mfn));
+    }
+    else
+#endif
+    {
+        intpte_t t = old;
+        for ( ; ; )
+        {
+            intpte_t _new = new;
+            if ( preserve_ad )
+                _new |= old & (_PAGE_ACCESSED | _PAGE_DIRTY);
+
+            rv = paging_cmpxchg_guest_entry(v, p, &t, _new, _mfn(mfn));
+            if ( unlikely(rv == 0) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Failed to update %" PRIpte " -> %" PRIpte
+                         ": saw %" PRIpte "\n", old, _new, t);
+                break;
+            }
+
+            if ( t == old )
+                break;
+
+            /* Allowed to change in Accessed/Dirty flags only. */
+            BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY));
+
+            old = t;
+        }
+    }
+    return rv;
+}
+
+/* Macro that wraps the appropriate type-changes around update_intpte().
+ * Arguments are: type, ptr, old, new, mfn, vcpu */
+#define UPDATE_ENTRY(_t,_p,_o,_n,_m,_v,_ad)                         \
+    update_intpte(&_t ## e_get_intpte(*(_p)),                       \
+                  _t ## e_get_intpte(_o), _t ## e_get_intpte(_n),   \
+                  (_m), (_v), (_ad))
+
+/*
+ * PTE flags that a guest may change without re-validating the PTE.
+ * All other bits affect translation, caching, or Xen's safety.
+ */
+#define FASTPATH_FLAG_WHITELIST                                     \
+    (_PAGE_NX_BIT | _PAGE_AVAIL_HIGH | _PAGE_AVAIL | _PAGE_GLOBAL | \
+     _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_USER)
+
+/* Update the L1 entry at pl1e to new value nl1e. */
+static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t nl1e,
+                        unsigned long gl1mfn, int preserve_ad,
+                        struct vcpu *pt_vcpu, struct domain *pg_dom)
+{
+    l1_pgentry_t ol1e;
+    struct domain *pt_dom = pt_vcpu->domain;
+    int rc = 0;
+
+    if ( unlikely(__copy_from_user(&ol1e, pl1e, sizeof(ol1e)) != 0) )
+        return -EFAULT;
+
+    if ( unlikely(paging_mode_refcounts(pt_dom)) )
+    {
+        if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, preserve_ad) )
+            return 0;
+        return -EBUSY;
+    }
+
+    if ( l1e_get_flags(nl1e) & _PAGE_PRESENT )
+    {
+        /* Translate foreign guest addresses. */
+        struct page_info *page = NULL;
+
+        if ( unlikely(l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)) )
+        {
+            gdprintk(XENLOG_WARNING, "Bad L1 flags %x\n",
+                    l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom));
+            return -EINVAL;
+        }
+
+        if ( paging_mode_translate(pg_dom) )
+        {
+            page = get_page_from_gfn(pg_dom, l1e_get_pfn(nl1e), NULL, P2M_ALLOC);
+            if ( !page )
+                return -EINVAL;
+            nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(nl1e));
+        }
+
+        /* Fast path for sufficiently-similar mappings. */
+        if ( !l1e_has_changed(ol1e, nl1e, ~FASTPATH_FLAG_WHITELIST) )
+        {
+            adjust_guest_l1e(nl1e, pt_dom);
+            rc = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
+                              preserve_ad);
+            if ( page )
+                put_page(page);
+            return rc ? 0 : -EBUSY;
+        }
+
+        switch ( rc = get_page_from_l1e(nl1e, pt_dom, pg_dom) )
+        {
+        default:
+            if ( page )
+                put_page(page);
+            return rc;
+        case 0:
+            break;
+        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+            ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+            l1e_flip_flags(nl1e, rc);
+            rc = 0;
+            break;
+        }
+        if ( page )
+            put_page(page);
+
+        adjust_guest_l1e(nl1e, pt_dom);
+        if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
+                                    preserve_ad)) )
+        {
+            ol1e = nl1e;
+            rc = -EBUSY;
+        }
+    }
+    else if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
+                                     preserve_ad)) )
+    {
+        return -EBUSY;
+    }
+
+    put_page_from_l1e(ol1e, pt_dom);
+    return rc;
+}
+
+
+/* Update the L2 entry at pl2e to new value nl2e. pl2e is within frame pfn. */
+static int mod_l2_entry(l2_pgentry_t *pl2e,
+                        l2_pgentry_t nl2e,
+                        unsigned long pfn,
+                        int preserve_ad,
+                        struct vcpu *vcpu)
+{
+    l2_pgentry_t ol2e;
+    struct domain *d = vcpu->domain;
+    struct page_info *l2pg = mfn_to_page(pfn);
+    unsigned long type = l2pg->u.inuse.type_info;
+    int rc = 0;
+
+    if ( unlikely(!is_guest_l2_slot(d, type, pgentry_ptr_to_slot(pl2e))) )
+    {
+        gdprintk(XENLOG_WARNING, "L2 update in Xen-private area, slot %#lx\n",
+                 pgentry_ptr_to_slot(pl2e));
+        return -EPERM;
+    }
+
+    if ( unlikely(__copy_from_user(&ol2e, pl2e, sizeof(ol2e)) != 0) )
+        return -EFAULT;
+
+    if ( l2e_get_flags(nl2e) & _PAGE_PRESENT )
+    {
+        if ( unlikely(l2e_get_flags(nl2e) & L2_DISALLOW_MASK) )
+        {
+            gdprintk(XENLOG_WARNING, "Bad L2 flags %x\n",
+                    l2e_get_flags(nl2e) & L2_DISALLOW_MASK);
+            return -EINVAL;
+        }
+
+        /* Fast path for sufficiently-similar mappings. */
+        if ( !l2e_has_changed(ol2e, nl2e, ~FASTPATH_FLAG_WHITELIST) )
+        {
+            adjust_guest_l2e(nl2e, d);
+            if ( UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, preserve_ad) )
+                return 0;
+            return -EBUSY;
+        }
+
+        if ( unlikely((rc = get_page_from_l2e(nl2e, pfn, d)) < 0) )
+            return rc;
+
+        adjust_guest_l2e(nl2e, d);
+        if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
+                                    preserve_ad)) )
+        {
+            ol2e = nl2e;
+            rc = -EBUSY;
+        }
+    }
+    else if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
+                                     preserve_ad)) )
+    {
+        return -EBUSY;
+    }
+
+    put_page_from_l2e(ol2e, pfn);
+    return rc;
+}
+
+/* Update the L3 entry at pl3e to new value nl3e. pl3e is within frame pfn. */
+static int mod_l3_entry(l3_pgentry_t *pl3e,
+                        l3_pgentry_t nl3e,
+                        unsigned long pfn,
+                        int preserve_ad,
+                        struct vcpu *vcpu)
+{
+    l3_pgentry_t ol3e;
+    struct domain *d = vcpu->domain;
+    int rc = 0;
+
+    if ( unlikely(!is_guest_l3_slot(pgentry_ptr_to_slot(pl3e))) )
+    {
+        gdprintk(XENLOG_WARNING, "L3 update in Xen-private area, slot %#lx\n",
+                 pgentry_ptr_to_slot(pl3e));
+        return -EINVAL;
+    }
+
+    /*
+     * Disallow updates to final L3 slot. It contains Xen mappings, and it
+     * would be a pain to ensure they remain continuously valid throughout.
+     */
+    if ( is_pv_32bit_domain(d) && (pgentry_ptr_to_slot(pl3e) >= 3) )
+        return -EINVAL;
+
+    if ( unlikely(__copy_from_user(&ol3e, pl3e, sizeof(ol3e)) != 0) )
+        return -EFAULT;
+
+    if ( l3e_get_flags(nl3e) & _PAGE_PRESENT )
+    {
+        if ( unlikely(l3e_get_flags(nl3e) & l3_disallow_mask(d)) )
+        {
+            gdprintk(XENLOG_WARNING, "Bad L3 flags %x\n",
+                    l3e_get_flags(nl3e) & l3_disallow_mask(d));
+            return -EINVAL;
+        }
+
+        /* Fast path for sufficiently-similar mappings. */
+        if ( !l3e_has_changed(ol3e, nl3e, ~FASTPATH_FLAG_WHITELIST) )
+        {
+            adjust_guest_l3e(nl3e, d);
+            rc = UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, preserve_ad);
+            return rc ? 0 : -EFAULT;
+        }
+
+        rc = get_page_from_l3e(nl3e, pfn, d, 0);
+        if ( unlikely(rc < 0) )
+            return rc;
+        rc = 0;
+
+        adjust_guest_l3e(nl3e, d);
+        if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
+                                    preserve_ad)) )
+        {
+            ol3e = nl3e;
+            rc = -EFAULT;
+        }
+    }
+    else if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
+                                     preserve_ad)) )
+    {
+        return -EFAULT;
+    }
+
+    if ( likely(rc == 0) )
+        if ( !create_pae_xen_mappings(d, pl3e) )
+            BUG();
+
+    put_page_from_l3e(ol3e, pfn, 0, 1);
+    return rc;
+}
+
+/* Update the L4 entry at pl4e to new value nl4e. pl4e is within frame pfn. */
+static int mod_l4_entry(l4_pgentry_t *pl4e,
+                        l4_pgentry_t nl4e,
+                        unsigned long pfn,
+                        int preserve_ad,
+                        struct vcpu *vcpu)
+{
+    struct domain *d = vcpu->domain;
+    l4_pgentry_t ol4e;
+    int rc = 0;
+
+    if ( unlikely(!is_guest_l4_slot(d, pgentry_ptr_to_slot(pl4e))) )
+    {
+        gdprintk(XENLOG_WARNING, "L4 update in Xen-private area, slot %#lx\n",
+                 pgentry_ptr_to_slot(pl4e));
+        return -EINVAL;
+    }
+
+    if ( unlikely(__copy_from_user(&ol4e, pl4e, sizeof(ol4e)) != 0) )
+        return -EFAULT;
+
+    if ( l4e_get_flags(nl4e) & _PAGE_PRESENT )
+    {
+        if ( unlikely(l4e_get_flags(nl4e) & L4_DISALLOW_MASK) )
+        {
+            gdprintk(XENLOG_WARNING, "Bad L4 flags %x\n",
+                    l4e_get_flags(nl4e) & L4_DISALLOW_MASK);
+            return -EINVAL;
+        }
+
+        /* Fast path for sufficiently-similar mappings. */
+        if ( !l4e_has_changed(ol4e, nl4e, ~FASTPATH_FLAG_WHITELIST) )
+        {
+            adjust_guest_l4e(nl4e, d);
+            rc = UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, preserve_ad);
+            return rc ? 0 : -EFAULT;
+        }
+
+        rc = get_page_from_l4e(nl4e, pfn, d, 0);
+        if ( unlikely(rc < 0) )
+            return rc;
+        rc = 0;
+
+        adjust_guest_l4e(nl4e, d);
+        if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
+                                    preserve_ad)) )
+        {
+            ol4e = nl4e;
+            rc = -EFAULT;
+        }
+    }
+    else if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
+                                     preserve_ad)) )
+    {
+        return -EFAULT;
+    }
+
+    put_page_from_l4e(ol4e, pfn, 0, 1);
+    return rc;
+}
+
+
+int alloc_page_type(struct page_info *page, unsigned long type,
+                    int preemptible)
+{
+    struct domain *owner = page_get_owner(page);
+    int rc;
+
+    /* A page table is dirtied when its type count becomes non-zero. */
+    if ( likely(owner != NULL) )
+        paging_mark_dirty(owner, _mfn(page_to_mfn(page)));
+
+    switch ( type & PGT_type_mask )
+    {
+    case PGT_l1_page_table:
+        rc = alloc_l1_table(page);
+        break;
+    case PGT_l2_page_table:
+        rc = alloc_l2_table(page, type, preemptible);
+        break;
+    case PGT_l3_page_table:
+        ASSERT(preemptible);
+        rc = alloc_l3_table(page);
+        break;
+    case PGT_l4_page_table:
+        ASSERT(preemptible);
+        rc = alloc_l4_table(page);
+        break;
+    case PGT_seg_desc_page:
+        rc = alloc_segdesc_page(page);
+        break;
+    default:
+        printk("Bad type in alloc_page_type %lx t=%" PRtype_info " c=%lx\n",
+               type, page->u.inuse.type_info,
+               page->count_info);
+        rc = -EINVAL;
+        BUG();
+    }
+
+    /* No need for atomic update of type_info here: noone else updates it. */
+    wmb();
+    switch ( rc )
+    {
+    case 0:
+        page->u.inuse.type_info |= PGT_validated;
+        break;
+    case -EINTR:
+        ASSERT((page->u.inuse.type_info &
+                (PGT_count_mask|PGT_validated|PGT_partial)) == 1);
+        page->u.inuse.type_info &= ~PGT_count_mask;
+        break;
+    default:
+        ASSERT(rc < 0);
+        gdprintk(XENLOG_WARNING, "Error while validating mfn %" PRI_mfn
+                 " (pfn %" PRI_pfn ") for type %" PRtype_info
+                 ": caf=%08lx taf=%" PRtype_info "\n",
+                 page_to_mfn(page), get_gpfn_from_mfn(page_to_mfn(page)),
+                 type, page->count_info, page->u.inuse.type_info);
+        if ( page != current->arch.old_guest_table )
+            page->u.inuse.type_info = 0;
+        else
+        {
+            ASSERT((page->u.inuse.type_info &
+                    (PGT_count_mask | PGT_validated)) == 1);
+    case -ERESTART:
+            get_page_light(page);
+            page->u.inuse.type_info |= PGT_partial;
+        }
+        break;
+    }
+
+    return rc;
+}
+
+int free_page_type(struct page_info *page, unsigned long type,
+                   int preemptible)
+{
+    struct domain *owner = page_get_owner(page);
+    unsigned long gmfn;
+    int rc;
+
+    if ( likely(owner != NULL) && unlikely(paging_mode_enabled(owner)) )
+    {
+        /* A page table is dirtied when its type count becomes zero. */
+        paging_mark_dirty(owner, _mfn(page_to_mfn(page)));
+
+        if ( shadow_mode_refcounts(owner) )
+            return 0;
+
+        gmfn = mfn_to_gmfn(owner, page_to_mfn(page));
+        ASSERT(VALID_M2P(gmfn));
+        /* Page sharing not supported for shadowed domains */
+        if(!SHARED_M2P(gmfn))
+            shadow_remove_all_shadows(owner, _mfn(gmfn));
+    }
+
+    if ( !(type & PGT_partial) )
+    {
+        page->nr_validated_ptes = 1U << PAGETABLE_ORDER;
+        page->partial_pte = 0;
+    }
+
+    switch ( type & PGT_type_mask )
+    {
+    case PGT_l1_page_table:
+        free_l1_table(page);
+        rc = 0;
+        break;
+    case PGT_l2_page_table:
+        rc = free_l2_table(page, preemptible);
+        break;
+    case PGT_l3_page_table:
+        ASSERT(preemptible);
+        rc = free_l3_table(page);
+        break;
+    case PGT_l4_page_table:
+        ASSERT(preemptible);
+        rc = free_l4_table(page);
+        break;
+    default:
+        gdprintk(XENLOG_WARNING, "type %" PRtype_info " mfn %" PRI_mfn "\n",
+                 type, page_to_mfn(page));
+        rc = -EINVAL;
+        BUG();
+    }
+
+    return rc;
+}
+
+static int get_spage_pages(struct page_info *page, struct domain *d)
+{
+    int i;
+
+    for (i = 0; i < (1<<PAGETABLE_ORDER); i++, page++)
+    {
+        if (!get_page_and_type(page, d, PGT_writable_page))
+        {
+            while (--i >= 0)
+                put_page_and_type(--page);
+            return 0;
+        }
+    }
+    return 1;
+}
+
+static void put_spage_pages(struct page_info *page)
+{
+    int i;
+
+    for (i = 0; i < (1<<PAGETABLE_ORDER); i++, page++)
+    {
+        put_page_and_type(page);
+    }
+    return;
+}
+
+static int mark_superpage(struct spage_info *spage, struct domain *d)
+{
+    unsigned long x, nx, y = spage->type_info;
+    int pages_done = 0;
+
+    ASSERT(opt_allow_superpage);
+
+    do {
+        x = y;
+        nx = x + 1;
+        if ( (x & SGT_type_mask) == SGT_mark )
+        {
+            gdprintk(XENLOG_WARNING,
+                     "Duplicate superpage mark attempt mfn %" PRI_mfn "\n",
+                     spage_to_mfn(spage));
+            if ( pages_done )
+                put_spage_pages(spage_to_page(spage));
+            return -EINVAL;
+        }
+        if ( (x & SGT_type_mask) == SGT_dynamic )
+        {
+            if ( pages_done )
+            {
+                put_spage_pages(spage_to_page(spage));
+                pages_done = 0;
+            }
+        }
+        else if ( !pages_done )
+        {
+            if ( !get_spage_pages(spage_to_page(spage), d) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Superpage type conflict in mark attempt mfn %" PRI_mfn "\n",
+                         spage_to_mfn(spage));
+                return -EINVAL;
+            }
+            pages_done = 1;
+        }
+        nx = (nx & ~SGT_type_mask) | SGT_mark;
+
+    } while ( (y = cmpxchg(&spage->type_info, x, nx)) != x );
+
+    return 0;
+}
+
+static int unmark_superpage(struct spage_info *spage)
+{
+    unsigned long x, nx, y = spage->type_info;
+    unsigned long do_pages = 0;
+
+    ASSERT(opt_allow_superpage);
+
+    do {
+        x = y;
+        nx = x - 1;
+        if ( (x & SGT_type_mask) != SGT_mark )
+        {
+            gdprintk(XENLOG_WARNING,
+                     "Attempt to unmark unmarked superpage mfn %" PRI_mfn "\n",
+                     spage_to_mfn(spage));
+            return -EINVAL;
+        }
+        if ( (nx & SGT_count_mask) == 0 )
+        {
+            nx = (nx & ~SGT_type_mask) | SGT_none;
+            do_pages = 1;
+        }
+        else
+        {
+            nx = (nx & ~SGT_type_mask) | SGT_dynamic;
+        }
+    } while ( (y = cmpxchg(&spage->type_info, x, nx)) != x );
+
+    if ( do_pages )
+        put_spage_pages(spage_to_page(spage));
+
+    return 0;
+}
+
+void clear_superpage_mark(struct page_info *page)
+{
+    struct spage_info *spage;
+
+    if ( !opt_allow_superpage )
+        return;
+
+    spage = page_to_spage(page);
+    if ((spage->type_info & SGT_type_mask) == SGT_mark)
+        unmark_superpage(spage);
+
+}
+
+int get_superpage(unsigned long mfn, struct domain *d)
+{
+    struct spage_info *spage;
+    unsigned long x, nx, y;
+    int pages_done = 0;
+
+    ASSERT(opt_allow_superpage);
+
+    if ( !mfn_valid(_mfn(mfn | (L1_PAGETABLE_ENTRIES - 1))) )
+        return -EINVAL;
+
+    spage = mfn_to_spage(mfn);
+    y = spage->type_info;
+    do {
+        x = y;
+        nx = x + 1;
+        if ( (x & SGT_type_mask) != SGT_none )
+        {
+            if ( pages_done )
+            {
+                put_spage_pages(spage_to_page(spage));
+                pages_done = 0;
+            }
+        }
+        else
+        {
+            if ( !get_spage_pages(spage_to_page(spage), d) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Type conflict on superpage mapping mfn %" PRI_mfn "\n",
+                         spage_to_mfn(spage));
+                return -EINVAL;
+            }
+            pages_done = 1;
+            nx = (nx & ~SGT_type_mask) | SGT_dynamic;
+        }
+    } while ( (y = cmpxchg(&spage->type_info, x, nx)) != x );
+
+    return 0;
+}
+
+static void put_superpage(unsigned long mfn)
+{
+    struct spage_info *spage;
+    unsigned long x, nx, y;
+    unsigned long do_pages = 0;
+
+    if ( !opt_allow_superpage )
+    {
+        put_spage_pages(mfn_to_page(mfn));
+        return;
+    }
+
+    spage = mfn_to_spage(mfn);
+    y = spage->type_info;
+    do {
+        x = y;
+        nx = x - 1;
+        if ((x & SGT_type_mask) == SGT_dynamic)
+        {
+            if ((nx & SGT_count_mask) == 0)
+            {
+                nx = (nx & ~SGT_type_mask) | SGT_none;
+                do_pages = 1;
+            }
+        }
+
+    } while ((y = cmpxchg(&spage->type_info, x, nx)) != x);
+
+    if (do_pages)
+        put_spage_pages(spage_to_page(spage));
+
+    return;
+}
+
+int put_old_guest_table(struct vcpu *v)
+{
+    int rc;
+
+    if ( !v->arch.old_guest_table )
+        return 0;
+
+    switch ( rc = put_page_and_type_preemptible(v->arch.old_guest_table) )
+    {
+    case -EINTR:
+    case -ERESTART:
+        return -ERESTART;
+    }
+
+    v->arch.old_guest_table = NULL;
+
+    return rc;
+}
+
+int new_guest_cr3(unsigned long mfn)
+{
+    struct vcpu *curr = current;
+    struct domain *d = curr->domain;
+    int rc;
+    unsigned long old_base_mfn;
+
+    if ( is_pv_32bit_domain(d) )
+    {
+        unsigned long gt_mfn = pagetable_get_pfn(curr->arch.guest_table);
+        l4_pgentry_t *pl4e = map_domain_page(_mfn(gt_mfn));
+
+        rc = paging_mode_refcounts(d)
+             ? -EINVAL /* Old code was broken, but what should it be? */
+             : mod_l4_entry(
+                    pl4e,
+                    l4e_from_pfn(
+                        mfn,
+                        (_PAGE_PRESENT|_PAGE_RW|_PAGE_USER|_PAGE_ACCESSED)),
+                    gt_mfn, 0, curr);
+        unmap_domain_page(pl4e);
+        switch ( rc )
+        {
+        case 0:
+            break;
+        case -EINTR:
+        case -ERESTART:
+            return -ERESTART;
+        default:
+            gdprintk(XENLOG_WARNING,
+                     "Error while installing new compat baseptr %" PRI_mfn "\n",
+                     mfn);
+            return rc;
+        }
+
+        invalidate_shadow_ldt(curr, 0);
+        write_ptbase(curr);
+
+        return 0;
+    }
+
+    rc = put_old_guest_table(curr);
+    if ( unlikely(rc) )
+        return rc;
+
+    old_base_mfn = pagetable_get_pfn(curr->arch.guest_table);
+    /*
+     * This is particularly important when getting restarted after the
+     * previous attempt got preempted in the put-old-MFN phase.
+     */
+    if ( old_base_mfn == mfn )
+    {
+        write_ptbase(curr);
+        return 0;
+    }
+
+    rc = paging_mode_refcounts(d)
+         ? (get_page_from_pagenr(mfn, d) ? 0 : -EINVAL)
+         : get_page_and_type_from_pagenr(mfn, PGT_root_page_table, d, 0, 1);
+    switch ( rc )
+    {
+    case 0:
+        break;
+    case -EINTR:
+    case -ERESTART:
+        return -ERESTART;
+    default:
+        gdprintk(XENLOG_WARNING,
+                 "Error while installing new baseptr %" PRI_mfn "\n", mfn);
+        return rc;
+    }
+
+    invalidate_shadow_ldt(curr, 0);
+
+    if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
+        fill_ro_mpt(mfn);
+    curr->arch.guest_table = pagetable_from_pfn(mfn);
+    update_cr3(curr);
+
+    write_ptbase(curr);
+
+    if ( likely(old_base_mfn != 0) )
+    {
+        struct page_info *page = mfn_to_page(old_base_mfn);
+
+        if ( paging_mode_refcounts(d) )
+            put_page(page);
+        else
+            switch ( rc = put_page_and_type_preemptible(page) )
+            {
+            case -EINTR:
+                rc = -ERESTART;
+                /* fallthrough */
+            case -ERESTART:
+                curr->arch.old_guest_table = page;
+                break;
+            default:
+                BUG_ON(rc);
+                break;
+            }
+    }
+
+    return rc;
+}
+
+static struct domain *get_pg_owner(domid_t domid)
+{
+    struct domain *pg_owner = NULL, *curr = current->domain;
+
+    if ( likely(domid == DOMID_SELF) )
+    {
+        pg_owner = rcu_lock_current_domain();
+        goto out;
+    }
+
+    if ( unlikely(domid == curr->domain_id) )
+    {
+        gdprintk(XENLOG_WARNING, "Cannot specify itself as foreign domain\n");
+        goto out;
+    }
+
+    if ( !is_hvm_domain(curr) && unlikely(paging_mode_translate(curr)) )
+    {
+        gdprintk(XENLOG_WARNING,
+                 "Cannot mix foreign mappings with translated domains\n");
+        goto out;
+    }
+
+    switch ( domid )
+    {
+    case DOMID_IO:
+        pg_owner = rcu_lock_domain(dom_io);
+        break;
+    case DOMID_XEN:
+        pg_owner = rcu_lock_domain(dom_xen);
+        break;
+    default:
+        if ( (pg_owner = rcu_lock_domain_by_id(domid)) == NULL )
+        {
+            gdprintk(XENLOG_WARNING, "Unknown domain d%d\n", domid);
+            break;
+        }
+        break;
+    }
+
+ out:
+    return pg_owner;
+}
+
+static void put_pg_owner(struct domain *pg_owner)
+{
+    rcu_unlock_domain(pg_owner);
+}
+
+static inline int vcpumask_to_pcpumask(
+    struct domain *d, XEN_GUEST_HANDLE_PARAM(const_void) bmap, cpumask_t *pmask)
+{
+    unsigned int vcpu_id, vcpu_bias, offs;
+    unsigned long vmask;
+    struct vcpu *v;
+    bool_t is_native = !is_pv_32bit_domain(d);
+
+    cpumask_clear(pmask);
+    for ( vmask = 0, offs = 0; ; ++offs)
+    {
+        vcpu_bias = offs * (is_native ? BITS_PER_LONG : 32);
+        if ( vcpu_bias >= d->max_vcpus )
+            return 0;
+
+        if ( unlikely(is_native ?
+                      copy_from_guest_offset(&vmask, bmap, offs, 1) :
+                      copy_from_guest_offset((unsigned int *)&vmask, bmap,
+                                             offs, 1)) )
+        {
+            cpumask_clear(pmask);
+            return -EFAULT;
+        }
+
+        while ( vmask )
+        {
+            vcpu_id = find_first_set_bit(vmask);
+            vmask &= ~(1UL << vcpu_id);
+            vcpu_id += vcpu_bias;
+            if ( (vcpu_id >= d->max_vcpus) )
+                return 0;
+            if ( ((v = d->vcpu[vcpu_id]) != NULL) )
+                cpumask_or(pmask, pmask, v->vcpu_dirty_cpumask);
+        }
+    }
+}
+
+long do_mmuext_op(
+    XEN_GUEST_HANDLE_PARAM(mmuext_op_t) uops,
+    unsigned int count,
+    XEN_GUEST_HANDLE_PARAM(uint) pdone,
+    unsigned int foreigndom)
+{
+    struct mmuext_op op;
+    unsigned long type;
+    unsigned int i, done = 0;
+    struct vcpu *curr = current;
+    struct domain *d = curr->domain;
+    struct domain *pg_owner;
+    int rc = put_old_guest_table(curr);
+
+    if ( unlikely(rc) )
+    {
+        if ( likely(rc == -ERESTART) )
+            rc = hypercall_create_continuation(
+                     __HYPERVISOR_mmuext_op, "hihi", uops, count, pdone,
+                     foreigndom);
+        return rc;
+    }
+
+    if ( unlikely(count == MMU_UPDATE_PREEMPTED) &&
+         likely(guest_handle_is_null(uops)) )
+    {
+        /* See the curr->arch.old_guest_table related
+         * hypercall_create_continuation() below. */
+        return (int)foreigndom;
+    }
+
+    if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
+    {
+        count &= ~MMU_UPDATE_PREEMPTED;
+        if ( unlikely(!guest_handle_is_null(pdone)) )
+            (void)copy_from_guest(&done, pdone, 1);
+    }
+    else
+        perfc_incr(calls_to_mmuext_op);
+
+    if ( unlikely(!guest_handle_okay(uops, count)) )
+        return -EFAULT;
+
+    if ( (pg_owner = get_pg_owner(foreigndom)) == NULL )
+        return -ESRCH;
+
+    if ( !is_pv_domain(pg_owner) )
+    {
+        put_pg_owner(pg_owner);
+        return -EINVAL;
+    }
+
+    rc = xsm_mmuext_op(XSM_TARGET, d, pg_owner);
+    if ( rc )
+    {
+        put_pg_owner(pg_owner);
+        return rc;
+    }
+
+    for ( i = 0; i < count; i++ )
+    {
+        if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) )
+        {
+            rc = -ERESTART;
+            break;
+        }
+
+        if ( unlikely(__copy_from_guest(&op, uops, 1) != 0) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        if ( is_hvm_domain(d) )
+        {
+            switch ( op.cmd )
+            {
+            case MMUEXT_PIN_L1_TABLE:
+            case MMUEXT_PIN_L2_TABLE:
+            case MMUEXT_PIN_L3_TABLE:
+            case MMUEXT_PIN_L4_TABLE:
+            case MMUEXT_UNPIN_TABLE:
+                break;
+            default:
+                rc = -EOPNOTSUPP;
+                goto done;
+            }
+        }
+
+        rc = 0;
+
+        switch ( op.cmd )
+        {
+        case MMUEXT_PIN_L1_TABLE:
+            type = PGT_l1_page_table;
+            goto pin_page;
+
+        case MMUEXT_PIN_L2_TABLE:
+            type = PGT_l2_page_table;
+            goto pin_page;
+
+        case MMUEXT_PIN_L3_TABLE:
+            type = PGT_l3_page_table;
+            goto pin_page;
+
+        case MMUEXT_PIN_L4_TABLE:
+            if ( is_pv_32bit_domain(pg_owner) )
+                break;
+            type = PGT_l4_page_table;
+
+        pin_page: {
+            struct page_info *page;
+
+            /* Ignore pinning of invalid paging levels. */
+            if ( (op.cmd - MMUEXT_PIN_L1_TABLE) > (CONFIG_PAGING_LEVELS - 1) )
+                break;
+
+            if ( paging_mode_refcounts(pg_owner) )
+                break;
+
+            page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC);
+            if ( unlikely(!page) )
+            {
+                rc = -EINVAL;
+                break;
+            }
+
+            rc = get_page_type_preemptible(page, type);
+            if ( unlikely(rc) )
+            {
+                if ( rc == -EINTR )
+                    rc = -ERESTART;
+                else if ( rc != -ERESTART )
+                    gdprintk(XENLOG_WARNING,
+                             "Error %d while pinning mfn %" PRI_mfn "\n",
+                            rc, page_to_mfn(page));
+                if ( page != curr->arch.old_guest_table )
+                    put_page(page);
+                break;
+            }
+
+            rc = xsm_memory_pin_page(XSM_HOOK, d, pg_owner, page);
+            if ( !rc && unlikely(test_and_set_bit(_PGT_pinned,
+                                                  &page->u.inuse.type_info)) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "mfn %" PRI_mfn " already pinned\n", page_to_mfn(page));
+                rc = -EINVAL;
+            }
+
+            if ( unlikely(rc) )
+                goto pin_drop;
+
+            /* A page is dirtied when its pin status is set. */
+            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page)));
+
+            /* We can race domain destruction (domain_relinquish_resources). */
+            if ( unlikely(pg_owner != d) )
+            {
+                int drop_ref;
+                spin_lock(&pg_owner->page_alloc_lock);
+                drop_ref = (pg_owner->is_dying &&
+                            test_and_clear_bit(_PGT_pinned,
+                                               &page->u.inuse.type_info));
+                spin_unlock(&pg_owner->page_alloc_lock);
+                if ( drop_ref )
+                {
+        pin_drop:
+                    if ( type == PGT_l1_page_table )
+                        put_page_and_type(page);
+                    else
+                        curr->arch.old_guest_table = page;
+                }
+            }
+
+            break;
+        }
+
+        case MMUEXT_UNPIN_TABLE: {
+            struct page_info *page;
+
+            if ( paging_mode_refcounts(pg_owner) )
+                break;
+
+            page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC);
+            if ( unlikely(!page) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "mfn %" PRI_mfn " bad, or bad owner d%d\n",
+                         op.arg1.mfn, pg_owner->domain_id);
+                rc = -EINVAL;
+                break;
+            }
+
+            if ( !test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) )
+            {
+                put_page(page);
+                gdprintk(XENLOG_WARNING,
+                         "mfn %" PRI_mfn " not pinned\n", op.arg1.mfn);
+                rc = -EINVAL;
+                break;
+            }
+
+            switch ( rc = put_page_and_type_preemptible(page) )
+            {
+            case -EINTR:
+            case -ERESTART:
+                curr->arch.old_guest_table = page;
+                rc = 0;
+                break;
+            default:
+                BUG_ON(rc);
+                break;
+            }
+            put_page(page);
+
+            /* A page is dirtied when its pin status is cleared. */
+            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page)));
+
+            break;
+        }
+
+        case MMUEXT_NEW_BASEPTR:
+            if ( unlikely(d != pg_owner) )
+                rc = -EPERM;
+            else if ( unlikely(paging_mode_translate(d)) )
+                rc = -EINVAL;
+            else
+                rc = new_guest_cr3(op.arg1.mfn);
+            break;
+
+        case MMUEXT_NEW_USER_BASEPTR: {
+            unsigned long old_mfn;
+
+            if ( unlikely(d != pg_owner) )
+                rc = -EPERM;
+            else if ( unlikely(paging_mode_translate(d)) )
+                rc = -EINVAL;
+            if ( unlikely(rc) )
+                break;
+
+            old_mfn = pagetable_get_pfn(curr->arch.guest_table_user);
+            /*
+             * This is particularly important when getting restarted after the
+             * previous attempt got preempted in the put-old-MFN phase.
+             */
+            if ( old_mfn == op.arg1.mfn )
+                break;
+
+            if ( op.arg1.mfn != 0 )
+            {
+                if ( paging_mode_refcounts(d) )
+                    rc = get_page_from_pagenr(op.arg1.mfn, d) ? 0 : -EINVAL;
+                else
+                    rc = get_page_and_type_from_pagenr(
+                        op.arg1.mfn, PGT_root_page_table, d, 0, 1);
+
+                if ( unlikely(rc) )
+                {
+                    if ( rc == -EINTR )
+                        rc = -ERESTART;
+                    else if ( rc != -ERESTART )
+                        gdprintk(XENLOG_WARNING,
+                                 "Error %d installing new mfn %" PRI_mfn "\n",
+                                 rc, op.arg1.mfn);
+                    break;
+                }
+                if ( VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
+                    zap_ro_mpt(op.arg1.mfn);
+            }
+
+            curr->arch.guest_table_user = pagetable_from_pfn(op.arg1.mfn);
+
+            if ( old_mfn != 0 )
+            {
+                struct page_info *page = mfn_to_page(old_mfn);
+
+                if ( paging_mode_refcounts(d) )
+                    put_page(page);
+                else
+                    switch ( rc = put_page_and_type_preemptible(page) )
+                    {
+                    case -EINTR:
+                        rc = -ERESTART;
+                        /* fallthrough */
+                    case -ERESTART:
+                        curr->arch.old_guest_table = page;
+                        break;
+                    default:
+                        BUG_ON(rc);
+                        break;
+                    }
+            }
+
+            break;
+        }
+
+        case MMUEXT_TLB_FLUSH_LOCAL:
+            if ( likely(d == pg_owner) )
+                flush_tlb_local();
+            else
+                rc = -EPERM;
+            break;
+
+        case MMUEXT_INVLPG_LOCAL:
+            if ( unlikely(d != pg_owner) )
+                rc = -EPERM;
+            else
+                paging_invlpg(curr, op.arg1.linear_addr);
+            break;
+
+        case MMUEXT_TLB_FLUSH_MULTI:
+        case MMUEXT_INVLPG_MULTI:
+        {
+            cpumask_t *mask = this_cpu(scratch_cpumask);
+
+            if ( unlikely(d != pg_owner) )
+                rc = -EPERM;
+            else if ( unlikely(vcpumask_to_pcpumask(d,
+                                   guest_handle_to_param(op.arg2.vcpumask,
+                                                         const_void),
+                                   mask)) )
+                rc = -EINVAL;
+            if ( unlikely(rc) )
+                break;
+
+            if ( op.cmd == MMUEXT_TLB_FLUSH_MULTI )
+                flush_tlb_mask(mask);
+            else if ( __addr_ok(op.arg1.linear_addr) )
+                flush_tlb_one_mask(mask, op.arg1.linear_addr);
+            break;
+        }
+
+        case MMUEXT_TLB_FLUSH_ALL:
+            if ( likely(d == pg_owner) )
+                flush_tlb_mask(d->domain_dirty_cpumask);
+            else
+                rc = -EPERM;
+            break;
+
+        case MMUEXT_INVLPG_ALL:
+            if ( unlikely(d != pg_owner) )
+                rc = -EPERM;
+            else if ( __addr_ok(op.arg1.linear_addr) )
+                flush_tlb_one_mask(d->domain_dirty_cpumask, op.arg1.linear_addr);
+            break;
+
+        case MMUEXT_FLUSH_CACHE:
+            if ( unlikely(d != pg_owner) )
+                rc = -EPERM;
+            else if ( unlikely(!cache_flush_permitted(d)) )
+                rc = -EACCES;
+            else
+                wbinvd();
+            break;
+
+        case MMUEXT_FLUSH_CACHE_GLOBAL:
+            if ( unlikely(d != pg_owner) )
+                rc = -EPERM;
+            else if ( likely(cache_flush_permitted(d)) )
+            {
+                unsigned int cpu;
+                cpumask_t *mask = this_cpu(scratch_cpumask);
+
+                cpumask_clear(mask);
+                for_each_online_cpu(cpu)
+                    if ( !cpumask_intersects(mask,
+                                             per_cpu(cpu_sibling_mask, cpu)) )
+                        __cpumask_set_cpu(cpu, mask);
+                flush_mask(mask, FLUSH_CACHE);
+            }
+            else
+                rc = -EINVAL;
+            break;
+
+        case MMUEXT_SET_LDT:
+        {
+            unsigned int ents = op.arg2.nr_ents;
+            unsigned long ptr = ents ? op.arg1.linear_addr : 0;
+
+            if ( unlikely(d != pg_owner) )
+                rc = -EPERM;
+            else if ( paging_mode_external(d) )
+                rc = -EINVAL;
+            else if ( ((ptr & (PAGE_SIZE - 1)) != 0) || !__addr_ok(ptr) ||
+                      (ents > 8192) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Bad args to SET_LDT: ptr=%lx, ents=%x\n", ptr, ents);
+                rc = -EINVAL;
+            }
+            else if ( (curr->arch.pv_vcpu.ldt_ents != ents) ||
+                      (curr->arch.pv_vcpu.ldt_base != ptr) )
+            {
+                invalidate_shadow_ldt(curr, 0);
+                flush_tlb_local();
+                curr->arch.pv_vcpu.ldt_base = ptr;
+                curr->arch.pv_vcpu.ldt_ents = ents;
+                load_LDT(curr);
+            }
+            break;
+        }
+
+        case MMUEXT_CLEAR_PAGE: {
+            struct page_info *page;
+
+            page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC);
+            if ( !page || !get_page_type(page, PGT_writable_page) )
+            {
+                if ( page )
+                    put_page(page);
+                gdprintk(XENLOG_WARNING,
+                         "Error clearing mfn %" PRI_mfn "\n", op.arg1.mfn);
+                rc = -EINVAL;
+                break;
+            }
+
+            /* A page is dirtied when it's being cleared. */
+            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page)));
+
+            clear_domain_page(_mfn(page_to_mfn(page)));
+
+            put_page_and_type(page);
+            break;
+        }
+
+        case MMUEXT_COPY_PAGE:
+        {
+            struct page_info *src_page, *dst_page;
+
+            src_page = get_page_from_gfn(pg_owner, op.arg2.src_mfn, NULL,
+                                         P2M_ALLOC);
+            if ( unlikely(!src_page) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Error copying from mfn %" PRI_mfn "\n",
+                         op.arg2.src_mfn);
+                rc = -EINVAL;
+                break;
+            }
+
+            dst_page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL,
+                                         P2M_ALLOC);
+            rc = (dst_page &&
+                  get_page_type(dst_page, PGT_writable_page)) ? 0 : -EINVAL;
+            if ( unlikely(rc) )
+            {
+                put_page(src_page);
+                if ( dst_page )
+                    put_page(dst_page);
+                gdprintk(XENLOG_WARNING,
+                         "Error copying to mfn %" PRI_mfn "\n", op.arg1.mfn);
+                break;
+            }
+
+            /* A page is dirtied when it's being copied to. */
+            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(dst_page)));
+
+            copy_domain_page(_mfn(page_to_mfn(dst_page)),
+                             _mfn(page_to_mfn(src_page)));
+
+            put_page_and_type(dst_page);
+            put_page(src_page);
+            break;
+        }
+
+        case MMUEXT_MARK_SUPER:
+        case MMUEXT_UNMARK_SUPER:
+        {
+            unsigned long mfn = op.arg1.mfn;
+
+            if ( !opt_allow_superpage )
+                rc = -EOPNOTSUPP;
+            else if ( unlikely(d != pg_owner) )
+                rc = -EPERM;
+            else if ( mfn & (L1_PAGETABLE_ENTRIES - 1) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Unaligned superpage mfn %" PRI_mfn "\n", mfn);
+                rc = -EINVAL;
+            }
+            else if ( !mfn_valid(_mfn(mfn | (L1_PAGETABLE_ENTRIES - 1))) )
+                rc = -EINVAL;
+            else if ( op.cmd == MMUEXT_MARK_SUPER )
+                rc = mark_superpage(mfn_to_spage(mfn), d);
+            else
+                rc = unmark_superpage(mfn_to_spage(mfn));
+            break;
+        }
+
+        default:
+            rc = -ENOSYS;
+            break;
+        }
+
+ done:
+        if ( unlikely(rc) )
+            break;
+
+        guest_handle_add_offset(uops, 1);
+    }
+
+    if ( rc == -ERESTART )
+    {
+        ASSERT(i < count);
+        rc = hypercall_create_continuation(
+            __HYPERVISOR_mmuext_op, "hihi",
+            uops, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
+    }
+    else if ( curr->arch.old_guest_table )
+    {
+        XEN_GUEST_HANDLE_PARAM(void) null;
+
+        ASSERT(rc || i == count);
+        set_xen_guest_handle(null, NULL);
+        /*
+         * In order to have a way to communicate the final return value to
+         * our continuation, we pass this in place of "foreigndom", building
+         * on the fact that this argument isn't needed anymore.
+         */
+        rc = hypercall_create_continuation(
+                __HYPERVISOR_mmuext_op, "hihi", null,
+                MMU_UPDATE_PREEMPTED, null, rc);
+    }
+
+    put_pg_owner(pg_owner);
+
+    perfc_add(num_mmuext_ops, i);
+
+    /* Add incremental work we have done to the @done output parameter. */
+    if ( unlikely(!guest_handle_is_null(pdone)) )
+    {
+        done += i;
+        copy_to_guest(pdone, &done, 1);
+    }
+
+    return rc;
+}
+
+long do_mmu_update(
+    XEN_GUEST_HANDLE_PARAM(mmu_update_t) ureqs,
+    unsigned int count,
+    XEN_GUEST_HANDLE_PARAM(uint) pdone,
+    unsigned int foreigndom)
+{
+    struct mmu_update req;
+    void *va;
+    unsigned long gpfn, gmfn, mfn;
+    struct page_info *page;
+    unsigned int cmd, i = 0, done = 0, pt_dom;
+    struct vcpu *curr = current, *v = curr;
+    struct domain *d = v->domain, *pt_owner = d, *pg_owner;
+    struct domain_mmap_cache mapcache;
+    uint32_t xsm_needed = 0;
+    uint32_t xsm_checked = 0;
+    int rc = put_old_guest_table(curr);
+
+    if ( unlikely(rc) )
+    {
+        if ( likely(rc == -ERESTART) )
+            rc = hypercall_create_continuation(
+                     __HYPERVISOR_mmu_update, "hihi", ureqs, count, pdone,
+                     foreigndom);
+        return rc;
+    }
+
+    if ( unlikely(count == MMU_UPDATE_PREEMPTED) &&
+         likely(guest_handle_is_null(ureqs)) )
+    {
+        /* See the curr->arch.old_guest_table related
+         * hypercall_create_continuation() below. */
+        return (int)foreigndom;
+    }
+
+    if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
+    {
+        count &= ~MMU_UPDATE_PREEMPTED;
+        if ( unlikely(!guest_handle_is_null(pdone)) )
+            (void)copy_from_guest(&done, pdone, 1);
+    }
+    else
+        perfc_incr(calls_to_mmu_update);
+
+    if ( unlikely(!guest_handle_okay(ureqs, count)) )
+        return -EFAULT;
+
+    if ( (pt_dom = foreigndom >> 16) != 0 )
+    {
+        /* Pagetables belong to a foreign domain (PFD). */
+        if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL )
+            return -ESRCH;
+
+        if ( pt_owner == d )
+            rcu_unlock_domain(pt_owner);
+        else if ( !pt_owner->vcpu || (v = pt_owner->vcpu[0]) == NULL )
+        {
+            rc = -EINVAL;
+            goto out;
+        }
+    }
+
+    if ( (pg_owner = get_pg_owner((uint16_t)foreigndom)) == NULL )
+    {
+        rc = -ESRCH;
+        goto out;
+    }
+
+    domain_mmap_cache_init(&mapcache);
+
+    for ( i = 0; i < count; i++ )
+    {
+        if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) )
+        {
+            rc = -ERESTART;
+            break;
+        }
+
+        if ( unlikely(__copy_from_guest(&req, ureqs, 1) != 0) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        cmd = req.ptr & (sizeof(l1_pgentry_t)-1);
+
+        switch ( cmd )
+        {
+            /*
+             * MMU_NORMAL_PT_UPDATE: Normal update to any level of page table.
+             * MMU_UPDATE_PT_PRESERVE_AD: As above but also preserve (OR)
+             * current A/D bits.
+             */
+        case MMU_NORMAL_PT_UPDATE:
+        case MMU_PT_UPDATE_PRESERVE_AD:
+        {
+            p2m_type_t p2mt;
+
+            rc = -EOPNOTSUPP;
+            if ( unlikely(paging_mode_refcounts(pt_owner)) )
+                break;
+
+            xsm_needed |= XSM_MMU_NORMAL_UPDATE;
+            if ( get_pte_flags(req.val) & _PAGE_PRESENT )
+            {
+                xsm_needed |= XSM_MMU_UPDATE_READ;
+                if ( get_pte_flags(req.val) & _PAGE_RW )
+                    xsm_needed |= XSM_MMU_UPDATE_WRITE;
+            }
+            if ( xsm_needed != xsm_checked )
+            {
+                rc = xsm_mmu_update(XSM_TARGET, d, pt_owner, pg_owner, xsm_needed);
+                if ( rc )
+                    break;
+                xsm_checked = xsm_needed;
+            }
+            rc = -EINVAL;
+
+            req.ptr -= cmd;
+            gmfn = req.ptr >> PAGE_SHIFT;
+            page = get_page_from_gfn(pt_owner, gmfn, &p2mt, P2M_ALLOC);
+
+            if ( p2m_is_paged(p2mt) )
+            {
+                ASSERT(!page);
+                p2m_mem_paging_populate(pg_owner, gmfn);
+                rc = -ENOENT;
+                break;
+            }
+
+            if ( unlikely(!page) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Could not get page for normal update\n");
+                break;
+            }
+
+            mfn = page_to_mfn(page);
+            va = map_domain_page_with_cache(mfn, &mapcache);
+            va = (void *)((unsigned long)va +
+                          (unsigned long)(req.ptr & ~PAGE_MASK));
+
+            if ( page_lock(page) )
+            {
+                switch ( page->u.inuse.type_info & PGT_type_mask )
+                {
+                case PGT_l1_page_table:
+                {
+                    l1_pgentry_t l1e = l1e_from_intpte(req.val);
+                    p2m_type_t l1e_p2mt = p2m_ram_rw;
+                    struct page_info *target = NULL;
+                    p2m_query_t q = (l1e_get_flags(l1e) & _PAGE_RW) ?
+                                        P2M_UNSHARE : P2M_ALLOC;
+
+                    if ( paging_mode_translate(pg_owner) )
+                        target = get_page_from_gfn(pg_owner, l1e_get_pfn(l1e),
+                                                   &l1e_p2mt, q);
+
+                    if ( p2m_is_paged(l1e_p2mt) )
+                    {
+                        if ( target )
+                            put_page(target);
+                        p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e));
+                        rc = -ENOENT;
+                        break;
+                    }
+                    else if ( p2m_ram_paging_in == l1e_p2mt && !target )
+                    {
+                        rc = -ENOENT;
+                        break;
+                    }
+                    /* If we tried to unshare and failed */
+                    else if ( (q & P2M_UNSHARE) && p2m_is_shared(l1e_p2mt) )
+                    {
+                        /* We could not have obtained a page ref. */
+                        ASSERT(target == NULL);
+                        /* And mem_sharing_notify has already been called. */
+                        rc = -ENOMEM;
+                        break;
+                    }
+
+                    rc = mod_l1_entry(va, l1e, mfn,
+                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v,
+                                      pg_owner);
+                    if ( target )
+                        put_page(target);
+                }
+                break;
+                case PGT_l2_page_table:
+                    rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn,
+                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+                    break;
+                case PGT_l3_page_table:
+                    rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn,
+                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+                    break;
+                case PGT_l4_page_table:
+                    rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
+                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+                break;
+                case PGT_writable_page:
+                    perfc_incr(writable_mmu_updates);
+                    if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
+                        rc = 0;
+                    break;
+                }
+                page_unlock(page);
+                if ( rc == -EINTR )
+                    rc = -ERESTART;
+            }
+            else if ( get_page_type(page, PGT_writable_page) )
+            {
+                perfc_incr(writable_mmu_updates);
+                if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
+                    rc = 0;
+                put_page_type(page);
+            }
+
+            unmap_domain_page_with_cache(va, &mapcache);
+            put_page(page);
+        }
+        break;
+
+        case MMU_MACHPHYS_UPDATE:
+            if ( unlikely(d != pt_owner) )
+            {
+                rc = -EPERM;
+                break;
+            }
+
+            if ( unlikely(paging_mode_translate(pg_owner)) )
+            {
+                rc = -EINVAL;
+                break;
+            }
+
+            mfn = req.ptr >> PAGE_SHIFT;
+            gpfn = req.val;
+
+            xsm_needed |= XSM_MMU_MACHPHYS_UPDATE;
+            if ( xsm_needed != xsm_checked )
+            {
+                rc = xsm_mmu_update(XSM_TARGET, d, NULL, pg_owner, xsm_needed);
+                if ( rc )
+                    break;
+                xsm_checked = xsm_needed;
+            }
+
+            if ( unlikely(!get_page_from_pagenr(mfn, pg_owner)) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Could not get page for mach->phys update\n");
+                rc = -EINVAL;
+                break;
+            }
+
+            set_gpfn_from_mfn(mfn, gpfn);
+
+            paging_mark_dirty(pg_owner, _mfn(mfn));
+
+            put_page(mfn_to_page(mfn));
+            break;
+
+        default:
+            rc = -ENOSYS;
+            break;
+        }
+
+        if ( unlikely(rc) )
+            break;
+
+        guest_handle_add_offset(ureqs, 1);
+    }
+
+    if ( rc == -ERESTART )
+    {
+        ASSERT(i < count);
+        rc = hypercall_create_continuation(
+            __HYPERVISOR_mmu_update, "hihi",
+            ureqs, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
+    }
+    else if ( curr->arch.old_guest_table )
+    {
+        XEN_GUEST_HANDLE_PARAM(void) null;
+
+        ASSERT(rc || i == count);
+        set_xen_guest_handle(null, NULL);
+        /*
+         * In order to have a way to communicate the final return value to
+         * our continuation, we pass this in place of "foreigndom", building
+         * on the fact that this argument isn't needed anymore.
+         */
+        rc = hypercall_create_continuation(
+                __HYPERVISOR_mmu_update, "hihi", null,
+                MMU_UPDATE_PREEMPTED, null, rc);
+    }
+
+    put_pg_owner(pg_owner);
+
+    domain_mmap_cache_destroy(&mapcache);
+
+    perfc_add(num_page_updates, i);
+
+ out:
+    if ( pt_owner != d )
+        rcu_unlock_domain(pt_owner);
+
+    /* Add incremental work we have done to the @done output parameter. */
+    if ( unlikely(!guest_handle_is_null(pdone)) )
+    {
+        done += i;
+        copy_to_guest(pdone, &done, 1);
+    }
+
+    return rc;
+}
+
+
+static int create_grant_pte_mapping(
+    uint64_t pte_addr, l1_pgentry_t nl1e, struct vcpu *v)
+{
+    int rc = GNTST_okay;
+    void *va;
+    unsigned long gmfn, mfn;
+    struct page_info *page;
+    l1_pgentry_t ol1e;
+    struct domain *d = v->domain;
+
+    adjust_guest_l1e(nl1e, d);
+
+    gmfn = pte_addr >> PAGE_SHIFT;
+    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
+
+    if ( unlikely(!page) )
+    {
+        gdprintk(XENLOG_WARNING, "Could not get page for normal update\n");
+        return GNTST_general_error;
+    }
+
+    mfn = page_to_mfn(page);
+    va = map_domain_page(_mfn(mfn));
+    va = (void *)((unsigned long)va + ((unsigned long)pte_addr & ~PAGE_MASK));
+
+    if ( !page_lock(page) )
+    {
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(page);
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    ol1e = *(l1_pgentry_t *)va;
+    if ( !UPDATE_ENTRY(l1, (l1_pgentry_t *)va, ol1e, nl1e, mfn, v, 0) )
+    {
+        page_unlock(page);
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    page_unlock(page);
+
+    if ( !paging_mode_refcounts(d) )
+        put_page_from_l1e(ol1e, d);
+
+ failed:
+    unmap_domain_page(va);
+    put_page(page);
+
+    return rc;
+}
+
+static int destroy_grant_pte_mapping(
+    uint64_t addr, unsigned long frame, struct domain *d)
+{
+    int rc = GNTST_okay;
+    void *va;
+    unsigned long gmfn, mfn;
+    struct page_info *page;
+    l1_pgentry_t ol1e;
+
+    gmfn = addr >> PAGE_SHIFT;
+    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
+
+    if ( unlikely(!page) )
+    {
+        gdprintk(XENLOG_WARNING, "Could not get page for normal update\n");
+        return GNTST_general_error;
+    }
+
+    mfn = page_to_mfn(page);
+    va = map_domain_page(_mfn(mfn));
+    va = (void *)((unsigned long)va + ((unsigned long)addr & ~PAGE_MASK));
+
+    if ( !page_lock(page) )
+    {
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(page);
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    ol1e = *(l1_pgentry_t *)va;
+
+    /* Check that the virtual address supplied is actually mapped to frame. */
+    if ( unlikely(l1e_get_pfn(ol1e) != frame) )
+    {
+        page_unlock(page);
+        gdprintk(XENLOG_WARNING,
+                 "PTE entry %"PRIpte" for address %"PRIx64" doesn't match frame %lx\n",
+                 l1e_get_intpte(ol1e), addr, frame);
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    /* Delete pagetable entry. */
+    if ( unlikely(!UPDATE_ENTRY
+                  (l1,
+                   (l1_pgentry_t *)va, ol1e, l1e_empty(), mfn,
+                   d->vcpu[0] /* Change if we go to per-vcpu shadows. */,
+                   0)) )
+    {
+        page_unlock(page);
+        gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", va);
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    page_unlock(page);
+
+ failed:
+    unmap_domain_page(va);
+    put_page(page);
+    return rc;
+}
+
+
+static int create_grant_va_mapping(
+    unsigned long va, l1_pgentry_t nl1e, struct vcpu *v)
+{
+    l1_pgentry_t *pl1e, ol1e;
+    struct domain *d = v->domain;
+    unsigned long gl1mfn;
+    struct page_info *l1pg;
+    int okay;
+
+    adjust_guest_l1e(nl1e, d);
+
+    pl1e = guest_map_l1e(va, &gl1mfn);
+    if ( !pl1e )
+    {
+        gdprintk(XENLOG_WARNING, "Could not find L1 PTE for address %lx\n", va);
+        return GNTST_general_error;
+    }
+
+    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
+    {
+        guest_unmap_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    l1pg = mfn_to_page(gl1mfn);
+    if ( !page_lock(l1pg) )
+    {
+        put_page(l1pg);
+        guest_unmap_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(l1pg);
+        put_page(l1pg);
+        guest_unmap_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    ol1e = *pl1e;
+    okay = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0);
+
+    page_unlock(l1pg);
+    put_page(l1pg);
+    guest_unmap_l1e(pl1e);
+
+    if ( okay && !paging_mode_refcounts(d) )
+        put_page_from_l1e(ol1e, d);
+
+    return okay ? GNTST_okay : GNTST_general_error;
+}
+
+static int replace_grant_va_mapping(
+    unsigned long addr, unsigned long frame, l1_pgentry_t nl1e, struct vcpu *v)
+{
+    l1_pgentry_t *pl1e, ol1e;
+    unsigned long gl1mfn;
+    struct page_info *l1pg;
+    int rc = 0;
+
+    pl1e = guest_map_l1e(addr, &gl1mfn);
+    if ( !pl1e )
+    {
+        gdprintk(XENLOG_WARNING, "Could not find L1 PTE for address %lx\n", addr);
+        return GNTST_general_error;
+    }
+
+    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
+    {
+        rc = GNTST_general_error;
+        goto out;
+    }
+
+    l1pg = mfn_to_page(gl1mfn);
+    if ( !page_lock(l1pg) )
+    {
+        rc = GNTST_general_error;
+        put_page(l1pg);
+        goto out;
+    }
+
+    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        rc = GNTST_general_error;
+        goto unlock_and_out;
+    }
+
+    ol1e = *pl1e;
+
+    /* Check that the virtual address supplied is actually mapped to frame. */
+    if ( unlikely(l1e_get_pfn(ol1e) != frame) )
+    {
+        gdprintk(XENLOG_WARNING,
+                 "PTE entry %lx for address %lx doesn't match frame %lx\n",
+                 l1e_get_pfn(ol1e), addr, frame);
+        rc = GNTST_general_error;
+        goto unlock_and_out;
+    }
+
+    /* Delete pagetable entry. */
+    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0)) )
+    {
+        gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", pl1e);
+        rc = GNTST_general_error;
+        goto unlock_and_out;
+    }
+
+ unlock_and_out:
+    page_unlock(l1pg);
+    put_page(l1pg);
+ out:
+    guest_unmap_l1e(pl1e);
+    return rc;
+}
+
+static int destroy_grant_va_mapping(
+    unsigned long addr, unsigned long frame, struct vcpu *v)
+{
+    return replace_grant_va_mapping(addr, frame, l1e_empty(), v);
+}
+
+int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
+			    unsigned int flags, unsigned int cache_flags)
+{
+    l1_pgentry_t pte;
+    uint32_t grant_pte_flags;
+
+    grant_pte_flags =
+        _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_GNTTAB;
+    if ( cpu_has_nx )
+        grant_pte_flags |= _PAGE_NX_BIT;
+
+    pte = l1e_from_pfn(frame, grant_pte_flags);
+    if ( (flags & GNTMAP_application_map) )
+        l1e_add_flags(pte,_PAGE_USER);
+    if ( !(flags & GNTMAP_readonly) )
+        l1e_add_flags(pte,_PAGE_RW);
+
+    l1e_add_flags(pte,
+                  ((flags >> _GNTMAP_guest_avail0) * _PAGE_AVAIL0)
+                   & _PAGE_AVAIL);
+
+    l1e_add_flags(pte, cacheattr_to_pte_flags(cache_flags >> 5));
+
+    if ( flags & GNTMAP_contains_pte )
+        return create_grant_pte_mapping(addr, pte, current);
+    return create_grant_va_mapping(addr, pte, current);
+}
+
+int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
+			     uint64_t new_addr, unsigned int flags)
+{
+    struct vcpu *curr = current;
+    l1_pgentry_t *pl1e, ol1e;
+    unsigned long gl1mfn;
+    struct page_info *l1pg;
+    int rc;
+
+    if ( flags & GNTMAP_contains_pte )
+    {
+        if ( !new_addr )
+            return destroy_grant_pte_mapping(addr, frame, curr->domain);
+
+        return GNTST_general_error;
+    }
+
+    if ( !new_addr )
+        return destroy_grant_va_mapping(addr, frame, curr);
+
+    pl1e = guest_map_l1e(new_addr, &gl1mfn);
+    if ( !pl1e )
+    {
+        gdprintk(XENLOG_WARNING,
+                 "Could not find L1 PTE for address %"PRIx64"\n", new_addr);
+        return GNTST_general_error;
+    }
+
+    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
+    {
+        guest_unmap_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    l1pg = mfn_to_page(gl1mfn);
+    if ( !page_lock(l1pg) )
+    {
+        put_page(l1pg);
+        guest_unmap_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(l1pg);
+        put_page(l1pg);
+        guest_unmap_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    ol1e = *pl1e;
+
+    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(),
+                                gl1mfn, curr, 0)) )
+    {
+        page_unlock(l1pg);
+        put_page(l1pg);
+        gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", pl1e);
+        guest_unmap_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    page_unlock(l1pg);
+    put_page(l1pg);
+    guest_unmap_l1e(pl1e);
+
+    rc = replace_grant_va_mapping(addr, frame, ol1e, curr);
+    if ( rc && !paging_mode_refcounts(curr->domain) )
+        put_page_from_l1e(ol1e, curr->domain);
+
+    return rc;
+}
+
+static int __do_update_va_mapping(
+    unsigned long va, u64 val64, unsigned long flags, struct domain *pg_owner)
+{
+    l1_pgentry_t   val = l1e_from_intpte(val64);
+    struct vcpu   *v   = current;
+    struct domain *d   = v->domain;
+    struct page_info *gl1pg;
+    l1_pgentry_t  *pl1e;
+    unsigned long  bmap_ptr, gl1mfn;
+    cpumask_t     *mask = NULL;
+    int            rc;
+
+    perfc_incr(calls_to_update_va);
+
+    rc = xsm_update_va_mapping(XSM_TARGET, d, pg_owner, val);
+    if ( rc )
+        return rc;
+
+    rc = -EINVAL;
+    pl1e = guest_map_l1e(va, &gl1mfn);
+    if ( unlikely(!pl1e || !get_page_from_pagenr(gl1mfn, d)) )
+        goto out;
+
+    gl1pg = mfn_to_page(gl1mfn);
+    if ( !page_lock(gl1pg) )
+    {
+        put_page(gl1pg);
+        goto out;
+    }
+
+    if ( (gl1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(gl1pg);
+        put_page(gl1pg);
+        goto out;
+    }
+
+    rc = mod_l1_entry(pl1e, val, gl1mfn, 0, v, pg_owner);
+
+    page_unlock(gl1pg);
+    put_page(gl1pg);
+
+ out:
+    if ( pl1e )
+        guest_unmap_l1e(pl1e);
+
+    switch ( flags & UVMF_FLUSHTYPE_MASK )
+    {
+    case UVMF_TLB_FLUSH:
+        switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
+        {
+        case UVMF_LOCAL:
+            flush_tlb_local();
+            break;
+        case UVMF_ALL:
+            mask = d->domain_dirty_cpumask;
+            break;
+        default:
+            mask = this_cpu(scratch_cpumask);
+            rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
+                                                                     void),
+                                      mask);
+            break;
+        }
+        if ( mask )
+            flush_tlb_mask(mask);
+        break;
+
+    case UVMF_INVLPG:
+        switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
+        {
+        case UVMF_LOCAL:
+            paging_invlpg(v, va);
+            break;
+        case UVMF_ALL:
+            mask = d->domain_dirty_cpumask;
+            break;
+        default:
+            mask = this_cpu(scratch_cpumask);
+            rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
+                                                                     void),
+                                      mask);
+            break;
+        }
+        if ( mask )
+            flush_tlb_one_mask(mask, va);
+        break;
+    }
+
+    return rc;
+}
+
+long do_update_va_mapping(unsigned long va, u64 val64,
+                          unsigned long flags)
+{
+    return __do_update_va_mapping(va, val64, flags, current->domain);
+}
+
+long do_update_va_mapping_otherdomain(unsigned long va, u64 val64,
+                                      unsigned long flags,
+                                      domid_t domid)
+{
+    struct domain *pg_owner;
+    int rc;
+
+    if ( (pg_owner = get_pg_owner(domid)) == NULL )
+        return -ESRCH;
+
+    rc = __do_update_va_mapping(va, val64, flags, pg_owner);
+
+    put_pg_owner(pg_owner);
+
+    return rc;
+}
+
+
+long do_set_gdt(XEN_GUEST_HANDLE_PARAM(xen_ulong_t) frame_list,
+                unsigned int entries)
+{
+    int nr_pages = (entries + 511) / 512;
+    unsigned long frames[16];
+    struct vcpu *curr = current;
+    long ret;
+
+    /* Rechecked in set_gdt, but ensures a sane limit for copy_from_user(). */
+    if ( entries > FIRST_RESERVED_GDT_ENTRY )
+        return -EINVAL;
+
+    if ( copy_from_guest(frames, frame_list, nr_pages) )
+        return -EFAULT;
+
+    domain_lock(curr->domain);
+
+    if ( (ret = set_gdt(curr, frames, entries)) == 0 )
+        flush_tlb_local();
+
+    domain_unlock(curr->domain);
+
+    return ret;
+}
+
+
+long do_update_descriptor(u64 pa, u64 desc)
+{
+    struct domain *dom = current->domain;
+    unsigned long gmfn = pa >> PAGE_SHIFT;
+    unsigned long mfn;
+    unsigned int  offset;
+    struct desc_struct *gdt_pent, d;
+    struct page_info *page;
+    long ret = -EINVAL;
+
+    offset = ((unsigned int)pa & ~PAGE_MASK) / sizeof(struct desc_struct);
+
+    *(u64 *)&d = desc;
+
+    page = get_page_from_gfn(dom, gmfn, NULL, P2M_ALLOC);
+    if ( (((unsigned int)pa % sizeof(struct desc_struct)) != 0) ||
+         !page ||
+         !check_descriptor(dom, &d) )
+    {
+        if ( page )
+            put_page(page);
+        return -EINVAL;
+    }
+    mfn = page_to_mfn(page);
+
+    /* Check if the given frame is in use in an unsafe context. */
+    switch ( page->u.inuse.type_info & PGT_type_mask )
+    {
+    case PGT_seg_desc_page:
+        if ( unlikely(!get_page_type(page, PGT_seg_desc_page)) )
+            goto out;
+        break;
+    default:
+        if ( unlikely(!get_page_type(page, PGT_writable_page)) )
+            goto out;
+        break;
+    }
+
+    paging_mark_dirty(dom, _mfn(mfn));
+
+    /* All is good so make the update. */
+    gdt_pent = map_domain_page(_mfn(mfn));
+    write_atomic((uint64_t *)&gdt_pent[offset], *(uint64_t *)&d);
+    unmap_domain_page(gdt_pent);
+
+    put_page_type(page);
+
+    ret = 0; /* success */
+
+ out:
+    put_page(page);
+
+    return ret;
+}
+
+
+/*************************
+ * Descriptor Tables
+ */
+
+void destroy_gdt(struct vcpu *v)
+{
+    l1_pgentry_t *pl1e;
+    unsigned int i;
+    unsigned long pfn, zero_pfn = PFN_DOWN(__pa(zero_page));
+
+    v->arch.pv_vcpu.gdt_ents = 0;
+    pl1e = gdt_ldt_ptes(v->domain, v);
+    for ( i = 0; i < FIRST_RESERVED_GDT_PAGE; i++ )
+    {
+        pfn = l1e_get_pfn(pl1e[i]);
+        if ( (l1e_get_flags(pl1e[i]) & _PAGE_PRESENT) && pfn != zero_pfn )
+            put_page_and_type(mfn_to_page(pfn));
+        l1e_write(&pl1e[i], l1e_from_pfn(zero_pfn, __PAGE_HYPERVISOR_RO));
+        v->arch.pv_vcpu.gdt_frames[i] = 0;
+    }
+}
+
+
+long set_gdt(struct vcpu *v,
+             unsigned long *frames,
+             unsigned int entries)
+{
+    struct domain *d = v->domain;
+    l1_pgentry_t *pl1e;
+    /* NB. There are 512 8-byte entries per GDT page. */
+    unsigned int i, nr_pages = (entries + 511) / 512;
+
+    if ( entries > FIRST_RESERVED_GDT_ENTRY )
+        return -EINVAL;
+
+    /* Check the pages in the new GDT. */
+    for ( i = 0; i < nr_pages; i++ )
+    {
+        struct page_info *page;
+
+        page = get_page_from_gfn(d, frames[i], NULL, P2M_ALLOC);
+        if ( !page )
+            goto fail;
+        if ( !get_page_type(page, PGT_seg_desc_page) )
+        {
+            put_page(page);
+            goto fail;
+        }
+        frames[i] = page_to_mfn(page);
+    }
+
+    /* Tear down the old GDT. */
+    destroy_gdt(v);
+
+    /* Install the new GDT. */
+    v->arch.pv_vcpu.gdt_ents = entries;
+    pl1e = gdt_ldt_ptes(d, v);
+    for ( i = 0; i < nr_pages; i++ )
+    {
+        v->arch.pv_vcpu.gdt_frames[i] = frames[i];
+        l1e_write(&pl1e[i], l1e_from_pfn(frames[i], __PAGE_HYPERVISOR_RW));
+    }
+
+    return 0;
+
+ fail:
+    while ( i-- > 0 )
+    {
+        put_page_and_type(mfn_to_page(frames[i]));
+    }
+    return -EINVAL;
+}
+
+/*************************
+ * Writable Pagetables
+ */
+
+struct ptwr_emulate_ctxt {
+    struct x86_emulate_ctxt ctxt;
+    unsigned long cr2;
+    l1_pgentry_t  pte;
+};
+
+static int ptwr_emulated_read(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    unsigned int rc = bytes;
+    unsigned long addr = offset;
+
+    if ( !__addr_ok(addr) ||
+         (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
+    {
+        x86_emul_pagefault(0, addr + bytes - rc, ctxt);  /* Read fault. */
+        return X86EMUL_EXCEPTION;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+static int ptwr_emulated_update(
+    unsigned long addr,
+    paddr_t old,
+    paddr_t val,
+    unsigned int bytes,
+    unsigned int do_cmpxchg,
+    struct ptwr_emulate_ctxt *ptwr_ctxt)
+{
+    unsigned long mfn;
+    unsigned long unaligned_addr = addr;
+    struct page_info *page;
+    l1_pgentry_t pte, ol1e, nl1e, *pl1e;
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    int ret;
+
+    /* Only allow naturally-aligned stores within the original %cr2 page. */
+    if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) )
+    {
+        gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n",
+                 ptwr_ctxt->cr2, addr, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    /* Turn a sub-word access into a full-word access. */
+    if ( bytes != sizeof(paddr_t) )
+    {
+        paddr_t      full;
+        unsigned int rc, offset = addr & (sizeof(paddr_t)-1);
+
+        /* Align address; read full word. */
+        addr &= ~(sizeof(paddr_t)-1);
+        if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 )
+        {
+            x86_emul_pagefault(0, /* Read fault. */
+                               addr + sizeof(paddr_t) - rc,
+                               &ptwr_ctxt->ctxt);
+            return X86EMUL_EXCEPTION;
+        }
+        /* Mask out bits provided by caller. */
+        full &= ~((((paddr_t)1 << (bytes*8)) - 1) << (offset*8));
+        /* Shift the caller value and OR in the missing bits. */
+        val  &= (((paddr_t)1 << (bytes*8)) - 1);
+        val <<= (offset)*8;
+        val  |= full;
+        /* Also fill in missing parts of the cmpxchg old value. */
+        old  &= (((paddr_t)1 << (bytes*8)) - 1);
+        old <<= (offset)*8;
+        old  |= full;
+    }
+
+    pte  = ptwr_ctxt->pte;
+    mfn  = l1e_get_pfn(pte);
+    page = mfn_to_page(mfn);
+
+    /* We are looking only for read-only mappings of p.t. pages. */
+    ASSERT((l1e_get_flags(pte) & (_PAGE_RW|_PAGE_PRESENT)) == _PAGE_PRESENT);
+    ASSERT(mfn_valid(_mfn(mfn)));
+    ASSERT((page->u.inuse.type_info & PGT_type_mask) == PGT_l1_page_table);
+    ASSERT((page->u.inuse.type_info & PGT_count_mask) != 0);
+    ASSERT(page_get_owner(page) == d);
+
+    /* Check the new PTE. */
+    nl1e = l1e_from_intpte(val);
+    switch ( ret = get_page_from_l1e(nl1e, d, d) )
+    {
+    default:
+        if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
+             !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) )
+        {
+            /*
+             * If this is an upper-half write to a PAE PTE then we assume that
+             * the guest has simply got the two writes the wrong way round. We
+             * zap the PRESENT bit on the assumption that the bottom half will
+             * be written immediately after we return to the guest.
+             */
+            gdprintk(XENLOG_DEBUG, "ptwr_emulate: fixing up invalid PAE PTE %"
+                     PRIpte"\n", l1e_get_intpte(nl1e));
+            l1e_remove_flags(nl1e, _PAGE_PRESENT);
+        }
+        else
+        {
+            gdprintk(XENLOG_WARNING, "could not get_page_from_l1e()\n");
+            return X86EMUL_UNHANDLEABLE;
+        }
+        break;
+    case 0:
+        break;
+    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+        l1e_flip_flags(nl1e, ret);
+        break;
+    }
+
+    adjust_guest_l1e(nl1e, d);
+
+    /* Checked successfully: do the update (write or cmpxchg). */
+    pl1e = map_domain_page(_mfn(mfn));
+    pl1e = (l1_pgentry_t *)((unsigned long)pl1e + (addr & ~PAGE_MASK));
+    if ( do_cmpxchg )
+    {
+        int okay;
+        intpte_t t = old;
+        ol1e = l1e_from_intpte(old);
+
+        okay = paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e),
+                                          &t, l1e_get_intpte(nl1e), _mfn(mfn));
+        okay = (okay && t == old);
+
+        if ( !okay )
+        {
+            unmap_domain_page(pl1e);
+            put_page_from_l1e(nl1e, d);
+            return X86EMUL_RETRY;
+        }
+    }
+    else
+    {
+        ol1e = *pl1e;
+        if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, 0) )
+            BUG();
+    }
+
+    trace_ptwr_emulation(addr, nl1e);
+
+    unmap_domain_page(pl1e);
+
+    /* Finally, drop the old PTE. */
+    put_page_from_l1e(ol1e, d);
+
+    return X86EMUL_OKAY;
+}
+
+static int ptwr_emulated_write(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    paddr_t val = 0;
+
+    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) || !bytes )
+    {
+        gdprintk(XENLOG_WARNING, "bad write size (addr=%lx, bytes=%u)\n",
+                 offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    memcpy(&val, p_data, bytes);
+
+    return ptwr_emulated_update(
+        offset, 0, val, bytes, 0,
+        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
+}
+
+static int ptwr_emulated_cmpxchg(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_old,
+    void *p_new,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    paddr_t old = 0, new = 0;
+
+    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes -1)) )
+    {
+        gdprintk(XENLOG_WARNING, "bad cmpxchg size (addr=%lx, bytes=%u)\n",
+                 offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    memcpy(&old, p_old, bytes);
+    memcpy(&new, p_new, bytes);
+
+    return ptwr_emulated_update(
+        offset, old, new, bytes, 1,
+        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
+}
+
+static int pv_emul_is_mem_write(const struct x86_emulate_state *state,
+                                struct x86_emulate_ctxt *ctxt)
+{
+    return x86_insn_is_mem_write(state, ctxt) ? X86EMUL_OKAY
+                                              : X86EMUL_UNHANDLEABLE;
+}
+
+static const struct x86_emulate_ops ptwr_emulate_ops = {
+    .read       = ptwr_emulated_read,
+    .insn_fetch = ptwr_emulated_read,
+    .write      = ptwr_emulated_write,
+    .cmpxchg    = ptwr_emulated_cmpxchg,
+    .validate   = pv_emul_is_mem_write,
+    .cpuid      = pv_emul_cpuid,
+};
+
+/* Write page fault handler: check if guest is trying to modify a PTE. */
+int ptwr_do_page_fault(struct vcpu *v, unsigned long addr,
+                       struct cpu_user_regs *regs)
+{
+    struct domain *d = v->domain;
+    struct page_info *page;
+    l1_pgentry_t      pte;
+    struct ptwr_emulate_ctxt ptwr_ctxt = {
+        .ctxt = {
+            .regs = regs,
+            .vendor = d->arch.cpuid->x86_vendor,
+            .addr_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
+            .sp_size   = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
+            .swint_emulate = x86_swint_emulate_none,
+        },
+    };
+    int rc;
+
+    /* Attempt to read the PTE that maps the VA being accessed. */
+    guest_get_eff_l1e(addr, &pte);
+
+    /* We are looking only for read-only mappings of p.t. pages. */
+    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) ||
+         rangeset_contains_singleton(mmio_ro_ranges, l1e_get_pfn(pte)) ||
+         !get_page_from_pagenr(l1e_get_pfn(pte), d) )
+        goto bail;
+
+    page = l1e_get_page(pte);
+    if ( !page_lock(page) )
+    {
+        put_page(page);
+        goto bail;
+    }
+
+    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(page);
+        put_page(page);
+        goto bail;
+    }
+
+    ptwr_ctxt.cr2 = addr;
+    ptwr_ctxt.pte = pte;
+
+    rc = x86_emulate(&ptwr_ctxt.ctxt, &ptwr_emulate_ops);
+
+    page_unlock(page);
+    put_page(page);
+
+    switch ( rc )
+    {
+    case X86EMUL_EXCEPTION:
+        /*
+         * This emulation only covers writes to pagetables which are marked
+         * read-only by Xen.  We tolerate #PF (in case a concurrent pagetable
+         * update has succeeded on a different vcpu).  Anything else is an
+         * emulation bug, or a guest playing with the instruction stream under
+         * Xen's feet.
+         */
+        if ( ptwr_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
+             ptwr_ctxt.ctxt.event.vector == TRAP_page_fault )
+            pv_inject_event(&ptwr_ctxt.ctxt.event);
+        else
+            gdprintk(XENLOG_WARNING,
+                     "Unexpected event (type %u, vector %#x) from emulation\n",
+                     ptwr_ctxt.ctxt.event.type, ptwr_ctxt.ctxt.event.vector);
+
+        /* Fallthrough */
+    case X86EMUL_OKAY:
+
+        if ( ptwr_ctxt.ctxt.retire.singlestep )
+            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
+
+        /* Fallthrough */
+    case X86EMUL_RETRY:
+        perfc_incr(ptwr_emulations);
+        return EXCRET_fault_fixed;
+    }
+
+ bail:
+    return 0;
+}
+
+/*************************
+ * fault handling for read-only MMIO pages
+ */
+
+int mmio_ro_emulated_write(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    struct mmio_ro_emulate_ctxt *mmio_ro_ctxt = ctxt->data;
+
+    /* Only allow naturally-aligned stores at the original %cr2 address. */
+    if ( ((bytes | offset) & (bytes - 1)) || !bytes ||
+         offset != mmio_ro_ctxt->cr2 )
+    {
+        gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n",
+                mmio_ro_ctxt->cr2, offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+static const struct x86_emulate_ops mmio_ro_emulate_ops = {
+    .read       = x86emul_unhandleable_rw,
+    .insn_fetch = ptwr_emulated_read,
+    .write      = mmio_ro_emulated_write,
+    .validate   = pv_emul_is_mem_write,
+    .cpuid      = pv_emul_cpuid,
+};
+
+int mmcfg_intercept_write(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    struct mmio_ro_emulate_ctxt *mmio_ctxt = ctxt->data;
+
+    /*
+     * Only allow naturally-aligned stores no wider than 4 bytes to the
+     * original %cr2 address.
+     */
+    if ( ((bytes | offset) & (bytes - 1)) || bytes > 4 || !bytes ||
+         offset != mmio_ctxt->cr2 )
+    {
+        gdprintk(XENLOG_WARNING, "bad write (cr2=%lx, addr=%lx, bytes=%u)\n",
+                mmio_ctxt->cr2, offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    offset &= 0xfff;
+    if ( pci_conf_write_intercept(mmio_ctxt->seg, mmio_ctxt->bdf,
+                                  offset, bytes, p_data) >= 0 )
+        pci_mmcfg_write(mmio_ctxt->seg, PCI_BUS(mmio_ctxt->bdf),
+                        PCI_DEVFN2(mmio_ctxt->bdf), offset, bytes,
+                        *(uint32_t *)p_data);
+
+    return X86EMUL_OKAY;
+}
+
+static const struct x86_emulate_ops mmcfg_intercept_ops = {
+    .read       = x86emul_unhandleable_rw,
+    .insn_fetch = ptwr_emulated_read,
+    .write      = mmcfg_intercept_write,
+    .validate   = pv_emul_is_mem_write,
+    .cpuid      = pv_emul_cpuid,
+};
+
+/* Check if guest is trying to modify a r/o MMIO page. */
+int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr,
+                          struct cpu_user_regs *regs)
+{
+    l1_pgentry_t pte;
+    unsigned long mfn;
+    unsigned int addr_size = is_pv_32bit_vcpu(v) ? 32 : BITS_PER_LONG;
+    struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { .cr2 = addr };
+    struct x86_emulate_ctxt ctxt = {
+        .regs = regs,
+        .vendor = v->domain->arch.cpuid->x86_vendor,
+        .addr_size = addr_size,
+        .sp_size = addr_size,
+        .swint_emulate = x86_swint_emulate_none,
+        .data = &mmio_ro_ctxt
+    };
+    int rc;
+
+    /* Attempt to read the PTE that maps the VA being accessed. */
+    guest_get_eff_l1e(addr, &pte);
+
+    /* We are looking only for read-only mappings of MMIO pages. */
+    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) )
+        return 0;
+
+    mfn = l1e_get_pfn(pte);
+    if ( mfn_valid(_mfn(mfn)) )
+    {
+        struct page_info *page = mfn_to_page(mfn);
+        struct domain *owner = page_get_owner_and_reference(page);
+
+        if ( owner )
+            put_page(page);
+        if ( owner != dom_io )
+            return 0;
+    }
+
+    if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+        return 0;
+
+    if ( pci_ro_mmcfg_decode(mfn, &mmio_ro_ctxt.seg, &mmio_ro_ctxt.bdf) )
+        rc = x86_emulate(&ctxt, &mmcfg_intercept_ops);
+    else
+        rc = x86_emulate(&ctxt, &mmio_ro_emulate_ops);
+
+    switch ( rc )
+    {
+    case X86EMUL_EXCEPTION:
+        /*
+         * This emulation only covers writes to MMCFG space or read-only MFNs.
+         * We tolerate #PF (from hitting an adjacent page or a successful
+         * concurrent pagetable update).  Anything else is an emulation bug,
+         * or a guest playing with the instruction stream under Xen's feet.
+         */
+        if ( ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
+             ctxt.event.vector == TRAP_page_fault )
+            pv_inject_event(&ctxt.event);
+        else
+            gdprintk(XENLOG_WARNING,
+                     "Unexpected event (type %u, vector %#x) from emulation\n",
+                     ctxt.event.type, ctxt.event.vector);
+
+        /* Fallthrough */
+    case X86EMUL_OKAY:
+
+        if ( ctxt.retire.singlestep )
+            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
+
+        /* Fallthrough */
+    case X86EMUL_RETRY:
+        perfc_incr(ptwr_emulations);
+        return EXCRET_fault_fixed;
+    }
+
+    return 0;
+}
+
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/grant_table.h b/xen/include/asm-x86/grant_table.h
index e1b3391efc..9580dc32dc 100644
--- a/xen/include/asm-x86/grant_table.h
+++ b/xen/include/asm-x86/grant_table.h
@@ -17,6 +17,10 @@ int create_grant_host_mapping(uint64_t addr, unsigned long frame,
 			      unsigned int flags, unsigned int cache_flags);
 int replace_grant_host_mapping(
     uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags);
+int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
+			    unsigned int flags, unsigned int cache_flags);
+int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
+			     uint64_t new_addr, unsigned int flags);
 
 #define gnttab_create_shared_page(d, t, i)                               \
     do {                                                                 \
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 8e55593154..8e2bf91070 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -319,6 +319,8 @@ static inline void *__page_to_virt(const struct page_info *pg)
                     (PAGE_SIZE / (sizeof(*pg) & -sizeof(*pg))));
 }
 
+int alloc_page_type(struct page_info *page, unsigned long type,
+                    int preemptible);
 int free_page_type(struct page_info *page, unsigned long type,
                    int preemptible);
 
@@ -364,6 +366,13 @@ int  put_old_guest_table(struct vcpu *);
 int  get_page_from_l1e(
     l1_pgentry_t l1e, struct domain *l1e_owner, struct domain *pg_owner);
 void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner);
+int get_page_and_type_from_pagenr(unsigned long page_nr,
+                                  unsigned long type,
+                                  struct domain *d,
+                                  int partial,
+                                  int preemptible);
+int get_page_from_pagenr(unsigned long page_nr, struct domain *d);
+void get_page_light(struct page_info *page);
 
 static inline void put_page_and_type(struct page_info *page)
 {
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index c92cba41a0..8929a7e01c 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -161,6 +161,7 @@ int map_pages_to_xen(
 /* Alter the permissions of a range of Xen virtual address space. */
 int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int flags);
 int destroy_xen_mappings(unsigned long v, unsigned long e);
+int update_xen_mappings(unsigned long mfn, unsigned int cacheattr);
 /*
  * Create only non-leaf page table entries for the
  * page range in Xen virtual address space.
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 for-next v2 5/8] x86/mm: split PV guest supporting code to pv/mm.c
  2017-04-03 11:22 ` [PATCH v2 for-next v2 5/8] x86/mm: split PV guest supporting code to pv/mm.c Wei Liu
@ 2017-04-19 15:31   ` Jan Beulich
  2017-04-21 10:34     ` Wei Liu
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2017-04-19 15:31 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Xen-devel

>>> On 03.04.17 at 13:22, <wei.liu2@citrix.com> wrote:
> -static s8 __read_mostly opt_mmio_relax;
> +s8 __read_mostly opt_mmio_relax;
>  static void __init parse_mmio_relax(const char *s)

The variable has exactly one use site outside of the parsing function,
which means both variable and parsing function should be moved too.
However, this sole user is get_page_from_l1e(), and that in turn is
also being used from shadow code. It's not immediately clear to me
whether moving it under pv/ (and hence eventually compiling it out
of Xen when PV is de-selected) is the right thing.

> -static int get_page_from_pagenr(unsigned long page_nr, struct domain *d)
> +int get_page_from_pagenr(unsigned long page_nr, struct domain *d)
>  {

I don't think this should be made non-static without first correcting its
return type - either int (and returning -E... values) or bool.

> @@ -834,3414 +524,489 @@ static int update_xen_mappings(unsigned long mfn, unsigned int cacheattr)
>      return err;
>  }
>  
> -#ifndef NDEBUG
> -struct mmio_emul_range_ctxt {
> -    const struct domain *d;
> -    unsigned long mfn;
> -};
> -
> -static int print_mmio_emul_range(unsigned long s, unsigned long e, void *arg)
> +bool_t fill_ro_mpt(unsigned long mfn)
>  {
> -    const struct mmio_emul_range_ctxt *ctxt = arg;
> -
> -    if ( ctxt->mfn > e )
> -        return 0;
> +    l4_pgentry_t *l4tab = map_domain_page(_mfn(mfn));
> +    bool_t ret = 0;

From here on the patch becomes basically unreviewable, because
plain removals (which I'd expect is all that is happening here) aren't
represented as such anymore. Please see whether you can
arrange for the diff to become more legible. If you can't talk git
into doing so, maybe an option would be to split this patch by
temporarily #include-ing pv/mm.c here until the move is complete.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 for-next v2 5/8] x86/mm: split PV guest supporting code to pv/mm.c
  2017-04-19 15:31   ` Jan Beulich
@ 2017-04-21 10:34     ` Wei Liu
  0 siblings, 0 replies; 25+ messages in thread
From: Wei Liu @ 2017-04-21 10:34 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, WeiLiu, Xen-devel

On Wed, Apr 19, 2017 at 09:31:12AM -0600, Jan Beulich wrote:
> >>> On 03.04.17 at 13:22, <wei.liu2@citrix.com> wrote:
> > -static s8 __read_mostly opt_mmio_relax;
> > +s8 __read_mostly opt_mmio_relax;
> >  static void __init parse_mmio_relax(const char *s)
> 
> The variable has exactly one use site outside of the parsing function,
> which means both variable and parsing function should be moved too.
> However, this sole user is get_page_from_l1e(), and that in turn is
> also being used from shadow code. It's not immediately clear to me
> whether moving it under pv/ (and hence eventually compiling it out
> of Xen when PV is de-selected) is the right thing.
> 

Yes, you're right. I read the shadow code but couldn't immediately
figure out whether the call site of get_page_from_l1e is PV specific.

I plan to leave get_page_from_l1e in mm.c and make adjustments to the
series where necessary.

> > -static int get_page_from_pagenr(unsigned long page_nr, struct domain *d)
> > +int get_page_from_pagenr(unsigned long page_nr, struct domain *d)
> >  {
> 
> I don't think this should be made non-static without first correcting its
> return type - either int (and returning -E... values) or bool.
> 

Another patch it is.

> > @@ -834,3414 +524,489 @@ static int update_xen_mappings(unsigned long mfn, unsigned int cacheattr)
> >      return err;
> >  }
> >  
> > -#ifndef NDEBUG
> > -struct mmio_emul_range_ctxt {
> > -    const struct domain *d;
> > -    unsigned long mfn;
> > -};
> > -
> > -static int print_mmio_emul_range(unsigned long s, unsigned long e, void *arg)
> > +bool_t fill_ro_mpt(unsigned long mfn)
> >  {
> > -    const struct mmio_emul_range_ctxt *ctxt = arg;
> > -
> > -    if ( ctxt->mfn > e )
> > -        return 0;
> > +    l4_pgentry_t *l4tab = map_domain_page(_mfn(mfn));
> > +    bool_t ret = 0;
> 
> From here on the patch becomes basically unreviewable, because
> plain removals (which I'd expect is all that is happening here) aren't
> represented as such anymore. Please see whether you can
> arrange for the diff to become more legible. If you can't talk git
> into doing so, maybe an option would be to split this patch by
> temporarily #include-ing pv/mm.c here until the move is complete.
> 

I will see what I can do here. I will probably end up splitting it to
more patches.

Wei.

> Jan
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 for-next v2 6/8] x86/mm: move two PV hypercalls from x86_64/mm.c to pv/mm.c
  2017-04-03 11:22 [PATCH v2 for-next v2 0/8] Refactor x86 mm.c Wei Liu
                   ` (4 preceding siblings ...)
  2017-04-03 11:22 ` [PATCH v2 for-next v2 5/8] x86/mm: split PV guest supporting code to pv/mm.c Wei Liu
@ 2017-04-03 11:22 ` Wei Liu
  2017-04-19 15:33   ` Jan Beulich
  2017-04-03 11:22 ` [PATCH v2 for-next v2 7/8] x86/mm: move check_descriptor " Wei Liu
  2017-04-03 11:22 ` [PATCH v2 for-next v2 8/8] x86/mm: split HVM grant table code to hvm/mm.c Wei Liu
  7 siblings, 1 reply; 25+ messages in thread
From: Wei Liu @ 2017-04-03 11:22 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/pv/mm.c     | 68 ++++++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/x86_64/mm.c | 70 ------------------------------------------------
 2 files changed, 68 insertions(+), 70 deletions(-)

diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
index b5277b5d28..48d3fe184b 100644
--- a/xen/arch/x86/pv/mm.c
+++ b/xen/arch/x86/pv/mm.c
@@ -4106,6 +4106,74 @@ int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr,
     return 0;
 }
 
+long do_stack_switch(unsigned long ss, unsigned long esp)
+{
+    fixup_guest_stack_selector(current->domain, ss);
+    current->arch.pv_vcpu.kernel_ss = ss;
+    current->arch.pv_vcpu.kernel_sp = esp;
+    return 0;
+}
+
+long do_set_segment_base(unsigned int which, unsigned long base)
+{
+    struct vcpu *v = current;
+    long ret = 0;
+
+    if ( is_pv_32bit_vcpu(v) )
+        return -ENOSYS; /* x86/64 only. */
+
+    switch ( which )
+    {
+    case SEGBASE_FS:
+        if ( is_canonical_address(base) )
+        {
+            wrfsbase(base);
+            v->arch.pv_vcpu.fs_base = base;
+        }
+        else
+            ret = -EINVAL;
+        break;
+
+    case SEGBASE_GS_USER:
+        if ( is_canonical_address(base) )
+        {
+            wrmsrl(MSR_SHADOW_GS_BASE, base);
+            v->arch.pv_vcpu.gs_base_user = base;
+        }
+        else
+            ret = -EINVAL;
+        break;
+
+    case SEGBASE_GS_KERNEL:
+        if ( is_canonical_address(base) )
+        {
+            wrgsbase(base);
+            v->arch.pv_vcpu.gs_base_kernel = base;
+        }
+        else
+            ret = -EINVAL;
+        break;
+
+    case SEGBASE_GS_USER_SEL:
+        __asm__ __volatile__ (
+            "     swapgs              \n"
+            "1:   movl %k0,%%gs       \n"
+            "    "safe_swapgs"        \n"
+            ".section .fixup,\"ax\"   \n"
+            "2:   xorl %k0,%k0        \n"
+            "     jmp  1b             \n"
+            ".previous                \n"
+            _ASM_EXTABLE(1b, 2b)
+            : : "r" (base&0xffff) );
+        break;
+
+    default:
+        ret = -EINVAL;
+        break;
+    }
+
+    return ret;
+}
 
 /*
  * Local variables:
diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 34f3250fd7..19a14e8829 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1018,76 +1018,6 @@ long subarch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     return rc;
 }
 
-long do_stack_switch(unsigned long ss, unsigned long esp)
-{
-    fixup_guest_stack_selector(current->domain, ss);
-    current->arch.pv_vcpu.kernel_ss = ss;
-    current->arch.pv_vcpu.kernel_sp = esp;
-    return 0;
-}
-
-long do_set_segment_base(unsigned int which, unsigned long base)
-{
-    struct vcpu *v = current;
-    long ret = 0;
-
-    if ( is_pv_32bit_vcpu(v) )
-        return -ENOSYS; /* x86/64 only. */
-
-    switch ( which )
-    {
-    case SEGBASE_FS:
-        if ( is_canonical_address(base) )
-        {
-            wrfsbase(base);
-            v->arch.pv_vcpu.fs_base = base;
-        }
-        else
-            ret = -EINVAL;
-        break;
-
-    case SEGBASE_GS_USER:
-        if ( is_canonical_address(base) )
-        {
-            wrmsrl(MSR_SHADOW_GS_BASE, base);
-            v->arch.pv_vcpu.gs_base_user = base;
-        }
-        else
-            ret = -EINVAL;
-        break;
-
-    case SEGBASE_GS_KERNEL:
-        if ( is_canonical_address(base) )
-        {
-            wrgsbase(base);
-            v->arch.pv_vcpu.gs_base_kernel = base;
-        }
-        else
-            ret = -EINVAL;
-        break;
-
-    case SEGBASE_GS_USER_SEL:
-        __asm__ __volatile__ (
-            "     swapgs              \n"
-            "1:   movl %k0,%%gs       \n"
-            "    "safe_swapgs"        \n"
-            ".section .fixup,\"ax\"   \n"
-            "2:   xorl %k0,%k0        \n"
-            "     jmp  1b             \n"
-            ".previous                \n"
-            _ASM_EXTABLE(1b, 2b)
-            : : "r" (base&0xffff) );
-        break;
-
-    default:
-        ret = -EINVAL;
-        break;
-    }
-
-    return ret;
-}
-
-
 /* Returns TRUE if given descriptor is valid for GDT or LDT. */
 int check_descriptor(const struct domain *dom, struct desc_struct *d)
 {
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 for-next v2 6/8] x86/mm: move two PV hypercalls from x86_64/mm.c to pv/mm.c
  2017-04-03 11:22 ` [PATCH v2 for-next v2 6/8] x86/mm: move two PV hypercalls from x86_64/mm.c " Wei Liu
@ 2017-04-19 15:33   ` Jan Beulich
  2017-04-21 10:37     ` Wei Liu
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2017-04-19 15:33 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Xen-devel

>>> On 03.04.17 at 13:22, <wei.liu2@citrix.com> wrote:
> --- a/xen/arch/x86/pv/mm.c
> +++ b/xen/arch/x86/pv/mm.c
> @@ -4106,6 +4106,74 @@ int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr,

Considering this file is still 4k lines, I'd prefer if stuff not obviously
belonging here could be placed elsewhere.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 for-next v2 6/8] x86/mm: move two PV hypercalls from x86_64/mm.c to pv/mm.c
  2017-04-19 15:33   ` Jan Beulich
@ 2017-04-21 10:37     ` Wei Liu
  2017-04-21 10:43       ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Wei Liu @ 2017-04-21 10:37 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, WeiLiu, Xen-devel

On Wed, Apr 19, 2017 at 09:33:28AM -0600, Jan Beulich wrote:
> >>> On 03.04.17 at 13:22, <wei.liu2@citrix.com> wrote:
> > --- a/xen/arch/x86/pv/mm.c
> > +++ b/xen/arch/x86/pv/mm.c
> > @@ -4106,6 +4106,74 @@ int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr,
> 
> Considering this file is still 4k lines, I'd prefer if stuff not obviously
> belonging here could be placed elsewhere.
> 

Looking back to previous patch's commit message, I can split this file
into more files: two or three for emulation code, one for grant table
code, one or two for hypercalls.

Would that be ok for you? If yes, do you have preferences for file
names?

Wei.

> Jan
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 for-next v2 6/8] x86/mm: move two PV hypercalls from x86_64/mm.c to pv/mm.c
  2017-04-21 10:37     ` Wei Liu
@ 2017-04-21 10:43       ` Jan Beulich
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2017-04-21 10:43 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Xen-devel

>>> On 21.04.17 at 12:37, <wei.liu2@citrix.com> wrote:
> On Wed, Apr 19, 2017 at 09:33:28AM -0600, Jan Beulich wrote:
>> >>> On 03.04.17 at 13:22, <wei.liu2@citrix.com> wrote:
>> > --- a/xen/arch/x86/pv/mm.c
>> > +++ b/xen/arch/x86/pv/mm.c
>> > @@ -4106,6 +4106,74 @@ int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr,
>> 
>> Considering this file is still 4k lines, I'd prefer if stuff not obviously
>> belonging here could be placed elsewhere.
> 
> Looking back to previous patch's commit message, I can split this file
> into more files: two or three for emulation code, one for grant table
> code, one or two for hypercalls.
> 
> Would that be ok for you?

Oh, yes, certainly, provided you don't need to make overly many
static functions non-static.

> If yes, do you have preferences for file names?

Well, no, not really (short of saying "use sensible ones"). If there are
counterparts elsewhere in the tree, naming them the same would of
course be nice (like grant_table.c, despite me disliking the underscore).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 for-next v2 7/8] x86/mm: move check_descriptor to pv/mm.c
  2017-04-03 11:22 [PATCH v2 for-next v2 0/8] Refactor x86 mm.c Wei Liu
                   ` (5 preceding siblings ...)
  2017-04-03 11:22 ` [PATCH v2 for-next v2 6/8] x86/mm: move two PV hypercalls from x86_64/mm.c " Wei Liu
@ 2017-04-03 11:22 ` Wei Liu
  2017-04-03 11:22 ` [PATCH v2 for-next v2 8/8] x86/mm: split HVM grant table code to hvm/mm.c Wei Liu
  7 siblings, 0 replies; 25+ messages in thread
From: Wei Liu @ 2017-04-03 11:22 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

It's only needed there. Also make it static and delete the declaration
from mm.h.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/pv/mm.c     | 86 +++++++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/x86_64/mm.c | 87 ------------------------------------------------
 xen/include/asm-x86/mm.h |  2 --
 3 files changed, 86 insertions(+), 89 deletions(-)

diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
index 48d3fe184b..96f2431756 100644
--- a/xen/arch/x86/pv/mm.c
+++ b/xen/arch/x86/pv/mm.c
@@ -220,6 +220,92 @@ static void invalidate_shadow_ldt(struct vcpu *v, int flush)
     spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock);
 }
 
+/* Returns TRUE if given descriptor is valid for GDT or LDT. */
+static int check_descriptor(const struct domain *dom, struct desc_struct *d)
+{
+    u32 a = d->a, b = d->b;
+    u16 cs;
+    unsigned int dpl;
+
+    /* A not-present descriptor will always fault, so is safe. */
+    if ( !(b & _SEGMENT_P) )
+        return 1;
+
+    /* Check and fix up the DPL. */
+    dpl = (b >> 13) & 3;
+    __fixup_guest_selector(dom, dpl);
+    b = (b & ~_SEGMENT_DPL) | (dpl << 13);
+
+    /* All code and data segments are okay. No base/limit checking. */
+    if ( (b & _SEGMENT_S) )
+    {
+        if ( is_pv_32bit_domain(dom) )
+        {
+            unsigned long base, limit;
+
+            if ( b & _SEGMENT_L )
+                goto bad;
+
+            /*
+             * Older PAE Linux guests use segments which are limited to
+             * 0xf6800000. Extend these to allow access to the larger read-only
+             * M2P table available in 32on64 mode.
+             */
+            base = (b & (0xff << 24)) | ((b & 0xff) << 16) | (a >> 16);
+
+            limit = (b & 0xf0000) | (a & 0xffff);
+            limit++; /* We add one because limit is inclusive. */
+
+            if ( (b & _SEGMENT_G) )
+                limit <<= 12;
+
+            if ( (base == 0) && (limit > HYPERVISOR_COMPAT_VIRT_START(dom)) )
+            {
+                a |= 0x0000ffff;
+                b |= 0x000f0000;
+            }
+        }
+
+        goto good;
+    }
+
+    /* Invalid type 0 is harmless. It is used for 2nd half of a call gate. */
+    if ( (b & _SEGMENT_TYPE) == 0x000 )
+        return 1;
+
+    /* Everything but a call gate is discarded here. */
+    if ( (b & _SEGMENT_TYPE) != 0xc00 )
+        goto bad;
+
+    /* Validate the target code selector. */
+    cs = a >> 16;
+    if ( !guest_gate_selector_okay(dom, cs) )
+        goto bad;
+    /*
+     * Force DPL to zero, causing a GP fault with its error code indicating
+     * the gate in use, allowing emulation. This is necessary because with
+     * native guests (kernel in ring 3) call gates cannot be used directly
+     * to transition from user to kernel mode (and whether a gate is used
+     * to enter the kernel can only be determined when the gate is being
+     * used), and with compat guests call gates cannot be used at all as
+     * there are only 64-bit ones.
+     * Store the original DPL in the selector's RPL field.
+     */
+    b &= ~_SEGMENT_DPL;
+    cs = (cs & ~3) | dpl;
+    a = (a & 0xffffU) | (cs << 16);
+
+    /* Reserved bits must be zero. */
+    if ( b & (is_pv_32bit_domain(dom) ? 0xe0 : 0xff) )
+        goto bad;
+
+ good:
+    d->a = a;
+    d->b = b;
+    return 1;
+ bad:
+    return 0;
+}
 
 static int alloc_segdesc_page(struct page_info *page)
 {
diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 19a14e8829..0bf85a59ed 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1018,93 +1018,6 @@ long subarch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     return rc;
 }
 
-/* Returns TRUE if given descriptor is valid for GDT or LDT. */
-int check_descriptor(const struct domain *dom, struct desc_struct *d)
-{
-    u32 a = d->a, b = d->b;
-    u16 cs;
-    unsigned int dpl;
-
-    /* A not-present descriptor will always fault, so is safe. */
-    if ( !(b & _SEGMENT_P) ) 
-        return 1;
-
-    /* Check and fix up the DPL. */
-    dpl = (b >> 13) & 3;
-    __fixup_guest_selector(dom, dpl);
-    b = (b & ~_SEGMENT_DPL) | (dpl << 13);
-
-    /* All code and data segments are okay. No base/limit checking. */
-    if ( (b & _SEGMENT_S) )
-    {
-        if ( is_pv_32bit_domain(dom) )
-        {
-            unsigned long base, limit;
-
-            if ( b & _SEGMENT_L )
-                goto bad;
-
-            /*
-             * Older PAE Linux guests use segments which are limited to
-             * 0xf6800000. Extend these to allow access to the larger read-only
-             * M2P table available in 32on64 mode.
-             */
-            base = (b & (0xff << 24)) | ((b & 0xff) << 16) | (a >> 16);
-
-            limit = (b & 0xf0000) | (a & 0xffff);
-            limit++; /* We add one because limit is inclusive. */
-
-            if ( (b & _SEGMENT_G) )
-                limit <<= 12;
-
-            if ( (base == 0) && (limit > HYPERVISOR_COMPAT_VIRT_START(dom)) )
-            {
-                a |= 0x0000ffff;
-                b |= 0x000f0000;
-            }
-        }
-
-        goto good;
-    }
-
-    /* Invalid type 0 is harmless. It is used for 2nd half of a call gate. */
-    if ( (b & _SEGMENT_TYPE) == 0x000 )
-        return 1;
-
-    /* Everything but a call gate is discarded here. */
-    if ( (b & _SEGMENT_TYPE) != 0xc00 )
-        goto bad;
-
-    /* Validate the target code selector. */
-    cs = a >> 16;
-    if ( !guest_gate_selector_okay(dom, cs) )
-        goto bad;
-    /*
-     * Force DPL to zero, causing a GP fault with its error code indicating
-     * the gate in use, allowing emulation. This is necessary because with
-     * native guests (kernel in ring 3) call gates cannot be used directly
-     * to transition from user to kernel mode (and whether a gate is used
-     * to enter the kernel can only be determined when the gate is being
-     * used), and with compat guests call gates cannot be used at all as
-     * there are only 64-bit ones.
-     * Store the original DPL in the selector's RPL field.
-     */
-    b &= ~_SEGMENT_DPL;
-    cs = (cs & ~3) | dpl;
-    a = (a & 0xffffU) | (cs << 16);
-
-    /* Reserved bits must be zero. */
-    if ( b & (is_pv_32bit_domain(dom) ? 0xe0 : 0xff) )
-        goto bad;
-        
- good:
-    d->a = a;
-    d->b = b;
-    return 1;
- bad:
-    return 0;
-}
-
 int pagefault_by_memadd(unsigned long addr, struct cpu_user_regs *regs)
 {
     struct domain *d = current->domain;
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 8e2bf91070..ca70f14500 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -411,8 +411,6 @@ static inline int get_page_and_type(struct page_info *page,
     ASSERT(((_p)->count_info & PGC_count_mask) != 0);          \
     ASSERT(page_get_owner(_p) == (_d))
 
-int check_descriptor(const struct domain *, struct desc_struct *d);
-
 extern bool_t opt_allow_superpage;
 extern paddr_t mem_hotplug;
 
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 for-next v2 8/8] x86/mm: split HVM grant table code to hvm/mm.c
  2017-04-03 11:22 [PATCH v2 for-next v2 0/8] Refactor x86 mm.c Wei Liu
                   ` (6 preceding siblings ...)
  2017-04-03 11:22 ` [PATCH v2 for-next v2 7/8] x86/mm: move check_descriptor " Wei Liu
@ 2017-04-03 11:22 ` Wei Liu
  2017-04-19 15:35   ` Jan Beulich
  7 siblings, 1 reply; 25+ messages in thread
From: Wei Liu @ 2017-04-03 11:22 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

PG_external requires hardware support so the code guarded by that is HVM
only.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/hvm/Makefile         |  1 +
 xen/arch/x86/hvm/mm.c             | 85 +++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/mm.c                 | 49 ----------------------
 xen/include/asm-x86/grant_table.h |  5 +++
 4 files changed, 91 insertions(+), 49 deletions(-)
 create mode 100644 xen/arch/x86/hvm/mm.c

diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index 0a3d0f4f7e..3e8d362c8e 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -13,6 +13,7 @@ obj-y += intercept.o
 obj-y += io.o
 obj-y += ioreq.o
 obj-y += irq.o
+obj-y += mm.o
 obj-y += monitor.o
 obj-y += mtrr.o
 obj-y += nestedhvm.o
diff --git a/xen/arch/x86/hvm/mm.c b/xen/arch/x86/hvm/mm.c
new file mode 100644
index 0000000000..8a6950b551
--- /dev/null
+++ b/xen/arch/x86/hvm/mm.c
@@ -0,0 +1,85 @@
+/******************************************************************************
+ * arch/x86/hvm/mm.c
+ *
+ * Copyright (c) 2002-2005 K A Fraser
+ * Copyright (c) 2004 Christian Limpach
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/types.h>
+
+#include <public/grant_table.h>
+
+#include <asm/p2m.h>
+
+int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
+                             unsigned int flags,
+                             unsigned int cache_flags)
+{
+    p2m_type_t p2mt;
+    int rc;
+
+    if ( cache_flags  || (flags & ~GNTMAP_readonly) != GNTMAP_host_map )
+        return GNTST_general_error;
+
+    if ( flags & GNTMAP_readonly )
+        p2mt = p2m_grant_map_ro;
+    else
+        p2mt = p2m_grant_map_rw;
+    rc = guest_physmap_add_entry(current->domain,
+                                 _gfn(addr >> PAGE_SHIFT),
+                                 _mfn(frame), PAGE_ORDER_4K, p2mt);
+    if ( rc )
+        return GNTST_general_error;
+    else
+        return GNTST_okay;
+}
+
+
+int replace_grant_p2m_mapping( int64_t addr, unsigned long frame,
+                               uint64_t new_addr, unsigned int flags)
+{
+    unsigned long gfn = (unsigned long)(addr >> PAGE_SHIFT);
+    p2m_type_t type;
+    mfn_t old_mfn;
+    struct domain *d = current->domain;
+
+    if ( new_addr != 0 || (flags & GNTMAP_contains_pte) )
+        return GNTST_general_error;
+
+    old_mfn = get_gfn(d, gfn, &type);
+    if ( !p2m_is_grant(type) || mfn_x(old_mfn) != frame )
+    {
+        put_gfn(d, gfn);
+        gdprintk(XENLOG_WARNING,
+                 "old mapping invalid (type %d, mfn %" PRI_mfn ", frame %lx)\n",
+                 type, mfn_x(old_mfn), frame);
+        return GNTST_general_error;
+    }
+    guest_physmap_remove_page(d, _gfn(gfn), _mfn(frame), PAGE_ORDER_4K);
+
+    put_gfn(d, gfn);
+    return GNTST_okay;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 169ae7e4a1..7ddb271097 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1009,29 +1009,6 @@ int vcpu_destroy_pagetables(struct vcpu *v)
     return rc != -EINTR ? rc : -ERESTART;
 }
 
-static int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
-                                    unsigned int flags,
-                                    unsigned int cache_flags)
-{
-    p2m_type_t p2mt;
-    int rc;
-
-    if ( cache_flags  || (flags & ~GNTMAP_readonly) != GNTMAP_host_map )
-        return GNTST_general_error;
-
-    if ( flags & GNTMAP_readonly )
-        p2mt = p2m_grant_map_ro;
-    else
-        p2mt = p2m_grant_map_rw;
-    rc = guest_physmap_add_entry(current->domain,
-                                 _gfn(addr >> PAGE_SHIFT),
-                                 _mfn(frame), PAGE_ORDER_4K, p2mt);
-    if ( rc )
-        return GNTST_general_error;
-    else
-        return GNTST_okay;
-}
-
 int create_grant_host_mapping(uint64_t addr, unsigned long frame,
                               unsigned int flags, unsigned int cache_flags)
 {
@@ -1041,32 +1018,6 @@ int create_grant_host_mapping(uint64_t addr, unsigned long frame,
     return create_grant_pv_mapping(addr, frame, flags, cache_flags);
 }
 
-static int replace_grant_p2m_mapping(
-    uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags)
-{
-    unsigned long gfn = (unsigned long)(addr >> PAGE_SHIFT);
-    p2m_type_t type;
-    mfn_t old_mfn;
-    struct domain *d = current->domain;
-
-    if ( new_addr != 0 || (flags & GNTMAP_contains_pte) )
-        return GNTST_general_error;
-
-    old_mfn = get_gfn(d, gfn, &type);
-    if ( !p2m_is_grant(type) || mfn_x(old_mfn) != frame )
-    {
-        put_gfn(d, gfn);
-        gdprintk(XENLOG_WARNING,
-                 "old mapping invalid (type %d, mfn %" PRI_mfn ", frame %lx)\n",
-                 type, mfn_x(old_mfn), frame);
-        return GNTST_general_error;
-    }
-    guest_physmap_remove_page(d, _gfn(gfn), _mfn(frame), PAGE_ORDER_4K);
-
-    put_gfn(d, gfn);
-    return GNTST_okay;
-}
-
 int replace_grant_host_mapping(uint64_t addr, unsigned long frame,
                                uint64_t new_addr, unsigned int flags)
 {
diff --git a/xen/include/asm-x86/grant_table.h b/xen/include/asm-x86/grant_table.h
index 9580dc32dc..f4cfcbe58d 100644
--- a/xen/include/asm-x86/grant_table.h
+++ b/xen/include/asm-x86/grant_table.h
@@ -21,6 +21,11 @@ int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
 			    unsigned int flags, unsigned int cache_flags);
 int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
 			     uint64_t new_addr, unsigned int flags);
+int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
+                             unsigned int flags,
+                             unsigned int cache_flags);
+int replace_grant_p2m_mapping(uint64_t addr, unsigned long frame,
+                              uint64_t new_addr, unsigned int flags);
 
 #define gnttab_create_shared_page(d, t, i)                               \
     do {                                                                 \
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 for-next v2 8/8] x86/mm: split HVM grant table code to hvm/mm.c
  2017-04-03 11:22 ` [PATCH v2 for-next v2 8/8] x86/mm: split HVM grant table code to hvm/mm.c Wei Liu
@ 2017-04-19 15:35   ` Jan Beulich
  2017-04-21 10:41     ` Wei Liu
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2017-04-19 15:35 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Xen-devel

>>> On 03.04.17 at 13:22, <wei.liu2@citrix.com> wrote:
> PG_external requires hardware support so the code guarded by that is HVM
> only.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  xen/arch/x86/hvm/Makefile         |  1 +
>  xen/arch/x86/hvm/mm.c             | 85 +++++++++++++++++++++++++++++++++++++++

I think the moved code would better go under x86/mm/, possibly (but
not necessarily) into p2m.c.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 for-next v2 8/8] x86/mm: split HVM grant table code to hvm/mm.c
  2017-04-19 15:35   ` Jan Beulich
@ 2017-04-21 10:41     ` Wei Liu
  2017-04-21 10:45       ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Wei Liu @ 2017-04-21 10:41 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, WeiLiu, Xen-devel

On Wed, Apr 19, 2017 at 09:35:50AM -0600, Jan Beulich wrote:
> >>> On 03.04.17 at 13:22, <wei.liu2@citrix.com> wrote:
> > PG_external requires hardware support so the code guarded by that is HVM
> > only.
> > 
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > ---
> >  xen/arch/x86/hvm/Makefile         |  1 +
> >  xen/arch/x86/hvm/mm.c             | 85 +++++++++++++++++++++++++++++++++++++++
> 
> I think the moved code would better go under x86/mm/, possibly (but
> not necessarily) into p2m.c.
> 

I disagree here. What good is there to keep this code when hvm support
is compiled out?

Wei.

> Jan
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 for-next v2 8/8] x86/mm: split HVM grant table code to hvm/mm.c
  2017-04-21 10:41     ` Wei Liu
@ 2017-04-21 10:45       ` Jan Beulich
  2017-04-21 10:52         ` Wei Liu
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2017-04-21 10:45 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Xen-devel

>>> On 21.04.17 at 12:41, <wei.liu2@citrix.com> wrote:
> On Wed, Apr 19, 2017 at 09:35:50AM -0600, Jan Beulich wrote:
>> >>> On 03.04.17 at 13:22, <wei.liu2@citrix.com> wrote:
>> > PG_external requires hardware support so the code guarded by that is HVM
>> > only.
>> > 
>> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>> > ---
>> >  xen/arch/x86/hvm/Makefile         |  1 +
>> >  xen/arch/x86/hvm/mm.c             | 85 
> +++++++++++++++++++++++++++++++++++++++
>> 
>> I think the moved code would better go under x86/mm/, possibly (but
>> not necessarily) into p2m.c.
> 
> I disagree here. What good is there to keep this code when hvm support
> is compiled out?

Hmm, do you mean to move e.g. mm/p2m-*.c and mm/hap/ to
hvm/mm/ then too? I think we shouldn't go overboard with this
shuffling.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 for-next v2 8/8] x86/mm: split HVM grant table code to hvm/mm.c
  2017-04-21 10:45       ` Jan Beulich
@ 2017-04-21 10:52         ` Wei Liu
  2017-04-21 11:30           ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Wei Liu @ 2017-04-21 10:52 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, WeiLiu, Xen-devel

On Fri, Apr 21, 2017 at 04:45:35AM -0600, Jan Beulich wrote:
> >>> On 21.04.17 at 12:41, <wei.liu2@citrix.com> wrote:
> > On Wed, Apr 19, 2017 at 09:35:50AM -0600, Jan Beulich wrote:
> >> >>> On 03.04.17 at 13:22, <wei.liu2@citrix.com> wrote:
> >> > PG_external requires hardware support so the code guarded by that is HVM
> >> > only.
> >> > 
> >> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> >> > ---
> >> >  xen/arch/x86/hvm/Makefile         |  1 +
> >> >  xen/arch/x86/hvm/mm.c             | 85 
> > +++++++++++++++++++++++++++++++++++++++
> >> 
> >> I think the moved code would better go under x86/mm/, possibly (but
> >> not necessarily) into p2m.c.
> > 
> > I disagree here. What good is there to keep this code when hvm support
> > is compiled out?
> 
> Hmm, do you mean to move e.g. mm/p2m-*.c and mm/hap/ to
> hvm/mm/ then too? I think we shouldn't go overboard with this
> shuffling.
> 

Maybe, I haven't quite made up my mind.

I take it you will be against splitting / moving p2m code?

It has lower priority on my list because p2m isn't really guest facing,
while here the grant table code is (Obviously the boundary is a bit
blurred and subject to personal taste).

Wei.

> Jan
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 for-next v2 8/8] x86/mm: split HVM grant table code to hvm/mm.c
  2017-04-21 10:52         ` Wei Liu
@ 2017-04-21 11:30           ` Jan Beulich
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2017-04-21 11:30 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Xen-devel

>>> On 21.04.17 at 12:52, <wei.liu2@citrix.com> wrote:
> On Fri, Apr 21, 2017 at 04:45:35AM -0600, Jan Beulich wrote:
>> >>> On 21.04.17 at 12:41, <wei.liu2@citrix.com> wrote:
>> > On Wed, Apr 19, 2017 at 09:35:50AM -0600, Jan Beulich wrote:
>> >> >>> On 03.04.17 at 13:22, <wei.liu2@citrix.com> wrote:
>> >> > PG_external requires hardware support so the code guarded by that is HVM
>> >> > only.
>> >> > 
>> >> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>> >> > ---
>> >> >  xen/arch/x86/hvm/Makefile         |  1 +
>> >> >  xen/arch/x86/hvm/mm.c             | 85 
>> > +++++++++++++++++++++++++++++++++++++++
>> >> 
>> >> I think the moved code would better go under x86/mm/, possibly (but
>> >> not necessarily) into p2m.c.
>> > 
>> > I disagree here. What good is there to keep this code when hvm support
>> > is compiled out?
>> 
>> Hmm, do you mean to move e.g. mm/p2m-*.c and mm/hap/ to
>> hvm/mm/ then too? I think we shouldn't go overboard with this
>> shuffling.
>> 
> 
> Maybe, I haven't quite made up my mind.
> 
> I take it you will be against splitting / moving p2m code?

I think so, yes. The huge files you're breaking apart were ripe
for that even without the intention of separating PV from HVM.
The p2m code, otoh, had been split into pieces already quite
some time ago.

Btw, looking at the subject again - moving the code into
hvm/grant_table.c would be fine with me.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2017-04-21 11:30 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-04-03 11:22 [PATCH v2 for-next v2 0/8] Refactor x86 mm.c Wei Liu
2017-04-03 11:22 ` [PATCH v2 for-next v2 1/8] mm: provide more grep fodders Wei Liu
2017-04-03 11:26   ` George Dunlap
2017-04-19 15:08   ` Jan Beulich
2017-04-03 11:22 ` [PATCH v2 for-next v2 2/8] x86/mm: carve out create_grant_pv_mapping Wei Liu
2017-04-19 15:12   ` Jan Beulich
2017-04-21 10:13     ` Wei Liu
2017-04-21 10:36       ` Jan Beulich
2017-04-03 11:22 ` [PATCH v2 for-next v2 3/8] x86/mm: carve out replace_grant_pv_mapping Wei Liu
2017-04-03 11:22 ` [PATCH v2 for-next v2 4/8] x86/mm: extract PAGE_CACHE_ATTRS to mm.h Wei Liu
2017-04-19 15:15   ` Jan Beulich
2017-04-03 11:22 ` [PATCH v2 for-next v2 5/8] x86/mm: split PV guest supporting code to pv/mm.c Wei Liu
2017-04-19 15:31   ` Jan Beulich
2017-04-21 10:34     ` Wei Liu
2017-04-03 11:22 ` [PATCH v2 for-next v2 6/8] x86/mm: move two PV hypercalls from x86_64/mm.c " Wei Liu
2017-04-19 15:33   ` Jan Beulich
2017-04-21 10:37     ` Wei Liu
2017-04-21 10:43       ` Jan Beulich
2017-04-03 11:22 ` [PATCH v2 for-next v2 7/8] x86/mm: move check_descriptor " Wei Liu
2017-04-03 11:22 ` [PATCH v2 for-next v2 8/8] x86/mm: split HVM grant table code to hvm/mm.c Wei Liu
2017-04-19 15:35   ` Jan Beulich
2017-04-21 10:41     ` Wei Liu
2017-04-21 10:45       ` Jan Beulich
2017-04-21 10:52         ` Wei Liu
2017-04-21 11:30           ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).