[PATCH 1/2] PV hugepages

* [PATCH 1/2] PV hugepages - Xen patch
@ 2008-10-02 23:26 Dave McCracken
  2008-10-03  8:58 ` Keir Fraser
  0 siblings, 1 reply; 12+ messages in thread
From: Dave McCracken @ 2008-10-02 23:26 UTC
  To: xen-devel

[-- Attachment #1: Type: text/plain, Size: 302 bytes --]


This patch enables support of hugepages in a pv Xen environment.  It is 
against the latest xen unstable tree on http://xenbits.xensource.com.

The patch assumes the guest is passing a physically aligned hugepage.  It does 
reference counting on all the underlying pages.

Dave McCracken
Oracle Corp.
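
The two assumptions stated above can be illustrated with a minimal C sketch. It is a simplified model, not Xen code. It assumes 4KB base pages and 2MB superpages (PAE or x86-64), so one L2 superpage entry spans L1_PAGETABLE_ENTRIES = 512 machine frames. It also models the frame table as a flat array of counters, and its helper names are hypothetical. Under this model, "physically aligned" means the first machine frame number is a multiple of 512, and reference counting covers frames mfn through mfn + 511.

```c
#include <stdbool.h>

/* Assumption: 4 KiB base pages and 2 MiB superpages (PAE or x86-64),
 * so one L2 superpage entry spans 512 machine frames. */
#define L1_PAGETABLE_ENTRIES 512UL

/* True if @mfn can start a 2 MiB mapping (a multiple of 512 frames). */
static bool superpage_mfn_aligned(unsigned long mfn)
{
    return (mfn & (L1_PAGETABLE_ENTRIES - 1)) == 0;
}

/* Last machine frame covered by a superpage starting at @mfn. */
static unsigned long superpage_last_mfn(unsigned long mfn)
{
    return mfn + L1_PAGETABLE_ENTRIES - 1;
}

/* Hypothetical flat frame table: take one reference on every frame
 * of the superpage, mfn .. mfn + 511 inclusive. */
static void ref_superpage(unsigned int *refcount, unsigned long mfn)
{
    for (unsigned long m = mfn; m <= superpage_last_mfn(mfn); m++)
        refcount[m]++;
}
```

For example, a superpage starting at machine frame 0x200 covers frames 0x200 through 0x3FF. A superpage starting at 0x201 would not be aligned.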

[-- Attachment #2: xen-hpage-04.patch --]
[-- Type: text/x-diff, Size: 3144 bytes --]

--- xen-unstable//./xen/include/asm-x86/x86_32/page.h	2008-07-17 09:49:27.000000000 -0500
+++ xen-hpage/./xen/include/asm-x86/x86_32/page.h	2008-10-02 15:07:34.000000000 -0500
@@ -112,7 +112,7 @@ extern unsigned int PAGE_HYPERVISOR_NOCA
  * Disallow unused flag bits plus PAT/PSE, PCD, PWT and GLOBAL.
  * Permit the NX bit if the hardware supports it.
  */
-#define BASE_DISALLOW_MASK (0xFFFFF198U & ~_PAGE_NX)
+#define BASE_DISALLOW_MASK (0xFFFFF118U & ~_PAGE_NX)
 
 #define L1_DISALLOW_MASK (BASE_DISALLOW_MASK | _PAGE_GNTTAB)
 #define L2_DISALLOW_MASK (BASE_DISALLOW_MASK)
--- xen-unstable//./xen/include/asm-x86/x86_64/page.h	2008-10-02 14:23:17.000000000 -0500
+++ xen-hpage/./xen/include/asm-x86/x86_64/page.h	2008-10-02 15:07:34.000000000 -0500
@@ -112,7 +112,7 @@ typedef l4_pgentry_t root_pgentry_t;
  * Permit the NX bit if the hardware supports it.
  * Note that range [62:52] is available for software use on x86/64.
  */
-#define BASE_DISALLOW_MASK (0xFF800198U & ~_PAGE_NX)
+#define BASE_DISALLOW_MASK (0xFF800118U & ~_PAGE_NX)
 
 #define L1_DISALLOW_MASK (BASE_DISALLOW_MASK | _PAGE_GNTTAB)
 #define L2_DISALLOW_MASK (BASE_DISALLOW_MASK)
--- xen-unstable//./xen/arch/x86/mm.c	2008-10-02 14:23:17.000000000 -0500
+++ xen-hpage/./xen/arch/x86/mm.c	2008-10-02 16:00:47.000000000 -0500
@@ -759,11 +759,29 @@ get_page_from_l2e(
         MEM_LOG("Bad L2 flags %x", l2e_get_flags(l2e) & L2_DISALLOW_MASK);
         return -EINVAL;
     }
+    if ( l2e_get_flags(l2e) & _PAGE_PSE ) {
+        unsigned long mfn = l2e_get_pfn(l2e);
+        unsigned long m, me;
+        struct page_info *page = mfn_to_page(mfn);
 
-    rc = get_page_and_type_from_pagenr(
-        l2e_get_pfn(l2e), PGT_l1_page_table, d, 0);
-    if ( unlikely(rc == -EINVAL) && get_l2_linear_pagetable(l2e, pfn, d) )
-        rc = 0;
+        rc = get_page(page, d);
+        if (unlikely(!rc)) {
+            return rc;
+        }
+
+        for (m = mfn+1, me = m + (L1_PAGETABLE_ENTRIES-1); m <= me; m++) {
+            get_page_from_pagenr(m, d);
+        }
+#ifdef __x86_64__
+        map_pages_to_xen((unsigned long)mfn_to_virt(mfn), mfn, L1_PAGETABLE_ENTRIES,
+                         PAGE_HYPERVISOR | l2e_get_flags(l2e));
+#endif
+    } else {
+        rc = get_page_and_type_from_pagenr(
+            l2e_get_pfn(l2e), PGT_l1_page_table, d, 0);
+        if ( unlikely(rc == -EINVAL) && get_l2_linear_pagetable(l2e, pfn, d) )
+            rc = 0;
+    }
 
     return rc;
 }
@@ -955,7 +973,18 @@ static int put_page_from_l2e(l2_pgentry_
     if ( (l2e_get_flags(l2e) & _PAGE_PRESENT) && 
          (l2e_get_pfn(l2e) != pfn) )
     {
-        put_page_and_type(l2e_get_page(l2e));
+        if (l2e_get_flags(l2e) & _PAGE_PSE) {
+            unsigned long mfn = l2e_get_pfn(l2e);
+            unsigned long m, me;
+            struct page_info *page = mfn_to_page(mfn);
+
+            for (m = mfn+1, me = m + (L1_PAGETABLE_ENTRIES-1); m <= me; m++) {
+                put_page(mfn_to_page(m));
+            }
+            put_page(page);
+        } else {
+            put_page_and_type(l2e_get_page(l2e));
+        }
         return 0;
     }
     return 1;
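
The only change to the two page.h headers is bit 7 of the low flag bits. The old mask 0x198 disallows PWT (0x008), PCD (0x010), bit 7 (0x080) and GLOBAL (0x100). Bit 7 means PSE in an L2 entry and PAT in a 4KB L1 entry. The new mask 0x118 still disallows PWT, PCD and GLOBAL, but no longer bit 7. The sketch below defines the architectural bit values locally and computes which bits the new mask stops rejecting. The macro names and newly_permitted() are hypothetical. According to the context lines, L1_DISALLOW_MASK is also derived from BASE_DISALLOW_MASK. The change therefore appears to affect L1 entries, where bit 7 is PAT, as well as L2 entries.

```c
/* Architectural x86 PTE flag bits (low 12 bits of an entry). */
#define PTE_PWT    0x008U
#define PTE_PCD    0x010U
#define PTE_BIT7   0x080U  /* PSE in an L2 entry, PAT in a 4 KiB L1 entry */
#define PTE_GLOBAL 0x100U

/* Bits rejected by old_mask but accepted by new_mask. */
static unsigned int newly_permitted(unsigned int old_mask, unsigned int new_mask)
{
    return old_mask & ~new_mask;
}
```

Applied to either pair of masks from the diff, the result is 0x080 only. The high bits are identical in each pair, so they cancel out.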

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

* Re: [PATCH 1/2] PV hugepages - Xen patch
  2008-10-02 23:26 [PATCH 1/2] PV hugepages - Xen patch Dave McCracken
@ 2008-10-03  8:58 ` Keir Fraser
  2008-10-08 17:05   ` Dave McCracken
  0 siblings, 1 reply; 12+ messages in thread
From: Keir Fraser @ 2008-10-03  8:58 UTC
  To: Dave McCracken, xen-devel

Some issues:
 * You need to check return value of get_page_from_pagenr() on every page of
the superpage. Any one of them can fail, causing you to undo your work so
far and then fail.
 * You need to get_page_type(PGT_writable) on every page if the superpage
mapping asserts _PAGE_RW. Otherwise the guest is getting write access
without that being asserted in the reference counts.
 * Look at get_page_from_l1e() for an example of how this is done for a
single page. You need to do similar work for every page of the super-page.
 * This surely breaks save/restore, since the restore code is not
superpage-aware.

 -- Keir
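
Keir's second point rests on page typing. A frame can hold only one type at a time (for example, writable data or L1 page table) while its type count is non-zero. Taking a writable type reference on every frame of a writable superpage prevents a guest from obtaining a writable mapping of one of its own page tables. The sketch below is a heavily simplified, hypothetical model of that rule. It is combined with the all-or-nothing acquisition and rollback from the first point. It is not Xen's implementation, and the struct, enum and function names are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of per-frame typing: a frame holds at
 * most one type while its type count is non-zero. */
enum frame_type { FT_NONE, FT_WRITABLE, FT_L1_PAGE_TABLE };

struct frame {
    unsigned int refs;        /* general references */
    enum frame_type type;     /* meaningful only while type_refs > 0 */
    unsigned int type_refs;   /* references held on that type */
};

static bool get_type(struct frame *f, enum frame_type t)
{
    if (f->type_refs > 0 && f->type != t)
        return false;         /* e.g. a live page table cannot become writable */
    f->type = t;
    f->type_refs++;
    return true;
}

static void put_type(struct frame *f)
{
    if (--f->type_refs == 0)
        f->type = FT_NONE;
}

/* One frame: a general reference, plus a writable type reference if needed. */
static bool get_frame(struct frame *f, bool writeable)
{
    f->refs++;
    if (writeable && !get_type(f, FT_WRITABLE)) {
        f->refs--;
        return false;
    }
    return true;
}

static void put_frame(struct frame *f, bool writeable)
{
    if (writeable)
        put_type(f);
    f->refs--;
}

/* All-or-nothing over n frames: on failure, release what was already taken. */
static bool get_range(struct frame *fr, unsigned long n, bool writeable)
{
    for (unsigned long i = 0; i < n; i++) {
        if (!get_frame(&fr[i], writeable)) {
            while (i-- > 0)
                put_frame(&fr[i], writeable);
            return false;
        }
    }
    return true;
}
```

In this model, suppose frame 2 of a four-frame range is a live page table. A writable request then fails and leaves frames 0 and 1 with no references. A read-only request over the same range succeeds.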

On 3/10/08 00:26, "Dave McCracken" <dcm@mccr.org> wrote:

> 
> This patch enables support of hugepages in a pv Xen environment.  It is
> against the latest xen unstable tree on http://xenbits.xensource.com.
> 
> The patch assumes the guest is passing a physically aligned hugepage.  It does
> reference counting on all the underlying pages.
> 
> Dave McCracken
> Oracle Corp.

* Re: [PATCH 1/2] PV hugepages - Xen patch
  2008-10-03  8:58 ` Keir Fraser
@ 2008-10-08 17:05   ` Dave McCracken
  2008-10-08 18:11     ` Keir Fraser
  0 siblings, 1 reply; 12+ messages in thread
From: Dave McCracken @ 2008-10-08 17:05 UTC
  To: xen-devel; +Cc: Keir Fraser

[-- Attachment #1: Type: text/plain, Size: 839 bytes --]

On Friday 03 October 2008, Keir Fraser wrote:
> Some issues:
>  * You need to check return value of get_page_from_pagenr() on every page
> of the superpage. Any one of them can fail, causing you to undo your work
> so far and then fail.
>  * You need to get_page_type(PGT_writable) on every page if the superpage
> mapping asserts _PAGE_RW. Otherwise the guest is getting write access
> without that being asserted in the reference counts.
>  * Look at get_page_from_l1e() for an example of how this is done for a
> single page. You need to do similar work for every page of the super-page.

Ok, here's a version of the patch with all these issues addressed.

>  * This surely breaks save/restore, since the restore code is not
> superpage-aware.

I don't have this one solved yet.  I'm working on it.

Dave McCracken


[-- Attachment #2: xen-hpage-05.patch --]
[-- Type: text/x-diff, Size: 5147 bytes --]

--- xen-unstable//./xen/include/asm-x86/x86_32/page.h	2008-07-17 09:49:27.000000000 -0500
+++ xen-hpage/./xen/include/asm-x86/x86_32/page.h	2008-10-02 15:07:34.000000000 -0500
@@ -112,7 +112,7 @@ extern unsigned int PAGE_HYPERVISOR_NOCA
  * Disallow unused flag bits plus PAT/PSE, PCD, PWT and GLOBAL.
  * Permit the NX bit if the hardware supports it.
  */
-#define BASE_DISALLOW_MASK (0xFFFFF198U & ~_PAGE_NX)
+#define BASE_DISALLOW_MASK (0xFFFFF118U & ~_PAGE_NX)
 
 #define L1_DISALLOW_MASK (BASE_DISALLOW_MASK | _PAGE_GNTTAB)
 #define L2_DISALLOW_MASK (BASE_DISALLOW_MASK)
--- xen-unstable//./xen/include/asm-x86/x86_64/page.h	2008-10-02 14:23:17.000000000 -0500
+++ xen-hpage/./xen/include/asm-x86/x86_64/page.h	2008-10-02 15:07:34.000000000 -0500
@@ -112,7 +112,7 @@ typedef l4_pgentry_t root_pgentry_t;
  * Permit the NX bit if the hardware supports it.
  * Note that range [62:52] is available for software use on x86/64.
  */
-#define BASE_DISALLOW_MASK (0xFF800198U & ~_PAGE_NX)
+#define BASE_DISALLOW_MASK (0xFF800118U & ~_PAGE_NX)
 
 #define L1_DISALLOW_MASK (BASE_DISALLOW_MASK | _PAGE_GNTTAB)
 #define L2_DISALLOW_MASK (BASE_DISALLOW_MASK)
--- xen-unstable//./xen/arch/x86/mm.c	2008-10-02 14:23:17.000000000 -0500
+++ xen-hpage/./xen/arch/x86/mm.c	2008-10-08 11:35:44.000000000 -0500
@@ -584,6 +584,28 @@ static int get_page_and_type_from_pagenr
     return rc;
 }
 
+static int
+get_data_page(struct page_info *page, struct domain *d, int writeable)
+{
+    int rc;
+
+    if (writeable)
+        rc = get_page_and_type(page, d, PGT_writable_page);
+    else
+        rc = get_page(page, d);
+
+    return rc;
+}
+
+static void
+put_data_page(struct page_info *page, int writeable)
+{
+    if (writeable)
+        put_page_and_type(page);
+    else
+        put_page(page);
+}
+
 /*
  * We allow root tables to map each other (a.k.a. linear page tables). It
  * needs some special care with reference counts and access permissions:
@@ -656,6 +678,7 @@ get_page_from_l1e(
     struct vcpu *curr = current;
     struct domain *owner;
     int okay;
+    int writeable;
 
     if ( !(l1f & _PAGE_PRESENT) )
         return 1;
@@ -698,10 +721,9 @@ get_page_from_l1e(
      * contribute to writeable mapping refcounts.  (This allows the
      * qemu-dm helper process in dom0 to map the domain's memory without
      * messing up the count of "real" writable mappings.) */
-    okay = (((l1f & _PAGE_RW) && 
-             !(unlikely(paging_mode_external(d) && (d != curr->domain))))
-            ? get_page_and_type(page, d, PGT_writable_page)
-            : get_page(page, d));
+    writeable = (l1f & _PAGE_RW) &&
+        !(unlikely(paging_mode_external(d) && (d != curr->domain)));
+    okay = get_data_page(page, d, writeable);
     if ( !okay )
     {
         MEM_LOG("Error getting mfn %lx (pfn %lx) from L1 entry %" PRIpte
@@ -759,11 +781,39 @@ get_page_from_l2e(
         MEM_LOG("Bad L2 flags %x", l2e_get_flags(l2e) & L2_DISALLOW_MASK);
         return -EINVAL;
     }
+    if ( l2e_get_flags(l2e) & _PAGE_PSE ) {
+        unsigned long mfn = l2e_get_pfn(l2e);
+        unsigned long m, me;
+        struct page_info *page = mfn_to_page(mfn);
+        int writeable;
 
-    rc = get_page_and_type_from_pagenr(
-        l2e_get_pfn(l2e), PGT_l1_page_table, d, 0);
-    if ( unlikely(rc == -EINVAL) && get_l2_linear_pagetable(l2e, pfn, d) )
-        rc = 0;
+        writeable = l2e_get_flags(l2e) & _PAGE_RW;
+
+        rc = get_data_page(page, d, writeable);
+        if (unlikely(!rc)) {
+            return rc;
+        }
+
+        for (m = mfn+1, me = m + (L1_PAGETABLE_ENTRIES-1); m <= me; m++) {
+            rc = get_data_page(mfn_to_page(m), d, writeable);
+            if (unlikely(!rc)) {
+                for (--m; m > mfn; --m) {
+                    put_data_page(mfn_to_page(m), writeable);
+                }
+                put_data_page(page, writeable);
+                return 0;
+            }
+        }
+#ifdef __x86_64__
+        map_pages_to_xen((unsigned long)mfn_to_virt(mfn), mfn, L1_PAGETABLE_ENTRIES,
+                         PAGE_HYPERVISOR | l2e_get_flags(l2e));
+#endif
+    } else {
+        rc = get_page_and_type_from_pagenr(
+            l2e_get_pfn(l2e), PGT_l1_page_table, d, 0);
+        if ( unlikely(rc == -EINVAL) && get_l2_linear_pagetable(l2e, pfn, d) )
+            rc = 0;
+    }
 
     return rc;
 }
@@ -955,7 +1005,19 @@ static int put_page_from_l2e(l2_pgentry_
     if ( (l2e_get_flags(l2e) & _PAGE_PRESENT) && 
          (l2e_get_pfn(l2e) != pfn) )
     {
-        put_page_and_type(l2e_get_page(l2e));
+        if (l2e_get_flags(l2e) & _PAGE_PSE) {
+            unsigned long mfn = l2e_get_pfn(l2e);
+            unsigned long m, me;
+            struct page_info *page = mfn_to_page(mfn);
+            int writeable = l2e_get_flags(l2e) & _PAGE_RW;
+
+            for (m = mfn+1, me = m + (L1_PAGETABLE_ENTRIES-1); m <= me; m++) {
+                put_data_page(mfn_to_page(m), writeable);
+            }
+            put_data_page(page, writeable);
+        } else {
+            put_page_and_type(l2e_get_page(l2e));
+        }
         return 0;
     }
     return 1;
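
One way to check the loop bounds in the PSE branches above, which are unchanged from the first version, is to count the frames they visit. In the loop header as posted, me is computed from m, which already equals mfn+1. The loop therefore runs up to and including mfn + L1_PAGETABLE_ENTRIES. Together with the head page taken before the loop, that makes 513 frames, one more than a 2MB extent holds. The release loop in put_page_from_l2e() uses the same header, so the counts stay balanced, but they include the frame just past the extent. The hypothetical helpers below replay the posted header and a variant that computes me from mfn. They only count iterations.

```c
/* Assumed, as on PAE and x86-64. */
#define L1_PAGETABLE_ENTRIES 512UL

/* Replays the posted loop header: head page first, then
 * m = mfn+1 .. me, with me computed from m (already mfn+1). */
static unsigned long frames_as_posted(unsigned long mfn)
{
    unsigned long m, me, count = 1;   /* 1 = head page taken before the loop */

    for (m = mfn + 1, me = m + (L1_PAGETABLE_ENTRIES - 1); m <= me; m++)
        count++;
    return count;
}

/* Same loop with me computed from mfn: covers mfn+1 .. mfn+511. */
static unsigned long frames_within_extent(unsigned long mfn)
{
    unsigned long m, me, count = 1;

    for (m = mfn + 1, me = mfn + (L1_PAGETABLE_ENTRIES - 1); m <= me; m++)
        count++;
    return count;
}
```

For any starting frame, frames_as_posted() returns 513 and frames_within_extent() returns 512.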

* Re: [PATCH 1/2] PV hugepages - Xen patch
  2008-10-08 17:05   ` Dave McCracken
@ 2008-10-08 18:11     ` Keir Fraser
  2008-10-08 18:28       ` Dave McCracken
  2008-10-08 22:50       ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 12+ messages in thread
From: Keir Fraser @ 2008-10-08 18:11 UTC
  To: Dave McCracken, xen-devel; +Cc: Ian Pratt

On 8/10/08 18:05, "Dave McCracken" <dcm@mccr.org> wrote:

> On Friday 03 October 2008, Keir Fraser wrote:
>> Some issues:
>>  * You need to check return value of get_page_from_pagenr() on every page
>> of the superpage. Any one of them can fail, causing you to undo your work
>> so far and then fail.
>>  * You need to get_page_type(PGT_writable) on every page if the superpage
>> mapping asserts _PAGE_RW. Otherwise the guest is getting write access
>> without that being asserted in the reference counts.
>>  * Look at get_page_from_l1e() for an example of how this is done for a
>> single page. You need to do similar work for every page of the super-page.
> 
> Ok, here's a version of the patch with all these issues addressed.
> 
>>  * This surely breaks save/restore, since the restore code is not
>> superpage-aware.
> 
> I don't have this one solved yet.  I'm working on it.

Actually this is an interesting one. For a PV guest it may be in general
unsolvable, since the target machine may not have allocatable 2MB extents.
It may also screw live migration since 2MB is a very coarse granularity to
do dirty-page tracking. One option: perhaps the PV kernel could shatter and
then reconstruct (as best it can) superpage mappings across save/restore?
I'm actually not sure what's for the best here. Perhaps just make 2MB
mappings and save/restore mutually exclusive for now?

 -- Keir
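
The granularity concern can be put in numbers. Assume, for illustration, that dirty state for a region mapped by a 2MB superpage can only be recorded per mapping. A single write anywhere in the extent then forces all 512 frames to be resent in that pre-copy round. With per-frame tracking, only one 4KB frame would be resent. The hypothetical functions below compute the bytes resent under each model from a per-frame dirty map.

```c
#include <stddef.h>

#define PAGE_SIZE_4K  4096UL
#define FRAMES_PER_2M 512UL

/* Per-frame tracking: resend only the frames that were written. */
static unsigned long resend_bytes_4k(const unsigned char *dirty, size_t nframes)
{
    unsigned long n = 0;
    for (size_t i = 0; i < nframes; i++)
        if (dirty[i])
            n += PAGE_SIZE_4K;
    return n;
}

/* Per-2MB tracking (assumed): any dirty frame resends its whole extent. */
static unsigned long resend_bytes_2m(const unsigned char *dirty, size_t nframes)
{
    unsigned long n = 0;
    for (size_t base = 0; base < nframes; base += FRAMES_PER_2M) {
        size_t end = base + FRAMES_PER_2M < nframes ? base + FRAMES_PER_2M : nframes;
        for (size_t i = base; i < end; i++) {
            if (dirty[i]) {
                n += (end - base) * PAGE_SIZE_4K;
                break;
            }
        }
    }
    return n;
}
```

For example, with one dirty frame in each of two extents, the per-frame model resends 8KB and the per-2MB model resends 4MB.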

* Re: [PATCH 1/2] PV hugepages - Xen patch
  2008-10-08 18:11     ` Keir Fraser
@ 2008-10-08 18:28       ` Dave McCracken
  2008-10-08 18:50         ` Keir Fraser
  2008-10-08 22:50       ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 12+ messages in thread
From: Dave McCracken @ 2008-10-08 18:28 UTC
  To: Keir Fraser; +Cc: Ian Pratt, xen-devel

On Wednesday 08 October 2008, Keir Fraser wrote:
> On 8/10/08 18:05, "Dave McCracken" <dcm@mccr.org> wrote:
> > On Friday 03 October 2008, Keir Fraser wrote:
> >>  * This surely breaks save/restore, since the restore code is not
> >> superpage-aware.
> >
> > I don't have this one solved yet.  I'm working on it.
>
> Actually this is an interesting one. For a PV guest it may be in general
> unsolvable, since the target machine may not have allocatable 2MB extents.
> It may also screw live migration since 2MB is a very coarse granularity to
> do dirty-page tracking. One option: perhaps the PV kernel could shatter and
> then reconstruct (as best it can) superpage mappings across save/restore?
> I'm actually not sure what's for the best here. Perhaps just make 2MB
> mappings and save/restore mutually exclusive for now?

Yeah, that's what I'm finding.  I think it's a good idea to document for now 
that hugepages don't work with save/restore.  I'll continue to dig into it 
and try to figure out a scheme to make it work as a future enhancement.

Dave McCracken

* Re: [PATCH 1/2] PV hugepages - Xen patch
  2008-10-08 18:28       ` Dave McCracken
@ 2008-10-08 18:50         ` Keir Fraser
  2008-10-08 22:07           ` Dave McCracken
  0 siblings, 1 reply; 12+ messages in thread
From: Keir Fraser @ 2008-10-08 18:50 UTC
  To: Dave McCracken; +Cc: Ian Pratt, xen-devel

On 8/10/08 19:28, "Dave McCracken" <dcm@mccr.org> wrote:

>> Actually this is an interesting one. For a PV guest it may be in general
>> unsolvable, since the target machine may not have allocatable 2MB extents.
>> It may also screw live migration since 2MB is a very coarse granularity to
>> do dirty-page tracking. One option: perhaps the PV kernel could shatter and
>> then reconstruct (as best it can) superpage mappings across save/restore?
>> I'm actually not sure what's for the best here. Perhaps just make 2MB
>> mappings and save/restore mutually exclusive for now?
> 
> Yeah, that's what I'm finding.  I think it's a good idea to document for now
> that hugepages don't work with save/restore.  I'll continue to dig into it
> and try to figure out a scheme to make it work as a future enhancement.

Then PV superpage support must be a configuration option, and disabled by
default.

 -- Keir

* Re: [PATCH 1/2] PV hugepages - Xen patch
  2008-10-08 18:50         ` Keir Fraser
@ 2008-10-08 22:07           ` Dave McCracken
  2008-10-09  6:45             ` Keir Fraser
  2008-10-09 10:21             ` Keir Fraser
  0 siblings, 2 replies; 12+ messages in thread
From: Dave McCracken @ 2008-10-08 22:07 UTC
  To: Keir Fraser; +Cc: Ian Pratt, xen-devel

[-- Attachment #1: Type: text/plain, Size: 229 bytes --]

On Wednesday 08 October 2008, Keir Fraser wrote:
> Then PV superpage support must be a configuration option, and disabled by
> default.

I added a command line option to enable it.  Is this what you had in mind?

Dave McCracken


[-- Attachment #2: xen-hpage-06.patch --]
[-- Type: text/x-diff, Size: 5613 bytes --]

--- xen-unstable//./xen/include/asm-x86/x86_32/page.h	2008-07-17 09:49:27.000000000 -0500
+++ xen-hpage/./xen/include/asm-x86/x86_32/page.h	2008-10-02 15:07:34.000000000 -0500
@@ -112,7 +112,7 @@ extern unsigned int PAGE_HYPERVISOR_NOCA
  * Disallow unused flag bits plus PAT/PSE, PCD, PWT and GLOBAL.
  * Permit the NX bit if the hardware supports it.
  */
-#define BASE_DISALLOW_MASK (0xFFFFF198U & ~_PAGE_NX)
+#define BASE_DISALLOW_MASK (0xFFFFF118U & ~_PAGE_NX)
 
 #define L1_DISALLOW_MASK (BASE_DISALLOW_MASK | _PAGE_GNTTAB)
 #define L2_DISALLOW_MASK (BASE_DISALLOW_MASK)
--- xen-unstable//./xen/include/asm-x86/x86_64/page.h	2008-10-02 14:23:17.000000000 -0500
+++ xen-hpage/./xen/include/asm-x86/x86_64/page.h	2008-10-02 15:07:34.000000000 -0500
@@ -112,7 +112,7 @@ typedef l4_pgentry_t root_pgentry_t;
  * Permit the NX bit if the hardware supports it.
  * Note that range [62:52] is available for software use on x86/64.
  */
-#define BASE_DISALLOW_MASK (0xFF800198U & ~_PAGE_NX)
+#define BASE_DISALLOW_MASK (0xFF800118U & ~_PAGE_NX)
 
 #define L1_DISALLOW_MASK (BASE_DISALLOW_MASK | _PAGE_GNTTAB)
 #define L2_DISALLOW_MASK (BASE_DISALLOW_MASK)
--- xen-unstable//./xen/arch/x86/mm.c	2008-10-02 14:23:17.000000000 -0500
+++ xen-hpage/./xen/arch/x86/mm.c	2008-10-08 16:56:46.000000000 -0500
@@ -160,6 +160,9 @@ unsigned long total_pages;
 
 #define PAGE_CACHE_ATTRS (_PAGE_PAT|_PAGE_PCD|_PAGE_PWT)
 
+static int opt_allow_hugepage = 0;
+boolean_param("allowhugepage", opt_allow_hugepage);
+
 #define l1_disallow_mask(d)                                     \
     ((d != dom_io) &&                                           \
      (rangeset_is_empty((d)->iomem_caps) &&                     \
@@ -584,6 +587,28 @@ static int get_page_and_type_from_pagenr
     return rc;
 }
 
+static int
+get_data_page(struct page_info *page, struct domain *d, int writeable)
+{
+    int rc;
+
+    if (writeable)
+        rc = get_page_and_type(page, d, PGT_writable_page);
+    else
+        rc = get_page(page, d);
+
+    return rc;
+}
+
+static void
+put_data_page(struct page_info *page, int writeable)
+{
+    if (writeable)
+        put_page_and_type(page);
+    else
+        put_page(page);
+}
+
 /*
  * We allow root tables to map each other (a.k.a. linear page tables). It
  * needs some special care with reference counts and access permissions:
@@ -656,6 +681,7 @@ get_page_from_l1e(
     struct vcpu *curr = current;
     struct domain *owner;
     int okay;
+    int writeable;
 
     if ( !(l1f & _PAGE_PRESENT) )
         return 1;
@@ -698,10 +724,9 @@ get_page_from_l1e(
      * contribute to writeable mapping refcounts.  (This allows the
      * qemu-dm helper process in dom0 to map the domain's memory without
      * messing up the count of "real" writable mappings.) */
-    okay = (((l1f & _PAGE_RW) && 
-             !(unlikely(paging_mode_external(d) && (d != curr->domain))))
-            ? get_page_and_type(page, d, PGT_writable_page)
-            : get_page(page, d));
+    writeable = (l1f & _PAGE_RW) &&
+        !(unlikely(paging_mode_external(d) && (d != curr->domain)));
+    okay = get_data_page(page, d, writeable);
     if ( !okay )
     {
         MEM_LOG("Error getting mfn %lx (pfn %lx) from L1 entry %" PRIpte
@@ -759,11 +784,42 @@ get_page_from_l2e(
         MEM_LOG("Bad L2 flags %x", l2e_get_flags(l2e) & L2_DISALLOW_MASK);
         return -EINVAL;
     }
+    if ( l2e_get_flags(l2e) & _PAGE_PSE ) {
+        unsigned long mfn = l2e_get_pfn(l2e);
+        unsigned long m, me;
+        struct page_info *page = mfn_to_page(mfn);
+        int writeable;
 
-    rc = get_page_and_type_from_pagenr(
-        l2e_get_pfn(l2e), PGT_l1_page_table, d, 0);
-    if ( unlikely(rc == -EINVAL) && get_l2_linear_pagetable(l2e, pfn, d) )
-        rc = 0;
+        if (!opt_allow_hugepage)
+            return -EINVAL;
+
+        writeable = l2e_get_flags(l2e) & _PAGE_RW;
+
+        rc = get_data_page(page, d, writeable);
+        if (unlikely(!rc)) {
+            return rc;
+        }
+
+        for (m = mfn+1, me = m + (L1_PAGETABLE_ENTRIES-1); m <= me; m++) {
+            rc = get_data_page(mfn_to_page(m), d, writeable);
+            if (unlikely(!rc)) {
+                for (--m; m > mfn; --m) {
+                    put_data_page(mfn_to_page(m), writeable);
+                }
+                put_data_page(page, writeable);
+                return 0;
+            }
+        }
+#ifdef __x86_64__
+        map_pages_to_xen((unsigned long)mfn_to_virt(mfn), mfn, L1_PAGETABLE_ENTRIES,
+                         PAGE_HYPERVISOR | l2e_get_flags(l2e));
+#endif
+    } else {
+        rc = get_page_and_type_from_pagenr(
+            l2e_get_pfn(l2e), PGT_l1_page_table, d, 0);
+        if ( unlikely(rc == -EINVAL) && get_l2_linear_pagetable(l2e, pfn, d) )
+            rc = 0;
+    }
 
     return rc;
 }
@@ -955,7 +1011,19 @@ static int put_page_from_l2e(l2_pgentry_
     if ( (l2e_get_flags(l2e) & _PAGE_PRESENT) && 
          (l2e_get_pfn(l2e) != pfn) )
     {
-        put_page_and_type(l2e_get_page(l2e));
+        if (l2e_get_flags(l2e) & _PAGE_PSE) {
+            unsigned long mfn = l2e_get_pfn(l2e);
+            unsigned long m, me;
+            struct page_info *page = mfn_to_page(mfn);
+            int writeable = l2e_get_flags(l2e) & _PAGE_RW;
+
+            for (m = mfn+1, me = m + (L1_PAGETABLE_ENTRIES-1); m <= me; m++) {
+                put_data_page(mfn_to_page(m), writeable);
+            }
+            put_data_page(page, writeable);
+        } else {
+            put_page_and_type(l2e_get_page(l2e));
+        }
         return 0;
     }
     return 1;
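
The order of checks in this version can be summarised as a small decision function. A PSE entry is rejected with -EINVAL unless the allowhugepage boot option was given, and only then are references taken. The sketch below also rejects a frame number that is not 2MB aligned. That check is an addition for illustration: the posted patch assumes the guest passes an aligned frame. The function name, signature and PTE_PSE macro are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

#define L1_PAGETABLE_ENTRIES 512UL   /* assumed: PAE or x86-64 */
#define PTE_PSE              0x080U  /* bit 7 of an L2 entry */

/* Returns 0 if the entry may proceed to reference counting,
 * -EINVAL if a superpage entry must be refused. */
static int check_superpage_l2e(unsigned int flags, unsigned long mfn,
                               bool allow_hugepage)
{
    if (!(flags & PTE_PSE))
        return 0;               /* ordinary L2 entry: not handled here */
    if (!allow_hugepage)
        return -EINVAL;         /* default: option not given at boot */
    if (mfn & (L1_PAGETABLE_ENTRIES - 1))
        return -EINVAL;         /* not on a 2 MiB boundary */
    return 0;
}
```

Because the option check comes first, a hypervisor booted without allowhugepage refuses every PSE entry before touching any reference counts.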

* Re: [PATCH 1/2] PV hugepages - Xen patch
  2008-10-08 18:11     ` Keir Fraser
  2008-10-08 18:28       ` Dave McCracken
@ 2008-10-08 22:50       ` Jeremy Fitzhardinge
  2008-10-09  8:38         ` Daniel P. Berrange
  1 sibling, 1 reply; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2008-10-08 22:50 UTC
  To: Keir Fraser; +Cc: Ian Pratt, Dave McCracken, xen-devel

Keir Fraser wrote:
> Actually this is an interesting one. For a PV guest it may be in general
> unsolvable, since the target machine may not have allocatable 2MB extents.
> It may also screw live migration since 2MB is a very coarse granularity to
> do dirty-page tracking. One option: perhaps the PV kernel could shatter and
> then reconstruct (as best it can) superpage mappings across save/restore?

That means you need to notify the guest when you're starting a live 
migration, rather than just springing it on them at the last moment as 
we do now.

But shattering large pages all over the place is going to be pretty 
expensive, and possibly awkward if it suddenly needs to come up with a 
pile of pages for the new L1 entries.

    J
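
To make the cost of shattering concrete: replacing one 2MB L2 entry needs a fresh page-table page holding 512 L1 entries, one per 4KB frame. A few architectural details must be carried over:
- the PS bit (bit 7) is dropped;
- the PAT bit moves from bit 12 of the L2 entry to bit 7 of each L1 entry;
- the high bits, such as NX, are preserved.

The hypothetical shatter_l2e() below assumes the PAE or x86-64 entry layout and only computes the entries. It does not allocate the page, install it, or flush TLBs, all of which a real implementation would have to do.

```c
#include <stdint.h>

#define PTE_PS       (1ULL << 7)    /* page size, in an L2 entry */
#define PTE_PAT_4K   (1ULL << 7)    /* PAT, in an L1 entry */
#define PTE_PAT_2M   (1ULL << 12)   /* PAT, in a 2 MiB L2 entry */
#define ADDR_MASK_2M 0x000FFFFFFFE00000ULL  /* physical address bits 51:21 */
#define HIGH_BITS    0xFFF0000000000000ULL  /* bits 63:52, including NX */
#define L1_ENTRIES   512

/* Split one 2 MiB L2 entry into 512 L1 entries mapping the same frames.
 * The caller supplies the new page-table page (l1). */
static void shatter_l2e(uint64_t l2e, uint64_t l1[L1_ENTRIES])
{
    uint64_t base  = l2e & ADDR_MASK_2M;
    /* Low flag bits except PS, plus the high bits. */
    uint64_t flags = (l2e & 0xFFFULL & ~PTE_PS) | (l2e & HIGH_BITS);

    if (l2e & PTE_PAT_2M)
        flags |= PTE_PAT_4K;        /* PAT moves from bit 12 to bit 7 */

    for (int i = 0; i < L1_ENTRIES; i++)
        l1[i] = (base + ((uint64_t)i << 12)) | flags;
}
```

Every shattered superpage consumes one additional 4KB page for its L1 table. This is the extra allocation that could be awkward to satisfy at migration time.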

* Re: [PATCH 1/2] PV hugepages - Xen patch
  2008-10-08 22:07           ` Dave McCracken
@ 2008-10-09  6:45             ` Keir Fraser
  2008-10-09 10:21             ` Keir Fraser
  1 sibling, 0 replies; 12+ messages in thread
From: Keir Fraser @ 2008-10-09  6:45 UTC
  To: Dave McCracken; +Cc: Ian Pratt, xen-devel

On 8/10/08 23:07, "Dave McCracken" <dcm@mccr.org> wrote:

> On Wednesday 08 October 2008, Keir Fraser wrote:
>> Then PV superpage support must be a configuration option, and disabled by
>> default.
> 
> I added a command line option to enable it.  Is this what you had in mind?

It'll certainly do, albeit rather coarse-grained.

 -- Keir

* Re: [PATCH 1/2] PV hugepages - Xen patch
  2008-10-08 22:50       ` Jeremy Fitzhardinge
@ 2008-10-09  8:38         ` Daniel P. Berrange
  2008-10-10  0:05           ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel P. Berrange @ 2008-10-09  8:38 UTC
  To: Jeremy Fitzhardinge; +Cc: xen-devel, Ian Pratt, Dave McCracken, Keir Fraser

On Wed, Oct 08, 2008 at 03:50:56PM -0700, Jeremy Fitzhardinge wrote:
> Keir Fraser wrote:
> >Actually this is an interesting one. For a PV guest it may be in general
> >unsolvable, since the target machine may not have allocatable 2MB extents.
> >It may also screw live migration since 2MB is a very coarse granularity to
> >do dirty-page tracking. One option: perhaps the PV kernel could shatter and
> >then reconstruct (as best it can) superpage mappings across save/restore?
> 
> That means you need to notify the guest when you're starting a live 
> migration, rather than just springing it on them at the last moment as 
> we do now.
> 
> But shattering large pages all over the place is going to be pretty 
> expensive, and possibly awkward if it suddenly needs to come up with a 
> pile of pages for the new L1 entries.

Or you could just take the view this is a pre-migration capability check,
and that admin (or mgmt app) must ensure sufficient free hugepages on the 
destination before attempting migration. If this isn't satisfied then
XenD can just fail / abort the migration op and leave it running on original
host.

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
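
A pre-migration check of the kind Daniel describes might count how many 2MB-aligned, fully free extents the destination has, rather than how many free frames it has. The sketch below assumes a per-frame free map of the destination host. How a toolstack would obtain that map is left open, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define FRAMES_PER_2M 512

/* Count 2 MiB-aligned extents whose 512 frames are all free. */
static size_t free_superpage_extents(const bool *frame_free, size_t nframes)
{
    size_t count = 0;
    for (size_t base = 0; base + FRAMES_PER_2M <= nframes; base += FRAMES_PER_2M) {
        size_t i = 0;
        while (i < FRAMES_PER_2M && frame_free[base + i])
            i++;
        if (i == FRAMES_PER_2M)
            count++;
    }
    return count;
}

/* Refuse the migration unless every guest superpage can be placed. */
static bool can_migrate(const bool *frame_free, size_t nframes,
                        size_t guest_superpages)
{
    return free_superpage_extents(frame_free, nframes) >= guest_superpages;
}
```

Consider a destination with 2048 frames where a single frame in the second extent is in use. It has 2047 free frames but only three free aligned extents, so a guest needing four superpages would be refused. The count reflects only the moment it was taken and can change before the destination actually allocates the memory.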

* Re: [PATCH 1/2] PV hugepages - Xen patch
  2008-10-08 22:07           ` Dave McCracken
  2008-10-09  6:45             ` Keir Fraser
@ 2008-10-09 10:21             ` Keir Fraser
  1 sibling, 0 replies; 12+ messages in thread
From: Keir Fraser @ 2008-10-09 10:21 UTC
  To: Dave McCracken; +Cc: xen-devel

On 8/10/08 23:07, "Dave McCracken" <dcm@mccr.org> wrote:

> On Wednesday 08 October 2008, Keir Fraser wrote:
>> Then PV superpage support must be a configuration option, and disabled by
>> default.
> 
> I added a command line option to enable it.  Is this what you had in mind?

Please fix the coding style (brace positions; white space around
if/while/for headers; etc) and resubmit this and the Linux patch with a
signed-off-by attribution.

 -- Keir
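
For reference, the style Keir asks for is the one the surrounding mm.c code already uses in the diff context, e.g. "if ( unlikely(rc == -EINVAL) && get_l2_linear_pagetable(l2e, pfn, d) )". It puts spaces inside the parentheses of if/for/while headers and places braces on their own lines. Below is a minimal sketch of the superpage acquisition loop written that way. It uses hypothetical stand-ins for get_data_page()/put_data_page() that operate on a flat array of reference counts.

```c
#include <errno.h>
#include <stdbool.h>

#define L1_PAGETABLE_ENTRIES 512UL

/* Hypothetical stand-ins: refs[] holds per-frame reference counts and
 * frame_ok[] marks frames that may be referenced. */
static unsigned int refs[4096];
static bool frame_ok[4096];

static bool get_frame(unsigned long mfn)
{
    if ( !frame_ok[mfn] )
        return false;
    refs[mfn]++;
    return true;
}

static void put_frame(unsigned long mfn)
{
    refs[mfn]--;
}

/* Take a reference on frames mfn .. mfn + 511; on failure, drop the
 * references already taken and return -EINVAL. */
static int get_superpage_refs(unsigned long mfn)
{
    unsigned long m;

    for ( m = mfn; m < mfn + L1_PAGETABLE_ENTRIES; m++ )
    {
        if ( !get_frame(m) )
        {
            while ( m-- > mfn )
                put_frame(m);
            return -EINVAL;
        }
    }

    return 0;
}
```

Starting the loop at mfn folds the head page into the same loop, and the bound m < mfn + L1_PAGETABLE_ENTRIES keeps it within the 512 frames of the extent.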

* Re: [PATCH 1/2] PV hugepages - Xen patch
  2008-10-09  8:38         ` Daniel P. Berrange
@ 2008-10-10  0:05           ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2008-10-10  0:05 UTC
  To: Daniel P. Berrange; +Cc: xen-devel, Ian Pratt, Dave McCracken, Keir Fraser

Daniel P. Berrange wrote:
> Or you could just take the view this is a pre-migration capability check,
> and that admin (or mgmt app) must ensure sufficient free hugepages on the 
> destination before attempting migration. If this isn't satisfied then
> XenD can just fail / abort the migration op and leave it running on original
> host.
>   

And do something to prevent new hugepages from being allocated during 
the dirty logging phase?

    J

Thread overview: 12+ messages
2008-10-02 23:26 [PATCH 1/2] PV hugepages - Xen patch Dave McCracken
2008-10-03  8:58 ` Keir Fraser
2008-10-08 17:05   ` Dave McCracken
2008-10-08 18:11     ` Keir Fraser
2008-10-08 18:28       ` Dave McCracken
2008-10-08 18:50         ` Keir Fraser
2008-10-08 22:07           ` Dave McCracken
2008-10-09  6:45             ` Keir Fraser
2008-10-09 10:21             ` Keir Fraser
2008-10-08 22:50       ` Jeremy Fitzhardinge
2008-10-09  8:38         ` Daniel P. Berrange
2008-10-10  0:05           ` Jeremy Fitzhardinge
