* [PATCH v3 0/9] libxc: support building large pv-domains
@ 2015-10-13 13:11 Juergen Gross
  2015-10-13 13:11 ` [PATCH v3 1/9] libxc: reorganize domain builder guest memory allocator Juergen Gross
                   ` (9 more replies)
  0 siblings, 10 replies; 31+ messages in thread
From: Juergen Gross @ 2015-10-13 13:11 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau
  Cc: Juergen Gross

The Xen hypervisor supports starting a dom0 with large memory (up to
the TB range) by not including the initrd and p2m list in the initial
kernel mapping. The p2m list in particular can grow larger than the
virtual address space available in the initial mapping.
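
For scale (illustrative arithmetic, based on the 8-byte p2m entries
used for 64 bit guests): a 1 TiB guest has 2^40 / 2^12 = 2^28 pages,
so its p2m list alone occupies 2^28 * 8 bytes = 2 GiB.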

The kernel being started indicates its support for each of these
features via ELF notes.
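
As a rough sketch of how the tools side can test for both features
(hypothetical helper; the field names follow struct elf_dom_parms as
used by this series, and checking p2m_base against UNSET_ADDR is an
assumption based on how the parsed values are consumed):

static bool supports_large_pv(const struct elf_dom_parms *parms)
{
    /* XEN_ELFNOTE_MOD_START_PFN: initrd may be passed unmapped. */
    bool unmapped_initrd = parms->unmapped_initrd;
    /* XEN_ELFNOTE_INIT_P2M: p2m list may live outside the mapping. */
    bool unmapped_p2m = (parms->p2m_base != UNSET_ADDR);

    return unmapped_initrd && unmapped_p2m;
}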

This series enables the domain builder in libxc to do the same as the
hypervisor, which makes it possible to start huge pv-domUs via xl.

An unmapped initrd is supported for 64 and 32 bit domains; omitting
the p2m list from the initial kernel mapping is possible for 64 bit
domains only.

Tested with:
- 32 bit domU (kernel not supporting unmapped initrd)
- 32 bit domU (kernel supporting unmapped initrd)
- 1 GB 64 bit domU (kernel supporting unmapped initrd, not p2m)
- 1 GB 64 bit domU (kernel supporting unmapped initrd and p2m)
- 900GB 64 bit domU (kernel supporting unmapped initrd and p2m)
- HVM domU

Changes in v3:
- Rebased the complete series to new staging (hvm builder patches by
  Roger Pau Monne)
- Removed old patch 1 as it broke stubdom build
- Introduced new patch 1 to make the allocation of guest memory clearer
  regarding virtual vs. physical memory allocation (requested by Ian
  Campbell)
- Changed the name of the flag indicating support of an unmapped initrd
  in patch 2 (requested by Ian Campbell)
- Introduced new patches 3, 4, 5 ("rename domain builder count_pgtables to
  alloc_pgtables", "introduce domain builder architecture specific data",
  "use domain builder architecture private data for x86 pv domains") to
  assist later page table work
- Don't fiddle with the initrd virtual address in patch 6 (was patch 3 in
  v2); add explicit initrd parameters for start_info in struct
  xc_dom_image instead (requested by Ian Campbell)
- Introduced new patch 8 ("rework of domain builder's page table handler")
  to be able to use common helpers for the unmapped p2m list (requested by
  Ian Campbell)
- Use the now known segment size in pages for the p2m list in patch 9 (was
  patch 5 in v2) instead of fiddling with the segment end address
  (requested by Ian Campbell)
- Split alloc_p2m_list() in patch 9 (was patch 5 in v2) into 32/64 bit
  variants (requested by Ian Campbell)

Changes in v2:
- patch 2 has been removed as it has been applied already
- introduced new patch 2 as suggested by Ian Campbell: add a flag
  indicating support of an unmapped initrd to the parsed elf data of
  the elf_dom_parms structure
- updated patch description of patch 3 as requested by Ian Campbell


Juergen Gross (9):
  libxc: reorganize domain builder guest memory allocator
  xen: add generic flag to elf_dom_parms indicating support of unmapped
    initrd
  libxc: rename domain builder count_pgtables to alloc_pgtables
  libxc: introduce domain builder architecture specific data
  libxc: use domain builder architecture private data for x86 pv domains
  libxc: create unmapped initrd in domain builder if supported
  libxc: split p2m allocation in domain builder from other magic pages
  libxc: rework of domain builder's page table handler
  libxc: create p2m list outside of kernel mapping if supported

 stubdom/grub/kexec.c               |  12 +-
 tools/libxc/include/xc_dom.h       |  32 +--
 tools/libxc/xc_dom_arm.c           |   6 +-
 tools/libxc/xc_dom_core.c          | 174 +++++++++----
 tools/libxc/xc_dom_x86.c           | 496 ++++++++++++++++++++++++-------------
 xen/arch/x86/domain_build.c        |   4 +-
 xen/common/libelf/libelf-dominfo.c |   3 +
 xen/include/xen/libelf.h           |   1 +
 8 files changed, 480 insertions(+), 248 deletions(-)

-- 
2.1.4


* [PATCH v3 1/9] libxc: reorganize domain builder guest memory allocator
  2015-10-13 13:11 [PATCH v3 0/9] libxc: support building large pv-domains Juergen Gross
@ 2015-10-13 13:11 ` Juergen Gross
  2015-10-28 15:32   ` Wei Liu
  2015-10-13 13:11 ` [PATCH v3 2/9] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd Juergen Gross
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Juergen Gross @ 2015-10-13 13:11 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau
  Cc: Juergen Gross

Guest memory allocation in libxc's domain builder is currently done
via virtual addresses only. In order to support preallocated areas
that are not virtually mapped, reorganize the memory allocator to keep
track of allocated pages globally and per allocated segment.

This requires changing the interface of the domain builder's allocate
callback, which currently takes the last mapped virtual address as a
parameter. This is no problem, as the only user of this callback is
stubdom/grub/kexec.c, which uses the virtual address to calculate the
last used pfn.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
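
A minimal allocate hook under the new interface could look like the
sketch below (hypothetical callback name; only the pfn_alloc_end /
rambase_pfn bookkeeping is taken from this patch):

static int my_allocate(struct xc_dom_image *dom)
{
    /* Pages allocated so far, relative to the RAM base; this replaces
     * the calculation from the former "up_to" virtual address. */
    unsigned long allocated = dom->pfn_alloc_end - dom->rambase_pfn;

    /* Track or populate the pfn range [rambase_pfn, pfn_alloc_end). */
    (void)allocated;
    return 0;
}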
 stubdom/grub/kexec.c         |   6 +--
 tools/libxc/include/xc_dom.h |  11 +++--
 tools/libxc/xc_dom_core.c    | 101 +++++++++++++++++++++++++++++--------------
 3 files changed, 75 insertions(+), 43 deletions(-)

diff --git a/stubdom/grub/kexec.c b/stubdom/grub/kexec.c
index 0b2f4f3..2300318 100644
--- a/stubdom/grub/kexec.c
+++ b/stubdom/grub/kexec.c
@@ -100,9 +100,9 @@ static void do_exchange(struct xc_dom_image *dom, xen_pfn_t target_pfn, xen_pfn_
     dom->p2m_host[target_pfn] = source_mfn;
 }
 
-int kexec_allocate(struct xc_dom_image *dom, xen_vaddr_t up_to)
+int kexec_allocate(struct xc_dom_image *dom)
 {
-    unsigned long new_allocated = (up_to - dom->parms.virt_base) / PAGE_SIZE;
+    unsigned long new_allocated = dom->pfn_alloc_end - dom->rambase_pfn;
     unsigned long i;
 
     pages = realloc(pages, new_allocated * sizeof(*pages));
@@ -319,8 +319,6 @@ void kexec(void *kernel, long kernel_size, void *module, long module_size, char
 
     /* Make sure the bootstrap page table does not RW-map any of our current
      * page table frames */
-    kexec_allocate(dom, dom->virt_pgtab_end);
-
     if ( (rc = xc_dom_update_guest_p2m(dom))) {
         grub_printf("xc_dom_update_guest_p2m returned %d\n", rc);
         errnum = ERR_BOOT_FAILURE;
diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index e52b023..878dc52 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -29,6 +29,7 @@ struct xc_dom_seg {
     xen_vaddr_t vstart;
     xen_vaddr_t vend;
     xen_pfn_t pfn;
+    xen_pfn_t pages;
 };
 
 struct xc_dom_mem {
@@ -90,6 +91,7 @@ struct xc_dom_image {
     xen_pfn_t xenstore_pfn;
     xen_pfn_t shared_info_pfn;
     xen_pfn_t bootstack_pfn;
+    xen_pfn_t pfn_alloc_end;
     xen_vaddr_t virt_alloc_end;
     xen_vaddr_t bsd_symtab_start;
 
@@ -177,7 +179,7 @@ struct xc_dom_image {
     /* kernel loader */
     struct xc_dom_arch *arch_hooks;
     /* allocate up to virt_alloc_end */
-    int (*allocate) (struct xc_dom_image * dom, xen_vaddr_t up_to);
+    int (*allocate) (struct xc_dom_image * dom);
 
     /* Container type (HVM or PV). */
     enum {
@@ -361,14 +363,11 @@ static inline void *xc_dom_seg_to_ptr_pages(struct xc_dom_image *dom,
                                       struct xc_dom_seg *seg,
                                       xen_pfn_t *pages_out)
 {
-    xen_vaddr_t segsize = seg->vend - seg->vstart;
-    unsigned int page_size = XC_DOM_PAGE_SIZE(dom);
-    xen_pfn_t pages = (segsize + page_size - 1) / page_size;
     void *retval;
 
-    retval = xc_dom_pfn_to_ptr(dom, seg->pfn, pages);
+    retval = xc_dom_pfn_to_ptr(dom, seg->pfn, seg->pages);
 
-    *pages_out = retval ? pages : 0;
+    *pages_out = retval ? seg->pages : 0;
     return retval;
 }
 
diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index fbe4464..b1d7890 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -535,56 +535,75 @@ void *xc_dom_pfn_to_ptr_retcount(struct xc_dom_image *dom, xen_pfn_t pfn,
     return phys->ptr;
 }
 
-int xc_dom_alloc_segment(struct xc_dom_image *dom,
-                         struct xc_dom_seg *seg, char *name,
-                         xen_vaddr_t start, xen_vaddr_t size)
+static int xc_dom_chk_alloc_pages(struct xc_dom_image *dom, char *name,
+                                  xen_pfn_t pages)
 {
     unsigned int page_size = XC_DOM_PAGE_SIZE(dom);
-    xen_pfn_t pages = (size + page_size - 1) / page_size;
-    xen_pfn_t pfn;
-    void *ptr;
 
-    if ( start == 0 )
-        start = dom->virt_alloc_end;
+    if ( pages > dom->total_pages || /* multiple test avoids overflow probs */
+         dom->pfn_alloc_end - dom->rambase_pfn > dom->total_pages ||
+         pages > dom->total_pages - dom->pfn_alloc_end + dom->rambase_pfn )
+    {
+        xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
+                     "%s: segment %s too large (0x%"PRIpfn" > "
+                     "0x%"PRIpfn" - 0x%"PRIpfn" pages)", __FUNCTION__, name,
+                     pages, dom->total_pages,
+                     dom->pfn_alloc_end - dom->rambase_pfn);
+        return -1;
+    }
+
+    dom->pfn_alloc_end += pages;
+    dom->virt_alloc_end += pages * page_size;
 
-    if ( start & (page_size - 1) )
+    return 0;
+}
+
+static int xc_dom_alloc_pad(struct xc_dom_image *dom, xen_vaddr_t end)
+{
+    unsigned int page_size = XC_DOM_PAGE_SIZE(dom);
+    xen_pfn_t pages;
+
+    if ( end & (page_size - 1) )
     {
         xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
                      "%s: segment start isn't page aligned (0x%" PRIx64 ")",
-                     __FUNCTION__, start);
+                     __FUNCTION__, end);
         return -1;
     }
-    if ( start < dom->virt_alloc_end )
+    if ( end < dom->virt_alloc_end )
     {
         xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
                      "%s: segment start too low (0x%" PRIx64 " < 0x%" PRIx64
-                     ")", __FUNCTION__, start, dom->virt_alloc_end);
+                     ")", __FUNCTION__, end, dom->virt_alloc_end);
         return -1;
     }
+    pages = (end - dom->virt_alloc_end) / page_size;
 
-    seg->vstart = start;
-    pfn = (seg->vstart - dom->parms.virt_base) / page_size;
-    seg->pfn = pfn + dom->rambase_pfn;
+    return xc_dom_chk_alloc_pages(dom, "padding", pages);
+}
 
-    if ( pages > dom->total_pages || /* multiple test avoids overflow probs */
-         pfn > dom->total_pages ||
-         pages > dom->total_pages - pfn)
-    {
-        xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
-                     "%s: segment %s too large (0x%"PRIpfn" > "
-                     "0x%"PRIpfn" - 0x%"PRIpfn" pages)",
-                     __FUNCTION__, name, pages, dom->total_pages, pfn);
+int xc_dom_alloc_segment(struct xc_dom_image *dom,
+                         struct xc_dom_seg *seg, char *name,
+                         xen_vaddr_t start, xen_vaddr_t size)
+{
+    unsigned int page_size = XC_DOM_PAGE_SIZE(dom);
+    xen_pfn_t pages;
+    void *ptr;
+
+    if ( start && xc_dom_alloc_pad(dom, start) )
         return -1;
-    }
 
-    seg->vend = start + pages * page_size;
-    dom->virt_alloc_end = seg->vend;
-    if (dom->allocate)
-        dom->allocate(dom, dom->virt_alloc_end);
+    pages = (size + page_size - 1) / page_size;
+    start = dom->virt_alloc_end;
 
-    DOMPRINTF("%-20s:   %-12s : 0x%" PRIx64 " -> 0x%" PRIx64
-              "  (pfn 0x%" PRIpfn " + 0x%" PRIpfn " pages)",
-              __FUNCTION__, name, seg->vstart, seg->vend, seg->pfn, pages);
+    seg->pfn = dom->pfn_alloc_end;
+    seg->pages = pages;
+
+    if ( xc_dom_chk_alloc_pages(dom, name, pages) )
+        return -1;
+
+    if (dom->allocate)
+        dom->allocate(dom);
 
     /* map and clear pages */
     ptr = xc_dom_seg_to_ptr(dom, seg);
@@ -592,6 +611,13 @@ int xc_dom_alloc_segment(struct xc_dom_image *dom,
         return -1;
     memset(ptr, 0, pages * page_size);
 
+    seg->vstart = start;
+    seg->vend = dom->virt_alloc_end;
+
+    DOMPRINTF("%-20s:   %-12s : 0x%" PRIx64 " -> 0x%" PRIx64
+              "  (pfn 0x%" PRIpfn " + 0x%" PRIpfn " pages)",
+              __FUNCTION__, name, seg->vstart, seg->vend, seg->pfn, pages);
+
     return 0;
 }
 
@@ -602,10 +628,11 @@ int xc_dom_alloc_page(struct xc_dom_image *dom, char *name)
     xen_pfn_t pfn;
 
     start = dom->virt_alloc_end;
+    pfn = dom->pfn_alloc_end - dom->rambase_pfn;
     dom->virt_alloc_end += page_size;
+    dom->pfn_alloc_end++;
     if (dom->allocate)
-        dom->allocate(dom, dom->virt_alloc_end);
-    pfn = (start - dom->parms.virt_base) / page_size;
+        dom->allocate(dom);
 
     DOMPRINTF("%-20s:   %-12s : 0x%" PRIx64 " (pfn 0x%" PRIpfn ")",
               __FUNCTION__, name, start, pfn);
@@ -886,6 +913,7 @@ int xc_dom_parse_image(struct xc_dom_image *dom)
 int xc_dom_rambase_init(struct xc_dom_image *dom, uint64_t rambase)
 {
     dom->rambase_pfn = rambase >> XC_PAGE_SHIFT;
+    dom->pfn_alloc_end = dom->rambase_pfn;
     DOMPRINTF("%s: RAM starts at %"PRI_xen_pfn,
               __FUNCTION__, dom->rambase_pfn);
     return 0;
@@ -1013,6 +1041,8 @@ int xc_dom_build_image(struct xc_dom_image *dom)
         goto err;
     }
     page_size = XC_DOM_PAGE_SIZE(dom);
+    if ( dom->parms.virt_base != UNSET_ADDR )
+        dom->virt_alloc_end = dom->parms.virt_base;
 
     /* load kernel */
     if ( xc_dom_alloc_segment(dom, &dom->kernel_seg, "kernel",
@@ -1067,6 +1097,11 @@ int xc_dom_build_image(struct xc_dom_image *dom)
               __FUNCTION__, dom->virt_alloc_end);
     DOMPRINTF("%-20s: virt_pgtab_end : 0x%" PRIx64 "",
               __FUNCTION__, dom->virt_pgtab_end);
+
+    /* Make sure all memory mapped by initial page tables is available */
+    if ( dom->virt_pgtab_end && xc_dom_alloc_pad(dom, dom->virt_pgtab_end) )
+        return -1;
+
     return 0;
 
  err:
-- 
2.1.4


* [PATCH v3 2/9] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd
  2015-10-13 13:11 [PATCH v3 0/9] libxc: support building large pv-domains Juergen Gross
  2015-10-13 13:11 ` [PATCH v3 1/9] libxc: reorganize domain builder guest memory allocator Juergen Gross
@ 2015-10-13 13:11 ` Juergen Gross
  2015-10-28 15:33   ` Wei Liu
  2015-10-28 15:49   ` Jan Beulich
  2015-10-13 13:11 ` [PATCH v3 3/9] libxc: rename domain builder count_pgtables to alloc_pgtables Juergen Gross
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 31+ messages in thread
From: Juergen Gross @ 2015-10-13 13:11 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau
  Cc: Juergen Gross

Support for an unmapped initrd is indicated by the domain's kernel via
ELF notes. To avoid having to inspect raw ELF data in the tools for
this, add a flag to the parsed data area (struct elf_dom_parms)
indicating that the kernel supports this feature.

Switch the hypervisor's domain builder over to using this flag.

Suggested-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/domain_build.c        | 4 ++--
 xen/common/libelf/libelf-dominfo.c | 3 +++
 xen/include/xen/libelf.h           | 1 +
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index c2ef87a..d02dc4b 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -353,7 +353,7 @@ static unsigned long __init compute_dom0_nr_pages(
 
         vstart = parms->virt_base;
         vend = round_pgup(parms->virt_kend);
-        if ( !parms->elf_notes[XEN_ELFNOTE_MOD_START_PFN].data.num )
+        if ( !parms->unmapped_initrd )
             vend += round_pgup(initrd_len);
         end = vend + nr_pages * sizeof_long;
 
@@ -1037,7 +1037,7 @@ int __init construct_dom0(
     v_start          = parms.virt_base;
     vkern_start      = parms.virt_kstart;
     vkern_end        = parms.virt_kend;
-    if ( parms.elf_notes[XEN_ELFNOTE_MOD_START_PFN].data.num )
+    if ( parms.unmapped_initrd )
     {
         vinitrd_start  = vinitrd_end = 0;
         vphysmap_start = round_pgup(vkern_end);
diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
index 3de1c23..c9243e4 100644
--- a/xen/common/libelf/libelf-dominfo.c
+++ b/xen/common/libelf/libelf-dominfo.c
@@ -190,6 +190,9 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
     case XEN_ELFNOTE_INIT_P2M:
         parms->p2m_base = val;
         break;
+    case XEN_ELFNOTE_MOD_START_PFN:
+        parms->unmapped_initrd = !!val;
+        break;
     case XEN_ELFNOTE_PADDR_OFFSET:
         parms->elf_paddr_offset = val;
         break;
diff --git a/xen/include/xen/libelf.h b/xen/include/xen/libelf.h
index de788c7..6da4cc0 100644
--- a/xen/include/xen/libelf.h
+++ b/xen/include/xen/libelf.h
@@ -423,6 +423,7 @@ struct elf_dom_parms {
     char loader[16];
     enum xen_pae_type pae;
     bool bsd_symtab;
+    bool unmapped_initrd;
     uint64_t virt_base;
     uint64_t virt_entry;
     uint64_t virt_hypercall;
-- 
2.1.4


* [PATCH v3 3/9] libxc: rename domain builder count_pgtables to alloc_pgtables
  2015-10-13 13:11 [PATCH v3 0/9] libxc: support building large pv-domains Juergen Gross
  2015-10-13 13:11 ` [PATCH v3 1/9] libxc: reorganize domain builder guest memory allocator Juergen Gross
  2015-10-13 13:11 ` [PATCH v3 2/9] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd Juergen Gross
@ 2015-10-13 13:11 ` Juergen Gross
  2015-10-28 15:34   ` Wei Liu
  2015-10-13 13:11 ` [PATCH v3 4/9] libxc: introduce domain builder architecture specific data Juergen Gross
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Juergen Gross @ 2015-10-13 13:11 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau
  Cc: Juergen Gross

Rename the count_pgtables hook of the domain builder to alloc_pgtables
and do the allocation of the guest memory for page tables inside this
hook. This removes the need to access the x86 specific pgtables member
of struct xc_dom_image from the generic domain builder code.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/libxc/include/xc_dom.h |  2 +-
 tools/libxc/xc_dom_arm.c     |  6 +++---
 tools/libxc/xc_dom_core.c    | 11 ++---------
 tools/libxc/xc_dom_x86.c     | 26 +++++++++++++++++---------
 4 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 878dc52..03d5407 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -221,7 +221,7 @@ void xc_dom_register_loader(struct xc_dom_loader *loader);
 struct xc_dom_arch {
     /* pagetable setup */
     int (*alloc_magic_pages) (struct xc_dom_image * dom);
-    int (*count_pgtables) (struct xc_dom_image * dom);
+    int (*alloc_pgtables) (struct xc_dom_image * dom);
     int (*setup_pgtables) (struct xc_dom_image * dom);
 
     /* arch-specific data structs setup */
diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
index 397eef0..d9a6371 100644
--- a/tools/libxc/xc_dom_arm.c
+++ b/tools/libxc/xc_dom_arm.c
@@ -49,7 +49,7 @@ const char *xc_domain_get_native_protocol(xc_interface *xch,
  * arm guests are hybrid and start off with paging disabled, therefore no
  * pagetables and nothing to do here.
  */
-static int count_pgtables_arm(struct xc_dom_image *dom)
+static int alloc_pgtables_arm(struct xc_dom_image *dom)
 {
     DOMPRINTF_CALLED(dom->xch);
     return 0;
@@ -534,7 +534,7 @@ static struct xc_dom_arch xc_dom_32 = {
     .page_shift = PAGE_SHIFT_ARM,
     .sizeof_pfn = 8,
     .alloc_magic_pages = alloc_magic_pages,
-    .count_pgtables = count_pgtables_arm,
+    .alloc_pgtables = alloc_pgtables_arm,
     .setup_pgtables = setup_pgtables_arm,
     .start_info = start_info_arm,
     .shared_info = shared_info_arm,
@@ -550,7 +550,7 @@ static struct xc_dom_arch xc_dom_64 = {
     .page_shift = PAGE_SHIFT_ARM,
     .sizeof_pfn = 8,
     .alloc_magic_pages = alloc_magic_pages,
-    .count_pgtables = count_pgtables_arm,
+    .alloc_pgtables = alloc_pgtables_arm,
     .setup_pgtables = setup_pgtables_arm,
     .start_info = start_info_arm,
     .shared_info = shared_info_arm,
diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index b1d7890..4b6eb9b 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -1082,15 +1082,8 @@ int xc_dom_build_image(struct xc_dom_image *dom)
     /* allocate other pages */
     if ( dom->arch_hooks->alloc_magic_pages(dom) != 0 )
         goto err;
-    if ( dom->arch_hooks->count_pgtables )
-    {
-        if ( dom->arch_hooks->count_pgtables(dom) != 0 )
-            goto err;
-        if ( (dom->pgtables > 0) &&
-             (xc_dom_alloc_segment(dom, &dom->pgtables_seg, "page tables", 0,
-                                   dom->pgtables * page_size) != 0) )
-                goto err;
-    }
+    if ( dom->arch_hooks->alloc_pgtables(dom) != 0 )
+        goto err;
     if ( dom->alloc_bootstack )
         dom->bootstack_pfn = xc_dom_alloc_page(dom, "boot stack");
     DOMPRINTF("%-20s: virt_alloc_end : 0x%" PRIx64 "",
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index dd331bf..7eda047 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -126,7 +126,7 @@ nr_page_tables(struct xc_dom_image *dom,
     return tables;
 }
 
-static int count_pgtables(struct xc_dom_image *dom, int pae,
+static int alloc_pgtables(struct xc_dom_image *dom, int pae,
                           int l4_bits, int l3_bits, int l2_bits, int l1_bits)
 {
     int pages, extra_pages;
@@ -172,7 +172,9 @@ static int count_pgtables(struct xc_dom_image *dom, int pae,
             break;
     }
     dom->virt_pgtab_end = try_virt_end + 1;
-    return 0;
+
+    return xc_dom_alloc_segment(dom, &dom->pgtables_seg, "page tables", 0,
+                                dom->pgtables * PAGE_SIZE_X86);
 }
 
 /* ------------------------------------------------------------------------ */
@@ -182,9 +184,9 @@ static int count_pgtables(struct xc_dom_image *dom, int pae,
 #define L2_PROT (_PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY|_PAGE_USER)
 #define L3_PROT (_PAGE_PRESENT)
 
-static int count_pgtables_x86_32_pae(struct xc_dom_image *dom)
+static int alloc_pgtables_x86_32_pae(struct xc_dom_image *dom)
 {
-    return count_pgtables(dom, 1, 0, 32,
+    return alloc_pgtables(dom, 1, 0, 32,
                           L3_PAGETABLE_SHIFT_PAE, L2_PAGETABLE_SHIFT_PAE);
 }
 
@@ -355,9 +357,9 @@ pfn_error:
 /* ------------------------------------------------------------------------ */
 /* x86_64 pagetables                                                        */
 
-static int count_pgtables_x86_64(struct xc_dom_image *dom)
+static int alloc_pgtables_x86_64(struct xc_dom_image *dom)
 {
-    return count_pgtables(dom, 0,
+    return alloc_pgtables(dom, 0,
                           L4_PAGETABLE_SHIFT_X86_64 + 9,
                           L4_PAGETABLE_SHIFT_X86_64,
                           L3_PAGETABLE_SHIFT_X86_64,
@@ -1623,6 +1625,12 @@ static int bootlate_pv(struct xc_dom_image *dom)
     return 0;
 }
 
+static int alloc_pgtables_hvm(struct xc_dom_image *dom)
+{
+    DOMPRINTF("%s: doing nothing", __func__);
+    return 0;
+}
+
 static int bootlate_hvm(struct xc_dom_image *dom)
 {
     DOMPRINTF("%s: doing nothing", __func__);
@@ -1646,7 +1654,7 @@ static struct xc_dom_arch xc_dom_32_pae = {
     .page_shift = PAGE_SHIFT_X86,
     .sizeof_pfn = 4,
     .alloc_magic_pages = alloc_magic_pages,
-    .count_pgtables = count_pgtables_x86_32_pae,
+    .alloc_pgtables = alloc_pgtables_x86_32_pae,
     .setup_pgtables = setup_pgtables_x86_32_pae,
     .start_info = start_info_x86_32,
     .shared_info = shared_info_x86_32,
@@ -1662,7 +1670,7 @@ static struct xc_dom_arch xc_dom_64 = {
     .page_shift = PAGE_SHIFT_X86,
     .sizeof_pfn = 8,
     .alloc_magic_pages = alloc_magic_pages,
-    .count_pgtables = count_pgtables_x86_64,
+    .alloc_pgtables = alloc_pgtables_x86_64,
     .setup_pgtables = setup_pgtables_x86_64,
     .start_info = start_info_x86_64,
     .shared_info = shared_info_x86_64,
@@ -1678,7 +1686,7 @@ static struct xc_dom_arch xc_hvm_32 = {
     .page_shift = PAGE_SHIFT_X86,
     .sizeof_pfn = 4,
     .alloc_magic_pages = alloc_magic_pages_hvm,
-    .count_pgtables = NULL,
+    .alloc_pgtables = alloc_pgtables_hvm,
     .setup_pgtables = NULL,
     .start_info = NULL,
     .shared_info = NULL,
-- 
2.1.4


* [PATCH v3 4/9] libxc: introduce domain builder architecture specific data
  2015-10-13 13:11 [PATCH v3 0/9] libxc: support building large pv-domains Juergen Gross
                   ` (2 preceding siblings ...)
  2015-10-13 13:11 ` [PATCH v3 3/9] libxc: rename domain builder count_pgtables to alloc_pgtables Juergen Gross
@ 2015-10-13 13:11 ` Juergen Gross
  2015-10-28 15:37   ` Wei Liu
  2015-10-13 13:11 ` [PATCH v3 5/9] libxc: use domain builder architecture private data for x86 pv domains Juergen Gross
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Juergen Gross @ 2015-10-13 13:11 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau
  Cc: Juergen Gross

Reorganize struct xc_dom_image to contain a pointer to domain builder
architecture specific private data. This separates architecture- or
domain-type-specific data from the generally used data.

The new area is allocated as soon as the domain type is known.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
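
To illustrate the new mechanism (arch and field names below are made
up; only arch_private_size and dom->arch_private come from this patch):

struct my_arch_private {
    unsigned int pgtable_count;     /* whatever the arch needs */
};

static int my_arch_hook(struct xc_dom_image *dom)
{
    /* xc_dom_set_arch_hooks() has allocated and zeroed this area. */
    struct my_arch_private *priv = dom->arch_private;

    priv->pgtable_count++;
    return 0;
}

static struct xc_dom_arch my_arch_hooks = {
    /* ... guest_type, page_shift and the usual hooks ... */
    .arch_private_size = sizeof(struct my_arch_private),
};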
 stubdom/grub/kexec.c         |  6 +++++-
 tools/libxc/include/xc_dom.h |  6 +++++-
 tools/libxc/xc_dom_core.c    | 27 +++++++++++++++++++--------
 3 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/stubdom/grub/kexec.c b/stubdom/grub/kexec.c
index 2300318..8fd9ff9 100644
--- a/stubdom/grub/kexec.c
+++ b/stubdom/grub/kexec.c
@@ -272,7 +272,11 @@ void kexec(void *kernel, long kernel_size, void *module, long module_size, char
 #endif
 
     /* equivalent of xc_dom_mem_init */
-    dom->arch_hooks = xc_dom_find_arch_hooks(xc_handle, dom->guest_type);
+    if (xc_dom_set_arch_hooks(dom)) {
+        grub_printf("xc_dom_set_arch_hooks failed\n");
+        errnum = ERR_EXEC_FORMAT;
+        goto out;
+    }
     dom->total_pages = start_info.nr_pages;
 
     /* equivalent of arch_setup_meminit */
diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 03d5407..912d5cb 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -176,6 +176,9 @@ struct xc_dom_image {
     unsigned int *vnode_to_pnode;
     unsigned int nr_vnodes;
 
+    /* domain type/architecture specific data */
+    void *arch_private;
+
     /* kernel loader */
     struct xc_dom_arch *arch_hooks;
     /* allocate up to virt_alloc_end */
@@ -238,6 +241,7 @@ struct xc_dom_arch {
     char *native_protocol;
     int page_shift;
     int sizeof_pfn;
+    int arch_private_size;
 
     struct xc_dom_arch *next;
 };
@@ -291,7 +295,7 @@ int xc_dom_devicetree_mem(struct xc_dom_image *dom, const void *mem,
                           size_t memsize);
 
 int xc_dom_parse_image(struct xc_dom_image *dom);
-struct xc_dom_arch *xc_dom_find_arch_hooks(xc_interface *xch, char *guest_type);
+int xc_dom_set_arch_hooks(struct xc_dom_image *dom);
 int xc_dom_build_image(struct xc_dom_image *dom);
 int xc_dom_update_guest_p2m(struct xc_dom_image *dom);
 
diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index 4b6eb9b..8e1e17f 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -710,19 +710,30 @@ void xc_dom_register_arch_hooks(struct xc_dom_arch *hooks)
     first_hook = hooks;
 }
 
-struct xc_dom_arch *xc_dom_find_arch_hooks(xc_interface *xch, char *guest_type)
+int xc_dom_set_arch_hooks(struct xc_dom_image *dom)
 {
     struct xc_dom_arch *hooks = first_hook;
 
     while (  hooks != NULL )
     {
-        if ( !strcmp(hooks->guest_type, guest_type))
-            return hooks;
+        if ( !strcmp(hooks->guest_type, dom->guest_type) )
+        {
+            if ( hooks->arch_private_size )
+            {
+                dom->arch_private = malloc(hooks->arch_private_size);
+                if ( dom->arch_private == NULL )
+                    return -1;
+                memset(dom->arch_private, 0, hooks->arch_private_size);
+                dom->alloc_malloc += hooks->arch_private_size;
+            }
+            dom->arch_hooks = hooks;
+            return 0;
+        }
         hooks = hooks->next;
     }
-    xc_dom_panic(xch, XC_INVALID_KERNEL,
-                 "%s: not found (type %s)", __FUNCTION__, guest_type);
-    return NULL;
+    xc_dom_panic(dom->xch, XC_INVALID_KERNEL,
+                 "%s: not found (type %s)", __FUNCTION__, dom->guest_type);
+    return -1;
 }
 
 /* ------------------------------------------------------------------------ */
@@ -734,6 +745,7 @@ void xc_dom_release(struct xc_dom_image *dom)
     if ( dom->phys_pages )
         xc_dom_unmap_all(dom);
     xc_dom_free_all(dom);
+    free(dom->arch_private);
     free(dom);
 }
 
@@ -924,8 +936,7 @@ int xc_dom_mem_init(struct xc_dom_image *dom, unsigned int mem_mb)
     unsigned int page_shift;
     xen_pfn_t nr_pages;
 
-    dom->arch_hooks = xc_dom_find_arch_hooks(dom->xch, dom->guest_type);
-    if ( dom->arch_hooks == NULL )
+    if ( xc_dom_set_arch_hooks(dom) )
     {
         xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, "%s: arch hooks not set",
                      __FUNCTION__);
-- 
2.1.4


* [PATCH v3 5/9] libxc: use domain builder architecture private data for x86 pv domains
  2015-10-13 13:11 [PATCH v3 0/9] libxc: support building large pv-domains Juergen Gross
                   ` (3 preceding siblings ...)
  2015-10-13 13:11 ` [PATCH v3 4/9] libxc: introduce domain builder architecture specific data Juergen Gross
@ 2015-10-13 13:11 ` Juergen Gross
  2015-10-28 15:38   ` Wei Liu
  2015-10-13 13:11 ` [PATCH v3 6/9] libxc: create unmapped initrd in domain builder if supported Juergen Gross
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Juergen Gross @ 2015-10-13 13:11 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau
  Cc: Juergen Gross

Move some data private to the x86 domain builder into the new private
data area. Remove extra_pages, as it is used nowhere.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/libxc/include/xc_dom.h |  8 --------
 tools/libxc/xc_dom_x86.c     | 48 +++++++++++++++++++++++++++++---------------
 2 files changed, 32 insertions(+), 24 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 912d5cb..b0120a6 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -94,15 +94,7 @@ struct xc_dom_image {
     xen_pfn_t pfn_alloc_end;
     xen_vaddr_t virt_alloc_end;
     xen_vaddr_t bsd_symtab_start;
-
-    /* initial page tables */
-    unsigned int pgtables;
-    unsigned int pg_l4;
-    unsigned int pg_l3;
-    unsigned int pg_l2;
-    unsigned int pg_l1;
     unsigned int alloc_bootstack;
-    unsigned int extra_pages;
     xen_vaddr_t virt_pgtab_end;
 
     /* other state info */
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index 7eda047..60d54b3 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -69,6 +69,15 @@
 #define round_down(addr, mask)   ((addr) & ~(mask))
 #define round_up(addr, mask)     ((addr) | (mask))
 
+struct xc_dom_image_x86 {
+    /* initial page tables */
+    unsigned int pgtables;
+    unsigned int pg_l4;
+    unsigned int pg_l3;
+    unsigned int pg_l2;
+    unsigned int pg_l1;
+};
+
 /* get guest IO ABI protocol */
 const char *xc_domain_get_native_protocol(xc_interface *xch,
                                           uint32_t domid)
@@ -132,9 +141,9 @@ static int alloc_pgtables(struct xc_dom_image *dom, int pae,
     int pages, extra_pages;
     xen_vaddr_t try_virt_end;
     xen_pfn_t try_pfn_end;
+    struct xc_dom_image_x86 *domx86 = dom->arch_private;
 
     extra_pages = dom->alloc_bootstack ? 1 : 0;
-    extra_pages += dom->extra_pages;
     extra_pages += 128; /* 512kB padding */
     pages = extra_pages;
     for ( ; ; )
@@ -152,29 +161,30 @@ static int alloc_pgtables(struct xc_dom_image *dom, int pae,
             return -ENOMEM;
         }
 
-        dom->pg_l4 =
+        domx86->pg_l4 =
             nr_page_tables(dom, dom->parms.virt_base, try_virt_end, l4_bits);
-        dom->pg_l3 =
+        domx86->pg_l3 =
             nr_page_tables(dom, dom->parms.virt_base, try_virt_end, l3_bits);
-        dom->pg_l2 =
+        domx86->pg_l2 =
             nr_page_tables(dom, dom->parms.virt_base, try_virt_end, l2_bits);
-        dom->pg_l1 =
+        domx86->pg_l1 =
             nr_page_tables(dom, dom->parms.virt_base, try_virt_end, l1_bits);
         if (pae && try_virt_end < 0xc0000000)
         {
             DOMPRINTF("%s: PAE: extra l2 page table for l3#3",
                       __FUNCTION__);
-            dom->pg_l2++;
+            domx86->pg_l2++;
         }
-        dom->pgtables = dom->pg_l4 + dom->pg_l3 + dom->pg_l2 + dom->pg_l1;
-        pages = dom->pgtables + extra_pages;
+        domx86->pgtables = domx86->pg_l4 + domx86->pg_l3 +
+                           domx86->pg_l2 + domx86->pg_l1;
+        pages = domx86->pgtables + extra_pages;
         if ( dom->virt_alloc_end + pages * PAGE_SIZE_X86 <= try_virt_end + 1 )
             break;
     }
     dom->virt_pgtab_end = try_virt_end + 1;
 
     return xc_dom_alloc_segment(dom, &dom->pgtables_seg, "page tables", 0,
-                                dom->pgtables * PAGE_SIZE_X86);
+                                domx86->pgtables * PAGE_SIZE_X86);
 }
 
 /* ------------------------------------------------------------------------ */
@@ -262,9 +272,10 @@ static xen_pfn_t move_l3_below_4G(struct xc_dom_image *dom,
 
 static int setup_pgtables_x86_32_pae(struct xc_dom_image *dom)
 {
+    struct xc_dom_image_x86 *domx86 = dom->arch_private;
     xen_pfn_t l3pfn = dom->pgtables_seg.pfn;
-    xen_pfn_t l2pfn = l3pfn + dom->pg_l3;
-    xen_pfn_t l1pfn = l2pfn + dom->pg_l2;
+    xen_pfn_t l2pfn = l3pfn + domx86->pg_l3;
+    xen_pfn_t l1pfn = l2pfn + domx86->pg_l2;
     l3_pgentry_64_t *l3tab;
     l2_pgentry_64_t *l2tab = NULL;
     l1_pgentry_64_t *l1tab = NULL;
@@ -373,10 +384,11 @@ static int alloc_pgtables_x86_64(struct xc_dom_image *dom)
 
 static int setup_pgtables_x86_64(struct xc_dom_image *dom)
 {
+    struct xc_dom_image_x86 *domx86 = dom->arch_private;
     xen_pfn_t l4pfn = dom->pgtables_seg.pfn;
-    xen_pfn_t l3pfn = l4pfn + dom->pg_l4;
-    xen_pfn_t l2pfn = l3pfn + dom->pg_l3;
-    xen_pfn_t l1pfn = l2pfn + dom->pg_l2;
+    xen_pfn_t l3pfn = l4pfn + domx86->pg_l4;
+    xen_pfn_t l2pfn = l3pfn + domx86->pg_l3;
+    xen_pfn_t l1pfn = l2pfn + domx86->pg_l2;
     l4_pgentry_64_t *l4tab = xc_dom_pfn_to_ptr(dom, l4pfn, 1);
     l3_pgentry_64_t *l3tab = NULL;
     l2_pgentry_64_t *l2tab = NULL;
@@ -619,6 +631,7 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
 
 static int start_info_x86_32(struct xc_dom_image *dom)
 {
+    struct xc_dom_image_x86 *domx86 = dom->arch_private;
     start_info_x86_32_t *start_info =
         xc_dom_pfn_to_ptr(dom, dom->start_info_pfn, 1);
     xen_pfn_t shinfo =
@@ -639,7 +652,7 @@ static int start_info_x86_32(struct xc_dom_image *dom)
     start_info->nr_pages = dom->total_pages;
     start_info->shared_info = shinfo << PAGE_SHIFT_X86;
     start_info->pt_base = dom->pgtables_seg.vstart;
-    start_info->nr_pt_frames = dom->pgtables;
+    start_info->nr_pt_frames = domx86->pgtables;
     start_info->mfn_list = dom->p2m_seg.vstart;
 
     start_info->flags = dom->flags;
@@ -665,6 +678,7 @@ static int start_info_x86_32(struct xc_dom_image *dom)
 
 static int start_info_x86_64(struct xc_dom_image *dom)
 {
+    struct xc_dom_image_x86 *domx86 = dom->arch_private;
     start_info_x86_64_t *start_info =
         xc_dom_pfn_to_ptr(dom, dom->start_info_pfn, 1);
     xen_pfn_t shinfo =
@@ -685,7 +699,7 @@ static int start_info_x86_64(struct xc_dom_image *dom)
     start_info->nr_pages = dom->total_pages;
     start_info->shared_info = shinfo << PAGE_SHIFT_X86;
     start_info->pt_base = dom->pgtables_seg.vstart;
-    start_info->nr_pt_frames = dom->pgtables;
+    start_info->nr_pt_frames = domx86->pgtables;
     start_info->mfn_list = dom->p2m_seg.vstart;
 
     start_info->flags = dom->flags;
@@ -1653,6 +1667,7 @@ static struct xc_dom_arch xc_dom_32_pae = {
     .native_protocol = XEN_IO_PROTO_ABI_X86_32,
     .page_shift = PAGE_SHIFT_X86,
     .sizeof_pfn = 4,
+    .arch_private_size = sizeof(struct xc_dom_image_x86),
     .alloc_magic_pages = alloc_magic_pages,
     .alloc_pgtables = alloc_pgtables_x86_32_pae,
     .setup_pgtables = setup_pgtables_x86_32_pae,
@@ -1669,6 +1684,7 @@ static struct xc_dom_arch xc_dom_64 = {
     .native_protocol = XEN_IO_PROTO_ABI_X86_64,
     .page_shift = PAGE_SHIFT_X86,
     .sizeof_pfn = 8,
+    .arch_private_size = sizeof(struct xc_dom_image_x86),
     .alloc_magic_pages = alloc_magic_pages,
     .alloc_pgtables = alloc_pgtables_x86_64,
     .setup_pgtables = setup_pgtables_x86_64,
-- 
2.1.4


* [PATCH v3 6/9] libxc: create unmapped initrd in domain builder if supported
  2015-10-13 13:11 [PATCH v3 0/9] libxc: support building large pv-domains Juergen Gross
                   ` (4 preceding siblings ...)
  2015-10-13 13:11 ` [PATCH v3 5/9] libxc: use domain builder architecture private data for x86 pv domains Juergen Gross
@ 2015-10-13 13:11 ` Juergen Gross
  2015-10-28 16:11   ` Wei Liu
  2015-10-13 13:11 ` [PATCH v3 7/9] libxc: split p2m allocation in domain builder from other magic pages Juergen Gross
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Juergen Gross @ 2015-10-13 13:11 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau
  Cc: Juergen Gross

In case the kernel of a new pv-domU indicates support for an unmapped
initrd, don't waste precious virtual address space on the initrd, but
allocate only guest physical memory for it.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
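
For reference, the guest-visible contract this builds on (sketch; the
kernel-side variable names are invented, while SIF_MOD_START_PFN and
the start_info fields are the existing ABI):

/* In the guest: mod_start holds the initrd pfn when the flag is set,
 * and a virtual address as before when it is not. */
if ( si->flags & SIF_MOD_START_PFN )
    initrd_pfn = si->mod_start;                  /* unmapped initrd */
else
    initrd_va  = (void *)(unsigned long)si->mod_start;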
 tools/libxc/include/xc_dom.h |  5 +++++
 tools/libxc/xc_dom_core.c    | 19 +++++++++++++++++--
 tools/libxc/xc_dom_x86.c     |  8 ++++----
 3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index b0120a6..fa772a9 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -94,6 +94,11 @@ struct xc_dom_image {
     xen_pfn_t pfn_alloc_end;
     xen_vaddr_t virt_alloc_end;
     xen_vaddr_t bsd_symtab_start;
+
+    /* initrd parameters as specified in start_info page */
+    unsigned long initrd_start;
+    unsigned long initrd_len;
+
     unsigned int alloc_bootstack;
     xen_vaddr_t virt_pgtab_end;
 
diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index 8e1e17f..15e9fa3 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -1041,6 +1041,7 @@ static int xc_dom_build_ramdisk(struct xc_dom_image *dom)
 int xc_dom_build_image(struct xc_dom_image *dom)
 {
     unsigned int page_size;
+    bool unmapped_initrd;
 
     DOMPRINTF_CALLED(dom->xch);
 
@@ -1064,11 +1065,15 @@ int xc_dom_build_image(struct xc_dom_image *dom)
     if ( dom->kernel_loader->loader(dom) != 0 )
         goto err;
 
-    /* load ramdisk */
-    if ( dom->ramdisk_blob )
+    /* Load ramdisk if initial mapping required. */
+    unmapped_initrd = dom->parms.unmapped_initrd && !dom->ramdisk_seg.vstart;
+
+    if ( dom->ramdisk_blob && !unmapped_initrd )
     {
         if ( xc_dom_build_ramdisk(dom) != 0 )
             goto err;
+        dom->initrd_start = dom->ramdisk_seg.vstart;
+        dom->initrd_len = dom->ramdisk_seg.vend - dom->ramdisk_seg.vstart;
     }
 
     /* load devicetree */
@@ -1106,6 +1111,16 @@ int xc_dom_build_image(struct xc_dom_image *dom)
     if ( dom->virt_pgtab_end && xc_dom_alloc_pad(dom, dom->virt_pgtab_end) )
         return -1;
 
+    /* Load ramdisk if no initial mapping required. */
+    if ( dom->ramdisk_blob && unmapped_initrd )
+    {
+        if ( xc_dom_build_ramdisk(dom) != 0 )
+            goto err;
+        dom->flags |= SIF_MOD_START_PFN;
+        dom->initrd_start = dom->ramdisk_seg.pfn;
+        dom->initrd_len = page_size * dom->ramdisk_seg.pages;
+    }
+
     return 0;
 
  err:
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index 60d54b3..7e2f1c5 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -663,8 +663,8 @@ static int start_info_x86_32(struct xc_dom_image *dom)
 
     if ( dom->ramdisk_blob )
     {
-        start_info->mod_start = dom->ramdisk_seg.vstart;
-        start_info->mod_len = dom->ramdisk_seg.vend - dom->ramdisk_seg.vstart;
+        start_info->mod_start = dom->initrd_start;
+        start_info->mod_len = dom->initrd_len;
     }
 
     if ( dom->cmdline )
@@ -710,8 +710,8 @@ static int start_info_x86_64(struct xc_dom_image *dom)
 
     if ( dom->ramdisk_blob )
     {
-        start_info->mod_start = dom->ramdisk_seg.vstart;
-        start_info->mod_len = dom->ramdisk_seg.vend - dom->ramdisk_seg.vstart;
+        start_info->mod_start = dom->initrd_start;
+        start_info->mod_len = dom->initrd_len;
     }
 
     if ( dom->cmdline )
-- 
2.1.4


* [PATCH v3 7/9] libxc: split p2m allocation in domain builder from other magic pages
  2015-10-13 13:11 [PATCH v3 0/9] libxc: support building large pv-domains Juergen Gross
                   ` (5 preceding siblings ...)
  2015-10-13 13:11 ` [PATCH v3 6/9] libxc: create unmapped initrd in domain builder if supported Juergen Gross
@ 2015-10-13 13:11 ` Juergen Gross
  2015-10-28 16:11   ` Wei Liu
  2015-10-13 13:11 ` [PATCH v3 8/9] libxc: rework of domain builder's page table handler Juergen Gross
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Juergen Gross @ 2015-10-13 13:11 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau
  Cc: Juergen Gross

Carve out the p2m list allocation from the .alloc_magic_pages hook of
the domain builder in order to prepare for allocating the p2m list
outside of the initial kernel mapping. This will be needed to support
loading domains with huge memory (>512 GB).

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 tools/libxc/include/xc_dom.h |  1 +
 tools/libxc/xc_dom_core.c    |  3 +++
 tools/libxc/xc_dom_x86.c     | 11 ++++++++++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index fa772a9..79d830f 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -222,6 +222,7 @@ struct xc_dom_arch {
     /* pagetable setup */
     int (*alloc_magic_pages) (struct xc_dom_image * dom);
     int (*alloc_pgtables) (struct xc_dom_image * dom);
+    int (*alloc_p2m_list) (struct xc_dom_image * dom);
     int (*setup_pgtables) (struct xc_dom_image * dom);
 
     /* arch-specific data structs setup */
diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index 15e9fa3..18b98c9 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -1096,6 +1096,9 @@ int xc_dom_build_image(struct xc_dom_image *dom)
     }
 
     /* allocate other pages */
+    if ( dom->arch_hooks->alloc_p2m_list &&
+         dom->arch_hooks->alloc_p2m_list(dom) != 0 )
+        goto err;
     if ( dom->arch_hooks->alloc_magic_pages(dom) != 0 )
         goto err;
     if ( dom->arch_hooks->alloc_pgtables(dom) != 0 )
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index 7e2f1c5..c815e10 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -475,7 +475,7 @@ pfn_error:
 
 /* ------------------------------------------------------------------------ */
 
-static int alloc_magic_pages(struct xc_dom_image *dom)
+static int alloc_p2m_list(struct xc_dom_image *dom)
 {
     size_t p2m_alloc_size = dom->p2m_size * dom->arch_hooks->sizeof_pfn;
 
@@ -487,6 +487,13 @@ static int alloc_magic_pages(struct xc_dom_image *dom)
     if ( dom->p2m_guest == NULL )
         return -1;
 
+    return 0;
+}
+
+/* ------------------------------------------------------------------------ */
+
+static int alloc_magic_pages(struct xc_dom_image *dom)
+{
     /* allocate special pages */
     dom->start_info_pfn = xc_dom_alloc_page(dom, "start info");
     dom->xenstore_pfn = xc_dom_alloc_page(dom, "xenstore");
@@ -1670,6 +1677,7 @@ static struct xc_dom_arch xc_dom_32_pae = {
     .arch_private_size = sizeof(struct xc_dom_image_x86),
     .alloc_magic_pages = alloc_magic_pages,
     .alloc_pgtables = alloc_pgtables_x86_32_pae,
+    .alloc_p2m_list = alloc_p2m_list,
     .setup_pgtables = setup_pgtables_x86_32_pae,
     .start_info = start_info_x86_32,
     .shared_info = shared_info_x86_32,
@@ -1687,6 +1695,7 @@ static struct xc_dom_arch xc_dom_64 = {
     .arch_private_size = sizeof(struct xc_dom_image_x86),
     .alloc_magic_pages = alloc_magic_pages,
     .alloc_pgtables = alloc_pgtables_x86_64,
+    .alloc_p2m_list = alloc_p2m_list,
     .setup_pgtables = setup_pgtables_x86_64,
     .start_info = start_info_x86_64,
     .shared_info = shared_info_x86_64,
-- 
2.1.4


* [PATCH v3 8/9] libxc: rework of domain builder's page table handler
  2015-10-13 13:11 [PATCH v3 0/9] libxc: support building large pv-domains Juergen Gross
                   ` (6 preceding siblings ...)
  2015-10-13 13:11 ` [PATCH v3 7/9] libxc: split p2m allocation in domain builder from other magic pages Juergen Gross
@ 2015-10-13 13:11 ` Juergen Gross
  2015-10-29 12:48   ` Wei Liu
  2015-10-13 13:11 ` [PATCH v3 9/9] libxc: create p2m list outside of kernel mapping if supported Juergen Gross
  2015-10-26 11:15 ` [PATCH v3 0/9] libxc: support building large pv-domains Juergen Gross
  9 siblings, 1 reply; 31+ messages in thread
From: Juergen Gross @ 2015-10-13 13:11 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau
  Cc: Juergen Gross

In order to prepare for a p2m list outside of the initial kernel
mapping, rework the domain builder's page table handler. The goal is
to be able to use common helpers for page table allocation and setup,
both for the initial kernel page tables and for the page tables
mapping the p2m list. This is achieved by supporting multiple mapping
areas. The mapped virtual address ranges of the individual areas must
not overlap, while the page tables of a newly added area may already
be partially present; in particular, the top level page table exists
only once.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
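
A worked illustration of the intended end state (addresses invented;
MAPPING_MAX is still 1 in this patch and is presumably raised once the
p2m list mapping is added in patch 9):

  maps[0] (kernel): 0xffffffff80000000 .. dom->virt_pgtab_end - 1
  maps[1] (p2m):    p2m_base .. p2m_base + p2m list size - 1

Both areas share the single top level page table, which is counted
only for maps[0]; on the lower levels nr_page_tables() trims away any
sub-range already covered by an earlier mapping, so only the missing
page tables are allocated for maps[1].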
 tools/libxc/xc_dom_x86.c | 404 ++++++++++++++++++++++++++++-------------------
 1 file changed, 240 insertions(+), 164 deletions(-)

diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index c815e10..333ef6b 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -65,17 +65,27 @@
 #define NR_IOREQ_SERVER_PAGES 8
 #define ioreq_server_pfn(x) (special_pfn(0) - NR_IOREQ_SERVER_PAGES + (x))
 
-#define bits_to_mask(bits)       (((xen_vaddr_t)1 << (bits))-1)
+#define bits_to_mask(bits)       (((xen_vaddr_t)1 << (bits)) - 1)
 #define round_down(addr, mask)   ((addr) & ~(mask))
 #define round_up(addr, mask)     ((addr) | (mask))
 
-struct xc_dom_image_x86 {
-    /* initial page tables */
+struct xc_dom_x86_mapping_lvl {
+    xen_vaddr_t from;
+    xen_vaddr_t to;
+    xen_pfn_t pfn;
     unsigned int pgtables;
-    unsigned int pg_l4;
-    unsigned int pg_l3;
-    unsigned int pg_l2;
-    unsigned int pg_l1;
+};
+
+struct xc_dom_x86_mapping {
+    struct xc_dom_x86_mapping_lvl area;
+    struct xc_dom_x86_mapping_lvl lvls[4];
+    xen_pfn_t pfn_start;
+};
+
+struct xc_dom_image_x86 {
+    unsigned n_mappings;
+#define MAPPING_MAX 1
+    struct xc_dom_x86_mapping maps[MAPPING_MAX];
 };
 
 /* get guest IO ABI protocol */
@@ -105,43 +115,107 @@ const char *xc_domain_get_native_protocol(xc_interface *xch,
     return protocol;
 }
 
-static unsigned long
-nr_page_tables(struct xc_dom_image *dom,
-               xen_vaddr_t start, xen_vaddr_t end, unsigned long bits)
+static void
+nr_page_tables(struct xc_dom_image *dom, int lvl,
+               xen_vaddr_t from, xen_vaddr_t to, unsigned long bits)
 {
     xen_vaddr_t mask = bits_to_mask(bits);
-    int tables;
+    struct xc_dom_image_x86 *domx86 = dom->arch_private;
+    struct xc_dom_x86_mapping *map = domx86->maps + domx86->n_mappings;
+    struct xc_dom_x86_mapping *map_cmp;
+    unsigned map_n;
 
     if ( bits == 0 )
-        return 0;  /* unused */
+        return;  /* unused */
+
+    if ( lvl == 3 && domx86->n_mappings != 0 )
+        return;  /* Top level page table already in first mapping. */
 
     if ( bits == (8 * sizeof(unsigned long)) )
     {
-        /* must be pgd, need one */
-        start = 0;
-        end = -1;
-        tables = 1;
+        /* 32 bit top level page table special case */
+        map->lvls[lvl].from = 0;
+        map->lvls[lvl].to = -1;
+        map->lvls[lvl].pgtables = 1;
+        goto done;
     }
-    else
+
+    from = round_down(from, mask);
+    to = round_up(to, mask);
+
+    for ( map_n = 0, map_cmp = domx86->maps; map_n < domx86->n_mappings;
+          map_n++, map_cmp++ )
     {
-        start = round_down(start, mask);
-        end = round_up(end, mask);
-        tables = ((end - start) >> bits) + 1;
+        if ( map_cmp->lvls[lvl].from == map_cmp->lvls[lvl].to )
+            continue;
+        if ( from >= map_cmp->lvls[lvl].from && to <= map_cmp->lvls[lvl].to )
+            return;  /* Area already completely covered on this level. */
+        if ( from >= map_cmp->lvls[lvl].from && from <= map_cmp->lvls[lvl].to )
+            from = map_cmp->lvls[lvl].to + 1;
+        if ( to >= map_cmp->lvls[lvl].from && to <= map_cmp->lvls[lvl].to )
+            to = map_cmp->lvls[lvl].from - 1;
     }
 
+    map->lvls[lvl].from = from;
+    map->lvls[lvl].to = to;
+    map->lvls[lvl].pgtables = ((to - from) >> bits) + 1;
+
+ done:
     DOMPRINTF("%s: 0x%016" PRIx64 "/%ld: 0x%016" PRIx64
-              " -> 0x%016" PRIx64 ", %d table(s)",
-              __FUNCTION__, mask, bits, start, end, tables);
-    return tables;
+              " -> 0x%016" PRIx64 ", %d table(s)", __FUNCTION__, mask, bits,
+              map->lvls[lvl].from, map->lvls[lvl].to, map->lvls[lvl].pgtables);
 }
 
-static int alloc_pgtables(struct xc_dom_image *dom, int pae,
-                          int l4_bits, int l3_bits, int l2_bits, int l1_bits)
+static int count_pgtables(struct xc_dom_image *dom, int pae, int bits[4],
+                          xen_vaddr_t from, xen_vaddr_t to, xen_pfn_t pfn)
+{
+    struct xc_dom_image_x86 *domx86 = dom->arch_private;
+    struct xc_dom_x86_mapping *map;
+    xen_pfn_t pfn_end;
+    int level;
+
+    if ( domx86->n_mappings == MAPPING_MAX )
+    {
+        xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
+                     "%s: too many mappings\n", __FUNCTION__);
+        return -ENOMEM;
+    }
+    map = domx86->maps + domx86->n_mappings;
+
+    pfn_end = pfn + ((to - from) >> PAGE_SHIFT_X86);
+    if ( pfn_end >= dom->p2m_size )
+    {
+        xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
+                     "%s: not enough memory for initial mapping (%#"PRIpfn" > %#"PRIpfn")",
+                     __FUNCTION__, pfn_end, dom->p2m_size);
+        return -ENOMEM;
+    }
+
+    memset(map, 0, sizeof(*map));
+
+    map->area.from = from;
+    map->area.to = to;
+    for (level = 3; level >= 0; level--)
+    {
+        map->lvls[level].pfn = pfn + map->area.pgtables;
+        nr_page_tables(dom, level, from, to, bits[level]);
+        if ( pae && to < 0xc0000000 && level == 1)
+        {
+            DOMPRINTF("%s: PAE: extra l2 page table for l3#3", __FUNCTION__);
+            map->lvls[level].pgtables++;
+        }
+        map->area.pgtables += map->lvls[level].pgtables;
+    }
+
+    return 0;
+}
+
+static int alloc_pgtables(struct xc_dom_image *dom, int pae, int bits[4])
 {
     int pages, extra_pages;
     xen_vaddr_t try_virt_end;
-    xen_pfn_t try_pfn_end;
     struct xc_dom_image_x86 *domx86 = dom->arch_private;
+    struct xc_dom_x86_mapping *map = domx86->maps + domx86->n_mappings;
 
     extra_pages = dom->alloc_bootstack ? 1 : 0;
     extra_pages += 128; /* 512kB padding */
@@ -151,40 +225,20 @@ static int alloc_pgtables(struct xc_dom_image *dom, int pae,
         try_virt_end = round_up(dom->virt_alloc_end + pages * PAGE_SIZE_X86,
                                 bits_to_mask(22)); /* 4MB alignment */
 
-        try_pfn_end = (try_virt_end - dom->parms.virt_base) >> PAGE_SHIFT_X86;
-
-        if ( try_pfn_end > dom->p2m_size )
-        {
-            xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
-                         "%s: not enough memory for initial mapping (%#"PRIpfn" > %#"PRIpfn")",
-                         __FUNCTION__, try_pfn_end, dom->p2m_size);
-            return -ENOMEM;
-        }
+        if ( count_pgtables(dom, pae, bits, dom->parms.virt_base, try_virt_end,
+                            dom->pfn_alloc_end) )
+            return -1;
 
-        domx86->pg_l4 =
-            nr_page_tables(dom, dom->parms.virt_base, try_virt_end, l4_bits);
-        domx86->pg_l3 =
-            nr_page_tables(dom, dom->parms.virt_base, try_virt_end, l3_bits);
-        domx86->pg_l2 =
-            nr_page_tables(dom, dom->parms.virt_base, try_virt_end, l2_bits);
-        domx86->pg_l1 =
-            nr_page_tables(dom, dom->parms.virt_base, try_virt_end, l1_bits);
-        if (pae && try_virt_end < 0xc0000000)
-        {
-            DOMPRINTF("%s: PAE: extra l2 page table for l3#3",
-                      __FUNCTION__);
-            domx86->pg_l2++;
-        }
-        domx86->pgtables = domx86->pg_l4 + domx86->pg_l3 +
-                           domx86->pg_l2 + domx86->pg_l1;
-        pages = domx86->pgtables + extra_pages;
+        pages = map->area.pgtables + extra_pages;
         if ( dom->virt_alloc_end + pages * PAGE_SIZE_X86 <= try_virt_end + 1 )
             break;
     }
+    map->area.pfn = 0;
+    domx86->n_mappings++;
     dom->virt_pgtab_end = try_virt_end + 1;
 
     return xc_dom_alloc_segment(dom, &dom->pgtables_seg, "page tables", 0,
-                                domx86->pgtables * PAGE_SIZE_X86);
+                                map->area.pgtables * PAGE_SIZE_X86);
 }
 
 /* ------------------------------------------------------------------------ */
@@ -194,13 +248,17 @@ static int alloc_pgtables(struct xc_dom_image *dom, int pae,
 #define L2_PROT (_PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY|_PAGE_USER)
 #define L3_PROT (_PAGE_PRESENT)
 
+static int pgtblshift_x86_32_pae[4] =
+    /* Top level page table must be index 3! */
+    { L2_PAGETABLE_SHIFT_PAE, L3_PAGETABLE_SHIFT_PAE, 0, 32 };
+
 static int alloc_pgtables_x86_32_pae(struct xc_dom_image *dom)
 {
-    return alloc_pgtables(dom, 1, 0, 32,
-                          L3_PAGETABLE_SHIFT_PAE, L2_PAGETABLE_SHIFT_PAE);
+    return alloc_pgtables(dom, 1, pgtblshift_x86_32_pae);
 }
 
 #define pfn_to_paddr(pfn) ((xen_paddr_t)(pfn) << PAGE_SHIFT_X86)
+#define pgentry_to_pfn(entry) ((xen_pfn_t)((entry) >> PAGE_SHIFT_X86))
 
 /*
  * Move the l3 page table page below 4G for guests which do not
@@ -273,17 +331,16 @@ static xen_pfn_t move_l3_below_4G(struct xc_dom_image *dom,
 static int setup_pgtables_x86_32_pae(struct xc_dom_image *dom)
 {
     struct xc_dom_image_x86 *domx86 = dom->arch_private;
-    xen_pfn_t l3pfn = dom->pgtables_seg.pfn;
-    xen_pfn_t l2pfn = l3pfn + domx86->pg_l3;
-    xen_pfn_t l1pfn = l2pfn + domx86->pg_l2;
-    l3_pgentry_64_t *l3tab;
-    l2_pgentry_64_t *l2tab = NULL;
-    l1_pgentry_64_t *l1tab = NULL;
-    unsigned long l3off, l2off = 0, l1off;
+    struct xc_dom_x86_mapping *map;
+    xen_pfn_t l3mfn, l3pfn, l2pfn, l1pfn;
+    l3_pgentry_64_t *l3tab, *l2tab, *l1tab;
+    unsigned long l3off, l2off, l1off;
     xen_vaddr_t addr;
+    unsigned mapping;
     xen_pfn_t pgpfn;
-    xen_pfn_t l3mfn = xc_dom_p2m(dom, l3pfn);
 
+    l3pfn = domx86->maps[0].lvls[3].pfn;
+    l3mfn = xc_dom_p2m(dom, l3pfn);
     if ( dom->parms.pae == XEN_PAE_YES )
     {
         if ( l3mfn >= 0x100000 )
@@ -302,55 +359,64 @@ static int setup_pgtables_x86_32_pae(struct xc_dom_image *dom)
     l3tab = xc_dom_pfn_to_ptr(dom, l3pfn, 1);
     if ( l3tab == NULL )
         goto pfn_error;
-
-    for ( addr = dom->parms.virt_base; addr < dom->virt_pgtab_end;
-          addr += PAGE_SIZE_X86 )
+    for ( mapping = 0; mapping < domx86->n_mappings; mapping++ )
     {
-        if ( l2tab == NULL )
+        map = domx86->maps + mapping;
+        l2tab = NULL;
+        l1tab = NULL;
+        l2off = 0;
+
+        for ( addr = map->area.from; addr < map->area.to;
+              addr += PAGE_SIZE_X86 )
         {
-            /* get L2 tab, make L3 entry */
-            l2tab = xc_dom_pfn_to_ptr(dom, l2pfn, 1);
             if ( l2tab == NULL )
-                goto pfn_error;
-            l3off = l3_table_offset_pae(addr);
-            l3tab[l3off] =
-                pfn_to_paddr(xc_dom_p2m(dom, l2pfn)) | L3_PROT;
-            l2pfn++;
-        }
+            {
+                /* get L2 tab, make L3 entry */
+                l3off = l3_table_offset_pae(addr);
+                pgpfn = (addr - map->lvls[1].from) >> L3_PAGETABLE_SHIFT_PAE;
+                l2pfn = l3tab[l3off] ? pgentry_to_pfn(l3tab[l3off]) :
+                        map->lvls[1].pfn + pgpfn;
+                l2tab = xc_dom_pfn_to_ptr(dom, l2pfn, 1);
+                if ( l2tab == NULL )
+                    goto pfn_error;
+                l3tab[l3off] = pfn_to_paddr(xc_dom_p2m(dom, l2pfn)) | L3_PROT;
+            }
 
-        if ( l1tab == NULL )
-        {
-            /* get L1 tab, make L2 entry */
-            l1tab = xc_dom_pfn_to_ptr(dom, l1pfn, 1);
             if ( l1tab == NULL )
-                goto pfn_error;
-            l2off = l2_table_offset_pae(addr);
-            l2tab[l2off] =
-                pfn_to_paddr(xc_dom_p2m(dom, l1pfn)) | L2_PROT;
-            l1pfn++;
-        }
+            {
+                /* get L1 tab, make L2 entry */
+                l2off = l2_table_offset_pae(addr);
+                pgpfn = (addr - map->lvls[0].from) >> L2_PAGETABLE_SHIFT_PAE;
+                l1pfn = l2tab[l2off] ? pgentry_to_pfn(l2tab[l2off]) :
+                        map->lvls[0].pfn + pgpfn;
+                l1tab = xc_dom_pfn_to_ptr(dom, l1pfn, 1);
+                if ( l1tab == NULL )
+                    goto pfn_error;
+                l2tab[l2off] = pfn_to_paddr(xc_dom_p2m(dom, l1pfn)) | L2_PROT;
+            }
 
-        /* make L1 entry */
-        l1off = l1_table_offset_pae(addr);
-        pgpfn = (addr - dom->parms.virt_base) >> PAGE_SHIFT_X86;
-        l1tab[l1off] =
-            pfn_to_paddr(xc_dom_p2m(dom, pgpfn)) | L1_PROT;
-        if ( (!dom->pvh_enabled)                &&
-             (addr >= dom->pgtables_seg.vstart) &&
-             (addr < dom->pgtables_seg.vend) )
-            l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
-
-        if ( l1off == (L1_PAGETABLE_ENTRIES_PAE - 1) )
-        {
-            l1tab = NULL;
-            if ( l2off == (L2_PAGETABLE_ENTRIES_PAE - 1) )
-                l2tab = NULL;
+            /* make L1 entry */
+            l1off = l1_table_offset_pae(addr);
+            pgpfn = ((addr - map->area.from) >> PAGE_SHIFT_X86) + map->area.pfn;
+            l1tab[l1off] = pfn_to_paddr(xc_dom_p2m(dom, pgpfn)) | L1_PROT;
+            if ( (!dom->pvh_enabled)            &&
+                 (pgpfn >= map->lvls[3].pfn)    &&
+                 (pgpfn < map->lvls[3].pfn + map->area.pgtables) )
+                l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
+
+            if ( l1off == (L1_PAGETABLE_ENTRIES_PAE - 1) )
+            {
+                l1tab = NULL;
+                if ( l2off == (L2_PAGETABLE_ENTRIES_PAE - 1) )
+                    l2tab = NULL;
+            }
         }
     }
 
     if ( dom->virt_pgtab_end <= 0xc0000000 )
     {
         DOMPRINTF("%s: PAE: extra l2 page table for l3#3", __FUNCTION__);
+        l2pfn = domx86->maps[0].lvls[0].pfn - 1;
         l3tab[3] = pfn_to_paddr(xc_dom_p2m(dom, l2pfn)) | L3_PROT;
     }
     return 0;
@@ -368,13 +434,13 @@ pfn_error:
 /* ------------------------------------------------------------------------ */
 /* x86_64 pagetables                                                        */
 
+static int pgtblshift_x86_64[4] =
+    { L2_PAGETABLE_SHIFT_X86_64, L3_PAGETABLE_SHIFT_X86_64,
+      L4_PAGETABLE_SHIFT_X86_64, L4_PAGETABLE_SHIFT_X86_64 + 9 };
+
 static int alloc_pgtables_x86_64(struct xc_dom_image *dom)
 {
-    return alloc_pgtables(dom, 0,
-                          L4_PAGETABLE_SHIFT_X86_64 + 9,
-                          L4_PAGETABLE_SHIFT_X86_64,
-                          L3_PAGETABLE_SHIFT_X86_64,
-                          L2_PAGETABLE_SHIFT_X86_64);
+    return alloc_pgtables(dom, 0, pgtblshift_x86_64);
 }
 
 #define L1_PROT (_PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED)
@@ -385,78 +451,88 @@ static int alloc_pgtables_x86_64(struct xc_dom_image *dom)
 static int setup_pgtables_x86_64(struct xc_dom_image *dom)
 {
     struct xc_dom_image_x86 *domx86 = dom->arch_private;
-    xen_pfn_t l4pfn = dom->pgtables_seg.pfn;
-    xen_pfn_t l3pfn = l4pfn + domx86->pg_l4;
-    xen_pfn_t l2pfn = l3pfn + domx86->pg_l3;
-    xen_pfn_t l1pfn = l2pfn + domx86->pg_l2;
-    l4_pgentry_64_t *l4tab = xc_dom_pfn_to_ptr(dom, l4pfn, 1);
-    l3_pgentry_64_t *l3tab = NULL;
-    l2_pgentry_64_t *l2tab = NULL;
-    l1_pgentry_64_t *l1tab = NULL;
-    uint64_t l4off, l3off = 0, l2off = 0, l1off;
+    struct xc_dom_x86_mapping *map;
+    xen_pfn_t l4pfn, l3pfn, l2pfn, l1pfn;
+    l4_pgentry_64_t *l4tab, *l3tab, *l2tab, *l1tab;
+    uint64_t l4off, l3off, l2off, l1off;
     uint64_t addr;
+    unsigned mapping;
     xen_pfn_t pgpfn;
 
+    l4pfn = domx86->maps[0].lvls[3].pfn;
+    l4tab = xc_dom_pfn_to_ptr(dom, l4pfn, 1);
     if ( l4tab == NULL )
         goto pfn_error;
 
-    for ( addr = dom->parms.virt_base; addr < dom->virt_pgtab_end;
-          addr += PAGE_SIZE_X86 )
+    for ( mapping = 0; mapping < domx86->n_mappings; mapping++ )
     {
-        if ( l3tab == NULL )
+        map = domx86->maps + mapping;
+        l3tab = NULL;
+        l2tab = NULL;
+        l1tab = NULL;
+        l3off = 0;
+        l2off = 0;
+
+        for ( addr = map->area.from; addr < map->area.to;
+              addr += PAGE_SIZE_X86 )
         {
-            /* get L3 tab, make L4 entry */
-            l3tab = xc_dom_pfn_to_ptr(dom, l3pfn, 1);
             if ( l3tab == NULL )
-                goto pfn_error;
-            l4off = l4_table_offset_x86_64(addr);
-            l4tab[l4off] =
-                pfn_to_paddr(xc_dom_p2m(dom, l3pfn)) | L4_PROT;
-            l3pfn++;
-        }
+            {
+                /* get L3 tab, make L4 entry */
+                l4off = l4_table_offset_x86_64(addr);
+                pgpfn = (addr - map->lvls[2].from) >> L4_PAGETABLE_SHIFT_X86_64;
+                l3pfn = l4tab[l4off] ? pgentry_to_pfn(l4tab[l4off]) :
+                        map->lvls[2].pfn + pgpfn;
+                l3tab = xc_dom_pfn_to_ptr(dom, l3pfn, 1);
+                if ( l3tab == NULL )
+                    goto pfn_error;
+                l4tab[l4off] = pfn_to_paddr(xc_dom_p2m(dom, l3pfn)) | L4_PROT;
+            }
 
-        if ( l2tab == NULL )
-        {
-            /* get L2 tab, make L3 entry */
-            l2tab = xc_dom_pfn_to_ptr(dom, l2pfn, 1);
             if ( l2tab == NULL )
-                goto pfn_error;
-            l3off = l3_table_offset_x86_64(addr);
-            l3tab[l3off] =
-                pfn_to_paddr(xc_dom_p2m(dom, l2pfn)) | L3_PROT;
-            l2pfn++;
-        }
+            {
+                /* get L2 tab, make L3 entry */
+                l3off = l3_table_offset_x86_64(addr);
+                pgpfn = (addr - map->lvls[1].from) >> L3_PAGETABLE_SHIFT_X86_64;
+                l2pfn = l3tab[l3off] ? pgentry_to_pfn(l3tab[l3off]) :
+                        map->lvls[1].pfn + pgpfn;
+                l2tab = xc_dom_pfn_to_ptr(dom, l2pfn, 1);
+                if ( l2tab == NULL )
+                    goto pfn_error;
+                l3tab[l3off] = pfn_to_paddr(xc_dom_p2m(dom, l2pfn)) | L3_PROT;
+            }
 
-        if ( l1tab == NULL )
-        {
-            /* get L1 tab, make L2 entry */
-            l1tab = xc_dom_pfn_to_ptr(dom, l1pfn, 1);
             if ( l1tab == NULL )
-                goto pfn_error;
-            l2off = l2_table_offset_x86_64(addr);
-            l2tab[l2off] =
-                pfn_to_paddr(xc_dom_p2m(dom, l1pfn)) | L2_PROT;
-            l1pfn++;
-        }
+            {
+                /* get L1 tab, make L2 entry */
+                l2off = l2_table_offset_x86_64(addr);
+                pgpfn = (addr - map->lvls[0].from) >> L2_PAGETABLE_SHIFT_X86_64;
+                l1pfn = l2tab[l2off] ? pgentry_to_pfn(l2tab[l2off]) :
+                        map->lvls[0].pfn + pgpfn;
+                l1tab = xc_dom_pfn_to_ptr(dom, l1pfn, 1);
+                if ( l1tab == NULL )
+                    goto pfn_error;
+                l2tab[l2off] = pfn_to_paddr(xc_dom_p2m(dom, l1pfn)) | L2_PROT;
+            }
 
-        /* make L1 entry */
-        l1off = l1_table_offset_x86_64(addr);
-        pgpfn = (addr - dom->parms.virt_base) >> PAGE_SHIFT_X86;
-        l1tab[l1off] =
-            pfn_to_paddr(xc_dom_p2m(dom, pgpfn)) | L1_PROT;
-        if ( (!dom->pvh_enabled)                &&
-             (addr >= dom->pgtables_seg.vstart) &&
-             (addr < dom->pgtables_seg.vend) )
-            l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
-
-        if ( l1off == (L1_PAGETABLE_ENTRIES_X86_64 - 1) )
-        {
-            l1tab = NULL;
-            if ( l2off == (L2_PAGETABLE_ENTRIES_X86_64 - 1) )
+            /* make L1 entry */
+            l1off = l1_table_offset_x86_64(addr);
+            pgpfn = ((addr - map->area.from) >> PAGE_SHIFT_X86) + map->area.pfn;
+            l1tab[l1off] = pfn_to_paddr(xc_dom_p2m(dom, pgpfn)) | L1_PROT;
+            if ( (!dom->pvh_enabled)            &&
+                 (pgpfn >= map->lvls[3].pfn)    &&
+                 (pgpfn < map->lvls[3].pfn + map->area.pgtables) )
+                l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
+
+            if ( l1off == (L1_PAGETABLE_ENTRIES_X86_64 - 1) )
             {
-                l2tab = NULL;
-                if ( l3off == (L3_PAGETABLE_ENTRIES_X86_64 - 1) )
-                    l3tab = NULL;
+                l1tab = NULL;
+                if ( l2off == (L2_PAGETABLE_ENTRIES_X86_64 - 1) )
+                {
+                    l2tab = NULL;
+                    if ( l3off == (L3_PAGETABLE_ENTRIES_X86_64 - 1) )
+                        l3tab = NULL;
+                }
             }
         }
     }
@@ -659,7 +735,7 @@ static int start_info_x86_32(struct xc_dom_image *dom)
     start_info->nr_pages = dom->total_pages;
     start_info->shared_info = shinfo << PAGE_SHIFT_X86;
     start_info->pt_base = dom->pgtables_seg.vstart;
-    start_info->nr_pt_frames = domx86->pgtables;
+    start_info->nr_pt_frames = domx86->maps[0].area.pgtables;
     start_info->mfn_list = dom->p2m_seg.vstart;
 
     start_info->flags = dom->flags;
@@ -706,7 +782,7 @@ static int start_info_x86_64(struct xc_dom_image *dom)
     start_info->nr_pages = dom->total_pages;
     start_info->shared_info = shinfo << PAGE_SHIFT_X86;
     start_info->pt_base = dom->pgtables_seg.vstart;
-    start_info->nr_pt_frames = domx86->pgtables;
+    start_info->nr_pt_frames = domx86->maps[0].area.pgtables;
     start_info->mfn_list = dom->p2m_seg.vstart;
 
     start_info->flags = dom->flags;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 9/9] libxc: create p2m list outside of kernel mapping if supported
  2015-10-13 13:11 [PATCH v3 0/9] libxc: support building large pv-domains Juergen Gross
                   ` (7 preceding siblings ...)
  2015-10-13 13:11 ` [PATCH v3 8/9] libxc: rework of domain builder's page table handler Juergen Gross
@ 2015-10-13 13:11 ` Juergen Gross
  2015-10-29 13:07   ` Wei Liu
  2015-10-26 11:15 ` [PATCH v3 0/9] libxc: support building large pv-domains Juergen Gross
  9 siblings, 1 reply; 31+ messages in thread
From: Juergen Gross @ 2015-10-13 13:11 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau
  Cc: Juergen Gross

In case the kernel of a new pv-domU indicates, by specifying INIT_P2M,
that it supports a p2m list outside the initial kernel mapping, let the
domain builder allocate the memory for the p2m list from guest physical
memory only and map it at the address the kernel expects.

This will enable loading pv-domUs larger than 512 GB.
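
For reference: a guest kernel announces this capability with the
XEN_ELFNOTE_INIT_P2M ELF note. A minimal sketch of the kernel side,
assuming the usual ELFNOTE helper macro (the address is purely
illustrative; it is the virtual base the kernel reserves for the p2m
list):

    /* Sketch only -- advertise a virtual area to hold the p2m list. */
    ELFNOTE(Xen, XEN_ELFNOTE_INIT_P2M, .quad 0x8000000000)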

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/libxc/include/xc_dom.h |  1 +
 tools/libxc/xc_dom_core.c    | 15 ++++++++++-
 tools/libxc/xc_dom_x86.c     | 59 ++++++++++++++++++++++++++++++++++++++------
 3 files changed, 67 insertions(+), 8 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 79d830f..9c000ca 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -239,6 +239,7 @@ struct xc_dom_arch {
     char *native_protocol;
     int page_shift;
     int sizeof_pfn;
+    int p2m_base_supported;
     int arch_private_size;
 
     struct xc_dom_arch *next;
diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index 18b98c9..cfee598 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -777,6 +777,7 @@ struct xc_dom_image *xc_dom_allocate(xc_interface *xch,
     dom->parms.virt_hypercall = UNSET_ADDR;
     dom->parms.virt_hv_start_low = UNSET_ADDR;
     dom->parms.elf_paddr_offset = UNSET_ADDR;
+    dom->parms.p2m_base = UNSET_ADDR;
 
     dom->alloc_malloc += sizeof(*dom);
     return dom;
@@ -1096,7 +1097,11 @@ int xc_dom_build_image(struct xc_dom_image *dom)
     }
 
     /* allocate other pages */
-    if ( dom->arch_hooks->alloc_p2m_list &&
+    if ( !dom->arch_hooks->p2m_base_supported ||
+         dom->parms.p2m_base >= dom->parms.virt_base ||
+         (dom->parms.p2m_base & (XC_DOM_PAGE_SIZE(dom) - 1)) )
+        dom->parms.p2m_base = UNSET_ADDR;
+    if ( dom->arch_hooks->alloc_p2m_list && dom->parms.p2m_base == UNSET_ADDR &&
          dom->arch_hooks->alloc_p2m_list(dom) != 0 )
         goto err;
     if ( dom->arch_hooks->alloc_magic_pages(dom) != 0 )
@@ -1124,6 +1129,14 @@ int xc_dom_build_image(struct xc_dom_image *dom)
         dom->initrd_len = page_size * dom->ramdisk_seg.pages;
     }
 
+    /* Allocate p2m list if outside of initial kernel mapping. */
+    if ( dom->arch_hooks->alloc_p2m_list && dom->parms.p2m_base != UNSET_ADDR )
+    {
+        if ( dom->arch_hooks->alloc_p2m_list(dom) != 0 )
+            goto err;
+        dom->p2m_seg.vstart = dom->parms.p2m_base;
+    }
+
     return 0;
 
  err:
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index 333ef6b..0847761 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -68,6 +68,8 @@
 #define bits_to_mask(bits)       (((xen_vaddr_t)1 << (bits)) - 1)
 #define round_down(addr, mask)   ((addr) & ~(mask))
 #define round_up(addr, mask)     ((addr) | (mask))
+#define round_pg(addr)    (((addr) + PAGE_SIZE_X86 - 1) & ~(PAGE_SIZE_X86 - 1))
+#define round_pfn(addr)   (((addr) + PAGE_SIZE_X86 - 1) / PAGE_SIZE_X86)
 
 struct xc_dom_x86_mapping_lvl {
     xen_vaddr_t from;
@@ -84,7 +86,7 @@ struct xc_dom_x86_mapping {
 
 struct xc_dom_image_x86 {
     unsigned n_mappings;
-#define MAPPING_MAX 1
+#define MAPPING_MAX 2
     struct xc_dom_x86_mapping maps[MAPPING_MAX];
 };
 
@@ -536,6 +538,7 @@ static int setup_pgtables_x86_64(struct xc_dom_image *dom)
             }
         }
     }
+
     return 0;
 
 pfn_error:
@@ -551,11 +554,8 @@ pfn_error:
 
 /* ------------------------------------------------------------------------ */
 
-static int alloc_p2m_list(struct xc_dom_image *dom)
+static int alloc_p2m_list(struct xc_dom_image *dom, size_t p2m_alloc_size)
 {
-    size_t p2m_alloc_size = dom->p2m_size * dom->arch_hooks->sizeof_pfn;
-
-    /* allocate phys2mach table */
     if ( xc_dom_alloc_segment(dom, &dom->p2m_seg, "phys2mach",
                               0, p2m_alloc_size) )
         return -1;
@@ -566,6 +566,41 @@ static int alloc_p2m_list(struct xc_dom_image *dom)
     return 0;
 }
 
+static int alloc_p2m_list_x86_32(struct xc_dom_image *dom)
+{
+    size_t p2m_alloc_size = dom->p2m_size * dom->arch_hooks->sizeof_pfn;
+
+    p2m_alloc_size = round_pg(p2m_alloc_size);
+    return alloc_p2m_list(dom, p2m_alloc_size);
+}
+
+static int alloc_p2m_list_x86_64(struct xc_dom_image *dom)
+{
+    struct xc_dom_image_x86 *domx86 = dom->arch_private;
+    struct xc_dom_x86_mapping *map = domx86->maps + domx86->n_mappings;
+    size_t p2m_alloc_size = dom->p2m_size * dom->arch_hooks->sizeof_pfn;
+    xen_vaddr_t from, to;
+    unsigned lvl;
+
+    p2m_alloc_size = round_pg(p2m_alloc_size);
+    if ( dom->parms.p2m_base != UNSET_ADDR )
+    {
+        from = dom->parms.p2m_base;
+        to = from + p2m_alloc_size - 1;
+        if ( count_pgtables(dom, 0, pgtblshift_x86_64, from, to,
+                            dom->pfn_alloc_end) )
+            return -1;
+
+        map->area.pfn = dom->pfn_alloc_end;
+        for ( lvl = 0; lvl < 4; lvl++ )
+            map->lvls[lvl].pfn += p2m_alloc_size >> PAGE_SHIFT_X86;
+        domx86->n_mappings++;
+        p2m_alloc_size += map->area.pgtables << PAGE_SHIFT_X86;
+    }
+
+    return alloc_p2m_list(dom, p2m_alloc_size);
+}
+
 /* ------------------------------------------------------------------------ */
 
 static int alloc_magic_pages(struct xc_dom_image *dom)
@@ -784,6 +819,11 @@ static int start_info_x86_64(struct xc_dom_image *dom)
     start_info->pt_base = dom->pgtables_seg.vstart;
     start_info->nr_pt_frames = domx86->maps[0].area.pgtables;
     start_info->mfn_list = dom->p2m_seg.vstart;
+    if ( dom->parms.p2m_base != UNSET_ADDR )
+    {
+        start_info->first_p2m_pfn = dom->p2m_seg.pfn;
+        start_info->nr_p2m_frames = dom->p2m_seg.pages;
+    }
 
     start_info->flags = dom->flags;
     start_info->store_mfn = xc_dom_p2m(dom, dom->xenstore_pfn);
@@ -1671,7 +1711,10 @@ static int bootlate_pv(struct xc_dom_image *dom)
     if ( !xc_dom_feature_translated(dom) )
     {
         /* paravirtualized guest */
+
+        /* Drop references to all initial page tables before pinning. */
         xc_dom_unmap_one(dom, dom->pgtables_seg.pfn);
+        xc_dom_unmap_one(dom, dom->p2m_seg.pfn);
         rc = pin_table(dom->xch, pgd_type,
                        xc_dom_p2m(dom, dom->pgtables_seg.pfn),
                        dom->guest_domid);
@@ -1750,10 +1793,11 @@ static struct xc_dom_arch xc_dom_32_pae = {
     .native_protocol = XEN_IO_PROTO_ABI_X86_32,
     .page_shift = PAGE_SHIFT_X86,
     .sizeof_pfn = 4,
+    .p2m_base_supported = 0,
     .arch_private_size = sizeof(struct xc_dom_image_x86),
     .alloc_magic_pages = alloc_magic_pages,
     .alloc_pgtables = alloc_pgtables_x86_32_pae,
-    .alloc_p2m_list = alloc_p2m_list,
+    .alloc_p2m_list = alloc_p2m_list_x86_32,
     .setup_pgtables = setup_pgtables_x86_32_pae,
     .start_info = start_info_x86_32,
     .shared_info = shared_info_x86_32,
@@ -1768,10 +1812,11 @@ static struct xc_dom_arch xc_dom_64 = {
     .native_protocol = XEN_IO_PROTO_ABI_X86_64,
     .page_shift = PAGE_SHIFT_X86,
     .sizeof_pfn = 8,
+    .p2m_base_supported = 1,
     .arch_private_size = sizeof(struct xc_dom_image_x86),
     .alloc_magic_pages = alloc_magic_pages,
     .alloc_pgtables = alloc_pgtables_x86_64,
-    .alloc_p2m_list = alloc_p2m_list,
+    .alloc_p2m_list = alloc_p2m_list_x86_64,
     .setup_pgtables = setup_pgtables_x86_64,
     .start_info = start_info_x86_64,
     .shared_info = shared_info_x86_64,
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/9] libxc: support building large pv-domains
  2015-10-13 13:11 [PATCH v3 0/9] libxc: support building large pv-domains Juergen Gross
                   ` (8 preceding siblings ...)
  2015-10-13 13:11 ` [PATCH v3 9/9] libxc: create p2m list outside of kernel mapping if supported Juergen Gross
@ 2015-10-26 11:15 ` Juergen Gross
  9 siblings, 0 replies; 31+ messages in thread
From: Juergen Gross @ 2015-10-26 11:15 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, roger.pau

Ping?

On 10/13/2015 03:11 PM, Juergen Gross wrote:
> [...]
>
>   stubdom/grub/kexec.c               |  12 +-
>   tools/libxc/include/xc_dom.h       |  32 +--
>   tools/libxc/xc_dom_arm.c           |   6 +-
>   tools/libxc/xc_dom_core.c          | 174 +++++++++----
>   tools/libxc/xc_dom_x86.c           | 496 ++++++++++++++++++++++++-------------
>   xen/arch/x86/domain_build.c        |   4 +-
>   xen/common/libelf/libelf-dominfo.c |   3 +
>   xen/include/xen/libelf.h           |   1 +
>   8 files changed, 480 insertions(+), 248 deletions(-)
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 1/9] libxc: reorganize domain builder guest memory allocator
  2015-10-13 13:11 ` [PATCH v3 1/9] libxc: reorganize domain builder guest memory allocator Juergen Gross
@ 2015-10-28 15:32   ` Wei Liu
  2015-10-28 15:51     ` Juergen Gross
  0 siblings, 1 reply; 31+ messages in thread
From: Wei Liu @ 2015-10-28 15:32 UTC (permalink / raw)
  To: Juergen Gross
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, ian.jackson,
	xen-devel, roger.pau

On Tue, Oct 13, 2015 at 03:11:10PM +0200, Juergen Gross wrote:
> Guest memory allocation in the domain builder of libxc is done via
> virtual addresses only. In order to be able to support preallocated
> areas not virtually mapped, reorganize the memory allocator to keep
> track of allocated pages globally and in allocated segments.
> 
> This requires an interface change of the allocate callback of the
> domain builder which currently is using the last mapped virtual
> address as a parameter. This is no problem as the only user of this
> callback is stubdom/grub/kexec.c using this virtual address to
> calculate the last used pfn.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  stubdom/grub/kexec.c         |   6 +--
>  tools/libxc/include/xc_dom.h |  11 +++--
>  tools/libxc/xc_dom_core.c    | 101 +++++++++++++++++++++++++++++--------------
>  3 files changed, 75 insertions(+), 43 deletions(-)
> 
> diff --git a/stubdom/grub/kexec.c b/stubdom/grub/kexec.c
> index 0b2f4f3..2300318 100644
> --- a/stubdom/grub/kexec.c
> +++ b/stubdom/grub/kexec.c
> @@ -100,9 +100,9 @@ static void do_exchange(struct xc_dom_image *dom, xen_pfn_t target_pfn, xen_pfn_
>      dom->p2m_host[target_pfn] = source_mfn;
>  }
>  
> -int kexec_allocate(struct xc_dom_image *dom, xen_vaddr_t up_to)
> +int kexec_allocate(struct xc_dom_image *dom)
>  {
> -    unsigned long new_allocated = (up_to - dom->parms.virt_base) / PAGE_SIZE;
> +    unsigned long new_allocated = dom->pfn_alloc_end - dom->rambase_pfn;
>      unsigned long i;
>  
>      pages = realloc(pages, new_allocated * sizeof(*pages));
> @@ -319,8 +319,6 @@ void kexec(void *kernel, long kernel_size, void *module, long module_size, char
>  
>      /* Make sure the bootstrap page table does not RW-map any of our current
>       * page table frames */
> -    kexec_allocate(dom, dom->virt_pgtab_end);
> -

Does this mean pvgrub is now able to use this shiny new feature?

>      if ( (rc = xc_dom_update_guest_p2m(dom))) {
>          grub_printf("xc_dom_update_guest_p2m returned %d\n", rc);
>          errnum = ERR_BOOT_FAILURE;
> diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
> index e52b023..878dc52 100644
> --- a/tools/libxc/include/xc_dom.h
> +++ b/tools/libxc/include/xc_dom.h
> @@ -29,6 +29,7 @@ struct xc_dom_seg {
>      xen_vaddr_t vstart;
>      xen_vaddr_t vend;
>      xen_pfn_t pfn;
> +    xen_pfn_t pages;
>  };
>  
>  struct xc_dom_mem {
> @@ -90,6 +91,7 @@ struct xc_dom_image {
>      xen_pfn_t xenstore_pfn;
>      xen_pfn_t shared_info_pfn;
>      xen_pfn_t bootstack_pfn;
> +    xen_pfn_t pfn_alloc_end;
>      xen_vaddr_t virt_alloc_end;
>      xen_vaddr_t bsd_symtab_start;
>  
> @@ -177,7 +179,7 @@ struct xc_dom_image {
>      /* kernel loader */
>      struct xc_dom_arch *arch_hooks;
>      /* allocate up to virt_alloc_end */

I think you need to update this comment, too.

> -    int (*allocate) (struct xc_dom_image * dom, xen_vaddr_t up_to);
> +    int (*allocate) (struct xc_dom_image * dom);
>  
>      /* Container type (HVM or PV). */
>      enum {
> @@ -361,14 +363,11 @@ static inline void *xc_dom_seg_to_ptr_pages(struct xc_dom_image *dom,
>                                        struct xc_dom_seg *seg,
>                                        xen_pfn_t *pages_out)
>  {
> -    xen_vaddr_t segsize = seg->vend - seg->vstart;
> -    unsigned int page_size = XC_DOM_PAGE_SIZE(dom);
> -    xen_pfn_t pages = (segsize + page_size - 1) / page_size;
>      void *retval;
>  
> -    retval = xc_dom_pfn_to_ptr(dom, seg->pfn, pages);
> +    retval = xc_dom_pfn_to_ptr(dom, seg->pfn, seg->pages);
>  
> -    *pages_out = retval ? pages : 0;
> +    *pages_out = retval ? seg->pages : 0;
>      return retval;
>  }
>  
> diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
> index fbe4464..b1d7890 100644
> --- a/tools/libxc/xc_dom_core.c
> +++ b/tools/libxc/xc_dom_core.c
> @@ -535,56 +535,75 @@ void *xc_dom_pfn_to_ptr_retcount(struct xc_dom_image *dom, xen_pfn_t pfn,
>      return phys->ptr;
>  }
>  
> -int xc_dom_alloc_segment(struct xc_dom_image *dom,
> -                         struct xc_dom_seg *seg, char *name,
> -                         xen_vaddr_t start, xen_vaddr_t size)
> +static int xc_dom_chk_alloc_pages(struct xc_dom_image *dom, char *name,
> +                                  xen_pfn_t pages)
>  {
>      unsigned int page_size = XC_DOM_PAGE_SIZE(dom);
> -    xen_pfn_t pages = (size + page_size - 1) / page_size;
> -    xen_pfn_t pfn;
> -    void *ptr;
>  
> -    if ( start == 0 )
> -        start = dom->virt_alloc_end;
> +    if ( pages > dom->total_pages || /* multiple test avoids overflow probs */
> +         dom->pfn_alloc_end - dom->rambase_pfn > dom->total_pages ||
> +         pages > dom->total_pages - dom->pfn_alloc_end + dom->rambase_pfn )
> +    {
> +        xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
> +                     "%s: segment %s too large (0x%"PRIpfn" > "
> +                     "0x%"PRIpfn" - 0x%"PRIpfn" pages)", __FUNCTION__, name,
> +                     pages, dom->total_pages,
> +                     dom->pfn_alloc_end - dom->rambase_pfn);
> +        return -1;
> +    }
> +
> +    dom->pfn_alloc_end += pages;
> +    dom->virt_alloc_end += pages * page_size;
>  
> -    if ( start & (page_size - 1) )
> +    return 0;
> +}
> +
> +static int xc_dom_alloc_pad(struct xc_dom_image *dom, xen_vaddr_t end)
> +{
> +    unsigned int page_size = XC_DOM_PAGE_SIZE(dom);
> +    xen_pfn_t pages;
> +
> +    if ( end & (page_size - 1) )
>      {
>          xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
>                       "%s: segment start isn't page aligned (0x%" PRIx64 ")",

"segment end"?

> -                     __FUNCTION__, start);
> +                     __FUNCTION__, end);
>          return -1;
>      }
> -    if ( start < dom->virt_alloc_end )
> +    if ( end < dom->virt_alloc_end )
>      {
>          xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
>                       "%s: segment start too low (0x%" PRIx64 " < 0x%" PRIx64

Ditto.

A major part of this patch looks like refactoring to me. And to the
best of my knowledge it seems to be doing the right thing.

Wei.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/9] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd
  2015-10-13 13:11 ` [PATCH v3 2/9] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd Juergen Gross
@ 2015-10-28 15:33   ` Wei Liu
  2015-10-28 15:49   ` Jan Beulich
  1 sibling, 0 replies; 31+ messages in thread
From: Wei Liu @ 2015-10-28 15:33 UTC (permalink / raw)
  To: Juergen Gross
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, Andrew Cooper,
	ian.jackson, xen-devel, Jan Beulich, keir, roger.pau

CC HV maintainers

On Tue, Oct 13, 2015 at 03:11:11PM +0200, Juergen Gross wrote:
> Support of an unmapped initrd is indicated by the kernel of the domain
> via elf notes. In order not to have to use raw elf data in the tools
> for support of an unmapped initrd, add a flag to the parsed data area
> to indicate that the kernel supports this feature.
> 
> Switch using this flag in the hypervisor domain builder.
> 
> Suggested-by: Ian Campbell <ian.campbell@citrix.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  xen/arch/x86/domain_build.c        | 4 ++--
>  xen/common/libelf/libelf-dominfo.c | 3 +++
>  xen/include/xen/libelf.h           | 1 +
>  3 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
> index c2ef87a..d02dc4b 100644
> --- a/xen/arch/x86/domain_build.c
> +++ b/xen/arch/x86/domain_build.c
> @@ -353,7 +353,7 @@ static unsigned long __init compute_dom0_nr_pages(
>  
>          vstart = parms->virt_base;
>          vend = round_pgup(parms->virt_kend);
> -        if ( !parms->elf_notes[XEN_ELFNOTE_MOD_START_PFN].data.num )
> +        if ( !parms->unmapped_initrd )
>              vend += round_pgup(initrd_len);
>          end = vend + nr_pages * sizeof_long;
>  
> @@ -1037,7 +1037,7 @@ int __init construct_dom0(
>      v_start          = parms.virt_base;
>      vkern_start      = parms.virt_kstart;
>      vkern_end        = parms.virt_kend;
> -    if ( parms.elf_notes[XEN_ELFNOTE_MOD_START_PFN].data.num )
> +    if ( parms.unmapped_initrd )
>      {
>          vinitrd_start  = vinitrd_end = 0;
>          vphysmap_start = round_pgup(vkern_end);
> diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
> index 3de1c23..c9243e4 100644
> --- a/xen/common/libelf/libelf-dominfo.c
> +++ b/xen/common/libelf/libelf-dominfo.c
> @@ -190,6 +190,9 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>      case XEN_ELFNOTE_INIT_P2M:
>          parms->p2m_base = val;
>          break;
> +    case XEN_ELFNOTE_MOD_START_PFN:
> +        parms->unmapped_initrd = !!val;
> +        break;
>      case XEN_ELFNOTE_PADDR_OFFSET:
>          parms->elf_paddr_offset = val;
>          break;
> diff --git a/xen/include/xen/libelf.h b/xen/include/xen/libelf.h
> index de788c7..6da4cc0 100644
> --- a/xen/include/xen/libelf.h
> +++ b/xen/include/xen/libelf.h
> @@ -423,6 +423,7 @@ struct elf_dom_parms {
>      char loader[16];
>      enum xen_pae_type pae;
>      bool bsd_symtab;
> +    bool unmapped_initrd;
>      uint64_t virt_base;
>      uint64_t virt_entry;
>      uint64_t virt_hypercall;
> -- 
> 2.1.4
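
For context (not part of the patch): the kernel-side counterpart is the
MOD_START_PFN ELF note. A hedged sketch, assuming the usual ELFNOTE
helper macro:

    /* Sketch only -- kernel opts in to an unmapped initrd. */
    ELFNOTE(Xen, XEN_ELFNOTE_MOD_START_PFN, .long 1)

Any non-zero note value is what ends up setting parms->unmapped_initrd
in the hunk above.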

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 3/9] libxc: rename domain builder count_pgtables to alloc_pgtables
  2015-10-13 13:11 ` [PATCH v3 3/9] libxc: rename domain builder count_pgtables to alloc_pgtables Juergen Gross
@ 2015-10-28 15:34   ` Wei Liu
  0 siblings, 0 replies; 31+ messages in thread
From: Wei Liu @ 2015-10-28 15:34 UTC (permalink / raw)
  To: Juergen Gross
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, ian.jackson,
	xen-devel, roger.pau

On Tue, Oct 13, 2015 at 03:11:12PM +0200, Juergen Gross wrote:
> Rename the count_pgtables hook of the domain builder to alloc_pgtables
> and do the allocation of the guest memory for page tables inside this
> hook. This will remove the need for accessing the x86 specific pgtables
> member of struct xc_dom_image in the generic domain builder code.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

The code looks correct to me.

With the understanding that this refactoring is in preparation for a
later patch.

Acked-by: Wei Liu <wei.liu2@citrix.com>
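
For readers following the series, the hook table entry ends up along
these lines (a sketch; the hook's shape is inferred from how it is
installed later in the series):

    struct xc_dom_arch {
        ...
        /* Was count_pgtables; now also allocates the guest memory. */
        int (*alloc_pgtables) (struct xc_dom_image *dom);
        ...
    };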

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 4/9] libxc: introduce domain builder architecture specific data
  2015-10-13 13:11 ` [PATCH v3 4/9] libxc: introduce domain builder architecture specific data Juergen Gross
@ 2015-10-28 15:37   ` Wei Liu
  0 siblings, 0 replies; 31+ messages in thread
From: Wei Liu @ 2015-10-28 15:37 UTC (permalink / raw)
  To: Juergen Gross
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, ian.jackson,
	xen-devel, roger.pau

On Tue, Oct 13, 2015 at 03:11:13PM +0200, Juergen Gross wrote:
> Reorganize struct xc_dom_image to contain a pointer to domain builder
> architecture specific private data. This will abstract the architecture
> or domain type specific data from the generally used data.
> 
> The new area is allocated as soon as the domain type is known.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

The code looks correct to me, so:

Acked-by: Wei Liu <wei.liu2@citrix.com>
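
A sketch of the resulting layout, pieced together from the description
(the field and size names are the ones the later x86 patches rely on):

    struct xc_dom_image {
        ...
        /* Allocated as soon as the domain type is known; its size is
         * given by the arch hooks' arch_private_size. */
        void *arch_private;
    };

    /* Arch code can then do, e.g.: */
    struct xc_dom_image_x86 *domx86 = dom->arch_private;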

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 5/9] libxc: use domain builder architecture private data for x86 pv domains
  2015-10-13 13:11 ` [PATCH v3 5/9] libxc: use domain builder architecture private data for x86 pv domains Juergen Gross
@ 2015-10-28 15:38   ` Wei Liu
  0 siblings, 0 replies; 31+ messages in thread
From: Wei Liu @ 2015-10-28 15:38 UTC (permalink / raw)
  To: Juergen Gross
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, ian.jackson,
	xen-devel, roger.pau

On Tue, Oct 13, 2015 at 03:11:14PM +0200, Juergen Gross wrote:
> Move some data private to the x86 domain builder to the private data
> section. Remove extra_pages as they are used nowhere.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/9] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd
  2015-10-13 13:11 ` [PATCH v3 2/9] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd Juergen Gross
  2015-10-28 15:33   ` Wei Liu
@ 2015-10-28 15:49   ` Jan Beulich
  1 sibling, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2015-10-28 15:49 UTC (permalink / raw)
  To: Juergen Gross
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, ian.jackson,
	xen-devel, roger.pau

>>> On 13.10.15 at 15:11, <JGross@suse.com> wrote:
> Support of an unmapped initrd is indicated by the kernel of the domain
> via elf notes. In order not to have to use raw elf data in the tools
> for support of an unmapped initrd, add a flag to the parsed data area
> to indicate that the kernel supports this feature.
> 
> Switch using this flag in the hypervisor domain builder.
> 
> Suggested-by: Ian Campbell <ian.campbell@citrix.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 1/9] libxc: reorganize domain builder guest memory allocator
  2015-10-28 15:32   ` Wei Liu
@ 2015-10-28 15:51     ` Juergen Gross
  2015-10-28 16:21       ` Wei Liu
  0 siblings, 1 reply; 31+ messages in thread
From: Juergen Gross @ 2015-10-28 15:51 UTC (permalink / raw)
  To: Wei Liu; +Cc: roger.pau, stefano.stabellini, ian.jackson, Ian.Campbell,
	xen-devel

On 10/28/2015 04:32 PM, Wei Liu wrote:
> On Tue, Oct 13, 2015 at 03:11:10PM +0200, Juergen Gross wrote:
>> Guest memory allocation in the domain builder of libxc is done via
>> virtual addresses only. In order to be able to support preallocated
>> areas not virtually mapped, reorganize the memory allocator to keep
>> track of allocated pages globally and in allocated segments.
>>
>> This requires an interface change of the allocate callback of the
>> domain builder which currently is using the last mapped virtual
>> address as a parameter. This is no problem as the only user of this
>> callback is stubdom/grub/kexec.c using this virtual address to
>> calculate the last used pfn.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   stubdom/grub/kexec.c         |   6 +--
>>   tools/libxc/include/xc_dom.h |  11 +++--
>>   tools/libxc/xc_dom_core.c    | 101 +++++++++++++++++++++++++++++--------------
>>   3 files changed, 75 insertions(+), 43 deletions(-)
>>
>> diff --git a/stubdom/grub/kexec.c b/stubdom/grub/kexec.c
>> index 0b2f4f3..2300318 100644
>> --- a/stubdom/grub/kexec.c
>> +++ b/stubdom/grub/kexec.c
>> @@ -100,9 +100,9 @@ static void do_exchange(struct xc_dom_image *dom, xen_pfn_t target_pfn, xen_pfn_
>>       dom->p2m_host[target_pfn] = source_mfn;
>>   }
>>
>> -int kexec_allocate(struct xc_dom_image *dom, xen_vaddr_t up_to)
>> +int kexec_allocate(struct xc_dom_image *dom)
>>   {
>> -    unsigned long new_allocated = (up_to - dom->parms.virt_base) / PAGE_SIZE;
>> +    unsigned long new_allocated = dom->pfn_alloc_end - dom->rambase_pfn;
>>       unsigned long i;
>>
>>       pages = realloc(pages, new_allocated * sizeof(*pages));
>> @@ -319,8 +319,6 @@ void kexec(void *kernel, long kernel_size, void *module, long module_size, char
>>
>>       /* Make sure the bootstrap page table does not RW-map any of our current
>>        * page table frames */
>> -    kexec_allocate(dom, dom->virt_pgtab_end);
>> -
>
> Does this mean pvgrub is now able to use this shiny new feature?

That's the plan. I have to admit I didn't test this. And don't mix this
up with grub-xen (I'm just working on that beast).

>
>>       if ( (rc = xc_dom_update_guest_p2m(dom))) {
>>           grub_printf("xc_dom_update_guest_p2m returned %d\n", rc);
>>           errnum = ERR_BOOT_FAILURE;
>> diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
>> index e52b023..878dc52 100644
>> --- a/tools/libxc/include/xc_dom.h
>> +++ b/tools/libxc/include/xc_dom.h
>> @@ -29,6 +29,7 @@ struct xc_dom_seg {
>>       xen_vaddr_t vstart;
>>       xen_vaddr_t vend;
>>       xen_pfn_t pfn;
>> +    xen_pfn_t pages;
>>   };
>>
>>   struct xc_dom_mem {
>> @@ -90,6 +91,7 @@ struct xc_dom_image {
>>       xen_pfn_t xenstore_pfn;
>>       xen_pfn_t shared_info_pfn;
>>       xen_pfn_t bootstack_pfn;
>> +    xen_pfn_t pfn_alloc_end;
>>       xen_vaddr_t virt_alloc_end;
>>       xen_vaddr_t bsd_symtab_start;
>>
>> @@ -177,7 +179,7 @@ struct xc_dom_image {
>>       /* kernel loader */
>>       struct xc_dom_arch *arch_hooks;
>>       /* allocate up to virt_alloc_end */
>
> I think you need to update this comment, too.

Hmm, yes.

>
>> -    int (*allocate) (struct xc_dom_image * dom, xen_vaddr_t up_to);
>> +    int (*allocate) (struct xc_dom_image * dom);
>>
>>       /* Container type (HVM or PV). */
>>       enum {
>> @@ -361,14 +363,11 @@ static inline void *xc_dom_seg_to_ptr_pages(struct xc_dom_image *dom,
>>                                         struct xc_dom_seg *seg,
>>                                         xen_pfn_t *pages_out)
>>   {
>> -    xen_vaddr_t segsize = seg->vend - seg->vstart;
>> -    unsigned int page_size = XC_DOM_PAGE_SIZE(dom);
>> -    xen_pfn_t pages = (segsize + page_size - 1) / page_size;
>>       void *retval;
>>
>> -    retval = xc_dom_pfn_to_ptr(dom, seg->pfn, pages);
>> +    retval = xc_dom_pfn_to_ptr(dom, seg->pfn, seg->pages);
>>
>> -    *pages_out = retval ? pages : 0;
>> +    *pages_out = retval ? seg->pages : 0;
>>       return retval;
>>   }
>>
>> diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
>> index fbe4464..b1d7890 100644
>> --- a/tools/libxc/xc_dom_core.c
>> +++ b/tools/libxc/xc_dom_core.c
>> @@ -535,56 +535,75 @@ void *xc_dom_pfn_to_ptr_retcount(struct xc_dom_image *dom, xen_pfn_t pfn,
>>       return phys->ptr;
>>   }
>>
>> -int xc_dom_alloc_segment(struct xc_dom_image *dom,
>> -                         struct xc_dom_seg *seg, char *name,
>> -                         xen_vaddr_t start, xen_vaddr_t size)
>> +static int xc_dom_chk_alloc_pages(struct xc_dom_image *dom, char *name,
>> +                                  xen_pfn_t pages)
>>   {
>>       unsigned int page_size = XC_DOM_PAGE_SIZE(dom);
>> -    xen_pfn_t pages = (size + page_size - 1) / page_size;
>> -    xen_pfn_t pfn;
>> -    void *ptr;
>>
>> -    if ( start == 0 )
>> -        start = dom->virt_alloc_end;
>> +    if ( pages > dom->total_pages || /* multiple test avoids overflow probs */
>> +         dom->pfn_alloc_end - dom->rambase_pfn > dom->total_pages ||
>> +         pages > dom->total_pages - dom->pfn_alloc_end + dom->rambase_pfn )
>> +    {
>> +        xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
>> +                     "%s: segment %s too large (0x%"PRIpfn" > "
>> +                     "0x%"PRIpfn" - 0x%"PRIpfn" pages)", __FUNCTION__, name,
>> +                     pages, dom->total_pages,
>> +                     dom->pfn_alloc_end - dom->rambase_pfn);
>> +        return -1;
>> +    }
>> +
>> +    dom->pfn_alloc_end += pages;
>> +    dom->virt_alloc_end += pages * page_size;
>>
>> -    if ( start & (page_size - 1) )
>> +    return 0;
>> +}
>> +
>> +static int xc_dom_alloc_pad(struct xc_dom_image *dom, xen_vaddr_t end)
>> +{
>> +    unsigned int page_size = XC_DOM_PAGE_SIZE(dom);
>> +    xen_pfn_t pages;
>> +
>> +    if ( end & (page_size - 1) )
>>       {
>>           xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
>>                        "%s: segment start isn't page aligned (0x%" PRIx64 ")",
>
> "segment end"?

No. This function is called to add a padding allocation before the start
of a new segment, which has to start at a page boundary. Maybe a comment
should clarify this. :-)
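
Something along these lines, perhaps (only a sketch of the comment I
have in mind):

    /*
     * Allocate pad pages up to the given address. "end" is the page
     * aligned start of the segment to be allocated next, not the end
     * of an already existing segment.
     */
    static int xc_dom_alloc_pad(struct xc_dom_image *dom, xen_vaddr_t end)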

>
>> -                     __FUNCTION__, start);
>> +                     __FUNCTION__, end);
>>           return -1;
>>       }
>> -    if ( start < dom->virt_alloc_end )
>> +    if ( end < dom->virt_alloc_end )
>>       {
>>           xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
>>                        "%s: segment start too low (0x%" PRIx64 " < 0x%" PRIx64
>
> Ditto.

Ditto. ;-)

>
> A major part of this patch looks like refactoring to me. And to the
> best of my knowledge it seems to be doing the right thing.

Thanks.


Juergen

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 6/9] libxc: create unmapped initrd in domain builder if supported
  2015-10-13 13:11 ` [PATCH v3 6/9] libxc: create unmapped initrd in domain builder if supported Juergen Gross
@ 2015-10-28 16:11   ` Wei Liu
  2015-10-28 17:07     ` Juergen Gross
  0 siblings, 1 reply; 31+ messages in thread
From: Wei Liu @ 2015-10-28 16:11 UTC (permalink / raw)
  To: Juergen Gross
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, ian.jackson,
	xen-devel, roger.pau

On Tue, Oct 13, 2015 at 03:11:15PM +0200, Juergen Gross wrote:
> In case the kernel of a new pv-domU indicates that it supports an
> unmapped initrd, don't waste precious virtual space for the initrd,
> but allocate only guest physical memory for it.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  tools/libxc/include/xc_dom.h |  5 +++++
>  tools/libxc/xc_dom_core.c    | 19 +++++++++++++++++--
>  tools/libxc/xc_dom_x86.c     |  8 ++++----
>  3 files changed, 26 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
> index b0120a6..fa772a9 100644
> --- a/tools/libxc/include/xc_dom.h
> +++ b/tools/libxc/include/xc_dom.h
> @@ -94,6 +94,11 @@ struct xc_dom_image {
>      xen_pfn_t pfn_alloc_end;
>      xen_vaddr_t virt_alloc_end;
>      xen_vaddr_t bsd_symtab_start;
> +
> +    /* initrd parameters as specified in start_info page */
> +    unsigned long initrd_start;
> +    unsigned long initrd_len;
> +
>      unsigned int alloc_bootstack;
>      xen_vaddr_t virt_pgtab_end;
>  
> diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
> index 8e1e17f..15e9fa3 100644
> --- a/tools/libxc/xc_dom_core.c
> +++ b/tools/libxc/xc_dom_core.c
> @@ -1041,6 +1041,7 @@ static int xc_dom_build_ramdisk(struct xc_dom_image *dom)
>  int xc_dom_build_image(struct xc_dom_image *dom)
>  {
>      unsigned int page_size;
> +    bool unmapped_initrd;
>  
>      DOMPRINTF_CALLED(dom->xch);
>  
> @@ -1064,11 +1065,15 @@ int xc_dom_build_image(struct xc_dom_image *dom)
>      if ( dom->kernel_loader->loader(dom) != 0 )
>          goto err;
>  
> -    /* load ramdisk */
> -    if ( dom->ramdisk_blob )
> +    /* Load ramdisk if initial mapping required. */
> +    unmapped_initrd = dom->parms.unmapped_initrd && !dom->ramdisk_seg.vstart;

A minor suggestion: the comment is describing the reverse logic of the
statement that immediately follows it; can you make the comment and
statement match?

I think I managed to work this out: if ramdisk_seg.vstart is set, that
means the upper layer wants the ramdisk to be mapped at that address (ARM is
doing that), so in that case even if the kernel supports an unmapped initrd
we should still map the ramdisk. Correct me if I'm wrong.
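
Spelled out with comments (the expression itself is the one from the
hunk above):

    /* Use an unmapped initrd only if the kernel can cope with it AND
     * nobody has pinned the ramdisk to a fixed virtual address. */
    unmapped_initrd = dom->parms.unmapped_initrd && !dom->ramdisk_seg.vstart;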

The rest of the code looks correct. With that understanding (and whether
you follow my minor suggestion or not):

Acked-by: Wei Liu <wei.liu2@citrix.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 7/9] libxc: split p2m allocation in domain builder from other magic pages
  2015-10-13 13:11 ` [PATCH v3 7/9] libxc: split p2m allocation in domain builder from other magic pages Juergen Gross
@ 2015-10-28 16:11   ` Wei Liu
  0 siblings, 0 replies; 31+ messages in thread
From: Wei Liu @ 2015-10-28 16:11 UTC (permalink / raw)
  To: Juergen Gross
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, ian.jackson,
	xen-devel, roger.pau

On Tue, Oct 13, 2015 at 03:11:16PM +0200, Juergen Gross wrote:
> Carve out the p2m list allocation from the .alloc_magic_pages hook of
> the domain builder in order to prepare allocating the p2m list outside
> of the initial kernel mapping. This will be needed to support loading
> domains with huge memory (>512 GB).
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>
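
As I read it, the x86 hook tables end up along these lines (a sketch,
not the literal diff):

    static struct xc_dom_arch xc_dom_64 = {
        ...
        .alloc_magic_pages = alloc_magic_pages, /* no longer covers the p2m */
        .alloc_p2m_list    = alloc_p2m_list,    /* p2m list has its own hook */
        ...
    };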

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 1/9] libxc: reorganize domain builder guest memory allocator
  2015-10-28 15:51     ` Juergen Gross
@ 2015-10-28 16:21       ` Wei Liu
  2015-10-28 17:05         ` Juergen Gross
  0 siblings, 1 reply; 31+ messages in thread
From: Wei Liu @ 2015-10-28 16:21 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Wei Liu, Ian.Campbell, stefano.stabellini, ian.jackson, xen-devel,
	roger.pau

On Wed, Oct 28, 2015 at 04:51:05PM +0100, Juergen Gross wrote:
[...]
> >>+static int xc_dom_alloc_pad(struct xc_dom_image *dom, xen_vaddr_t end)
> >>+{
> >>+    unsigned int page_size = XC_DOM_PAGE_SIZE(dom);
> >>+    xen_pfn_t pages;
> >>+
> >>+    if ( end & (page_size - 1) )
> >>      {
> >>          xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
> >>                       "%s: segment start isn't page aligned (0x%" PRIx64 ")",
> >
> >"segment end"?
> 
> No. This function is called to add a padding allocation before the start
> of a new segment, which has to start at a page boundary. Maybe a comment
> should clarify this. :-)
> 

Heh, I worked out that this function was used to add a padding segment,
but I was confused by the dissonance.

A simpler solution might be to just change "end" to "start"? I think it is
sensible because it is the start of the padding.

In any case, it's up to you. :-)

Wei.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 1/9] libxc: reorganize domain builder guest memory allocator
  2015-10-28 16:21       ` Wei Liu
@ 2015-10-28 17:05         ` Juergen Gross
  0 siblings, 0 replies; 31+ messages in thread
From: Juergen Gross @ 2015-10-28 17:05 UTC (permalink / raw)
  To: Wei Liu; +Cc: roger.pau, stefano.stabellini, ian.jackson, Ian.Campbell,
	xen-devel

On 10/28/2015 05:21 PM, Wei Liu wrote:
> On Wed, Oct 28, 2015 at 04:51:05PM +0100, Juergen Gross wrote:
> [...]
>>>> +static int xc_dom_alloc_pad(struct xc_dom_image *dom, xen_vaddr_t end)
>>>> +{
>>>> +    unsigned int page_size = XC_DOM_PAGE_SIZE(dom);
>>>> +    xen_pfn_t pages;
>>>> +
>>>> +    if ( end & (page_size - 1) )
>>>>       {
>>>>           xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
>>>>                        "%s: segment start isn't page aligned (0x%" PRIx64 ")",
>>>
>>> "segment end"?
>>
>> No. This function is called to add a padding allocation before the start
>> of a new segment, which has to start at page boundary. Maybe a comment
>> should clarify this. :-)
>>
>
> Heh, I worked out that this function was used to add a padding segment,
> but I was confused by the dissonance.
>
> A simpler solution might be to just change "end" to "start"? I think it is
> sensible because it is the start of the padding.

Hmm, or use "boundary" for the variable and in the message. :-)
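
I.e. roughly (illustrative only):

    static int xc_dom_alloc_pad(struct xc_dom_image *dom, xen_vaddr_t boundary)
    {
        unsigned int page_size = XC_DOM_PAGE_SIZE(dom);

        if ( boundary & (page_size - 1) )
        {
            xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
                         "%s: pad boundary isn't page aligned (0x%" PRIx64 ")",
                         __FUNCTION__, boundary);
            return -1;
        }
        ...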


Juergen

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 6/9] libxc: create unmapped initrd in domain builder if supported
  2015-10-28 16:11   ` Wei Liu
@ 2015-10-28 17:07     ` Juergen Gross
  0 siblings, 0 replies; 31+ messages in thread
From: Juergen Gross @ 2015-10-28 17:07 UTC (permalink / raw)
  To: Wei Liu; +Cc: roger.pau, stefano.stabellini, ian.jackson, Ian.Campbell,
	xen-devel

On 10/28/2015 05:11 PM, Wei Liu wrote:
> On Tue, Oct 13, 2015 at 03:11:15PM +0200, Juergen Gross wrote:
>> In case the kernel of a new pv-domU indicates that it supports an
>> unmapped initrd, don't waste precious virtual space for the initrd,
>> but allocate only guest physical memory for it.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   tools/libxc/include/xc_dom.h |  5 +++++
>>   tools/libxc/xc_dom_core.c    | 19 +++++++++++++++++--
>>   tools/libxc/xc_dom_x86.c     |  8 ++++----
>>   3 files changed, 26 insertions(+), 6 deletions(-)
>>
>> diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
>> index b0120a6..fa772a9 100644
>> --- a/tools/libxc/include/xc_dom.h
>> +++ b/tools/libxc/include/xc_dom.h
>> @@ -94,6 +94,11 @@ struct xc_dom_image {
>>       xen_pfn_t pfn_alloc_end;
>>       xen_vaddr_t virt_alloc_end;
>>       xen_vaddr_t bsd_symtab_start;
>> +
>> +    /* initrd parameters as specified in start_info page */
>> +    unsigned long initrd_start;
>> +    unsigned long initrd_len;
>> +
>>       unsigned int alloc_bootstack;
>>       xen_vaddr_t virt_pgtab_end;
>>
>> diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
>> index 8e1e17f..15e9fa3 100644
>> --- a/tools/libxc/xc_dom_core.c
>> +++ b/tools/libxc/xc_dom_core.c
>> @@ -1041,6 +1041,7 @@ static int xc_dom_build_ramdisk(struct xc_dom_image *dom)
>>   int xc_dom_build_image(struct xc_dom_image *dom)
>>   {
>>       unsigned int page_size;
>> +    bool unmapped_initrd;
>>
>>       DOMPRINTF_CALLED(dom->xch);
>>
>> @@ -1064,11 +1065,15 @@ int xc_dom_build_image(struct xc_dom_image *dom)
>>       if ( dom->kernel_loader->loader(dom) != 0 )
>>           goto err;
>>
>> -    /* load ramdisk */
>> -    if ( dom->ramdisk_blob )
>> +    /* Load ramdisk if initial mapping required. */
>> +    unmapped_initrd = dom->parms.unmapped_initrd && !dom->ramdisk_seg.vstart;
>
> A minor suggestion: the comment is describing the reverse logic of the
> statement that immediately follows it; can you make the comment and
> statement match?

Sure.

> I think I managed to work this out: if ramdisk_seg.vstart is set, that
> means the upper layer wants the ramdisk to be mapped at that address (ARM is
> doing that), so in that case even if the kernel supports an unmapped initrd
> we should still map the ramdisk. Correct me if I'm wrong.

That's how it was meant to be.

> The rest of the code looks correct. With that understanding (and whether
> you follow my minor suggestion or not):
>
> Acked-by: Wei Liu <wei.liu2@citrix.com>
>

Thanks,

Juergen

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 8/9] libxc: rework of domain builder's page table handler
  2015-10-13 13:11 ` [PATCH v3 8/9] libxc: rework of domain builder's page table handler Juergen Gross
@ 2015-10-29 12:48   ` Wei Liu
  2015-10-29 13:18     ` Juergen Gross
  0 siblings, 1 reply; 31+ messages in thread
From: Wei Liu @ 2015-10-29 12:48 UTC (permalink / raw)
  To: Juergen Gross
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, ian.jackson,
	xen-devel, roger.pau

On Tue, Oct 13, 2015 at 03:11:17PM +0200, Juergen Gross wrote:
> In order to prepare a p2m list outside of the initial kernel mapping,
> do a rework of the domain builder's page table handler. The goal is
> to be able to use common helpers for page table allocation and setup
> for initial kernel page tables and page tables mapping the p2m list.
> This is achieved by supporting multiple mapping areas. The mapped
> virtual addresses of the individual areas must not overlap, while the
> page tables of a newly added area might already be partially present.
> In particular, the top level page table exists only once, of course.
> 

Currently the number of mappings is restricted to 1, because the only
mapping for now is the initial mapping created by the toolstack. There
should be no behaviour change and no guest-visible change introduced.

If my understanding is correct, can you please add that to the commit
message?
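
I.e. (if I read the patch right) for now it keeps

    #define MAPPING_MAX 1    /* only the initial kernel mapping */

and a later patch can raise the limit once the p2m list gets a mapping
of its own.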

Given this is a particularly thorny area, a second pair of eyes would be
much appreciated.

Some comments below.

> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  tools/libxc/xc_dom_x86.c | 404 ++++++++++++++++++++++++++++-------------------
>  1 file changed, 240 insertions(+), 164 deletions(-)
> 
> diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
> index c815e10..333ef6b 100644
> --- a/tools/libxc/xc_dom_x86.c
> +++ b/tools/libxc/xc_dom_x86.c
> @@ -65,17 +65,27 @@
>  #define NR_IOREQ_SERVER_PAGES 8
>  #define ioreq_server_pfn(x) (special_pfn(0) - NR_IOREQ_SERVER_PAGES + (x))
>  
> -#define bits_to_mask(bits)       (((xen_vaddr_t)1 << (bits))-1)
> +#define bits_to_mask(bits)       (((xen_vaddr_t)1 << (bits)) - 1)

Whitespace-only change here.

>  #define round_down(addr, mask)   ((addr) & ~(mask))
>  #define round_up(addr, mask)     ((addr) | (mask))
>  
> -struct xc_dom_image_x86 {
> -    /* initial page tables */
> +struct xc_dom_x86_mapping_lvl {
> +    xen_vaddr_t from;
> +    xen_vaddr_t to;
> +    xen_pfn_t pfn;
>      unsigned int pgtables;
> -    unsigned int pg_l4;
> -    unsigned int pg_l3;
> -    unsigned int pg_l2;
> -    unsigned int pg_l1;
> +};
> +
> +struct xc_dom_x86_mapping {
> +    struct xc_dom_x86_mapping_lvl area;
> +    struct xc_dom_x86_mapping_lvl lvls[4];
> +    xen_pfn_t pfn_start;

This is unused throughout the patch and the next patch. You can delete it.

> +};
> +
> +struct xc_dom_image_x86 {
> +    unsigned n_mappings;
> +#define MAPPING_MAX 1
> +    struct xc_dom_x86_mapping maps[MAPPING_MAX];
>  };
>  
>  /* get guest IO ABI protocol */
> @@ -105,43 +115,107 @@ const char *xc_domain_get_native_protocol(xc_interface *xch,
>      return protocol;
>  }
>  
> -static unsigned long
> -nr_page_tables(struct xc_dom_image *dom,
> -               xen_vaddr_t start, xen_vaddr_t end, unsigned long bits)
> +static void
> +nr_page_tables(struct xc_dom_image *dom, int lvl,
> +               xen_vaddr_t from, xen_vaddr_t to, unsigned long bits)
>  {
>      xen_vaddr_t mask = bits_to_mask(bits);
> -    int tables;
> +    struct xc_dom_image_x86 *domx86 = dom->arch_private;
> +    struct xc_dom_x86_mapping *map = domx86->maps + domx86->n_mappings;
> +    struct xc_dom_x86_mapping *map_cmp;
> +    unsigned map_n;
>  
>      if ( bits == 0 )
> -        return 0;  /* unused */
> +        return;  /* unused */
> +
> +    if ( lvl == 3 && domx86->n_mappings != 0 )
> +        return;  /* Top level page table already in first mapping. */
>  
>      if ( bits == (8 * sizeof(unsigned long)) )
>      {
> -        /* must be pgd, need one */
> -        start = 0;
> -        end = -1;
> -        tables = 1;
> +        /* 32 bit top level page table special case */
> +        map->lvls[lvl].from = 0;
> +        map->lvls[lvl].to = -1;
> +        map->lvls[lvl].pgtables = 1;
> +        goto done;
>      }
> -    else
> +
> +    from = round_down(from, mask);
> +    to = round_up(to, mask);
> +
> +    for ( map_n = 0, map_cmp = domx86->maps; map_n < domx86->n_mappings;
> +          map_n++, map_cmp++ )
>      {
> -        start = round_down(start, mask);
> -        end = round_up(end, mask);
> -        tables = ((end - start) >> bits) + 1;
> +        if ( map_cmp->lvls[lvl].from == map_cmp->lvls[lvl].to )
> +            continue;
> +        if ( from >= map_cmp->lvls[lvl].from && to <= map_cmp->lvls[lvl].to )
> +            return;  /* Area already completely covered on this level. */
> +        if ( from >= map_cmp->lvls[lvl].from && from <= map_cmp->lvls[lvl].to )
> +            from = map_cmp->lvls[lvl].to + 1;
> +        if ( to >= map_cmp->lvls[lvl].from && to <= map_cmp->lvls[lvl].to )
> +            to = map_cmp->lvls[lvl].from - 1;

> Is it possible that a later mapping covers the previous ones? How is
> that handled?

>      }
>  
> +    map->lvls[lvl].from = from;
> +    map->lvls[lvl].to = to;
> +    map->lvls[lvl].pgtables = ((to - from) >> bits) + 1;
> +
> + done:
>      DOMPRINTF("%s: 0x%016" PRIx64 "/%ld: 0x%016" PRIx64
> -              " -> 0x%016" PRIx64 ", %d table(s)",
> -              __FUNCTION__, mask, bits, start, end, tables);
> -    return tables;
> +              " -> 0x%016" PRIx64 ", %d table(s)", __FUNCTION__, mask, bits,
> +              map->lvls[lvl].from, map->lvls[lvl].to, map->lvls[lvl].pgtables);
>  }
>  
> -static int alloc_pgtables(struct xc_dom_image *dom, int pae,
> -                          int l4_bits, int l3_bits, int l2_bits, int l1_bits)
> +static int count_pgtables(struct xc_dom_image *dom, int pae, int bits[4],
> +                          xen_vaddr_t from, xen_vaddr_t to, xen_pfn_t pfn)
> +{
> +    struct xc_dom_image_x86 *domx86 = dom->arch_private;
> +    struct xc_dom_x86_mapping *map;
> +    xen_pfn_t pfn_end;
> +    int level;
> +
> +    if ( domx86->n_mappings == MAPPING_MAX )
> +    {
> +        xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
> +                     "%s: too many mappings\n", __FUNCTION__);
> +        return -ENOMEM;
> +    }
> +    map = domx86->maps + domx86->n_mappings;
> +
> +    pfn_end = pfn + ((to - from) >> PAGE_SHIFT_X86);
> +    if ( pfn_end >= dom->p2m_size )
> +    {
> +        xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
> +                     "%s: not enough memory for initial mapping (%#"PRIpfn" > %#"PRIpfn")",
> +                     __FUNCTION__, pfn_end, dom->p2m_size);
> +        return -ENOMEM;
> +    }
> +
> +    memset(map, 0, sizeof(*map));
> +
> +    map->area.from = from;
> +    map->area.to = to;
> +    for (level = 3; level >= 0; level--)
> +    {
> +        map->lvls[level].pfn = pfn + map->area.pgtables;
> +        nr_page_tables(dom, level, from, to, bits[level]);
> +        if ( pae && to < 0xc0000000 && level == 1)
> +        {
> +            DOMPRINTF("%s: PAE: extra l2 page table for l3#3", __FUNCTION__);
> +            map->lvls[level].pgtables++;
> +        }
> +        map->area.pgtables += map->lvls[level].pgtables;
> +    }
> +
> +    return 0;
> +}
> +
> +static int alloc_pgtables(struct xc_dom_image *dom, int pae, int bits[4])
>  {
>      int pages, extra_pages;
>      xen_vaddr_t try_virt_end;
> -    xen_pfn_t try_pfn_end;
>      struct xc_dom_image_x86 *domx86 = dom->arch_private;
> +    struct xc_dom_x86_mapping *map = domx86->maps + domx86->n_mappings;
>  
>      extra_pages = dom->alloc_bootstack ? 1 : 0;
>      extra_pages += 128; /* 512kB padding */

Hmm... Not really related to this patch: Should this be derived from
PAGE_SIZE_X86 as well?

The rest looks OK to me.

Wei.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 9/9] libxc: create p2m list outside of kernel mapping if supported
  2015-10-13 13:11 ` [PATCH v3 9/9] libxc: create p2m list outside of kernel mapping if supported Juergen Gross
@ 2015-10-29 13:07   ` Wei Liu
  2015-10-29 13:19     ` Juergen Gross
  0 siblings, 1 reply; 31+ messages in thread
From: Wei Liu @ 2015-10-29 13:07 UTC (permalink / raw)
  To: Juergen Gross
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, ian.jackson,
	xen-devel, roger.pau

On Tue, Oct 13, 2015 at 03:11:18PM +0200, Juergen Gross wrote:
[...]
>   err:
> diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
> index 333ef6b..0847761 100644
> --- a/tools/libxc/xc_dom_x86.c
> +++ b/tools/libxc/xc_dom_x86.c
> @@ -68,6 +68,8 @@
>  #define bits_to_mask(bits)       (((xen_vaddr_t)1 << (bits)) - 1)
>  #define round_down(addr, mask)   ((addr) & ~(mask))
>  #define round_up(addr, mask)     ((addr) | (mask))
> +#define round_pg(addr)    (((addr) + PAGE_SIZE_X86 - 1) & ~(PAGE_SIZE_X86 - 1))

Minor suggestion: rename this to round_pgup to make it more explicit?

> +#define round_pfn(addr)   (((addr) + PAGE_SIZE_X86 - 1) / PAGE_SIZE_X86)
>  

And this to round_pfnup?

>  struct xc_dom_x86_mapping_lvl {
>      xen_vaddr_t from;
> @@ -84,7 +86,7 @@ struct xc_dom_x86_mapping {
>  
>  struct xc_dom_image_x86 {
>      unsigned n_mappings;
> -#define MAPPING_MAX 1
> +#define MAPPING_MAX 2
>      struct xc_dom_x86_mapping maps[MAPPING_MAX];
>  };
>  
> @@ -536,6 +538,7 @@ static int setup_pgtables_x86_64(struct xc_dom_image *dom)
>              }
>          }
>      }
> +

Stray blank line.

Other than that:

Acked-by: Wei Liu <wei.liu2@citrix.com>

Wei.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 8/9] libxc: rework of domain builder's page table handler
  2015-10-29 12:48   ` Wei Liu
@ 2015-10-29 13:18     ` Juergen Gross
  2015-10-29 14:02       ` Wei Liu
  0 siblings, 1 reply; 31+ messages in thread
From: Juergen Gross @ 2015-10-29 13:18 UTC (permalink / raw)
  To: Wei Liu; +Cc: roger.pau, stefano.stabellini, ian.jackson, Ian.Campbell,
	xen-devel

On 10/29/2015 01:48 PM, Wei Liu wrote:
> On Tue, Oct 13, 2015 at 03:11:17PM +0200, Juergen Gross wrote:
>> In order to prepare a p2m list outside of the initial kernel mapping,
>> rework the domain builder's page table handler. The goal is to be
>> able to use common helpers for page table allocation and setup, both
>> for the initial kernel page tables and for the page tables mapping
>> the p2m list. This is achieved by supporting multiple mapping areas.
>> The mapped virtual addresses of the individual areas must not
>> overlap, while the page tables of a newly added area might already be
>> partially present. The top level page table in particular exists only
>> once, of course.
>>
>
> Currently restrict the number of mappings to 1 because the only
> mapping for now is the initial one created by the toolstack. There
> should be no behavioural or guest visible change introduced.
>
> If my understanding is correct, can you please add that to the commit
> message?

Sure.

I'm currently thinking about changing this patch even further. While
doing similar work in grub-xen I found the page table building there
to be much more generic and compact. Instead of open-coding the
individual page table levels, everything is done in a loop over those
levels. The main loop consists of only 33 lines (and this is after
adding support for multiple mapping areas)!
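
Roughly like this (sketched from memory, not the actual grub code; the
pgtab_*() helpers, dom_p2m() and prot() are invented names just for
illustration):

    /* One generic loop over all levels instead of open coded per-level
     * handling; bits[] holds the virtual address shift per level. */
    for ( lvl = max_lvl; lvl >= 1; lvl-- )
    {
        entry_size = (xen_vaddr_t)1 << bits[lvl - 1];
        for ( va = round_down(from, entry_size - 1); va <= to;
              va += entry_size )
        {
            tab = pgtab_get(dom, lvl, va);  /* allocates table if missing */
            idx = pgtab_index(va, lvl);
            if ( tab[idx] )
                continue;                   /* entry already populated */
            pfn = (lvl > 1) ? pgtab_pfn(dom, lvl - 1, va) /* link levels */
                            : dom_p2m(dom, va);           /* map data page */
            tab[idx] = pfn_to_paddr(pfn) | prot(dom, lvl, va);
        }
    }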

What do you think?

> Given this is a particularly thorny area, a second pair of eyes would
> be much appreciated.

On the positive side: any bug in this code should be really easy to
spot, as the domain wouldn't be able to work reliably (this was at least
my experience when developing this patch and the related one in grub).

> Some comments below.
>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   tools/libxc/xc_dom_x86.c | 404 ++++++++++++++++++++++++++++-------------------
>>   1 file changed, 240 insertions(+), 164 deletions(-)
>>
>> diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
>> index c815e10..333ef6b 100644
>> --- a/tools/libxc/xc_dom_x86.c
>> +++ b/tools/libxc/xc_dom_x86.c
>> @@ -65,17 +65,27 @@
>>   #define NR_IOREQ_SERVER_PAGES 8
>>   #define ioreq_server_pfn(x) (special_pfn(0) - NR_IOREQ_SERVER_PAGES + (x))
>>
>> -#define bits_to_mask(bits)       (((xen_vaddr_t)1 << (bits))-1)
>> +#define bits_to_mask(bits)       (((xen_vaddr_t)1 << (bits)) - 1)
>
> Whitespace-only change here.

Oops, sorry.

>
>>   #define round_down(addr, mask)   ((addr) & ~(mask))
>>   #define round_up(addr, mask)     ((addr) | (mask))
>>
>> -struct xc_dom_image_x86 {
>> -    /* initial page tables */
>> +struct xc_dom_x86_mapping_lvl {
>> +    xen_vaddr_t from;
>> +    xen_vaddr_t to;
>> +    xen_pfn_t pfn;
>>       unsigned int pgtables;
>> -    unsigned int pg_l4;
>> -    unsigned int pg_l3;
>> -    unsigned int pg_l2;
>> -    unsigned int pg_l1;
>> +};
>> +
>> +struct xc_dom_x86_mapping {
>> +    struct xc_dom_x86_mapping_lvl area;
>> +    struct xc_dom_x86_mapping_lvl lvls[4];
>> +    xen_pfn_t pfn_start;
>
> This is unused throughout the patch and the next patch. You can delete it.

Indeed.

>
>> +};
>> +
>> +struct xc_dom_image_x86 {
>> +    unsigned n_mappings;
>> +#define MAPPING_MAX 1
>> +    struct xc_dom_x86_mapping maps[MAPPING_MAX];
>>   };
>>
>>   /* get guest IO ABI protocol */
>> @@ -105,43 +115,107 @@ const char *xc_domain_get_native_protocol(xc_interface *xch,
>>       return protocol;
>>   }
>>
>> -static unsigned long
>> -nr_page_tables(struct xc_dom_image *dom,
>> -               xen_vaddr_t start, xen_vaddr_t end, unsigned long bits)
>> +static void
>> +nr_page_tables(struct xc_dom_image *dom, int lvl,
>> +               xen_vaddr_t from, xen_vaddr_t to, unsigned long bits)
>>   {
>>       xen_vaddr_t mask = bits_to_mask(bits);
>> -    int tables;
>> +    struct xc_dom_image_x86 *domx86 = dom->arch_private;
>> +    struct xc_dom_x86_mapping *map = domx86->maps + domx86->n_mappings;
>> +    struct xc_dom_x86_mapping *map_cmp;
>> +    unsigned map_n;
>>
>>       if ( bits == 0 )
>> -        return 0;  /* unused */
>> +        return;  /* unused */
>> +
>> +    if ( lvl == 3 && domx86->n_mappings != 0 )
>> +        return;  /* Top level page table already in first mapping. */
>>
>>       if ( bits == (8 * sizeof(unsigned long)) )
>>       {
>> -        /* must be pgd, need one */
>> -        start = 0;
>> -        end = -1;
>> -        tables = 1;
>> +        /* 32 bit top level page table special case */
>> +        map->lvls[lvl].from = 0;
>> +        map->lvls[lvl].to = -1;
>> +        map->lvls[lvl].pgtables = 1;
>> +        goto done;
>>       }
>> -    else
>> +
>> +    from = round_down(from, mask);
>> +    to = round_up(to, mask);
>> +
>> +    for ( map_n = 0, map_cmp = domx86->maps; map_n < domx86->n_mappings;
>> +          map_n++, map_cmp++ )
>>       {
>> -        start = round_down(start, mask);
>> -        end = round_up(end, mask);
>> -        tables = ((end - start) >> bits) + 1;
>> +        if ( map_cmp->lvls[lvl].from == map_cmp->lvls[lvl].to )
>> +            continue;
>> +        if ( from >= map_cmp->lvls[lvl].from && to <= map_cmp->lvls[lvl].to )
>> +            return;  /* Area already completely covered on this level. */
>> +        if ( from >= map_cmp->lvls[lvl].from && from <= map_cmp->lvls[lvl].to )
>> +            from = map_cmp->lvls[lvl].to + 1;
>> +        if ( to >= map_cmp->lvls[lvl].from && to <= map_cmp->lvls[lvl].to )
>> +            to = map_cmp->lvls[lvl].from - 1;
>
> Is it possible that later mapping covers the previous ones? How is that
> handled?

I'm not sure I understand your question.

In case you are asking whether different mappings are allowed to overlap
in terms of virtual addresses: no. I haven't added any checking code to
ensure this. I can do it, if you want.

>
>>       }
>>
>> +    map->lvls[lvl].from = from;
>> +    map->lvls[lvl].to = to;
>> +    map->lvls[lvl].pgtables = ((to - from) >> bits) + 1;
>> +
>> + done:
>>       DOMPRINTF("%s: 0x%016" PRIx64 "/%ld: 0x%016" PRIx64
>> -              " -> 0x%016" PRIx64 ", %d table(s)",
>> -              __FUNCTION__, mask, bits, start, end, tables);
>> -    return tables;
>> +              " -> 0x%016" PRIx64 ", %d table(s)", __FUNCTION__, mask, bits,
>> +              map->lvls[lvl].from, map->lvls[lvl].to, map->lvls[lvl].pgtables);
>>   }
>>
>> -static int alloc_pgtables(struct xc_dom_image *dom, int pae,
>> -                          int l4_bits, int l3_bits, int l2_bits, int l1_bits)
>> +static int count_pgtables(struct xc_dom_image *dom, int pae, int bits[4],
>> +                          xen_vaddr_t from, xen_vaddr_t to, xen_pfn_t pfn)
>> +{
>> +    struct xc_dom_image_x86 *domx86 = dom->arch_private;
>> +    struct xc_dom_x86_mapping *map;
>> +    xen_pfn_t pfn_end;
>> +    int level;
>> +
>> +    if ( domx86->n_mappings == MAPPING_MAX )
>> +    {
>> +        xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
>> +                     "%s: too many mappings\n", __FUNCTION__);
>> +        return -ENOMEM;
>> +    }
>> +    map = domx86->maps + domx86->n_mappings;
>> +
>> +    pfn_end = pfn + ((to - from) >> PAGE_SHIFT_X86);
>> +    if ( pfn_end >= dom->p2m_size )
>> +    {
>> +        xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
>> +                     "%s: not enough memory for initial mapping (%#"PRIpfn" > %#"PRIpfn")",
>> +                     __FUNCTION__, pfn_end, dom->p2m_size);
>> +        return -ENOMEM;
>> +    }
>> +
>> +    memset(map, 0, sizeof(*map));
>> +
>> +    map->area.from = from;
>> +    map->area.to = to;
>> +    for (level = 3; level >= 0; level--)
>> +    {
>> +        map->lvls[level].pfn = pfn + map->area.pgtables;
>> +        nr_page_tables(dom, level, from, to, bits[level]);
>> +        if ( pae && to < 0xc0000000 && level == 1)
>> +        {
>> +            DOMPRINTF("%s: PAE: extra l2 page table for l3#3", __FUNCTION__);
>> +            map->lvls[level].pgtables++;
>> +        }
>> +        map->area.pgtables += map->lvls[level].pgtables;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int alloc_pgtables(struct xc_dom_image *dom, int pae, int bits[4])
>>   {
>>       int pages, extra_pages;
>>       xen_vaddr_t try_virt_end;
>> -    xen_pfn_t try_pfn_end;
>>       struct xc_dom_image_x86 *domx86 = dom->arch_private;
>> +    struct xc_dom_x86_mapping *map = domx86->maps + domx86->n_mappings;
>>
>>       extra_pages = dom->alloc_bootstack ? 1 : 0;
>>       extra_pages += 128; /* 512kB padding */
>
> Hmm... Not really related to this patch: Should this be derived from
> PAGE_SIZE_X86 as well?

Probably, yes.
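
E.g. something along these lines (untested):

    extra_pages += 512 * 1024 / PAGE_SIZE_X86;  /* 512kB padding */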

> The rest looks OK to me.


Thanks,

Juergen

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 9/9] libxc: create p2m list outside of kernel mapping if supported
  2015-10-29 13:07   ` Wei Liu
@ 2015-10-29 13:19     ` Juergen Gross
  0 siblings, 0 replies; 31+ messages in thread
From: Juergen Gross @ 2015-10-29 13:19 UTC (permalink / raw)
  To: Wei Liu; +Cc: roger.pau, stefano.stabellini, ian.jackson, Ian.Campbell,
	xen-devel

On 10/29/2015 02:07 PM, Wei Liu wrote:
> On Tue, Oct 13, 2015 at 03:11:18PM +0200, Juergen Gross wrote:
> [...]
>>    err:
>> diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
>> index 333ef6b..0847761 100644
>> --- a/tools/libxc/xc_dom_x86.c
>> +++ b/tools/libxc/xc_dom_x86.c
>> @@ -68,6 +68,8 @@
>>   #define bits_to_mask(bits)       (((xen_vaddr_t)1 << (bits)) - 1)
>>   #define round_down(addr, mask)   ((addr) & ~(mask))
>>   #define round_up(addr, mask)     ((addr) | (mask))
>> +#define round_pg(addr)    (((addr) + PAGE_SIZE_X86 - 1) & ~(PAGE_SIZE_X86 - 1))
>
> Minor suggestion: rename this to round_pgup to make it more explicit?

Yes, this is better.

>
>> +#define round_pfn(addr)   (((addr) + PAGE_SIZE_X86 - 1) / PAGE_SIZE_X86)
>>
>
> And this to round_pfnup?

Yes.
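
I.e. (the existing definitions, just renamed):

    #define round_pgup(addr)  (((addr) + PAGE_SIZE_X86 - 1) & ~(PAGE_SIZE_X86 - 1))
    #define round_pfnup(addr) (((addr) + PAGE_SIZE_X86 - 1) / PAGE_SIZE_X86)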

>
>>   struct xc_dom_x86_mapping_lvl {
>>       xen_vaddr_t from;
>> @@ -84,7 +86,7 @@ struct xc_dom_x86_mapping {
>>
>>   struct xc_dom_image_x86 {
>>       unsigned n_mappings;
>> -#define MAPPING_MAX 1
>> +#define MAPPING_MAX 2
>>       struct xc_dom_x86_mapping maps[MAPPING_MAX];
>>   };
>>
>> @@ -536,6 +538,7 @@ static int setup_pgtables_x86_64(struct xc_dom_image *dom)
>>               }
>>           }
>>       }
>> +
>
> Stray blank line.

Will remove.

>
> Other than that:
>
> Acked-by: Wei Liu <wei.liu2@citrix.com>


Thanks,

Juergen

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 8/9] libxc: rework of domain builder's page table handler
  2015-10-29 13:18     ` Juergen Gross
@ 2015-10-29 14:02       ` Wei Liu
  2015-10-29 14:13         ` Juergen Gross
  0 siblings, 1 reply; 31+ messages in thread
From: Wei Liu @ 2015-10-29 14:02 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Wei Liu, Ian.Campbell, stefano.stabellini, ian.jackson, xen-devel,
	roger.pau

On Thu, Oct 29, 2015 at 02:18:31PM +0100, Juergen Gross wrote:
> On 10/29/2015 01:48 PM, Wei Liu wrote:
> >On Tue, Oct 13, 2015 at 03:11:17PM +0200, Juergen Gross wrote:
> >>In order to prepare a p2m list outside of the initial kernel
> >>mapping, rework the domain builder's page table handler. The goal is
> >>to be able to use common helpers for page table allocation and
> >>setup, both for the initial kernel page tables and for the page
> >>tables mapping the p2m list. This is achieved by supporting multiple
> >>mapping areas. The mapped virtual addresses of the individual areas
> >>must not overlap, while the page tables of a newly added area might
> >>already be partially present. The top level page table in particular
> >>exists only once, of course.
> >>
> >
> >Currently restrict the number of mappings to 1 because the only
> >mapping for now is the initial one created by the toolstack. There
> >should be no behavioural or guest visible change introduced.
> >
> >If my understanding is correct, can you please add that to the commit
> >message?
> 
> Sure.
> 
> I'm currently thinking about changing this patch even further. While
> doing similar work in grub-xen I found the page table building there
> to be much more generic and compact. Instead of open-coding the
> individual page table levels, everything is done in a loop over those
> levels. The main loop consists of only 33 lines (and this is after
> adding support for multiple mapping areas)!
> 
> What do you think?
> 

That's of course OK. It's patch reviewing for me anyway. One patch is as
good as another.

> >>>Given this is a particularly thorny area, a second pair of eyes
> >>>would be much appreciated.
> 
> On the positive side: any bug in this code should be really easy to
> spot, as the domain wouldn't be able to work reliably (this was at least
> my experience when developing this patch and the related one in grub).
> 
> >Some comments below.
> >
[...]
> >>+    from = round_down(from, mask);
> >>+    to = round_up(to, mask);
> >>+
> >>+    for ( map_n = 0, map_cmp = domx86->maps; map_n < domx86->n_mappings;
> >>+          map_n++, map_cmp++ )
> >>      {
> >>-        start = round_down(start, mask);
> >>-        end = round_up(end, mask);
> >>-        tables = ((end - start) >> bits) + 1;
> >>+        if ( map_cmp->lvls[lvl].from == map_cmp->lvls[lvl].to )
> >>+            continue;
> >>+        if ( from >= map_cmp->lvls[lvl].from && to <= map_cmp->lvls[lvl].to )
> >>+            return;  /* Area already completely covered on this level. */
> >>+        if ( from >= map_cmp->lvls[lvl].from && from <= map_cmp->lvls[lvl].to )
> >>+            from = map_cmp->lvls[lvl].to + 1;
> >>+        if ( to >= map_cmp->lvls[lvl].from && to <= map_cmp->lvls[lvl].to )
> >>+            to = map_cmp->lvls[lvl].from - 1;
> >
> >Is it possible that a later mapping covers the previous ones? How is
> >that handled?
> 
> I'm not sure I understand your question.
> 
> In case you are asking whether different mappings are allowed to overlap
> in terms of virtual addresses: no. I haven't added any checking code to
> ensure this. I can do it, if you want.
> 

As I understand it, the sentence "different mappings are not allowed
to overlap" states the requirement, not the current status.

The above code snippet is used to ensure different mappings don't
overlap, but I sense there is one case missing:

   from < map_cmp->lvls[lvl].from && to > map_cmp->lvls[lvl].to

If this is not expected, we should probably add an assert here;
otherwise that case needs to be handled as well.

Wei.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 8/9] libxc: rework of domain builder's page table handler
  2015-10-29 14:02       ` Wei Liu
@ 2015-10-29 14:13         ` Juergen Gross
  2015-10-29 15:03           ` Wei Liu
  0 siblings, 1 reply; 31+ messages in thread
From: Juergen Gross @ 2015-10-29 14:13 UTC (permalink / raw)
  To: Wei Liu; +Cc: roger.pau, stefano.stabellini, ian.jackson, Ian.Campbell,
	xen-devel

On 10/29/2015 03:02 PM, Wei Liu wrote:
> On Thu, Oct 29, 2015 at 02:18:31PM +0100, Juergen Gross wrote:
>> On 10/29/2015 01:48 PM, Wei Liu wrote:
>>> On Tue, Oct 13, 2015 at 03:11:17PM +0200, Juergen Gross wrote:
>>>> In order to prepare a p2m list outside of the initial kernel
>>>> mapping, rework the domain builder's page table handler. The goal
>>>> is to be able to use common helpers for page table allocation and
>>>> setup, both for the initial kernel page tables and for the page
>>>> tables mapping the p2m list. This is achieved by supporting
>>>> multiple mapping areas. The mapped virtual addresses of the
>>>> individual areas must not overlap, while the page tables of a newly
>>>> added area might already be partially present. The top level page
>>>> table in particular exists only once, of course.
>>>>
>>>
>>> Currently restrict the number of mappings to 1 because the only
>>> mapping for now is the initial one created by the toolstack. There
>>> should be no behavioural or guest visible change introduced.
>>>
>>> If my understanding is correct, can you please add that to the
>>> commit message?
>>
>> Sure.
>>
>> I'm currently thinking about changing this patch even further. While
>> doing similar work in grub-xen I found the page table building there
>> to be much more generic and compact. Instead of open-coding the
>> individual page table levels, everything is done in a loop over those
>> levels. The main loop consists of only 33 lines (and this is after
>> adding support for multiple mapping areas)!
>>
>> What do you think?
>>
>
> That's of course OK. It's patch reviewing for me anyway. One patch is as
> good as another.

You haven't seen the code yet. ;-)

>>> Given this is a particularly thorny area, a second pair of eyes
>>> would be much appreciated.
>>
>> On the positive side: any bug in this code should be really easy to
>> spot, as the domain wouldn't be able to work reliably (this was at least
>> my experience when developing this patch and the related one in grub).
>>
>>> Some comments below.
>>>
> [...]
>>>> +    from = round_down(from, mask);
>>>> +    to = round_up(to, mask);
>>>> +
>>>> +    for ( map_n = 0, map_cmp = domx86->maps; map_n < domx86->n_mappings;
>>>> +          map_n++, map_cmp++ )
>>>>       {
>>>> -        start = round_down(start, mask);
>>>> -        end = round_up(end, mask);
>>>> -        tables = ((end - start) >> bits) + 1;
>>>> +        if ( map_cmp->lvls[lvl].from == map_cmp->lvls[lvl].to )
>>>> +            continue;
>>>> +        if ( from >= map_cmp->lvls[lvl].from && to <= map_cmp->lvls[lvl].to )
>>>> +            return;  /* Area already completely covered on this level. */
>>>> +        if ( from >= map_cmp->lvls[lvl].from && from <= map_cmp->lvls[lvl].to )
>>>> +            from = map_cmp->lvls[lvl].to + 1;
>>>> +        if ( to >= map_cmp->lvls[lvl].from && to <= map_cmp->lvls[lvl].to )
>>>> +            to = map_cmp->lvls[lvl].from - 1;
>>>
>>> Is it possible that a later mapping covers the previous ones? How
>>> is that handled?
>>
>> I'm not sure I understand your question.
>>
>> In case you are asking whether different mappings are allowed to overlap
>> in terms of virtual addresses: no. I haven't added any checking code to
>> ensure this. I can do it, if you want.
>>
>
> As I understand it, the sentence "different mappings are not allowed
> to overlap" states the requirement, not the current status.

It doesn't make sense to support this. It would require mapping
multiple pfns to the same virtual address, which is impossible. So
yes, this is a requirement.

I'll add some code to make sure overlapping does not occur.
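
Probably something as simple as this at the start of count_pgtables()
(a rough sketch only; the error code and message are my ad-hoc choice):

    /* Reject a new area overlapping an already existing one. */
    for ( map_n = 0; map_n < domx86->n_mappings; map_n++ )
    {
        map_cmp = domx86->maps + map_n;
        if ( from <= map_cmp->area.to && to >= map_cmp->area.from )
        {
            xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
                         "%s: overlapping mappings\n", __FUNCTION__);
            return -EINVAL;
        }
    }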

> The above code snippet is used to ensure different mappings don't
> overlap, but I sense there is one case missing:
>
>     from < map_cmp->lvls[lvl].from && to > map_cmp->lvls[lvl].to
>
> If this is not expected, we should probably add an assert here;
> otherwise that case needs to be handled as well.

This can only happen if the virtual areas are overlapping.


Juergen

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 8/9] libxc: rework of domain builder's page table handler
  2015-10-29 14:13         ` Juergen Gross
@ 2015-10-29 15:03           ` Wei Liu
  2015-10-29 15:34             ` Juergen Gross
  0 siblings, 1 reply; 31+ messages in thread
From: Wei Liu @ 2015-10-29 15:03 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Wei Liu, Ian.Campbell, stefano.stabellini, ian.jackson, xen-devel,
	roger.pau

On Thu, Oct 29, 2015 at 03:13:30PM +0100, Juergen Gross wrote:
> On 10/29/2015 03:02 PM, Wei Liu wrote:
> >On Thu, Oct 29, 2015 at 02:18:31PM +0100, Juergen Gross wrote:
> >>On 10/29/2015 01:48 PM, Wei Liu wrote:
> >>>On Tue, Oct 13, 2015 at 03:11:17PM +0200, Juergen Gross wrote:
> >>>>In order to prepare a p2m list outside of the initial kernel
> >>>>mapping, rework the domain builder's page table handler. The goal
> >>>>is to be able to use common helpers for page table allocation and
> >>>>setup, both for the initial kernel page tables and for the page
> >>>>tables mapping the p2m list. This is achieved by supporting
> >>>>multiple mapping areas. The mapped virtual addresses of the
> >>>>individual areas must not overlap, while the page tables of a
> >>>>newly added area might already be partially present. The top level
> >>>>page table in particular exists only once, of course.
> >>>>
> >>>
> >>>Currently restrict the number of mappings to 1 because the only
> >>>mapping for now is the initial one created by the toolstack. There
> >>>should be no behavioural or guest visible change introduced.
> >>>
> >>>If my understanding is correct, can you please add that to the
> >>>commit message?
> >>
> >>Sure.
> >>
> >>I'm currently thinking about changing this patch even further. While
> >>doing similar work in grub-xen I found the page table building there
> >>to be much more generic and compact. Instead of open-coding the
> >>individual page table levels, everything is done in a loop over those
> >>levels. The main loop consists of only 33 lines (and this is after
> >>adding support for multiple mapping areas)!
> >>
> >>What do you think?
> >>
> >
> >That's of course OK. It's patch reviewing for me anyway. One patch is as
> >good as another.
> 
> You haven't seen the code yet. ;-)
> 

We will see. :-)

> >>>Given this is a particularly thorny area, a second pair of eyes
> >>>would be much appreciated.
> >>
> >>On the positive side: any bug in this code should be really easy to
> >>spot, as the domain wouldn't be able to work reliably (this was at least
> >>my experience when developing this patch and the related one in grub).
> >>
> >>>Some comments below.
> >>>
> >[...]
> >>>>+    from = round_down(from, mask);
> >>>>+    to = round_up(to, mask);
> >>>>+
> >>>>+    for ( map_n = 0, map_cmp = domx86->maps; map_n < domx86->n_mappings;
> >>>>+          map_n++, map_cmp++ )
> >>>>      {
> >>>>-        start = round_down(start, mask);
> >>>>-        end = round_up(end, mask);
> >>>>-        tables = ((end - start) >> bits) + 1;
> >>>>+        if ( map_cmp->lvls[lvl].from == map_cmp->lvls[lvl].to )
> >>>>+            continue;
> >>>>+        if ( from >= map_cmp->lvls[lvl].from && to <= map_cmp->lvls[lvl].to )
> >>>>+            return;  /* Area already completely covered on this level. */
> >>>>+        if ( from >= map_cmp->lvls[lvl].from && from <= map_cmp->lvls[lvl].to )
> >>>>+            from = map_cmp->lvls[lvl].to + 1;
> >>>>+        if ( to >= map_cmp->lvls[lvl].from && to <= map_cmp->lvls[lvl].to )
> >>>>+            to = map_cmp->lvls[lvl].from - 1;
> >>>
> >>>Is it possible that a later mapping covers the previous ones? How
> >>>is that handled?
> >>
> >>I'm not sure I understand your question.
> >>
> >>In case you are asking whether different mappings are allowed to overlap
> >>in terms of virtual addresses: no. I haven't added any checking code to
> >>ensure this. I can do it, if you want.
> >>
> >
> >As I understand it, the sentence "different mappings are not allowed
> >to overlap" states the requirement, not the current status.
> 
> It doesn't make sense to support this. It would require mapping
> multiple pfns to the same virtual address, which is impossible. So
> yes, this is a requirement.
> 
> I'll add some code to make sure overlapping does not occur.
> 

Cool, thanks.

I'm not asking for much here. It can be as simple as just erroring out
if overlapping is detected.

> >The above code snippet is used to ensure different mappings don't
> >overlap, but I sense there is one case missing:
> >
> >    from < map_cmp->lvls[lvl].from && to > map_cmp->lvls[lvl].to
> >
> >If this is not expected, we should probably add an assert here;
> >otherwise that case needs to be handled as well.
> 
> This can only happen if the virtual areas are overlapping.
> 

And could you please add an assert for this if nr_page_tables is still
around in the next version?

Wei.

> 
> Juergen

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 8/9] libxc: rework of domain builder's page table handler
  2015-10-29 15:03           ` Wei Liu
@ 2015-10-29 15:34             ` Juergen Gross
  0 siblings, 0 replies; 31+ messages in thread
From: Juergen Gross @ 2015-10-29 15:34 UTC (permalink / raw)
  To: Wei Liu; +Cc: roger.pau, stefano.stabellini, ian.jackson, Ian.Campbell,
	xen-devel

On 10/29/2015 04:03 PM, Wei Liu wrote:
> On Thu, Oct 29, 2015 at 03:13:30PM +0100, Juergen Gross wrote:
>> On 10/29/2015 03:02 PM, Wei Liu wrote:
>>> On Thu, Oct 29, 2015 at 02:18:31PM +0100, Juergen Gross wrote:
>>>> On 10/29/2015 01:48 PM, Wei Liu wrote:
>>>>> On Tue, Oct 13, 2015 at 03:11:17PM +0200, Juergen Gross wrote:
>>>>>> In order to prepare a p2m list outside of the initial kernel
>>>>>> mapping, rework the domain builder's page table handler. The goal
>>>>>> is to be able to use common helpers for page table allocation and
>>>>>> setup, both for the initial kernel page tables and for the page
>>>>>> tables mapping the p2m list. This is achieved by supporting
>>>>>> multiple mapping areas. The mapped virtual addresses of the
>>>>>> individual areas must not overlap, while the page tables of a
>>>>>> newly added area might already be partially present. The top
>>>>>> level page table in particular exists only once, of course.
>>>>>>
>>>>>
>>>>> Currently restrict the number of mappings to 1 because the only
>>>>> mapping for now is the initial one created by the toolstack. There
>>>>> should be no behavioural or guest visible change introduced.
>>>>>
>>>>> If my understanding is correct, can you please add that to the
>>>>> commit message?
>>>>
>>>> Sure.
>>>>
>>>> I'm currently thinking about changing this patch even further.
>>>> While doing similar work in grub-xen I found the page table
>>>> building there to be much more generic and compact. Instead of
>>>> open-coding the individual page table levels, everything is done in
>>>> a loop over those levels. The main loop consists of only 33 lines
>>>> (and this is after adding support for multiple mapping areas)!
>>>>
>>>> What do you think?
>>>>
>>>
>>> That's of course OK. It's patch reviewing for me anyway. One patch is as
>>> good as another.
>>
>> You haven't seen the code yet. ;-)
>>
>
> We will see. :-)
>
>>>>> Given this is a particularly thorny area, a second pair of eyes
>>>>> would be much appreciated.
>>>>
>>>> On the positive side: any bug in this code should be really easy to
>>>> spot, as the domain wouldn't be able to work reliably (this was at least
>>>> my experience when developing this patch and the related one in grub).
>>>>
>>>>> Some comments below.
>>>>>
>>> [...]
>>>>>> +    from = round_down(from, mask);
>>>>>> +    to = round_up(to, mask);
>>>>>> +
>>>>>> +    for ( map_n = 0, map_cmp = domx86->maps; map_n < domx86->n_mappings;
>>>>>> +          map_n++, map_cmp++ )
>>>>>>       {
>>>>>> -        start = round_down(start, mask);
>>>>>> -        end = round_up(end, mask);
>>>>>> -        tables = ((end - start) >> bits) + 1;
>>>>>> +        if ( map_cmp->lvls[lvl].from == map_cmp->lvls[lvl].to )
>>>>>> +            continue;
>>>>>> +        if ( from >= map_cmp->lvls[lvl].from && to <= map_cmp->lvls[lvl].to )
>>>>>> +            return;  /* Area already completely covered on this level. */
>>>>>> +        if ( from >= map_cmp->lvls[lvl].from && from <= map_cmp->lvls[lvl].to )
>>>>>> +            from = map_cmp->lvls[lvl].to + 1;
>>>>>> +        if ( to >= map_cmp->lvls[lvl].from && to <= map_cmp->lvls[lvl].to )
>>>>>> +            to = map_cmp->lvls[lvl].from - 1;
>>>>>
>>>>> Is it possible that a later mapping covers the previous ones? How
>>>>> is that handled?
>>>>
>>>> I'm not sure I understand your question.
>>>>
>>>> In case you are asking whether different mappings are allowed to overlap
>>>> in terms of virtual addresses: no. I haven't added any checking code to
>>>> ensure this. I can do it, if you want.
>>>>
>>>
>>> As I understand it, the sentence "different mappings are not allowed
>>> to overlap" states the requirement, not the current status.
>>
>> It doesn't make sense to support this. It would require mapping
>> multiple pfns to the same virtual address, which is impossible. So
>> yes, this is a requirement.
>>
>> I'll add some code to make sure overlapping does not occur.
>>
>
> Cool, thanks.
>
> I'm not asking for much here. It can be as simple as just erroring out
> if overlapping is detected.

Much more than that isn't possible, I guess.

>
>>> The above code snippet is used to ensure different mappings don't
>>> overlap, but I sense there is one case missing:
>>>
>>>     from < map_cmp->lvls[lvl].from && to > map_cmp->lvls[lvl].to
>>>
>>> If this is not expected, we should probably add an assert here;
>>> otherwise that case needs to be handled as well.
>>
>> This can only happen if the virtual areas are overlapping.
>>
>
> And could you please add an assert for this if nr_page_tables is still
> around in the next version?

At least the related check for overlapping will still be there. I'll
add the assert if it makes you feel better.
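
Something like this inside the loop over the existing mappings, I
suppose:

    /* A new area extending beyond an existing one on both sides would
     * mean the virtual areas overlap, which is rejected earlier. */
    assert(!(from < map_cmp->lvls[lvl].from && to > map_cmp->lvls[lvl].to));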


Juergen

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2015-10-29 15:34 UTC | newest]

Thread overview: 31+ messages
2015-10-13 13:11 [PATCH v3 0/9] libxc: support building large pv-domains Juergen Gross
2015-10-13 13:11 ` [PATCH v3 1/9] libxc: reorganize domain builder guest memory allocator Juergen Gross
2015-10-28 15:32   ` Wei Liu
2015-10-28 15:51     ` Juergen Gross
2015-10-28 16:21       ` Wei Liu
2015-10-28 17:05         ` Juergen Gross
2015-10-13 13:11 ` [PATCH v3 2/9] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd Juergen Gross
2015-10-28 15:33   ` Wei Liu
2015-10-28 15:49   ` Jan Beulich
2015-10-13 13:11 ` [PATCH v3 3/9] libxc: rename domain builder count_pgtables to alloc_pgtables Juergen Gross
2015-10-28 15:34   ` Wei Liu
2015-10-13 13:11 ` [PATCH v3 4/9] libxc: introduce domain builder architecture specific data Juergen Gross
2015-10-28 15:37   ` Wei Liu
2015-10-13 13:11 ` [PATCH v3 5/9] libxc: use domain builder architecture private data for x86 pv domains Juergen Gross
2015-10-28 15:38   ` Wei Liu
2015-10-13 13:11 ` [PATCH v3 6/9] libxc: create unmapped initrd in domain builder if supported Juergen Gross
2015-10-28 16:11   ` Wei Liu
2015-10-28 17:07     ` Juergen Gross
2015-10-13 13:11 ` [PATCH v3 7/9] libxc: split p2m allocation in domain builder from other magic pages Juergen Gross
2015-10-28 16:11   ` Wei Liu
2015-10-13 13:11 ` [PATCH v3 8/9] libxc: rework of domain builder's page table handler Juergen Gross
2015-10-29 12:48   ` Wei Liu
2015-10-29 13:18     ` Juergen Gross
2015-10-29 14:02       ` Wei Liu
2015-10-29 14:13         ` Juergen Gross
2015-10-29 15:03           ` Wei Liu
2015-10-29 15:34             ` Juergen Gross
2015-10-13 13:11 ` [PATCH v3 9/9] libxc: create p2m list outside of kernel mapping if supported Juergen Gross
2015-10-29 13:07   ` Wei Liu
2015-10-29 13:19     ` Juergen Gross
2015-10-26 11:15 ` [PATCH v3 0/9] libxc: support building large pv-domains Juergen Gross
