xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] libxc: support building large pv-domains
@ 2015-10-02  5:49 Juergen Gross
  2015-10-02  5:49 ` [PATCH v2 1/5] libxc: remove allocate member from struct xc_dom_image Juergen Gross
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: Juergen Gross @ 2015-10-02  5:49 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2
  Cc: Juergen Gross

The Xen hypervisor supports starting a dom0 with large memory (up to
the TB range) by not including the initrd and p2m list in the initial
kernel mapping. Especially the p2m list can grow larger than the
available virtual space in the initial mapping.

The started kernel is indicating the support of each feature via
elf notes.

This series enables the domain builder in libxc to do the same as the
hypervisor. This enables starting of huge pv-domUs via xl.

Unmapped initrd is supported for 64 and 32 bit domains, omitting the
p2m from initial kernel mapping is possible for 64 bit domains only.

Tested with:
- 32 bit domU (kernel not supporting unmapped initrd)
- 32 bit domU (kernel supporting unmapped initrd)
- 1 GB 64 bit domU (kernel supporting unmapped initrd, not p2m)
- 1 GB 64 bit domU (kernel supporting unmapped initrd and p2m)
- 900GB 64 bit domU (kernel supporting unmapped initrd and p2m)

Changes in v2:
- patch 2 has been removed as it has been applied already
- introduced new patch 2 as suggested by Ian Campbell: add a flag
  indicating support of an unmapped initrd to the parsed elf data of
  the elf_dom_parms structure
- updated patch description of patch 3 as requested by Ian Campbell

Juergen Gross (5):
  libxc: remove allocate member from struct xc_dom_image
  xen: add generic flag to elf_dom_parms indicating support of unmapped
    initrd
  libxc: create unmapped initrd in domain builder if supported
  libxc: split p2m allocation in domain builder from other magic pages
  libxc: create p2m list outside of kernel mapping if supported

 tools/libxc/include/xc_dom.h       |   4 +-
 tools/libxc/xc_dom_core.c          |  44 ++++++++++++--
 tools/libxc/xc_dom_x86.c           | 120 ++++++++++++++++++++++++++++++++++++-
 xen/arch/x86/domain_build.c        |   4 +-
 xen/common/libelf/libelf-dominfo.c |   3 +
 xen/include/xen/libelf.h           |   1 +
 6 files changed, 165 insertions(+), 11 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 1/5] libxc: remove allocate member from struct xc_dom_image
  2015-10-02  5:49 [PATCH v2 0/5] libxc: support building large pv-domains Juergen Gross
@ 2015-10-02  5:49 ` Juergen Gross
  2015-10-02 13:01   ` Ian Campbell
  2015-10-02  5:49 ` [PATCH v2 2/5] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd Juergen Gross
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2015-10-02  5:49 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2
  Cc: Juergen Gross

The allocate() callback in struct xc_dom_image is never set. Remove it.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxc/include/xc_dom.h | 2 --
 tools/libxc/xc_dom_core.c    | 4 ----
 2 files changed, 6 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 602c5cd..5eeff15 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -175,8 +175,6 @@ struct xc_dom_image {
 
     /* kernel loader */
     struct xc_dom_arch *arch_hooks;
-    /* allocate up to virt_alloc_end */
-    int (*allocate) (struct xc_dom_image * dom, xen_vaddr_t up_to);
 };
 
 /* --- pluggable kernel loader ------------------------------------- */
diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index fbe4464..b510bbd 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -579,8 +579,6 @@ int xc_dom_alloc_segment(struct xc_dom_image *dom,
 
     seg->vend = start + pages * page_size;
     dom->virt_alloc_end = seg->vend;
-    if (dom->allocate)
-        dom->allocate(dom, dom->virt_alloc_end);
 
     DOMPRINTF("%-20s:   %-12s : 0x%" PRIx64 " -> 0x%" PRIx64
               "  (pfn 0x%" PRIpfn " + 0x%" PRIpfn " pages)",
@@ -603,8 +601,6 @@ int xc_dom_alloc_page(struct xc_dom_image *dom, char *name)
 
     start = dom->virt_alloc_end;
     dom->virt_alloc_end += page_size;
-    if (dom->allocate)
-        dom->allocate(dom, dom->virt_alloc_end);
     pfn = (start - dom->parms.virt_base) / page_size;
 
     DOMPRINTF("%-20s:   %-12s : 0x%" PRIx64 " (pfn 0x%" PRIpfn ")",
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 2/5] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd
  2015-10-02  5:49 [PATCH v2 0/5] libxc: support building large pv-domains Juergen Gross
  2015-10-02  5:49 ` [PATCH v2 1/5] libxc: remove allocate member from struct xc_dom_image Juergen Gross
@ 2015-10-02  5:49 ` Juergen Gross
  2015-10-02  9:37   ` Andrew Cooper
  2015-10-02  5:49 ` [PATCH v2 3/5] libxc: create unmapped initrd in domain builder if supported Juergen Gross
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2015-10-02  5:49 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2
  Cc: Juergen Gross, Andrew Cooper, Keir Fraser, Jan Beulich,
	Tim Deegan

Support of an unmapped initrd is indicated by the kernel of the domain
via elf notes. In order not to have to use raw elf data in the tools
for support of an unmapped initrd add a flag to the parsed data area
to indicate the kernel supporting this feature.

Switch using this flag in the hypervisor domain builder.

Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Tim Deegan <tim@xen.org>
Suggested-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/domain_build.c        | 4 ++--
 xen/common/libelf/libelf-dominfo.c | 3 +++
 xen/include/xen/libelf.h           | 1 +
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index c2ef87a..a02c9fb 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -353,7 +353,7 @@ static unsigned long __init compute_dom0_nr_pages(
 
         vstart = parms->virt_base;
         vend = round_pgup(parms->virt_kend);
-        if ( !parms->elf_notes[XEN_ELFNOTE_MOD_START_PFN].data.num )
+        if ( !parms->mod_start_pfn )
             vend += round_pgup(initrd_len);
         end = vend + nr_pages * sizeof_long;
 
@@ -1037,7 +1037,7 @@ int __init construct_dom0(
     v_start          = parms.virt_base;
     vkern_start      = parms.virt_kstart;
     vkern_end        = parms.virt_kend;
-    if ( parms.elf_notes[XEN_ELFNOTE_MOD_START_PFN].data.num )
+    if ( parms.mod_start_pfn )
     {
         vinitrd_start  = vinitrd_end = 0;
         vphysmap_start = round_pgup(vkern_end);
diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
index 3de1c23..31d8436 100644
--- a/xen/common/libelf/libelf-dominfo.c
+++ b/xen/common/libelf/libelf-dominfo.c
@@ -190,6 +190,9 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
     case XEN_ELFNOTE_INIT_P2M:
         parms->p2m_base = val;
         break;
+    case XEN_ELFNOTE_MOD_START_PFN:
+        parms->mod_start_pfn = !!val;
+        break;
     case XEN_ELFNOTE_PADDR_OFFSET:
         parms->elf_paddr_offset = val;
         break;
diff --git a/xen/include/xen/libelf.h b/xen/include/xen/libelf.h
index d7045f6..07b96a3 100644
--- a/xen/include/xen/libelf.h
+++ b/xen/include/xen/libelf.h
@@ -423,6 +423,7 @@ struct elf_dom_parms {
     char loader[16];
     enum xen_pae_type pae;
     bool bsd_symtab;
+    bool mod_start_pfn;
     uint64_t virt_base;
     uint64_t virt_entry;
     uint64_t virt_hypercall;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 3/5] libxc: create unmapped initrd in domain builder if supported
  2015-10-02  5:49 [PATCH v2 0/5] libxc: support building large pv-domains Juergen Gross
  2015-10-02  5:49 ` [PATCH v2 1/5] libxc: remove allocate member from struct xc_dom_image Juergen Gross
  2015-10-02  5:49 ` [PATCH v2 2/5] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd Juergen Gross
@ 2015-10-02  5:49 ` Juergen Gross
  2015-10-02 12:59   ` Ian Campbell
  2015-10-02  5:49 ` [PATCH v2 4/5] libxc: split p2m allocation in domain builder from other magic pages Juergen Gross
  2015-10-02  5:49 ` [PATCH v2 5/5] libxc: create p2m list outside of kernel mapping if supported Juergen Gross
  4 siblings, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2015-10-02  5:49 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2
  Cc: Juergen Gross

In case the kernel of a new pv-domU indicates it is supporting an
unmapped initrd, don't waste precious virtual space for the initrd,
but allocate only guest physical memory for it.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/libxc/xc_dom_core.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index b510bbd..85b531a 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -1019,8 +1019,9 @@ int xc_dom_build_image(struct xc_dom_image *dom)
     if ( dom->kernel_loader->loader(dom) != 0 )
         goto err;
 
-    /* load ramdisk */
-    if ( dom->ramdisk_blob )
+    /* Load ramdisk if initial mapping required. */
+    if ( dom->ramdisk_blob &&
+         (!dom->parms.mod_start_pfn || dom->ramdisk_seg.vstart) )
     {
         if ( xc_dom_build_ramdisk(dom) != 0 )
             goto err;
@@ -1063,6 +1064,23 @@ int xc_dom_build_image(struct xc_dom_image *dom)
               __FUNCTION__, dom->virt_alloc_end);
     DOMPRINTF("%-20s: virt_pgtab_end : 0x%" PRIx64 "",
               __FUNCTION__, dom->virt_pgtab_end);
+
+    /* Prepare allocating unmapped memory. */
+    if ( dom->virt_pgtab_end )
+        dom->virt_alloc_end = dom->virt_pgtab_end;
+
+    /* Load ramdisk if no initial mapping required. */
+    if ( dom->ramdisk_blob && !dom->ramdisk_seg.vstart &&
+         dom->parms.mod_start_pfn )
+    {
+        if ( xc_dom_build_ramdisk(dom) != 0 )
+            goto err;
+        dom->flags |= SIF_MOD_START_PFN;
+        dom->ramdisk_seg.vend = dom->ramdisk_seg.vend - dom->ramdisk_seg.vstart;
+        dom->ramdisk_seg.vstart = dom->ramdisk_seg.pfn;
+        dom->ramdisk_seg.vend += dom->ramdisk_seg.vstart;
+    }
+
     return 0;
 
  err:
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 4/5] libxc: split p2m allocation in domain builder from other magic pages
  2015-10-02  5:49 [PATCH v2 0/5] libxc: support building large pv-domains Juergen Gross
                   ` (2 preceding siblings ...)
  2015-10-02  5:49 ` [PATCH v2 3/5] libxc: create unmapped initrd in domain builder if supported Juergen Gross
@ 2015-10-02  5:49 ` Juergen Gross
  2015-10-02  9:29   ` Ian Campbell
  2015-10-02  5:49 ` [PATCH v2 5/5] libxc: create p2m list outside of kernel mapping if supported Juergen Gross
  4 siblings, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2015-10-02  5:49 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2
  Cc: Juergen Gross

Carve out the p2m list allocation from the .alloc_magic_pages hook of
the domain builder in order to prepare allocating the p2m list outside
of the initial kernel mapping. This will be needed to support loading
domains with huge memory (>512 GB).

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 tools/libxc/include/xc_dom.h |  1 +
 tools/libxc/xc_dom_core.c    |  3 +++
 tools/libxc/xc_dom_x86.c     | 11 ++++++++++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 5eeff15..9117269 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -197,6 +197,7 @@ void xc_dom_register_loader(struct xc_dom_loader *loader);
 struct xc_dom_arch {
     /* pagetable setup */
     int (*alloc_magic_pages) (struct xc_dom_image * dom);
+    int (*alloc_p2m_list) (struct xc_dom_image * dom);
     int (*count_pgtables) (struct xc_dom_image * dom);
     int (*setup_pgtables) (struct xc_dom_image * dom);
 
diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index 85b531a..bd970c5 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -1047,6 +1047,9 @@ int xc_dom_build_image(struct xc_dom_image *dom)
     }
 
     /* allocate other pages */
+    if ( dom->arch_hooks->alloc_p2m_list &&
+         dom->arch_hooks->alloc_p2m_list(dom) != 0 )
+        goto err;
     if ( dom->arch_hooks->alloc_magic_pages(dom) != 0 )
         goto err;
     if ( dom->arch_hooks->count_pgtables )
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index e2f3792..972f081 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -439,7 +439,7 @@ pfn_error:
 
 /* ------------------------------------------------------------------------ */
 
-static int alloc_magic_pages(struct xc_dom_image *dom)
+static int alloc_p2m_list(struct xc_dom_image *dom)
 {
     size_t p2m_alloc_size = dom->p2m_size * dom->arch_hooks->sizeof_pfn;
 
@@ -451,6 +451,13 @@ static int alloc_magic_pages(struct xc_dom_image *dom)
     if ( dom->p2m_guest == NULL )
         return -1;
 
+    return 0;
+}
+
+/* ------------------------------------------------------------------------ */
+
+static int alloc_magic_pages(struct xc_dom_image *dom)
+{
     /* allocate special pages */
     dom->start_info_pfn = xc_dom_alloc_page(dom, "start info");
     dom->xenstore_pfn = xc_dom_alloc_page(dom, "xenstore");
@@ -674,6 +681,7 @@ static struct xc_dom_arch xc_dom_32_pae = {
     .page_shift = PAGE_SHIFT_X86,
     .sizeof_pfn = 4,
     .alloc_magic_pages = alloc_magic_pages,
+    .alloc_p2m_list = alloc_p2m_list,
     .count_pgtables = count_pgtables_x86_32_pae,
     .setup_pgtables = setup_pgtables_x86_32_pae,
     .start_info = start_info_x86_32,
@@ -687,6 +695,7 @@ static struct xc_dom_arch xc_dom_64 = {
     .page_shift = PAGE_SHIFT_X86,
     .sizeof_pfn = 8,
     .alloc_magic_pages = alloc_magic_pages,
+    .alloc_p2m_list = alloc_p2m_list,
     .count_pgtables = count_pgtables_x86_64,
     .setup_pgtables = setup_pgtables_x86_64,
     .start_info = start_info_x86_64,
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 5/5] libxc: create p2m list outside of kernel mapping if supported
  2015-10-02  5:49 [PATCH v2 0/5] libxc: support building large pv-domains Juergen Gross
                   ` (3 preceding siblings ...)
  2015-10-02  5:49 ` [PATCH v2 4/5] libxc: split p2m allocation in domain builder from other magic pages Juergen Gross
@ 2015-10-02  5:49 ` Juergen Gross
  2015-10-02 13:16   ` Ian Campbell
  4 siblings, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2015-10-02  5:49 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2
  Cc: Juergen Gross

In case the kernel of a new pv-domU indicates it is supporting a p2m
list outside the initial kernel mapping by specifying INIT_P2M, let
the domain builder allocate the memory for the p2m list from physical
guest memory only and map it to the address the kernel is expecting.

This will enable loading pv-domUs larger than 512 GB.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/libxc/include/xc_dom.h |   1 +
 tools/libxc/xc_dom_core.c    |  17 ++++++-
 tools/libxc/xc_dom_x86.c     | 109 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 126 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 9117269..5731098 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -210,6 +210,7 @@ struct xc_dom_arch {
     char *native_protocol;
     int page_shift;
     int sizeof_pfn;
+    int p2m_base_supported;
 
     struct xc_dom_arch *next;
 };
diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index bd970c5..36a0d63 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -734,6 +734,7 @@ struct xc_dom_image *xc_dom_allocate(xc_interface *xch,
     dom->parms.virt_hypercall = UNSET_ADDR;
     dom->parms.virt_hv_start_low = UNSET_ADDR;
     dom->parms.elf_paddr_offset = UNSET_ADDR;
+    dom->parms.p2m_base = UNSET_ADDR;
 
     dom->alloc_malloc += sizeof(*dom);
     return dom;
@@ -1047,7 +1048,11 @@ int xc_dom_build_image(struct xc_dom_image *dom)
     }
 
     /* allocate other pages */
-    if ( dom->arch_hooks->alloc_p2m_list &&
+    if ( !dom->arch_hooks->p2m_base_supported ||
+         dom->parms.p2m_base >= dom->parms.virt_base ||
+         (dom->parms.p2m_base & (XC_DOM_PAGE_SIZE(dom) - 1)) )
+        dom->parms.p2m_base = UNSET_ADDR;
+    if ( dom->arch_hooks->alloc_p2m_list && dom->parms.p2m_base == UNSET_ADDR &&
          dom->arch_hooks->alloc_p2m_list(dom) != 0 )
         goto err;
     if ( dom->arch_hooks->alloc_magic_pages(dom) != 0 )
@@ -1084,6 +1089,16 @@ int xc_dom_build_image(struct xc_dom_image *dom)
         dom->ramdisk_seg.vend += dom->ramdisk_seg.vstart;
     }
 
+    /* Allocate p2m list if outside of initial kernel mapping. */
+    if ( dom->arch_hooks->alloc_p2m_list && dom->parms.p2m_base != UNSET_ADDR )
+    {
+        if ( dom->arch_hooks->alloc_p2m_list(dom) != 0 )
+            goto err;
+        dom->p2m_seg.vend = dom->p2m_seg.vend - dom->p2m_seg.vstart;
+        dom->p2m_seg.vstart = dom->parms.p2m_base;
+        dom->p2m_seg.vend += dom->p2m_seg.vstart;
+    }
+
     return 0;
 
  err:
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index 972f081..5c0d28e 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -46,6 +46,8 @@
 #define bits_to_mask(bits)       (((xen_vaddr_t)1 << (bits))-1)
 #define round_down(addr, mask)   ((addr) & ~(mask))
 #define round_up(addr, mask)     ((addr) | (mask))
+#define round_pg(addr)    (((addr) + PAGE_SIZE_X86 - 1) & ~(PAGE_SIZE_X86 - 1))
+#define round_pfn(addr)   (((addr) + PAGE_SIZE_X86 - 1) / PAGE_SIZE_X86)
 
 /* get guest IO ABI protocol */
 const char *xc_domain_get_native_protocol(xc_interface *xch,
@@ -424,6 +426,81 @@ static int setup_pgtables_x86_64(struct xc_dom_image *dom)
             }
         }
     }
+
+    if ( dom->parms.p2m_base == UNSET_ADDR )
+        return 0;
+
+    /*
+     * Build the page tables for mapping the p2m list at an address
+     * specified by the to be loaded kernel.
+     * l1pfn holds the pfn of the next page table to allocate.
+     * At each level we might already have an entry filled when setting
+     * up the initial kernel mapping. This can happen for the last entry
+     * of each level only!
+     */
+    l3tab = NULL;
+    l2tab = NULL;
+    l1tab = NULL;
+    l1pfn = round_pfn(dom->p2m_size * dom->arch_hooks->sizeof_pfn) +
+            dom->p2m_seg.pfn;
+
+    for ( addr = dom->parms.p2m_base;
+          addr < dom->parms.p2m_base +
+                 dom->p2m_size * dom->arch_hooks->sizeof_pfn;
+          addr += PAGE_SIZE_X86 )
+    {
+        if ( l3tab == NULL )
+        {
+            l4off = l4_table_offset_x86_64(addr);
+            l3pfn = l4tab[l4off] ? l4pfn + dom->pg_l4 : l1pfn++;
+            l3tab = xc_dom_pfn_to_ptr(dom, l3pfn, 1);
+            if ( l3tab == NULL )
+                goto pfn_error;
+            l4tab[l4off] =
+                pfn_to_paddr(xc_dom_p2m_guest(dom, l3pfn)) | L4_PROT;
+        }
+
+        if ( l2tab == NULL )
+        {
+            l3off = l3_table_offset_x86_64(addr);
+            l2pfn = l3tab[l3off] ? l3pfn + dom->pg_l3 : l1pfn++;
+            l2tab = xc_dom_pfn_to_ptr(dom, l2pfn, 1);
+            if ( l2tab == NULL )
+                goto pfn_error;
+            l3tab[l3off] =
+                pfn_to_paddr(xc_dom_p2m_guest(dom, l2pfn)) | L3_PROT;
+        }
+
+        if ( l1tab == NULL )
+        {
+            l2off = l2_table_offset_x86_64(addr);
+            l1pfn = l2tab[l2off] ? l2pfn + dom->pg_l2 : l1pfn;
+            l1tab = xc_dom_pfn_to_ptr(dom, l1pfn, 1);
+            if ( l1tab == NULL )
+                goto pfn_error;
+            l2tab[l2off] =
+                pfn_to_paddr(xc_dom_p2m_guest(dom, l1pfn)) | L2_PROT;
+            l1pfn++;
+        }
+
+        l1off = l1_table_offset_x86_64(addr);
+        pgpfn = ((addr - dom->parms.p2m_base) >> PAGE_SHIFT_X86) +
+                dom->p2m_seg.pfn;
+        l1tab[l1off] =
+            pfn_to_paddr(xc_dom_p2m_guest(dom, pgpfn)) | L1_PROT;
+
+        if ( l1off == (L1_PAGETABLE_ENTRIES_X86_64 - 1) )
+        {
+            l1tab = NULL;
+            if ( l2off == (L2_PAGETABLE_ENTRIES_X86_64 - 1) )
+            {
+                l2tab = NULL;
+                if ( l3off == (L3_PAGETABLE_ENTRIES_X86_64 - 1) )
+                    l3tab = NULL;
+            }
+        }
+    }
+
     return 0;
 
 pfn_error:
@@ -442,6 +519,27 @@ pfn_error:
 static int alloc_p2m_list(struct xc_dom_image *dom)
 {
     size_t p2m_alloc_size = dom->p2m_size * dom->arch_hooks->sizeof_pfn;
+    xen_vaddr_t from, to;
+    xen_pfn_t tables;
+
+    p2m_alloc_size = round_pg(p2m_alloc_size);
+    if ( dom->parms.p2m_base != UNSET_ADDR )
+    {
+        /* Add space for page tables, 64 bit only. */
+        from = dom->parms.p2m_base;
+        to = from + p2m_alloc_size - 1;
+        tables = 0;
+        tables += nr_page_tables(dom, from, to, L4_PAGETABLE_SHIFT_X86_64);
+        if ( to > (xen_vaddr_t)(~0ULL << L4_PAGETABLE_SHIFT_X86_64) )
+            tables--;
+        tables += nr_page_tables(dom, from, to, L3_PAGETABLE_SHIFT_X86_64);
+        if ( to > (xen_vaddr_t)(~0ULL << L3_PAGETABLE_SHIFT_X86_64) )
+            tables--;
+        tables += nr_page_tables(dom, from, to, L2_PAGETABLE_SHIFT_X86_64);
+        if ( to > (xen_vaddr_t)(~0ULL << L2_PAGETABLE_SHIFT_X86_64) )
+            tables--;
+        p2m_alloc_size += tables << PAGE_SHIFT_X86;
+    }
 
     /* allocate phys2mach table */
     if ( xc_dom_alloc_segment(dom, &dom->p2m_seg, "phys2mach",
@@ -541,6 +639,12 @@ static int start_info_x86_64(struct xc_dom_image *dom)
     start_info->pt_base = dom->pgtables_seg.vstart;
     start_info->nr_pt_frames = dom->pgtables;
     start_info->mfn_list = dom->p2m_seg.vstart;
+    if ( dom->parms.p2m_base != UNSET_ADDR )
+    {
+        start_info->first_p2m_pfn = dom->p2m_seg.pfn;
+        start_info->nr_p2m_frames =
+            (dom->p2m_seg.vend - dom->p2m_seg.vstart) >> PAGE_SHIFT_X86;
+    }
 
     start_info->flags = dom->flags;
     start_info->store_mfn = xc_dom_p2m_guest(dom, dom->xenstore_pfn);
@@ -680,6 +784,7 @@ static struct xc_dom_arch xc_dom_32_pae = {
     .native_protocol = XEN_IO_PROTO_ABI_X86_32,
     .page_shift = PAGE_SHIFT_X86,
     .sizeof_pfn = 4,
+    .p2m_base_supported = 0,
     .alloc_magic_pages = alloc_magic_pages,
     .alloc_p2m_list = alloc_p2m_list,
     .count_pgtables = count_pgtables_x86_32_pae,
@@ -694,6 +799,7 @@ static struct xc_dom_arch xc_dom_64 = {
     .native_protocol = XEN_IO_PROTO_ABI_X86_64,
     .page_shift = PAGE_SHIFT_X86,
     .sizeof_pfn = 8,
+    .p2m_base_supported = 1,
     .alloc_magic_pages = alloc_magic_pages,
     .alloc_p2m_list = alloc_p2m_list,
     .count_pgtables = count_pgtables_x86_64,
@@ -1025,7 +1131,10 @@ int arch_setup_bootlate(struct xc_dom_image *dom)
     if ( !xc_dom_feature_translated(dom) )
     {
         /* paravirtualized guest */
+
+        /* Drop references to all initial page tables before pinning. */
         xc_dom_unmap_one(dom, dom->pgtables_seg.pfn);
+        xc_dom_unmap_one(dom, dom->p2m_seg.pfn);
         rc = pin_table(dom->xch, pgd_type,
                        xc_dom_p2m_host(dom, dom->pgtables_seg.pfn),
                        dom->guest_domid);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 4/5] libxc: split p2m allocation in domain builder from other magic pages
  2015-10-02  5:49 ` [PATCH v2 4/5] libxc: split p2m allocation in domain builder from other magic pages Juergen Gross
@ 2015-10-02  9:29   ` Ian Campbell
  0 siblings, 0 replies; 25+ messages in thread
From: Ian Campbell @ 2015-10-02  9:29 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, ian.jackson, stefano.stabellini,
	wei.liu2

On Fri, 2015-10-02 at 07:49 +0200, Juergen Gross wrote:
> Carve out the p2m list allocation from the .alloc_magic_pages hook of
> the domain builder in order to prepare allocating the p2m list outside
> of the initial kernel mapping. This will be needed to support loading
> domains with huge memory (>512 GB).

Looks good, thanks.

> Signed-off-by: Juergen Gross <jgross@suse.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
> ---
>  tools/libxc/include/xc_dom.h |  1 +
>  tools/libxc/xc_dom_core.c    |  3 +++
>  tools/libxc/xc_dom_x86.c     | 11 ++++++++++-
>  3 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
> index 5eeff15..9117269 100644
> --- a/tools/libxc/include/xc_dom.h
> +++ b/tools/libxc/include/xc_dom.h
> @@ -197,6 +197,7 @@ void xc_dom_register_loader(struct xc_dom_loader
> *loader);
>  struct xc_dom_arch {
>      /* pagetable setup */
>      int (*alloc_magic_pages) (struct xc_dom_image * dom);
> +    int (*alloc_p2m_list) (struct xc_dom_image * dom);
>      int (*count_pgtables) (struct xc_dom_image * dom);
>      int (*setup_pgtables) (struct xc_dom_image * dom);
>  
> diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
> index 85b531a..bd970c5 100644
> --- a/tools/libxc/xc_dom_core.c
> +++ b/tools/libxc/xc_dom_core.c
> @@ -1047,6 +1047,9 @@ int xc_dom_build_image(struct xc_dom_image *dom)
>      }
>  
>      /* allocate other pages */
> +    if ( dom->arch_hooks->alloc_p2m_list &&
> +         dom->arch_hooks->alloc_p2m_list(dom) != 0 )
> +        goto err;
>      if ( dom->arch_hooks->alloc_magic_pages(dom) != 0 )
>          goto err;
>      if ( dom->arch_hooks->count_pgtables )
> diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
> index e2f3792..972f081 100644
> --- a/tools/libxc/xc_dom_x86.c
> +++ b/tools/libxc/xc_dom_x86.c
> @@ -439,7 +439,7 @@ pfn_error:
>  
>  /* ---------------------------------------------------------------------
> --- */
>  
> -static int alloc_magic_pages(struct xc_dom_image *dom)
> +static int alloc_p2m_list(struct xc_dom_image *dom)
>  {
>      size_t p2m_alloc_size = dom->p2m_size * dom->arch_hooks->sizeof_pfn;
>  
> @@ -451,6 +451,13 @@ static int alloc_magic_pages(struct xc_dom_image
> *dom)
>      if ( dom->p2m_guest == NULL )
>          return -1;
>  
> +    return 0;
> +}
> +
> +/* ---------------------------------------------------------------------
> --- */
> +
> +static int alloc_magic_pages(struct xc_dom_image *dom)
> +{
>      /* allocate special pages */
>      dom->start_info_pfn = xc_dom_alloc_page(dom, "start info");
>      dom->xenstore_pfn = xc_dom_alloc_page(dom, "xenstore");
> @@ -674,6 +681,7 @@ static struct xc_dom_arch xc_dom_32_pae = {
>      .page_shift = PAGE_SHIFT_X86,
>      .sizeof_pfn = 4,
>      .alloc_magic_pages = alloc_magic_pages,
> +    .alloc_p2m_list = alloc_p2m_list,
>      .count_pgtables = count_pgtables_x86_32_pae,
>      .setup_pgtables = setup_pgtables_x86_32_pae,
>      .start_info = start_info_x86_32,
> @@ -687,6 +695,7 @@ static struct xc_dom_arch xc_dom_64 = {
>      .page_shift = PAGE_SHIFT_X86,
>      .sizeof_pfn = 8,
>      .alloc_magic_pages = alloc_magic_pages,
> +    .alloc_p2m_list = alloc_p2m_list,
>      .count_pgtables = count_pgtables_x86_64,
>      .setup_pgtables = setup_pgtables_x86_64,
>      .start_info = start_info_x86_64,

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/5] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd
  2015-10-02  5:49 ` [PATCH v2 2/5] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd Juergen Gross
@ 2015-10-02  9:37   ` Andrew Cooper
  2015-10-02  9:41     ` Jan Beulich
  2015-10-02  9:44     ` Juergen Gross
  0 siblings, 2 replies; 25+ messages in thread
From: Andrew Cooper @ 2015-10-02  9:37 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, Ian.Campbell, ian.jackson,
	stefano.stabellini, wei.liu2
  Cc: Tim Deegan, Keir Fraser, Jan Beulich

On 02/10/15 06:49, Juergen Gross wrote:
> Support of an unmapped initrd is indicated by the kernel of the domain
> via elf notes. In order not to have to use raw elf data in the tools
> for support of an unmapped initrd add a flag to the parsed data area
> to indicate the kernel supporting this feature.
>
> Switch using this flag in the hypervisor domain builder.
>
> Cc: Keir Fraser <keir@xen.org>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Tim Deegan <tim@xen.org>
> Suggested-by: Ian Campbell <ian.campbell@citrix.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  xen/arch/x86/domain_build.c        | 4 ++--
>  xen/common/libelf/libelf-dominfo.c | 3 +++
>  xen/include/xen/libelf.h           | 1 +
>  3 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
> index c2ef87a..a02c9fb 100644
> --- a/xen/arch/x86/domain_build.c
> +++ b/xen/arch/x86/domain_build.c
> @@ -353,7 +353,7 @@ static unsigned long __init compute_dom0_nr_pages(
>  
>          vstart = parms->virt_base;
>          vend = round_pgup(parms->virt_kend);
> -        if ( !parms->elf_notes[XEN_ELFNOTE_MOD_START_PFN].data.num )
> +        if ( !parms->mod_start_pfn )
>              vend += round_pgup(initrd_len);
>          end = vend + nr_pages * sizeof_long;
>  
> @@ -1037,7 +1037,7 @@ int __init construct_dom0(
>      v_start          = parms.virt_base;
>      vkern_start      = parms.virt_kstart;
>      vkern_end        = parms.virt_kend;
> -    if ( parms.elf_notes[XEN_ELFNOTE_MOD_START_PFN].data.num )
> +    if ( parms.mod_start_pfn )
>      {
>          vinitrd_start  = vinitrd_end = 0;
>          vphysmap_start = round_pgup(vkern_end);
> diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
> index 3de1c23..31d8436 100644
> --- a/xen/common/libelf/libelf-dominfo.c
> +++ b/xen/common/libelf/libelf-dominfo.c
> @@ -190,6 +190,9 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>      case XEN_ELFNOTE_INIT_P2M:
>          parms->p2m_base = val;
>          break;
> +    case XEN_ELFNOTE_MOD_START_PFN:
> +        parms->mod_start_pfn = !!val;
> +        break;
>      case XEN_ELFNOTE_PADDR_OFFSET:
>          parms->elf_paddr_offset = val;
>          break;
> diff --git a/xen/include/xen/libelf.h b/xen/include/xen/libelf.h
> index d7045f6..07b96a3 100644
> --- a/xen/include/xen/libelf.h
> +++ b/xen/include/xen/libelf.h
> @@ -423,6 +423,7 @@ struct elf_dom_parms {
>      char loader[16];
>      enum xen_pae_type pae;
>      bool bsd_symtab;
> +    bool mod_start_pfn;

The _pfn suffix here is confusing given the type of bool.

Perhaps "has_initrd" is a better choice of name?  The rest of the patch
looks fine.

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/5] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd
  2015-10-02  9:37   ` Andrew Cooper
@ 2015-10-02  9:41     ` Jan Beulich
  2015-10-02  9:44     ` Juergen Gross
  1 sibling, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2015-10-02  9:41 UTC (permalink / raw)
  To: andrew.cooper3
  Cc: Juergen Gross, wei.liu2, Ian.Campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, keir

>>> Andrew Cooper <andrew.cooper3@citrix.com> 10/02/15 11:38 AM >>>
>On 02/10/15 06:49, Juergen Gross wrote:
>> --- a/xen/include/xen/libelf.h
>> +++ b/xen/include/xen/libelf.h
>> @@ -423,6 +423,7 @@ struct elf_dom_parms {
>>      char loader[16];
>>      enum xen_pae_type pae;
>>      bool bsd_symtab;
>> +    bool mod_start_pfn;
>
>The _pfn suffix here is confusing given the type of bool.

I had written a reply to that effect already, but then decided it's no less confusing
than the name of the ELF note itself.

>Perhaps "has_initrd" is a better choice of name?  The rest of the patch
>looks fine.

How would "has_initrd" express the purpose of the flag?

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/5] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd
  2015-10-02  9:37   ` Andrew Cooper
  2015-10-02  9:41     ` Jan Beulich
@ 2015-10-02  9:44     ` Juergen Gross
  2015-10-02  9:53       ` Andrew Cooper
  1 sibling, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2015-10-02  9:44 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel, Ian.Campbell, ian.jackson,
	stefano.stabellini, wei.liu2
  Cc: Tim Deegan, Keir Fraser, Jan Beulich

On 10/02/2015 11:37 AM, Andrew Cooper wrote:
> On 02/10/15 06:49, Juergen Gross wrote:
>> Support of an unmapped initrd is indicated by the kernel of the domain
>> via elf notes. In order not to have to use raw elf data in the tools
>> for support of an unmapped initrd add a flag to the parsed data area
>> to indicate the kernel supporting this feature.
>>
>> Switch using this flag in the hypervisor domain builder.
>>
>> Cc: Keir Fraser <keir@xen.org>
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> Cc: Tim Deegan <tim@xen.org>
>> Suggested-by: Ian Campbell <ian.campbell@citrix.com>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   xen/arch/x86/domain_build.c        | 4 ++--
>>   xen/common/libelf/libelf-dominfo.c | 3 +++
>>   xen/include/xen/libelf.h           | 1 +
>>   3 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
>> index c2ef87a..a02c9fb 100644
>> --- a/xen/arch/x86/domain_build.c
>> +++ b/xen/arch/x86/domain_build.c
>> @@ -353,7 +353,7 @@ static unsigned long __init compute_dom0_nr_pages(
>>
>>           vstart = parms->virt_base;
>>           vend = round_pgup(parms->virt_kend);
>> -        if ( !parms->elf_notes[XEN_ELFNOTE_MOD_START_PFN].data.num )
>> +        if ( !parms->mod_start_pfn )
>>               vend += round_pgup(initrd_len);
>>           end = vend + nr_pages * sizeof_long;
>>
>> @@ -1037,7 +1037,7 @@ int __init construct_dom0(
>>       v_start          = parms.virt_base;
>>       vkern_start      = parms.virt_kstart;
>>       vkern_end        = parms.virt_kend;
>> -    if ( parms.elf_notes[XEN_ELFNOTE_MOD_START_PFN].data.num )
>> +    if ( parms.mod_start_pfn )
>>       {
>>           vinitrd_start  = vinitrd_end = 0;
>>           vphysmap_start = round_pgup(vkern_end);
>> diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
>> index 3de1c23..31d8436 100644
>> --- a/xen/common/libelf/libelf-dominfo.c
>> +++ b/xen/common/libelf/libelf-dominfo.c
>> @@ -190,6 +190,9 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>>       case XEN_ELFNOTE_INIT_P2M:
>>           parms->p2m_base = val;
>>           break;
>> +    case XEN_ELFNOTE_MOD_START_PFN:
>> +        parms->mod_start_pfn = !!val;
>> +        break;
>>       case XEN_ELFNOTE_PADDR_OFFSET:
>>           parms->elf_paddr_offset = val;
>>           break;
>> diff --git a/xen/include/xen/libelf.h b/xen/include/xen/libelf.h
>> index d7045f6..07b96a3 100644
>> --- a/xen/include/xen/libelf.h
>> +++ b/xen/include/xen/libelf.h
>> @@ -423,6 +423,7 @@ struct elf_dom_parms {
>>       char loader[16];
>>       enum xen_pae_type pae;
>>       bool bsd_symtab;
>> +    bool mod_start_pfn;
>
> The _pfn suffix here is confusing given the type of bool.
>
> Perhaps "has_initrd" is a better choice of name?  The rest of the patch
> looks fine.

Hmm, I just followed the naming of the note index in the raw data:
XEN_ELFNOTE_MOD_START_PFN. "has_initrd" would be completely misleading:
the flag doesn't indicate the support of an initrd, but the support of
an initrd (or more general: module as understood by grub) not covered by
the initial kernel mapping and due to this specified by it's starting
pfn instead it's virtual address.


Juergen

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/5] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd
  2015-10-02  9:44     ` Juergen Gross
@ 2015-10-02  9:53       ` Andrew Cooper
  2015-10-02 10:01         ` Juergen Gross
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2015-10-02  9:53 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, Ian.Campbell, ian.jackson,
	stefano.stabellini, wei.liu2
  Cc: Tim Deegan, Keir Fraser, Jan Beulich

On 02/10/15 10:44, Juergen Gross wrote:
> On 10/02/2015 11:37 AM, Andrew Cooper wrote:
>> On 02/10/15 06:49, Juergen Gross wrote:
>>> Support of an unmapped initrd is indicated by the kernel of the domain
>>> via elf notes. In order not to have to use raw elf data in the tools
>>> for support of an unmapped initrd add a flag to the parsed data area
>>> to indicate the kernel supporting this feature.
>>>
>>> Switch using this flag in the hypervisor domain builder.
>>>
>>> Cc: Keir Fraser <keir@xen.org>
>>> Cc: Jan Beulich <jbeulich@suse.com>
>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Cc: Tim Deegan <tim@xen.org>
>>> Suggested-by: Ian Campbell <ian.campbell@citrix.com>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>> ---
>>>   xen/arch/x86/domain_build.c        | 4 ++--
>>>   xen/common/libelf/libelf-dominfo.c | 3 +++
>>>   xen/include/xen/libelf.h           | 1 +
>>>   3 files changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
>>> index c2ef87a..a02c9fb 100644
>>> --- a/xen/arch/x86/domain_build.c
>>> +++ b/xen/arch/x86/domain_build.c
>>> @@ -353,7 +353,7 @@ static unsigned long __init compute_dom0_nr_pages(
>>>
>>>           vstart = parms->virt_base;
>>>           vend = round_pgup(parms->virt_kend);
>>> -        if ( !parms->elf_notes[XEN_ELFNOTE_MOD_START_PFN].data.num )
>>> +        if ( !parms->mod_start_pfn )
>>>               vend += round_pgup(initrd_len);
>>>           end = vend + nr_pages * sizeof_long;
>>>
>>> @@ -1037,7 +1037,7 @@ int __init construct_dom0(
>>>       v_start          = parms.virt_base;
>>>       vkern_start      = parms.virt_kstart;
>>>       vkern_end        = parms.virt_kend;
>>> -    if ( parms.elf_notes[XEN_ELFNOTE_MOD_START_PFN].data.num )
>>> +    if ( parms.mod_start_pfn )
>>>       {
>>>           vinitrd_start  = vinitrd_end = 0;
>>>           vphysmap_start = round_pgup(vkern_end);
>>> diff --git a/xen/common/libelf/libelf-dominfo.c
>>> b/xen/common/libelf/libelf-dominfo.c
>>> index 3de1c23..31d8436 100644
>>> --- a/xen/common/libelf/libelf-dominfo.c
>>> +++ b/xen/common/libelf/libelf-dominfo.c
>>> @@ -190,6 +190,9 @@ elf_errorstatus elf_xen_parse_note(struct
>>> elf_binary *elf,
>>>       case XEN_ELFNOTE_INIT_P2M:
>>>           parms->p2m_base = val;
>>>           break;
>>> +    case XEN_ELFNOTE_MOD_START_PFN:
>>> +        parms->mod_start_pfn = !!val;
>>> +        break;
>>>       case XEN_ELFNOTE_PADDR_OFFSET:
>>>           parms->elf_paddr_offset = val;
>>>           break;
>>> diff --git a/xen/include/xen/libelf.h b/xen/include/xen/libelf.h
>>> index d7045f6..07b96a3 100644
>>> --- a/xen/include/xen/libelf.h
>>> +++ b/xen/include/xen/libelf.h
>>> @@ -423,6 +423,7 @@ struct elf_dom_parms {
>>>       char loader[16];
>>>       enum xen_pae_type pae;
>>>       bool bsd_symtab;
>>> +    bool mod_start_pfn;
>>
>> The _pfn suffix here is confusing given the type of bool.
>>
>> Perhaps "has_initrd" is a better choice of name?  The rest of the patch
>> looks fine.
>
> Hmm, I just followed the naming of the note index in the raw data:
> XEN_ELFNOTE_MOD_START_PFN. "has_initrd" would be completely misleading:
> the flag doesn't indicate the support of an initrd, but the support of
> an initrd (or more general: module as understood by grub) not covered by
> the initial kernel mapping and due to this specified by it's starting
> pfn instead it's virtual address.

It would appear that a lot of the naming around there is confusing.

Would "unmapped_initrd" be a better name then?

If not, I am not too fussed, seeing as it matches the (questionably
named) ELF note.

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/5] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd
  2015-10-02  9:53       ` Andrew Cooper
@ 2015-10-02 10:01         ` Juergen Gross
  2015-10-02 10:22           ` Ian Campbell
  0 siblings, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2015-10-02 10:01 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel, Ian.Campbell, ian.jackson,
	stefano.stabellini, wei.liu2
  Cc: Tim Deegan, Keir Fraser, Jan Beulich

On 10/02/2015 11:53 AM, Andrew Cooper wrote:
> On 02/10/15 10:44, Juergen Gross wrote:
>> On 10/02/2015 11:37 AM, Andrew Cooper wrote:
>>> On 02/10/15 06:49, Juergen Gross wrote:
>>>> Support of an unmapped initrd is indicated by the kernel of the domain
>>>> via elf notes. In order not to have to use raw elf data in the tools
>>>> for support of an unmapped initrd add a flag to the parsed data area
>>>> to indicate the kernel supporting this feature.
>>>>
>>>> Switch using this flag in the hypervisor domain builder.
>>>>
>>>> Cc: Keir Fraser <keir@xen.org>
>>>> Cc: Jan Beulich <jbeulich@suse.com>
>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> Cc: Tim Deegan <tim@xen.org>
>>>> Suggested-by: Ian Campbell <ian.campbell@citrix.com>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>> ---
>>>>    xen/arch/x86/domain_build.c        | 4 ++--
>>>>    xen/common/libelf/libelf-dominfo.c | 3 +++
>>>>    xen/include/xen/libelf.h           | 1 +
>>>>    3 files changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
>>>> index c2ef87a..a02c9fb 100644
>>>> --- a/xen/arch/x86/domain_build.c
>>>> +++ b/xen/arch/x86/domain_build.c
>>>> @@ -353,7 +353,7 @@ static unsigned long __init compute_dom0_nr_pages(
>>>>
>>>>            vstart = parms->virt_base;
>>>>            vend = round_pgup(parms->virt_kend);
>>>> -        if ( !parms->elf_notes[XEN_ELFNOTE_MOD_START_PFN].data.num )
>>>> +        if ( !parms->mod_start_pfn )
>>>>                vend += round_pgup(initrd_len);
>>>>            end = vend + nr_pages * sizeof_long;
>>>>
>>>> @@ -1037,7 +1037,7 @@ int __init construct_dom0(
>>>>        v_start          = parms.virt_base;
>>>>        vkern_start      = parms.virt_kstart;
>>>>        vkern_end        = parms.virt_kend;
>>>> -    if ( parms.elf_notes[XEN_ELFNOTE_MOD_START_PFN].data.num )
>>>> +    if ( parms.mod_start_pfn )
>>>>        {
>>>>            vinitrd_start  = vinitrd_end = 0;
>>>>            vphysmap_start = round_pgup(vkern_end);
>>>> diff --git a/xen/common/libelf/libelf-dominfo.c
>>>> b/xen/common/libelf/libelf-dominfo.c
>>>> index 3de1c23..31d8436 100644
>>>> --- a/xen/common/libelf/libelf-dominfo.c
>>>> +++ b/xen/common/libelf/libelf-dominfo.c
>>>> @@ -190,6 +190,9 @@ elf_errorstatus elf_xen_parse_note(struct
>>>> elf_binary *elf,
>>>>        case XEN_ELFNOTE_INIT_P2M:
>>>>            parms->p2m_base = val;
>>>>            break;
>>>> +    case XEN_ELFNOTE_MOD_START_PFN:
>>>> +        parms->mod_start_pfn = !!val;
>>>> +        break;
>>>>        case XEN_ELFNOTE_PADDR_OFFSET:
>>>>            parms->elf_paddr_offset = val;
>>>>            break;
>>>> diff --git a/xen/include/xen/libelf.h b/xen/include/xen/libelf.h
>>>> index d7045f6..07b96a3 100644
>>>> --- a/xen/include/xen/libelf.h
>>>> +++ b/xen/include/xen/libelf.h
>>>> @@ -423,6 +423,7 @@ struct elf_dom_parms {
>>>>        char loader[16];
>>>>        enum xen_pae_type pae;
>>>>        bool bsd_symtab;
>>>> +    bool mod_start_pfn;
>>>
>>> The _pfn suffix here is confusing given the type of bool.
>>>
>>> Perhaps "has_initrd" is a better choice of name?  The rest of the patch
>>> looks fine.
>>
>> Hmm, I just followed the naming of the note index in the raw data:
>> XEN_ELFNOTE_MOD_START_PFN. "has_initrd" would be completely misleading:
>> the flag doesn't indicate the support of an initrd, but the support of
>> an initrd (or more general: module as understood by grub) not covered by
>> the initial kernel mapping and due to this specified by it's starting
>> pfn instead it's virtual address.
>
> It would appear that a lot of the naming around there is confusing.
>
> Would "unmapped_initrd" be a better name then?

Hmm, in theory, yes.

The question is, whether the name should fit to the elf note's name or
to it's semantics. The name of the elf note will be used in libelf and
in the Linux kernel (and possibly other kernels as well), while the
flag name will be used in the domain builders (hypervisor and tools).

Which flag name will be less confusing?

> If not, I am not too fussed, seeing as it matches the (questionably
> named) ELF note.


Juergen

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/5] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd
  2015-10-02 10:01         ` Juergen Gross
@ 2015-10-02 10:22           ` Ian Campbell
  0 siblings, 0 replies; 25+ messages in thread
From: Ian Campbell @ 2015-10-02 10:22 UTC (permalink / raw)
  To: Juergen Gross, Andrew Cooper, xen-devel, ian.jackson,
	stefano.stabellini, wei.liu2
  Cc: Tim Deegan, Keir Fraser, Jan Beulich

On Fri, 2015-10-02 at 12:01 +0200, Juergen Gross wrote:
> On 10/02/2015 11:53 AM, Andrew Cooper wrote:
> > On 02/10/15 10:44, Juergen Gross wrote:
> > > On 10/02/2015 11:37 AM, Andrew Cooper wrote:
> > > > On 02/10/15 06:49, Juergen Gross wrote:
> > > > > Support of an unmapped initrd is indicated by the kernel of the
> > > > > domain
> > > > > via elf notes. In order not to have to use raw elf data in the
> > > > > tools
> > > > > for support of an unmapped initrd add a flag to the parsed data
> > > > > area
> > > > > to indicate the kernel supporting this feature.
> > > > > 
> > > > > Switch using this flag in the hypervisor domain builder.
> > > > > 
> > > > > Cc: Keir Fraser <keir@xen.org>
> > > > > Cc: Jan Beulich <jbeulich@suse.com>
> > > > > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > > > > Cc: Tim Deegan <tim@xen.org>
> > > > > Suggested-by: Ian Campbell <ian.campbell@citrix.com>
> > > > > Signed-off-by: Juergen Gross <jgross@suse.com>
> > > > > ---
> > > > >    xen/arch/x86/domain_build.c        | 4 ++--
> > > > >    xen/common/libelf/libelf-dominfo.c | 3 +++
> > > > >    xen/include/xen/libelf.h           | 1 +
> > > > >    3 files changed, 6 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/xen/arch/x86/domain_build.c
> > > > > b/xen/arch/x86/domain_build.c
> > > > > index c2ef87a..a02c9fb 100644
> > > > > --- a/xen/arch/x86/domain_build.c
> > > > > +++ b/xen/arch/x86/domain_build.c
> > > > > @@ -353,7 +353,7 @@ static unsigned long __init
> > > > > compute_dom0_nr_pages(
> > > > > 
> > > > >            vstart = parms->virt_base;
> > > > >            vend = round_pgup(parms->virt_kend);
> > > > > -        if ( !parms
> > > > > ->elf_notes[XEN_ELFNOTE_MOD_START_PFN].data.num )
> > > > > +        if ( !parms->mod_start_pfn )
> > > > >                vend += round_pgup(initrd_len);
> > > > >            end = vend + nr_pages * sizeof_long;
> > > > > 
> > > > > @@ -1037,7 +1037,7 @@ int __init construct_dom0(
> > > > >        v_start          = parms.virt_base;
> > > > >        vkern_start      = parms.virt_kstart;
> > > > >        vkern_end        = parms.virt_kend;
> > > > > -    if ( parms.elf_notes[XEN_ELFNOTE_MOD_START_PFN].data.num )
> > > > > +    if ( parms.mod_start_pfn )
> > > > >        {
> > > > >            vinitrd_start  = vinitrd_end = 0;
> > > > >            vphysmap_start = round_pgup(vkern_end);
> > > > > diff --git a/xen/common/libelf/libelf-dominfo.c
> > > > > b/xen/common/libelf/libelf-dominfo.c
> > > > > index 3de1c23..31d8436 100644
> > > > > --- a/xen/common/libelf/libelf-dominfo.c
> > > > > +++ b/xen/common/libelf/libelf-dominfo.c
> > > > > @@ -190,6 +190,9 @@ elf_errorstatus elf_xen_parse_note(struct
> > > > > elf_binary *elf,
> > > > >        case XEN_ELFNOTE_INIT_P2M:
> > > > >            parms->p2m_base = val;
> > > > >            break;
> > > > > +    case XEN_ELFNOTE_MOD_START_PFN:
> > > > > +        parms->mod_start_pfn = !!val;
> > > > > +        break;
> > > > >        case XEN_ELFNOTE_PADDR_OFFSET:
> > > > >            parms->elf_paddr_offset = val;
> > > > >            break;
> > > > > diff --git a/xen/include/xen/libelf.h b/xen/include/xen/libelf.h
> > > > > index d7045f6..07b96a3 100644
> > > > > --- a/xen/include/xen/libelf.h
> > > > > +++ b/xen/include/xen/libelf.h
> > > > > @@ -423,6 +423,7 @@ struct elf_dom_parms {
> > > > >        char loader[16];
> > > > >        enum xen_pae_type pae;
> > > > >        bool bsd_symtab;
> > > > > +    bool mod_start_pfn;
> > > > 
> > > > The _pfn suffix here is confusing given the type of bool.
> > > > 
> > > > Perhaps "has_initrd" is a better choice of name?  The rest of the
> > > > patch
> > > > looks fine.
> > > 
> > > Hmm, I just followed the naming of the note index in the raw data:
> > > XEN_ELFNOTE_MOD_START_PFN. "has_initrd" would be completely
> > > misleading:
> > > the flag doesn't indicate the support of an initrd, but the support
> > > of
> > > an initrd (or more general: module as understood by grub) not covered
> > > by
> > > the initial kernel mapping and due to this specified by it's starting
> > > pfn instead it's virtual address.
> > 
> > It would appear that a lot of the naming around there is confusing.
> > 
> > Would "unmapped_initrd" be a better name then?
> 
> Hmm, in theory, yes.
> 
> The question is, whether the name should fit to the elf note's name or
> to it's semantics. The name of the elf note will be used in libelf and
> in the Linux kernel (and possibly other kernels as well), while the
> flag name will be used in the domain builders (hypervisor and tools).
> 
> Which flag name will be less confusing?

My £0.02:

Given that this is in effect a kind of abstraction over the implementation
of what is in the ELF notes I think it would be fine and somewhat desirable
to have a clearer name at this layer (accepting that the layering is a bit
vague/implicit/etc).

> 
> > If not, I am not too fussed, seeing as it matches the (questionably
> > named) ELF note.
> 
> 
> Juergen
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/5] libxc: create unmapped initrd in domain builder if supported
  2015-10-02  5:49 ` [PATCH v2 3/5] libxc: create unmapped initrd in domain builder if supported Juergen Gross
@ 2015-10-02 12:59   ` Ian Campbell
  2015-10-02 14:46     ` Juergen Gross
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Campbell @ 2015-10-02 12:59 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, ian.jackson, stefano.stabellini,
	wei.liu2

On Fri, 2015-10-02 at 07:49 +0200, Juergen Gross wrote:
> In case the kernel of a new pv-domU indicates it is supporting an
> unmapped initrd, don't waste precious virtual space for the initrd,
> but allocate only guest physical memory for it.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  tools/libxc/xc_dom_core.c | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
> index b510bbd..85b531a 100644
> --- a/tools/libxc/xc_dom_core.c
> +++ b/tools/libxc/xc_dom_core.c
> @@ -1019,8 +1019,9 @@ int xc_dom_build_image(struct xc_dom_image *dom)
>      if ( dom->kernel_loader->loader(dom) != 0 )
>          goto err;
>  
> -    /* load ramdisk */
> -    if ( dom->ramdisk_blob )
> +    /* Load ramdisk if initial mapping required. */
> +    if ( dom->ramdisk_blob &&
> +         (!dom->parms.mod_start_pfn || dom->ramdisk_seg.vstart) )
>      {
>          if ( xc_dom_build_ramdisk(dom) != 0 )
>              goto err;
> @@ -1063,6 +1064,23 @@ int xc_dom_build_image(struct xc_dom_image *dom)
>                __FUNCTION__, dom->virt_alloc_end);
>      DOMPRINTF("%-20s: virt_pgtab_end : 0x%" PRIx64 "",
>                __FUNCTION__, dom->virt_pgtab_end);
> +
> +    /* Prepare allocating unmapped memory. */
> +    if ( dom->virt_pgtab_end )
> +        dom->virt_alloc_end = dom->virt_pgtab_end;
> +
> +    /* Load ramdisk if no initial mapping required. */
> +    if ( dom->ramdisk_blob && !dom->ramdisk_seg.vstart &&
> +         dom->parms.mod_start_pfn )
> +    {
> +        if ( xc_dom_build_ramdisk(dom) != 0 )
> +            goto err;
> +        dom->flags |= SIF_MOD_START_PFN;
> +        dom->ramdisk_seg.vend = dom->ramdisk_seg.vend - dom->ramdisk_seg.vstart;
> +        dom->ramdisk_seg.vstart = dom->ramdisk_seg.pfn;
> +        dom->ramdisk_seg.vend += dom->ramdisk_seg.vstart;

This seems like it is trying to do something clever, like partially
reversing something which the xc_dom_alloc_segment call in
 xc_dom_build_ramdisk has done.

It looks like the vend handling in particular is just a complicated way of
subtracting vstart and adding pfn, with the aim of rebasing from virt to
phys world.

Plus vstart/end are addresses, while presumably pfn is a page number, so
I'm confused about that as well.

I'm also not clear how/where the virtual mapping is avoided, given that
this code here does strictly more than the original code above which is now
made conditional does.

Ian.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/5] libxc: remove allocate member from struct xc_dom_image
  2015-10-02  5:49 ` [PATCH v2 1/5] libxc: remove allocate member from struct xc_dom_image Juergen Gross
@ 2015-10-02 13:01   ` Ian Campbell
  2015-10-02 14:25     ` Juergen Gross
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Campbell @ 2015-10-02 13:01 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, ian.jackson, stefano.stabellini,
	wei.liu2

On Fri, 2015-10-02 at 07:49 +0200, Juergen Gross wrote:
> The allocate() callback in struct xc_dom_image is never set. Remove it.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

This breaks the stubdom build:

kexec.c: In function ‘kexec’:
kexec.c:221:78: warning: taking address of expression of type ‘void’
     xen_pfn_t boot_page_mfn = virt_to_mfn(&_boot_page);
                                                                              ^
kexec.c:230:8: error: ‘struct xc_dom_image’ has no member named ‘allocate’
     dom->allocate = kexec_allocate;
        ^
kexec.c:318:60: warning: taking address of expression of type ‘void’
             virt_to_mfn(&_boot_page));
                                                            ^
Makefile:79: recipe for target '/local/scratch/ianc/devel/committer-amd64.git/stubdom/grub-x86_64/kexec.o' failed

On i386 too.

And in fact that hook looks useful in that context, so either it needs to
stay of stubdom kexec needs changing to work some other way.

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/5] libxc: create p2m list outside of kernel mapping if supported
  2015-10-02  5:49 ` [PATCH v2 5/5] libxc: create p2m list outside of kernel mapping if supported Juergen Gross
@ 2015-10-02 13:16   ` Ian Campbell
  2015-10-02 14:37     ` Juergen Gross
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Campbell @ 2015-10-02 13:16 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, ian.jackson, stefano.stabellini,
	wei.liu2

On Fri, 2015-10-02 at 07:49 +0200, Juergen Gross wrote:
>  
> +    /* Allocate p2m list if outside of initial kernel mapping. */
> +    if ( dom->arch_hooks->alloc_p2m_list && dom->parms.p2m_base != UNSET_ADDR )
> +    {
> +        if ( dom->arch_hooks->alloc_p2m_list(dom) != 0 )
> +            goto err;
> +        dom->p2m_seg.vend = dom->p2m_seg.vend - dom->p2m_seg.vstart;
> +        dom->p2m_seg.vstart = dom->parms.p2m_base;
> +        dom->p2m_seg.vend += dom->p2m_seg.vstart;

Here is this strange pattern again.

It seems like you should be adding new APIs to the dom builder's VA/PA
allocation stuff and using those instead of working around the behaviour of
the existing ones.

> +    }
> +    /*
> +     * Build the page tables for mapping the p2m list at an address
> +     * specified by the to be loaded kernel.
> +     * l1pfn holds the pfn of the next page table to allocate.
> +     * At each level we might already have an entry filled when setting
> +     * up the initial kernel mapping. This can happen for the last entry
> +     * of each level only!
> +     */
> +    l3tab = NULL;
> +    l2tab = NULL;
> +    l1tab = NULL;
> +    l1pfn = round_pfn(dom->p2m_size * dom->arch_hooks->sizeof_pfn) +
> +            dom->p2m_seg.pfn;
> +
> +    for ( addr = dom->parms.p2m_base;
> +          addr < dom->parms.p2m_base +
> +                 dom->p2m_size * dom->arch_hooks->sizeof_pfn;
> +          addr += PAGE_SIZE_X86 )

This is replicating a bunch of existing setup_pgtable_* code.

Please refactor into a helper (one per PT layout) to map a region and use
that for the existing and new use cases.

> +    {
> +        if ( l3tab == NULL )
> +        {
> +            l4off = l4_table_offset_x86_64(addr);
> +            l3pfn = l4tab[l4off] ? l4pfn + dom->pg_l4 : l1pfn++;
> +            l3tab = xc_dom_pfn_to_ptr(dom, l3pfn, 1);
> +            if ( l3tab == NULL )
> +                goto pfn_error;
> +            l4tab[l4off] =
> +                pfn_to_paddr(xc_dom_p2m_guest(dom, l3pfn)) | L4_PROT;
> +        }
> +
> +        if ( l2tab == NULL )
> +        {
> +            l3off = l3_table_offset_x86_64(addr);
> +            l2pfn = l3tab[l3off] ? l3pfn + dom->pg_l3 : l1pfn++;
> +            l2tab = xc_dom_pfn_to_ptr(dom, l2pfn, 1);
> +            if ( l2tab == NULL )
> +                goto pfn_error;
> +            l3tab[l3off] =
> +                pfn_to_paddr(xc_dom_p2m_guest(dom, l2pfn)) | L3_PROT;
> +        }
> +
> +        if ( l1tab == NULL )
> +        {
> +            l2off = l2_table_offset_x86_64(addr);
> +            l1pfn = l2tab[l2off] ? l2pfn + dom->pg_l2 : l1pfn;
> +            l1tab = xc_dom_pfn_to_ptr(dom, l1pfn, 1);
> +            if ( l1tab == NULL )
> +                goto pfn_error;
> +            l2tab[l2off] =
> +                pfn_to_paddr(xc_dom_p2m_guest(dom, l1pfn)) | L2_PROT;
> +            l1pfn++;
> +        }
> +
> +        l1off = l1_table_offset_x86_64(addr);
> +        pgpfn = ((addr - dom->parms.p2m_base) >> PAGE_SHIFT_X86) +
> +                dom->p2m_seg.pfn;
> +        l1tab[l1off] =
> +            pfn_to_paddr(xc_dom_p2m_guest(dom, pgpfn)) | L1_PROT;
> +
> +        if ( l1off == (L1_PAGETABLE_ENTRIES_X86_64 - 1) )
> +        {
> +            l1tab = NULL;
> +            if ( l2off == (L2_PAGETABLE_ENTRIES_X86_64 - 1) )
> +            {
> +                l2tab = NULL;
> +                if ( l3off == (L3_PAGETABLE_ENTRIES_X86_64 - 1) )
> +                    l3tab = NULL;
> +            }
> +        }
> +    }
> +
>      return 0;
>  
>  pfn_error:
> @@ -442,6 +519,27 @@ pfn_error:
>  static int alloc_p2m_list(struct xc_dom_image *dom)
>  {
>      size_t p2m_alloc_size = dom->p2m_size * dom->arch_hooks->sizeof_pfn;
> +    xen_vaddr_t from, to;
> +    xen_pfn_t tables;
> +
> +    p2m_alloc_size = round_pg(p2m_alloc_size);
> +    if ( dom->parms.p2m_base != UNSET_ADDR )
> +    {
> +        /* Add space for page tables, 64 bit only. */

Please make an alloc_p2m_list_x86_64 which does this and then calls the
common code and then use the appropriate hook for each sub arch.


> +        from = dom->parms.p2m_base;
> +        to = from + p2m_alloc_size - 1;
> +        tables = 0;
> +        tables += nr_page_tables(dom, from, to,
> L4_PAGETABLE_SHIFT_X86_64);
> +        if ( to > (xen_vaddr_t)(~0ULL << L4_PAGETABLE_SHIFT_X86_64) )
> +            tables--;
> +        tables += nr_page_tables(dom, from, to,
> L3_PAGETABLE_SHIFT_X86_64);
> +        if ( to > (xen_vaddr_t)(~0ULL << L3_PAGETABLE_SHIFT_X86_64) )
> +            tables--;
> +        tables += nr_page_tables(dom, from, to,
> L2_PAGETABLE_SHIFT_X86_64);
> +        if ( to > (xen_vaddr_t)(~0ULL << L2_PAGETABLE_SHIFT_X86_64) )
> +            tables--;
> +        p2m_alloc_size += tables << PAGE_SHIFT_X86;
> +    }
>  
>      /* allocate phys2mach table */
>      if ( xc_dom_alloc_segment(dom, &dom->p2m_seg, "phys2mach",

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/5] libxc: remove allocate member from struct xc_dom_image
  2015-10-02 13:01   ` Ian Campbell
@ 2015-10-02 14:25     ` Juergen Gross
  2015-10-02 14:47       ` Ian Campbell
  0 siblings, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2015-10-02 14:25 UTC (permalink / raw)
  To: Ian Campbell, xen-devel, ian.jackson, stefano.stabellini,
	wei.liu2

On 10/02/2015 03:01 PM, Ian Campbell wrote:
> On Fri, 2015-10-02 at 07:49 +0200, Juergen Gross wrote:
>> The allocate() callback in struct xc_dom_image is never set. Remove it.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
>
> This breaks the stubdom build:
>
> kexec.c: In function ‘kexec’:
> kexec.c:221:78: warning: taking address of expression of type ‘void’
>       xen_pfn_t boot_page_mfn = virt_to_mfn(&_boot_page);
>                                                                                ^
> kexec.c:230:8: error: ‘struct xc_dom_image’ has no member named ‘allocate’
>       dom->allocate = kexec_allocate;
>          ^
> kexec.c:318:60: warning: taking address of expression of type ‘void’
>               virt_to_mfn(&_boot_page));
>                                                              ^
> Makefile:79: recipe for target '/local/scratch/ianc/devel/committer-amd64.git/stubdom/grub-x86_64/kexec.o' failed
>
> On i386 too.
>
> And in fact that hook looks useful in that context, so either it needs to
> stay of stubdom kexec needs changing to work some other way.

Too bad.

I wanted to remove the allocate callback as it will conflict with the
allocations of memory outside the initial default mapping.

Just to make sure I understand this correctly:

stubdom is used in this context to support grub running as a pv domain
capable to start another pv domain.

So as long as stubdom doesn't support mapping the p2m list outside the
default mapping it makes no sense to support this feature for any domain
started via stubdom/grub (the main reason to use this feature is the
support of huge memory causing the p2m list to exceed the available
virtual address space of the default mapping).

So the easy solution would be to not support initrd and p2m outside the
default mapping when the allocate callback is set. Do you think this
solution is okay?


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/5] libxc: create p2m list outside of kernel mapping if supported
  2015-10-02 13:16   ` Ian Campbell
@ 2015-10-02 14:37     ` Juergen Gross
  0 siblings, 0 replies; 25+ messages in thread
From: Juergen Gross @ 2015-10-02 14:37 UTC (permalink / raw)
  To: Ian Campbell, xen-devel, ian.jackson, stefano.stabellini,
	wei.liu2

On 10/02/2015 03:16 PM, Ian Campbell wrote:
> On Fri, 2015-10-02 at 07:49 +0200, Juergen Gross wrote:
>>
>> +    /* Allocate p2m list if outside of initial kernel mapping. */
>> +    if ( dom->arch_hooks->alloc_p2m_list && dom->parms.p2m_base != UNSET_ADDR )
>> +    {
>> +        if ( dom->arch_hooks->alloc_p2m_list(dom) != 0 )
>> +            goto err;
>> +        dom->p2m_seg.vend = dom->p2m_seg.vend - dom->p2m_seg.vstart;
>> +        dom->p2m_seg.vstart = dom->parms.p2m_base;
>> +        dom->p2m_seg.vend += dom->p2m_seg.vstart;
>
> Here is this strange pattern again.
>
> It seems like you should be adding new APIs to the dom builder's VA/PA
> allocation stuff and using those instead of working around the behaviour of
> the existing ones.

Okay. I'll split allocation into two layers (virtual and physical)
and make the virtual one accessible for other functions. I'll need
to keep track of physical and virtual allocation boundaries, of
course.

>
>> +    }
>> +    /*
>> +     * Build the page tables for mapping the p2m list at an address
>> +     * specified by the to be loaded kernel.
>> +     * l1pfn holds the pfn of the next page table to allocate.
>> +     * At each level we might already have an entry filled when setting
>> +     * up the initial kernel mapping. This can happen for the last entry
>> +     * of each level only!
>> +     */
>> +    l3tab = NULL;
>> +    l2tab = NULL;
>> +    l1tab = NULL;
>> +    l1pfn = round_pfn(dom->p2m_size * dom->arch_hooks->sizeof_pfn) +
>> +            dom->p2m_seg.pfn;
>> +
>> +    for ( addr = dom->parms.p2m_base;
>> +          addr < dom->parms.p2m_base +
>> +                 dom->p2m_size * dom->arch_hooks->sizeof_pfn;
>> +          addr += PAGE_SIZE_X86 )
>
> This is replicating a bunch of existing setup_pgtable_* code.
>
> Please refactor into a helper (one per PT layout) to map a region and use
> that for the existing and new use cases.

Hmm, I thought about this, too. The problem is there are subtle
differences between both variants. I can give it a try and see how
the code looks like.

>
>> +    {
>> +        if ( l3tab == NULL )
>> +        {
>> +            l4off = l4_table_offset_x86_64(addr);
>> +            l3pfn = l4tab[l4off] ? l4pfn + dom->pg_l4 : l1pfn++;
>> +            l3tab = xc_dom_pfn_to_ptr(dom, l3pfn, 1);
>> +            if ( l3tab == NULL )
>> +                goto pfn_error;
>> +            l4tab[l4off] =
>> +                pfn_to_paddr(xc_dom_p2m_guest(dom, l3pfn)) | L4_PROT;
>> +        }
>> +
>> +        if ( l2tab == NULL )
>> +        {
>> +            l3off = l3_table_offset_x86_64(addr);
>> +            l2pfn = l3tab[l3off] ? l3pfn + dom->pg_l3 : l1pfn++;
>> +            l2tab = xc_dom_pfn_to_ptr(dom, l2pfn, 1);
>> +            if ( l2tab == NULL )
>> +                goto pfn_error;
>> +            l3tab[l3off] =
>> +                pfn_to_paddr(xc_dom_p2m_guest(dom, l2pfn)) | L3_PROT;
>> +        }
>> +
>> +        if ( l1tab == NULL )
>> +        {
>> +            l2off = l2_table_offset_x86_64(addr);
>> +            l1pfn = l2tab[l2off] ? l2pfn + dom->pg_l2 : l1pfn;
>> +            l1tab = xc_dom_pfn_to_ptr(dom, l1pfn, 1);
>> +            if ( l1tab == NULL )
>> +                goto pfn_error;
>> +            l2tab[l2off] =
>> +                pfn_to_paddr(xc_dom_p2m_guest(dom, l1pfn)) | L2_PROT;
>> +            l1pfn++;
>> +        }
>> +
>> +        l1off = l1_table_offset_x86_64(addr);
>> +        pgpfn = ((addr - dom->parms.p2m_base) >> PAGE_SHIFT_X86) +
>> +                dom->p2m_seg.pfn;
>> +        l1tab[l1off] =
>> +            pfn_to_paddr(xc_dom_p2m_guest(dom, pgpfn)) | L1_PROT;
>> +
>> +        if ( l1off == (L1_PAGETABLE_ENTRIES_X86_64 - 1) )
>> +        {
>> +            l1tab = NULL;
>> +            if ( l2off == (L2_PAGETABLE_ENTRIES_X86_64 - 1) )
>> +            {
>> +                l2tab = NULL;
>> +                if ( l3off == (L3_PAGETABLE_ENTRIES_X86_64 - 1) )
>> +                    l3tab = NULL;
>> +            }
>> +        }
>> +    }
>> +
>>       return 0;
>>
>>   pfn_error:
>> @@ -442,6 +519,27 @@ pfn_error:
>>   static int alloc_p2m_list(struct xc_dom_image *dom)
>>   {
>>       size_t p2m_alloc_size = dom->p2m_size * dom->arch_hooks->sizeof_pfn;
>> +    xen_vaddr_t from, to;
>> +    xen_pfn_t tables;
>> +
>> +    p2m_alloc_size = round_pg(p2m_alloc_size);
>> +    if ( dom->parms.p2m_base != UNSET_ADDR )
>> +    {
>> +        /* Add space for page tables, 64 bit only. */
>
> Please make an alloc_p2m_list_x86_64 which does this and then calls the
> common code and then use the appropriate hook for each sub arch.

Okay.

>> +        from = dom->parms.p2m_base;
>> +        to = from + p2m_alloc_size - 1;
>> +        tables = 0;
>> +        tables += nr_page_tables(dom, from, to,
>> L4_PAGETABLE_SHIFT_X86_64);
>> +        if ( to > (xen_vaddr_t)(~0ULL << L4_PAGETABLE_SHIFT_X86_64) )
>> +            tables--;
>> +        tables += nr_page_tables(dom, from, to,
>> L3_PAGETABLE_SHIFT_X86_64);
>> +        if ( to > (xen_vaddr_t)(~0ULL << L3_PAGETABLE_SHIFT_X86_64) )
>> +            tables--;
>> +        tables += nr_page_tables(dom, from, to,
>> L2_PAGETABLE_SHIFT_X86_64);
>> +        if ( to > (xen_vaddr_t)(~0ULL << L2_PAGETABLE_SHIFT_X86_64) )
>> +            tables--;
>> +        p2m_alloc_size += tables << PAGE_SHIFT_X86;
>> +    }
>>
>>       /* allocate phys2mach table */
>>       if ( xc_dom_alloc_segment(dom, &dom->p2m_seg, "phys2mach",


Juergen

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/5] libxc: create unmapped initrd in domain builder if supported
  2015-10-02 12:59   ` Ian Campbell
@ 2015-10-02 14:46     ` Juergen Gross
  2015-10-02 14:56       ` Ian Campbell
  0 siblings, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2015-10-02 14:46 UTC (permalink / raw)
  To: Ian Campbell, xen-devel, ian.jackson, stefano.stabellini,
	wei.liu2

On 10/02/2015 02:59 PM, Ian Campbell wrote:
> On Fri, 2015-10-02 at 07:49 +0200, Juergen Gross wrote:
>> In case the kernel of a new pv-domU indicates it is supporting an
>> unmapped initrd, don't waste precious virtual space for the initrd,
>> but allocate only guest physical memory for it.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   tools/libxc/xc_dom_core.c | 22 ++++++++++++++++++++--
>>   1 file changed, 20 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
>> index b510bbd..85b531a 100644
>> --- a/tools/libxc/xc_dom_core.c
>> +++ b/tools/libxc/xc_dom_core.c
>> @@ -1019,8 +1019,9 @@ int xc_dom_build_image(struct xc_dom_image *dom)
>>       if ( dom->kernel_loader->loader(dom) != 0 )
>>           goto err;
>>
>> -    /* load ramdisk */
>> -    if ( dom->ramdisk_blob )
>> +    /* Load ramdisk if initial mapping required. */
>> +    if ( dom->ramdisk_blob &&
>> +         (!dom->parms.mod_start_pfn || dom->ramdisk_seg.vstart) )
>>       {
>>           if ( xc_dom_build_ramdisk(dom) != 0 )
>>               goto err;
>> @@ -1063,6 +1064,23 @@ int xc_dom_build_image(struct xc_dom_image *dom)
>>                 __FUNCTION__, dom->virt_alloc_end);
>>       DOMPRINTF("%-20s: virt_pgtab_end : 0x%" PRIx64 "",
>>                 __FUNCTION__, dom->virt_pgtab_end);
>> +
>> +    /* Prepare allocating unmapped memory. */
>> +    if ( dom->virt_pgtab_end )
>> +        dom->virt_alloc_end = dom->virt_pgtab_end;
>> +
>> +    /* Load ramdisk if no initial mapping required. */
>> +    if ( dom->ramdisk_blob && !dom->ramdisk_seg.vstart &&
>> +         dom->parms.mod_start_pfn )
>> +    {
>> +        if ( xc_dom_build_ramdisk(dom) != 0 )
>> +            goto err;
>> +        dom->flags |= SIF_MOD_START_PFN;
>> +        dom->ramdisk_seg.vend = dom->ramdisk_seg.vend - dom->ramdisk_seg.vstart;
>> +        dom->ramdisk_seg.vstart = dom->ramdisk_seg.pfn;
>> +        dom->ramdisk_seg.vend += dom->ramdisk_seg.vstart;
>
> This seems like it is trying to do something clever, like partially
> reversing something which the xc_dom_alloc_segment call in
>   xc_dom_build_ramdisk has done.

It's just changing the boundaries of the initrd to fit the interface
for it being not mapped (indicated by the SIF_MOD_START_PFN flag).

> It looks like the vend handling in particular is just a complicated way of
> subtracting vstart and adding pfn, with the aim of rebasing from virt to
> phys world.

Hmm, not more complicated as:

len = dom->ramdisk_seg.vend - dom->ramdisk_seg.vstart;
dom->ramdisk_seg.vstart = dom->ramdisk_seg.pfn;
dom->ramdisk_seg.vend = dom->ramdisk_seg.pfn + len;

I have to admit that above variant might be easier to understand. :-)

> Plus vstart/end are addresses, while presumably pfn is a page number, so
> I'm confused about that as well.

The naming is irritating, yes. I think I'll change this when I'm
modifying the allocation interface (see my answer to patch 5).

> I'm also not clear how/where the virtual mapping is avoided, given that
> this code here does strictly more than the original code above which is now
> made conditional does.

The page tables are built before. So they don't cover the initrd memory.


Juergen

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/5] libxc: remove allocate member from struct xc_dom_image
  2015-10-02 14:25     ` Juergen Gross
@ 2015-10-02 14:47       ` Ian Campbell
  2015-10-02 15:00         ` Juergen Gross
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Campbell @ 2015-10-02 14:47 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, ian.jackson, stefano.stabellini,
	wei.liu2

On Fri, 2015-10-02 at 16:25 +0200, Juergen Gross wrote:
> On 10/02/2015 03:01 PM, Ian Campbell wrote:
> > On Fri, 2015-10-02 at 07:49 +0200, Juergen Gross wrote:
> > > The allocate() callback in struct xc_dom_image is never set. Remove
> > > it.
> > > 
> > > Signed-off-by: Juergen Gross <jgross@suse.com>
> > > Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
> > 
> > This breaks the stubdom build:
> > 
> > kexec.c: In function ‘kexec’:
> > kexec.c:221:78: warning: taking address of expression of type ‘void’
> >       xen_pfn_t boot_page_mfn = virt_to_mfn(&_boot_page);
> >                                                                        
> >         ^
> > kexec.c:230:8: error: ‘struct xc_dom_image’ has no member named
> > ‘allocate’
> >       dom->allocate = kexec_allocate;
> >          ^
> > kexec.c:318:60: warning: taking address of expression of type ‘void’
> >               virt_to_mfn(&_boot_page));
> >                                                              ^
> > Makefile:79: recipe for target '/local/scratch/ianc/devel/committer
> > -amd64.git/stubdom/grub-x86_64/kexec.o' failed
> > 
> > On i386 too.
> > 
> > And in fact that hook looks useful in that context, so either it needs
> > to
> > stay of stubdom kexec needs changing to work some other way.
> 
> Too bad.
> 
> I wanted to remove the allocate callback as it will conflict with the
> allocations of memory outside the initial default mapping.
> 
> Just to make sure I understand this correctly:
> 
> stubdom is used in this context to support grub running as a pv domain
> capable to start another pv domain.

Correct.

> So as long as stubdom doesn't support mapping the p2m list outside the
> default mapping it makes no sense to support this feature for any domain
> started via stubdom/grub (the main reason to use this feature is the
> support of huge memory causing the p2m list to exceed the available
> virtual address space of the default mapping).

By "default mapping" you mean the mapping established by the domain builder
which made the domain, as distinct from any mapping which the guest kernel
might establish by itself later, right?

I'm not sure that there is any link between the stubdomain's own p2m and
the default mapping of that and the p2m which it is building for use of the
domain it is going to kexec and the default mappings of that p2m-to-be from
the PoV of the to-be-kexec'd guest kernel.

I'm not sure how kexec operates in this regard.

> So the easy solution would be to not support initrd and p2m outside the
> default mapping when the allocate callback is set. Do you think this
> solution is okay?

Irrespective of the above just not supporting this mode would be one way to
avoid the issue. It would make domains using pvgrub1 have different
limitations than domains built directly. IOW users will have to trade off
the security benefits of pvgrub vs the size of the domain they wish to
build, which is a shame.

On the other hand pvgrub2 now exists as a separate (out of tree, it's in
grub.git) thing anyway which doesn't use this domain builder at all AFAIK
and that's the one I expect people are using going forward anyway.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/5] libxc: create unmapped initrd in domain builder if supported
  2015-10-02 14:46     ` Juergen Gross
@ 2015-10-02 14:56       ` Ian Campbell
  2015-10-02 15:13         ` Juergen Gross
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Campbell @ 2015-10-02 14:56 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, ian.jackson, stefano.stabellini,
	wei.liu2

On Fri, 2015-10-02 at 16:46 +0200, Juergen Gross wrote:
> On 10/02/2015 02:59 PM, Ian Campbell wrote:
> > On Fri, 2015-10-02 at 07:49 +0200, Juergen Gross wrote:
> > > In case the kernel of a new pv-domU indicates it is supporting an
> > > unmapped initrd, don't waste precious virtual space for the initrd,
> > > but allocate only guest physical memory for it.
> > > 
> > > Signed-off-by: Juergen Gross <jgross@suse.com>
> > > ---
> > >   tools/libxc/xc_dom_core.c | 22 ++++++++++++++++++++--
> > >   1 file changed, 20 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
> > > index b510bbd..85b531a 100644
> > > --- a/tools/libxc/xc_dom_core.c
> > > +++ b/tools/libxc/xc_dom_core.c
> > > @@ -1019,8 +1019,9 @@ int xc_dom_build_image(struct xc_dom_image
> > > *dom)
> > >       if ( dom->kernel_loader->loader(dom) != 0 )
> > >           goto err;
> > > 
> > > -    /* load ramdisk */
> > > -    if ( dom->ramdisk_blob )
> > > +    /* Load ramdisk if initial mapping required. */
> > > +    if ( dom->ramdisk_blob &&
> > > +         (!dom->parms.mod_start_pfn || dom->ramdisk_seg.vstart) )
> > >       {
> > >           if ( xc_dom_build_ramdisk(dom) != 0 )
> > >               goto err;
> > > @@ -1063,6 +1064,23 @@ int xc_dom_build_image(struct xc_dom_image
> > > *dom)
> > >                 __FUNCTION__, dom->virt_alloc_end);
> > >       DOMPRINTF("%-20s: virt_pgtab_end : 0x%" PRIx64 "",
> > >                 __FUNCTION__, dom->virt_pgtab_end);
> > > +
> > > +    /* Prepare allocating unmapped memory. */
> > > +    if ( dom->virt_pgtab_end )
> > > +        dom->virt_alloc_end = dom->virt_pgtab_end;
> > > +
> > > +    /* Load ramdisk if no initial mapping required. */
> > > +    if ( dom->ramdisk_blob && !dom->ramdisk_seg.vstart &&
> > > +         dom->parms.mod_start_pfn )
> > > +    {
> > > +        if ( xc_dom_build_ramdisk(dom) != 0 )
> > > +            goto err;
> > > +        dom->flags |= SIF_MOD_START_PFN;
> > > +        dom->ramdisk_seg.vend = dom->ramdisk_seg.vend - dom
> > > ->ramdisk_seg.vstart;
> > > +        dom->ramdisk_seg.vstart = dom->ramdisk_seg.pfn;
> > > +        dom->ramdisk_seg.vend += dom->ramdisk_seg.vstart;
> > 
> > This seems like it is trying to do something clever, like partially
> > reversing something which the xc_dom_alloc_segment call in
> >   xc_dom_build_ramdisk has done.
> 
> It's just changing the boundaries of the initrd to fit the interface
> for it being not mapped (indicated by the SIF_MOD_START_PFN flag).

So something has initialised these fields as if it were mapped and we are
now undoing it because in reality it is not? IOW something has arranged
things such that vstart and vend are invalid before this adjustment.

So whatever is making these allocations and mapping (or not) them should
instead be fixed to just do things correctly from the start.

Am I correct that after this the vstart and vend actually contain physical
addresses? Does anything use them that way, and how does it know if they
are physical or virtual?

Perhaps the right answer is to add pstart and pend and to initialise those
correctly (for everything) while leaving the v* ones set to some sentinel
value to indicate when things are unmapped?


> > It looks like the vend handling in particular is just a complicated way
> > of
> > subtracting vstart and adding pfn, with the aim of rebasing from virt
> > to
> > phys world.
> 
> Hmm, not more complicated as:
> 
> len = dom->ramdisk_seg.vend - dom->ramdisk_seg.vstart;
> dom->ramdisk_seg.vstart = dom->ramdisk_seg.pfn;
> dom->ramdisk_seg.vend = dom->ramdisk_seg.pfn + len;
> 
> I have to admit that above variant might be easier to understand. :-)

It is, much easier.


> > Plus vstart/end are addresses, while presumably pfn is a page number,
> > so
> > I'm confused about that as well.
> 
> The naming is irritating, yes.

Are you saying that pfn is actually a physical address?

Ian.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/5] libxc: remove allocate member from struct xc_dom_image
  2015-10-02 14:47       ` Ian Campbell
@ 2015-10-02 15:00         ` Juergen Gross
  0 siblings, 0 replies; 25+ messages in thread
From: Juergen Gross @ 2015-10-02 15:00 UTC (permalink / raw)
  To: Ian Campbell, xen-devel, ian.jackson, stefano.stabellini,
	wei.liu2

On 10/02/2015 04:47 PM, Ian Campbell wrote:
> On Fri, 2015-10-02 at 16:25 +0200, Juergen Gross wrote:
>> On 10/02/2015 03:01 PM, Ian Campbell wrote:
>>> On Fri, 2015-10-02 at 07:49 +0200, Juergen Gross wrote:
>>>> The allocate() callback in struct xc_dom_image is never set. Remove
>>>> it.
>>>>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
>>>
>>> This breaks the stubdom build:
>>>
>>> kexec.c: In function ‘kexec’:
>>> kexec.c:221:78: warning: taking address of expression of type ‘void’
>>>        xen_pfn_t boot_page_mfn = virt_to_mfn(&_boot_page);
>>>
>>>          ^
>>> kexec.c:230:8: error: ‘struct xc_dom_image’ has no member named
>>> ‘allocate’
>>>        dom->allocate = kexec_allocate;
>>>           ^
>>> kexec.c:318:60: warning: taking address of expression of type ‘void’
>>>                virt_to_mfn(&_boot_page));
>>>                                                               ^
>>> Makefile:79: recipe for target '/local/scratch/ianc/devel/committer
>>> -amd64.git/stubdom/grub-x86_64/kexec.o' failed
>>>
>>> On i386 too.
>>>
>>> And in fact that hook looks useful in that context, so either it needs
>>> to
>>> stay of stubdom kexec needs changing to work some other way.
>>
>> Too bad.
>>
>> I wanted to remove the allocate callback as it will conflict with the
>> allocations of memory outside the initial default mapping.
>>
>> Just to make sure I understand this correctly:
>>
>> stubdom is used in this context to support grub running as a pv domain
>> capable to start another pv domain.
>
> Correct.
>
>> So as long as stubdom doesn't support mapping the p2m list outside the
>> default mapping it makes no sense to support this feature for any domain
>> started via stubdom/grub (the main reason to use this feature is the
>> support of huge memory causing the p2m list to exceed the available
>> virtual address space of the default mapping).
>
> By "default mapping" you mean the mapping established by the domain builder
> which made the domain, as distinct from any mapping which the guest kernel
> might establish by itself later, right?

Yes.

> I'm not sure that there is any link between the stubdomain's own p2m and
> the default mapping of that and the p2m which it is building for use of the
> domain it is going to kexec and the default mappings of that p2m-to-be from
> the PoV of the to-be-kexec'd guest kernel.

I just checked it again. Initially I thought stubdom would have the same
limitations as Linux regarding the p2m size. But this is not true, as
stubdom's virt_base is 0, while Linux is using ffffffff80000000.

> I'm not sure how kexec operates in this regard.
>
>> So the easy solution would be to not support initrd and p2m outside the
>> default mapping when the allocate callback is set. Do you think this
>> solution is okay?
>
> Irrespective of the above just not supporting this mode would be one way to
> avoid the issue. It would make domains using pvgrub1 have different
> limitations than domains built directly. IOW users will have to trade off
> the security benefits of pvgrub vs the size of the domain they wish to
> build, which is a shame.

Hmm, maybe it's possible to add the support to stubdom. With the limit
of the p2m size not hitting stubdom it might be rather easy. I'll try
it.

> On the other hand pvgrub2 now exists as a separate (out of tree, it's in
> grub.git) thing anyway which doesn't use this domain builder at all AFAIK
> and that's the one I expect people are using going forward anyway.

Hmm, another place where huge domain support is to be added, I guess.


Juergen


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/5] libxc: create unmapped initrd in domain builder if supported
  2015-10-02 14:56       ` Ian Campbell
@ 2015-10-02 15:13         ` Juergen Gross
  2015-10-02 15:21           ` Ian Campbell
  0 siblings, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2015-10-02 15:13 UTC (permalink / raw)
  To: Ian Campbell, xen-devel, ian.jackson, stefano.stabellini,
	wei.liu2

On 10/02/2015 04:56 PM, Ian Campbell wrote:
> On Fri, 2015-10-02 at 16:46 +0200, Juergen Gross wrote:
>> On 10/02/2015 02:59 PM, Ian Campbell wrote:
>>> On Fri, 2015-10-02 at 07:49 +0200, Juergen Gross wrote:
>>>> In case the kernel of a new pv-domU indicates it is supporting an
>>>> unmapped initrd, don't waste precious virtual space for the initrd,
>>>> but allocate only guest physical memory for it.
>>>>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>> ---
>>>>    tools/libxc/xc_dom_core.c | 22 ++++++++++++++++++++--
>>>>    1 file changed, 20 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
>>>> index b510bbd..85b531a 100644
>>>> --- a/tools/libxc/xc_dom_core.c
>>>> +++ b/tools/libxc/xc_dom_core.c
>>>> @@ -1019,8 +1019,9 @@ int xc_dom_build_image(struct xc_dom_image
>>>> *dom)
>>>>        if ( dom->kernel_loader->loader(dom) != 0 )
>>>>            goto err;
>>>>
>>>> -    /* load ramdisk */
>>>> -    if ( dom->ramdisk_blob )
>>>> +    /* Load ramdisk if initial mapping required. */
>>>> +    if ( dom->ramdisk_blob &&
>>>> +         (!dom->parms.mod_start_pfn || dom->ramdisk_seg.vstart) )
>>>>        {
>>>>            if ( xc_dom_build_ramdisk(dom) != 0 )
>>>>                goto err;
>>>> @@ -1063,6 +1064,23 @@ int xc_dom_build_image(struct xc_dom_image
>>>> *dom)
>>>>                  __FUNCTION__, dom->virt_alloc_end);
>>>>        DOMPRINTF("%-20s: virt_pgtab_end : 0x%" PRIx64 "",
>>>>                  __FUNCTION__, dom->virt_pgtab_end);
>>>> +
>>>> +    /* Prepare allocating unmapped memory. */
>>>> +    if ( dom->virt_pgtab_end )
>>>> +        dom->virt_alloc_end = dom->virt_pgtab_end;
>>>> +
>>>> +    /* Load ramdisk if no initial mapping required. */
>>>> +    if ( dom->ramdisk_blob && !dom->ramdisk_seg.vstart &&
>>>> +         dom->parms.mod_start_pfn )
>>>> +    {
>>>> +        if ( xc_dom_build_ramdisk(dom) != 0 )
>>>> +            goto err;
>>>> +        dom->flags |= SIF_MOD_START_PFN;
>>>> +        dom->ramdisk_seg.vend = dom->ramdisk_seg.vend - dom
>>>> ->ramdisk_seg.vstart;
>>>> +        dom->ramdisk_seg.vstart = dom->ramdisk_seg.pfn;
>>>> +        dom->ramdisk_seg.vend += dom->ramdisk_seg.vstart;
>>>
>>> This seems like it is trying to do something clever, like partially
>>> reversing something which the xc_dom_alloc_segment call in
>>>    xc_dom_build_ramdisk has done.
>>
>> It's just changing the boundaries of the initrd to fit the interface
>> for it being not mapped (indicated by the SIF_MOD_START_PFN flag).
>
> So something has initialised these fields as if it were mapped and we are
> now undoing it because in reality it is not? IOW something has arranged
> things such that vstart and vend are invalid before this adjustment.
>
> So whatever is making these allocations and mapping (or not) them should
> instead be fixed to just do things correctly from the start.
>
> Am I correct that after this the vstart and vend actually contain physical
> addresses? Does anything use them that way, and how does it know if they
> are physical or virtual?
>
> Perhaps the right answer is to add pstart and pend and to initialise those
> correctly (for everything) while leaving the v* ones set to some sentinel
> value to indicate when things are unmapped?

I want to do something like this, yes.

>>> It looks like the vend handling in particular is just a complicated way
>>> of
>>> subtracting vstart and adding pfn, with the aim of rebasing from virt
>>> to
>>> phys world.
>>
>> Hmm, not more complicated as:
>>
>> len = dom->ramdisk_seg.vend - dom->ramdisk_seg.vstart;
>> dom->ramdisk_seg.vstart = dom->ramdisk_seg.pfn;
>> dom->ramdisk_seg.vend = dom->ramdisk_seg.pfn + len;
>>
>> I have to admit that above variant might be easier to understand. :-)
>
> It is, much easier.

:-)

>>> Plus vstart/end are addresses, while presumably pfn is a page number,
>>> so
>>> I'm confused about that as well.
>>
>> The naming is irritating, yes.
>
> Are you saying that pfn is actually a physical address?

It's a page number.

The start_info page data regarding the initrd is taken from
dom->ramdisk_seg. The length is computed from vend - vstart and the
start of the initrd is either a virtual address or a pfn.

I guess it would be a good idea to add something like mod_start and
mod_len to struct xc_dom_image and init those according to the
allocation (virtual or physical). start_info can then be filled from
those new members.


Juergen

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/5] libxc: create unmapped initrd in domain builder if supported
  2015-10-02 15:13         ` Juergen Gross
@ 2015-10-02 15:21           ` Ian Campbell
  2015-10-02 16:28             ` Juergen Gross
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Campbell @ 2015-10-02 15:21 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, ian.jackson, stefano.stabellini,
	wei.liu2

On Fri, 2015-10-02 at 17:13 +0200, Juergen Gross wrote:
> On 10/02/2015 04:56 PM, Ian Campbell wrote:
> > On Fri, 2015-10-02 at 16:46 +0200, Juergen Gross wrote:
> > > On 10/02/2015 02:59 PM, Ian Campbell wrote:
> > > > On Fri, 2015-10-02 at 07:49 +0200, Juergen Gross wrote:
> > > > > In case the kernel of a new pv-domU indicates it is supporting an
> > > > > unmapped initrd, don't waste precious virtual space for the
> > > > > initrd,
> > > > > but allocate only guest physical memory for it.
> > > > > 
> > > > > Signed-off-by: Juergen Gross <jgross@suse.com>
> > > > > ---
> > > > >    tools/libxc/xc_dom_core.c | 22 ++++++++++++++++++++--
> > > > >    1 file changed, 20 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/tools/libxc/xc_dom_core.c
> > > > > b/tools/libxc/xc_dom_core.c
> > > > > index b510bbd..85b531a 100644
> > > > > --- a/tools/libxc/xc_dom_core.c
> > > > > +++ b/tools/libxc/xc_dom_core.c
> > > > > @@ -1019,8 +1019,9 @@ int xc_dom_build_image(struct xc_dom_image
> > > > > *dom)
> > > > >        if ( dom->kernel_loader->loader(dom) != 0 )
> > > > >            goto err;
> > > > > 
> > > > > -    /* load ramdisk */
> > > > > -    if ( dom->ramdisk_blob )
> > > > > +    /* Load ramdisk if initial mapping required. */
> > > > > +    if ( dom->ramdisk_blob &&
> > > > > +         (!dom->parms.mod_start_pfn || dom->ramdisk_seg.vstart)
> > > > > )
> > > > >        {
> > > > >            if ( xc_dom_build_ramdisk(dom) != 0 )
> > > > >                goto err;
> > > > > @@ -1063,6 +1064,23 @@ int xc_dom_build_image(struct xc_dom_image
> > > > > *dom)
> > > > >                  __FUNCTION__, dom->virt_alloc_end);
> > > > >        DOMPRINTF("%-20s: virt_pgtab_end : 0x%" PRIx64 "",
> > > > >                  __FUNCTION__, dom->virt_pgtab_end);
> > > > > +
> > > > > +    /* Prepare allocating unmapped memory. */
> > > > > +    if ( dom->virt_pgtab_end )
> > > > > +        dom->virt_alloc_end = dom->virt_pgtab_end;
> > > > > +
> > > > > +    /* Load ramdisk if no initial mapping required. */
> > > > > +    if ( dom->ramdisk_blob && !dom->ramdisk_seg.vstart &&
> > > > > +         dom->parms.mod_start_pfn )
> > > > > +    {
> > > > > +        if ( xc_dom_build_ramdisk(dom) != 0 )
> > > > > +            goto err;
> > > > > +        dom->flags |= SIF_MOD_START_PFN;
> > > > > +        dom->ramdisk_seg.vend = dom->ramdisk_seg.vend - dom
> > > > > ->ramdisk_seg.vstart;
> > > > > +        dom->ramdisk_seg.vstart = dom->ramdisk_seg.pfn;
> > > > > +        dom->ramdisk_seg.vend += dom->ramdisk_seg.vstart;
> > > > 
> > > > This seems like it is trying to do something clever, like partially
> > > > reversing something which the xc_dom_alloc_segment call in
> > > >    xc_dom_build_ramdisk has done.
> > > 
> > > It's just changing the boundaries of the initrd to fit the interface
> > > for it being not mapped (indicated by the SIF_MOD_START_PFN flag).
> > 
> > So something has initialised these fields as if it were mapped and we
> > are
> > now undoing it because in reality it is not? IOW something has arranged
> > things such that vstart and vend are invalid before this adjustment.
> > 
> > So whatever is making these allocations and mapping (or not) them
> > should
> > instead be fixed to just do things correctly from the start.
> > 
> > Am I correct that after this the vstart and vend actually contain
> > physical
> > addresses? Does anything use them that way, and how does it know if
> > they
> > are physical or virtual?
> > 
> > Perhaps the right answer is to add pstart and pend and to initialise
> > those
> > correctly (for everything) while leaving the v* ones set to some
> > sentinel
> > value to indicate when things are unmapped?
> 
> I want to do something like this, yes.
> 
> > > > It looks like the vend handling in particular is just a complicated
> > > > way
> > > > of
> > > > subtracting vstart and adding pfn, with the aim of rebasing from
> > > > virt
> > > > to
> > > > phys world.
> > > 
> > > Hmm, not more complicated as:
> > > 
> > > len = dom->ramdisk_seg.vend - dom->ramdisk_seg.vstart;
> > > dom->ramdisk_seg.vstart = dom->ramdisk_seg.pfn;
> > > dom->ramdisk_seg.vend = dom->ramdisk_seg.pfn + len;
> > > 
> > > I have to admit that above variant might be easier to understand. :-)
> > 
> > It is, much easier.
> 
> :-)
> 
> > > > Plus vstart/end are addresses, while presumably pfn is a page
> > > > number,
> > > > so
> > > > I'm confused about that as well.
> > > 
> > > The naming is irritating, yes.
> > 
> > Are you saying that pfn is actually a physical address?
> 
> It's a page number.
> 
> The start_info page data regarding the initrd is taken from
> dom->ramdisk_seg. The length is computed from vend - vstart and the
> start of the initrd is either a virtual address or a pfn.

So are vend and vstart frame numbers then?

They had better be, because otherwise the length you are computing is in
bytes, so adding it to a frame number in vstart makes no sense.

Ian.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/5] libxc: create unmapped initrd in domain builder if supported
  2015-10-02 15:21           ` Ian Campbell
@ 2015-10-02 16:28             ` Juergen Gross
  0 siblings, 0 replies; 25+ messages in thread
From: Juergen Gross @ 2015-10-02 16:28 UTC (permalink / raw)
  To: Ian Campbell, xen-devel, ian.jackson, stefano.stabellini,
	wei.liu2

On 10/02/2015 05:21 PM, Ian Campbell wrote:
> On Fri, 2015-10-02 at 17:13 +0200, Juergen Gross wrote:
>> On 10/02/2015 04:56 PM, Ian Campbell wrote:
>>> On Fri, 2015-10-02 at 16:46 +0200, Juergen Gross wrote:
>>>> On 10/02/2015 02:59 PM, Ian Campbell wrote:
>>>>> On Fri, 2015-10-02 at 07:49 +0200, Juergen Gross wrote:
>>>>>> In case the kernel of a new pv-domU indicates it is supporting an
>>>>>> unmapped initrd, don't waste precious virtual space for the
>>>>>> initrd,
>>>>>> but allocate only guest physical memory for it.
>>>>>>
>>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>>> ---
>>>>>>     tools/libxc/xc_dom_core.c | 22 ++++++++++++++++++++--
>>>>>>     1 file changed, 20 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/tools/libxc/xc_dom_core.c
>>>>>> b/tools/libxc/xc_dom_core.c
>>>>>> index b510bbd..85b531a 100644
>>>>>> --- a/tools/libxc/xc_dom_core.c
>>>>>> +++ b/tools/libxc/xc_dom_core.c
>>>>>> @@ -1019,8 +1019,9 @@ int xc_dom_build_image(struct xc_dom_image
>>>>>> *dom)
>>>>>>         if ( dom->kernel_loader->loader(dom) != 0 )
>>>>>>             goto err;
>>>>>>
>>>>>> -    /* load ramdisk */
>>>>>> -    if ( dom->ramdisk_blob )
>>>>>> +    /* Load ramdisk if initial mapping required. */
>>>>>> +    if ( dom->ramdisk_blob &&
>>>>>> +         (!dom->parms.mod_start_pfn || dom->ramdisk_seg.vstart)
>>>>>> )
>>>>>>         {
>>>>>>             if ( xc_dom_build_ramdisk(dom) != 0 )
>>>>>>                 goto err;
>>>>>> @@ -1063,6 +1064,23 @@ int xc_dom_build_image(struct xc_dom_image
>>>>>> *dom)
>>>>>>                   __FUNCTION__, dom->virt_alloc_end);
>>>>>>         DOMPRINTF("%-20s: virt_pgtab_end : 0x%" PRIx64 "",
>>>>>>                   __FUNCTION__, dom->virt_pgtab_end);
>>>>>> +
>>>>>> +    /* Prepare allocating unmapped memory. */
>>>>>> +    if ( dom->virt_pgtab_end )
>>>>>> +        dom->virt_alloc_end = dom->virt_pgtab_end;
>>>>>> +
>>>>>> +    /* Load ramdisk if no initial mapping required. */
>>>>>> +    if ( dom->ramdisk_blob && !dom->ramdisk_seg.vstart &&
>>>>>> +         dom->parms.mod_start_pfn )
>>>>>> +    {
>>>>>> +        if ( xc_dom_build_ramdisk(dom) != 0 )
>>>>>> +            goto err;
>>>>>> +        dom->flags |= SIF_MOD_START_PFN;
>>>>>> +        dom->ramdisk_seg.vend = dom->ramdisk_seg.vend - dom
>>>>>> ->ramdisk_seg.vstart;
>>>>>> +        dom->ramdisk_seg.vstart = dom->ramdisk_seg.pfn;
>>>>>> +        dom->ramdisk_seg.vend += dom->ramdisk_seg.vstart;
>>>>>
>>>>> This seems like it is trying to do something clever, like partially
>>>>> reversing something which the xc_dom_alloc_segment call in
>>>>>     xc_dom_build_ramdisk has done.
>>>>
>>>> It's just changing the boundaries of the initrd to fit the interface
>>>> for it being not mapped (indicated by the SIF_MOD_START_PFN flag).
>>>
>>> So something has initialised these fields as if it were mapped and we
>>> are
>>> now undoing it because in reality it is not? IOW something has arranged
>>> things such that vstart and vend are invalid before this adjustment.
>>>
>>> So whatever is making these allocations and mapping (or not) them
>>> should
>>> instead be fixed to just do things correctly from the start.
>>>
>>> Am I correct that after this the vstart and vend actually contain
>>> physical
>>> addresses? Does anything use them that way, and how does it know if
>>> they
>>> are physical or virtual?
>>>
>>> Perhaps the right answer is to add pstart and pend and to initialise
>>> those
>>> correctly (for everything) while leaving the v* ones set to some
>>> sentinel
>>> value to indicate when things are unmapped?
>>
>> I want to do something like this, yes.
>>
>>>>> It looks like the vend handling in particular is just a complicated
>>>>> way
>>>>> of
>>>>> subtracting vstart and adding pfn, with the aim of rebasing from
>>>>> virt
>>>>> to
>>>>> phys world.
>>>>
>>>> Hmm, not more complicated as:
>>>>
>>>> len = dom->ramdisk_seg.vend - dom->ramdisk_seg.vstart;
>>>> dom->ramdisk_seg.vstart = dom->ramdisk_seg.pfn;
>>>> dom->ramdisk_seg.vend = dom->ramdisk_seg.pfn + len;
>>>>
>>>> I have to admit that above variant might be easier to understand. :-)
>>>
>>> It is, much easier.
>>
>> :-)
>>
>>>>> Plus vstart/end are addresses, while presumably pfn is a page
>>>>> number,
>>>>> so
>>>>> I'm confused about that as well.
>>>>
>>>> The naming is irritating, yes.
>>>
>>> Are you saying that pfn is actually a physical address?
>>
>> It's a page number.
>>
>> The start_info page data regarding the initrd is taken from
>> dom->ramdisk_seg. The length is computed from vend - vstart and the
>> start of the initrd is either a virtual address or a pfn.
>
> So are vend and vstart frame numbers then?
>
> They had better be, because otherwise the length you are computing is in
> bytes, so adding it to a frame number in vstart makes no sense.

The length is always in bytes. The start is either a virtual address or
a frame number. vend is making only sense if you subtract vstart.

I think we can stop discussing this now, as the next version of the
series should be much clearer in this regard.


Juergen

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2015-10-02 16:28 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-02  5:49 [PATCH v2 0/5] libxc: support building large pv-domains Juergen Gross
2015-10-02  5:49 ` [PATCH v2 1/5] libxc: remove allocate member from struct xc_dom_image Juergen Gross
2015-10-02 13:01   ` Ian Campbell
2015-10-02 14:25     ` Juergen Gross
2015-10-02 14:47       ` Ian Campbell
2015-10-02 15:00         ` Juergen Gross
2015-10-02  5:49 ` [PATCH v2 2/5] xen: add generic flag to elf_dom_parms indicating support of unmapped initrd Juergen Gross
2015-10-02  9:37   ` Andrew Cooper
2015-10-02  9:41     ` Jan Beulich
2015-10-02  9:44     ` Juergen Gross
2015-10-02  9:53       ` Andrew Cooper
2015-10-02 10:01         ` Juergen Gross
2015-10-02 10:22           ` Ian Campbell
2015-10-02  5:49 ` [PATCH v2 3/5] libxc: create unmapped initrd in domain builder if supported Juergen Gross
2015-10-02 12:59   ` Ian Campbell
2015-10-02 14:46     ` Juergen Gross
2015-10-02 14:56       ` Ian Campbell
2015-10-02 15:13         ` Juergen Gross
2015-10-02 15:21           ` Ian Campbell
2015-10-02 16:28             ` Juergen Gross
2015-10-02  5:49 ` [PATCH v2 4/5] libxc: split p2m allocation in domain builder from other magic pages Juergen Gross
2015-10-02  9:29   ` Ian Campbell
2015-10-02  5:49 ` [PATCH v2 5/5] libxc: create p2m list outside of kernel mapping if supported Juergen Gross
2015-10-02 13:16   ` Ian Campbell
2015-10-02 14:37     ` Juergen Gross

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).