* [PATCH 0/4] Fix HVM vNUMA
@ 2015-05-18 15:34 Wei Liu
2015-05-18 15:34 ` [PATCH 1/4] libxc/libxl: fill xc_hvm_build_args in libxl Wei Liu
` (5 more replies)
0 siblings, 6 replies; 15+ messages in thread
From: Wei Liu @ 2015-05-18 15:34 UTC (permalink / raw)
To: xen-devel; +Cc: boris.ostrovsky, Wei Liu
Boris discovered that HVM vNUMA didn't actually work. This patch series fixes
that.
The first patch is a prerequisite patch for the actual fixes. The second patch
is to help debugging. The fixes are in the last two patches, which can be
squashed into one if necessary.
Wei.
Wei Liu (4):
libxc/libxl: fill xc_hvm_build_args in libxl
libxc: print more error messages when failed
libxc: rework vnuma bits in setup_guest
libxl: fix HVM vNUMA
tools/libxc/xc_hvm_build_x86.c | 129 ++++++++++++++++++++++++++---------------
tools/libxl/libxl_dom.c | 48 ++++++++-------
tools/libxl/libxl_vnuma.c | 15 ++++-
3 files changed, 122 insertions(+), 70 deletions(-)
--
1.9.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/4] libxc/libxl: fill xc_hvm_build_args in libxl
2015-05-18 15:34 [PATCH 0/4] Fix HVM vNUMA Wei Liu
@ 2015-05-18 15:34 ` Wei Liu
2015-05-21 0:40 ` Chen, Tiejun
2015-05-18 15:34 ` [PATCH 2/4] libxc: print more error messages when failed Wei Liu
` (4 subsequent siblings)
5 siblings, 1 reply; 15+ messages in thread
From: Wei Liu @ 2015-05-18 15:34 UTC (permalink / raw)
To: xen-devel
Cc: Ian Jackson, boris.ostrovsky, Wei Liu, Ian Campbell, Chen, Tiejun
When building HVM guests, originally some fields of xc_hvm_build_args
are filled in xc_hvm_build (and buried in the wrong function), some are
set in libxl__build_hvm before passing xc_hvm_build_args to
xc_hvm_build. This is fragile.
After examining the code in xc_hvm_build that sets those fields, we can
in fact move setting of mmio_start etc in libxl. This way we consolidate
memory layout setting in libxl.
The setting of firmware data related fields is left in xc_hvm_build
because it depends on parsing ELF image. Those fields only point to
scratch data that doesn't affect memory layout.
There should be no change in the generated guest memory layout.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
---
Cc: "Chen, Tiejun" <tiejun.chen@intel.com>
This might affect your RMRR patch series.
I once said xc_hvm_build would touch various xc_hvm_build_args fields
that would affect guest memory layout. It won't be that case anymore
with this patch.
---
tools/libxc/xc_hvm_build_x86.c | 37 +++++++------------------------------
tools/libxl/libxl_dom.c | 16 ++++++++++++++++
2 files changed, 23 insertions(+), 30 deletions(-)
diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index e45ae4a..92422bf 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -88,22 +88,14 @@ static int modules_init(struct xc_hvm_build_args *args,
return 0;
}
-static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
- uint64_t mmio_start, uint64_t mmio_size,
+static void build_hvm_info(void *hvm_info_page,
struct xc_hvm_build_args *args)
{
struct hvm_info_table *hvm_info = (struct hvm_info_table *)
(((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET);
- uint64_t lowmem_end = mem_size, highmem_end = 0;
uint8_t sum;
int i;
- if ( lowmem_end > mmio_start )
- {
- highmem_end = (1ull<<32) + (lowmem_end - mmio_start);
- lowmem_end = mmio_start;
- }
-
memset(hvm_info_page, 0, PAGE_SIZE);
/* Fill in the header. */
@@ -116,14 +108,10 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
memset(hvm_info->vcpu_online, 0xff, sizeof(hvm_info->vcpu_online));
/* Memory parameters. */
- hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
- hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
+ hvm_info->low_mem_pgend = args->lowmem_end >> PAGE_SHIFT;
+ hvm_info->high_mem_pgend = args->highmem_end >> PAGE_SHIFT;
hvm_info->reserved_mem_pgstart = ioreq_server_pfn(0);
- args->lowmem_end = lowmem_end;
- args->highmem_end = highmem_end;
- args->mmio_start = mmio_start;
-
/* Finish with the checksum. */
for ( i = 0, sum = 0; i < hvm_info->length; i++ )
sum += ((uint8_t *)hvm_info)[i];
@@ -251,8 +239,6 @@ static int setup_guest(xc_interface *xch,
xen_pfn_t *page_array = NULL;
unsigned long i, vmemid, nr_pages = args->mem_size >> PAGE_SHIFT;
unsigned long target_pages = args->mem_target >> PAGE_SHIFT;
- uint64_t mmio_start = (1ull << 32) - args->mmio_size;
- uint64_t mmio_size = args->mmio_size;
unsigned long entry_eip, cur_pages, cur_pfn;
void *hvm_info_page;
uint32_t *ident_pt;
@@ -344,8 +330,8 @@ static int setup_guest(xc_interface *xch,
for ( i = 0; i < nr_pages; i++ )
page_array[i] = i;
- for ( i = mmio_start >> PAGE_SHIFT; i < nr_pages; i++ )
- page_array[i] += mmio_size >> PAGE_SHIFT;
+ for ( i = args->mmio_start >> PAGE_SHIFT; i < nr_pages; i++ )
+ page_array[i] += args->mmio_size >> PAGE_SHIFT;
/*
* Try to claim pages for early warning of insufficient memory available.
@@ -446,7 +432,7 @@ static int setup_guest(xc_interface *xch,
* range */
!check_mmio_hole(cur_pfn << PAGE_SHIFT,
SUPERPAGE_1GB_NR_PFNS << PAGE_SHIFT,
- mmio_start, mmio_size) )
+ args->mmio_start, args->mmio_size) )
{
long done;
unsigned long nr_extents = count >> SUPERPAGE_1GB_SHIFT;
@@ -545,7 +531,7 @@ static int setup_guest(xc_interface *xch,
xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
HVM_INFO_PFN)) == NULL )
goto error_out;
- build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size, args);
+ build_hvm_info(hvm_info_page, args);
munmap(hvm_info_page, PAGE_SIZE);
/* Allocate and clear special pages. */
@@ -661,12 +647,6 @@ int xc_hvm_build(xc_interface *xch, uint32_t domid,
if ( args.image_file_name == NULL )
return -1;
- if ( args.mem_target == 0 )
- args.mem_target = args.mem_size;
-
- if ( args.mmio_size == 0 )
- args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
-
/* An HVM guest must be initialised with at least 2MB memory. */
if ( args.mem_size < (2ull << 20) || args.mem_target < (2ull << 20) )
return -1;
@@ -684,9 +664,6 @@ int xc_hvm_build(xc_interface *xch, uint32_t domid,
args.acpi_module.guest_addr_out;
hvm_args->smbios_module.guest_addr_out =
args.smbios_module.guest_addr_out;
- hvm_args->lowmem_end = args.lowmem_end;
- hvm_args->highmem_end = args.highmem_end;
- hvm_args->mmio_start = args.mmio_start;
}
free(image);
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index f408646..608e574 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -920,6 +920,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
libxl_ctx *ctx = libxl__gc_owner(gc);
struct xc_hvm_build_args args = {};
int ret, rc = ERROR_FAIL;
+ uint64_t mmio_start, lowmem_end, highmem_end;
memset(&args, 0, sizeof(struct xc_hvm_build_args));
/* The params from the configuration file are in Mb, which are then
@@ -941,6 +942,21 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
LOG(ERROR, "initializing domain firmware failed");
goto out;
}
+ if (args.mem_target == 0)
+ args.mem_target = args.mem_size;
+ if (args.mmio_size == 0)
+ args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
+ lowmem_end = args.mem_size;
+ highmem_end = 0;
+ mmio_start = (1ull << 32) - args.mmio_size;
+ if (lowmem_end > mmio_start)
+ {
+ highmem_end = (1ull << 32) + (lowmem_end - mmio_start);
+ lowmem_end = mmio_start;
+ }
+ args.lowmem_end = lowmem_end;
+ args.highmem_end = highmem_end;
+ args.mmio_start = mmio_start;
if (info->num_vnuma_nodes != 0) {
int i;
--
1.9.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/4] libxc: print more error messages when failed
2015-05-18 15:34 [PATCH 0/4] Fix HVM vNUMA Wei Liu
2015-05-18 15:34 ` [PATCH 1/4] libxc/libxl: fill xc_hvm_build_args in libxl Wei Liu
@ 2015-05-18 15:34 ` Wei Liu
2015-05-18 15:34 ` [PATCH 3/4] libxc: rework vnuma bits in setup_guest Wei Liu
` (3 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Wei Liu @ 2015-05-18 15:34 UTC (permalink / raw)
To: xen-devel; +Cc: Ian Jackson, boris.ostrovsky, Wei Liu, Ian Campbell
No functional changes introduced.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
---
tools/libxc/xc_hvm_build_x86.c | 30 +++++++++++++++++++++++++++---
1 file changed, 27 insertions(+), 3 deletions(-)
diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index 92422bf..df4b7ed 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -259,7 +259,10 @@ static int setup_guest(xc_interface *xch,
memset(&elf, 0, sizeof(elf));
if ( elf_init(&elf, image, image_size) != 0 )
+ {
+ PERROR("Could not initialise ELF image");
goto error_out;
+ }
xc_elf_set_logfile(xch, &elf, 1);
@@ -522,15 +525,24 @@ static int setup_guest(xc_interface *xch,
DPRINTF(" 1GB PAGES: 0x%016lx\n", stat_1gb_pages);
if ( loadelfimage(xch, &elf, dom, page_array) != 0 )
+ {
+ PERROR("Could not load ELF image");
goto error_out;
+ }
if ( loadmodules(xch, args, m_start, m_end, dom, page_array) != 0 )
- goto error_out;
+ {
+ PERROR("Could not load ACPI modules");
+ goto error_out;
+ }
if ( (hvm_info_page = xc_map_foreign_range(
xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
HVM_INFO_PFN)) == NULL )
+ {
+ PERROR("Could not map hvm info page");
goto error_out;
+ }
build_hvm_info(hvm_info_page, args);
munmap(hvm_info_page, PAGE_SIZE);
@@ -547,7 +559,10 @@ static int setup_guest(xc_interface *xch,
}
if ( xc_clear_domain_pages(xch, dom, special_pfn(0), NR_SPECIAL_PAGES) )
- goto error_out;
+ {
+ PERROR("Could not clear special pages");
+ goto error_out;
+ }
xc_hvm_param_set(xch, dom, HVM_PARAM_STORE_PFN,
special_pfn(SPECIALPAGE_XENSTORE));
@@ -580,7 +595,10 @@ static int setup_guest(xc_interface *xch,
}
if ( xc_clear_domain_pages(xch, dom, ioreq_server_pfn(0), NR_IOREQ_SERVER_PAGES) )
- goto error_out;
+ {
+ PERROR("Could not clear ioreq page");
+ goto error_out;
+ }
/* Tell the domain where the pages are and how many there are */
xc_hvm_param_set(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
@@ -595,7 +613,10 @@ static int setup_guest(xc_interface *xch,
if ( (ident_pt = xc_map_foreign_range(
xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
special_pfn(SPECIALPAGE_IDENT_PT))) == NULL )
+ {
+ PERROR("Could not map special page ident_pt");
goto error_out;
+ }
for ( i = 0; i < PAGE_SIZE / sizeof(*ident_pt); i++ )
ident_pt[i] = ((i << 22) | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
@@ -610,7 +631,10 @@ static int setup_guest(xc_interface *xch,
char *page0 = xc_map_foreign_range(
xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE, 0);
if ( page0 == NULL )
+ {
+ PERROR("Could not map page0");
goto error_out;
+ }
page0[0] = 0xe9;
*(uint32_t *)&page0[1] = entry_eip - 5;
munmap(page0, PAGE_SIZE);
--
1.9.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 3/4] libxc: rework vnuma bits in setup_guest
2015-05-18 15:34 [PATCH 0/4] Fix HVM vNUMA Wei Liu
2015-05-18 15:34 ` [PATCH 1/4] libxc/libxl: fill xc_hvm_build_args in libxl Wei Liu
2015-05-18 15:34 ` [PATCH 2/4] libxc: print more error messages when failed Wei Liu
@ 2015-05-18 15:34 ` Wei Liu
2015-05-19 16:23 ` Boris Ostrovsky
2015-05-18 15:34 ` [PATCH 4/4] libxl: fix HVM vNUMA Wei Liu
` (2 subsequent siblings)
5 siblings, 1 reply; 15+ messages in thread
From: Wei Liu @ 2015-05-18 15:34 UTC (permalink / raw)
To: xen-devel; +Cc: Ian Jackson, boris.ostrovsky, Wei Liu, Ian Campbell
Make the setup process similar to PV counterpart. That is, to allocate a
P2M array that covers the whole memory range and start from there. This
is clearer than using an array with no holes in it.
Also the dummy layout should take MMIO hole into consideration. We might
end up having two vmemranges in the dummy layout.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
---
tools/libxc/xc_hvm_build_x86.c | 66 ++++++++++++++++++++++++++++++++----------
1 file changed, 50 insertions(+), 16 deletions(-)
diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index df4b7ed..77678f1 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -238,6 +238,7 @@ static int setup_guest(xc_interface *xch,
{
xen_pfn_t *page_array = NULL;
unsigned long i, vmemid, nr_pages = args->mem_size >> PAGE_SHIFT;
+ unsigned long p2m_size;
unsigned long target_pages = args->mem_target >> PAGE_SHIFT;
unsigned long entry_eip, cur_pages, cur_pfn;
void *hvm_info_page;
@@ -254,8 +255,8 @@ static int setup_guest(xc_interface *xch,
xen_pfn_t special_array[NR_SPECIAL_PAGES];
xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES];
uint64_t total_pages;
- xen_vmemrange_t dummy_vmemrange;
- unsigned int dummy_vnode_to_pnode;
+ xen_vmemrange_t dummy_vmemrange[2];
+ unsigned int dummy_vnode_to_pnode[2];
memset(&elf, 0, sizeof(elf));
if ( elf_init(&elf, image, image_size) != 0 )
@@ -275,17 +276,37 @@ static int setup_guest(xc_interface *xch,
if ( args->nr_vmemranges == 0 )
{
- /* Build dummy vnode information */
- dummy_vmemrange.start = 0;
- dummy_vmemrange.end = args->mem_size;
- dummy_vmemrange.flags = 0;
- dummy_vmemrange.nid = 0;
- args->nr_vmemranges = 1;
- args->vmemranges = &dummy_vmemrange;
+ /* Build dummy vnode information
+ *
+ * Guest physical address space layout:
+ * [0, hole_start) [hole_start, 4G) [4G, highmem_end)
+ *
+ * Of course if there is no high memory, the second vmemrange
+ * has no effect on the actual result.
+ */
- dummy_vnode_to_pnode = XC_NUMA_NO_NODE;
+ dummy_vmemrange[0].start = 0;
+ dummy_vmemrange[0].end = args->lowmem_end;
+ dummy_vmemrange[0].flags = 0;
+ dummy_vmemrange[0].nid = 0;
+ dummy_vnode_to_pnode[0] = XC_NUMA_NO_NODE;
+ args->nr_vmemranges = 1;
args->nr_vnodes = 1;
- args->vnode_to_pnode = &dummy_vnode_to_pnode;
+
+ if ( args->highmem_end > (1ULL << 32) )
+ {
+ dummy_vmemrange[1].start = 1ULL << 32;
+ dummy_vmemrange[1].end = args->highmem_end;
+ dummy_vmemrange[1].flags = 0;
+ dummy_vmemrange[1].nid = 0;
+ dummy_vnode_to_pnode[1] = XC_NUMA_NO_NODE;
+
+ args->nr_vmemranges++;
+ args->nr_vnodes++;
+ }
+
+ args->vmemranges = dummy_vmemrange;
+ args->vnode_to_pnode = dummy_vnode_to_pnode;
}
else
{
@@ -297,9 +318,15 @@ static int setup_guest(xc_interface *xch,
}
total_pages = 0;
+ p2m_size = 0;
for ( i = 0; i < args->nr_vmemranges; i++ )
+ {
total_pages += ((args->vmemranges[i].end - args->vmemranges[i].start)
>> PAGE_SHIFT);
+ p2m_size = p2m_size > (args->vmemranges[i].end >> PAGE_SHIFT) ?
+ p2m_size : (args->vmemranges[i].end >> PAGE_SHIFT);
+ }
+
if ( total_pages != (args->mem_size >> PAGE_SHIFT) )
{
PERROR("vNUMA memory pages mismatch (0x%"PRIx64" != 0x%"PRIx64")",
@@ -325,16 +352,23 @@ static int setup_guest(xc_interface *xch,
DPRINTF(" TOTAL: %016"PRIx64"->%016"PRIx64"\n", v_start, v_end);
DPRINTF(" ENTRY: %016"PRIx64"\n", elf_uval(&elf, elf.ehdr, e_entry));
- if ( (page_array = malloc(nr_pages * sizeof(xen_pfn_t))) == NULL )
+ if ( (page_array = malloc(p2m_size * sizeof(xen_pfn_t))) == NULL )
{
PERROR("Could not allocate memory.");
goto error_out;
}
- for ( i = 0; i < nr_pages; i++ )
- page_array[i] = i;
- for ( i = args->mmio_start >> PAGE_SHIFT; i < nr_pages; i++ )
- page_array[i] += args->mmio_size >> PAGE_SHIFT;
+ for ( i = 0; i < p2m_size; i++ )
+ page_array[i] = ((xen_pfn_t)-1);
+ for ( vmemid = 0; vmemid < args->nr_vmemranges; vmemid++ )
+ {
+ uint64_t pfn;
+
+ for ( pfn = args->vmemranges[vmemid].start >> PAGE_SHIFT;
+ pfn < args->vmemranges[vmemid].end >> PAGE_SHIFT;
+ pfn++ )
+ page_array[pfn] = pfn;
+ }
/*
* Try to claim pages for early warning of insufficient memory available.
--
1.9.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 4/4] libxl: fix HVM vNUMA
2015-05-18 15:34 [PATCH 0/4] Fix HVM vNUMA Wei Liu
` (2 preceding siblings ...)
2015-05-18 15:34 ` [PATCH 3/4] libxc: rework vnuma bits in setup_guest Wei Liu
@ 2015-05-18 15:34 ` Wei Liu
2015-05-18 20:10 ` [PATCH 0/4] Fix " Wei Liu
2015-05-29 10:46 ` Wei Liu
5 siblings, 0 replies; 15+ messages in thread
From: Wei Liu @ 2015-05-18 15:34 UTC (permalink / raw)
To: xen-devel
Cc: Dario Faggioli, Ian Jackson, boris.ostrovsky, Wei Liu,
Ian Campbell
This patch does two thing:
The original code erroneously fills in xc_hvm_build_args before
generating vmemranges. The effect is that guest memory is populated
without vNUMA information. Move the hunk to right place to fix this.
Move the subtraction of video ram to libxl__vnuma_build_vmemrange_hvm
because it's the central place for generating vmemranges.
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
---
tools/libxl/libxl_dom.c | 32 ++++++++++----------------------
tools/libxl/libxl_vnuma.c | 15 ++++++++++++++-
2 files changed, 24 insertions(+), 23 deletions(-)
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 608e574..e3d1338 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -961,6 +961,16 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
if (info->num_vnuma_nodes != 0) {
int i;
+ ret = libxl__vnuma_build_vmemrange_hvm(gc, domid, info, state, &args);
+ if (ret) {
+ LOGEV(ERROR, ret, "hvm build vmemranges failed");
+ goto out;
+ }
+ ret = libxl__vnuma_config_check(gc, info, state);
+ if (ret) goto out;
+ ret = set_vnuma_info(gc, domid, info, state);
+ if (ret) goto out;
+
args.nr_vmemranges = state->num_vmemranges;
args.vmemranges = libxl__malloc(gc, sizeof(*args.vmemranges) *
args.nr_vmemranges);
@@ -972,17 +982,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
args.vmemranges[i].nid = state->vmemranges[i].nid;
}
- /* Consider video ram belongs to vmemrange 0 -- just shrink it
- * by the size of video ram.
- */
- if (((args.vmemranges[0].end - args.vmemranges[0].start) >> 10)
- < info->video_memkb) {
- LOG(ERROR, "vmemrange 0 too small to contain video ram");
- goto out;
- }
-
- args.vmemranges[0].end -= (info->video_memkb << 10);
-
args.nr_vnodes = info->num_vnuma_nodes;
args.vnode_to_pnode = libxl__malloc(gc, sizeof(*args.vnode_to_pnode) *
args.nr_vnodes);
@@ -996,17 +995,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
goto out;
}
- if (info->num_vnuma_nodes != 0) {
- ret = libxl__vnuma_build_vmemrange_hvm(gc, domid, info, state, &args);
- if (ret) {
- LOGEV(ERROR, ret, "hvm build vmemranges failed");
- goto out;
- }
- ret = libxl__vnuma_config_check(gc, info, state);
- if (ret) goto out;
- ret = set_vnuma_info(gc, domid, info, state);
- if (ret) goto out;
- }
ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
&state->store_mfn, state->console_port,
&state->console_mfn, state->store_domid,
diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_vnuma.c
index cac78d7..56856d2 100644
--- a/tools/libxl/libxl_vnuma.c
+++ b/tools/libxl/libxl_vnuma.c
@@ -257,6 +257,7 @@ int libxl__vnuma_build_vmemrange_hvm(libxl__gc *gc,
uint64_t hole_start, hole_end, next;
int nid, nr_vmemrange;
xen_vmemrange_t *vmemranges;
+ int rc;
/* Derive vmemranges from vnode size and memory hole.
*
@@ -277,6 +278,16 @@ int libxl__vnuma_build_vmemrange_hvm(libxl__gc *gc,
libxl_vnode_info *p = &b_info->vnuma_nodes[nid];
uint64_t remaining_bytes = p->memkb << 10;
+ /* Consider video ram belongs to vnode 0 */
+ if (nid == 0) {
+ if (p->memkb < b_info->video_memkb) {
+ LOG(ERROR, "vnode 0 too small to contain video ram");
+ rc = ERROR_INVAL;
+ goto out;
+ }
+ remaining_bytes -= (b_info->video_memkb << 10);
+ }
+
while (remaining_bytes > 0) {
uint64_t count = remaining_bytes;
@@ -300,7 +311,9 @@ int libxl__vnuma_build_vmemrange_hvm(libxl__gc *gc,
state->vmemranges = vmemranges;
state->num_vmemranges = nr_vmemrange;
- return 0;
+ rc = 0;
+out:
+ return rc;
}
/*
--
1.9.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 0/4] Fix HVM vNUMA
2015-05-18 15:34 [PATCH 0/4] Fix HVM vNUMA Wei Liu
` (3 preceding siblings ...)
2015-05-18 15:34 ` [PATCH 4/4] libxl: fix HVM vNUMA Wei Liu
@ 2015-05-18 20:10 ` Wei Liu
2015-05-29 10:46 ` Wei Liu
5 siblings, 0 replies; 15+ messages in thread
From: Wei Liu @ 2015-05-18 20:10 UTC (permalink / raw)
To: xen-devel; +Cc: boris.ostrovsky, Wei Liu, Dario Faggioli
Dario,
Sorry I forgot to CC you when I sent this. :-/
Wei.
On Mon, May 18, 2015 at 04:34:48PM +0100, Wei Liu wrote:
> Boris discovered that HVM vNUMA didn't actually work. This patch series fixes
> that.
>
> The first patch is a prerequisite patch for the actual fixes. The second patch
> is to help debugging. The fixes are in the last two patches, which can be
> squashed into one if necessary.
>
> Wei.
>
> Wei Liu (4):
> libxc/libxl: fill xc_hvm_build_args in libxl
> libxc: print more error messages when failed
> libxc: rework vnuma bits in setup_guest
> libxl: fix HVM vNUMA
>
> tools/libxc/xc_hvm_build_x86.c | 129 ++++++++++++++++++++++++++---------------
> tools/libxl/libxl_dom.c | 48 ++++++++-------
> tools/libxl/libxl_vnuma.c | 15 ++++-
> 3 files changed, 122 insertions(+), 70 deletions(-)
>
> --
> 1.9.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/4] libxc: rework vnuma bits in setup_guest
2015-05-18 15:34 ` [PATCH 3/4] libxc: rework vnuma bits in setup_guest Wei Liu
@ 2015-05-19 16:23 ` Boris Ostrovsky
2015-05-19 16:31 ` Wei Liu
0 siblings, 1 reply; 15+ messages in thread
From: Boris Ostrovsky @ 2015-05-19 16:23 UTC (permalink / raw)
To: Wei Liu, xen-devel; +Cc: Dario Faggioli, Ian Jackson, Ian Campbell
On 05/18/2015 11:34 AM, Wei Liu wrote:
> Make the setup process similar to PV counterpart. That is, to allocate a
> P2M array that covers the whole memory range and start from there. This
> is clearer than using an array with no holes in it.
>
> Also the dummy layout should take MMIO hole into consideration. We might
> end up having two vmemranges in the dummy layout.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> ---
> tools/libxc/xc_hvm_build_x86.c | 66 ++++++++++++++++++++++++++++++++----------
> 1 file changed, 50 insertions(+), 16 deletions(-)
>
> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index df4b7ed..77678f1 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -238,6 +238,7 @@ static int setup_guest(xc_interface *xch,
> {
> xen_pfn_t *page_array = NULL;
> unsigned long i, vmemid, nr_pages = args->mem_size >> PAGE_SHIFT;
> + unsigned long p2m_size;
> unsigned long target_pages = args->mem_target >> PAGE_SHIFT;
> unsigned long entry_eip, cur_pages, cur_pfn;
> void *hvm_info_page;
> @@ -254,8 +255,8 @@ static int setup_guest(xc_interface *xch,
> xen_pfn_t special_array[NR_SPECIAL_PAGES];
> xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES];
> uint64_t total_pages;
> - xen_vmemrange_t dummy_vmemrange;
> - unsigned int dummy_vnode_to_pnode;
> + xen_vmemrange_t dummy_vmemrange[2];
> + unsigned int dummy_vnode_to_pnode[2];
>
> memset(&elf, 0, sizeof(elf));
> if ( elf_init(&elf, image, image_size) != 0 )
> @@ -275,17 +276,37 @@ static int setup_guest(xc_interface *xch,
>
> if ( args->nr_vmemranges == 0 )
> {
> - /* Build dummy vnode information */
> - dummy_vmemrange.start = 0;
> - dummy_vmemrange.end = args->mem_size;
> - dummy_vmemrange.flags = 0;
> - dummy_vmemrange.nid = 0;
> - args->nr_vmemranges = 1;
> - args->vmemranges = &dummy_vmemrange;
> + /* Build dummy vnode information
> + *
> + * Guest physical address space layout:
> + * [0, hole_start) [hole_start, 4G) [4G, highmem_end)
> + *
> + * Of course if there is no high memory, the second vmemrange
> + * has no effect on the actual result.
> + */
>
> - dummy_vnode_to_pnode = XC_NUMA_NO_NODE;
> + dummy_vmemrange[0].start = 0;
> + dummy_vmemrange[0].end = args->lowmem_end;
> + dummy_vmemrange[0].flags = 0;
> + dummy_vmemrange[0].nid = 0;
> + dummy_vnode_to_pnode[0] = XC_NUMA_NO_NODE;
> + args->nr_vmemranges = 1;
> args->nr_vnodes = 1;
> - args->vnode_to_pnode = &dummy_vnode_to_pnode;
> +
> + if ( args->highmem_end > (1ULL << 32) )
> + {
> + dummy_vmemrange[1].start = 1ULL << 32;
> + dummy_vmemrange[1].end = args->highmem_end;
> + dummy_vmemrange[1].flags = 0;
> + dummy_vmemrange[1].nid = 0;
> + dummy_vnode_to_pnode[1] = XC_NUMA_NO_NODE;
> +
> + args->nr_vmemranges++;
> + args->nr_vnodes++;
(+Dario)
Does having high memory mean that we need to have 2 vnodes? We should be
able to cope with multiple vmemranges per node, right?
-boris
> + }
> +
> + args->vmemranges = dummy_vmemrange;
> + args->vnode_to_pnode = dummy_vnode_to_pnode;
> }
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/4] libxc: rework vnuma bits in setup_guest
2015-05-19 16:23 ` Boris Ostrovsky
@ 2015-05-19 16:31 ` Wei Liu
2015-05-19 17:03 ` Boris Ostrovsky
0 siblings, 1 reply; 15+ messages in thread
From: Wei Liu @ 2015-05-19 16:31 UTC (permalink / raw)
To: Boris Ostrovsky
Cc: Ian Jackson, Dario Faggioli, Wei Liu, Ian Campbell, xen-devel
On Tue, May 19, 2015 at 12:23:07PM -0400, Boris Ostrovsky wrote:
> On 05/18/2015 11:34 AM, Wei Liu wrote:
> >Make the setup process similar to PV counterpart. That is, to allocate a
> >P2M array that covers the whole memory range and start from there. This
> >is clearer than using an array with no holes in it.
> >
> >Also the dummy layout should take MMIO hole into consideration. We might
> >end up having two vmemranges in the dummy layout.
> >
> >Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> >Cc: Ian Campbell <ian.campbell@citrix.com>
> >Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> >---
> > tools/libxc/xc_hvm_build_x86.c | 66 ++++++++++++++++++++++++++++++++----------
> > 1 file changed, 50 insertions(+), 16 deletions(-)
> >
> >diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> >index df4b7ed..77678f1 100644
> >--- a/tools/libxc/xc_hvm_build_x86.c
> >+++ b/tools/libxc/xc_hvm_build_x86.c
> >@@ -238,6 +238,7 @@ static int setup_guest(xc_interface *xch,
> > {
> > xen_pfn_t *page_array = NULL;
> > unsigned long i, vmemid, nr_pages = args->mem_size >> PAGE_SHIFT;
> >+ unsigned long p2m_size;
> > unsigned long target_pages = args->mem_target >> PAGE_SHIFT;
> > unsigned long entry_eip, cur_pages, cur_pfn;
> > void *hvm_info_page;
> >@@ -254,8 +255,8 @@ static int setup_guest(xc_interface *xch,
> > xen_pfn_t special_array[NR_SPECIAL_PAGES];
> > xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES];
> > uint64_t total_pages;
> >- xen_vmemrange_t dummy_vmemrange;
> >- unsigned int dummy_vnode_to_pnode;
> >+ xen_vmemrange_t dummy_vmemrange[2];
> >+ unsigned int dummy_vnode_to_pnode[2];
> > memset(&elf, 0, sizeof(elf));
> > if ( elf_init(&elf, image, image_size) != 0 )
> >@@ -275,17 +276,37 @@ static int setup_guest(xc_interface *xch,
> > if ( args->nr_vmemranges == 0 )
> > {
> >- /* Build dummy vnode information */
> >- dummy_vmemrange.start = 0;
> >- dummy_vmemrange.end = args->mem_size;
> >- dummy_vmemrange.flags = 0;
> >- dummy_vmemrange.nid = 0;
> >- args->nr_vmemranges = 1;
> >- args->vmemranges = &dummy_vmemrange;
> >+ /* Build dummy vnode information
> >+ *
> >+ * Guest physical address space layout:
> >+ * [0, hole_start) [hole_start, 4G) [4G, highmem_end)
> >+ *
> >+ * Of course if there is no high memory, the second vmemrange
> >+ * has no effect on the actual result.
> >+ */
> >- dummy_vnode_to_pnode = XC_NUMA_NO_NODE;
> >+ dummy_vmemrange[0].start = 0;
> >+ dummy_vmemrange[0].end = args->lowmem_end;
> >+ dummy_vmemrange[0].flags = 0;
> >+ dummy_vmemrange[0].nid = 0;
> >+ dummy_vnode_to_pnode[0] = XC_NUMA_NO_NODE;
> >+ args->nr_vmemranges = 1;
> > args->nr_vnodes = 1;
> >- args->vnode_to_pnode = &dummy_vnode_to_pnode;
> >+
> >+ if ( args->highmem_end > (1ULL << 32) )
> >+ {
> >+ dummy_vmemrange[1].start = 1ULL << 32;
> >+ dummy_vmemrange[1].end = args->highmem_end;
> >+ dummy_vmemrange[1].flags = 0;
> >+ dummy_vmemrange[1].nid = 0;
> >+ dummy_vnode_to_pnode[1] = XC_NUMA_NO_NODE;
> >+
> >+ args->nr_vmemranges++;
> >+ args->nr_vnodes++;
>
> (+Dario)
>
> Does having high memory mean that we need to have 2 vnodes? We should be
Yes, see the comment above.
> able to cope with multiple vmemranges per node, right?
Yes. That's already done in libxl's function to build hvm vmemranges.
This is only a simple dummy layout so nothing fancy happens here.
Wei.
>
> -boris
>
>
> >+ }
> >+
> >+ args->vmemranges = dummy_vmemrange;
> >+ args->vnode_to_pnode = dummy_vnode_to_pnode;
> > }
> >
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/4] libxc: rework vnuma bits in setup_guest
2015-05-19 16:31 ` Wei Liu
@ 2015-05-19 17:03 ` Boris Ostrovsky
2015-05-19 17:14 ` Wei Liu
0 siblings, 1 reply; 15+ messages in thread
From: Boris Ostrovsky @ 2015-05-19 17:03 UTC (permalink / raw)
To: Wei Liu; +Cc: Dario Faggioli, Ian Jackson, Ian Campbell, xen-devel
On 05/19/2015 12:31 PM, Wei Liu wrote:
> On Tue, May 19, 2015 at 12:23:07PM -0400, Boris Ostrovsky wrote:
>> On 05/18/2015 11:34 AM, Wei Liu wrote:
>>> Make the setup process similar to PV counterpart. That is, to allocate a
>>> P2M array that covers the whole memory range and start from there. This
>>> is clearer than using an array with no holes in it.
>>>
>>> Also the dummy layout should take MMIO hole into consideration. We might
>>> end up having two vmemranges in the dummy layout.
>>>
>>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>>> Cc: Ian Campbell <ian.campbell@citrix.com>
>>> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>>> ---
>>> tools/libxc/xc_hvm_build_x86.c | 66 ++++++++++++++++++++++++++++++++----------
>>> 1 file changed, 50 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
>>> index df4b7ed..77678f1 100644
>>> --- a/tools/libxc/xc_hvm_build_x86.c
>>> +++ b/tools/libxc/xc_hvm_build_x86.c
>>> @@ -238,6 +238,7 @@ static int setup_guest(xc_interface *xch,
>>> {
>>> xen_pfn_t *page_array = NULL;
>>> unsigned long i, vmemid, nr_pages = args->mem_size >> PAGE_SHIFT;
>>> + unsigned long p2m_size;
>>> unsigned long target_pages = args->mem_target >> PAGE_SHIFT;
>>> unsigned long entry_eip, cur_pages, cur_pfn;
>>> void *hvm_info_page;
>>> @@ -254,8 +255,8 @@ static int setup_guest(xc_interface *xch,
>>> xen_pfn_t special_array[NR_SPECIAL_PAGES];
>>> xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES];
>>> uint64_t total_pages;
>>> - xen_vmemrange_t dummy_vmemrange;
>>> - unsigned int dummy_vnode_to_pnode;
>>> + xen_vmemrange_t dummy_vmemrange[2];
>>> + unsigned int dummy_vnode_to_pnode[2];
>>> memset(&elf, 0, sizeof(elf));
>>> if ( elf_init(&elf, image, image_size) != 0 )
>>> @@ -275,17 +276,37 @@ static int setup_guest(xc_interface *xch,
>>> if ( args->nr_vmemranges == 0 )
>>> {
>>> - /* Build dummy vnode information */
>>> - dummy_vmemrange.start = 0;
>>> - dummy_vmemrange.end = args->mem_size;
>>> - dummy_vmemrange.flags = 0;
>>> - dummy_vmemrange.nid = 0;
>>> - args->nr_vmemranges = 1;
>>> - args->vmemranges = &dummy_vmemrange;
>>> + /* Build dummy vnode information
>>> + *
>>> + * Guest physical address space layout:
>>> + * [0, hole_start) [hole_start, 4G) [4G, highmem_end)
>>> + *
>>> + * Of course if there is no high memory, the second vmemrange
>>> + * has no effect on the actual result.
>>> + */
>>> - dummy_vnode_to_pnode = XC_NUMA_NO_NODE;
>>> + dummy_vmemrange[0].start = 0;
>>> + dummy_vmemrange[0].end = args->lowmem_end;
>>> + dummy_vmemrange[0].flags = 0;
>>> + dummy_vmemrange[0].nid = 0;
>>> + dummy_vnode_to_pnode[0] = XC_NUMA_NO_NODE;
>>> + args->nr_vmemranges = 1;
>>> args->nr_vnodes = 1;
>>> - args->vnode_to_pnode = &dummy_vnode_to_pnode;
>>> +
>>> + if ( args->highmem_end > (1ULL << 32) )
>>> + {
>>> + dummy_vmemrange[1].start = 1ULL << 32;
>>> + dummy_vmemrange[1].end = args->highmem_end;
>>> + dummy_vmemrange[1].flags = 0;
>>> + dummy_vmemrange[1].nid = 0;
>>> + dummy_vnode_to_pnode[1] = XC_NUMA_NO_NODE;
>>> +
>>> + args->nr_vmemranges++;
>>> + args->nr_vnodes++;
>> (+Dario)
>>
>> Does having high memory mean that we need to have 2 vnodes? We should be
> Yes, see the comment above.
>
>> able to cope with multiple vmemranges per node, right?
>
> Yes. That's already done in libxl's function to build hvm vmemranges.
>
> This is only a simple dummy layout so nothing fancy happens here.
Right. But having multiple vnodes for a dummy topology looks to me a
little counter-intuitive: people often assume that when number of nodes
is 1 we don't have any NUMA-ness. Here we may need to look at
vnode_to_pnode (possibly at both of the elements) to realize that this
is a dummy layout.
And given that this layout can be expressed with nr_vnodes=1 &&
nr_vmemranges=2 I am not sure what we gain by having two vnodes.
-boris
>
> Wei.
>
>> -boris
>>
>>
>>> + }
>>> +
>>> + args->vmemranges = dummy_vmemrange;
>>> + args->vnode_to_pnode = dummy_vnode_to_pnode;
>>> }
>>>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/4] libxc: rework vnuma bits in setup_guest
2015-05-19 17:03 ` Boris Ostrovsky
@ 2015-05-19 17:14 ` Wei Liu
2015-05-19 17:21 ` Boris Ostrovsky
2015-05-19 17:25 ` Wei Liu
0 siblings, 2 replies; 15+ messages in thread
From: Wei Liu @ 2015-05-19 17:14 UTC (permalink / raw)
To: Boris Ostrovsky
Cc: Ian Jackson, Dario Faggioli, Wei Liu, Ian Campbell, xen-devel
On Tue, May 19, 2015 at 01:03:59PM -0400, Boris Ostrovsky wrote:
> On 05/19/2015 12:31 PM, Wei Liu wrote:
> >On Tue, May 19, 2015 at 12:23:07PM -0400, Boris Ostrovsky wrote:
> >>On 05/18/2015 11:34 AM, Wei Liu wrote:
> >>>Make the setup process similar to PV counterpart. That is, to allocate a
> >>>P2M array that covers the whole memory range and start from there. This
> >>>is clearer than using an array with no holes in it.
> >>>
> >>>Also the dummy layout should take MMIO hole into consideration. We might
> >>>end up having two vmemranges in the dummy layout.
> >>>
> >>>Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> >>>Cc: Ian Campbell <ian.campbell@citrix.com>
> >>>Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> >>>---
> >>> tools/libxc/xc_hvm_build_x86.c | 66 ++++++++++++++++++++++++++++++++----------
> >>> 1 file changed, 50 insertions(+), 16 deletions(-)
> >>>
> >>>diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> >>>index df4b7ed..77678f1 100644
> >>>--- a/tools/libxc/xc_hvm_build_x86.c
> >>>+++ b/tools/libxc/xc_hvm_build_x86.c
> >>>@@ -238,6 +238,7 @@ static int setup_guest(xc_interface *xch,
> >>> {
> >>> xen_pfn_t *page_array = NULL;
> >>> unsigned long i, vmemid, nr_pages = args->mem_size >> PAGE_SHIFT;
> >>>+ unsigned long p2m_size;
> >>> unsigned long target_pages = args->mem_target >> PAGE_SHIFT;
> >>> unsigned long entry_eip, cur_pages, cur_pfn;
> >>> void *hvm_info_page;
> >>>@@ -254,8 +255,8 @@ static int setup_guest(xc_interface *xch,
> >>> xen_pfn_t special_array[NR_SPECIAL_PAGES];
> >>> xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES];
> >>> uint64_t total_pages;
> >>>- xen_vmemrange_t dummy_vmemrange;
> >>>- unsigned int dummy_vnode_to_pnode;
> >>>+ xen_vmemrange_t dummy_vmemrange[2];
> >>>+ unsigned int dummy_vnode_to_pnode[2];
> >>> memset(&elf, 0, sizeof(elf));
> >>> if ( elf_init(&elf, image, image_size) != 0 )
> >>>@@ -275,17 +276,37 @@ static int setup_guest(xc_interface *xch,
> >>> if ( args->nr_vmemranges == 0 )
> >>> {
> >>>- /* Build dummy vnode information */
> >>>- dummy_vmemrange.start = 0;
> >>>- dummy_vmemrange.end = args->mem_size;
> >>>- dummy_vmemrange.flags = 0;
> >>>- dummy_vmemrange.nid = 0;
> >>>- args->nr_vmemranges = 1;
> >>>- args->vmemranges = &dummy_vmemrange;
> >>>+ /* Build dummy vnode information
> >>>+ *
> >>>+ * Guest physical address space layout:
> >>>+ * [0, hole_start) [hole_start, 4G) [4G, highmem_end)
> >>>+ *
> >>>+ * Of course if there is no high memory, the second vmemrange
> >>>+ * has no effect on the actual result.
> >>>+ */
> >>>- dummy_vnode_to_pnode = XC_NUMA_NO_NODE;
> >>>+ dummy_vmemrange[0].start = 0;
> >>>+ dummy_vmemrange[0].end = args->lowmem_end;
> >>>+ dummy_vmemrange[0].flags = 0;
> >>>+ dummy_vmemrange[0].nid = 0;
> >>>+ dummy_vnode_to_pnode[0] = XC_NUMA_NO_NODE;
> >>>+ args->nr_vmemranges = 1;
> >>> args->nr_vnodes = 1;
> >>>- args->vnode_to_pnode = &dummy_vnode_to_pnode;
> >>>+
> >>>+ if ( args->highmem_end > (1ULL << 32) )
> >>>+ {
> >>>+ dummy_vmemrange[1].start = 1ULL << 32;
> >>>+ dummy_vmemrange[1].end = args->highmem_end;
> >>>+ dummy_vmemrange[1].flags = 0;
> >>>+ dummy_vmemrange[1].nid = 0;
> >>>+ dummy_vnode_to_pnode[1] = XC_NUMA_NO_NODE;
> >>>+
> >>>+ args->nr_vmemranges++;
> >>>+ args->nr_vnodes++;
> >>(+Dario)
> >>
> >>Does having high memory mean that we need to have 2 vnodes? We should be
> >Yes, see the comment above.
> >
> >>able to cope with multiple vmemranges per node, right?
> >
> >Yes. That's already done in libxl's function to build hvm vmemranges.
> >
> >This is only a simple dummy layout so nothing fancy happens here.
>
>
> Right. But having multiple vnodes for a dummy topology looks to me a little
> counter-intuitive: people often assume that when number of nodes is 1 we
> don't have any NUMA-ness. Here we may need to look at vnode_to_pnode
> (possibly at both of the elements) to realize that this is a dummy layout.
>
> And given that this layout can be expressed with nr_vnodes=1 &&
> nr_vmemranges=2 I am not sure what we gain by having two vnodes.
>
Ah, so that's a bug: args->nr_vnodes++ should be deleted.
We still only have one vnode (nid = 0). That vnode contains two
vmemranges.
Wei.
> -boris
>
> >
> >Wei.
> >
> >>-boris
> >>
> >>
> >>>+ }
> >>>+
> >>>+ args->vmemranges = dummy_vmemrange;
> >>>+ args->vnode_to_pnode = dummy_vnode_to_pnode;
> >>> }
> >>>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/4] libxc: rework vnuma bits in setup_guest
2015-05-19 17:14 ` Wei Liu
@ 2015-05-19 17:21 ` Boris Ostrovsky
2015-05-19 17:25 ` Wei Liu
1 sibling, 0 replies; 15+ messages in thread
From: Boris Ostrovsky @ 2015-05-19 17:21 UTC (permalink / raw)
To: Wei Liu; +Cc: Dario Faggioli, Ian Jackson, Ian Campbell, xen-devel
On 05/19/2015 01:14 PM, Wei Liu wrote:
> On Tue, May 19, 2015 at 01:03:59PM -0400, Boris Ostrovsky wrote:
>> On 05/19/2015 12:31 PM, Wei Liu wrote:
>>> On Tue, May 19, 2015 at 12:23:07PM -0400, Boris Ostrovsky wrote:
>>>> On 05/18/2015 11:34 AM, Wei Liu wrote:
>>>>> Make the setup process similar to PV counterpart. That is, to allocate a
>>>>> P2M array that covers the whole memory range and start from there. This
>>>>> is clearer than using an array with no holes in it.
>>>>>
>>>>> Also the dummy layout should take MMIO hole into consideration. We might
>>>>> end up having two vmemranges in the dummy layout.
>>>>>
>>>>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>>>>> Cc: Ian Campbell <ian.campbell@citrix.com>
>>>>> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>>>>> ---
>>>>> tools/libxc/xc_hvm_build_x86.c | 66 ++++++++++++++++++++++++++++++++----------
>>>>> 1 file changed, 50 insertions(+), 16 deletions(-)
>>>>>
>>>>> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
>>>>> index df4b7ed..77678f1 100644
>>>>> --- a/tools/libxc/xc_hvm_build_x86.c
>>>>> +++ b/tools/libxc/xc_hvm_build_x86.c
>>>>> @@ -238,6 +238,7 @@ static int setup_guest(xc_interface *xch,
>>>>> {
>>>>> xen_pfn_t *page_array = NULL;
>>>>> unsigned long i, vmemid, nr_pages = args->mem_size >> PAGE_SHIFT;
>>>>> + unsigned long p2m_size;
>>>>> unsigned long target_pages = args->mem_target >> PAGE_SHIFT;
>>>>> unsigned long entry_eip, cur_pages, cur_pfn;
>>>>> void *hvm_info_page;
>>>>> @@ -254,8 +255,8 @@ static int setup_guest(xc_interface *xch,
>>>>> xen_pfn_t special_array[NR_SPECIAL_PAGES];
>>>>> xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES];
>>>>> uint64_t total_pages;
>>>>> - xen_vmemrange_t dummy_vmemrange;
>>>>> - unsigned int dummy_vnode_to_pnode;
>>>>> + xen_vmemrange_t dummy_vmemrange[2];
>>>>> + unsigned int dummy_vnode_to_pnode[2];
>>>>> memset(&elf, 0, sizeof(elf));
>>>>> if ( elf_init(&elf, image, image_size) != 0 )
>>>>> @@ -275,17 +276,37 @@ static int setup_guest(xc_interface *xch,
>>>>> if ( args->nr_vmemranges == 0 )
>>>>> {
>>>>> - /* Build dummy vnode information */
>>>>> - dummy_vmemrange.start = 0;
>>>>> - dummy_vmemrange.end = args->mem_size;
>>>>> - dummy_vmemrange.flags = 0;
>>>>> - dummy_vmemrange.nid = 0;
>>>>> - args->nr_vmemranges = 1;
>>>>> - args->vmemranges = &dummy_vmemrange;
>>>>> + /* Build dummy vnode information
>>>>> + *
>>>>> + * Guest physical address space layout:
>>>>> + * [0, hole_start) [hole_start, 4G) [4G, highmem_end)
>>>>> + *
>>>>> + * Of course if there is no high memory, the second vmemrange
>>>>> + * has no effect on the actual result.
>>>>> + */
>>>>> - dummy_vnode_to_pnode = XC_NUMA_NO_NODE;
>>>>> + dummy_vmemrange[0].start = 0;
>>>>> + dummy_vmemrange[0].end = args->lowmem_end;
>>>>> + dummy_vmemrange[0].flags = 0;
>>>>> + dummy_vmemrange[0].nid = 0;
>>>>> + dummy_vnode_to_pnode[0] = XC_NUMA_NO_NODE;
>>>>> + args->nr_vmemranges = 1;
>>>>> args->nr_vnodes = 1;
>>>>> - args->vnode_to_pnode = &dummy_vnode_to_pnode;
>>>>> +
>>>>> + if ( args->highmem_end > (1ULL << 32) )
>>>>> + {
>>>>> + dummy_vmemrange[1].start = 1ULL << 32;
>>>>> + dummy_vmemrange[1].end = args->highmem_end;
>>>>> + dummy_vmemrange[1].flags = 0;
>>>>> + dummy_vmemrange[1].nid = 0;
>>>>> + dummy_vnode_to_pnode[1] = XC_NUMA_NO_NODE;
>>>>> +
>>>>> + args->nr_vmemranges++;
>>>>> + args->nr_vnodes++;
>>>> (+Dario)
>>>>
>>>> Does having high memory mean that we need to have 2 vnodes? We should be
>>> Yes, see the comment above.
>>>
>>>> able to cope with multiple vmemranges per node, right?
>>>
>>> Yes. That's already done in libxl's function to build hvm vmemranges.
>>>
>>> This is only a simple dummy layout so nothing fancy happens here.
>>
>>
>> Right. But having multiple vnodes for a dummy topology looks to me a little
>> counter-intuitive: people often assume that when number of nodes is 1 we
>> don't have any NUMA-ness. Here we may need to look at vnode_to_pnode
>> (possibly at both of the elements) to realize that this is a dummy layout.
>>
>> And given that this layout can be expressed with nr_vnodes=1 &&
>> nr_vmemranges=2 I am not sure what we gain by having two vnodes.
>>
>
> Ah, so that's a bug: args->nr_vnodes++ should be deleted.
And dummy_vnode_to_pnode[2] should become a scalar.
-boris
>
> We still only have one vnode (nid = 0). That vnode contains two
> vmemranges.
>
> Wei.
>
>> -boris
>>
>>>
>>> Wei.
>>>
>>>> -boris
>>>>
>>>>
>>>>> + }
>>>>> +
>>>>> + args->vmemranges = dummy_vmemrange;
>>>>> + args->vnode_to_pnode = dummy_vnode_to_pnode;
>>>>> }
>>>>>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/4] libxc: rework vnuma bits in setup_guest
2015-05-19 17:14 ` Wei Liu
2015-05-19 17:21 ` Boris Ostrovsky
@ 2015-05-19 17:25 ` Wei Liu
1 sibling, 0 replies; 15+ messages in thread
From: Wei Liu @ 2015-05-19 17:25 UTC (permalink / raw)
To: Boris Ostrovsky
Cc: Ian Jackson, Dario Faggioli, Wei Liu, Ian Campbell, xen-devel
On Tue, May 19, 2015 at 06:14:27PM +0100, Wei Liu wrote:
[...]
> > >>
> > >>Does having high memory mean that we need to have 2 vnodes? We should be
> > >Yes, see the comment above.
> > >
> > >>able to cope with multiple vmemranges per node, right?
> > >
> > >Yes. That's already done in libxl's function to build hvm vmemranges.
> > >
> > >This is only a simple dummy layout so nothing fancy happens here.
> >
> >
> > Right. But having multiple vnodes for a dummy topology looks to me a little
> > counter-intuitive: people often assume that when number of nodes is 1 we
> > don't have any NUMA-ness. Here we may need to look at vnode_to_pnode
> > (possibly at both of the elements) to realize that this is a dummy layout.
> >
> > And given that this layout can be expressed with nr_vnodes=1 &&
> > nr_vmemranges=2 I am not sure what we gain by having two vnodes.
> >
>
> Ah, so that's a bug: args->nr_vnodes++ should be deleted.
>
> We still only have one vnode (nid = 0). That vnode contains two
> vmemranges.
>
And because no code in setup_guest uses that value so the bug doesn't
affect the final guest layout. It will be fixed in version 2
nonetheless.
Wei.
> Wei.
>
> > -boris
> >
> > >
> > >Wei.
> > >
> > >>-boris
> > >>
> > >>
> > >>>+ }
> > >>>+
> > >>>+ args->vmemranges = dummy_vmemrange;
> > >>>+ args->vnode_to_pnode = dummy_vnode_to_pnode;
> > >>> }
> > >>>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/4] libxc/libxl: fill xc_hvm_build_args in libxl
2015-05-18 15:34 ` [PATCH 1/4] libxc/libxl: fill xc_hvm_build_args in libxl Wei Liu
@ 2015-05-21 0:40 ` Chen, Tiejun
0 siblings, 0 replies; 15+ messages in thread
From: Chen, Tiejun @ 2015-05-21 0:40 UTC (permalink / raw)
To: Wei Liu, xen-devel; +Cc: boris.ostrovsky, Ian Jackson, Ian Campbell
On 2015/5/18 23:34, Wei Liu wrote:
> When building HVM guests, originally some fields of xc_hvm_build_args
> are filled in xc_hvm_build (and buried in the wrong function), some are
> set in libxl__build_hvm before passing xc_hvm_build_args to
> xc_hvm_build. This is fragile.
>
> After examining the code in xc_hvm_build that sets those fields, we can
> in fact move setting of mmio_start etc in libxl. This way we consolidate
> memory layout setting in libxl.
>
> The setting of firmware data related fields is left in xc_hvm_build
> because it depends on parsing ELF image. Those fields only point to
> scratch data that doesn't affect memory layout.
>
> There should be no change in the generated guest memory layout.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> ---
> Cc: "Chen, Tiejun" <tiejun.chen@intel.com>
>
> This might affect your RMRR patch series.
Yeah.
>
> I once said xc_hvm_build would touch various xc_hvm_build_args fields
> that would affect guest memory layout. It won't be that case anymore
> with this patch.
I'd like to rebase my patch *after* this is ACKed.
Thanks
Tiejun
> ---
> tools/libxc/xc_hvm_build_x86.c | 37 +++++++------------------------------
> tools/libxl/libxl_dom.c | 16 ++++++++++++++++
> 2 files changed, 23 insertions(+), 30 deletions(-)
>
> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index e45ae4a..92422bf 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -88,22 +88,14 @@ static int modules_init(struct xc_hvm_build_args *args,
> return 0;
> }
>
> -static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
> - uint64_t mmio_start, uint64_t mmio_size,
> +static void build_hvm_info(void *hvm_info_page,
> struct xc_hvm_build_args *args)
> {
> struct hvm_info_table *hvm_info = (struct hvm_info_table *)
> (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET);
> - uint64_t lowmem_end = mem_size, highmem_end = 0;
> uint8_t sum;
> int i;
>
> - if ( lowmem_end > mmio_start )
> - {
> - highmem_end = (1ull<<32) + (lowmem_end - mmio_start);
> - lowmem_end = mmio_start;
> - }
> -
> memset(hvm_info_page, 0, PAGE_SIZE);
>
> /* Fill in the header. */
> @@ -116,14 +108,10 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
> memset(hvm_info->vcpu_online, 0xff, sizeof(hvm_info->vcpu_online));
>
> /* Memory parameters. */
> - hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
> - hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
> + hvm_info->low_mem_pgend = args->lowmem_end >> PAGE_SHIFT;
> + hvm_info->high_mem_pgend = args->highmem_end >> PAGE_SHIFT;
> hvm_info->reserved_mem_pgstart = ioreq_server_pfn(0);
>
> - args->lowmem_end = lowmem_end;
> - args->highmem_end = highmem_end;
> - args->mmio_start = mmio_start;
> -
> /* Finish with the checksum. */
> for ( i = 0, sum = 0; i < hvm_info->length; i++ )
> sum += ((uint8_t *)hvm_info)[i];
> @@ -251,8 +239,6 @@ static int setup_guest(xc_interface *xch,
> xen_pfn_t *page_array = NULL;
> unsigned long i, vmemid, nr_pages = args->mem_size >> PAGE_SHIFT;
> unsigned long target_pages = args->mem_target >> PAGE_SHIFT;
> - uint64_t mmio_start = (1ull << 32) - args->mmio_size;
> - uint64_t mmio_size = args->mmio_size;
> unsigned long entry_eip, cur_pages, cur_pfn;
> void *hvm_info_page;
> uint32_t *ident_pt;
> @@ -344,8 +330,8 @@ static int setup_guest(xc_interface *xch,
>
> for ( i = 0; i < nr_pages; i++ )
> page_array[i] = i;
> - for ( i = mmio_start >> PAGE_SHIFT; i < nr_pages; i++ )
> - page_array[i] += mmio_size >> PAGE_SHIFT;
> + for ( i = args->mmio_start >> PAGE_SHIFT; i < nr_pages; i++ )
> + page_array[i] += args->mmio_size >> PAGE_SHIFT;
>
> /*
> * Try to claim pages for early warning of insufficient memory available.
> @@ -446,7 +432,7 @@ static int setup_guest(xc_interface *xch,
> * range */
> !check_mmio_hole(cur_pfn << PAGE_SHIFT,
> SUPERPAGE_1GB_NR_PFNS << PAGE_SHIFT,
> - mmio_start, mmio_size) )
> + args->mmio_start, args->mmio_size) )
> {
> long done;
> unsigned long nr_extents = count >> SUPERPAGE_1GB_SHIFT;
> @@ -545,7 +531,7 @@ static int setup_guest(xc_interface *xch,
> xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
> HVM_INFO_PFN)) == NULL )
> goto error_out;
> - build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size, args);
> + build_hvm_info(hvm_info_page, args);
> munmap(hvm_info_page, PAGE_SIZE);
>
> /* Allocate and clear special pages. */
> @@ -661,12 +647,6 @@ int xc_hvm_build(xc_interface *xch, uint32_t domid,
> if ( args.image_file_name == NULL )
> return -1;
>
> - if ( args.mem_target == 0 )
> - args.mem_target = args.mem_size;
> -
> - if ( args.mmio_size == 0 )
> - args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
> -
> /* An HVM guest must be initialised with at least 2MB memory. */
> if ( args.mem_size < (2ull << 20) || args.mem_target < (2ull << 20) )
> return -1;
> @@ -684,9 +664,6 @@ int xc_hvm_build(xc_interface *xch, uint32_t domid,
> args.acpi_module.guest_addr_out;
> hvm_args->smbios_module.guest_addr_out =
> args.smbios_module.guest_addr_out;
> - hvm_args->lowmem_end = args.lowmem_end;
> - hvm_args->highmem_end = args.highmem_end;
> - hvm_args->mmio_start = args.mmio_start;
> }
>
> free(image);
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index f408646..608e574 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -920,6 +920,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
> libxl_ctx *ctx = libxl__gc_owner(gc);
> struct xc_hvm_build_args args = {};
> int ret, rc = ERROR_FAIL;
> + uint64_t mmio_start, lowmem_end, highmem_end;
>
> memset(&args, 0, sizeof(struct xc_hvm_build_args));
> /* The params from the configuration file are in Mb, which are then
> @@ -941,6 +942,21 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
> LOG(ERROR, "initializing domain firmware failed");
> goto out;
> }
> + if (args.mem_target == 0)
> + args.mem_target = args.mem_size;
> + if (args.mmio_size == 0)
> + args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
> + lowmem_end = args.mem_size;
> + highmem_end = 0;
> + mmio_start = (1ull << 32) - args.mmio_size;
> + if (lowmem_end > mmio_start)
> + {
> + highmem_end = (1ull << 32) + (lowmem_end - mmio_start);
> + lowmem_end = mmio_start;
> + }
> + args.lowmem_end = lowmem_end;
> + args.highmem_end = highmem_end;
> + args.mmio_start = mmio_start;
>
> if (info->num_vnuma_nodes != 0) {
> int i;
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/4] Fix HVM vNUMA
2015-05-18 15:34 [PATCH 0/4] Fix HVM vNUMA Wei Liu
` (4 preceding siblings ...)
2015-05-18 20:10 ` [PATCH 0/4] Fix " Wei Liu
@ 2015-05-29 10:46 ` Wei Liu
2015-05-29 11:12 ` Dario Faggioli
5 siblings, 1 reply; 15+ messages in thread
From: Wei Liu @ 2015-05-29 10:46 UTC (permalink / raw)
To: xen-devel; +Cc: boris.ostrovsky, Wei Liu
Ping...
On Mon, May 18, 2015 at 04:34:48PM +0100, Wei Liu wrote:
> Boris discovered that HVM vNUMA didn't actually work. This patch series fixes
> that.
>
> The first patch is a prerequisite patch for the actual fixes. The second patch
> is to help debugging. The fixes are in the last two patches, which can be
> squashed into one if necessary.
>
> Wei.
>
> Wei Liu (4):
> libxc/libxl: fill xc_hvm_build_args in libxl
> libxc: print more error messages when failed
> libxc: rework vnuma bits in setup_guest
> libxl: fix HVM vNUMA
>
> tools/libxc/xc_hvm_build_x86.c | 129 ++++++++++++++++++++++++++---------------
> tools/libxl/libxl_dom.c | 48 ++++++++-------
> tools/libxl/libxl_vnuma.c | 15 ++++-
> 3 files changed, 122 insertions(+), 70 deletions(-)
>
> --
> 1.9.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/4] Fix HVM vNUMA
2015-05-29 10:46 ` Wei Liu
@ 2015-05-29 11:12 ` Dario Faggioli
0 siblings, 0 replies; 15+ messages in thread
From: Dario Faggioli @ 2015-05-29 11:12 UTC (permalink / raw)
To: Wei Liu; +Cc: boris.ostrovsky, xen-devel
[-- Attachment #1.1: Type: text/plain, Size: 1615 bytes --]
On Fri, 2015-05-29 at 11:46 +0100, Wei Liu wrote:
> Ping...
>
FWIW, I had a look at the patches and they seemed fine, sorry to not
having said that before.
TBH, that was partly because I was expecting a v2 to show up, fixing the
(trivial) issue you identified together with Boris, and was planning to
'Reviewed-by' that...
Dario
> On Mon, May 18, 2015 at 04:34:48PM +0100, Wei Liu wrote:
> > Boris discovered that HVM vNUMA didn't actually work. This patch series fixes
> > that.
> >
> > The first patch is a prerequisite patch for the actual fixes. The second patch
> > is to help debugging. The fixes are in the last two patches, which can be
> > squashed into one if necessary.
> >
> > Wei.
> >
> > Wei Liu (4):
> > libxc/libxl: fill xc_hvm_build_args in libxl
> > libxc: print more error messages when failed
> > libxc: rework vnuma bits in setup_guest
> > libxl: fix HVM vNUMA
> >
> > tools/libxc/xc_hvm_build_x86.c | 129 ++++++++++++++++++++++++++---------------
> > tools/libxl/libxl_dom.c | 48 ++++++++-------
> > tools/libxl/libxl_vnuma.c | 15 ++++-
> > 3 files changed, 122 insertions(+), 70 deletions(-)
> >
> > --
> > 1.9.1
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
[-- Attachment #2: Type: text/plain, Size: 126 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2015-05-29 11:12 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-05-18 15:34 [PATCH 0/4] Fix HVM vNUMA Wei Liu
2015-05-18 15:34 ` [PATCH 1/4] libxc/libxl: fill xc_hvm_build_args in libxl Wei Liu
2015-05-21 0:40 ` Chen, Tiejun
2015-05-18 15:34 ` [PATCH 2/4] libxc: print more error messages when failed Wei Liu
2015-05-18 15:34 ` [PATCH 3/4] libxc: rework vnuma bits in setup_guest Wei Liu
2015-05-19 16:23 ` Boris Ostrovsky
2015-05-19 16:31 ` Wei Liu
2015-05-19 17:03 ` Boris Ostrovsky
2015-05-19 17:14 ` Wei Liu
2015-05-19 17:21 ` Boris Ostrovsky
2015-05-19 17:25 ` Wei Liu
2015-05-18 15:34 ` [PATCH 4/4] libxl: fix HVM vNUMA Wei Liu
2015-05-18 20:10 ` [PATCH 0/4] Fix " Wei Liu
2015-05-29 10:46 ` Wei Liu
2015-05-29 11:12 ` Dario Faggioli
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.